AI Alignment: Ensuring the Aurora Driver is Safe and Human-Like
Engineering | From our leaders | July 26, 2024 | 5 min. read
By Drew Bagnell
Aurora believes the future of self-driving depends on the expansive use of the power of AI combined with rigorous validation and verification. At a time when researchers, industry, and the broader public are generating more questions and curiosity about the potential of AI, we’re taking you under the hood of Aurora’s approach in a series of blog posts.
Previously, Chris unpacked the concept and application of Verifiable AI, while I weighed in on how we build transparent AI systems. Today, I'll talk through how we ensure alignment in AI.
Future of AI in Self-Driving – Part II: Alignment
Our approach to ensure the Aurora Driver is safe and human-like
The Aurora Driver is built to be aligned with driving goals. “Alignment” is the buzzword du jour to mean “the AI does what we want it to do”. There are a variety of approaches to ensuring the Aurora Driver delivers the behavior we desire. To achieve the best alignment, Aurora has adopted a Proposer-Ranker architecture for behavior generation1.
In such a system, a broad set of possible behaviors is “proposed” – think of these crudely as ways in which the driver could behave. Each of these proposals is then “costed” or “ranked”; the highest-scoring proposal is selected and the Aurora Driver begins following that strategy. After collecting new information from its sensors about the world, the Aurora Driver again proposes and ranks a set of options, repeating this process many times per second.
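To make the loop concrete, here is a rough Python sketch of how a proposer-ranker cycle might be organized. The `propose`, `score`, and `execute` hooks are illustrative stand-ins, not the Aurora Driver’s actual interfaces.

```python
# A minimal sketch of a proposer-ranker decision loop; the hooks here are
# hypothetical placeholders, not the Aurora Driver's real interfaces.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Proposal:
    """One candidate behavior: a short trajectory plus a human-readable label."""
    trajectory: List[tuple]   # e.g., (x, y, heading, speed) samples
    label: str = ""


def select_behavior(
    context: dict,
    propose: Callable[[dict], Sequence[Proposal]],
    score: Callable[[dict, Proposal], float],
) -> Proposal:
    """Propose a broad set of candidate behaviors, score ("rank") each one in
    the current context, and return the highest-scoring candidate."""
    candidates = propose(context)
    return max(candidates, key=lambda p: score(context, p))


def driving_loop(get_context, propose, score, execute, running):
    """Repeat propose -> rank -> select -> act on every cycle, using the
    latest sensor-derived context; a real system runs this many times per second."""
    while running():
        context = get_context()                      # fresh view of the world
        best = select_behavior(context, propose, score)
        execute(best)                                # follow the chosen strategy
```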
There is an analogy between recent work on AI language models and the decision making architecture of the Aurora Driver. We can view the proposals of the Aurora Driver as the analog of the “tokens” (or more simply words) a language model produces at each iteration of language model inference. Similarly, the environment, including previous decisions, histories of other actors, and the surrounding world form the context, or what is sometimes called a prompt in language models, that informs the AI what proposal/token to choose next.
This compound approach has a number of benefits. First, it enables “correctness by construction”: partial alignment during proposal generation. It’s important to consider a breadth of options for the Aurora Driver, but it’s possible to consider only those options that meet certain constraints.
For instance, there’s no benefit to proposing potential trajectories that aren’t dynamically feasible – that is, that can’t actually be driven by a truck with appropriate steering, throttle, and brakes. This can be seen as similar to only considering responses to an English language query that are grammatically correct. Simple requirements are enforced by construction of the proposals themselves.
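As an illustration, the proposal set can be filtered so that only dynamically feasible candidates ever reach the ranker. The limits below are made-up placeholders, not the real truck’s.

```python
# A minimal sketch of "correctness by construction" at proposal time; the
# limits and trajectory representation are illustrative, not Aurora's.
from typing import Iterable, Iterator, List, Tuple

MAX_CURVATURE = 0.1   # 1/m, placeholder steering limit for a tractor-trailer
MAX_ACCEL = 2.0       # m/s^2, placeholder throttle/brake limit


def is_dynamically_feasible(trajectory: List[Tuple[float, float]]) -> bool:
    """Here a trajectory is a list of (curvature, acceleration) samples; a
    candidate is feasible only if every sample respects the vehicle limits."""
    return all(
        abs(curvature) <= MAX_CURVATURE and abs(accel) <= MAX_ACCEL
        for curvature, accel in trajectory
    )


def feasible_proposals(
    raw_candidates: Iterable[List[Tuple[float, float]]],
) -> Iterator[List[Tuple[float, float]]]:
    """Only feasible candidates ever reach the ranker: simple requirements are
    enforced by construction of the proposal set itself."""
    return (t for t in raw_candidates if is_dynamically_feasible(t))
```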
Because enforcing complex constraints with a purely “generative” approach has proven difficult in AI, we’ve devised an innovative approach that goes beyond what traditional uses of generative AI might do. Consider the difficulty generative AI diffusion models have ensuring that images of people have the expected number of fingers per hand or matching eye color in synthesized portraits. To avoid mistakes like these – which carry far more serious consequences for embodied AI – we enforce more complex requirements in the “ranking” part of the architecture: for example, the Aurora Driver doesn’t make decisions that avoid an anticipated collision by introducing a new one.
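Conceptually, a hard requirement like that can be expressed as a disqualifying check inside the ranker. The sketch below uses a hypothetical `introduces_new_collision` prediction to illustrate the idea; it is not Aurora’s implementation.

```python
# A minimal sketch of enforcing a complex requirement in the ranking stage.
# introduces_new_collision() stands in for a hypothetical prediction; this is
# illustrative only.
from typing import Callable, Optional, Sequence


def rank_proposals(
    context: dict,
    proposals: Sequence[object],
    learned_score: Callable[[dict, object], float],
    introduces_new_collision: Callable[[dict, object], bool],
) -> Optional[object]:
    """Disqualify any proposal that avoids one anticipated collision by
    creating another, then pick the highest-scoring proposal that remains."""
    admissible = [p for p in proposals
                  if not introduces_new_collision(context, p)]
    if not admissible:
        return None  # a real system would fall back to a safe default here
    return max(admissible, key=lambda p: learned_score(context, p))
```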
Our approach of proposing, ranking and selecting from alternatives – combining both Generative and Discriminative AI – has recently become increasingly common in other complex problems, like coding and proofs. This ranking – or reward learning as it’s often known – admits multiple tools for closely aligning AI behavior with the requirements of good driving.
Perhaps most straightforwardly, the ranking architecture simplifies encoding some “guardrails” or “invariants” on driving behavior2. Many of these exist to prescribe behavior in very rare scenarios where data-driven AI is weakest, like “wrong-way” traffic on the interstate or managing red-light runners. Expert demonstration might be unavailable in such cases, or even inaccurate when expert drivers make mistakes because they have never encountered the situation before. The connection with semantically meaningful internal predictions in the system is crucial: we can only write an invariant like “don’t depart the roadway” if there is a meaningful notion of “roadway”! It’s slow, expensive, and error-prone to learn what can simply be designed into the AI.
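In code, an invariant can be thought of as a predicate layered on top of the learned reward. The sketch below is illustrative, not the production guardrail system; the guardrail names and predicates are assumptions for the example.

```python
# A minimal sketch of layering hand-designed invariants ("guardrails") on top
# of a learned reward; the guardrail structure here is illustrative only.
import math
from typing import Callable, Dict


def guarded_reward(
    context: dict,
    proposal: object,
    learned_reward: Callable[[dict, object], float],
    guardrails: Dict[str, Callable[[dict, object], bool]],
) -> float:
    """Each guardrail is a predicate over (context, proposal) built on a
    semantically meaningful internal prediction -- e.g., a "departs_roadway"
    guardrail needs a meaningful notion of "roadway". Violating any invariant
    makes a proposal unrankable; otherwise the data-driven reward decides."""
    if any(violated(context, proposal) for violated in guardrails.values()):
        return -math.inf
    return learned_reward(context, proposal)
```

A guardrail like “don’t depart the roadway” would be one entry in that dictionary, backed by the system’s internal prediction of where the roadway is.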
Alignment to a Human Style of Driving
However, much of driving is quite subtle. It is important for other road users that the Aurora Driver be as human-like and predictable as possible, and not “robotically” driven by inflexible rules. As such, the Aurora Driver is primarily aligned by data-driven techniques.
Humans who are well-trained, focused, and attentive are generally quite good drivers. (Which, tragically, is seemingly a decreasing fraction of the time…) We thus use expert human driving data to learn to produce the most probable “tokens” or proposals. Technically, what is learned from expert human driving data is to assign probabilities to each possible proposal, or equivalently to assign a reward3 (a measure of goodness) or cost (a measure of badness) to each given the context. Probabilities and rewards can be viewed as equivalent – different sides of the same coin – under the Maximum Entropy Model of decision making, which forms the foundation of many AI decision-making problems, including both language models and the Aurora Driver’s decisions. This problem of learning rewards to maximize probability has a long and interesting history and is sometimes called “Inverse Reinforcement Learning” or “Inverse Optimal Control”.
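As a rough sketch of that equivalence, in standard maximum-entropy notation (not necessarily the exact internal formulation), a proposal’s probability is a softmax of its reward:

```latex
% Maximum-entropy view: a proposal's probability is a softmax of its reward.
% r(\tau, c): reward of proposal \tau in context c;  Z(c): the normalizer.
P(\tau \mid c) = \frac{\exp\big(r(\tau, c)\big)}{Z(c)},
\qquad
Z(c) = \sum_{\tau'} \exp\big(r(\tau', c)\big)
```

Reward is thus log-probability up to a context-dependent constant, so ranking by reward and ranking by probability are the same decision.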
In a nutshell, the learning goal is to craft a function that takes the context (“prompt”) of the scene and a trajectory and produces a reward for each trajectory, such that human-demonstrated driving becomes the most probable. When the Aurora Driver’s ranking model chooses decisions, it prefers these higher-reward, more probable ones. The core techniques we apply were pioneered in Maximum Entropy Inverse RL and Learning to Search (LEARCH) and are now used throughout AI alignment.
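A toy version of that learning step, assuming a linear reward over hand-picked features (a deliberate simplification of what the real system learns), looks like this:

```python
# A simplified sketch of maximum-entropy reward learning: adjust reward weights
# so the human-demonstrated trajectory becomes the most probable proposal.
# A linear reward over features is an illustrative assumption, not Aurora's model.
import numpy as np


def maxent_irl_step(weights: np.ndarray,
                    demo_features: np.ndarray,      # features of the expert's choice
                    proposal_features: np.ndarray,  # (num_proposals, num_features)
                    learning_rate: float = 0.1) -> np.ndarray:
    """One gradient step that increases log P(demo | context) under the softmax
    model: the gradient is the expert's features minus the model's expected
    features. The demonstrated choice is assumed to be among the proposals
    (or the proposal set approximates the space of trajectories)."""
    rewards = proposal_features @ weights
    probs = np.exp(rewards - rewards.max())
    probs /= probs.sum()
    expected_features = probs @ proposal_features
    gradient = demo_features - expected_features    # push the demo up, the rest down
    return weights + learning_rate * gradient
```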
An image from the LEARCH paper (https://www.ri.cmu.edu/pub_files/2009/7/learch.pdf), which developed cost function learning from expert demonstration and proved out the approach on extensive real-world robotics problems, from outdoor autonomous navigation to learning manipulation strategies.
Our annotation system enables us to label virtual tests efficiently when human expert demonstration is unavailable or insufficiently informative. This lets us provide extensive guidance on situations that may be rare enough that human drivers have never encountered them. The first approach to learning a reward function based on preferences (choice “A” should be preferred to choice “B”) and making plans to optimize it was developed to help AIs choose intelligent trajectories over terrain. This technique is now often called “Reinforcement Learning from Human Feedback” (RLHF), and it’s been absolutely instrumental in taking large language models from producing 4chan-inspired gibberish to providing (generally) thoughtful answers to our questions.
This image, courtesy of Optimization and Learning for Rough-Terrain Legged Locomotion, demonstrates the first use of detailed preference labels (prefer footstep “choice a” to footstep “choice b”) to learn a planner; in this instance, for AI robot locomotion. This technique of preference learning from expert humans and optimization of those preferences is now called “Reinforcement Learning from Human Feedback” and is used with the Aurora Driver to align its preferences with our requirements.
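The underlying preference-learning step can be sketched the same way: given a labeled pair, nudge the reward so the preferred choice scores higher. As above, the linear reward is an illustrative assumption, not the production model.

```python
# A minimal sketch of learning a reward from preference labels ("prefer A to B"),
# in the Bradley-Terry style commonly used for RLHF; illustrative only.
import numpy as np


def preference_step(weights: np.ndarray,
                    preferred_features: np.ndarray,  # features of choice "A"
                    rejected_features: np.ndarray,   # features of choice "B"
                    learning_rate: float = 0.1) -> np.ndarray:
    """One gradient step that decreases -log sigmoid(r(A) - r(B)): raise the
    reward of the labeled-better option relative to the labeled-worse one."""
    feature_gap = preferred_features - rejected_features
    margin = feature_gap @ weights                   # r(A) - r(B)
    sigmoid = 1.0 / (1.0 + np.exp(-margin))
    gradient = (1.0 - sigmoid) * feature_gap         # steeper when the model disagrees
    return weights + learning_rate * gradient
```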
In summary, alignment is a critical component of how Aurora approaches AI for self-driving. Next up, we’ll conclude by sharing a bit about the bigger picture of AI in self-driving and how the Aurora AI is faring on today’s roadways.
Until next time,
-Drew Bagnell
To learn more about Aurora's approach to Verifiable AI, check out the rest of our series:
- Aurora's Verifiable AI Approach to Self-Driving
- AI Transparency: The Why and the How
- The Future of AI in Self-Driving: Facilitating and Benefitting from Scale