
AI Transparency: The Why and How

Engineering | From our leaders | July 17, 2024 | 7 min. read

By Drew Bagnell

Increasingly, researchers, industry, and the public are taking AI seriously as the heart of self-driving. It's been gratifying to see Aurora's pioneering approach to validated AI in self-driving become popular across industry and research. When we founded Aurora, we knew that the future of self-driving would depend on expansive use of the power of AI combined with rigorous validation and verification.

For Aurora, it is critical to understand the differences between AI in the abstract and AI embodied in the physical world – a world that implies interaction with others and feedback. We learned to appreciate the importance of data quality, simulation, and a comprehensive strategy to verify the correctness of the complete system. Most importantly, years of work in AI have emphasized to us the importance of execution: the hard part isn’t a tweak to a model architecture here or there; it’s the foundational engineering of robust systems design, data, and pipelines to deliver the benefits of self-driving.

We see research groups appreciating the difference between building AI systems that have to make decisions and affect the world and more traditional “supervised” applications. They are beginning to go beyond naive behavioral cloning and to adopt the benefits of interactive imitation learning and inverse reinforcement learning, including core techniques like MaxEnt IRL, DAgger, and preference learning/RLHF (Reinforcement Learning from Human Feedback). The generative AI approaches to self-driving that we pioneered – using graph neural networks and attention/transformer-style architectures applied to learning behavior – are becoming more prevalent.

In short, research groups are adopting the approach that Aurora has publicly discussed using on the road for years.

Given the increasing curiosity and questions in this space, we’ll be taking you under the hood of the approach Aurora takes to verifiable AI that Chris outlined in a previous post. I’ll start that series of posts today by discussing transparency in AI.

The Future of AI in Self-Driving – Part I: Transparency

Building transparent systems enables the Aurora Driver to be efficiently verified

It’s become in vogue (again) to talk of self-driving vehicles with an “end-to-end” AI solution. For at least 35 years, researchers have explored naive end-to-end[1] self-driving, from the (rightly celebrated) work of Dean Pomerleau, to the DAVE work on off-road robotics, to more recent work on self-driving vehicles.[2]

Aurora has built, fielded, and extensively tested fully end-to-end approaches to driving with undifferentiated internals in order to understand their advantages and challenges. We learned many things along the way, some of which informed the structure of the Aurora Driver. Perhaps most importantly, this work clarified the importance of transparency in AI.

A simple example may help clarify. Imagine a problem is discovered with a self-driving system: it incorrectly fails to proceed at a green light. How should we approach fixing such an important problem?

In a naive end-to-end view, where a learned model with undifferentiated internals is trained to output vehicle platform commands, the only recourse is to attempt to find more training data, retrain, and hope to gain confidence that the problem is resolved.

We take a different approach. We want to be able to introspect what went wrong and dig many steps deeper to identify the root of the problem:

  1. Did the Aurora Driver not detect the traffic signal at all?
  2. Did the AI misread the state of the light?
  3. Did the AI mispredict actions of other drivers, for instance, thinking an approaching car might be running its own red light and endangering the Aurora Driver?
  4. Was the behavior engine wrong in its decisions?
  5. (Or one of many more detailed possibilities)


To make AI more verifiable, we make it more transparent. That is, the system produces internal “predictions” – e.g., “What was the state of the light? What were other actors likely to do?” We can check these internal states to root-cause a failure, then correct that failure and validate the correction.

These predictions[3] serve as the lingua franca of driving: semantically meaningful statements about the world or its future evolution that can be verified and, if necessary, improved, and that lead to correct driving behavior. (You can read more about predictions as internal states here.)
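To make the idea concrete, here is a minimal sketch, in Python, of what a compound system with inspectable internal predictions might look like. Everything in it is hypothetical: the class names, fields, and decision logic are illustrative stand-ins, not the Aurora Driver's actual architecture.

```python
# A minimal sketch (not Aurora's actual architecture) contrasting an opaque
# end-to-end policy with a compound system whose internal predictions can be
# inspected. All class and field names here are hypothetical.

from dataclasses import dataclass
from enum import Enum


class LightState(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


@dataclass
class ScenePredictions:
    """Semantically meaningful internal state that can be checked directly."""
    light_detected: bool
    light_state: LightState
    cross_traffic_will_run_red: bool


def perceive(sensor_data: dict) -> ScenePredictions:
    # Stand-in for learned perception/forecasting models.
    return ScenePredictions(
        light_detected=sensor_data.get("light_visible", True),
        light_state=LightState(sensor_data.get("light_color", "green")),
        cross_traffic_will_run_red=sensor_data.get("risky_cross_traffic", False),
    )


def decide(preds: ScenePredictions) -> str:
    # Stand-in for a behavior engine: decisions are a function of
    # inspectable predictions rather than raw sensor data.
    if not preds.light_detected:
        return "stop"
    if preds.light_state is not LightState.GREEN:
        return "stop"
    if preds.cross_traffic_will_run_red:
        return "yield"
    return "proceed"


if __name__ == "__main__":
    # If the vehicle fails to proceed at a green light, logging and checking
    # each intermediate prediction localizes the failure (questions 1-4 above).
    scene = {"light_visible": True, "light_color": "green", "risky_cross_traffic": False}
    preds = perceive(scene)
    print("internal predictions:", preds)
    print("decision:", decide(preds))
```

In a naive end-to-end system, only the final command is visible; here, each intermediate prediction can be logged, compared against ground truth, and independently corrected.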

Transparent internal prediction introduces a powerful inductive bias that enables the Aurora Driver to be efficiently verified. Inductive biases, like the locality structure of convolutional neural networks or the attention mechanism of transformers, are unnecessary in an “infinite data and compute” limit. In principle, any sufficiently expressive learner, like a multi-layer perceptron, should be able to learn any concept. In practice, however, these kinds of structures have proved essential to fielding nearly all useful learning-based systems. While the structure of our AI is quite general, the internal predictions are built to support the (relatively) narrow AI task of driving extremely well rather than an artificial general intelligence. This approach respects the core pillars of the Aurora way of building valuable embodied AI products safely, quickly, and broadly:

Verifying the Aurora Driver requires exponentially smaller quantities of data because each internal prediction can be checked on its own, without covering a combinatorially growing set of scene variations. For example, when verifying that our light-state AI model correctly identifies the traffic light color, we don’t need to consider all possible combinations of other vehicle behaviors in the scene. When an error is detected, like not proceeding at a particular green light, we can identify the incorrect predictions, ensure that failure mode is covered (and identify what might have led to such a test escape more broadly as part of our Safety Case Framework), gather the appropriate training and test data, and provide a rigorous analysis that the problem is resolved. This is what our partners, regulators, and the public should rightly demand.
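A toy, back-of-the-envelope calculation illustrates the combinatorics. The factor names and counts below are invented for illustration; they are not Aurora's verification methodology or real coverage numbers.

```python
# A back-of-the-envelope illustration (assumed numbers, not Aurora's) of why
# checking internal predictions separately avoids combinatorial scenario growth.

# Suppose several factors vary independently in a scene:
factors = {
    "light_state": 3,            # red / yellow / green
    "cross_traffic_behavior": 4,
    "weather": 5,
    "occlusion_level": 3,
    "ego_approach_speed": 6,
}

# Naive end-to-end verification of traffic-light behavior would need coverage
# over combinations of every factor:
end_to_end_cases = 1
for count in factors.values():
    end_to_end_cases *= count

# Verifying the light-state prediction alone only needs coverage over the
# factors that actually affect that prediction (say, light state x occlusion):
per_module_cases = factors["light_state"] * factors["occlusion_level"]

print(f"end-to-end combinations: {end_to_end_cases}")   # 3*4*5*3*6 = 1080
print(f"light-state module cases: {per_module_cases}")  # 3*3 = 9
```

Adding more factors multiplies the end-to-end case count but leaves the per-module count untouched, which is the sense in which transparent internal predictions shrink the verification burden.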

The approach of building with clear internal predictions also allows us to move faster. A compound AI[4] enables development in pods that are effective because they have clear targets and the freedom to choose the AI techniques that most expediently improve against those targets. Our learning-focused approach means we use data to ensure both that there aren’t regressions and that we address newly discovered issues, and it requires exponentially fewer samples to train those models effectively. By neither standing on the artificial principle of building some intellectually “pure”, single undifferentiated model nor hamstringing the Aurora Driver with a single sensing modality, we are delivering a commercially meaningful self-driving product.

Since predictions are key, training and testing those predictions are at the heart of what we do. We need both to figure out what predictions to make and to ensure the quality of those predictions is extremely high. Data quantity matters; we see that in recent large language models, where pre-training with more data leads to better results.

But data quality matters even more. We see this in the importance and growth of supervised fine-tuning data and preference data in building foundation models. Consistently, Aurora has seen that the higher the quality of data provided, the better the performance of the Aurora Driver. More often than not, the biggest step changes in the Aurora Driver have come not from more sophisticated learning techniques but rather from better-quality data. That applies to everything from high-quality annotation of actors in the environment to high-quality demonstration of what behavior is preferred in a given situation.

Because no vendor can provide the quality of data we need, or take advantage of our advanced AI to enable very high-speed, high-throughput annotation and data quality assurance, we built our own annotation system from scratch.

The result is what we believe is the most sophisticated data annotation system in the world. This system uses a powerful offline, acausal version of our autonomy system to pre-annotate and aid human expert QA in everything from what the vehicle should see to what it should do in challenging circumstances on the road. The hard work that ensures the quality of our data lies in the engineering processes and tooling: coverage analysis, foundation models that surface interesting events, and automated consistency checks that flag potential errors. The resulting set of tests and training data forms the core of our data-driven approach to self-driving. That data quality delivers improvements every day leading up to our commercial launch and gives us confidence that the Aurora Driver is really doing what we require.
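As one concrete illustration of what an automated consistency check can look like, here is a small sketch that flags traffic-light label sequences that change in physically implausible ways. The transition table and function are hypothetical, not Aurora's tooling.

```python
# An illustrative sketch (not Aurora's tooling) of an automated consistency
# check that flags potential annotation errors: traffic-light label sequences
# that change in physically implausible ways.

from typing import List, Tuple

# Legal transitions for a standard signal; anything else is suspicious.
ALLOWED = {
    ("green", "green"), ("green", "yellow"),
    ("yellow", "yellow"), ("yellow", "red"),
    ("red", "red"), ("red", "green"),
}


def flag_suspicious_transitions(labels: List[Tuple[float, str]]) -> List[float]:
    """Return timestamps where consecutive labels make an implausible jump."""
    suspicious = []
    for (t0, prev), (t1, curr) in zip(labels, labels[1:]):
        if (prev, curr) not in ALLOWED:
            suspicious.append(t1)
    return suspicious


if __name__ == "__main__":
    # Human (or pre-annotation) labels: (timestamp_seconds, light_state)
    labels = [(0.0, "green"), (0.5, "green"), (1.0, "red"), (1.5, "yellow")]
    print(flag_suspicious_transitions(labels))  # [1.0, 1.5]: green->red, red->yellow
```

Checks of this flavor route likely labeling mistakes back to human experts rather than letting them quietly degrade training and test data.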

We also recognized that many events are hard to capture in the real world. Over the years, we invested in our state-of-the-art simulation engine to be able to explore and train our system on very rare and dangerous cases. That simulation engine provides training and validation data for everything from dangerous road conditions and weather to ensuring the Aurora Driver doesn’t regress on the mundane merges and lane changes that make up everyday driving. Our simulation approach integrates neural rendering, advanced light transport simulation (used so effectively by Pixar), and procedural generation to provide the most capable and realistic engine required to test and train our AI.
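To illustrate what procedural generation can mean in practice, here is a small, hypothetical sketch that samples many variations of a rare, dangerous highway cut-in scenario. The scenario type, parameters, and ranges are invented for illustration and are not drawn from Aurora's engine.

```python
# A simplified sketch (hypothetical parameters, not Aurora's engine) of
# procedural generation: sampling many variations of a rare, dangerous
# scenario so it can be tested and trained on without needing to encounter
# it on the road.

import random
from dataclasses import dataclass


@dataclass
class CutInScenario:
    ego_speed_mph: float       # highway speed of the truck
    cut_in_gap_m: float        # gap at which the other car cuts in
    cut_in_speed_delta: float  # how much slower the cutting-in car is (mph)
    road_friction: float       # 1.0 = dry, lower = wet/icy


def generate_scenarios(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        scenarios.append(CutInScenario(
            ego_speed_mph=rng.uniform(55, 70),
            cut_in_gap_m=rng.uniform(5, 40),       # small gaps are the rare, dangerous cases
            cut_in_speed_delta=rng.uniform(0, 25),
            road_friction=rng.choice([1.0, 0.7, 0.4]),
        ))
    return scenarios


if __name__ == "__main__":
    for s in generate_scenarios(3):
        print(s)
```

Each sampled configuration would then be rendered and rolled out in simulation, giving coverage of parameter combinations that rarely, if ever, appear in logged driving.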

 

To showcase the Aurora Driver's expected performance on the Dallas-to-Houston lane, we analyzed accident reports from 2018-2022 involving tractor-trailers and recreated those collisions in our state-of-the-art simulation engine for rare and dangerous cases. We determined that none of these fatal collisions would have occurred had the Aurora Driver been driving.

By building transparent systems, we enable the Aurora Driver to be efficiently verified. This is an important component of Aurora’s approach, but it is only one component. Next up, I’ll describe our approach to alignment. See you soon!

-Drew Bagnell

To learn more about Aurora's approach to verifiable AI, check out the rest of our series.

[1] Some authors use end-to-end merely to indicate that, in principle, gradient signals are available through the entire system even if the system is trained with distinct modules and tasks.
 
[2] I’ve even published on it a bit myself: https://www.ri.cmu.edu/pub_files/1999/4/hksrc1999.pdf
 
[3] A key part of the driving domain is that these meaningful signals (e.g., traffic signal light state or the speeds of other actors in the world) exist and are often codified in law and the "rules of the road". Whether predictions in the AI are explicit (e.g., the actual state of a traffic signal) or implicit embeddings that capture the same is an important implementation detail. It's critical that these cuts in the system be at the right joints and be semantically meaningful. Just as a person might be able to answer "I thought the light was red" but not provide real insight into "why", we cannot expect to introspect arbitrarily deeply into statistical AI models. (Although obtaining deeper insight is an interesting research effort: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)
 
[4] What is oft unappreciated is that the applications we all see as ChatGPT or Gemini are not just undifferentiated end-to-end neural networks. Under the covers, an immense amount of engineering (e.g., making behind-the-scenes requests to a web search engine, or using Python or other co-processing algorithms) enables them to work. Furthermore, for the most sophisticated user-facing AI applications, what appears as a single interface is a federation of AIs under the hood, each optimized for specific types of queries. We see this approach advancing now in many domains (retrieval-augmented generation, LLMs with access to “tools”) due to higher performance on specific tasks, greater compute efficiency, higher reliability, greater transparency, reliable validation, and more rapid development, as subcomponents can be iterated upon.
 
 

Drew Bagnell

Aurora co-founder and Chief Scientist