No Handoffs, No Fumbles: How Plus’s End-to-End Driving Model Rethinks Autonomy
By Dr. Anurag Ganguli, VP of R&D at PlusAI ▪ Inderjot Saggu, Staff Research Engineer at PlusAI
In Olympic relays, passing the baton is always a moment of high anxiety. The runners may be fast and highly trained, but a fumbled exchange can unravel the race in an instant.
In most self-driving systems, that same tension exists at every moment. The original approach to autonomous vehicle software – often called AV1.0 – breaks the driving task into a chain of AI modules: one to perceive the vehicle’s surroundings, another to predict how every object in the vicinity is likely to behave, and a third to plan the vehicle’s best course of action. Each might perform brilliantly on its own. But connecting them? That’s where errors creep in. Misjudged inputs cascade downstream, uncertainties compound, and the more complex the driving scenario, the more brittle the system becomes.
PlusAI’s SuperDrive™ virtual driver, built on AV2.0 architecture, avoids that fragility by taking a fundamentally different approach. At its core is a transformer-based end-to-end model called Reflex, which learns perception, prediction and planning together. It’s like replacing three separate runners with one highly trained athlete who effortlessly runs the full course.
The result? More robust performance, no cascading errors, and a system that adapts more easily across vehicle platforms and geographies. In this article, we’ll explain why modular AV1.0 stacks are ultimately a dead end, and how PlusAI’s AV2.0 end-to-end solution is combining cutting-edge AI with highway-grade safety.
Where AV1.0 breaks down
In theory, dividing the driving task into separate stages creates clear boundaries and manageable complexity. In practice, those boundaries are friction points. A small uncertainty early in the stack – a pedestrian half-seen behind a parked car, or a misclassified object – can lead to flawed predictions and sub-optimal, even unsafe, vehicle trajectories.
In ideal conditions, such as predictable traffic behaviour and clear weather, a modular pipeline might suffice. But on real highways, where danger lives in the unexpected, those seams become liabilities.
And not every risk is easy to label. In AV1.0 systems, the perception module processes raw sensor data, such as camera images or lidar point clouds, and outputs simplified interpretations: bounding boxes around objects, lane lines, traffic light states, and so on. These abstractions are then passed to the next module in the chain, which tries to predict what those labelled things will do. If perception doesn’t label it, prediction and planning may never see it, or may react too late.
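To make the failure mode concrete, here is a deliberately simplified sketch of a modular pipeline. All names and thresholds are hypothetical, not drawn from any real AV stack; the point is only that each stage sees nothing beyond the previous stage’s abstraction, so an object perception drops is invisible to planning:

```python
from dataclasses import dataclass

# Illustrative AV1.0-style pipeline (all names and thresholds hypothetical).
# Each stage consumes only the previous stage's abstraction, so anything
# perception fails to label simply vanishes from the planner's world.

@dataclass
class Detection:
    label: str          # e.g. "car", "pedestrian"
    confidence: float

def perceive(raw_frame: list[tuple[str, float]]) -> list[Detection]:
    # A confidence threshold discards uncertain objects: the half-seen
    # pedestrian at 0.4 never makes it downstream.
    return [Detection(label, conf) for label, conf in raw_frame if conf >= 0.5]

def predict(objects: list[Detection]) -> list[str]:
    # Prediction can only reason about the objects it was handed.
    return [f"{o.label} continues on current path" for o in objects]

def plan(predictions: list[str]) -> str:
    # Planning never learns about anything perception dropped.
    return "brake" if any("pedestrian" in p for p in predictions) else "maintain speed"

frame = [("car", 0.9), ("pedestrian", 0.4)]   # pedestrian half-occluded
action = plan(predict(perceive(frame)))
print(action)  # "maintain speed": the unlabelled pedestrian was silently lost
```

The bug here is not in any single function; each module behaves exactly as specified. The unsafe outcome emerges from the handoff, which is the point of the relay metaphor above.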
Reflex: One model, start to finish
Rather than handing off between separate modules, Reflex treats driving as a single, integrated task, from sensing the world to planning safe, efficient movement through it. The model takes in raw sensor data – from cameras, lidar, radar, GPS and vehicle feedback – and learns to interpret and act on that information in real time.
It’s trained to imitate expert human drivers, using more than 6 million miles of real-world data gathered from PlusAI’s trucking fleet. These sensor-rich recordings capture not just what the road looked like, but how professional drivers responded in each situation, providing high-quality behavioural demonstrations across a vast range of conditions.
Transformers are especially well-suited to this kind of learning. Unlike older model types that process information in narrow chunks, transformers attend to the full context of a scene – near and far, fast and slow – and draw connections across space and time. That global awareness is critical when navigating dynamic, unpredictable environments such as public roads.
Once trained, Reflex takes raw sensor inputs and generates a range of possible future trajectories, each reflecting a viable path through the environment.
Still, before any action is taken, these options are checked by PlusAI’s Guardrails system, an independent safety layer that filters out unsafe plans. (For more on how these systems work together, see here.)
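The propose-and-filter pattern described above can be sketched in a few lines. Everything here is a stand-in with made-up names and numbers, not PlusAI’s interface: the model proposes scored candidate trajectories, and an independent rule-based check vetoes any that violate a hard safety constraint before the best survivor is selected:

```python
import random

# Minimal sketch of the propose-and-filter pattern (hypothetical names,
# synthetic numbers): the end-to-end model proposes candidate trajectories,
# and an independent safety layer vetoes those violating hard constraints.

def propose_trajectories(n: int = 5) -> list[dict]:
    # Stand-in for the model's output head: each candidate carries a model
    # score plus the kinematic summary a safety check would need.
    random.seed(0)
    return [{"score": random.random(),
             "min_gap_m": random.uniform(0.0, 20.0)} for _ in range(n)]

def guardrails_ok(traj: dict, min_safe_gap_m: float = 10.0) -> bool:
    # Independent rule: reject any plan that closes below a safe gap.
    return traj["min_gap_m"] >= min_safe_gap_m

candidates = propose_trajectories()
safe = [t for t in candidates if guardrails_ok(t)]   # unsafe plans filtered out
best = max(safe, key=lambda t: t["score"])           # highest-scoring safe plan
```

The key design property is independence: `guardrails_ok` knows nothing about how the candidates were produced, so a learned model and a deterministic safety check can evolve separately.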
Transparency built in
End-to-end models often raise questions over explainability. If everything happens inside one model, how do you know what the system is paying attention to, or why it made a particular decision? And in a safety-critical domain like autonomous driving, understanding the model’s reasoning is essential not only for validation but also for building trust.
PlusAI tackles this directly. While the current model is trained end-to-end, it is also designed to be interpretable. First, the model produces auxiliary outputs alongside its driving decisions. These include perception elements such as lane markings, traffic lights, and object detections, showing that the model understands these fundamental concepts internally.
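The general idea behind auxiliary outputs is multi-task learning: one shared trunk feeds both the planning head and side heads whose outputs are inspected rather than executed. The sketch below uses made-up dimensions and plain matrix multiplies, purely to show the shape of the arrangement:

```python
import numpy as np

# Sketch of the multi-task idea behind auxiliary outputs (all sizes and
# names synthetic): a shared trunk feeds the planning head and auxiliary
# perception heads, so the model's internal understanding can be inspected.

rng = np.random.default_rng(1)
features = rng.standard_normal(64)        # shared trunk representation

W_plan = rng.standard_normal((10, 64))    # planning head: trajectory params
W_lanes = rng.standard_normal((4, 64))    # auxiliary head: lane-marking logits
W_objects = rng.standard_normal((8, 64))  # auxiliary head: object logits

trajectory = W_plan @ features            # this output drives the vehicle
lane_logits = W_lanes @ features          # these are read out for validation,
object_logits = W_objects @ features      # not executed
```

Because the auxiliary heads share the trunk, a sensible lane or object readout is evidence that the trunk itself encodes those concepts, which is what makes the check meaningful.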
Second, because the system is transformer-based, it supports attention visualisation, which means PlusAI engineers can inspect what parts of the scene the model is focusing on at different moments – whether that’s a merging vehicle, a nearby cyclist, or a stop sign half occluded by another truck. These “attention maps” offer a valuable window into the model’s reasoning.
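As a rough illustration of what attention inspection looks like, here is scaled dot-product attention over a handful of hypothetical scene tokens. The data is synthetic and the token names invented; the takeaway is that the attention matrix is an explicit, inspectable quantity, with each row showing how much one token attends to every other:

```python
import numpy as np

# Illustrative attention-map inspection (synthetic data, invented tokens).
# A transformer's attention weights quantify how much each scene token
# "looks at" every other token; this matrix is what engineers visualise.

rng = np.random.default_rng(0)
tokens = ["ego", "merging_vehicle", "cyclist", "stop_sign"]
d = 16
q = rng.standard_normal((len(tokens), d))   # query vectors, one per token
k = rng.standard_normal((len(tokens), d))   # key vectors, one per token

scores = q @ k.T / np.sqrt(d)               # scaled dot-product
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1

# weights[i, j] = how much token i attends to token j
for name, row in zip(tokens, weights):
    print(f"{name:16s} attends most to {tokens[int(row.argmax())]}")
```

A real driving model attends over thousands of spatial and temporal tokens rather than four labelled ones, but the inspection mechanism is the same matrix shown here.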
We consider this a win-win: a system that combines holistic learning with the transparency that engineers and safety teams need.
Smarter training for the unexpected
Training a powerful end-to-end driving model isn’t just about having lots of data, but about having the right data. And on any given day, roads can offer unique and unusual scenarios that aren’t well represented in typical training data. These scenarios, often safety-critical, are known as “edge cases”. This is where PlusAI’s Data Engine comes in.
This offline system helps identify and retrieve edge-case training data at scale. It uses large foundation models – including vision-language models (VLMs) and large language models – to search through PlusAI’s extensive fleet dataset and surface specific scenarios of interest, such as “a person stepping out from a parked car” or “debris in the road.”
By pulling out the right examples from real-world driving data across multiple continents, the Data Engine helps strengthen the model’s ability to handle rare or challenging situations. It’s a key part of how PlusAI ensures its system doesn’t just perform well in the common cases – but keeps learning from the edge cases that matter most for safety.
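The retrieval pattern behind this kind of data engine is typically embedding similarity: a text query and each fleet clip are mapped into a shared vector space, and the nearest clips are surfaced. The sketch below hand-crafts tiny vectors in place of a real VLM encoder, so the clip names and embeddings are entirely illustrative:

```python
import numpy as np

# Sketch of embedding-based scenario retrieval (hypothetical clip names,
# hand-made 3-d vectors standing in for real VLM embeddings).

clips = {
    "clip_001": np.array([0.9, 0.1, 0.0]),   # pedestrian near parked car
    "clip_002": np.array([0.1, 0.9, 0.1]),   # debris on the roadway
    "clip_003": np.array([0.0, 0.2, 0.9]),   # routine highway cruising
}
# Embedding of the query "person stepping out from a parked car"
query = np.array([0.85, 0.15, 0.05])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(clips, key=lambda c: cosine(query, clips[c]), reverse=True)
print(ranked[0])  # clip_001: the pedestrian scenario matches the query best
```

At fleet scale the sorted scan would be replaced by an approximate nearest-neighbour index, but the ranking principle is unchanged.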
When real-world data isn’t enough, the team can also use simulation tools to generate synthetic data, creating realistic edge cases that improve model robustness and expand coverage beyond what real-world data alone can offer.
Returning to our metaphor, it’s how the athlete trains smarter, not harder.
Reasoning in real time
While the Data Engine works offline to improve the system’s learning over time, PlusAI also deploys a complementary intelligence layer that operates in real time. This is the Reasoning module, a VLM that runs alongside the end-to-end driver to provide high-level guidance on the fly.
Construction zones, improvised signage, and non-standard human behaviour can all challenge even the best-trained model. The Reasoning module doesn’t drive the vehicle, but it can flag anomalies, suggest caution, and make sense of novel scenarios much like a supervising human would.
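The advisor pattern described here can be sketched simply. The interface below is hypothetical and the keyword matching is a stand-in for an actual VLM call; what it shows is the division of labour, in which the Reasoning layer emits hints rather than controls:

```python
# Sketch of the advisor pattern (hypothetical interface): the Reasoning
# module never outputs vehicle controls, only high-level hints the driving
# stack can weigh. Keyword matching stands in for a real VLM call.

def reasoning_advice(scene_description: str) -> dict:
    hints = {
        "construction": "reduce speed, expect lane shifts",
        "hand signals": "yield to human direction",
    }
    for cue, advice in hints.items():
        if cue in scene_description:
            return {"anomaly": cue, "advice": advice}
    return {"anomaly": None, "advice": "none"}

out = reasoning_advice("construction cones and improvised signage ahead")
print(out["advice"])  # "reduce speed, expect lane shifts"
```

Keeping the advisor out of the control loop means a slow or uncertain reasoning step can never block the real-time driving model; its output only ever biases behaviour towards caution.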
We explored Reasoning in more depth in this companion article, but its role here is simple: to act as an advisor, adding an extra layer of judgement in unusual conditions.
Why Reflex works everywhere
One of the standout advantages of PlusAI’s end-to-end architecture is its ability to generalise across geographies, road types, and vehicle platforms. Because Reflex learns directly from raw sensor data paired with expert human driving behaviour – rather than relying on hard-coded rules – it captures the deeper structure of driving itself. This makes it easier to adapt to new environments, even when signage, road markings, or driving conventions differ.
As more data is gathered in each new domain, the model improves without needing to be re-architected. This flexibility is built in by design, from the transformer architecture that attends to the full scene context to the model calibration inputs that account for different truck setups.
The result is a system that doesn’t just perform well where it was trained – it learns faster, travels better, and scales more easily.
The latest iteration of PlusAI’s end-to-end model is currently undergoing advanced testing and validation in a range of simulated and real-world driving conditions.
No handoffs, just driving
For too long, autonomy has relied on baton-passing, but the world’s roads demand awareness, consistency, and adaptability from start to finish.
PlusAI’s end-to-end Reflex model is built to meet that challenge as one integrated driver, guided by strategic Reasoning when necessary, always subject to rigorous safety Guardrails, and improving with every mile on the road.
This is what scalable, trustworthy autonomy looks like: an end-to-end system that never drops the baton.