One of the biggest problems with self-driving systems is that they can see the road perfectly and still make shaky short-term decisions in chaotic city traffic. Even advanced systems struggle to keep up with complex, fast-changing road situations. But a new study argues that what these cars need isn’t better vision, but better memory.
In the peer-reviewed article KEPT (Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models), Tongji University researchers and collaborators describe a system that helps autonomous vehicles “remember” past driving scenes before deciding what to do next.
How does this new self-driving technology work?
The method, called KEPT, takes front-camera video, compares it to a large library of previous real-world driving clips, and then predicts a safer short-term trajectory based on both the current scene and recalled examples from the past. The core idea is pretty intuitive. Instead of asking an AI model to react to every situation as if it had never seen anything like it before, KEPT allows it to recall similar moments from previous drives.
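To make the "recall" step concrete, here is a minimal sketch of similarity-based retrieval. It assumes each clip has already been turned into a numeric embedding vector; the article does not say how KEPT encodes or indexes its clips, so the clip names, the three-number vectors, and the `top_k_similar` helper are all hypothetical stand-ins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query, library, k=2):
    """Return the k (clip_id, score) pairs most similar to the query embedding."""
    scored = [(cosine(query, emb), clip_id) for clip_id, emb in library.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(clip_id, score) for score, clip_id in scored[:k]]

# Hypothetical library of past clips (toy 3-D embeddings for readability).
library = {
    "straight_highway": [1.0, 0.0, 0.0],
    "gentle_lane_change": [0.9, 0.1, 0.0],
    "busy_intersection": [0.0, 1.0, 0.0],
    "parking_lot": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of the scene the car is looking at right now.
current_scene = [0.2, 1.0, 0.0]

matches = top_k_similar(current_scene, library, k=2)
for clip_id, score in matches:
    print(f"{clip_id}: similarity {score:.3f}")
```

A real system would search many thousands of clips with a dedicated vector index rather than a Python loop, but the principle is the same: find the stored moments that look most like the present one.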
The retrieved examples are then fed into a vision-language model as part of a structured, step-by-step reasoning process. This is important because the researchers say large vision-language models can otherwise hallucinate, ignore physical constraints, or suggest maneuvers that seem plausible on paper but aren’t good for a real car. So KEPT basically acts like guardrails, keeping the model grounded in how similar traffic situations actually played out in the real world.
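As a rough illustration of what "feeding examples into a model" can look like, the sketch below assembles a text prompt from the current scene plus recalled clips. The article does not reproduce the paper's actual prompt, so the wording, the step list, and the `build_prompt` helper are assumptions made purely for illustration.

```python
def build_prompt(scene_summary, retrieved):
    """Combine the current scene and recalled clips into one step-by-step prompt.

    `retrieved` is a list of (description, trajectory) pairs, where each
    trajectory is a list of (x, y) waypoints in metres. Hypothetical format.
    """
    lines = [
        "You are planning a short-term trajectory for a car.",
        "",
        "Current scene: " + scene_summary,
        "",
        "Similar past scenes and what the driver did:",
    ]
    for i, (description, trajectory) in enumerate(retrieved, start=1):
        waypoints = ", ".join(f"({x:.1f}, {y:.1f})" for x, y in trajectory)
        lines.append(f"Example {i}: {description} -> waypoints {waypoints}")
    lines += [
        "",
        "Step 1: Compare the current scene with the recalled scenes.",
        "Step 2: Note obstacles and physical limits on speed and steering.",
        "Step 3: Output the next waypoints as (x, y) pairs in metres.",
    ]
    return "\n".join(lines)

# Hypothetical recalled clips with the trajectories that were actually driven.
retrieved = [
    ("Pedestrian crossing ahead, car slows", [(0.0, 2.0), (0.0, 3.5)]),
    ("Cyclist on the right, car keeps left", [(-0.3, 2.5), (-0.5, 5.0)]),
]

prompt = build_prompt("Wet road, pedestrian near the crosswalk", retrieved)
print(prompt)
```

The point of the structure is that the model is asked to reason from concrete precedents, not from a blank slate.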
Is it better than traditional autonomous systems?
The researchers tested KEPT on the widely used nuScenes benchmark and said it outperformed both traditional end-to-end planning systems and newer vision-language-based planners on open-loop metrics, meaning its predictions were scored against recorded drives, not tested in live traffic. It reduced trajectory prediction errors and lowered collision indicators while keeping retrieval fast enough to remain practical for real-time driving.
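For readers curious what a "prediction error" means here, a common open-loop measure is the average straight-line distance between each predicted waypoint and the point the human driver actually reached. The sketch below computes that for one made-up trajectory; the numbers are invented for illustration and are not results from the study, and the exact metric definitions the authors used may differ.

```python
import math

def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance (in metres) between matching waypoints."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    if not predicted:
        raise ValueError("trajectories must not be empty")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances)

# Made-up example: the model drifts 0.5 m off at the second waypoint only.
predicted = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
ground_truth = [(0.0, 1.0), (0.3, 2.4), (0.0, 3.0)]

error = average_l2_error(predicted, ground_truth)
print(f"average L2 error: {error:.3f} m")
```

Lower is better: a value near zero means the predicted path closely tracks what the recorded driver did.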
This may make KEPT seem like an obvious choice for next-generation self-driving cars, but it’s not road-ready yet. Still, the broader idea is compelling. If autonomous cars can combine real-time perception with meaningful memory of how similar situations played out before, they could end up making decisions that feel less fragile and more human.




