MIT is teaching cars to navigate using a simple map and visual data, just like humans
MIT is working on a new way for self-driving cars to get around in unfamiliar areas by imitating the way human drivers navigate. Called Variational End-to-End Navigation and Localization, it uses basic maps and video cameras to analyze and navigate a new location without needing a detailed, pre-built database of the area.
The development of autonomous vehicles has shown just how difficult self-driving is, and has shed light on the vast gulf between human and machine intelligence – not just in terms of computing power, but also in how each one solves problems.
For example, humans find it very easy to navigate in strange, complex locations with little more than a rough map and their eyes to go on, while autonomous vehicles in even familiar areas tend to rely on very complex arrays of sensors to generate detailed maps and databases for localization, mapping, object detection, motion planning, and steering control. Such maps can be as large as four terabytes for a city the size of San Francisco.
Humans can work with very basic information and then apply this to very complex situations. All a person needs is a simple map, such as the one on a GPS device, which they can then relate to what they see around them. Seemingly irrelevant data can also be disregarded or incorporated into navigation depending on the situation – like someone who can walk from Trafalgar Square to Liverpool Street Station in London by noting the pubs along the way.
According to the MIT team, Variational End-to-End Navigation and Localization is designed to mimic the human approach by learning from a human driver and then using that information to adapt to new situations with only a simple map and video cameras. The idea is that the machine will be able to take the approximations of the map and then correct them, fill in the details, and determine its position so it can correct its course to the desired destination.
To teach the computer, the team had a human driver operate an automated Toyota Prius while several cameras and a basic GPS collected data about suburban streets, their road structures, and obstacles. Unlike the more conventional approach that relies on very complex machine reasoning and databases, the MIT approach learns from visual cues. This means it doesn't need detailed instructions when it goes into a new area – it just needs a basic map.
Led by Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL), the MIT team has developed an end-to-end navigation system that differs from conventional ones in that, like a human, it is designed specifically to seek a destination rather than concentrating on just following the road. It does this by taking what it has learned from the human driver, then applying a statistical method to predict a full probability distribution over all the possible steering commands at a particular point in time.
MIT says that this prediction is based on a machine learning model called a convolutional neural network (CNN) that learns how to steer by processing images collected during training with the human driver. This way, it knows how to handle different kinds of roads and junctions, including T-shaped intersections.
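The idea of a probability distribution over steering commands that sharpens as human demonstrations accumulate can be illustrated with a toy model. The sketch below is not the neural network MIT describes: it simply counts observed turns at a single T-shaped intersection, starting from a uniform prior. The function name, the three discrete directions, and the demonstration counts are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: steering is reduced to three discrete choices for illustration.
DIRECTIONS = ("left", "straight", "right")


def steering_distribution(observed_turns, prior_count=1.0):
    """Estimate P(direction) from demonstrations, using uniform pseudo-counts.

    With no data every direction is equally likely; each observed turn
    shifts probability toward what human drivers actually did.
    """
    counts = Counter(observed_turns)
    total = len(observed_turns) + prior_count * len(DIRECTIONS)
    return {d: (counts[d] + prior_count) / total for d in DIRECTIONS}


# Before any demonstrations: all directions are equally plausible.
print(steering_distribution([]))

# Hypothetical demonstrations at a T-shaped intersection: nobody goes straight.
demos = ["left"] * 48 + ["right"] * 52
print(steering_distribution(demos))
```

With these made-up counts, the probability of going straight falls to about one percent while left and right share the rest, which is the kind of two-peaked distribution a single "best guess" steering angle could not represent.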
"Initially, at a T-shaped intersection, there are many different directions the car could turn," says Rus. "The model starts by thinking about all those directions, but as it sees more and more data about what people do, it will see that some people turn left and some turn right, but nobody goes straight. Straight ahead is ruled out as a possible direction, and the model learns that, at T-shaped intersections, it can only move left or right."
Variational End-to-End Navigation and Localization also allows the car to take into account other visual clues, like signs, road lines, and other markers, to figure out what kind of road it's on and predict crossings, as well as how to steer in a particular situation. In addition, it can analyze street patterns to help it determine where it is. A run of high-probability matches between what it sees and what the map shows indicates a correct fix on its location. In this way the four-terabyte maps for one medium city can be reduced to a 40-gigabyte database for the entire planet.
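Matching what the car sees against a coarse map can be sketched as a simple probabilistic comparison. The example below is a hedged illustration, not the method from the MIT paper: the route, the feature labels, the `localize` function, and the 80 percent recognition accuracy are all hypothetical. It scores every candidate position along a route by how well a short sequence of perceived road features agrees with the map, then normalizes the scores into probabilities.

```python
import math

# Hypothetical coarse map: the road feature expected at each position on a route.
ROUTE = ["straight", "t_junction", "straight", "straight",
         "four_way", "straight", "t_junction"]


def localize(observations, p_correct=0.8):
    """Return a probability for each candidate start position on ROUTE.

    Assumes each perceived feature is recognized correctly with probability
    p_correct and otherwise mistaken for one of the other labels uniformly.
    """
    n_labels = len(set(ROUTE))
    p_wrong = (1.0 - p_correct) / (n_labels - 1)
    log_scores = []
    for start in range(len(ROUTE) - len(observations) + 1):
        score = 0.0
        for offset, seen in enumerate(observations):
            match = ROUTE[start + offset] == seen
            score += math.log(p_correct if match else p_wrong)
        log_scores.append(score)
    # Normalize in a numerically stable way so the probabilities sum to 1.
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


posterior = localize(["straight", "straight", "four_way"])
best = max(range(len(posterior)), key=posterior.__getitem__)
print("most likely start position:", best)
print("probability:", round(posterior[best], 3))
```

Only one stretch of this toy route matches all three observations, so nearly all of the probability lands there. A single mismatched observation lowers a candidate's score without eliminating it, which hints at why this style of matching can tolerate noisy inputs.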
It's also a system that is much more forgiving when there is a mismatch of data, being able to handle sensor failures and noisy inputs.
"Our objective is to achieve autonomous navigation that is robust for driving in new environments," says Rus. "For example, if we train an autonomous vehicle to drive in an urban setting such as the streets of Cambridge, the system should also be able to drive smoothly in the woods, even if that is an environment it has never seen before."
The research was presented in a paper at the 2019 International Conference on Robotics and Automation in Montreal.