Innovative technology allows cars to see around corners
July 6, 2024
Researchers use shadows to model 3D scenes, including objects that are hidden from view. This technique could lead to safer autonomous vehicles, more efficient AR/VR headsets, or faster warehouse robots.
Imagine driving through a tunnel in an autonomous vehicle when, unbeknownst to you, a crash has stopped traffic up ahead. Normally, you would have to rely on the car directly in front of you to know when to start braking. But what if your vehicle could see around that car and brake even sooner?
Researchers at MIT and Meta have developed a computer vision technique that could one day allow an autonomous vehicle to do just that. They’ve introduced a method that uses images from a single camera to create physically accurate 3D models of an entire scene, including areas where the field of view is obstructed. Their technique uses shadows to determine what’s in occluded parts of the scene.
PlatoNeRF is a computer vision system that combines lidar measurements with machine learning to reconstruct a 3D scene, including hidden objects, from a single camera view using shadows. Here, the system accurately models a rabbit sitting in a chair, even when the view of the rabbit is obstructed. Credit: Provided by the researchers, edited by MIT News
They call their approach PlatoNeRF, after Plato’s allegory of the cave, a passage from the Greek philosopher’s Republic in which prisoners chained in a cave discern the reality of the outside world from shadows cast on the cave wall.
By combining lidar (light detection and ranging) with machine learning, PlatoNeRF can produce more accurate 3D geometry reconstructions than some existing AI methods. PlatoNeRF also does a better job of reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.
Improving AR/VR and Robotics with PlatoNeRF
In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by allowing the user to model the geometry of a room without having to walk around and take measurements. It could also help warehouse robots find items faster in a cluttered environment.
“Our original idea was to bring together two things that had been done before in different disciplines: multibounce lidar and machine learning. When you bring these two worlds together, you find a lot of new opportunities to explore and get the best of both worlds,” says Tzofi Klinghoffer, an MIT graduate student in media arts and sciences, an affiliate of the Camera Culture Group at the MIT Media Lab, and lead author of the PlatoNeRF paper.
Klinghoffer wrote the paper with his advisor Ramesh Raskar, an associate professor of media arts and sciences and director of the Camera Culture Group at MIT; senior author Rakesh Ranjan, director of artificial intelligence research at Meta Reality Labs; Siddharth Somasundaram, a research associate in the Camera Culture Group; and Xiaoyu Xiang, Yuchen Fan, and Christian Richardt from Meta. The work will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR).
Advanced 3D reconstruction using lidar and machine learning
Reconstructing a full 3D scene from a single camera image is a challenging problem. Some machine learning approaches use generative AI models that try to guess what lies in occluded regions, but these models can hallucinate objects that aren’t actually there. Other approaches try to infer the shapes of hidden objects from shadows in a color image, but these methods can struggle when shadows are hard to see.
For PlatoNeRF, the MIT researchers built on a sensing modality called single-photon lidar. Lidars image a 3D scene by emitting pulses of light and measuring the time it takes for that light to return to the sensor. Because single-photon lidars can detect individual photons, they provide higher-resolution data.
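The depth measurement itself follows directly from the travel time of light. Here is a minimal sketch of that time-of-flight principle; the names and values are illustrative, not taken from the paper:

```python
# Minimal sketch of time-of-flight ranging, the principle behind lidar.
# Illustrative only; not code from the PlatoNeRF paper.

C = 299_792_458.0  # speed of light in meters per second

def depth_from_return_time(t_seconds: float) -> float:
    """One-way distance to a surface from the round-trip time of a pulse.

    The pulse travels to the surface and back, so the surface distance
    is half the total path length.
    """
    return C * t_seconds / 2.0

# A photon detected 20 nanoseconds after the pulse fires corresponds
# to a surface roughly 3 meters away.
print(depth_from_return_time(20e-9))  # ~3.0
```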
The researchers use a single-photon lidar to illuminate a target point in the scene. Some of the light bounces off that point and returns directly to the sensor. However, most of the light is scattered and reflected by other objects before returning to the sensor. PlatoNeRF relies on these second light reflections.
By calculating how long it takes for light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second light reflection also contains shadow information.
The system tracks secondary light rays (those reflected from the target point to other points in the scene) to determine which points are in shadow (due to a lack of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects.
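Under simplified assumptions (known laser, sensor, and illuminated-point positions, plus a per-pixel count of second-bounce photons), this two-bounce reasoning can be sketched as follows; the function and variable names are hypothetical:

```python
# Hedged sketch of two-bounce path geometry and shadow detection,
# under simplified assumptions. Names are hypothetical, not from the paper.
import numpy as np

C = 299_792_458.0  # speed of light in meters per second

def two_bounce_arrival_time(laser, lit_point, scene_point, sensor):
    """Arrival time for the path laser -> lit point -> scene point -> sensor.

    Measuring this time constrains the distances along the path, which is
    the extra depth information the second bounce provides.
    """
    path = (np.linalg.norm(lit_point - laser)
            + np.linalg.norm(scene_point - lit_point)
            + np.linalg.norm(sensor - scene_point))
    return path / C

def shadow_mask(second_bounce_counts, min_photons=1):
    """Pixels recording (almost) no second-bounce photons while a point is
    lit are occluded along the segment from that point, i.e. in shadow."""
    return second_bounce_counts < min_photons

# Example: one illuminated point, a 4-pixel sensor with photon counts.
counts = np.array([5, 0, 3, 0])
print(shadow_mask(counts))  # [False  True False  True]
```

Repeating this shadow test for each illuminated point yields one shadow mask per light source, and it is from the combination of these masks that the hidden geometry is inferred.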
By sequentially illuminating 16 points, the lidar captures multiple images that the system uses to reconstruct the entire 3D scene.
“Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have a lot of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” Klinghoffer says.
Combining multibounce lidar with neural radiance fields
The key to PlatoNeRF is the combination of multibounce lidar with a special type of machine learning model known as a neural radiance field (NeRF). A NeRF encodes scene geometry into the weights of a neural network, giving the model a powerful ability to interpolate, or estimate, novel views of a scene. This interpolation ability also leads to highly accurate scene reconstructions when combined with multibounce lidar, Klinghoffer says.
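To give a flavor of what “encoding geometry into network weights” means, here is a toy coordinate network in the spirit of a NeRF. It is an illustrative sketch, not the PlatoNeRF architecture:

```python
# Toy coordinate network in the spirit of a NeRF: an MLP whose weights
# store the scene, queryable at any continuous 3D point. Illustrative
# sketch only; not the PlatoNeRF architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar density at the queried point
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # Softplus keeps density non-negative; after training, querying
        # the network densely recovers the scene's 3D geometry.
        return F.softplus(self.net(xyz))

model = TinyNeRF()
densities = model(torch.rand(8, 3))  # densities for 8 random 3D points
```

Because the network is a continuous function of position, it can be evaluated between the sparse points the lidar actually measured, which is where the interpolation ability comes from.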
“The biggest challenge was figuring out how to combine these two things. We really had to think about the physics of how light travels with multibounce lidar and how to model that with machine learning,” he says.
They compared PlatoNeRF with two common alternative methods: one that uses only lidar, and one that uses only a NeRF with color images.
They found that their method outperformed both baselines, especially when the lidar sensor had low resolution. This makes their approach more practical for real-world deployment, where lower-resolution sensors are common in commercial devices.
“About 15 years ago, our group invented the first camera that ‘sees’ around corners, using multiple light reflections or ‘light echoes’. These methods used special lasers and sensors, and used three reflections of light. Since then, lidar technology has become more popular, which led to our research into cameras that can see through fog. This new work uses only two light reflections, which means the signal-to-noise ratio is very high and the quality of the 3D reconstruction is impressive,” says Raskar.
In the future, the researchers want to try tracking more than two bounces of light to see how additional bounces could improve scene reconstruction. They are also interested in combining PlatoNeRF with color image measurements to capture texture information.
“While shadows captured by cameras have long been studied as a cue for 3D reconstruction, this work revisits the problem in the context of lidar, demonstrating significant improvements in the accuracy of reconstructed hidden geometry. The work shows how clever algorithms can enable extraordinary capabilities when combined with ordinary sensors, including the lidar systems that many of us now carry in our pockets,” says David Lindell, an assistant professor of computer science at the University of Toronto, who was not involved in this study.