Apple's Machine Learning Research team has developed Depth Pro, a foundation AI model "for zero-shot metric monocular depth estimation." The system enables high-speed generation of detailed 3D depth maps from a single two-dimensional image.
Our brains process visual information from two image sources – our eyes. Each has a slightly different view of the world, and the two views are fused into a single picture, with the differences between them also helping us to gauge how close or far objects are.
Many cameras and smartphones look at life through a single lens, but three-dimensional depth maps can still be created using information hidden in the metadata of 2D photos (such as focal lengths and sensor details) or estimated from multiple images.
Depth Pro needs none of that, yet is able to generate a detailed 2.25-megapixel depth map from a single image in 0.3 seconds on a standard graphics processing unit (GPU).
The model's architecture includes a multi-scale vision transformer that processes the overall context of an image alongside finer details like "hair, fur, and other fine structures." It's also able to estimate both relative and absolute depth, meaning the model can furnish real-world measurements that allow, for example, augmented reality apps to precisely position virtual objects in a physical space.
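As a rough illustration of why absolute (metric) depth matters for AR, the Python sketch below back-projects a metric depth map into 3D points using the standard pinhole camera model. This is generic camera geometry rather than Apple's own code, and the depth values and camera intrinsics are hypothetical placeholders.

```python
import numpy as np

def backproject_to_3d(depth_m, fx, fy, cx, cy):
    """Convert a metric depth map (meters) into per-pixel (X, Y, Z) coordinates
    using the standard pinhole camera model."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid coordinates
    z = depth_m
    x = (u - cx) * z / fx  # horizontal distance from the optical axis, in meters
    y = (v - cy) * z / fy  # vertical distance from the optical axis, in meters
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)

# Hypothetical inputs: a flat 480 x 640 depth map and guessed camera intrinsics.
depth_m = np.full((480, 640), 2.0)  # pretend every pixel is 2 m from the camera
points = backproject_to_3d(depth_m, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)  # (480, 640, 3): real-world coordinates an AR app could use
```

With only relative depth, the z values would be known up to an unknown scale factor; metric depth is what lets a virtual object sit at a believable, measurable distance.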
The AI is able to do all this without resource-intensive training on domain-specific datasets, employing something called zero-shot learning – which IBM describes as "a machine learning scenario in which an AI model can recognize and categorize unseen classes without labeled examples." This makes for quite a versatile beast.
As for applications, beyond the AR scenario mentioned above, Depth Pro could make photo editing much more efficient, enable real-time 3D imagery from a single-lens camera, and help machines such as autonomous vehicles and robots better perceive the world around them in real time.
The project is still at the research stage but, perhaps unusually for Apple, the code and supporting documentation are being made available as open source on GitHub, allowing developers, scientists and coders to take the technology to the next level.
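For those who want to experiment, here is a minimal sketch of how the open-source release can be queried from Python. The helper names used here (create_model_and_transforms, load_rgb, infer) follow the usage pattern documented in the repository but should be treated as assumptions that may change, and "example.jpg" is a placeholder path.

```python
# Illustrative sketch of running the open-source Depth Pro model; the helper
# names below are assumptions based on the repository's documented usage.
import depth_pro

# Load the pretrained model and its matching preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an image; f_px is the focal length (in pixels) read from EXIF metadata
# when available. The model can estimate it when the metadata is missing.
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# Run inference to get a metric depth map (in meters) plus the focal length.
prediction = model.infer(image, f_px=f_px)
depth_m = prediction["depth"]
focal_length_px = prediction["focallength_px"]
```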
A paper on the project has been published on the arXiv server, and there's a live demo available for anyone who wants to experience the current version for themselves.
Source: Apple