Multiverse simulation: Robotic AI is about to accelerate sharply

Image 1/3: Simulated vision data like this, built on sophisticated real-world physics engines and converted into photorealistic video, is set to rapidly accelerate the development of physical AIs that control robotics, humanoids and autonomous vehicles

Image 2/3: Multiverse-style simulations can game out the effects of a theoretically infinite number of decision paths for AI robots

Image 3/3: AI-powered robots will learn exponentially faster, as each real-world situation can be split into an infinite number of variants for training

The AI behavior models controlling how robots interact with the physical world haven't been advancing at the crazy pace that GPT-style language models have – but new multiverse 'world simulators' from Nvidia and Google could change that rapidly.

There's a chicken-and-egg issue slowing things down for AI robotics; large language model (LLM) AIs have enjoyed the benefit of massive troves of data to train from, since the Internet already holds an extraordinary wealth of text, image, video and audio data.

But there's far less data for large behavior model (LBM) AIs to train on. Robots and autonomous vehicles are expensive and annoyingly physical, so data around 3D representations of real-world physical situations is taking a lot longer to collect and incorporate into AI models.

This is one of the reasons why Tesla was so keen to get self-driving hardware into as many of its cars as possible, as early as possible, to give the company a head start on data collection that could position it as the leader in autonomous vehicles.

But recent announcements from Nvidia and Google Deepmind suggest this data bottleneck will soon be eliminated, unlocking a massive acceleration of physical AI development.

Multiverse-style simulations can game out the effects of a theoretically infinite number of decision paths for AI robots

Multiversal AI acceleration through real-world data simulation

The idea is to generate enormous amounts of reliable training data using multiverse-style world simulators that can take a single real-world situation – or even just a text prompt – build a virtual model of it, and then split it into a theoretically infinite number of slightly different situations.

So if you've got six cameras' worth of data from an autonomous car, for example, driving down a street on a nice summer's day, you could take that data, virtualize it to create a 3D world representation, and then use it to generate a huge number of slightly different situations. You could recreate the same situation at 100 different times of the day and night, under 100 different weather conditions that might include rain, snow, heavy wind or dense fog.

You could then split out virtual worlds for each of these time and weather scenarios, in which other vehicles on the road, or pedestrians, or animals, or objects, act slightly differently, creating an entirely new situation for your autonomous car to react to. If something drops, you can simulate it bouncing away in 100 different directions. You can simulate all sorts of edge cases that are incredibly unlikely in the real world.
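To make the combinatorics concrete, here's a minimal sketch in Python. The Scenario fields and the expand() helper are hypothetical stand-ins for illustration – not the API of Cosmos or any real simulator – but they show how one recorded drive could fan out across times of day, weather conditions and perturbed pedestrian behavior:

```python
import itertools
import random
from dataclasses import dataclass, replace

# Hypothetical scenario description -- illustrative only, not a real simulator's API
@dataclass(frozen=True)
class Scenario:
    time_of_day: float        # hour of the day, 0-24
    weather: str              # "clear", "rain", "snow" or "fog"
    pedestrian_speeds: tuple  # per-pedestrian speed multipliers

def expand(base, n_times=100, n_weathers=4, n_behaviors=10, seed=0):
    """Split one recorded scenario into many slightly different variants."""
    rng = random.Random(seed)
    times = [24 * i / n_times for i in range(n_times)]
    weathers = ["clear", "rain", "snow", "fog"][:n_weathers]
    for t, w in itertools.product(times, weathers):
        for _ in range(n_behaviors):
            # jitter every pedestrian's behavior slightly to create a new situation
            speeds = tuple(s * rng.uniform(0.5, 1.5) for s in base.pedestrian_speeds)
            yield replace(base, time_of_day=t, weather=w, pedestrian_speeds=speeds)

base = Scenario(time_of_day=14.0, weather="clear", pedestrian_speeds=(1.0, 1.0))
print(sum(1 for _ in expand(base)))  # 100 x 4 x 10 = 4,000 variants from one recording
```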

And of course, you can split out different worlds from each of these, in which the autonomous car itself reacts and chooses different courses of action.

You can then take that simulated 3D world representation, and work backwards to generate high-quality simulated video feeds for all six of your original car's cameras – and data feeds for whatever other sensors your robotic system might have.
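In code terms, working backwards to sensor data might look something like the toy sketch below – render_view() is just a stub standing in for the physics engine and photorealistic video generator a real pipeline would use:

```python
# Stub renderer: a real pipeline would use a physics engine plus a neural
# video generator to produce photorealistic frames, not a string.
CAMERA_RIG = ("front", "front_left", "front_right", "rear", "rear_left", "rear_right")

def render_view(world_state, camera):
    return f"<frame: {camera} @ {world_state['time_of_day']:.1f}h, {world_state['weather']}>"

def world_to_training_sample(world_state):
    # work backwards from the simulated 3D world to one video feed per camera;
    # other sensors (lidar, radar, IMU) could be simulated the same way
    return {camera: render_view(world_state, camera) for camera in CAMERA_RIG}

print(world_to_training_sample({"time_of_day": 2.5, "weather": "fog"})["front"])
```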

And hey presto: your single original chunk of data can turn into thousands or millions of similar but slightly different training scenarios, all generated using advanced physics and materials simulators.

"The ChatGPT moment for robotics is coming," said Jensen Huang, founder and CEO of Nvidia, announcing the launch of the company's Cosmos world simulation model during his keynote at CES. "Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own. We created Cosmos to democratize physical AI and put general robotics in reach of every developer."

The Cosmos model can also operate in real time, according to the video below, "bringing the power of foresight and multiverse simulation to AI models, generating every possible future to help the model select the right path."
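That "foresight" idea boils down to rollout-based action selection: simulate each candidate action forward through the world model and keep the one whose future scores best. Here's a toy sketch with a stand-in world model and reward function – nothing here reflects Cosmos's actual internals:

```python
def simulate_future(state, action, horizon=10):
    """Stand-in world model: rolls a single scalar state forward under an action."""
    return [state + action * step for step in range(horizon)]

def score(trajectory):
    """Stand-in reward: prefer futures that stay near the lane center (state 0)."""
    return -sum(abs(s) for s in trajectory)

def choose_action(state, candidate_actions):
    # generate a possible future for every candidate action, then pick the best path
    futures = {a: simulate_future(state, a) for a in candidate_actions}
    return max(futures, key=lambda a: score(futures[a]))

print(choose_action(state=1.0, candidate_actions=[-0.2, 0.0, 0.2]))  # -0.2 steers back toward center
```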

NVIDIA Cosmos: A World Foundation Model Platform for Physical AI

Obviously, the data and processing requirements for this sort of thing will be absolutely epic. Nvidia has attempted to help address this with its own Cosmos Tokenizer, which can turn images and videos into tokens that AI models can process using about 1/8th the amount of data required by today's leading tokenizers, unlocking a 12X speed boost in processing.
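The arithmetic matters more than it might sound: a visual tokenizer chops video into spatio-temporal patches, so a more aggressive compression ratio directly shrinks the number of tokens a model has to chew through. A back-of-the-envelope illustration – the strides below are hypothetical, not the Cosmos Tokenizer's actual parameters:

```python
def token_count(frames, height, width, t_stride, h_stride, w_stride):
    # one token per spatio-temporal patch
    return (frames // t_stride) * (height // h_stride) * (width // w_stride)

clip = dict(frames=32, height=720, width=1280)

baseline   = token_count(**clip, t_stride=4, h_stride=8,  w_stride=8)   # hypothetical "typical" tokenizer
compressed = token_count(**clip, t_stride=8, h_stride=16, w_stride=16)  # more aggressive compression

print(baseline, compressed, baseline / compressed)  # 115200 14400 8.0 -> ~1/8th the tokens
```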

As the world's leading AI hardware provider, Nvidia already has a solid chunk of the emerging robotics industry on board with the Cosmos initiative. Companies like 1X, Figure AI, Fourier and Agility are adopting Cosmos to accelerate the training of humanoid robots, and Xpeng, Uber, Waabi and Wayve are among the autonomous vehicle companies getting involved.

Meanwhile, Google Deepmind is launching its own similar initiative – albeit apparently a decent step behind Nvidia. Former OpenAI Sora lead Tim Brooks, who now heads Deepmind's video generation and world simulation team, announced the effort in a post on X yesterday, linking to job listings for the new team.

In the job descriptions linked, the Google team points out that this kind of physical world simulation will be a critical step on the path to artificial general intelligence (AGI): "We believe scaling pre-training on video and multimodal data is on the critical path to artificial general intelligence. World models will power numerous domains, such as visual reasoning and simulation, planning for embodied agents, and real-time interactive entertainment."

Friends, it can be hard to know what's significant in the firehose of announcements around AI progress, and nigh-on impossible to keep track of everything that's going on. But to put this stuff in context, where LLMs like GPT are rapidly coming for white-collar jobs, LBMs embodied in robots – be they humanoid, vehicle-oid or in some other shape designed for a specific environment – are coming for anything more blue-collar, or that involves more interaction with the physical world.

The technology in this sector is already absolutely incredible, barely distinguishable from magic, and it promises to fundamentally and profoundly change the world over the coming years and decades. This multiverse simulation gear looks like it'll significantly accelerate progress toward the utopian vision of the post-labor economy... Or whatever less palatable outcome we might get instead.

Source: Nvidia / Google Deepmind

3 comments
Ancliff
Now the future is here, and very unevenly distributed vis-à-vis robots and humans! They are going to learn fast.
Daishi
Moravec's paradox from 1988: "it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility"

This has turned out to be quite true, even recently. Solving that paradox is one reason for this work; another is that AI models have already been given all published data on the Internet during pre-training, and hit a plateau of diminishing returns shortly after. The simulated environments are useful for more than just reinforcement learning – they provide a potential source of synthetic training data to pick up where human-generated data leaves off.

For those following the LLM space, OpenAI's "reasoning" models are scaled in a third, interesting way, which has allowed this plateau to be overcome and is part of why OpenAI is saying it is now able to achieve AGI: https://i.imgur.com/FT1BqC9.png

I think a good analogy for test-time scaling (reasoning) is the difference between instructing an employee to complete a task and asking that same employee to spend more time on the task, ensuring accuracy and quality of the output. Through this method it's possible to throw more inference compute at a problem to get LLMs to deliver better results. This is important because only a few months ago many experts believed an entirely different architecture would be required as LLMs hit a wall, but through reasoning OpenAI has argued "there is no wall".
Global
Just plug in all the built-in and add-on dash cams live and it becomes a real-world scenario, no longer virtual. That means immense data streams, and server farms to handle the information flow, way beyond the 5G networks we currently have. But it's not insurmountable, as Elon already has in place the capacity for worldwide data access from Android and Apple phones via space-based communication.