Video: GPT-enhanced humanoid speaks and reasons as it works

Thanks to a collaboration between Figure and OpenAI, the Figure 01 humanoid robot can now converse with people in real-time, and act on requests to do stuff

Figure has demonstrated the first fruit of its collaboration with OpenAI to enhance the capabilities of humanoid robots. In a video released today, the Figure 01 bot is seen conversing in real time and acting on spoken requests.

The development progress at Figure is nothing short of extraordinary. Entrepreneur Brett Adcock only emerged from stealth last year, after gathering together a bunch of key players from Boston Dynamics, Tesla, Google DeepMind and Archer Aviation to "create the world's first commercially viable general purpose humanoid robot."

By October, the Figure 01 was already up on its feet and performing basic autonomous tasks. By the turn of the year, the robot had watch-and-learn capabilities, and was ready to enter the workforce at BMW by mid-January.

We got to see it on the warehouse floor last month, just before Figure announced a successful Series B funding round along with a collaboration agreement with OpenAI "to develop next generation AI models for humanoid robots." Now we get a taste of what that means.

Figure Status Update - OpenAI Speech-to-Speech Reasoning

Adcock confirmed in an X post that Figure 01's integrated cameras send data to a large vision-language model trained by OpenAI, while Figure's own neural networks also "take images in at 10 Hz through cameras on the robot." OpenAI is also responsible for the robot's ability to understand spoken words, and all of this incoming information is translated into "fast, low level, dexterous robot actions" by Figure's neural net.
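To make that division of labor concrete, here is a minimal Python sketch of such a perception-to-action loop: camera frames and transcribed speech feed a slow vision-language model that issues high-level commands, while a fast on-robot policy turns each command into actuator targets on a 10 Hz tick. Every name here (Observation, VisionLanguageModel, LowLevelPolicy, send_to_actuators) is a hypothetical illustration under those assumptions, not Figure's or OpenAI's actual code.

```python
import time
from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes      # latest camera frame
    transcript: str   # speech-to-text output, if any


class VisionLanguageModel:
    """Stand-in for the large vision-language model doing high-level reasoning."""

    def plan(self, obs: Observation) -> str:
        # A real system would query a multimodal model here; this stub just
        # maps a keyword in the transcript to a canned high-level command.
        return "hand_over_apple" if "apple" in obs.transcript else "idle"


class LowLevelPolicy:
    """Stand-in for the on-robot network emitting fast, dexterous actions."""

    def act(self, obs: Observation, command: str) -> list[float]:
        # Map (image, high-level command) to a vector of actuator targets.
        return [0.0] * 24  # placeholder: 24 joint setpoints


def send_to_actuators(actions: list[float]) -> None:
    """Placeholder for the robot's hardware interface."""
    pass


def control_loop(vlm: VisionLanguageModel, policy: LowLevelPolicy,
                 hz: float = 10.0, steps: int = 30) -> None:
    """Tick perception and control at `hz` (10 Hz in Adcock's description)."""
    period = 1.0 / hz
    for _ in range(steps):
        start = time.monotonic()
        obs = Observation(image=b"", transcript="can I have the apple?")
        command = vlm.plan(obs)             # slow, high-level reasoning
        actions = policy.act(obs, command)  # fast, low-level control
        send_to_actuators(actions)
        time.sleep(max(0.0, period - (time.monotonic() - start)))


if __name__ == "__main__":
    control_loop(VisionLanguageModel(), LowLevelPolicy())
```

In practice the high-level planner would run asynchronously and far less often than the 10 Hz control tick; the single loop above just keeps the sketch compact.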

He confirmed that the robot was not teleoperated during the demo, and that the video was filmed at actual speed. All up, a remarkable achievement for a partnership that's less than two weeks old – "our goal is to train a world model to operate humanoid robots at the billion-unit level," said Adcock. At this rate, we won't have to wait long.

Source: Figure

13 comments
Thony
I wish improvements in many other fields of science could go at least as fast... But hey, maybe that will be the case now? (Keep on dreaming.)
Faint Human Outline
We are getting nearer to the gaps being closed across the market. If this gets traction in one industry, adoption may take less than two years. The data collected from one industry will transfer to the next, and so on.
Tristan P
Kinda impressive, freaky and fascinating - all at once.
Smokey_Bear
When I saw this earlier in the day, my jaw hit the floor. Figure has moved up from #2 to #1, sorry Tesla.
I didn't think OpenAI's partnership with Figure would be adapted so quickly; they have helped 1X, but 1X's robots don't hold a candle to Figure's.
It's also got impressive hands. Crazy that they are already moving at human speed, I thought they would remain slow until the system was refined enough for quicker speeds... clearly I was wrong.
Can't wait to see their next video, and I'm sure a Teslabot video won't be far behind.
Rustgecko
Imagine this technology in 10, 20 and 30 years and how it will transform our societies.
michael_dowling
Someone pinch me, and I will wake up to learn this robot was teleoperated, really... Household robots could be the next big thing, and I mean multi-billion-dollar thing. They would have to be leased by most users, but with the help of OpenAI, a home general-purpose robot could be just around the corner.
Daishi
I've seen some people on X say there will be some big announcements in robotics this year. People are saying this year will be for robotics what last year was for language models. This is definitely one of those announcements. In the same way Sora needs to work out the physics of how objects interact in order to generate convincing video, it could likely work in reverse, giving it a deeper understanding of the physical world. This is similar to how neural nets trained to recognize and label images were later used to generate them.
The Alchemist
I think the scariest part of this video is that it sounds like Gavin Newsom.
McDesign
Simply stunning, and stunning speed of advances!
Cynthia
Why are they using the AI on a server to manage 1 billion units instead of installing the AI in each individual unit? This centralized AI will contribute to humanity's fears of an AI takeover with a billion robot army. Just saying... Don't do it that way.