Figure has demonstrated the first fruit of its collaboration with OpenAI to enhance the capabilities of humanoid robots. In a video released today, the Figure 01 bot is seen conversing in real time.
The pace of development at Figure is nothing short of extraordinary. Entrepreneur Brett Adcock only emerged from stealth last year, after assembling key players from Boston Dynamics, Tesla, Google DeepMind and Archer Aviation to "create the world's first commercially viable general purpose humanoid robot."
By October, the Figure 01 was already up on its feet and performing basic autonomous tasks. By the turn of the year, the robot had watch-and-learn capabilities, and was ready to enter the workforce at BMW by mid-January.
We got to see it on the warehouse floor last month, just before Figure announced a successful Series B funding round along with a collaboration agreement with OpenAI "to develop next generation AI models for humanoid robots." Now we get a taste of what that means.
Adcock confirmed in an X post that Figure 01's integrated cameras send data to a large vision-language model trained by OpenAI, while Figure's own neural networks "take images in at 10 Hz through cameras on the robot." OpenAI also handles speech understanding, and Figure's neural net translates this incoming stream of information into "fast, low level, dexterous robot actions."
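To make that division of labor concrete, here's a minimal Python sketch of the two-tier setup Adcock describes: slow vision-language reasoning layered over a fast reactive policy. Every name in it (the stub classes, the 24-joint placeholder, the behavior labels) is hypothetical; Figure hasn't published code, so this only illustrates the described data flow, not the actual system.

```python
# Hypothetical sketch of the described architecture: cameras sampled at
# ~10 Hz plus transcribed speech feed a vision-language model (VLM), whose
# high-level decision conditions a fast neural-net policy that emits
# low-level joint actions. All names and numbers here are assumptions.

from dataclasses import dataclass

@dataclass
class Plan:
    reply: str      # what the robot says back to the person
    behavior: str   # which learned behavior the fast policy should run

class StubVLM:
    """Stand-in for the OpenAI-trained vision-language model."""
    def plan(self, images: list[bytes], transcript: str) -> Plan:
        # A real VLM would reason over the images and conversation history.
        return Plan(reply="Sure thing.", behavior="hand_over_object")

class StubPolicy:
    """Stand-in for Figure's neural net emitting dexterous actions."""
    def act(self, behavior: str, image: bytes) -> list[float]:
        # Placeholder joint targets; the actuator count is invented.
        return [0.0] * 24

def tick(vlm: StubVLM, policy: StubPolicy, image: bytes,
         transcript: str | None, behavior: str) -> tuple[str, list[float]]:
    """One 10 Hz step: replan only when the person has spoken."""
    if transcript is not None:
        plan = vlm.plan([image], transcript)       # slow, deliberate layer
        print(f"robot says: {plan.reply}")
        behavior = plan.behavior
    action = policy.act(behavior, image)           # fast, reactive layer
    return behavior, action

# Example: one spoken request, followed by a few silent control steps.
vlm, policy = StubVLM(), StubPolicy()
behavior = "idle"
behavior, _ = tick(vlm, policy, b"frame0", "Can I have that?", behavior)
for _ in range(3):
    behavior, _ = tick(vlm, policy, b"frame", None, behavior)
```

The key design idea, under these assumptions, is that the expensive multimodal model only runs when new language arrives, while the lightweight policy keeps producing motor commands every control step.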
He confirmed that the robot was not teleoperated during the demo, and that the video was filmed at actual speed. All up, it's a remarkable achievement for a partnership that's less than two weeks old. "Our goal is to train a world model to operate humanoid robots at the billion-unit level," said Adcock. At this rate, we won't have to wait long.
Source: Figure