Robotics

Video: Eve humanoid voice-prompted to perform back-to-back multi-tasking

"We’ve built a voice-controlled natural language interface to chain short-horizon capabilities across multiple small models into longer ones," says 1X's VP of AI, Eric Jang. "With humans directing the skill chaining, this allows us to accomplish the long-horizon behaviors shown in this video."

OpenAI-backed robotics company 1X has released a video of a bunch of wheeled service robots seamlessly moving from one simple task to another as they tidy up an office space, prompted into action by a voice-controlled natural language interface.

Halodi Robotics was founded in 2014 to develop general-purpose robots to work alongside humans in the workplace. Originally headquartered in Norway, the company set up a second base of operations in California in 2019, which is when we first came across a pre-production prototype of a wheeled humanoid called Eve.

Halodi became 1X and partnered with OpenAI in 2022 "to combine robotics and AI and lay the foundation for embodied learning." Though the company does have a bipedal model in the pipeline, as well as human-like hands, much of the development focus at the moment seems to be on training Eve to be useful around the workplace, where the bots will "understand both natural language and physical space, so they can do real tasks throughout your workplace and your world."

1X now reports that it has created a natural language interface that allows an operator to control multiple humanoids using voice commands, with the robot helpers then stringing together a series of learned actions to complete complex tasks.

Voice Commands & Chaining Tasks | 1X AI Update

Back in March, the company advised that it had managed to develop an autonomous model that crammed a large number of tasks into a single behavioral AI model – including taking items out of a shopping bag and then deciding where to put them, wiping up spills and folding shirts.

1X noted that improving the behavior of a single task within a relatively small multi-task model could adversely impact the behaviors of other tasks within that model. This could be fixed by increasing the parameter count, but at the expense of increased training time and slower development.

Instead, building a voice-controlled natural language interface into the mix allows operators "to chain short-horizon capabilities across multiple small models into longer ones." These single-task models can then be merged into goal-conditioned models as development moves toward a unified model with the ultimate aim of automating high-level actions using AI.
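To make the chaining idea concrete, here is a minimal Python sketch of how voice commands might dispatch to small single-task policies and string them into one long-horizon behavior. It is only a sketch under assumed names, not 1X's actual code; ShortHorizonPolicy, SKILLS and chain_skills are all hypothetical.

```python
# A minimal sketch of voice-directed skill chaining, assuming a library of
# short-horizon, single-task policies. 1X has not published its implementation,
# so every name below is hypothetical.

class ShortHorizonPolicy:
    """Wraps one learned single-task model, e.g. 'pick up the cup'."""

    def __init__(self, name: str):
        self.name = name

    def run(self, robot_state: dict) -> dict:
        # A real policy would stream low-level motor commands until its
        # short-horizon task completes; here we just log the step.
        print(f"executing skill: {self.name}")
        return robot_state

# Each voice command maps to one small single-task model.
SKILLS = {
    "pick up the cup": ShortHorizonPolicy("pick_cup"),
    "put it in the sink": ShortHorizonPolicy("place_sink"),
    "wipe the counter": ShortHorizonPolicy("wipe_counter"),
}

def chain_skills(voice_commands: list[str], robot_state: dict) -> dict:
    """Chain short-horizon skills into one long-horizon behavior,
    with the human operator's command sequence doing the planning."""
    for command in voice_commands:
        policy = SKILLS.get(command)
        if policy is None:
            print(f"unknown command, skipping: {command}")
            continue
        robot_state = policy.run(robot_state)
    return robot_state

# One long-horizon behavior assembled from three short-horizon skills.
chain_skills(["pick up the cup", "put it in the sink", "wipe the counter"], {})
```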

"Directing robots with this high-level language interface offers a new user experience for data collection," said the company's Eric Jang in a blog post. "Instead of using VR to control a single robot, an operator can direct multiple robots with high-level language and let the low-level policies execute low-level actions to realize those high-level goals. Because high-level actions are sent infrequently, operators can even control robots remotely."

1X states that the Eve humanoids in the video above are not tele-operated; all actions are controlled by a neural network. Nor are there any computer-generated graphics, "cuts, video speedups, or scripted trajectory playback." The next step will be to integrate vision-language models such as GPT-4o, VILA and Gemini Vision into the system.

Source: 1X
