Lip-syncing robot watches your face to speak like you

Image: Columbia University's flexible-faced, lip-syncing EMO robot

When it comes to ultra-humanlike Westworld-style robots, one of their most defining features is lips that move in perfect sync with their spoken words. A new robot not only sports that feature, it can actually train itself to speak like a person.

Developed by robotics PhD student Yuhang Hu, Prof. Hod Lipson and colleagues at Columbia University, the EMO "robot" is in fact a robotic head with 26 tiny motors located beneath its flexible silicone facial skin. As those motors are activated in different combinations, the face takes on different expressions, and the lips form different shapes.

The scientists started by placing EMO in front of a mirror, where it was able to observe itself as it made thousands of random facial expressions. Doing so allowed it to learn which combinations of motor activations produce which visual facial movements. This type of learning produced what's known as a "vision-to-action" (VLA) model.
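
In rough terms, that mirror stage amounts to "motor babbling" followed by fitting an inverse model that maps a desired facial appearance back to motor commands. The sketch below is a hypothetical illustration rather than the Columbia team's code: the 26-motor count comes from the article, while the landmark dimensions, the simulated "mirror," and the simple linear least-squares model are stand-in assumptions.

```python
# Minimal sketch (not the authors' code) of the "mirror" self-modeling stage:
# issue random motor commands, observe the resulting facial landmarks, then
# fit an inverse model mapping desired landmarks -> motor commands.
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 26         # motors under the silicone skin (per the article)
N_LANDMARKS = 2 * 68  # assumption: x,y coordinates for 68 tracked facial points

# Stand-in for "looking in the mirror": the real system uses a camera plus a
# facial-landmark detector; here a fixed random linear map plays that role.
TRUE_SKIN_RESPONSE = rng.normal(size=(N_LANDMARKS, N_MOTORS))

def observe_face(motor_cmd):
    """Return the facial landmarks produced by a motor command (simulated)."""
    return TRUE_SKIN_RESPONSE @ motor_cmd + 0.01 * rng.normal(size=N_LANDMARKS)

# 1. Motor babbling: thousands of random expressions, as in the article.
commands = rng.uniform(-1.0, 1.0, size=(5000, N_MOTORS))
landmarks = np.array([observe_face(c) for c in commands])

# 2. Fit an inverse model (landmarks -> motor commands) by least squares.
inverse_model, *_ = np.linalg.lstsq(landmarks, commands, rcond=None)

# 3. Given a target expression (landmark layout), recover the motor commands.
target = observe_face(rng.uniform(-1.0, 1.0, size=N_MOTORS))
predicted_cmd = target @ inverse_model
print("reconstruction error:", np.abs(observe_face(predicted_cmd) - target).mean())
```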

The robot next watched many hours of YouTube videos of people talking and singing, in order to learn which mouth movements accompany which vocal sounds. Its AI system then merged that knowledge with what it had learned via the VLA model, allowing it to form lip movements that match the words it speaks through a synthetic voice module.
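
Conceptually, the lip-syncing stage chains an audio-to-lip-shape predictor (learned from talking-head video) with the mirror-learned inverse model. The snippet below is again a hypothetical sketch, not the published system: the audio features, lip-landmark dimensions, and both "learned" components are placeholder assumptions used only to show how the pieces would connect.

```python
# Hypothetical sketch of the lip-sync stage: an audio-to-lip-landmark predictor
# (standing in for a network trained on YouTube talking videos) chained with
# the inverse model learned during the mirror stage.
import numpy as np

rng = np.random.default_rng(1)

N_MOTORS = 26
N_LANDMARKS = 136      # full-face landmarks, as in the mirror sketch above
N_LIP_DIMS = 40        # assumption: x,y for 20 mouth points
N_AUDIO_FEATS = 80     # assumption: one mel-spectrogram frame per control step

# Stand-ins for the two learned components.
audio_to_lips = rng.normal(size=(N_LIP_DIMS, N_AUDIO_FEATS)) * 0.1
inverse_model = rng.normal(size=(N_LANDMARKS, N_MOTORS)) * 0.1
LIP_ROWS = np.arange(N_LIP_DIMS)   # which landmark rows belong to the mouth

def lip_sync_step(audio_frame):
    """One control step: audio frame -> target mouth shape -> motor commands."""
    target_lips = audio_to_lips @ audio_frame    # mouth shape a speaker would show
    face_target = np.zeros(N_LANDMARKS)          # rest of the face held neutral
    face_target[LIP_ROWS] = target_lips
    return face_target @ inverse_model           # 26 motor commands for this frame

# Example: drive the face with a short sequence of (random) audio frames.
for frame in rng.normal(size=(3, N_AUDIO_FEATS)):
    print(np.round(lip_sync_step(frame), 2))
```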

A Robot Learns to Lip Sync

The technology still isn't perfect, as EMO struggles with sounds such as "B" and "W." That should improve as it gains more practice at speaking, however, as should its ability to engage in natural-looking conversations with humans.

"When the lip sync ability is combined with conversational AI such as ChatGPT or Gemini, the effect adds a whole new depth to the connection the robot forms with the human," says Hu. "The more the robot watches humans conversing, the better it will get at imitating the nuanced facial gestures we can emotionally connect with. The longer the context window of the conversation, the more context-sensitive these gestures will become."

A paper on the research was recently published in the journal Science Robotics.

Source: Columbia University
