
Lip-syncing robot watches your face to speak like you

Columbia University's flexible-faced, lip-syncing EMO robot

When it comes to ultra-humanlike Westworld-style robots, one of their most defining features is lips that move in perfect sync with their spoken words. A new robot not only sports that feature, but can actually train itself to speak like a person.

Developed by robotics PhD student Yuhang Hu, Prof. Hod Lipson and colleagues at Columbia University, the EMO "robot" is in fact a robotic head with 26 tiny motors located beneath its flexible silicone facial skin. As those motors are activated in different combinations, the face takes on different expressions, and the lips form different shapes.
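As a rough illustration of what "different combinations of motor activations" might mean in software (this is not the Columbia team's code, and the motor indices and ranges below are invented), an expression can be thought of as a 26-element activation vector sent to the actuators:

```python
import numpy as np

NUM_MOTORS = 26  # the article describes 26 tiny motors under the silicone skin

def clamp_activations(activations: np.ndarray) -> np.ndarray:
    """Keep every motor command inside a normalized safe range [0, 1]."""
    return np.clip(activations, 0.0, 1.0)

def send_to_robot(activations: np.ndarray) -> None:
    """Stand-in for whatever driver actually commands the servos."""
    assert activations.shape == (NUM_MOTORS,)
    print("commanding motors:", np.round(activations, 2))

# A made-up "smile-ish" expression: most motors neutral, a few lip/cheek
# motors pulled toward their limits (the indices are invented for illustration).
expression = np.full(NUM_MOTORS, 0.5)
expression[[3, 4, 11, 12]] = 0.9
send_to_robot(clamp_activations(expression))
```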

The scientists started by placing EMO in front of a mirror, where it observed itself as it made thousands of random facial expressions. Doing so allowed it to learn which combinations of motor activations produce which visual facial movements. This type of learning is what's known as a "vision-to-action" (VLA) model.
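In very rough terms, that mirror-based self-modeling could look something like the Python sketch below, which "babbles" random motor commands, records the resulting facial landmarks, and fits an inverse model. Everything here (the landmark count, the network, the data volumes, the placeholder camera function) is an assumption made for illustration, not the published method:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

NUM_MOTORS = 26
NUM_LANDMARK_COORDS = 2 * 68  # e.g. 68 (x, y) points from an off-the-shelf face tracker

def random_expression() -> np.ndarray:
    """A random motor command, i.e. one 'babbled' facial expression."""
    return np.random.rand(NUM_MOTORS)

def observe_own_face(activations: np.ndarray) -> np.ndarray:
    """Stand-in for: drive the motors, look in the mirror, run a landmark detector.
    Here it just returns noise so the sketch runs without hardware."""
    return np.random.rand(NUM_LANDMARK_COORDS)

# Phase 1: self-observation. Record (observed face, motor command) pairs.
commands, faces = [], []
for _ in range(1000):  # the robot reportedly made thousands of expressions
    cmd = random_expression()
    faces.append(observe_own_face(cmd))
    commands.append(cmd)

# Phase 2: fit an inverse model -- which activations produce a desired facial pose?
inverse_model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200)
inverse_model.fit(np.array(faces), np.array(commands))
```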

The robot next watched many hours of YouTube videos of people talking and singing, in order to understand which mouth movements accompany which vocal sounds. Its AI system was subsequently able to merge that knowledge with what it learned via the VLA model, allowing it to form lip movements that corresponded to words it was speaking via a synthetic voice module.
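Again purely as a hedged sketch, the audio-to-lip step might pair per-frame audio features with mouth landmarks learned from talking-head footage, then hand the predicted landmarks to the inverse model above. The feature choice (MFCCs via librosa), the placeholder training arrays, and the function names are all hypothetical:

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

# Hypothetical training data: per-frame audio features aligned with lip landmarks,
# which in practice would be extracted from face-tracked talking/singing videos.
audio_features = np.random.rand(2000, 13)    # e.g. 13 MFCCs per video frame
mouth_landmarks = np.random.rand(2000, 40)   # e.g. 20 (x, y) lip points per frame

audio_to_lips = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200)
audio_to_lips.fit(audio_features, mouth_landmarks)

def lip_sync(wav_path: str, inverse_model) -> np.ndarray:
    """Audio file -> per-frame motor commands, via predicted lip landmarks.
    `inverse_model` is the vision-to-action model from the previous sketch;
    feeding it lip-only landmarks is a simplification made for brevity."""
    audio, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T  # (frames, 13)
    predicted_lips = audio_to_lips.predict(mfcc)
    return inverse_model.predict(predicted_lips)
```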

A Robot Learns to Lip Sync

The technology still isn't perfect, as EMO struggles with sounds such as "B" and "W." That should change as it gains more practice at speaking, however, as should its ability to engage in natural-looking conversations with humans.

"When the lip sync ability is combined with conversational AI such as ChatGPT or Gemini, the effect adds a whole new depth to the connection the robot forms with the human," says Hu. "The more the robot watches humans conversing, the better it will get at imitating the nuanced facial gestures we can emotionally connect with. The longer the context window of the conversation, the more context-sensitive these gestures will become."

A paper on the research was recently published in the journal Science Robotics.

Source: Columbia University

4 comments
anthony88
Never was a professor more aptly named...
NorCalHal
We may need to have laws requiring "non-human" bots to be painted a distinctive color or carry a tell-tale marking designating them as OLFs (other life forms). Otherwise how will anyone (police, businesspeople, "friends", etc.) distinguish them from REAL life forms? Think of the MANY complications when they open credit accounts; assault someone (intentionally or otherwise) - do you take the bot or the owner to court? Who goes to jail? Who pays the fine? - apply for marriage licenses; pay taxes (have you ever seen a political party miss an opportunity to tax?); enter into relationships (oh boy!); and call you on the phone to sell you "something". The possibilities are endless. I for one would like to know if I am dealing with a person or a computer. The AI (Automated Idiots) of today are beyond frustrating - but soon they will be indistinguishable from real idiots. Will AI be able to distinguish between abbreviations and words? Today they pronounce things like "caw" for the abbreviation CA, or "or" for Oregon. People will be sick in IL and a father in PA!
Faint Human Outline
The weariness in the eyes, the softness of the expression, as if the last fragments of hope and patience are seconds away from evaporating; I feel this in my soul.
Global
Needs a much larger range, from a whisper to yelling, and a tongue, flaring nostrils, bulging eyes, etc...