System creates lip-synced video from audio clips

The system has been trained on hours of Barack Obama speeches

It's already possible to create a digital copy of someone's voice, enabling users to produce an audio file of them saying things that they never actually said. Listeners still might not be fooled, though, as there wouldn't be footage of the person speaking those words. Well … University of Washington researchers have now created a system that converts audio clips into lip-synced videos of the speaker.

For the system to work, it first needs to analyze approximately 14 hours of existing footage of the person speaking. The researchers hope to reduce that figure significantly, perhaps to as little as one hour. Using a neural network, the system learns which of the speaker's mouth shapes accompany which speech sounds.
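As a rough illustration of what "learning which mouth shapes accompany which speech sounds" means, the toy sketch below is not the researchers' method. It swaps their neural network for a much simpler nearest-neighbour lookup. It assumes each video frame has already been reduced to a small audio feature vector and a mouth-shape vector; the class name `NearestNeighbourLipModel` and all the numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-frame data: an audio feature vector (e.g. a few spectral
# coefficients) paired with a mouth-shape vector (e.g. lip opening and width).
AudioFeatures = Tuple[float, ...]
MouthShape = Tuple[float, ...]


class NearestNeighbourLipModel:
    """Toy stand-in for the learned audio-to-mouth mapping (not a neural network).

    It memorises (audio, mouth) pairs from training footage and, for new audio,
    returns the mouth shape whose training audio is closest in Euclidean distance.
    """

    def __init__(self) -> None:
        self._pairs: List[Tuple[AudioFeatures, MouthShape]] = []

    def fit(self, audio: Sequence[AudioFeatures], mouths: Sequence[MouthShape]) -> None:
        if len(audio) != len(mouths):
            raise ValueError("need exactly one mouth shape per audio frame")
        self._pairs = list(zip(audio, mouths))

    def predict(self, frame: AudioFeatures) -> MouthShape:
        if not self._pairs:
            raise RuntimeError("model has not been fitted")
        # Pick the training pair whose audio features are nearest to this frame.
        _, best_mouth = min(self._pairs, key=lambda pair: math.dist(pair[0], frame))
        return best_mouth


# Usage with made-up training data standing in for hours of footage.
model = NearestNeighbourLipModel()
model.fit(
    audio=[(0.1, 0.0), (0.9, 0.2), (0.5, 0.8)],   # three distinct sounds
    mouths=[(0.0, 0.0), (1.0, 0.3), (0.4, 0.9)],  # closed, wide open, rounded
)
print(model.predict((0.85, 0.25)))  # (1.0, 0.3)
```

A lookup like this can only replay mouth shapes it has already seen, which hints at why the real system needs so much footage of the speaker and uses a learned model instead.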

When the system is subsequently provided with a "target video" of the person (in which they could be talking about anything), along with an audio file of them speaking the desired words, it pairs the two together. It does so by dropping the video's original audio, replacing it with the desired audio, and mapping a computer-animated version of the speaker's mouth in place of their mouth in the video.
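The three steps just described (drop the original audio, substitute the new audio, and overlay a mouth that matches it) can be sketched as a simple per-frame loop. This is a heavily simplified illustration under stated assumptions, not the researchers' code: a frame is reduced to a placeholder "background" plus a mouth-shape vector, there is assumed to be exactly one audio feature vector per video frame, and the names `Frame`, `Video` and `dub_video` are hypothetical. The real system synthesizes a mouth texture and blends it into each frame rather than swapping a field.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

MouthShape = Tuple[float, ...]
AudioFeatures = Tuple[float, ...]


@dataclass(frozen=True)
class Frame:
    """Hypothetical video frame: everything except the mouth is kept as-is."""
    background: str    # placeholder for the unchanged pixels (head, body, scene)
    mouth: MouthShape  # placeholder for the mouth region's appearance


@dataclass(frozen=True)
class Video:
    frames: List[Frame]
    audio: List[AudioFeatures]  # assumption: one audio feature vector per frame


def dub_video(
    target: Video,
    new_audio: Sequence[AudioFeatures],
    predict_mouth: Callable[[AudioFeatures], MouthShape],
) -> Video:
    """Replace the target's soundtrack and re-render its mouth to match."""
    if len(new_audio) != len(target.frames):
        raise ValueError("this sketch assumes one audio frame per video frame")
    # Predict a mouth for each new audio frame and composite it over the
    # original frame, leaving the rest of the frame untouched.
    new_frames = [
        replace(frame, mouth=predict_mouth(audio_frame))
        for frame, audio_frame in zip(target.frames, new_audio)
    ]
    # The original audio is discarded; the new audio becomes the soundtrack.
    return Video(frames=new_frames, audio=list(new_audio))


original = Video(
    frames=[Frame("speaker-at-podium", (0.0, 0.0)), Frame("speaker-at-podium", (0.2, 0.1))],
    audio=[(0.3, 0.3), (0.4, 0.4)],
)
# Hypothetical predictor: the mouth shape simply copies the audio features.
dubbed = dub_video(original, [(0.9, 0.1), (0.1, 0.0)], lambda a: (a[0], a[1]))
print([f.mouth for f in dubbed.frames])  # [(0.9, 0.1), (0.1, 0.0)]
```

In practice `predict_mouth` would be the learned audio-to-mouth model. Keeping it as a parameter shows that the target video can be "about anything": only its mouth region and soundtrack change.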

The end result is that viewers hear the person speaking the desired words and appear to see their mouth forming them as well. Although there's certainly potential for deception, the researchers developed the technology with other uses in mind.

"Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio," says assistant professor Ira Kemelmacher-Shlizerman. "This is the kind of breakthrough that will help enable those next steps."

You can see and hear the system in use in the following video.

Source: University of Washington

Teaser -- Synthesizing Obama: Learning Lip Sync from Audio

Who thinks this is a good idea?
Fairly Reasoner
Well, that's just great.
Well, it's the end of credibility - bad enough that Op-Ed pages are ghostwritten by people with agendas, now they will be able to make false videos (for any reason or side) that will muddy reality further.
On the upside, it could cause the Hollywood machine to break down if anyone with a good story can create realistic-looking faces (no uncanny valley here!) speaking their lines without having to pay high-priced actors for the service.