
Samsung AI brings the Mona Lisa (or any other picture) to life

Mona Lisa comes to life thanks to Samsung's generative adversarial networks research
A greater number of training frames gives better results

Using the latest trend in artificial intelligence – adversarial learning – Samsung's AI Center in Moscow has demonstrated that it can take a single image of a person and turn it into a talking head. And if watching the Mona Lisa come to life doesn't send chills down your spine, you need to check your pulse.

The system takes a number of images of a person – just one will do, though more produce better results – and runs them through an off-the-shelf "face landmark tracker" to work out where the eyes, eyebrows, nose, lips and jawline are. It does the same for a separate "driving" video, going frame by frame to track the motion of these face landmarks.
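
To get a feel for what that landmark-tracking step involves, here's a minimal sketch using the freely available dlib library and its standard 68-point landmark model – an illustrative assumption, since the team is free to use whatever off-the-shelf tracker it likes:

```python
import dlib
import cv2

# Illustrative only: dlib's detector plus its downloadable 68-point
# landmark model stand in for whatever tracker the researchers used.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_for_frame(bgr_frame):
    """Return 68 (x, y) landmark points for the first face in a frame, or None."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # Points cover the jawline, eyebrows, nose, eyes and lips
    return [(p.x, p.y) for p in shape.parts()]
```

Run over every frame of the driving video, this produces the stream of eye, nose, lip and jawline coordinates the rest of the system works from.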

There's a separate meta-learning stage, in which different AI networks are trained to do different jobs using an enormous video dataset of talking heads. An Embedder network takes source frames and their landmark tracking data and distils them into embedding vectors, while a Generator network learns to take those vectors, plus the landmarks from a driving video, and generate frames in which the still face is animated to follow the landmark motion.
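
In code terms, the two roles might look something like this rough PyTorch sketch – the layer sizes, the 64 x 64 frames and the way the embedding is fed into the Generator are illustrative assumptions, not the much larger networks described in the paper:

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Source frame + rasterized landmarks -> one identity embedding vector."""
    def __init__(self, dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, dim)

    def forward(self, frame, landmarks):           # both (B, 3, 64, 64)
        x = torch.cat([frame, landmarks], dim=1)   # (B, 6, 64, 64)
        return self.fc(self.conv(x).flatten(1))    # (B, dim)

class Generator(nn.Module):
    """Driving landmarks + identity embedding -> synthesized frame."""
    def __init__(self, dim=512):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.embed_to_map = nn.Linear(dim, 128 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),    # 32 -> 64
        )

    def forward(self, drive_landmarks, embedding):
        feats = self.enc(drive_landmarks)                          # (B, 128, 16, 16)
        cond = self.embed_to_map(embedding).view(-1, 128, 16, 16)  # (B, 128, 16, 16)
        return self.dec(torch.cat([feats, cond], dim=1))           # (B, 3, 64, 64)
```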

The technology can take a video of one person talking, and map it onto another person's head

The third "Discriminator" network sets up the adversarial relationship – it learns to look at videos of moving faces, and tell which ones are real videos, and which ones have been faked up by the Generator network. So you've got two networks working against each other – one trying to fool the other, the other trying to spot fakes.
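
A matching Discriminator, under the same illustrative assumptions as the sketch above, just has to turn a frame and its landmarks into a single realism score:

```python
import torch
import torch.nn as nn

# Illustrative Discriminator: scores a frame paired with its landmarks
# as real or generated (layer sizes are assumptions, not the paper's).
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),    # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 32 -> 16
            nn.Conv2d(128, 1, 4, stride=2, padding=1),                      # 16 -> 8
        )

    def forward(self, frame, landmarks):
        x = torch.cat([frame, landmarks], dim=1)   # (B, 6, 64, 64)
        return self.net(x).mean(dim=(1, 2, 3))     # one realism score per frame
```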

These networks start out really bad at their jobs, but as they repeat those jobs millions of times they begin to improve, and the competition between them is what drives both to keep getting better. The Discriminator network isn't looking for the same things a human fake-spotter would, but it doesn't matter – whatever it's looking for, it keeps getting better at discriminating, so the Generator network has to keep improving to keep fooling it.
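
A toy version of that tug-of-war, continuing the assumptions from the sketches above, looks like this: one optimizer step teaching the Discriminator to separate real frames from generated ones, then one step teaching the Embedder and Generator to fool it (the losses here are simplifications, not the paper's actual objectives):

```python
import torch
import torch.nn.functional as F

def train_step(embedder, generator, discriminator, opt_g, opt_d,
               src_frame, src_lms, drive_frame, drive_lms):
    # opt_g is assumed to cover both Embedder and Generator parameters.
    identity = embedder(src_frame, src_lms)
    fake = generator(drive_lms, identity)

    # 1) Discriminator update: push real frames toward 1, generated frames toward 0
    d_real = discriminator(drive_frame, drive_lms)
    d_fake = discriminator(fake.detach(), drive_lms)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator/Embedder update: make the Discriminator call the fake real,
    #    plus a simple reconstruction term pulling the output toward the target frame
    d_fake = discriminator(fake, drive_lms)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake)) +
              F.l1_loss(fake, drive_frame))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```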

This is another glimpse at the very exciting potential of Generative Adversarial Networks, which are popping up all over the AI world. But to truly appreciate it, you need to watch the video below. Skip to 4:16 if you want to get straight to seeing how the model performs with single-shot stills of Marilyn Monroe, Salvador Dali, Rasputin and Einstein, and then on to paintings.

The technology can operate using a single source photo, since it's got millions of other faces already stored in its training algorithms

Driven by three different driver videos, each face displays three very different personalities – a nod to just how much an actor can change their perceived personality by learning to use their face and body muscles in different ways. And seeing the Mona Lisa come to life might bring a smile to your lips – until you consider what developments like this mean for ever more realistic, easier-to-produce deepfakes.

The Samsung AI team's research paper, "Few-Shot Adversarial Learning of Realistic Neural Talking Head Models," is available on arXiv.

Source: Samsung via Pocket-lint


4 comments
RXStephen
Finally. Max Headroom!
Juanjo
Welcome, tools for making deepfake news, images and videos. Another weapon for the cyber wars. It is sad news.
AI is a very powerful tool, but it is approximately 100 times more powerful at doing bad than good – about the same ratio we humans tend to manage.
This is going to be worse than a hydrogen bomb 20 years from now. I wish to be wrong, but...
Buzzclick
It's not perfect (yet), but it is remarkable. As you say, Loz, it's most striking when a subject from a painting – one we've come to know all these years as a still image – comes to life. It's freaky.
No doubt this tech will get further refined and used for nefarious means as well as for its entertainment value. We've come to expect it in our ever-present hyper-media fake-news reality of today. Is it real or fake? Not just the story line, but the actual talking head as well. We must question everything:
"Mr. Vladimir Putin, please step right up and say something controversial for our cameras in response to what Xi Jinping said yesterday."
IvanWashington
jeez almighty... angels and ministers of grace, defend us!