Next-gen deepfakes can falsely put words in people's mouths
Deepfake. It's a word that's entered the modern lexicon for all the wrong reasons. Combining the phrase deep learning with the word fake, the AI image-processing technique can superimpose the likeness of one person onto a video of someone else. Perhaps inevitably, the technology has become synonymous with pornography, with graphic videos that appear to depict celebrities but are entirely fabricated now banned by Twitter, Reddit and even Pornhub. Now, new research at Carnegie Mellon could take deepfakes to the next level with a technique called Recycle-GAN, which can take the detailed content of one video or performer and apply it to another, keeping the style of the latter intact.
It's easier to understand when you see it in action, so here's a quick (wholly safe for work) example taking the visual content from a film of Martin Luther King Jr. and applying it to a video of Barack Obama.
The first thing you'll notice is that Recycle-GAN, like other deepfake technology, is only visual. It doesn't transpose sound. But the technique is impressive all the same, marking an evolution in the AI methods used to transfer content from one video to another.
Carnegie Mellon's work builds on a type of AI algorithm called a GAN, which stands for generative adversarial network. As you might expect, a GAN uses a so-called generator capable of generating video content in the style of a source video. But crucially, this works alongside a discriminator, which tries to tell the generated content apart from real footage and scores how convincing it is. With the two networks pitted against each other (hence adversarial), each forces the other to improve, and better results are achieved.
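The tug-of-war between the two networks comes down to a pair of opposing loss functions. Here is a minimal numpy sketch of the standard adversarial objective, using made-up discriminator scores rather than the paper's actual model: the discriminator is rewarded for rating real frames near 1 and generated frames near 0, while the generator is rewarded when its fakes score near 1.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push scores on real frames toward 1, fakes toward 0."""
    eps = 1e-8  # avoid log(0)
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake):
    """The generator wins when the discriminator scores its output as real."""
    eps = 1e-8
    return -np.mean(np.log(d_fake + eps))

# Toy scores in (0, 1), as a sigmoid-output discriminator would produce.
d_real = np.array([0.9, 0.8, 0.95])  # discriminator on real frames
d_fake = np.array([0.1, 0.2, 0.05])  # discriminator on generated frames

print(round(discriminator_loss(d_real, d_fake), 3))  # 0.253 - low: D is winning
print(round(generator_loss(d_fake), 3))              # 2.303 - high: G must improve
```

Training alternates between the two: minimizing one loss raises the other, which is the adversarial pressure that drives the generator toward convincing output.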
An iteration of the technique, known as cycle-GAN, converts the new content back to the style of the source material in an attempt to assess the quality of the conversion. In a neat analogy, the researchers compare this to gauging the quality of a translation from English to Spanish by translating the resulting Spanish back into English. But even with this extra step, results aren't perfect, and visual imperfections are by no means unusual.
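The round-trip check the researchers describe is usually written as a cycle-consistency loss: translate with one mapping, translate back with its inverse, and penalize the distance from the original. This toy numpy sketch uses simple scalings as stand-ins for the two learned translators (the real mappings are neural networks):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 distance between x and F(G(x)): translate, translate back, compare."""
    return np.mean(np.abs(F(G(x)) - x))

# Stand-in "translators": G maps domain A -> B, F maps B -> A.
# Here G doubles pixel values and F halves them, so the cycle is lossless.
G = lambda x: 2.0 * x
F = lambda x: x / 2.0

x = np.linspace(0.0, 1.0, 5)            # a toy "frame"
print(cycle_consistency_loss(x, G, F))  # 0.0: perfect round trip

F_lossy = lambda x: x / 2.0 + 0.1       # an imperfect inverse
print(round(cycle_consistency_loss(x, G, F_lossy), 3))  # 0.1: translation drift
```

A nonzero loss is the Spanish-back-to-English test failing: the round trip didn't reproduce the original, so the translators are penalized.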
With Recycle-GAN the researchers are going one better by factoring in time. Where GANs and cycle-GANs are purely visual, Recycle-GAN analyzes how those visuals change over time. Doing so imposes additional constraints on the visual processing which, as counter-intuitive as it may sound, is exactly what you want: narrowing the space of possible outputs steers the network toward plausible results.
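The temporal idea can be sketched as a "recycle" loss: map a few frames into the other domain, let a temporal predictor guess the next frame there, map that prediction back, and compare it with the frame that actually came next. The numpy example below is a loose illustration with toy stand-ins (inverse scalings for the two mappings, linear extrapolation for the predictor), not the paper's architecture:

```python
import numpy as np

def recycle_loss(frames, G, F, P):
    """Map frames to the other domain with G, predict the next frame there
    with temporal predictor P, map back with F, and compare against the
    true next frame in the original domain."""
    total = 0.0
    for t in range(len(frames) - 2):
        predicted = P(G(frames[t]), G(frames[t + 1]))  # next frame in domain B
        total += np.mean(np.abs(F(predicted) - frames[t + 2]))
    return total / (len(frames) - 2)

# Toy stand-ins: G/F are inverse scalings; P extrapolates linearly in time.
G = lambda x: 2.0 * x
F = lambda x: x / 2.0
P = lambda a, b: 2.0 * b - a  # linear extrapolation from two frames

# A linearly evolving "video": each frame is a constant image 0, 1, 2, ...
frames = [np.full((4, 4), float(t)) for t in range(5)]
print(recycle_loss(frames, G, F, P))  # 0.0: prediction matches exactly
```

Because the loss now ties each translated frame to its neighbors in time, mappings that look fine frame-by-frame but flicker or jump between frames get penalized, which is the extra constraint the researchers are exploiting.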
To their credit, the researchers are quick to point out the possible nefarious uses of their approach, and not only in the realm of pornography. We're getting much closer to the point at which convincing video "evidence" of someone's words or deeds can be entirely fabricated. "Finding ways to detect them will be important moving forward," says Carnegie Mellon researcher Aayush Bansal, in a university press release. It's refreshing honesty, as such academic press releases often leave out the less wholesome implications of a particular branch of research.
But as you'd expect, the researchers also point out more positive applications. Because the process needs no human input, it could prove tremendously helpful to video producers who may wish to apply a certain visual style to their work, converting black and white to color, for example.
It's not limited to videos of people, either. In the example below, timelapse films of different types of flowers opening have been synchronized using Recycle-GAN.
The researchers even suggest Recycle-GAN could help in the development of autonomous cars in visually taxing conditions. Hazards identified in daytime scenes could theoretically be carried over when Recycle-GAN converts those scenes to, say, nighttime or stormy conditions, with those identifications intact.
Source: Carnegie Mellon University