Voice-imitation advance means we can't trust what we see or hear anymore
Imagine a world where anyone could create a photo-realistic fake video in which whoever they choose can be made to say whatever they want. Add to that the ability to write a script and have a machine recite it back with the perfectly indistinguishable intonation of the person featured. We are officially moving closer to that world.
A Montreal-based AI startup has recently revealed a new voice imitation technology that could signal the end of trusting your ears, meaning pretty soon there could be a cloud of doubt over literally every "recording" you see and hear.
Developed by three PhD students at the University of Montreal, Lyrebird is a deep learning algorithm that reportedly needs only a 60-second sample of a person's voice to be able to generate a synthesized copy. While the company touts applications such as speech synthesis for people with disabilities, it's clear this technology is opening a Pandora's box of future complications.
Recognizing the controversial applications, Lyrebird has a dedicated "Ethics" page on its website, openly discussing the potentially dangerous consequences of the technology. The company intends to release the technology publicly and make it available to anyone, the idea being that by demonstrating so visibly how voices can be artificially faked, we will all learn to become skeptical of the audio recordings we hear in the future.
The Lyrebird technology recalls a similar announcement from Adobe in late 2016 revealing an experimental project called VoCo. Dubbed "Photoshopping Voiceovers," VoCo is a technology that can learn a person's speech patterns and replicate that person saying anything the user desires.
Adobe stressed in the presentation that a form of watermarking would be added to the voice data, allowing for altered passages to be clearly identifiable under forensic examination. But this seems like a tokenistic security addendum that is likely to be easily circumvented.
While these early demonstrations of both the Lyrebird and the VoCo systems sound slightly stilted and mechanical, they do show a proof-of-concept leading us towards a not-too-distant future where anyone's voice could be easily made to say just about anything. Fast forward this technology a couple of generations and it's not hard to imagine the results being indistinguishable from the person being imitated.
In addition to these voice imitation technologies, we've seen a variety of recent research looking to develop AI that can create photo-realistic pictures and video of anyone, living or dead. After Star Wars: Rogue One digitally reincarnated a long-dead actor, the question was asked: how much longer will we even need real actors to keep making movies?
In 2016, a team of researchers unveiled a system called Face2Face. The technology captures the movements and expressions of a source actor's face and maps them, in real time, onto a famous face on screen. The demonstration video that was released showed famous politicians from Vladimir Putin to Barack Obama being controlled like puppets.
Picture this technology being combined with the Lyrebird or VoCo systems, and we see a future where real-time audio-video transmissions could be created showing any person saying anything. This is the specter of fake news made frighteningly real. With the speed that content already flies around platforms like Facebook, we are quickly entering a brave new world where nothing can be trusted.
In a future where the words and images of people we trust can be easily and realistically faked, how are we to know what to believe?
Sure, right now all these technologies are reasonably primitive, and savvy viewers may be able to easily spot the AI constructions, but at the rate the algorithms are evolving it won't be long before the line between real and fake has been entirely demolished. Stay tuned for a future of mass skepticism and amusing viral videos featuring politicians saying the darnedest things.