OpenAI Jukebox begins creating new Elvis and Sinatra songs, with lyrics
"It's Christmas time, and you know what that means," sings Sinatra, "Ohh, it's hot tub time!" A neural net has begun composing entire songs modeled on the prior works of artists in a wide range of genres, including lyrics and vocal performances.
"Some people like to go skiing in the snow,
But this is much better than that,
So grab your bathrobe and meet me by the door,
Ohh, it's hot tub time!"
That vocal tone is unmistakable, as is the trademark lazy phrasing of the performance. If you weren't listening closely, you could easily be fooled into thinking it was a genuine Frank Sinatra song, until it begins to fall apart. Of course, it's not real. It's the complete creation of a neural net that has digested and analyzed the works of Sinatra and a number of other artists, and is now trying to write new songs that mimic their musical styles, instrumentation, lyrics and vocal performances.
Jukebox is an artificial intelligence project by OpenAI, a research lab in San Francisco backed by Microsoft and co-founded by Elon Musk, dedicated to the mission of ensuring that AI "benefits all of humanity." Prior OpenAI projects have focused on video game esports, robotics and the GPT-2 text generation model, which is so good at producing believable writing that 72 percent of people in one study thought they were reading genuine New York Times articles. For reference, when presented with real NYT articles, the same group thought they were the real deal only 83 percent of the time.
Jukebox aspires one day to fool people with music, as well. Having listened to around 1.2 million songs, categorized by artist, album genre, year, moods and keywords, it has begun composing. Using lyrics co-written by a language model and members of the OpenAI team, the team began requesting songs in the style of certain performers, genres and time periods.
The Jukebox algorithm works in an interesting way, compressing songs at three different levels as it "listens" to them. The top level is heavily compressed and loses most of the detail, retaining only information on pitch, melody and volume; at this level the algorithm is taking in the overall structure of the song. The lower, less compressed levels add detail and timbre to the sounds, improving quality.
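To get a feel for what compressing at three levels means, here is a minimal Python sketch. It is a toy under stated assumptions, not OpenAI's code: plain averaging stands in for Jukebox's learned compression, and the sample rate, the test signal and the names `compress`, `FACTORS` and `levels` are all made up for illustration.

```python
import math

SAMPLE_RATE = 8000  # hypothetical samples per second, chosen to keep the toy small

def compress(samples, factor):
    """Replace each run of `factor` samples with its average (toy stand-in for a learned encoder)."""
    chunks = (samples[i:i + factor] for i in range(0, len(samples), factor))
    return [sum(chunk) / len(chunk) for chunk in chunks]

# One second of "audio": a slow 2 Hz swell (structure) plus a fast 440 Hz tone (detail).
audio = [
    math.sin(2 * math.pi * 2 * t / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]

# Illustrative compression factors; the article does not give the real ones.
FACTORS = {"bottom": 8, "middle": 32, "top": 128}
levels = {name: compress(audio, factor) for name, factor in FACTORS.items()}

for name, sequence in levels.items():
    print(f"{name}: {len(sequence)} values per second")
```

In this toy, the top level keeps little more than the slow swell, while the bottom level still carries much of the fast tone, which loosely mirrors the split between overall structure and fine detail described above.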
Then, when it comes time to take its lyric sheet and write a song, Jukebox does things the same way, generating a low-resolution backbone for the song, and then filling in detail, color and timbre on top as it goes.
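The coarse-to-fine idea can be sketched in a few lines of Python. Again, this is only an illustration under loud assumptions: random numbers stand in for Jukebox's trained generators, no lyrics are involved, and the function names, sequence lengths and stretch factors are hypothetical.

```python
import random

def generate_backbone(length, rng):
    """Random walk standing in for the low-resolution, top-level generator."""
    value, backbone = 0.0, []
    for _ in range(length):
        value += rng.uniform(-1.0, 1.0)
        backbone.append(value)
    return backbone

def add_detail(coarse, factor, detail_scale, rng):
    """Stretch a coarse sequence by `factor` and layer small variations on top."""
    fine = []
    for value in coarse:
        for _ in range(factor):
            fine.append(value + rng.uniform(-detail_scale, detail_scale))
    return fine

rng = random.Random(0)                      # fixed seed so runs are repeatable
top = generate_backbone(16, rng)            # coarse outline of the "song"
middle = add_detail(top, 4, 0.2, rng)       # 4x more values, moderate detail
bottom = add_detail(middle, 4, 0.05, rng)   # 4x more again, finest detail

print(len(top), len(middle), len(bottom))
```

Each finer level stays close to the level above it, so the broad shape is fixed first and later passes only fill in texture.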
Jukebox won't be fooling people just yet. It comes out with things that sound a fair bit like the original artists in terms of style and vocal tone, but the way it renders words is very hit and miss; you can hear it blending recordings from different microphones and swimming in and out of moments of clarity. Every now and then, though, you get a spooky moment where it sounds like the original artist emerging from the mess and it delivers a phrase straight from the heart. And sometimes, the music does something really interesting that sounds like the kind of idea you could use.
The system is still in an embryonic stage, but it's fair to say it's already making a significant contribution to the annals of AI-produced music, an area in which many projects are tackling computer composition from different angles. This is the first project we've heard of that learns and mimics the performance style of famous artists, synthesizing their vocal tones, tendencies and phrasing around newly generated lyrics.
Check out a group of curated (and much less successful uncurated) samples at the Jukebox website.