AI tools are now creating video, and matching sound effects

Gen-1 allows you to instantly create your own AI-powered transformation filters for video

"No Lights, no cameras, all action." You knew it was coming. One of the key companies behind the Stable Diffusion image generator has launched a mind-blowing AI video creation and editing tool that operates something like DALL-E for moving pictures.

Runway AI is working on a number of extraordinary next-gen creative AI projects, but its freshly-released Gen-1 video tool is a truly confronting snapshot of where this stuff is at, and how quickly it's advancing. Take a quick look at our wrap-up on the state of creative AI back in 2015 for some context.

And then take a look at what Gen-1 can do. It's not an outright text-to-video generator; you can't just ask it to go away and make a dog food commercial in the style of Hitchcock. Well, not yet. Instead, it asks you for an input video, and then creates different versions of that input video in response to text, image or video prompts.

So if you go and film something extremely roughly – just to get the basic angles, actions and camera movements down – you can ask Gen-1 to take that footage and recreate it in a completely different style. You can flat-out tell it "make this a film noir scene," or "make this an underwater scene set in Atlantis," or "put these characters on a moving bus in London." It's like you can now instantly design your own Snapchat filters.

Or you can find an image or video example that fits the style you're going for, and just upload it – Gen-1 will analyze it, work out what it is, and then do its best to recreate the key elements of your video in a similar context. Or you can get it to isolate and track a subject, and change it in some way. Or you can use a broader set of training data to improve the fidelity of your results. Check it out:

Gen-1: The Next Step Forward for Generative AI

Yes, like Snapchat filters, it's a bit crude, flickery and fidgety right now – but even in its current form, it's already absolutely relevant to music videos, commercials and a broad range of other artsy video projects.

And it doesn't even matter if it's Gen-1 or something else; it should be clear enough where this will go. The pace of progress in creative AI is going gangbusters. Blink, and algorithms like this will be making whole movies in 4K 3D. Upload Pulp Fiction and see it performed entirely by dogs. Take a cartoon and generate a different live-action version of it for every region you're showing it in, changing the race of the cast, the setting, the backgrounds and the landmarks to let everyone feel at home. Give everyone in the movie a handlebar moustache. Auto-replace your product placements. Take Winnie the Pooh off the kid's toy shelf for the Chinese release. Put the buttholes back on the cats.

This will grow to become a super-fast, super-cheap visual effects studio in a box. And in case the sound effects guys are feeling smug, Runway has audio jobs in its sights as well.

The company still appears to be at the research stage with another system, called Soundify. It accepts a video input, analyzes it to work out what's in it and what's likely happening, and then generates audio to match.

So let's say you upload a scene where somebody gets in a car parked in the countryside and drives away. It tries to match a background sound to the environment, then tries to identify subjects, and what they're doing, and the exact moments when their activity should cause sounds, and where in the stereo space those sounds should come from. Then it generates that sound, matched up to the video. There should be footsteps, door closing noises, engine noise, tire noise, whatever the scene demands. Here are some examples:

Soundify Sample Result A

Again, like Gen-1, Soundify is an early iteration and it's not yet ready for prime time. But honestly, who's betting against AI tools at this point – particularly ones that'll let a director tweak their output with the same kinds of plain-language prompts they're currently giving to their sound effects team?

These tools are another bittersweet inflection point; they'll democratize moviemaking to an extent that would've been unimaginable a few years ago. They'll also vaporize entire careers – in this case, dream careers for creatives.

At some point soon, these tools will begin to converge. Text generators descended from godlike entities like ChatGPT will begin coming up with entire screenplays, from the concept to the art style and the script, based upon their encyclopedic knowledge of the entire history of the art form, combined with an unprecedented ability to follow current human trends, issues, concerns, language use and fashion.

They'll interface with a DALL-E style image generator to create a coherent visual style, drawing upon every significant piece of human art since cave paintings. And they'll interface with moviemaking tools like Gen-1 and Soundify, again trained on every significant piece of cinema humans have ever created, to pump out entire movies, ads, TikToks, custom Christmas greeting videos, propaganda ... You get the drift. Any style, any face, any voice, any tweaks, nothing will bother it.

Soundtracks? Have you checked out Google's MusicLM tool? Again in its infancy, it creates entire recordings, fully orchestrated and mixed, in nearly any style you can name, in response to text prompts. The music will rise and fall perfectly in response to the script and the action; it'll be trivial for tools like this to pinpoint the emotional climax of a scene and amplify or subvert it with perfectly timed music. And the entire system will respond to change requests effortlessly, as clients already seem to expect of today's video professionals.

Movie trailers, posters, merchandise ... it's hard to see which parts of the entire movie industry can't eventually be turned into lightning-fast algorithms. And looking at where this tech is at right now, we might legitimately be talking about a system that's feasible within 10 years.

On a smaller scale, how about making your own custom Snapchat filter for live video, just using image or text prompts? Three years, tops. Heck, it could drop next week and I don't think I'd bat an eyelid at this point.

Buckle up folks, this could be a bumpy ride.

Source: Runway

I always thought the creative jobs would be the last to go. As a professional creative, I was literally banking on it. Now what? In five years you'll be able to just type a prompt and 30 seconds later your browser will spit out something Oscar-worthy.

I'm reminded of that fleeting moment between 1945 and 1949, when the US was the only country with the A-Bomb. There was a serious debate as to whether we should just destroy our research and vow never to go down that path again. Of course, the argument always remained: What if the Soviets go ahead and develop one anyway after we've torched ours? And so we learned to live with a push-button Armageddon hanging over our heads.

How is AI any different? It's not. Almost more certainly than the A-Bomb, AI has the potential to wipe us out. Yet I think that's a less likely scenario than it simply rendering us obsolete.

Do we humans matter? If we do, why are we rushing headlong into oblivion? Why doesn't someone pull the emergency brake?
I suspect this will not necessarily sideline creatives as much as it will democratize and micronichify entertainment. Yes, everyone will be famous for fifteen minutes amongst 15 percent of the population. Big studios will produce far less content at far higher quality, and there will be exponentially more mid-level material and niche content. And of course AI indie hits…
They've already destroyed the minds of our lawmakers, and they're finishing off the kids. We'd better have the basic skills of growing our own food, purifying water, and creating shelter - AI is developing quickly - I don't think it will stop at entertainment. Kind of scary. Thanks for posting this story.
Joy Parr
Another excellent piece, thank you.
The rate of development of AI now seems to be (almost?) exponential.
A potential cause appears here:
I no longer think it will be five years before we arrive at artificial general intelligence. At this exponential rate, it looks like 2023.
Change is the unknown. People usually shy from it, because most are a fearful bunch. But fear is just a lack of imagination. If we can turn on our sense of adventure and look far into the future, what do we see? I see technology that is sophisticated enough to allow us to do just about anything with the push of a button, a sound, or a thought. Right now, work is important to us, it feeds, clothes us, and gives us a bit of freedom. A.I. and the wise imaginations of the world will help us move into a new era, where work is not needed, only passion. Right now, it takes a hundred plus steps for a person to run even a small business, even with all the tech around us. It is not smart enough to make it a push button reality. Why do we have to work so hard for so little? Finally, that is about to change. We can each create, maintain, and project our growing creative universes, our gifts, our passions easily and on command. Each of us is about to have the opportunity to shine our light around the globe and into space....just a little while longer...and we will be, free.
@Gordien AI is going to be MUCH better at growing crops and mixing cement, and at doing everything that needs to be learned - with so much more data, it'll be more accurate and better than we could ever be.
Humans have never been ethical, and we certainly will not make a quality effort at promoting ethics in AI.
We are about to find out what that means.