AI-generated video was a complete joke less than a year ago; look what it can do now. OpenAI has announced its new Sora model, which turns descriptive text into video, and calling it a frighteningly massive leap forward feels like an understatement.
Sora creates "realistic and imaginative scenes" from text prompts, meaning you can type in a scene with as much detail as you care to give it, and it'll go away and generate high-resolution video to match. In this way, it's similar to a lot of previous video generators we've seen in the last year or so.
But to give you a sense of the progress in this field, take a look at where the state of the game was in March 2023, then check out how far it had come by April 2023, then take a quick refresher on Google's Lumiere system from last month.
Now, take a look at what OpenAI is doing halfway through February 2024 with its new Sora system, and take a moment to appreciate the breathtaking pace of advancement. Here's a bunch of examples, with the prompts that led to them.
Good grief. For the most part, the physics of these scenes works uncannily well. The details and motion are realistic enough that you'd easily mistake many for real footage if you weren't hunting for mistakes – or noticing that it's realistic footage of something that doesn't actually exist.
It can also, according to OpenAI, keep characters, locations and styles consistent across multiple scenes, pointing toward where this is eventually going: the ability to generate entire stories, shows or movies.
On the other hand, there's still plenty of room for improvement, and as with all creative AI systems, the results can be hilariously weird – especially when you ask for something particularly absurd.
And sometimes, it can come up with an unexpectedly artistic surprise or two.
OpenAI says it's "red teaming" Sora now – that is, throwing naughty prompts at it, trying to get it to do things it's not allowed to, so that all known ways of making that happen can be blocked. This will undoubtedly hobble it and make it a worse product, but it'll make Sora more copyright-compliant, less likely to generate "dangerous" content and more family-friendly.
But good lord, in a matter of months we've gone from this:
AI-generated video of Will Smith attempting to eat spaghetti without making a mess astounds with comedic horror. Open source "text2video" ModelScope AI made the viral sensation possible, but it seems like poor Will Smith couldn't catch a break - or a noodle. 😂🍝 pic.twitter.com/fDbUS6FlQx
— neonpulse (@neonpulsedaily) April 1, 2023
To this:
These AI systems, friends, are toddlers. Unbelievable as they already are, they're honing their talents at a rate unlike anything we've ever seen in any other area of technology. It's all happening so fast that I genuinely have no idea if we're close to the top of a curve here, with a long, slow slog ahead to clean up all the weird little edge bits, or if we're just at the start of the AI acceleration curve.
Maybe, as some people think, we're close to the point where it's as good as it'll ever get, and as more and more of the internet becomes AI-generated, these models will eat so much of their own faeces that quality will plummet.
All I know is this: we'd all better learn to love change, because humanity has never seen a transformation like the AI revolution before. The wheel, the light bulb, the combustion engine, the aeroplane, the computer, the internet ... None of them ever accelerated or proliferated like this, and none of them threatened our position at the top of the food chain. We're in uncharted territory.
Source: OpenAI