OpenAI Sora: Wake up humans, the world has changed again

AI-generated video was a complete joke less than a year ago; look what it can do now. OpenAI has announced its new Sora model, which turns descriptive text into video, and calling it a frighteningly massive leap forward feels like an understatement.

Sora creates "realistic and imaginative scenes" from text prompts, meaning you can type in a scene with as much detail as you care to give it, and it'll go away and generate high-resolution video to match. In this way, it's similar to a lot of previous video generators we've seen in the last year or so.

But to give you a sense of the progress in this field, take a look at where the state of the game was in March 2023, then check out how far it had come by April 2023, then take a quick refresher on Google's Lumiere system from last month.

Now, take a look at what OpenAI is doing halfway through February 2024 with its new Sora system, and take a moment to appreciate the breathtaking pace of advancement. Here's a bunch of examples, with the prompts that led to them.

Prompt: A Samoyed and a Golden Retriever dog are playfully romping through a futuristic neon city at night. The neon lights emitted from the nearby buildings glistens off of their fur.
Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Good grief. For the most part, the physics of these scenes works uncannily well. The details and motion are realistic enough that you'd easily mistake many for real footage if you weren't hunting for mistakes – or noticing that it's realistic footage of something that doesn't actually exist.

Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.

It can also, according to OpenAI, save characters, locations and styles so they can be used across multiple scenes, pointing toward where this is eventually going: the ability to generate entire stories, shows or movies.

Prompt: The story of a robot’s life in a cyberpunk setting.

On the other hand, there's still plenty of room for improvement, and as with all creative AI systems, the results can be hilariously weird – especially when you ask for something particularly absurd.

Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.

And sometimes, it can come up with an unexpectedly artistic surprise or two.

Prompt: A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.

OpenAI says it's "red teaming" Sora now – that is, throwing naughty prompts at it, trying to get it to do things it's not allowed to, so that all known ways of making that happen can be blocked. This will undoubtedly hobble it and make it a worse product, but it'll make Sora more copyright-compliant, less likely to generate "dangerous" content and more family-friendly.

But good lord, in a matter of months we've gone from this:

To this:

Prompt: Aerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.

These AI systems, friends, are toddlers. Unbelievable as they already are, they're honing their talents at a rate unlike anything we've ever seen in any other area of technology. It's all happening so fast that I genuinely have no idea if we're close to the top of a curve here, with a long, slow slog ahead to clean up all the little weird edge bits, or if we're just at the start of the AI acceleration curve.

Maybe, as some people think, we're close to the point where it's as good as it'll ever get, and as more and more of the internet becomes AI-generated, it'll eat so much of its own faeces that the quality will start to degrade and plummet.

All I know is this: we'd all better learn to love change, because humanity has never seen a transformation like the AI revolution before. The wheel, the light bulb, the combustion engine, the aeroplane, the computer, the internet ... None of them ever accelerated or proliferated like this, and none of them threatened our position at the top of the food chain. We're in uncharted territory.

Source: OpenAI

"AI-generated video was a complete joke less than a year ago; look what it can do now."
I'd be inclined (if I didn't already realise it) to think that the same is probably true of progress with autonomous cars despite what the click-bait-obsessed/hysterical mass-media is saying.
Captain Danger
I like the cat picture
Looks like the Holodeck from Star Trek will soon be doable.
White Rabbit
As with any AI system, its products reflect the "values" of the sources of its learning. Note that there is nothing in the prompts that suggests that the Tokyo scene be at night, nor Lagos in the evening. Perhaps 'cyberpunk' conjures dark themes, but the "Blue Hour" looks more like midnight!
These are not insurmountable obstacles at the technical level, but the "Why?" question ought to receive more scrutiny. An AI's standards for what is a good/appropriate/suitable/... representation of Tokyo, or anything else, are informed by its sources - even if aspects of those standards have been explicit. There's no reason to believe that the Sora model was intentionally given a predilection for dark images, but the frequency with which they appear provides an insight into a frequently ignored "elephant in the room".
White Rabbit
Oops! Omitted a word.
Intended to say-
even if aspects of those standards have NOT been explicit.
Brian M
Feel sorry for cats, they are going to lose a lot of YouTube time to AI generated videos!

On the plus side, it could end up creating some incredible worlds/environments for games at a cost that would have been too expensive previously, even for the big game companies, let alone minor/niche game producers.

It might even make 3d worlds as envisioned by Meta more attractive.

Hopefully John and Jane Public will realize not everything they see or hear is real or true. Not sure if it ever has been!
Kudos to Loz Blain for including the prompts used to generate these images. None of the other articles I've seen on OpenAI Sora have done so.
I'm led to believe that cooling AI video processors consumes an enormous amount of fresh water. This implies a temporary bottleneck to endless hi-rez AI video. I expect Nvidia or someone will eventually solve the cooling problem, but it may take a few years. Frankly, the task of driving a car seems far simpler than composing frame after frame of photorealistic video, especially if you want it in real time.
AI photos took about a year to go from producing silly results to being almost photorealistic. The next year AI video made that transition. Initially Sora will be kind of expensive to run and over the next year it will get cheaper as costs come down and other big AI companies achieve this level of detail. The pace of progress in this space has been fascinating to watch.
The end of our reality (or at least our true memories) is approaching. In a few years the internet will be full of fake videos and photos fabricated by AI, and year by year we will have less chance of recognizing whether they are real or fake. After some decades, children will not be able to recover any credible information about their ancestors' lives, because all they will see is an incredible amount of AI-generated media that actually has no connection to real life.