Google DeepMind's Genie turns images into playable video games in one step – but it's just the latest in a rapidly converging list of technologies that point to a bizarre sci-fi future of interactive entertainment, designed and run by real-time AIs.
DeepMind's Genie AI is a relatively small 11 billion-parameter model, trained on more than 200,000 hours of video of people playing 2D platformer-style games, without human supervision. Such games are fairly formulaic, so perhaps it's no surprise that Genie has figured out the mechanics and action physics involved – even though the video streams contained no information about when a button or control was pressed.
As a result, this model accepts a single image – be it a photo, sketch, or AI-generated picture – and turns it into a playable game, responsive to user controls. Image to rudimentary interactive environment in a single step.
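To make the idea more concrete, here's a purely illustrative sketch of the kind of loop an image-to-game model like this implies: encode a single prompt image into a world state, then, on every user input, have a learned dynamics model predict the next frame. Every name, shape and function below is a hypothetical stand-in – this is not DeepMind's code or API, just the general shape of the idea.

```python
# Illustrative sketch only: a toy "latent-action world model" loop in the spirit
# of what's described above. All names, shapes and functions are hypothetical
# stand-ins, not DeepMind's actual Genie API.
import numpy as np

FRAME_SHAPE = (90, 160, 3)   # the low-res frames mentioned in the article
NUM_LATENT_ACTIONS = 8       # hypothetical size of the learned discrete action space

def encode_prompt_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for the encoder that turns a single image into a latent world state."""
    return image.astype(np.float32) / 255.0

def predict_next_frame(state: np.ndarray, action: int) -> np.ndarray:
    """Stand-in for the dynamics model: given the current state and a discrete
    latent action, produce the next frame. Here we just nudge pixel values so
    the loop runs end to end."""
    return np.clip(state + (action - NUM_LATENT_ACTIONS / 2) * 0.01, 0.0, 1.0)

def play(prompt_image: np.ndarray, user_actions: list[int]) -> list[np.ndarray]:
    """One image in, a controllable sequence of frames out."""
    state = encode_prompt_image(prompt_image)
    frames = [state]
    for action in user_actions:
        state = predict_next_frame(state, action)
        frames.append(state)
    return frames

if __name__ == "__main__":
    sketch = np.random.randint(0, 256, FRAME_SHAPE, dtype=np.uint8)  # any image will do
    frames = play(sketch, user_actions=[1, 1, 3, 0, 2])
    print(f"Generated {len(frames)} frames from a single prompt image.")
```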
Don't get too hung up on the quality of the 'games' you're seeing; Genie is a research project, not a final product. It was trained on videos at a minuscule 160 x 90-pixel resolution and just 10 frames per second, and it generates 'games' at a similarly low resolution that run for just 16 seconds at a miserly one frame per second.
But with the basic idea now proven, every indication is that Genie will improve significantly with scale; throw in longer, higher-resolution video clips, sic a ton of compute on this system, and the results should leap in quality the way we're seeing in just about every nook and cranny of the AI space.
So in a sense, Genie is not the real story here. The story is much broader, and it can be summed up like this: Everything you're seeing from advanced text-to-video AIs like OpenAI's jaw-dropping Sora demo from last week is starting to converge with 3D interactive worlds, AI-generated characters and GPT-style natural language models, with VR hardware advancing at pace as well.
The repercussions will be absolutely colossal, a fundamental shift in not just gaming, but entertainment overall. Let me throw some building block videos into the pot here that point to where things are heading.
Take a look at this video from 2021. It shows an AI that, two-and-a-half years ago, had watched enough Grand Theft Auto V to be able to recreate a blurry, stripped-down facsimile of the game, complete with a drivable car, in real time.
Again, that was a couple of years ago, and we've all seen the berserk pace of progress here. The takeaway from this video is: AI game generation will certainly not stop at Genie's 2D platformers. AI has long been able to do this kind of thing in 3D; it's essentially just a matter of where the focus is pointed at a given time. Gaming is heading toward a place where everything you see, hear and do will be generated by an AI in real time.
Secondly – and this is perhaps old news, but it's another important building block. We've written before about AI-generated video game NPCs, whose looks, personalities, goals and knowledge you can tweak using natural language, and with whom players can converse either verbally or through text, with no limits on conversation topics.
If you haven't seen this stuff in action, it's getting faster, more responsive and better all the time. Check out what Alystria AI has done using Cyberpunk 2077, Ghost of Tsushima, Red Dead Redemption 2 and other open-world titles as a baseline, making some of the world's most iconic characters fully AI-interactive within the context of the game.
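To give a flavor of how that natural-language configuration works, here's a minimal, hypothetical sketch: an NPC's persona is written as plain text, wrapped into a system prompt, and handed to whatever chat model the game is wired to. The `generate_reply` function is a dummy stand-in, and none of this reflects Alystria AI's actual tooling – it's just the general pattern.

```python
# Illustrative only: configuring a conversational NPC with natural language.
# The persona text and generate_reply() below are hypothetical stand-ins,
# not any particular vendor's API.
from dataclasses import dataclass, field

@dataclass
class NPC:
    name: str
    persona: str                          # plain-language looks, goals, knowledge
    history: list = field(default_factory=list)

    def system_prompt(self) -> str:
        return (f"You are {self.name}, a character in an open-world game. "
                f"Stay in character at all times. Persona: {self.persona}")

def generate_reply(system_prompt: str, history: list, player_line: str) -> str:
    """Stand-in for a call to whatever chat model the game uses. A real
    implementation would send the prompt and history to an LLM endpoint."""
    return f"(in character) ...responding to: {player_line}"

def talk(npc: NPC, player_line: str) -> str:
    reply = generate_reply(npc.system_prompt(), npc.history, player_line)
    npc.history.append((player_line, reply))
    return reply

blacksmith = NPC(
    name="Yuna",
    persona="A wry blacksmith who knows local rumours but hides her past as a smuggler.",
)
print(talk(blacksmith, "Heard anything strange on the north road?"))
```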
In the above examples, the original character actors' voices have not been preserved, but that's frankly trivial now from a tech standpoint if contract arrangements allow it. There are apps you can download right now to clone your own voice, or anyone else's – it's a good time to start setting up code words with your older relatives, because bad-faith actors need very little of your voice to start cloning it and ringing them asking for money.
Given the hundreds of hours of high-def voice recordings that go into video game production, there are massive opportunities for game studios to train voice models. We wouldn't be surprised to see a flood of AI-enhanced re-releases of older games in which players can hold boundless natural conversations with iconic NPCs as they play.
Now let's take a quick refresher on OpenAI's Sora, which as of this minute strikes us as the world's most advanced text-to-video generator – although by the time we hit publish, it may well have been eclipsed. Here's one of many more recent videos released since Sora's debut last week.
Sora isn't just generating the most staggeringly photorealistic videos we've ever seen coming out of an AI; it's also capable of creating persistent characters, styles and environments. That is, scenes in which the camera might look around, then look back, and the objects are still there. Characters that stay consistent between different scenes. That sort of thing.
And it's also developing, simply by ingesting so much video from the world around it, a staggering understanding of how physics works in the real world, and how objects, surfaces and substances relate to and interact with one another. Here's Sora's attempt at creating a helmet-cam view of a Formula 1 race set in San Francisco.
Look closely and it's janky as hell, with silly mistakes everywhere. But we're not talking about what's here now; we're looking at the near-future point toward which all of this stuff is converging. Sora shows the shocking level of quality at which you can generate video given enough training data and compute, and videos like the above are simply what it's capable of in 2024.
Next we can quickly pull in audio and sound effects, which we saw last week, again in a relatively early and janky form, from ElevenLabs.
So basically, whatever you're generating visually, another AI can take and put an audio track onto. Easy.
And of course, if you want a soundtrack, AI music generation is also moving at a shocking pace. Here's a random example I found – it's pop music, not a soundtrack, but it shows how easy it is now to throw some lyrics into a pot and generate an entire song, complete with vocals.
In the broader interactive entertainment scenario we're building, you can take it as read: soundtracks can absolutely be generated in near-real time, in a way that's responsive to the action. And there's no reason why NPCs won't soon be composing songs about what you've been up to in the game and singing them to you, again in a way that's totally interactive.
So let's look at the building blocks we've got here:
- AI-generated playable games with responsive controls
- Real-time neural generation of interactive game worlds
- Language-based generation and tuning of fully-interactive NPC characters
- Text-to-video generation of super high quality visuals, in just about any style, with persistent styles, characters and environments
- Video-and-text-to-audio foley and sound effects generation
- AI soundtrack generation
Throw those together with rapidly improving language models like GPT, with their ability to create and respond to narratives while also driving a range of other AI technologies, and you get a very different picture of what video-game design will be in the not-too-distant future.
You'll be able to start with nothing, or with a sketch or two, and have AI generate an interactive world, which even to begin with will probably be extraordinarily beautiful.
Then, like a digital God, you'll be able to say, "Let there be tree," and there will be tree, and if it is not good, you'll be able to request a different tree. You'll be able to create your characters just by painting a verbal picture: "I want a talking donkey with a Mexican accent and a chip on his shoulder. No, more sassy. Let's give him an air of danger and a penchant for epic storytelling about his shady past as a merchant sailor. Lose the sombrero, let's go with a cowboy-style handkerchief. His hidden motive in this story is that he's looking for his sister, who he believes may be held by ninjas in the castle on top of that hill."
The term 'gaming' hardly covers what we're talking about here; you'll be able to verbally design an experience, then play through and interact with it, adjusting things like a director instead of like a programmer. Given enough computing resources, you'll be able to generate entire games this way; shareable single- or multi-player expressions of your own individual imagination that others can enjoy and potentially iterate on with their own touches.
Place this in a VR context with real-time neural generation capabilities, and a GPT-X level ability to manage the overall experience and generate narratives, and ... well, you've got the Holodeck from Star Trek, or for that matter, "the simulation." Entire interactive worlds, populated with interactive characters of your choosing, where anything you desire can happen in response to real-time requests. Who's turning on Netflix or the PS7 when an interactive version of whatever you can think of is available?
One shudders to think what happens when this stuff is controlled by corporations or advertisers, who will have an unprecedented ability to steer your experiences in ways that benefit them.
This won't all happen overnight. Hardware is probably the main limiting factor at this point; there are only so many GPUs in the world to train and run this stuff on, although new chips are being invented and put into production specifically to drive the AI industry's push toward artificial general intelligence and beyond.
So that need is being addressed as fast as human commerce is capable of doing it – but we're probably not within 12 months of seeing Sora-quality video creation in real time, so there's a little room to breathe there.
Major leaps in hardware, connectivity and energy storage would be needed to run this stuff through a compact VR headset, as well as further work around haptic feedback mechanisms that'll embody players even more within these experiences.
Taking this to the limits of where we can see it going, maybe the best way to get these extraordinary visions, sounds and sensations into our brains is directly through wires, skipping our fallible sense organs altogether.
Brain-computer interface technology is already further advanced than many people realize, and while most of it is currently targeted at medical use, Elon Musk has been clear from the beginning that the eventual point of Neuralink is to create a connection between humans and AIs – one that, the thinking goes, would let us move far more information back and forth than we can through the low-bandwidth bottlenecks of keyboards, voice recognition and even language itself. The goal? Brain-to-AI communication both ways, at the speed of thought.
And we're seeing other tech coming through that's focused on monitoring and responding to humans at an even deeper level than thought: emotionally responsive technology that makes your real-time feelings another input a system can react to – pushing excitement to its peak, playing your heartstrings to perfection, and knowing exactly when a moment is dying so the pace of an experience can be perfectly tuned to the user.
As for the AIs themselves ... No matter how much we at New Atlas do our best to keep up with what's happening in this space, I don't think we, or the vast majority of people, have any idea how quickly these things are really advancing. Sora is a good example; we get the impression OpenAI had that one in the bag for several months before it decided to make an announcement, and it chose to drop it just to stomp all over Google's Gemini 1.5 release.
It worked; with our limited human resources and our policy of using only human writers, we had to choose which to cover, and Gemini didn't get a guernsey.
Gemini 1.5 is its own game-changer, and we couldn't even get to it. The rate of world-changing progress in AI is absolutely unprecedented, not just in our lifetime but probably in the history of humanity.
So when we see Google's Genie, embryonic and low-res as it is today, it's all part of a giant, building tsunami of disruption and convergence that's bringing science fiction into fact at a dizzying rate. I keep saying it: Buckle up, folks – these head-spinning concepts will keep coming at an accelerating rate.
This is not just DeepMind and OpenAI; it's an entire nascent industry, with massive investment pouring into it, that hasn't yet begun to hit its stride. Different sides of AI are crashing together more and more often, and starting to converge with a range of other technologies that are themselves advancing, even if at a slower rate.
Every little piece of the world that these things learn to understand and replicate for our amusement is a step towards embodied intelligences in humanoids as well as other types of robots. Each is also a step toward artificial general intelligence – and very soon thereafter, artificial super-intelligence. These two concepts seemed ludicrously far off in the future just a year or two ago, but I wouldn't bet against either being announced in the next 12 months.
The world of 2030, just six years away, is becoming a complete mystery to me. I have no idea what skills I should be teaching my five- and 10-year-old kids to prepare them. Do you? Honest question; I'll be checking the comments section!
Source: DeepMind and many others.