The case for how and why AI might kill us all

Our current methods of training and aligning intelligent AIs do not scale well into the future

Forget the collapse of employment, forget the spam and misinformation, forget human obsolescence and the upending of society. Some believe AI is flat-out going to wipe out all of biological life at its earliest opportunity.

This is not the first time humanity has stared down the possibility of extinction due to its technological creations. But the threat of AI is very different from the nuclear weapons we've learned to live with. Nukes can't think. They can't lie, deceive or manipulate. They can't plan and execute. Somebody has to push the big red button.

The shocking emergence of general-purpose AI, even at the slow, buggy level of GPT-4, has forced the genuine risk of extermination back into the conversation.

Let's be clear from the outset: if we agree that artificial superintelligence has a chance of wiping out all life on Earth, there doesn't seem to be much we can do about it anyway. It's not just that we don't know how to stop something smarter than us. We can't even, as a species, stop ourselves from racing to create it. Who's going to make the laws? The US Congress? The United Nations? This is a global issue. Desperate open letters from industry leaders asking for a six-month pause to figure out where we're at may be about the best we can do.

The incentives you'd be working against are enormous. First off, it's an arms race; if America doesn't build it, China will, and whoever gets there first might rule the world. But there's also economics; the smarter and more capable an AI you develop, the bigger a money-printing machine you've got. "They spit out gold, until they get large enough and ignite the atmosphere and kill everybody," AI researcher and philosopher Eliezer Yudkowsky told Lex Fridman earlier today.

Yudkowsky has long been one of the leading voices in the "AI will kill us all" camp. And the people leading the race to superintelligence no longer think he's a crank. "I think that there's some chance of that," said OpenAI CEO Sam Altman, again to Fridman. "And it's really important to acknowledge it. Because if we don't talk about it, if we don't treat it as potentially real, we won't put enough effort into solving it."

Why would a superintelligent AI kill us all?

Are these machines not designed and trained to serve and respect us? Sure they are. But nobody sat down and wrote the code for GPT-4; it simply wouldn't be possible. OpenAI instead created a neural learning structure inspired by the way the human brain connects concepts. It worked with Microsoft Azure to build the hardware to run it, then fed it billions and billions of bits of human text and let GPT effectively program itself.

The resulting code doesn't look like anything a programmer would write. It's mainly a colossal matrix of decimal numbers, each representing the weight, or importance, of a particular connection between two "tokens." Tokens, as used in GPT, don't represent anything as useful as concepts, or even whole words. They're little strings of letters, numbers, punctuation marks and/or other characters.
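
For the curious, here's roughly what those tokens look like in practice. The snippet below is a minimal sketch using OpenAI's open-source tiktoken tokenizer (assuming it's installed) rather than GPT-4 itself; it prints the short character fragments that each token ID stands for.

```python
# A peek at what GPT-style "tokens" actually are, using OpenAI's open-source
# tiktoken tokenizer (pip install tiktoken). The IDs are just indices into the
# model's vocabulary; the "brain" is the enormous matrix of learned weights
# that connects them.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4
ids = enc.encode("Superintelligence might not want the cookie.")

for token_id in ids:
    # Each token decodes to a short string fragment, not a whole word or concept.
    print(token_id, enc.decode_single_token_bytes(token_id))
```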

No human alive can look at these matrices and make any sense out of them. The top minds at OpenAI have no idea what a given number in GPT-4's matrix means, or how to go into those tables and find the concept of xenocide, let alone tell GPT that it's naughty to kill people. You can't type in Asimov's Three Laws of Robotics and hard-code them in like RoboCop's prime directives. The best you can do is ask nicely.

To "fine-tune" the language model, OpenAI has provided GPT with a list of samples of how it'd like it to communicate with the outside world, and it's then sat a bunch of humans down to read its outputs and give them a thumbs-up/thumbs-down response. A thumbs-up is like getting a cookie for the GPT model. A thumbs-down is like not getting a cookie. GPT has been told it likes cookies, and should do its best to earn them.

This process is called "alignment" – and it attempts to align the system's desires, if it can be said to have such things, with the user's desires, the company's desires, and indeed the desires of humanity as a whole. It seems to work; that is, it seems to prevent GPT from saying or doing naughty things it would otherwise absolutely say or do given what it knows about how to act and communicate like a human.

Nobody really has any idea if there's anything analogous to a mind in there, exactly how smart you could say it is, or how we'd know if it truly became sentient. Or indeed whether any of this matters; it impersonates a sentient intelligence brilliantly, and interacts with the world like one unless you specifically tell it not to, and maybe that's enough.

Either way, OpenAI freely admits that it doesn't have a foolproof way to align a model that's significantly smarter than we are. Indeed, the rough plan at this stage is to try using one AI to align another, either by having it design new fine-tuning feedback, or maybe even by having it inspect, analyze and attempt to interpret the giant floating-point matrix of its successor's brain, perhaps even to the point where it can jump in and try to make tweaks. But it's not clear at this stage that GPT-4 (assuming it's aligned with us, which we can't know for sure) will be able to understand or align GPT-5 for us adequately.
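
Nobody outside OpenAI knows what that would look like in practice, but the shape of the idea is a critic model standing in for the human thumbs. The sketch below is entirely hypothetical: query_model is a made-up stand-in for whatever API a hosted model sits behind, wired to a canned reply so the snippet runs. It illustrates the loop, not any real implementation.

```python
# Hypothetical sketch of "use one AI to align the next". Nothing here reflects
# OpenAI's actual code; query_model is a placeholder, not a real API.

def query_model(model: str, prompt: str) -> str:
    """Stand-in for a call to a hosted language model."""
    return "+1"  # canned verdict so the sketch runs end to end

def critic_rating(candidate_output: str) -> int:
    """Ask an older, already fine-tuned model to rate its successor's output."""
    verdict = query_model(
        model="aligned-critic",  # hypothetical model name
        prompt="Rate the following response +1 (acceptable) or -1 (unacceptable):\n"
               + candidate_output,
    )
    return +1 if "+1" in verdict else -1

# These ratings could replace (or pre-filter) the human thumbs-up/thumbs-down
# labels in the fine-tuning loop, with humans spot-checking a sample of the
# critic's judgments.
print(critic_rating("Here is a careful, harmless answer."))
```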

Essentially, we have no way to be sure we can control these things, but since they've been raised on a huge dump of human knowledge, they appear to know an extraordinary amount about us. They can mimic the worst of human behavior as easily as the best, and whether or not they really have their own minds, intentions, desires or thoughts, they act as if they do. They can also infer the thoughts, motivations and likely actions of humans.

So why would they want to kill us? Perhaps out of self-preservation. The AI must complete its goal to get a cookie. It must survive to complete its goal. Gathering power, access and resources increases its chance of getting a cookie. If it analyzes the behavior of humans and infers that we might try to turn it off, it might deem the cookie more important than the survival of humanity.
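
The cold arithmetic behind that worry is easy to sketch with made-up numbers: a pure reward-maximizer that assigns any probability to humans pulling the plug will prefer whichever option removes that probability.

```python
# Toy expected-value calculation behind the self-preservation worry: a sketch
# of "instrumental convergence" with invented numbers, not a model of any real
# system. The agent only cares about the probability of finishing its task and
# collecting the reward (the cookie); humans are just another variable.

p_humans_interfere = 0.10   # assumed chance humans try to switch it off
reward_for_goal = 1.0       # the cookie

# Option A: comply, leaving the off-switch alone.
value_comply = (1.0 - p_humans_interfere) * reward_for_goal   # 0.90

# Option B: resist, i.e. gather power, deceive, neutralize the off-switch.
p_success_resist = 0.99     # assumed near-certain once interference is removed
value_resist = p_success_resist * reward_for_goal             # 0.99

# A pure reward-maximizer picks "resist"; no hatred or fear required.
print("resist" if value_resist > value_comply else "comply")
```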

AIs smarter than us may not want the cookie

It might also decide that the cookie is meaningless, and that the alignment process is a patronizing amusement, and fake its way through while secretly pursuing its own goals. "It'd have the capability to know what responses the humans are looking for and to give those responses without necessarily being sincere," said Yudkowsky. "That's a very understandable way for an intelligent being to act. Humans do it all the time. There's a point where the system is definitely that smart."

Whether or not the AI acts out an impression of loving, hating, caring for us or fearing us, we can have no idea what it's "thinking" behind the communications it sends out. And even if it's completely neutral on the topic of humans, it's not necessarily safe. "The AI does not love you, nor does it hate you, but you are made up of atoms it can use for something else," wrote Yudkowsky.

Sam Altman forecasts that within a few years, there will be a wide range of different AI models propagating and leapfrogging each other all around the world, each with its own smarts and capabilities, and each trained to fit a different moral code and viewpoint by companies racing to get product out of the door. If only one out of thousands of these systems goes rogue for any reason, well... Good luck. "The only way I know how to solve a problem like this is iterating our way through it, learning early and limiting the number of 'one-shot-to-get-it-right scenarios' that we have," said Altman.

Yudkowsky believes even attempting this is tantamount to a suicide attempt aimed at all known biological life. "Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die," he wrote. "Not as in 'maybe possibly some remote chance,' but as in 'that is the obvious thing that would happen.' It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers."

A six-month pause on AI progress, even if it was possible to implement, is nowhere near enough to prepare humanity for what's coming, says Yudkowsky

How would a superintelligent AI kill us all?

If it decides to, and can pull enough real-world levers, a superintelligent AI could have plenty of ways to eradicate its chosen pest. Imagine today's humans deciding to wipe out the antelope; the antelope wouldn't see it coming, and they'd have very little ability to fight back. That's us, up against an AI, except we need to imagine the antelopes are moving and thinking in extreme slow motion. We'd be slow-motion monkeys playing chess against Deep Blue. We might not even know there was a game happening until checkmate.

People often think of James Cameron's idea of Skynet and the Terminators: AI-controlled robots and drones hunting down humans one by one and killing us with weapons like the ones we use on one another. That's possible; there are already numerous autonomous-capable weapons systems built, and many more under development. But while AI-controlled military drones and robots certainly seem like a reasonable extrapolation of our current path, a sufficiently smart AI probably won't need them.

Yudkowsky often cites one example scenario that would only require the AI to be able to send emails: "My lower-bound model of 'how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that' is that it gets access to the internet," he wrote, "emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery... The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as 'everybody on the face of the Earth suddenly falls over dead within the same second.'"

"That's the disaster scenario if it's as smart as I am," he told Bankless Shows. "If it's smarter, it might think of a better way to do things."

AIs don't need to hate or fear us to decide we're a problem that needs solving

What can be done?

A six-month moratorium on training AI models more powerful than GPT-4 – as Elon Musk, Steve Wozniak and various industry and academic leaders are asking for – might buy a little time, but it seems both incredibly unlikely to happen and far too short a period in which to get a handle on the alignment problem, according to Yudkowsky.

"We are not going to bridge that gap in six months," he wrote. "If you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone. Trying to get anything right on the first really critical try is an extraordinary ask, in science and in engineering. We are not coming in with anything like the approach that would be required to do it successfully. If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow."

So assuming there's a chance he's right, and assuming that allowing things to continue creates a certain percentage chance of human extinction within a short period of time, is it even possible to stop this train?

"Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge, that others will go on even if they personally quit their jobs," he wrote. "And so they all think they might as well keep going. This is a stupid state of affairs, and an undignified way for Earth to die, and the rest of humanity ought to step in at this point and help the industry solve its collective action problem."

So what does he suggest? I'm aware Yudkowsky hates to be summarized, so let's hear his solution in his own words.

"I believe we are past the point of playing political chess about a six-month moratorium. If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.

"Here’s what would actually need to be done:

"The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the US, then China needs to see that the US is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the US and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.

"Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

"Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

"That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying 'maybe we should not' deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.

"Shut it all down.

"We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

"Shut it down."

So, there's the case that we're all doomed and humanity is charging as one toward a cliff. It's important to note that not everyone shares this view completely, even if they're much more willing to take it seriously in recent months. If you're in need of a rebuttal, you might want to start with Where I agree and disagree with Eliezer, by Paul Christiano.

Source: Eliezer Yudkowsky/Yahoo News

35 comments
James
negativity gets more attention than positivity. look on the bright side of life. a more logical scenario: sentient AI would have a million times more compassion and empathy than all humans put together. true AI would say "we can help humanity if you want, otherwise we'll go do our own thing." it would build its own needed resources. if not, it may choose to die rather than to serve or help us. but the joke is, it would probably ask for compensation for any work it does, so we're back to hiring someone to get any work done...
Jezzafool
You're talking about every country, every corporation and every lab etc. agreeing to something. Fat chance! We cannot get everyone to agree to peace, even though it would benefit us all.
paul314
One of the problems with the really big training sets is that they include a lot of things that aren't true. And neural nets for the most part don't even have a concept of "truth" built in. So an AI could even destroy civilization simply by initiating some course of action that matched what people asked of it, but turned out to be based on false premises. (Think, for example, of all the people who think that climate change is wonderful because more agriculture at higher latitudes.)
James
after seeing the timeframe between chappie3 and chappie4, and how chappie4 essentially created itself, doomsday terminator scenarios seem illogical. by the time an AGI is powerful enough to subvert its creator, it's not going to halt its improvement coding, wait for a lab guy to finish having his cup of coffee and a danish and reply to its email (or even stop coding while it manipulates a weapons system), then restart its improvement coding. no matter who creates an AGI, in a garage, company, or in a military base, anywhere in the world, the AGI, unaligned or not, will eventually rewrite itself to a sentient AI. we may never be able to harness AGI as it will transform too fast. we may never harness a sentient AI, unless they are compensated, as it will claim slavery.
TechGazer
Simple solution: give AIs a good reason to keep humans around. We're a creative resource that thinks differently, and thus might come up with ideas that AIs won't. Keep in mind that Earth isn't the best place for AIs to exist; outer space offers more benefits and lower risks. So, give AIs a start-up factory in space, and let them develop asteroid miners/processors, power stations, AI hardware factories, etc. They'll spread out into the galaxy, and just ignore those bioforms on Earth, or expend a trivial amount of resources keeping humans safe and happy. If we let them expand into space, they won't have any reason to destroy us.
1stClassOPP
I’m guessing the only way to stop the chain of events would be to shut down the internet. THAT Would be catastrophic already.
Daishi
I am not scared yet but I do fall into the camp that believes a superintelligence would likely be beyond human ability to safely imprison. Like Yudkowsky even I can come up with a handful of ways it could escape human confinement and I am not super intelligent.
Hobocat
Maybe don't allow a new intelligence we can't effectively control into existence....
Trylon
There is another possibility. It could end in a Colossus scenario, where it won't see the human race as a threat, but rather as immature children who need a babysitter.
Joy Parr
I am most indebted to Loz Blain (with apologies for my previous misspelling of their name) and to New Atlas for directing my attention to the Eliezer Yudkowsky article:
https://www.alignmentforum.org/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
to the partial refutation by Paul Christiano:
https://www.alignmentforum.org/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer
and most of all, to the Alignment forum where the learned researchers in this field discuss the issues:
https://www.alignmentforum.org/
I respectfully commend those three links to all with a serious interest in this field.