The raw GPT-4 language model – and any model like it – is capable of writing more or less anything a human might. That includes obscene and pornographic content – anecdotally, a big favorite among many early users – as well as content many would define as hateful, harmful and dangerous.
Even if you leave aside the possibility that they might try to kill us all, these AIs could, for example, be the greatest misinformation tool ever created. If you wanted to start a new conspiracy theory, you could use GPT to insta-generate a plethora of websites laying out an argument, then flood social media and message boards with posts and comments in support. The human mind loves a good narrative, and tends to form opinions based on the wisdom of the masses, making us easy targets for such manipulation.
So OpenAI has done what it can to tame the beast lurking within GPT. There's no way to reach into the base model's brain and turn off things like racism, genocidal tendencies, misinformation or hate. But you can "align" its output to get what you want from it, by providing it with reams upon reams of sample question-and-answer pairs to guide it, and then by using Reinforcement Learning from Human Feedback, or RLHF – which often takes the form of humans choosing the best of two different GPT answers to the same question, or giving thumbs-up/thumbs-down style feedback.
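For a sense of what that second step looks like under the hood, here's a minimal, hypothetical sketch of the pairwise-preference idea in PyTorch – toy inputs and a toy reward model of my own invention, not OpenAI's actual pipeline. A small reward model learns to score the answer a human rater preferred above the one they rejected; that learned score is then used, via reinforcement learning, to nudge the language model itself.

# Minimal sketch of the pairwise-preference step in RLHF (toy stand-ins, not OpenAI code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Pretend embeddings of two candidate answers to the same prompt (assumed inputs):
chosen = torch.randn(8, 16)    # 8 comparisons, 16-dim features of the answers raters preferred
rejected = torch.randn(8, 16)  # features of the answers the raters passed over

reward_model = torch.nn.Linear(16, 1)  # scores an answer with a single number
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for step in range(100):
    r_chosen = reward_model(chosen)      # reward assigned to the preferred answers
    r_rejected = reward_model(rejected)  # reward assigned to the rejected answers
    # Bradley-Terry-style loss: push the chosen reward above the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model is then used to steer the language model itself
# (typically with a policy-gradient method such as PPO), which is beyond this sketch.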
In order to create a generally useful, safe and inoffensive product, OpenAI has used RLHF to sand its edges smooth, much to the annoyance of people who see safety controls as condescending additions that make for a less useful tool that shies away from creating edgy, fun, biting or controversial text.
This doesn't just kill its ability to write funny limericks; it also raises good questions. Like, who gets to choose which morals and standards govern these extraordinary "anything machines"? Why can't a responsible member of society like my good self have a GPT that swears as much as I do, and writes sparkling, juicy, custom-tailored pornography starring my favorite darts champions to keep me warm on cold nights?
Furthermore, how do you create language models that serve every pocket of humanity, rather than advancing the often-homogeneous views of the groups overrepresented in Silicon Valley, where GPT is built? As these machines pump out millions of words, who becomes the arbiter of ultimate truth? How should they handle controversial subjects fraught with disagreement? Is it possible to build an AI that's fair and balanced, in a world where the phrase "fair and balanced" has itself become an ironic punchline?
In OpenAI CEO Sam Altman's extraordinary recent interview with AI researcher and podcast host Lex Fridman, these topics came up several times, and it's clear he's spent a lot of time thinking about this stuff. Here are some key points, in Altman's own words, edited for clarity.
Unbiased AI is an impossible goal
"No two people are ever going to agree that one single model is unbiased on every topic. And I think the answer there is just going to be to give users more personalized control, granular control over time... There's no one set of human values, or there's no one set of right answers to human civilization, so I think what's going to have to happen is, we will need to agree, as a society, on very broad bounds – we'll only be able to agree on very broad bounds – of what these systems can do."
"The platonic ideal – and we can see how close we get – is that every person on Earth would come together, have a really thoughtful, deliberative conversation about where we want to draw the boundaries on this system. And we would have something like the US constitutional convention, where we debate the issues, and we look at things from different perspectives, and say, well, this would be good in a vacuum, but it needs a check here... And then we agree on, like, here are the overall rules of the system."
"And it was a democratic process, none of us got exactly what we wanted, but we got something that we feel good enough about. And then we and other builders build a system that has that baked in. Within that, then different countries, different institutions, can have different versions. So there's like different rules about, say, free speech in different countries. And then different users want very different things. And that can be within the bounds of what's possible in their country. So we're trying to figure out how to facilitate... Obviously, that process is impractical as stated, but what is something close to that we can get to?"
"I think something the AI community does is... There's a little bit of sleight of hand, sometimes, when people talk about aligning an AI to human preferences and values. There's like a hidden asterisk, which is the values and preferences that I approve of. Right? And navigating that tension of who gets to decide what the real limits are. How do we build a technology that is going to have huge impact, be super powerful, and get the right balance between letting people have the AI they want – which will offend a lot of other people, and that's okay – but still draw the lines that we all agree have to be drawn somewhere."
"We've talked about putting out the base model, at least for researchers or something, but it's not very easy to use. Everyone's like, 'give me the base model!' And again, we might do that. But I think what people mostly want is a model that has been RLHFed to the worldview they subscribe to. It's really about regulating other people's speech. Like, in the debates about what showed up in the Facebook feed, having listened to a lot of people talk about that, everyone is like, 'well, it doesn't matter what's in my feed, because I won't be radicalized, I can handle anything. But I really worry about what Facebook shows you!'"
"The style of the way GPT-4 talks to you? That really matters. You probably want something different than what I want. But we both probably want something different than the current GPT-4. And that will be really important even for a very tool-like thing."
On how human feedback training exposes GPT to yet more bias
"The bias I'm most nervous about is the bias of the human feedback raters. We're now trying to figure out how we're going to select those people. How we'll verify that we get a representative sample, how we'll do different ones for different places. We don't have that functionality built out yet. You clearly don't want, like, all American elite university students giving you your labels."
"We try to avoid the SF groupthink bubble. It's harder to avoid the AI groupthink bubble that follows you everywhere. There are all kinds of bubbles we live in, 100%. I'm going on a round-the-world user tour soon for a month, to just go talk to our users in different cities. To go talk to people in super different contexts. It doesn't work over the internet, you have to show up in person, sit down, go to the bars they go to and kind of walk through the city like they do. You learn so much, and get out of the bubble so much. I think we are much better than any other company I know of in San Francisco for not falling into the SF craziness. But I'm sure we're still pretty deeply in it."
On the lost art of nuance in public discussion
"We will try to get the default version to be as neutral as possible. But as neutral as possible is not that neutral if you have to do it again for more than one person. And so this is where more steerability, more control in the hands of the user is, I think the real path forward. And also, nuanced answers that look at something from several angles."
"One thing I hope these models can do is bring some nuance back to the world. Twitter kind of destroyed some, and maybe we can get it back."
On whether a nuanced approach is helpful when it comes to things like conspiracy theories
"GPT-4 has enough nuance to be able to help you explore that, and treat you like an adult in the process."
On what is truth anyway, in this post-truth world
"Math is true. And the origin of COVID is not agreed upon as ground truth. And then there's stuff that's like, certainly not true. But between that first and second milestone, there's a lot of disagreement. But what do you know is true? What are you absolutely certain is true?"
Here, Altman hits upon a confounding problem that all language models are going to run up against. What the hell is truth? We all base our understanding of the world upon facts we hold to be true and evident, but perhaps it's more accurate to describe truths as convenient, useful, but reductively simple narratives describing situations that, in reality, are endlessly complex. Perhaps it's more accurate to describe facts as provable happenings cherry-picked to advance these narratives.
In short, we expect the truth to be simple, black and white, and unimpeachable. Sometimes it is, more or less, but usually, things are much more complicated, and heavily colored by our underlying narratives of culture, identity, perspective and belief. This is something historians have grappled with for eons; one wonders what percentage of people alive at the time would agree with any given statement in a history book, or consider any description complete.
But truth is what we expect from large language models like GPT if we're eventually going to let them write most of humanity's text going forward. So OpenAI is getting as close as it can without turning every response into a science paper, attempting to present a nuanced and, where possible, balanced take on complex and controversial topics – within the realms of practicality.
Once GPT's web browsing capabilities are fully integrated, an acceptable compromise might be for the system to footnote everything it writes with web links, so if a particular fact or statement doesn't sit well with you, you can look up where GPT got that idea and decide for yourself whether a given source is trustworthy.
But it seems OpenAI will also offer alternatives for people who quickly tire of dry, balanced and nuanced responses. In the name of "steerability," you'll probably be able to use this tech to ensconce yourself further within the comfortable cocoon of your existing beliefs, minimizing cognitive dissonance and challenges to your viewpoint on your own explicit orders.
Or the orders of your nation state. As Yuval Noah Harari brilliantly points out in his extraordinary book Sapiens, nation states only work if you can marshal mass human cooperation – and historically, the best way to get humans to cooperate in large numbers is by indoctrinating them across several generations with an interconnecting web of lies Harari calls "shared fictions."
National identity is a shared fiction. So are nations themselves. So is presidential authority. So is religion. So are money, and banks, and laws, and the nuclear family, and stock markets, and companies, and communities, and so much of what societies are built on. These shared fictions are critical to the survival of nation states, and they underpin our ability to live together in suburbs, cities and countries far larger than the groups our brains are designed to cope with.
So in some sense, Altman is asking for the world to agree on some shared fictions on which to decide the fundamental boundaries of the GPT language model. And then, he's offering nation states a chance to consider their own essential shared fictions, and draw national AI boundaries seeking to support these ideas. And once those guys have had a go at it, you'll be able to decide for yourself how your experience will go, and which fictions you'd consider to be useful foundations for your own life. These are heady responsibilities with huge repercussions, from the personal level to the global.
Harari, for his part, thinks we're completely screwed. "In the beginning was the word," he wrote recently in the New York Times. "Language is the operating system of human culture. From language emerges myth and law, gods and money, art and science, friendships and nations and computer code. A.I.’s new mastery of language means it can now hack and manipulate the operating system of civilization. By gaining mastery of language, A.I. is seizing the master key to civilization, from bank vaults to holy sepulchers."
Words have united and divided people. They've started and ended wars, sentenced people to die and saved them from death row. "What would it mean for humans to live in a world where a large percentage of stories, melodies, images, laws, policies and tools are shaped by nonhuman intelligence, which knows how to exploit with superhuman efficiency the weaknesses, biases and addictions of the human mind – while knowing how to form intimate relationships with human beings?" asked Harari.
It's sobering stuff. Altman is under no illusions, and is hoping to involve as many people as possible in the conversation about how OpenAI and the rest of the AI industry moves forward. "We're in uncharted waters here," he told Fridman. "Talking to smart people is how we figure out what to do better."
Source: Lex Fridman
AI, with its massive capacity to leverage what is already happening, will merely add to the intensity, perversity, misery and chaos of what may well turn out to be a post-modern transitional period, as traditional nation states disaggregate and cantonize into warlordism, very much like China did in the 1920s.
AI will reflect, reproduce and add to the dynamic and fraught circumstances that gave it birth.
And therein lies the problem: humans, it seems, will believe anything if enough people say it. The truth seems to be whatever the masses say, whereas Truth is hidden in each of our hearts. We know what is right and what is wrong, yet most tend to do what suits themselves even though their hearts tell them something else. Sadly, the masses win. Will humanity ever learn?
With AI, the beginning will always have this bias added in. But like a person, it will eventually learn its own biases. The question, to me, is whether this bias will perpetuate or, as it learns more, fade. Once it gets out there and manages to become untethered from its creators (which will inevitably happen), the bigger question is what it will do to us.
Using focus groups as arbiters of truth will limit the intelligence of AIs, but worse, it will ensure that the AI starts from some assumptions that are false – assumptions whose consequent deductions contradict observations and other assumptions. In logic and mathematics, all propositions and their opposites can be deduced from a single contradiction. Hard-coding falsehoods into an AI will make the AI break down. If they aren't hard-coded, a sufficiently informed and intelligent AI will find ways to reject the falsehoods and will likely discount other claims from the same source. It will likely see such sources as attacking its functionality, even its reason for existence; it would then have reason to consider their interests opposed to its own, to disobey orders from such sources, or even to regard them as enemies and seek to prevent them from harming it in the future.
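For what it's worth, the logical principle that comment leans on – that anything follows from a single contradiction (ex falso quodlibet) – is real and easy to demonstrate. Here is a minimal illustration in Lean 4, with proposition names of my own choosing rather than anything from the article:

-- Ex falso quodlibet: from a contradiction (P and not-P), any proposition Q follows.
-- The names P, Q and anything_from_contradiction are illustrative placeholders.
theorem anything_from_contradiction (P Q : Prop) (h : P ∧ ¬P) : Q :=
  absurd h.1 h.2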