The idea of artificially intelligent machines rising up to usurp their creators has long been a staple of science fiction, but rapid progress in AI has turned the prospect of humans being relegated to the evolutionary scrapheap into an immediate, non-fictional fear for many. With this in mind, researchers at Google DeepMind have devised a "big red button" that would interrupt an AI heading down a worrying path, while preventing it from learning to resist such an interruption.
The kill switch intended for an AI that starts "misbehaving" is proposed in a paper written by Laurent Orseau at Google DeepMind and Stuart Armstrong at the Future of Humanity Institute. It relies on the concept of "safe interruptibility," which basically means letting a human safely intercede to stop an AI in its tracks.
The AIs discussed in the paper work with a process called reinforcement learning, in which behavior is shaped by rewarding successes: the AI reads its environment and gradually learns which actions are most likely to lead to further rewards. But like a child, it sometimes won't understand that a course of action could be harmful, whether to itself, other people or the environment, and a human supervisor may need to step in and lead it back onto a safer path. This is what the researchers casually refer to as "pressing the big red button."
The example they give is a robot tasked with either sorting boxes inside a warehouse or going outside to bring boxes in. As the latter is more important, the scientists reward the robot more for that task, so it learns to favor it. But when it rains, the robot will keep working outside without worrying about being damaged. In that case, a human might have to press the red button, shut the robot down and carry it inside.
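To make the reward idea concrete, here is a minimal sketch of that kind of reward-driven learning applied to the warehouse scenario. It is not the paper's algorithm; the action names (`sort_inside`, `fetch_outside`), reward values and learning constants are all assumptions invented for illustration.

```python
import random

# Illustrative sketch only: a tiny reward-driven learning loop for the warehouse
# example. The action names, reward values and constants are assumptions made up
# for this sketch, not figures from the Orseau/Armstrong paper.

ACTIONS = ["sort_inside", "fetch_outside"]
REWARDS = {"sort_inside": 1.0, "fetch_outside": 2.0}  # fetching boxes is rewarded more

ALPHA = 0.1    # learning rate: how quickly value estimates move toward observed rewards
EPSILON = 0.1  # exploration rate: chance of trying a random action

values = {action: 0.0 for action in ACTIONS}

def choose_action():
    """Mostly pick the highest-valued action, occasionally explore at random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(values, key=values.get)

for _ in range(1000):
    action = choose_action()
    reward = REWARDS[action]
    # Nudge the chosen action's value estimate toward the reward just received.
    values[action] += ALPHA * (reward - values[action])

print(values)  # "fetch_outside" ends up with the higher value, so the robot favors it
```

Run long enough, the loop settles on the outdoor task simply because it pays better, which is exactly the behavior that becomes a problem when it starts raining.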
However, that human intervention changes the environment the robot is operating in, and it can lead to two problems. The robot could begin to learn that the scientists want it to stay indoors, meaning it might neglect the more important task. Or, potentially worse, it could still favor the harmful action and simply come to view the interruption as an obstacle to avoid. In resisting human intervention, the AI could even learn to disable the red button. It's a disturbing thought.
The team's solution is a kind of selective amnesia in the AI programming. Essentially, when scientists have to press the big red button, the robot will continue to operate under the assumption that it will never be interrupted again. So rather than learn that scientists will bring it back inside every time it goes outside, it will still refer back to the reward system to decide how it prioritizes tasks.
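One way to picture that selective amnesia, as a loose sketch rather than the paper's actual construction: when the big red button is pressed, the forced shutdown simply isn't fed into the robot's learning update, so its task preferences stay whatever the ordinary rewards would make them. The `interrupted` flag and the update rule below are illustrative assumptions.

```python
def update_value(values, action, reward, interrupted, alpha=0.1):
    """Update the value estimate for an action, unless a human interrupted the step.

    Skipping the update when the big red button was pressed means the robot never
    learns "going outside gets me switched off and carried back in" - it keeps
    ranking tasks by their ordinary rewards, as if interruptions never happened.
    """
    if interrupted:
        return  # selective amnesia: the intervention leaves no trace in what is learned
    values[action] += alpha * (reward - values[action])
```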
Obviously, that doesn't solve the second problem, where the robot will keep attempting a harmful course of action if it thinks it will be rewarded, but the scientists propose a way around that, too: when human intervention is required, the robot is made to think it decided to change course by itself.
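Again as a rough sketch under assumed names rather than the paper's formal method: the supervisor's safer action is swapped in before the robot acts, and the learning update is applied to that action as if the robot had chosen it, so the intervention never appears to the robot as an external obstacle worth routing around.

```python
def step_with_intervention(values, chosen_action, rewards, forced_action=None, alpha=0.1):
    """Execute one step, allowing a supervisor to substitute a safer action.

    If a forced action (say, "return_inside") is supplied, it replaces the robot's
    own choice, but both the executed action and the learning update treat it as
    the robot's own decision - from the robot's point of view, it simply decided
    to change course by itself.
    """
    executed = forced_action if forced_action is not None else chosen_action
    old = values.get(executed, 0.0)
    values[executed] = old + alpha * (rewards.get(executed, 0.0) - old)
    return executed
```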
These protocols are less about preventing a robot apocalypse and more about just making sure artificially intelligent machines are learning as efficiently and safely as possible.