
Google's deep Q-network proves a quick study in classic Atari 2600 games

Google's new machine learning algorithm was put to the test in 49 classic Atari 2600 games (Photo: Shutterstock)

In an old-school gaming party to end all parties, Google's new deep Q-network (DQN) algorithm is likely to mop the floor with you at Breakout or Space Invaders, but might take a licking at Centipede. Provided with only the same inputs as a human player and no prior real-world knowledge, DQN uses reinforcement learning to learn new games and, in some cases, develop new strategies. Its designers argue that this kind of general-purpose learning algorithm could cross over into making discoveries in other fields.
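
For readers curious what "reinforcement learning" means here, the core idea DQN builds on is Q-learning: estimating, for each screen state and joystick action, how much score will eventually follow. Below is a minimal sketch of the one-step target a deep Q-network is trained toward; the names (q_network, GAMMA) and the discount value are illustrative assumptions, not DeepMind's actual code.

GAMMA = 0.99  # discount factor: how much future score matters relative to immediate score

def td_target(reward, next_state, done, q_network):
    """One-step Q-learning target for the last (state, action) taken.

    reward:     the change in game score produced by that action
    next_state: the screen the agent sees next
    done:       True if the game ended
    q_network:  callable returning a value estimate for every legal action
    """
    if done:
        return reward
    # Best score the network currently believes it can still earn from next_state.
    return reward + GAMMA * max(q_network(next_state))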

The team from Google DeepMind in London gave the DQN agent only the raw screen pixels, the set of available actions and the game score before letting it loose on 49 Atari 2600 games to see how it fared. These included well-known favorites like Breakout, Pong, Space Invaders and Q*bert, side-scrolling shooters such as River Raid, sports sims like Boxing and Tennis, and the 3D car racer Enduro.
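
In code, that setup amounts to a loop along the following lines: the agent observes only the screen pixels, picks from the legal joystick actions, and is rewarded by the change in score. This is a rough sketch; the emulator and agent objects (screen_pixels, legal_actions, best_action, observe) are hypothetical stand-ins, not the actual DeepMind or Atari interfaces.

import random

def play_episode(emulator, agent, epsilon=0.05):
    """Play one game using only what a human player gets: pixels, actions, score."""
    actions = emulator.legal_actions()      # the fixed set of joystick inputs
    state = emulator.screen_pixels()        # raw pixels are the only observation
    last_score = emulator.score()

    while not emulator.game_over():
        # Epsilon-greedy: mostly exploit what has been learned, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = agent.best_action(state)

        emulator.act(action)
        next_state = emulator.screen_pixels()
        reward = emulator.score() - last_score   # reward is simply the score change
        last_score = emulator.score()

        agent.observe(state, action, reward, next_state)
        state = next_state

    return last_score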

The researchers say that DQN performed at more than 75 percent of the level of a professional games tester on over half the games, and that in 43 cases it surpassed any existing linear learning algorithm for that game. It performed best in Breakout, Video Pinball, Star Gunner and Crazy Climber, while its worst games included the likes of Asteroids, Gravitar, Montezuma's Revenge and Private Eye – but really, who was ever good at Gravitar?

A key feature of the DQN algorithm is what the research team likens to humans revisiting and learning from experiences during rest periods, such as sleep. In "experience replay," DQN reviews stored games during its training phase. The researchers say this function was critical to DQN's success, with the algorithm's performance dropping significantly when it was disabled.
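
To give a rough idea of how experience replay can be implemented, here is a short sketch: transitions are stored in a fixed-size memory and random minibatches are replayed during training, rather than learning only from the most recent frames. The capacity, batch size and train_step hook are illustrative assumptions, not the parameters DeepMind used.

import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state, done) transitions so the
    agent can revisit them later, loosely like rehearsing during rest."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)   # the oldest memories fall out first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks up correlations between consecutive frames.
        return random.sample(self.memory, batch_size)

# Hypothetical use during training:
#   replay.add(state, action, reward, next_state, done) after every action,
#   then periodically: batch = replay.sample(); train_step(q_network, batch)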

DQN is distinctly different from previous notable game-playing agents, such as IBM's Deep Blue, in that it represents machine learning from a blank slate, with no prior definitions, rules or models. Its creators say such algorithms could help make sense of complex, large-scale data in a wide variety of fields, including climate science, physics, medicine and genomics, and could potentially even provide insights into how humans learn.

Of course, it could also help Google create new products and improve existing ones, for instance by taking an "OK Google" request far more complex than a query about the weather and turning it into actionable results.

Google's blog entry, linked below, contains a video originally published in Nature that charts DQN's progress in Breakout, from the yawningly slow first 100 games in which it fails even to return the ball, to learning how to tunnel through the bricks to the top of the screen.

The research was published this week in Nature.

Source: Google

2 comments
Ranscapture
Now give DQN World of Warcraft and one of each character and a couple of years and see how it does.
artmez
I never had a 2600, but I did have a 5200 and most of the games were the same, only higher res. What I remember from the little it did play was that some games reacted aggressively to offensive moves but would lay back if there was no threat. Most games had pseudo-random behavior (like PacMan and Centipede) and others used simplified physics for motion (Asteroids). So, is Google's DQN capturing an approximation of the underlying coding to create these features? It seems it must, regardless of whether DQN uses any of the typical AI coding practices that try to perform "intelligently". For example, how is it possible to avoid assimilating the dynamics of gravity in a game like Asteroids without creating an approximation of it? Most dynamics are linear in the small and all real-world simulations are approximations anyway. But then, DQN could be cheating since it can move faster than we can.