Landmark AI system beats poker pros in multi-player Texas hold 'em

An example hand from the game showing the AI system (Pluribus) bluffing five professional players

A team of computer scientists from Carnegie Mellon University and the Facebook AI Research lab has created an AI system that, for the first time, has defeated several poker professionals in six-player Texas hold 'em. Unlike with earlier iterations of the system, the researchers will not publicly release this algorithm's code, for fear it could decimate the online poker world.

Back in early 2017, a team of Carnegie Mellon researchers demonstrated a new AI poker system called Libratus. More than a decade's worth of work culminated in an impressive 20-day event in which Libratus beat four poker professionals across 120,000 hands of no-limit Texas hold 'em.

Libratus was not perfect, though. Besides working effectively only in the two-player, head-to-head version of the game, it relied on an extraordinary amount of supercomputer power. Libratus needed 15 million CPU core hours just to develop its blueprint strategy, and during live gameplay it still relied on 1,400 CPU cores.

Now, in 2019, the researchers have revealed Pluribus, an extraordinary evolution of the poker-playing system that can win multi-player games while using only a fraction of the processing power of its predecessor: 12,400 core hours to compute its blueprint strategy and just 28 CPU cores in live play.
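Taking the article's compute figures at face value, a quick back-of-the-envelope calculation shows how large that "fraction" actually is. This is a minimal Python sketch; the constant and function names are illustrative, and the numbers are simply the ones quoted above.

```python
# Compute figures as quoted in the article (not independently verified here).
LIBRATUS_BLUEPRINT_CORE_HOURS = 15_000_000
PLURIBUS_BLUEPRINT_CORE_HOURS = 12_400
LIBRATUS_LIVE_CORES = 1_400
PLURIBUS_LIVE_CORES = 28


def reduction_factor(old: float, new: float) -> float:
    """Return how many times smaller `new` is than `old`."""
    return old / new


blueprint_factor = reduction_factor(LIBRATUS_BLUEPRINT_CORE_HOURS, PLURIBUS_BLUEPRINT_CORE_HOURS)
live_factor = reduction_factor(LIBRATUS_LIVE_CORES, PLURIBUS_LIVE_CORES)

print(f"Blueprint compute: ~{blueprint_factor:,.0f}x less")  # ~1,210x less
print(f"Live-play cores:   {live_factor:.0f}x fewer")          # 50x fewer
```

In other words, by these figures Pluribus needed roughly a thousandth of Libratus's blueprint compute and a fiftieth of its live-play cores.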

Over the last few years we've seen a number of incredible milestones in AI development. Games have always been a compelling benchmark for assessing truly dynamic artificial intelligence systems, and from chess to Go we've watched increasingly sophisticated algorithms dominate human players. However, these milestones have all come in two-player, zero-sum games. Multi-player poker, by contrast, is far more complicated, involving hidden information, bluffing, and unpredictable strategic play.

To test Pluribus, the researchers recruited a pool of poker champions to play 10,000 hands over a 12-day period. These were six-player games, pitting the AI against five professionals. Another series of experiments pitted a single professional against five independent copies of Pluribus. Across both sets of experiments, Pluribus steadily beat the human pros.

"Playing a six-player game rather than head-to-head requires fundamental changes in how the AI develops its playing strategy," says Noam Brown, one of the Carnegie Mellon researchers who recently joined the Facebook AI Research lab. "We're elated with its performance and believe some of Pluribus' playing strategies might even change the way pros play the game."

AI Poker Bluffs and Wins

Pluribus works by beginning each hand with a blueprint strategy, produced in advance by playing many games against itself. After the first betting round, however, the system starts refining that strategy in real time as the hand unfolds. One of Pluribus' more interesting, and successful, tactics was a move known as "donk betting", which human players commonly avoid.
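The article does not say how the self-play blueprint is computed. The published Pluribus paper describes a variant of Monte Carlo counterfactual regret minimization (CFR); the sketch below is not that algorithm, only its basic building block, regret matching, applied by two copies of the same learner playing rock-paper-scissors against each other. The game, function names, and iteration count are illustrative assumptions. The key idea it shows is that each copy tracks how much better every alternative move would have done, and it is the *average* strategy over many self-play rounds, not the latest one, that settles toward a balanced, hard-to-exploit strategy.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b]: payoff to a player choosing action a against action b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret yet: play uniformly


def self_play(iterations=100_000, seed=0):
    """Two regret-matching copies play each other; return their average strategies."""
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(3), weights=s)[0] for s in strategies]
        for player in range(2):
            opponent_move = moves[1 - player]
            earned = PAYOFF[moves[player]][opponent_move]
            for a in range(3):
                # Regret: how much better action a would have done than the move played.
                regrets[player][a] += PAYOFF[a][opponent_move] - earned
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy is what approaches equilibrium in two-player zero-sum games.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, strategy in enumerate(self_play()):
        print(player, dict(zip(ACTIONS, (round(p, 3) for p in strategy))))
```

For rock-paper-scissors, both average strategies should end up close to one third per action. Real poker blueprints apply the same regret bookkeeping across an enormous tree of betting decisions, which is why the compute figures above matter.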

"Donk betting" is when a player starts a betting round with a bet after ending the previous round with a call. It is considered a strong play only on rare occasions, and the name itself comes from calling bad players "donkeys", since they often make this move without realizing what they are doing.
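To make that definition concrete, here is a minimal Python sketch that checks a hand history for a donk bet. The hand-history format (one list of `(player, action)` pairs per betting round) and the `is_donk_bet` helper are hypothetical, chosen for illustration; the rule itself follows the definition above: the player ended the previous round with a call and then makes the first bet of the new round.

```python
from typing import List, Tuple

# Hypothetical hand-history format (not from the article): one list per
# betting round, each holding (player, action) pairs in the order played.
Action = Tuple[str, str]
History = List[List[Action]]


def is_donk_bet(history: History, player: str, round_index: int) -> bool:
    """Return True if `player` opens betting round `round_index` with a bet
    after ending the previous round with a call."""
    if round_index <= 0 or round_index >= len(history):
        return False
    # How did this player finish the previous round?
    previous = [action for actor, action in history[round_index - 1] if actor == player]
    if not previous or previous[-1] != "call":
        return False
    # The player must make the first bet of the new round.
    for actor, action in history[round_index]:
        if action in ("bet", "raise"):
            return actor == player and action == "bet"
    return False


# Example: P2 calls P1's pre-flop raise, then leads out on the flop.
hand = [
    [("P1", "raise"), ("P2", "call")],  # pre-flop
    [("P2", "bet"), ("P1", "call")],    # flop
]
print(is_donk_bet(hand, "P2", 1))  # True
print(is_donk_bet(hand, "P1", 1))  # False: P1 ended pre-flop with a raise
```

Because the check only looks at who makes the first bet, a player who checks and then faces someone else's bet is correctly not flagged.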

"It was incredibly fascinating getting to play against the poker bot and seeing some of the strategies it chose," explains Michael Gagliano, a professional player who was pitted against Pluribus. "There were several plays that humans simply are not making at all, especially relating to its bet sizing. Bots/AI are an important part in the evolution of poker, and it was amazing to have first-hand experience in this large step toward the future."

AI Poker Sets a Trap to Win More Money from Humans

Alongside Pluribus' sophisticated and unpredictable gameplay comes a significantly reduced need for processing power compared with prior AI game-playing systems. The researchers note that 2016's AlphaGo system used 1,920 CPUs to win its games, and that Libratus in 2017 needed 100 CPUs to run its two-player poker games. Pluribus runs on just two Intel Haswell E5-2695 v3 CPUs and less than 128 GB of memory. It takes on average about 20 seconds per hand, roughly twice as fast as a typical professional poker player.

This landmark achievement is undeniably an impressive leap forward for AI development, but it is reasonable to ask what it means for the highly profitable world of online poker. Although they were more open about the code behind Libratus back in 2017, the researchers say Pluribus' algorithms will have to remain secret and will not be publicly released at this point. Speaking to MIT Technology Review, Noam Brown suggested the system could win huge amounts of money in the online poker environment.

"It could be very dangerous for the poker community," Brown warns.

The new research is published in the journal Science.

Source: Carnegie Mellon University

The choice of the trap play is mystifying. Pluribus might guess that P6 has a queen, but it doesn't know what kicker P6 has. Even if not a nine or seven, Pluribus's play gives P9 two chances to pair that card (or hide a king). It does this in anticipation of P6 making a very questionable play. On the river, P6 chooses to raise. What hand does P6 expect to make a call here? He only gets called when he's beat. Picking this hand out of 10,000 played is weak; surely there's another hand there where both Pluribus and the pro are being smart.
I wonder if the AI develops player profiles such that it can predict how each player will react at each step of the game. Human players do that of course, but they don't have perfect memories of all the players and their past behavior. A fascinating business. The AI can't see "tells," but I'm guessing pro players don't have many.