AI Poker Program Beats Top Pros in Six-Max NLHE
While humans have been battling over gold bracelets at the 50th annual World Series of Poker (WSOP) at the Rio Casino in Las Vegas, a team of artificial intelligence (AI) researchers working on poker-playing “bots” recently made a major breakthrough with their poker bot Pluribus.
Tuomas Sandholm – who holds a Ph.D. and works as a professor of computer science in the Machine Learning Department at Carnegie Mellon University – collaborated with Facebook AI research scientist Noam Brown to create the poker bot Pluribus.
Sandholm previously created the AI poker bot Claudico – which was defeated in a series of heads-up (one on one) matches played against four top pros in 2015. He also created its successor Libratus, a heads-up bot that succeeded in a similar challenge two years later.
After conquering the heads-up tables against elite competition, Sandholm set to work developing Pluribus to tackle the entirely different challenge of six-handed No Limit Hold’em.
Sandholm and Brown published a research paper titled “Superhuman AI for Multiplayer Poker” in the July edition of the journal Science. In the paper, Pluribus’ computing abilities are described as follows:
“Pluribus runs on two CPUs. For comparison, AlphaGo used 1,920 CPUs and 280 GPUs for real-time search in its 2016 matches against top Go professional Lee Sedol. Pluribus also uses less than 128 GB of memory. The amount of time Pluribus takes to search on a single subgame varies between one second and 33 seconds depending on the particular situation. On average, Pluribus plays twice as fast as typical human pros: 20 seconds per hand when playing against copies of itself in six-player poker.”
While those specifications may sound sophisticated to laymen, Carnegie Mellon University and Facebook revealed that Pluribus’ core strategy was computed in only eight days, using less than 512 GB of RAM and roughly $150 worth of cloud computing resources.
Two Experiments Result in Same Pluribus Success
To assess Pluribus’ ability to play six-handed No Limit Hold’em against the most skilled human opponents, 15 decorated pros from both the cash game and tournament circuits were chosen to participate in the trial.
Players were compensated for their participation using an incentive-based system designed to motivate them to play as well as possible.
In the first experiment, Pluribus competed against five pros at a time, selected at random each day, over the course of 12 days and 10,000 hands. A variance reduction algorithm known as AIVAT was used to smooth out the statistical oddities human players refer to as “running well,” including a higher ratio of premium starting hands than probability would suggest.
In the end, Pluribus outplayed its human opponents to the tune of roughly 5 big blinds in profit per 100 hands. Expressed in monetary terms, Pluribus earned about $5 in profit per hand on average, good for an astounding hourly win rate of approximately $1,000.
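The win-rate figures above are tied together by simple arithmetic. The sketch below is a rough illustration only: it assumes the $5 figure is the average profit per hand, and it derives the big blind size and playing pace that those numbers would imply. The function names are made up for this example and do not come from the research paper.

```python
# Figures quoted in the article. Reading "$5" as profit per hand is an assumption.
BB_PER_100_HANDS = 5.0     # win rate in big blinds per 100 hands
DOLLARS_PER_HAND = 5.0     # assumed average profit per hand, in dollars
DOLLARS_PER_HOUR = 1000.0  # approximate hourly win rate, in dollars


def implied_big_blind(bb_per_100: float, dollars_per_hand: float) -> float:
    """Dollar value of one big blind implied by the two win-rate figures."""
    # dollars per 100 hands divided by big blinds per 100 hands
    return dollars_per_hand * 100.0 / bb_per_100


def implied_hands_per_hour(dollars_per_hour: float, dollars_per_hand: float) -> float:
    """Number of hands per hour needed to turn the per-hand profit into the hourly rate."""
    return dollars_per_hour / dollars_per_hand


big_blind = implied_big_blind(BB_PER_100_HANDS, DOLLARS_PER_HAND)
hands_per_hour = implied_hands_per_hour(DOLLARS_PER_HOUR, DOLLARS_PER_HAND)

print(f"Implied big blind: ${big_blind:.0f}")
print(f"Implied pace: {hands_per_hour:.0f} hands per hour")
```

Under these assumptions the numbers imply a $100 big blind and about 200 hands per hour, or 18 seconds per hand, which is in the same ballpark as the 20 seconds per hand quoted from the paper.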
A second experiment saw five independent copies of Pluribus compete against a lone human opponent, with Chris Ferguson, Darren Elias, and Linus Loeliger each taking a shot at the machines. Once again Pluribus prevailed, winning an average of 2.3 big blinds per 100 hands played.
The research paper explained that Pluribus sought to approximate Nash equilibrium strategies to attain game theory optimal (GTO) approaches for the human and AI opponents it competed against:
“The core of Pluribus’s strategy was computed via self play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input. The AI starts from scratch by playing randomly, and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy.”
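The self-play idea in that quote can be illustrated on a toy game. The sketch below is not Pluribus' actual algorithm, which the paper describes as a form of Monte Carlo counterfactual regret minimization applied to full six-player poker. It uses plain regret matching, a simple building block of that family of methods, on a made-up variant of rock-paper-scissors in which winning with rock pays double. The payoff table and all names are assumptions chosen for illustration.

```python
ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF[a][b] is the payoff for playing action a against action b.
# Hypothetical toy game: rock beating scissors is worth 2, other wins are worth 1.
PAYOFF = [
    [0, -1, 2],
    [1, 0, -1],
    [-2, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet, so play uniformly at random ("starting from scratch").
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent_strategy):
    """Expected payoff of each action against the given mixed strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(len(ACTIONS)))
        for a in range(len(ACTIONS))
    ]


def train_by_self_play(iterations):
    """Play against a copy of the current strategy and return the average strategy."""
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        values = action_values(strategy)  # the opponent is a copy of ourselves
        expected = sum(s * v for s, v in zip(strategy, values))
        for a in range(len(ACTIONS)):
            # Regret: how much better action a would have done than the current mix.
            regrets[a] += values[a] - expected
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


average_strategy = train_by_self_play(100_000)
# How much the best single action could win against the average strategy.
exploitability = max(action_values(average_strategy))

for name, prob in zip(ACTIONS, average_strategy):
    print(f"{name}: {prob:.3f}")
print(f"exploitability: {exploitability:.4f}")
```

In this toy game the average strategy should drift away from uniform play toward the equilibrium mix of roughly 25% rock, 50% paper, and 25% scissors, at which point no single action gains much against it. Real poker adds hidden cards, multiple betting rounds, and more than two players, so this shows only the flavor of learning from self-play.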
Pros Praise Poker Bot Pluribus for Outplaying the Best
Elias confirmed Pluribus’ staggering ability to improve while speaking to NPR about his experience playing the AI program:
“It was improving very rapidly, where it went from being a mediocre player to basically a world-class-level poker player in a matter of days and weeks. Which was pretty scary.”
Jason Les has played against Claudico, Libratus, and Pluribus, and in comments published alongside the research paper, he put AI’s six-handed breakthrough in perspective:
“I probably have more experience battling against best-in-class poker AI systems than any other poker professional in the world. I know all the spots to look for weaknesses, all the tricks to try to take advantage of a computer’s shortcomings.