ChatGPT takes on a 1977 Atari at chess ... and it didn't go well

The match between ChatGPT and Atari was illuminating

ChatGPT volunteered to take on a 1977-vintage Atari 2600 in a game of chess and came to regret it after the eight-bit chess engine from the age of Disco Fever and the introduction of the Force did better than expected. A lot better.

In a LinkedIn post, Citrix software engineer Robert Caruso related how a conversation with ChatGPT about the history of AI in chess led to the chatbot offering to play a game against Atari Chess. Using a Stella emulator, Caruso obliged, but the challenger didn't do as well as one would have thought for a representative of the Robotic Brainiacs that are supposedly on the threshold of surpassing human intelligence as they sprint down the homestretch to godhood.

In fact, the Atari wiped the floor with ChatGPT when it came to chess. I don't mean the near-half-century-old game console won. I mean that ChatGPT made blunders that would embarrass the greenest of beginners. According to Caruso, the AI had trouble keeping even the most basic aspects of the game straight. It confused rooks for bishops, overlooked pawn forks, and forgot where pieces were. Even when the gameplay switched to standard chess notation, it still played like a fish, producing lemon after lemon, as chess enthusiasts would say.

As for the Atari, it just kept plugging away while Caruso spent 90 minutes saving the AI from blunder after blunder until ChatGPT finally conceded the match.

What is particularly impressive about its victory is that the Atari chess engine dates from a time when just getting a computer (any computer), much less a game console, to play an actual legal game was a major accomplishment. Around 1977, I was writing chess programs and over the years bought a number of early computer chess games that I would soon give away because many couldn't handle castling or en passant – not to mention the ones where I discovered how to play a perfect game against them so I never lost.

So why did a chess engine that falls squarely into the pathetic category and looks only one move ahead not just defeat but humiliate ChatGPT? The answer tells us a lot about AI and how that blanket term is becoming obsolete as we come to understand more about the technology.

It isn't that AI can't play chess or play chess well. We've had AI chess engines that can beat world champions for decades and are routinely used today to help grandmasters hone their skills. It's that AI isn't just one thing and isn't progressing like a monolith. More to the point, the term AI may not have any real meaning.

We like to think of AI as something that exploded on the scene only in the last few years. In fact, it's been around since the 1960s and was understood theoretically back in the 1940s. I've lost count of the number of times I've seen reports of AI showing its nascent supremacy over humans by making scientific "discoveries," creating new recipes, outdoing doctors at diagnoses, or doing something clever only to recall precisely the same achievements being celebrated as far back as 1961.

In other words, AI has been with us for a very long time and is really a blanket and rather loaded term for a vast array of computer technologies, from simple expert systems and rule-based algorithms to machine learning, neural networks, and advanced robotic systems that often have little to do with one another.

One example is the contrast between chess engines and Large Language Models (LLMs) like ChatGPT. A chess engine is a very specialized algorithm that, at the highest level, runs on specially designed hardware capable of processing hundreds of millions of moves per second according to the strict rules of chess. These engines are designed to look several moves ahead in a game and trim the impossible-to-evaluate number of possible moves down to a manageable number, while taking into account things like known chess openings combined with rules of thumb and the ability to learn from past mistakes.
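To make that concrete, here is a minimal sketch of the kind of look-ahead search at the heart of a chess engine: minimax with alpha-beta pruning over a crude material count. It assumes the third-party python-chess library for move generation, and the toy evaluation function is purely illustrative – real engines search far deeper and weigh far more than material.

```python
# A minimal sketch of engine-style look-ahead search (minimax + alpha-beta),
# assuming the third-party python-chess library for legal move generation.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board: chess.Board) -> int:
    """Crude material count: positive favors White, negative favors Black."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def minimax(board, depth, alpha, beta, maximizing):
    """Look `depth` plies ahead, pruning branches that can't change the choice."""
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    if maximizing:
        best = float("-inf")
        for move in list(board.legal_moves):
            board.push(move)
            best = max(best, minimax(board, depth - 1, alpha, beta, False))
            board.pop()
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # prune: the opponent would never allow this line
        return best
    best = float("inf")
    for move in list(board.legal_moves):
        board.push(move)
        best = min(best, minimax(board, depth - 1, alpha, beta, True))
        board.pop()
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

def best_move(board, depth=3):
    """Return the legal move with the best minimax score for the side to move."""
    maximizing = board.turn == chess.WHITE
    best_score = float("-inf") if maximizing else float("inf")
    choice = None
    for move in list(board.legal_moves):
        board.push(move)
        score = minimax(board, depth - 1, float("-inf"), float("inf"), not maximizing)
        board.pop()
        if (maximizing and score > best_score) or (not maximizing and score < best_score):
            best_score, choice = score, move
    return choice

print(best_move(chess.Board()))  # every candidate comes from the legal-move generator
```

Note that even this toy never invents a piece or makes an illegal move, because every candidate it considers comes from the board's own legal-move generator.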

The result is a brute-force chess player that doesn't play beautifully – chess computers are inelegant things, often compared to Martians in their thinking – but one that makes fewer blunders than its human opponents. It does play, however, and the best engines are able to take down the best human players in the world.

LLMs are completely different things made for very different purposes and, compared to many other AI applications, not very bright at all. They only seem ultra-impressive to us because they are designed to handle language, can draw on extremely large bodies of training data, and play into the human tendency to anthropomorphize machines, which does half the work of making them seem truly intelligent.

The problem with LLMs and chess is that an LLM works by using its training on vast datasets, plus a great deal of linear algebra, to predict the next token in its response – it is not applying the complex rules of a game. LLMs are also extremely bad at the kind of strict logic that would allow them to validate their moves against the rules of chess.
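As a loose illustration (and emphatically not how GPT models work internally), here is what "predict the next token from training data" looks like in miniature: a toy frequency model, built over a made-up handful of game transcripts, that suggests whatever move most often followed the current one – with no check on whether that move is legal in the actual position.

```python
# Toy next-token predictor over a hypothetical corpus of chess transcripts.
# It picks continuations by frequency alone, never by consulting a board.
from collections import Counter, defaultdict

games = [
    "e4 e5 Nf3 Nc6 Bb5",
    "e4 e5 Nf3 Nc6 Bb5 a6",
    "e4 e5 Nf3 Nc6 Bc4",
    "e4 c5 Nf3 d6",
]

# Count which token most often follows each token in the corpus.
follows = defaultdict(Counter)
for game in games:
    tokens = game.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the statistically most common continuation - legal or not."""
    return follows[token].most_common(1)[0][0]

print(predict_next("Nc6"))  # "Bb5" - chosen by frequency, never by checking a board
```

Scale that idea up by billions of parameters and you get fluent, plausible-sounding moves – still chosen by statistics rather than by the rules.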

They are also essentially stateless. They can keep the context of a conversation in their heads (if they had heads), but not the evolving state of a board across many moves. As a result, they forget what they are doing, cannot keep the board straight, misinterpret positions, hallucinate pieces out of nowhere, and basically play in what can best be described as an absent-minded fashion.
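Contrast that with how chess software handles state. In this minimal sketch – again assuming the python-chess library and an invented move sequence – the board object is the persistent, authoritative record of the game, and every move is validated against it before anything changes.

```python
# The board object holds persistent state: piece positions, whose turn it is,
# castling rights, and so on. Illegal moves are rejected outright.
import chess

board = chess.Board()
for san in ["e4", "e5", "Nf3", "Rh3"]:  # hypothetical sequence; "Rh3" is illegal here
    try:
        board.push_san(san)  # raises ValueError if the move is illegal in this position
        print(f"{san}: played")
    except ValueError:
        print(f"{san}: rejected - no rook can legally reach h3")
```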

Bear in mind, it is not that LLMs aren't yet good enough to play chess and someday will be. The limitation is baked into the way they're built, just as a top-level chess engine like Stockfish would be helpless if you asked it to explain the game of chess, recount its history, or discuss the rationale behind a move.

We can see this in how chess engines and LLMs approach the board. The chess engine always has a precise understanding of the gameplay, while the LLM works out problems statistically based on training data – the exact opposite of the engine. This also keeps the LLM from planning ahead. It can't handle deep-search chess algorithms or minimax strategies, so it just regurgitates what it has learned from game transcripts and commentaries rather than making an actual assessment of the game. It also can't handle a symbolic representation of the board, resulting in spontaneously invented pieces, impossible board setups, illegal moves, and verbose explanations to justify them.
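Part of what the engine gets for free is that exact symbolic representation – commonly a FEN string, which spells out every piece, the side to move, castling rights, and any en passant square. A brief sketch, once more with python-chess:

```python
# FEN: a complete, unambiguous encoding of a chess position.
import chess

# The position after 1.e4 e5 2.Nf3 Nc6, with White to move.
board = chess.Board("r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3")
print(board)                               # exact diagram - no hallucinated pieces
print(board.san(board.parse_uci("f1b5")))  # "Bb5": checked against the rules before use
```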

To put it another way, chess simply is not ChatGPT's game. But this goes beyond chess. The episode highlights how LLMs like ChatGPT, while becoming increasingly useful and powerful, are not universal problem solvers. They can't handle precise logical reasoning, strict rules, or tasks that require persistent memory of context and reliance on hard facts. An LLM might be great at helping a chess engine explain the game or discuss strategies with a human being, but its metaphorical hand needs slapping if it reaches for a pawn.

As International Grandmaster David Bronstein said, "The essence of chess is thinking about what chess is."

And that's not the chatbot's wheelhouse.

Source: LinkedIn

1 comment
martinwinlow
Well, that's one way of looking at it but if you are hoping AI is going to provide a more 'human-like' influence on the man-(intelligent) machine relationship, then considering most of the population of the planet don't even *play* chess - let alone are any good at it - you could argue that this is actually a huge success for AI!