From Rules to Strategy: Monte Carlo Tree Search

By Sam Bleckley on 07 01 2016

Go is an amazing game because of how incredibly simple its rules are, and how complex and difficult its strategy and tactics are. Professional Go players spend their entire lives gaining the knowledge, instincts, and tactical reading skills to play the game well.

Go is of particular interest to computer scientists because it is zero-sum, symmetric, of perfect information (no actions are hidden from either player), discrete, and of finite length. These are all qualities that are very well-studied, and computers should make great Go players.

But they don't! Go's branching factor (the average number of different moves a player can make at any given moment) is around 250. Chess, by comparison, has a branching factor of about 35. Straightforwardly searching for the right move is exponential with respect to the branching factor —- looking ahead at every possibility for the next three moves in Chess means looking at just under 50 thousand positions; in Go, that number is 15 million. Worse, Go games are generally much longer than chess games (200 moves, instead of 40); and judging the relative value of different moves is difficult, and often requires reading much further ahead.

6 years ago, computers were not only unable to beat the best human Go players — they were unable to beat talented amateurs. The've been steadily improving, and in the past few years Computer go players have beat professional-level Go players when given a handicap. (The situation is complicated by the fact that, just like in Chess, few professional players want to take the risk of being beaten by a computer on a level playing field — so few such games are staged.)

Nevertheless, just like computer Chess, the fundamental technique these programs are using is old — from the late 1980's. But the technique, called Monte Carlo Tree Search, is pretty freaky.

Basic Monte Carlo Tree Search (MCTS from here on) requires only two things: be able to list all legal moves in a given position, and be able to score a finished game. For Go, these two requirements are easily satisfied. But how does the algorithm turn those two pieces of information into strategic insight? Especially when we know it is expensive to exhaustively search more than a few moves ahead?

The answer is pretty strange.

For each possible move, MCTS makes that move, and then assumes both players play randomly from there on — it plays out a complete game, randomly! It scores that game, an repeats — playing a heap of totally random games. It scores the initial move based on those random outcomes.

By itself, that's a pretty terrible way to pick a move.

But as a way to decide which moves to investigate further, it's not the worst.

So, as long as there is time, MCTS takes the best-scoring moves and repeats the process, one move further along. You see how this begins to look like A search? Except the scoring algorithm is based on the shocking idea of random playouts. In fact, if there's enough time, MCTS converges on optimal play. It converges after an incredibly long time, but because it converges fairly monotonically and gradually, it can stop and give an approximately-best move at any* time — which is an important quality because most professional Go games, like Chess games, are timed.

So why do great computer Go players play in this frankly bizarre way, instead of learning to play the way humans do? The answer is that, frankly, we have no idea how human players play. Human players narrow the playing field to a few potential moves unconsciously and almost instantly. They can report reasons why the moves they're considering are good — but not the algorithm they used to pick those moves out of the sea of others. Certainly there are researchers attempting to use machine learning techniques, neural networks, and the like to emulate a more human approach to play, but these techniques are still in their infancy; and I'd put good money down that the ascendancy of computer Go will be much like that of computer Chess — a victory powered by hardware, not a triumph of AI.