Deep Reinforcement Learning for Board Games



Reinforcement learning (RL) is one of the key AI paradigms for the development of autonomous systems. RL allows a learning agent to solve a task through trial-and-error interactions with its environment. By observing the results of its actions, the agent can learn which sequence of actions best achieves its goal.
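
The trial-and-error loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL algorithms and is not the method used in AlphaGo. The toy "corridor" environment, the function names train_q_learning and greedy_policy, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def train_q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States are 0..n_states-1. The agent starts in state 0 and receives a
    reward of 1 only when it reaches the rightmost state (the goal).
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: explore with probability epsilon,
            # otherwise pick the action that currently looks best.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Act and observe the result (walls clamp the position).
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Move the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-goal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_q_learning()))
```

With these default settings the learned greedy policy should choose "step right" (action 1) in every state, since that is the shortest route to the only reward. Board games differ from this toy mainly in scale: the table of values becomes far too large to store, which is where learned representations come in.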

Deep learning is a popular research track within the field of machine learning. The main idea behind deep learning is to create architectures consisting of multiple layers of representations in order to learn high-level abstractions. Examples are the deep neural network methods used in image processing. Starting from individual pixels, each successive layer of the network learns progressively more complex features until the highest layers are able to recognize objects in the image.


Recent research has also shown that deep learning can be used to learn useful representations for reinforcement learning tasks. This has led to a new generation of state-of-the-art algorithms that combine deep learning and reinforcement learning. One recent success is the development of AlphaGo, a computer agent for the board game Go that plays at the level of a human world champion. Go was considered to be the next big challenge for AI, and a program able to beat top professionals was thought to still be years away. The success of AlphaGo demonstrates the potential and the power of the new deep RL methods.

This achievement also stands in contrast with previous successes like IBM's Deep Blue, the famous chess program that beat Kasparov in 1997. Despite its ingenuity, Deep Blue relied on searching through a very large space of possibilities, guided by human-provided heuristics. Both of these components have proven extremely difficult to apply to Go: the game has more legal board positions than there are atoms in the observable universe, and human-crafted heuristics have long failed to capture its subtle complexities. Through the use of deep convolutional neural networks and reinforcement learning, AlphaGo overcame both of these challenges, at least a decade earlier than was expected.

In this dissertation you will implement some of the methods used in AlphaGo and evaluate them on different board games. The goal is to test the generality of the AlphaGo techniques and determine the contribution of each of the steps in the AlphaGo training process.




Peter Vrancx