Bandit games

Context

Multi-armed bandits are a common model for sequential decision making in online applications such as games [1]. Many real-world applications can be modeled as bandit games, for example market makers or computer Go. There are several techniques for playing bandit games:

Monte Carlo tree search (MCTS) builds a search tree from random samples of the decision space [2], using stochastic bandits in which the rewards are sampled from an unknown distribution [3].

In an adversarial environment, the game is played between a forecaster and an environment, under the assumption that an adversarial process controls the rewards [4].

Contextual bandits [5] use the context to adapt their long-term behavior, measured in terms of regret. For example, market makers adapt their rewards based on traders’ beliefs and other structural assumptions that form the context.
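As a rough illustration of the stochastic and adversarial settings listed above, the following sketch shows two textbook policies, UCB1 and Exp3. Neither algorithm is named in the text; they are assumed here only as standard representatives of each setting. The Bernoulli arms, the exploration rate `gamma`, and all function names are illustrative choices.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Pick an arm by the UCB1 index: empirical mean plus an exploration bonus."""
    # Play every arm once before the index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(means, horizon, rng):
    """Stochastic setting: Bernoulli arms whose means are hidden from the player."""
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts  # number of pulls per arm


def exp3_probabilities(weights, gamma):
    """Adversarial setting: mix the weight distribution with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, arm, reward, gamma):
    """Return new weights after observing `reward` in [0, 1] for the played arm."""
    probs = exp3_probabilities(weights, gamma)
    estimate = reward / probs[arm]  # importance-weighted reward estimate
    updated = list(weights)
    updated[arm] *= math.exp(gamma * estimate / len(weights))
    return updated
```

UCB1 relies on rewards drawn from fixed distributions, so its confidence bonus shrinks as an arm is sampled more often. Exp3 makes no such assumption and instead keeps a randomized strategy, so that an adversary choosing the rewards cannot exploit a deterministic choice.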

We view bandit games as interactive decision makers that sequentially interact with the forecaster while searching for the preferred solution. We consider bandit problems with reward vectors and extend stochastic, contextual and adversarial multi-armed bandits to the multi-objective setting. Our focus is on multi-objective games with tuples of pay-off functions [6], formulated as bandit games.
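With reward vectors there is generally no single best arm, so arms have to be compared under some partial order. A minimal sketch, assuming Pareto dominance as that order (one common choice, not necessarily the one used in this work) and assuming every objective is to be maximized; the function names are hypothetical:

```python
def dominates(u, v):
    """True if reward vector u Pareto-dominates v (higher is better everywhere)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(mean_vectors):
    """Indices of arms whose mean reward vector is not dominated by another arm."""
    return [
        i
        for i, u in enumerate(mean_vectors)
        if not any(dominates(v, u) for j, v in enumerate(mean_vectors) if j != i)
    ]
```

For example, with mean vectors `(0.9, 0.1)`, `(0.1, 0.9)`, `(0.5, 0.5)` and `(0.4, 0.4)`, the first three arms are mutually incomparable and form the Pareto front, while the last is dominated by `(0.5, 0.5)`.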

Contact

David Catteeuw, dcatteeu@vub.ac.be

Bernard Manderick, bmanderi@vub.ac.be

References

1. Cesa-Bianchi, N. and G. Lugosi, Prediction, Learning and Games. 2006: Cambridge University Press.

2. Browne, C., et al., A survey of Monte Carlo tree search methods. IEEE Trans on Comp Intel and AI in Games, 2012. 4(1): p. 1-46.

3. Auer, P., Using Confidence Bounds for Exploitation-Exploration Trade-offs. J of Machine Learning Res, 2002. 3: p. 397-422.

4. Bubeck, S. and A. Slivkins, The best of both worlds: stochastic and adversarial bandits. ICML'12: p. 1-23.

5. Li, L., et al., A Contextual-Bandit Approach to Personalized News Article Recommendation. in WWW 2010.

6. Meijer, A.B. and H. Koppelaar, Towards multi-objective game theory - with application to Go. in Proc of Conf on Intelligent Games and Simulation. 2003.