Bi-level Multi-Armed Bandits

Context
 
Multi-armed bandits (MABs) are an ideal tool for studying the exploration-exploitation dilemma, which plays a crucial role in reinforcement learning. Recently, we have proposed several extensions to MABs that allow them to tackle multi-objective problems. In this thesis, you will study how these approaches can be adapted so that they can also be applied in non-stationary settings. In a first stage, you will consider situations where the rewards change only slowly over time. The challenge here is to pull the optimal arms as often as possible, while still pulling the other arms often enough that their expected rewards can be tracked. Insights from single-objective MABs for non-stationary bandits will form the starting point for this work.

In a possible second stage, bi-level problems could also be considered. In a bi-level problem, MABs are applied on two levels: one level is referred to as the leader and the other as the follower. As an example, consider a toll-setting problem. The leader sets the toll rate and wants to maximise revenue, while the users, who are the followers, react to these prices and change their behaviour because they want to minimise their travel costs. In this setting, the followers' behaviour makes the problem at the leader's level non-stationary. Here, however, the changes may be much more abrupt, so a different approach might be necessary.
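
To make the tracking requirement of the first stage concrete, the following is a minimal single-objective sketch in Python of Kalman-filter-based random sampling, in the spirit of Granmo and Berg (2010) but not presented as their exact algorithm. It assumes Gaussian rewards with known noise variance (obs_var) and that each arm's expected reward follows a Gaussian random walk with known variance (drift_var). The class KalmanSamplingBandit, the simulate helper and all parameter values are hypothetical illustrations.

```python
import random


class KalmanSamplingBandit:
    """Sketch of Kalman-filter-based random sampling for a non-stationary
    Gaussian bandit. Illustrative assumptions: each arm's mean reward follows
    a Gaussian random walk with known variance `drift_var`, and rewards are
    observed with Gaussian noise of known variance `obs_var`."""

    def __init__(self, n_arms, obs_var=1.0, drift_var=0.01,
                 prior_mean=0.0, prior_var=100.0):
        self.obs_var = obs_var
        self.drift_var = drift_var
        self.means = [prior_mean] * n_arms  # posterior mean per arm
        self.vars = [prior_var] * n_arms    # posterior variance per arm

    def select_arm(self, rng):
        # Draw one sample from each arm's posterior and play the best sample.
        samples = [rng.gauss(m, v ** 0.5) for m, v in zip(self.means, self.vars)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Kalman correction step for the arm that was pulled.
        gain = self.vars[arm] / (self.vars[arm] + self.obs_var)
        self.means[arm] += gain * (reward - self.means[arm])
        self.vars[arm] *= 1.0 - gain
        # Prediction step for every arm: all means may have drifted, so the
        # uncertainty grows, which keeps rarely pulled arms being explored.
        self.vars = [v + self.drift_var for v in self.vars]


def simulate(n_steps=2000, seed=0):
    """Hypothetical demo: two arms whose true means drift linearly and cross
    halfway through the run (arm 0: 1 -> -1, arm 1: -1 -> 1)."""
    rng = random.Random(seed)
    bandit = KalmanSamplingBandit(n_arms=2)
    choices = []
    for t in range(n_steps):
        frac = t / (n_steps - 1)
        true_means = [1.0 - 2.0 * frac, -1.0 + 2.0 * frac]
        arm = bandit.select_arm(rng)
        reward = rng.gauss(true_means[arm], 1.0)
        bandit.update(arm, reward)
        choices.append(arm)
    return bandit, choices


if __name__ == "__main__":
    bandit, choices = simulate()
    late = choices[-500:]
    print("share of arm 1 in last 500 pulls:", late.count(1) / len(late))
```

With drift_var set to 0, the sketch reduces to Gaussian Thompson sampling for a stationary bandit, whose uncertainty only shrinks. A positive drift_var keeps the posteriors of rarely pulled arms wide, which is what lets the learner notice when the better arm changes; how to carry this over to scalarized multi-objective rewards is exactly the open question of the first stage.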
 
References
 
O. C. Granmo and S. Berg (2010). Solving non-stationary bandit problems by random sampling from sibling Kalman filters. In Proceedings of the 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, Volume Part III (IEA/AIE'10), Springer-Verlag, Berlin, Heidelberg, pp. 199-208.
 
M. M. Drugan and A. Nowé (2014). Scalarization based Pareto optimal set of arms identification algorithms. In International Joint Conference on Neural Networks (IJCNN), Beijing, China, July 2014. IEEE.
 
B. Colson, P. Marcotte and G. Savard (2007). An overview of bilevel optimization. Annals of Operations Research, 153: 235–256.
 
Contacts
 
David Catteeuw, dcatteeu@vub.ac.be