Multi-Objective Reinforcement Learning
Most real-world decision problems have more than one objective. For example, imagine a computation centre that has to schedule different jobs. On the one hand, we want the jobs to finish as soon as possible, so it might seem tempting to keep all machines on all the time and schedule all jobs in parallel. On the other hand, we also want to minimize our energy usage (for both financial and environmental reasons), so it is probably more favourable to keep some of the machines on stand-by, or even turn them off completely. Because it is impossible to attain the best possible value for both objectives simultaneously, we need to find a set of possibly optimal solutions that represent different trade-offs between the objectives.
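To make the idea of a set of possibly optimal solutions concrete, here is a minimal sketch that uses Pareto dominance, one common way (among several) to formalize it. The schedules and their (completion time, energy) values are invented for illustration, and the helper names `dominates` and `pareto_front` are hypothetical; both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# An outcome is (completion_time_hours, energy_kwh); lower is better for both.
Outcome = Tuple[float, float]


def dominates(a: Outcome, b: Outcome) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(outcomes: List[Outcome]) -> List[Outcome]:
    """Keep only the outcomes that no other outcome dominates."""
    return [o for o in outcomes if not any(dominates(p, o) for p in outcomes)]


# Made-up numbers for four hypothetical schedules.
schedules = [
    (2.0, 90.0),  # all machines on, all jobs in parallel
    (4.0, 50.0),  # half of the machines on stand-by
    (8.0, 30.0),  # most machines turned off
    (5.0, 60.0),  # slower AND more energy than (4.0, 50.0)
]

front = pareto_front(schedules)
print(front)
```

Under these assumed numbers, the last schedule is dropped because another schedule is better in both objectives. The remaining three are exactly the different trade-offs that would be shown to a user for selection.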
In this project, we will study multi-objective reinforcement learning (MORL) problems, focussing on the learning phase (see Figure 1). In the learning phase, an algorithm interacts with an environment to learn a set of possibly optimal alternatives, which are subsequently shown to a user for policy selection. We will use the popular Multi-Objective Markov Decision Process (MOMDP) model. The student will have a lot of freedom to choose different methods and approaches, as well as applications. A couple of options are:
- Model-based MORL
- Multi-objective Deep Reinforcement Learning
- Policy Search (e.g., via local search or evolutionary methods)
- MORL for traffic control
- MORL for epidemic control
- Diederik M. Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley - A Survey of Multi-Objective Sequential Decision-Making. Journal of Artificial Intelligence Research, 48:67–113, 2013.
- Diederik M. Roijers, Shimon Whiteson, Peter Vamplew, and Richard Dazeley - Why Multi-Objective Reinforcement Learning? In EWRL 2015: Proceedings of the Twelfth European Workshop on Reinforcement Learning, July 2015.
- Hossam Mossalam, Yannis Assael, Diederik M. Roijers, and Shimon Whiteson - Multi-Objective Deep Reinforcement Learning. In DeepRL 2016: the NIPS 2016 Workshop on Deep Reinforcement Learning.
- Kristof Van Moffaert and Ann Nowé - Multi-Objective Reinforcement Learning Using Sets of Pareto Dominating Policies. Journal of Machine Learning Research, 15(1):3483–3512, 2014.