Fair Policy Search

In real-life situations, agents (such as humans) have to cooperate in order to achieve their goals. However, because agents typically have their individual goals and valuations, it is often hard to come up with a plan that will satisfy all agents. For example, imagine joining a dungeon group in a massively multiplayer online game (e.g., world of warcraft); some players are mainly interested in finishing the dungeon, while other players may be after specific items, or finishing subquests. It thus takes a plan that will satisfy the individual goals of the different agents to such a degree that they will be happy with joining the team.

One solution to this problem is to see the utility of each individual agent as an objective for the team, and solve the problem as a cooperative multi-objective decision problem. Specifically, we want to produce a set of viable plans with different trade-offs between the objectives (i.e., the utilities of the agents), and chose from this set by negotiation. Because we are negotiating at the end, we only want plans that are both undominated and fair:

  • Undominated means that it is not possible to increase the value in one of the objectives without diminishing at least one of the others
  • Fair means, that if for some agent, j, the value is x higher than that of agent i, vj = vi+x, then if we keep the value of all agents except i and j the same, then it would be better to transfer up to x/2 of the value from j to i. We call such transfers Lorenz-transfers. In other words, if we have the choice, we prefer the utility to be shared equally amongst the agents.
  • We call a vector that is undominated, and for which no Lorenz-transfer is possible, Lorenz-optimal. (For a more formal definition see [1].)

We refer to a set that contains all plans that are both undominated and fair (in the abovementioned sense) as the Lorenz-optimal set.

Due to the non-linear nature of Lorenz optimality, it is hard (in the computational sense) to find the Lorenz-optimal set analytically. Therefore, we need to use heuristics. Possible ways to do this is via multi-objective evolutionary algorithms [2] or local search algorithms [3,4]. However, currently no such algorithms exist that specifically aim to find the Lorenz-optimal set for multi-agent planning problems. In this project, we aim to find such algorithms.

Contact: Diederik M. Roijers (droijers@ai.vub.ac.be)

Relevant Literature:

[1] Perny, P., Weng, P., Goldsmith, J., & Hanna, J. (2013). Approximation of lorenz-optimal solutions in multiobjective markov decision processes. arXiv preprint arXiv:1309.6856.

[2] Coello, C. A. C., Lamont, G. B., & Van Veldhuizen, D. A. (2007). Evolutionary algorithms for solving multi-objective problems (Vol. 5). New York: Springer.

[3] Maarten Inja, Chiel Kooijman, Maarten de Waard, Diederik M. Roijers, and Shimon Whiteson - Queued Pareto Local Search for Multi-Objective Optimization. In PPSN 2014: Proceedings of the Thirteenth International Conference on Parallel Problem Solving from Nature, pp. 589–599, September 2014. [pdf]

[4] Chiel Kooijman, Maarten de Waard, Maarten Inja, Diederik M. Roijers, and Shimon Whiteson - Pareto Local Policy Search for MOMDP Planning. In ESANN 2015: Special Session on Emerging Techniques and Applications in Multi-Objective Reinforcement Learning at the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2015, pp. 53-58, April 2015. [pdf]