Coordination Mechanisms in MARL


The Smile-IT project aims to develop a multi-agent reinforcement learning framework for studying and managing modern distributed networked systems (telecom networks, smart grids, traffic networks, etc.) that contain a large number of entities or agents, both machine and human, each striving to achieve its own objectives. The framework developed within the project will train these entities purely by observing how the environment reacts to the agents' decisions, in order to achieve system-wide optimal behaviour in the face of diverging and incompatible personal goals.


In multi-agent settings, an agent only has direct access to local information. When different agents have to cooperate in order to optimize group behaviour, coordination among the agents is required. However, agents are often heterogeneous, that is, they differ in behavioural capabilities, so it is not always wise to imitate an arbitrary agent: the required behaviour may differ from agent to agent. For example, a slow autonomous car should not learn from a fast one, as it cannot replicate the same behaviour. Agents therefore need to keep track of reliable sources of information and discard the others.
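One way to make "keep track of reliable sources" concrete is for each agent to maintain a running reliability estimate per potential advisor and only imitate those whose advice has paid off. The sketch below is an illustrative assumption, not a method prescribed by the project: the class name, the exponentially weighted update rule, and the threshold are all hypothetical choices.

```python
class AdvisorTracker:
    """Running reliability estimate for each potential advisor.

    Hypothetical sketch: reliability is an exponentially weighted
    average of the reward change observed after following an
    advisor's suggestion. All names and the update rule are
    illustrative assumptions, not a specific MARL algorithm.
    """

    def __init__(self, advisors, alpha=0.1, threshold=0.0):
        self.reliability = {a: 0.0 for a in advisors}
        self.alpha = alpha          # step size of the running average
        self.threshold = threshold  # advisors at or below this are ignored

    def update(self, advisor, reward_delta):
        # Move the estimate toward the observed change in reward.
        r = self.reliability[advisor]
        self.reliability[advisor] = r + self.alpha * (reward_delta - r)

    def reliable_advisors(self):
        # Only imitate advisors whose advice has helped on average.
        return [a for a, r in self.reliability.items() if r > self.threshold]
```

In the autonomous-car example, a slow car that repeatedly observes negative reward changes after imitating a fast car would see the fast car's reliability drift below the threshold and stop imitating it.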


In this thesis, you will start by studying coordination methods among homogeneous agents in a purely collaborative setting. You will then investigate which interactions among heterogeneous agents are reliable, and test this in both a collaborative and a competitive setting. You will validate and compare these methods in simple multi-agent RL settings (e.g., an adaptation of a predator-prey setting).
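To give a feel for the kind of validation environment mentioned above, here is a minimal predator-prey gridworld with tabular Q-learning for a single predator. It is a sketch under stated assumptions, not the environment the thesis will actually use: the grid size, reward values, and the randomly moving prey are all illustrative choices.

```python
import random
from collections import defaultdict

class PredatorPreyGrid:
    """A minimal 5x5 predator-prey gridworld (illustrative assumption).
    One predator chases a randomly moving prey; capture ends the episode."""

    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.predator = (0, 0)
        self.prey = (self.size - 1, self.size - 1)
        return (self.predator, self.prey)

    def _clip(self, pos, move):
        # Keep positions inside the grid.
        x = min(max(pos[0] + move[0], 0), self.size - 1)
        y = min(max(pos[1] + move[1], 0), self.size - 1)
        return (x, y)

    def step(self, action):
        self.predator = self._clip(self.predator, self.MOVES[action])
        if self.predator == self.prey:
            return (self.predator, self.prey), 10.0, True  # capture
        self.prey = self._clip(self.prey, self.rng.choice(self.MOVES))
        done = self.predator == self.prey
        reward = 10.0 if done else -0.1  # small step cost encourages pursuit
        return (self.predator, self.prey), reward, done

def q_learn(env, episodes=500, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning for the predator (hypothetical hyperparameters)."""
    q = defaultdict(float)
    rng = random.Random(1)
    for _ in range(episodes):
        s = env.reset()
        done, steps = False, 0
        while not done and steps < 100:
            # Epsilon-greedy action selection over the 4 moves.
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda m: q[(s, m)])
            s2, r, done = env.step(a)
            best_next = max(q[(s2, m)] for m in range(4))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s, steps = s2, steps + 1
    return q
```

Extending this toy setup to several predators that must coordinate (homogeneous first, then heterogeneous, e.g. predators with different speeds) is the kind of adaptation the thesis description alludes to.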