Paper Highlight: Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (AAAI-2022)

Last month, AI Lab PhD student Florent Delgrange presented his paper entitled “Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes”, co-authored with Prof. Ann Nowé and Prof. Guillermo Perez (University of Antwerp – Flanders Make), at the AAAI-22 conference.


Paper summary:

While (deep) reinforcement learning (RL) has been applied to a wide range of challenging domains, from game playing to real-world applications such as energy management or epidemic control, wider deployment in the real world is hampered by the lack of guarantees accompanying the learned policies. The framework developed in the paper aims to enhance the reliability of RL solutions. In a nutshell, the idea is to learn a simple, verifiable, and human-understandable abstract model of the real environment by observing the intelligent agent’s interaction with it. At the same time, the RL policy under which the agent operates is distilled onto that simpler model. The model is learned by optimizing an equivalence criterion with respect to the original environment, which further yields guarantees on the quality of the abstraction. This enables reasoning about the explainable model as well as formal verification of the agent’s behaviors via model-checking tools.
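To give a feel for the approach, below is a minimal PyTorch sketch of the overall pipeline: a state embedding maps environment states to a small set of discrete latent states, a latent transition and reward model is fit to the agent’s experience, and the RL policy is distilled onto the latent states. This is only an illustration, not the authors’ implementation: the paper optimizes a variational (ELBO-style) objective whose local losses bound the distance to the original environment, whereas this sketch substitutes simple cross-entropy and mean-squared-error surrogates. All names and dimensions (`StateEmbedding`, `LatentModel`, `OBS_DIM`, `N_LATENT`, `N_ACTIONS`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes -- not taken from the paper.
OBS_DIM, N_LATENT, N_ACTIONS = 8, 16, 4

class StateEmbedding(nn.Module):
    """Maps environment states to a distribution over discrete latent states."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_LATENT))

    def forward(self, s, tau=1.0):
        logits = self.net(s)
        # Straight-through Gumbel-softmax: a discrete latent state,
        # yet differentiable end to end.
        z = F.gumbel_softmax(logits, tau=tau, hard=True)
        return z, logits

class LatentModel(nn.Module):
    """Transition and reward model over the small latent MDP."""
    def __init__(self):
        super().__init__()
        self.trans = nn.Linear(N_LATENT + N_ACTIONS, N_LATENT)  # P(z' | z, a)
        self.reward = nn.Linear(N_LATENT + N_ACTIONS, 1)        # R(z, a)

    def forward(self, z, a_onehot):
        za = torch.cat([z, a_onehot], dim=-1)
        return self.trans(za), self.reward(za).squeeze(-1)

embed, model = StateEmbedding(), LatentModel()
distilled = nn.Linear(N_LATENT, N_ACTIONS)  # latent policy to be verified
opt = torch.optim.Adam([*embed.parameters(), *model.parameters(),
                        *distilled.parameters()], lr=1e-3)

def loss_on_batch(s, a, r, s_next, teacher_logits):
    z, _ = embed(s)
    _, next_logits = embed(s_next)
    a_onehot = F.one_hot(a, N_ACTIONS).float()
    pred_next_logits, pred_r = model(z, a_onehot)
    # (1) Latent transitions should agree with how the encoder labels s'.
    trans_loss = F.cross_entropy(pred_next_logits, next_logits.argmax(-1))
    # (2) Latent rewards should match the rewards observed in the environment.
    reward_loss = F.mse_loss(pred_r, r)
    # (3) Distillation: the latent policy imitates the trained RL policy.
    distill_loss = F.kl_div(F.log_softmax(distilled(z), -1),
                            F.softmax(teacher_logits, -1),
                            reduction="batchmean")
    return trans_loss + reward_loss + distill_loss

# Toy training step on random transitions (a stand-in for agent experience).
s = torch.randn(32, OBS_DIM)
a = torch.randint(0, N_ACTIONS, (32,))
r = torch.randn(32)
s_next = torch.randn(32, OBS_DIM)
teacher_logits = torch.randn(32, N_ACTIONS)  # placeholder for the RL policy
opt.zero_grad()
loss = loss_on_batch(s, a, r, s_next, teacher_logits)
loss.backward()
opt.step()
```

Once trained, the latent MDP has only `N_LATENT` states, so composing it with the distilled policy yields a finite Markov chain that could, in principle, be exported to a probabilistic model checker (e.g., Storm or PRISM) to verify properties of the agent’s behavior.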


The paper can be downloaded here.


Florent is active in GC3 “Multi-agent collaborative AI” under WP2 “Multi-agent control systems” of the Flanders AI Research Program.