Life-Long Hierarchical Reinforcement Learning

Start: 
03/10/2016
End: 
30/09/2020
Funding: 
FWO

Reinforcement Learning is currently applied to single tasks, such as parking a car, playing a video game, or navigating to a goal. Hierarchical Reinforcement Learning makes it possible to decompose a complex task into simpler sub-tasks. For instance, an agent may progressively learn to grab objects, turn them, open doors, and navigate through hallways (with doors), before doing something interesting in a complete building. This project applies Hierarchical Reinforcement Learning to very complex tasks, for which the agent has to learn many skills. Playing a video game is no longer the goal itself, but only a small skill that the agent may sometimes need.

Another great advantage of Hierarchical Reinforcement Learning is that some of the sub-tasks may be solved using fixed policies. If a robot is already able to walk, why re-learn it? Walking can be considered "known", and the robot can learn more interesting behaviors built on top of it. This makes it possible to apply Reinforcement Learning to problems for which partial (and sometimes provably good) solutions already exist. The agent learns to combine these solutions, and can also learn any skill it needs that was not provided.

This project consists of the following steps:

  1. Make HRL able to easily represent the kind of non-Markovian policies that arise in robotics, where most of the information is observable but some long-term memory is still needed: the robot has to remember what it has already done and what it must still do.
  2. Explore how skills can be learned one at a time, then combined by the agent. This forms a kind of "curriculum learning", in which the tasks are made progressively more complex.
  3. Automatically discover what should be a skill. Instead of relying on a human to say "now we are learning to kick a ball", the agent should be able to discover on its own that it is learning something interesting.
  4. Use play and intrinsic motivation: have the agent experiment with its environment without supervision, like children at play. The agent discovers the effects of its actions, tries to reproduce interesting events (if the ball falls, try to make it fall again), and learns skills automatically.
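Step 4 can be illustrated with a minimal, hypothetical sketch: intrinsic motivation implemented as a count-based novelty bonus. The agent receives no external reward at all; it simply prefers the neighbouring state it has visited least, and that preference alone drives it to explore the whole environment ("play" without supervision). The corridor environment and the `1/sqrt(count)` bonus are illustrative assumptions, one simple form of curiosity among many.

```python
import math

# Hypothetical corridor with cells 0..7 and no goal or task reward.
N = 8
counts = [0] * N           # visit counts per state

def novelty(s):
    # Intrinsic reward: rarely visited states pay more.
    return 1.0 / math.sqrt(counts[s] + 1)

s = 0
counts[s] += 1
for _ in range(40):
    left, right = max(0, s - 1), min(N - 1, s + 1)
    # Greedy with respect to the intrinsic reward only:
    # move toward the less-visited neighbouring state.
    s = left if novelty(left) > novelty(right) else right
    counts[s] += 1

# Every state ends up visited, with no external reward signal at all.
```

A task reward can later be added on top of (or instead of) the novelty bonus, so that the skills discovered during play are reused to solve goals provided by humans, as the project's final step describes.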

At the end of this project, the agent should be able to play and learn autonomously, building a large set of skills that it can then combine to quickly solve new tasks whose goals are provided by humans. Learning to deliver mail is easier when you don't have to first learn what a door knob looks like.

Involved members: