Reinforcement learning is an approach to learning in which an agent interacts with its environment and optimizes its behaviour based on the rewards it receives. In complex tasks, learning can be excruciatingly slow due to sparse and uninformative rewards. Reward shaping addresses this problem by incorporating heuristic knowledge about the task to guide an agent's exploration.
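To make the idea concrete, here is a minimal sketch of potential-based reward shaping, the standard formulation in which the shaping term is the discounted difference of a potential function over states. The grid-world goal, the `negative_manhattan` heuristic, and the function names are assumptions made for illustration, not code from our papers.

```python
from typing import Callable, Tuple

State = Tuple[int, int]


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the environment reward plus F = gamma * phi(s') - phi(s)."""
    # Assumption: the potential of a terminal state is treated as zero,
    # a common convention in episodic tasks.
    next_potential = 0.0 if done else potential(next_state)
    return reward + gamma * next_potential - potential(state)


# Hypothetical grid world: the heuristic says "closer to the goal is better".
GOAL: State = (3, 3)


def negative_manhattan(state: State) -> float:
    """Heuristic potential: negative Manhattan distance to the goal."""
    return -float(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


# With a sparse environment reward of 0, the shaping term alone rewards
# steps toward the goal and penalizes steps away from it.
toward = shaped_reward(0.0, (0, 0), (0, 1), negative_manhattan, gamma=1.0)
away = shaped_reward(0.0, (0, 1), (0, 0), negative_manhattan, gamma=1.0)
```

The agent then learns from `shaped_reward(...)` instead of the raw reward; the heuristic only needs to be roughly right to give the agent a useful signal long before it first reaches the goal.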
In this context, we have made several contributions concerning what kind of information can be used and how it can be incorporated. We contributed to the field of transfer learning by proposing a reward shaping approach to policy transfer. The goal in policy transfer is to speed up learning in a new task by using behaviour designed for, or learned in, a different but similar task. Our approach benefits from the theoretical guarantees for reward shaping, while outperforming the state of the art in policy transfer:
As we move toward employing autonomous, general-purpose agents (robots) in home and industrial settings, one obvious way to train them for new tasks is to have their owners help them through instruction, demonstration, or advice. Towards that goal, we have proposed techniques to leverage such feedback using reward shaping:
Finally, when multiple shaping functions can be constructed (based on any type of knowledge: domain knowledge, transferred knowledge, demonstrations, advice, etc.), we propose to combine them in a learning ensemble, and we show that such an ensemble improves performance over its constituent components:
Code for the Pursuit experiments in our AAAI-14 paper is available for download.
Some of this work was covered in a tutorial I gave with Matthew E. Taylor at the ALA workshop at AAMAS-15. The slides are available here.
Evolutionary algorithms are meta-heuristics that solve hard problems by iteratively improving a set of candidate solutions. They can be powerful black-box techniques for quickly approximating high-quality solutions to otherwise intractable problems.
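As a rough illustration of that iterative improvement loop, the sketch below evolves bit strings on the toy OneMax problem (maximize the number of ones). The operators chosen here (truncation selection, one-point crossover, bit-flip mutation, a single elite) and all parameter values are assumptions picked for brevity; they are not the configuration used in our work.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def one_max(bits: Genome) -> int:
    """Toy fitness function: the number of ones in the bit string."""
    return sum(bits)


def evolve(
    fitness: Callable[[Genome], float],
    genome_length: int = 20,
    population_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Genome, List[float]]:
    """Run a simple generational evolutionary algorithm with elitism."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    best_history = []
    for _ in range(generations):
        # Rank candidates; the fitness function is treated as a black box.
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # truncation selection
        children = [population[0][:]]  # elitism: keep the best unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness), best_history


best, history = evolve(one_max)
```

Because the algorithm only ever calls `fitness`, swapping `one_max` for another scoring function is enough to apply the same loop to a different problem.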
In collaboration with fuzzy logic researchers, we developed the first approximate solver for satisfiability in fuzzy logics:
Furthermore, we looked into the effect that employing multiple interacting populations, as opposed to a single panmictic one, has on optimization: