# Algorithm Design using Reinforcement Learning

## Problem Setting

There are many ways to solve a given problem. Therefore when faced with a problem we must choose an algorithm to solve it (*the algorithm selection problem* [1]). Typically the best algorithm depends on the context in which it is used (e.g. problem instance, execution environment,...).

Furthermore, algorithms for solving the problem at hand "out of the box" do not necessarily exist, instead we might only have a vague solution approach. Therefore, when trying to implement this algorithmic concept we are more often than not forced to make further design decisions. Even if good algorithms solving already exist, these methods often have many parameters/components that affect their performance which must be tuned and configured appropriately (*the configuration problem* [2]).

In practice, we often lack the expert knowledge to make these choices, or it might be even impossible to make these (at design time) as the context information on which the best choice depends might be uncertain.

To date, algorithms for many real-world problems are most commonly designed using a manual ad-hoc, trial & error approach and as a consequence is a tedious, time-consuming and costly process, often leading to mediocre results.

Therefore automating algorithm design has been an active field of research for decades. Here, the algorithm selection problem and configuration problem are traditionally studied separately and in a domain-specific setting. A general methodology for automatically designing a context-aware algorithm for a given problem is still an open problem.

## Objectives

In this master thesis topic, you would study a possible approach to solve the problem outlined above:

1- Formulate the Algorithm Design Problem as a Markov Decision Process (MDP), where

- State encodes context information.
- Actions correspond to an
*algorithmic choice*(e.g. using specific components/parameter value).

2- Solve this MDP (e.g. using reinforcement learning (RL) [3])

**Advantages/potential of this approach:**

- MDP's are mathematically well-founded.
- We can exploit one of the many techniques to find a provably optimal configuration (i.e. optimal policy).
- As a bonus, we get an explicit credit assignment, i.e. the value of making a certain algorithmic choice, in a specific context (i.e. Q-function).
- Using on-policy RL we can change our configuration online (while solving), to correct for offline misconfiguration.

You're encouraged to follow an incremental approach:

- A Proof of Concept, applying the methodology to a well-understood problem (e.g. sorting).
- Application to a real-world problem, where the best design choices are (not yet) fully understood.

Here, the choice of problem setting is largely free. One possible application domain would be 'Cross-domain Metaheuristic Optimization' (using hyper-heuristics) [4,5]

Finally, other approaches (besides MDP, RL) to automated algorithm design can be explored and pursued if they prove to be more promising.

### Contact

Contact Steven Adriaensen (steven.adriaensen@vub.ac.be) for more information about this topic (Office: CoMo lab, 10G711).

### Bibliography

[1] Rice, John R. "The algorithm selection problem." (1975).

[2] Hutter, Frank, et al. "ParamILS: an automatic algorithm configuration framework." *Journal of Artificial Intelligence Research* 36.1 (2009): 267-306.

[3] Barto, Andrew G. *Reinforcement learning: An introduction*. MIT press, 1998.

[4] Hoos, Holger H., and Thomas Stützle. *Stochastic local search: Foundations & applications*. Elsevier, 2004.

[5] Burke, Edmund K., et al. "Hyper-heuristics: A survey of the state of the art." *Journal of the Operational Research Society* 64.12 (2013): 1695-1724.