# Stable MultI-agent LEarnIng for neTworks (SMILE-IT)

SMILE-IT aims to extend and improve upon the state of the art in multi-agent reinforcement learning and network management in order to implement and validate a generic, stable, and robust multi-agent reinforcement learning framework, capable of (semi-)automatically managing modern networked systems (e.g., telecommunications networks, smart grids, air traffic routing, traffic control) through software.

## Papers

**Claessens, Bert J., Peter Vrancx, and Frederik Ruelens. "Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control." IEEE Transactions on Smart Grid (2016).** [pdf]

Abstract:

Direct load control of a heterogeneous cluster of residential demand flexibility sources is a high-dimensional control problem with partial observability. This work proposes a novel approach that uses a convolutional neural network to extract hidden state-time features to mitigate the curse of partial observability. More specifically, a convolutional neural network is used as a function approximator to estimate the state-action value function, or Q-function, in the supervised learning step of fitted Q-iteration. The approach is evaluated in a qualitative simulation, comprising a cluster of thermostatically controlled loads that only share their air temperature, whilst their envelope temperature remains hidden. The simulation results show that the presented approach is able to capture the underlying hidden features and successfully reduce the electricity cost of the cluster.
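To make the role of the "supervised learning step of fitted Q-iteration" concrete, here is a minimal sketch of the generic algorithm on a toy problem. It is not the paper's implementation: the convolutional network is swapped for a plain least-squares fit on one-hot features, and the five-state chain, the unit cost per step, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def features(state, action, n_states, n_actions):
    """One-hot encoding of a (state, action) pair (toy stand-in for the CNN input)."""
    phi = np.zeros(n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def fitted_q_iteration(batch, n_states, n_actions, gamma=0.9, n_iters=30):
    """batch: list of (state, action, cost, next_state, done) tuples."""
    X = np.array([features(s, a, n_states, n_actions) for s, a, _, _, _ in batch])
    w = np.zeros(n_states * n_actions)  # Q_0 = 0 everywhere
    for _ in range(n_iters):
        targets = []
        for _, _, cost, s_next, done in batch:
            if done:
                targets.append(cost)
            else:
                q_next = [features(s_next, a2, n_states, n_actions) @ w
                          for a2 in range(n_actions)]
                # Costs are minimised, so bootstrap from the cheapest next action.
                targets.append(cost + gamma * min(q_next))
        # Supervised learning step: regress the targets on the features.
        # The paper uses a CNN here; least squares is an assumption of this sketch.
        w, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)
    return w

def greedy_action(w, state, n_states, n_actions):
    q = [features(state, a, n_states, n_actions) @ w for a in range(n_actions)]
    return int(np.argmin(q))

# Hypothetical toy chain: states 0..4, action 0 moves left, action 1 moves right,
# every step costs 1, and reaching state 4 ends the episode.
n_states, n_actions = 5, 2

def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == n_states - 1
    return 1.0, s_next, done

batch = [(s, a, *step(s, a)) for s in range(n_states - 1) for a in range(n_actions)]
w = fitted_q_iteration(batch, n_states, n_actions)
policy = [greedy_action(w, s, n_states, n_actions) for s in range(n_states - 1)]
```

In this toy setting the greedy policy moves right from every state, since that reaches the terminal state at the lowest accumulated cost. Replacing the `np.linalg.lstsq` call with any other regressor leaves the outer loop unchanged, which is the property that allows a neural network to be used in that step.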

**Ruelens, Frederik, et al. "Direct Load Control of Thermostatically Controlled Loads Based on Sparse Observations Using Deep Reinforcement Learning." arXiv preprint.** [pdf]

Abstract:

This paper considers a demand response agent that must find a near-optimal sequence of decisions based on sparse observations of its environment. Extracting a relevant set of features from these observations is a challenging task and may require substantial domain knowledge. One way to tackle this problem is to store sequences of past observations and actions in the state vector, making it high dimensional, and apply techniques from deep learning. This paper investigates the capabilities of different deep learning techniques, such as convolutional neural networks and recurrent neural networks, to extract relevant features for finding near-optimal policies for a residential heating system and electric water heater that are hindered by sparse observations. Our simulation results indicate that in this specific scenario, feeding sequences of time-series to an LSTM network, which is a specific type of recurrent neural network, achieved a higher performance than stacking these time-series in the input of a convolutional neural network or deep neural network.
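The idea of storing sequences of past observations and actions in the state vector can be sketched as follows. This is an illustrative assumption, not code from the paper: the `HistoryState` class, its method names, the history depth, and the temperature values are all hypothetical. The sketch exposes the same history in two layouts, a per-time-step sequence (the kind of input a recurrent network such as an LSTM consumes) and a flat stacked vector (the kind of input a feed-forward network consumes).

```python
from collections import deque

import numpy as np

class HistoryState:
    """Keeps the last `depth` (observation, action) pairs as the agent's state."""

    def __init__(self, obs_dim, depth):
        self.depth = depth
        self.width = obs_dim + 1  # observation features plus one scalar action
        # Start with an all-zero history so the state always has a fixed size.
        self.buffer = deque([np.zeros(self.width)] * depth, maxlen=depth)

    def push(self, observation, action):
        """Append the newest pair; the oldest pair is dropped automatically."""
        row = np.append(np.asarray(observation, dtype=float), float(action))
        self.buffer.append(row)

    def as_sequence(self):
        """Shape (depth, obs_dim + 1): one row per time step, oldest first."""
        return np.stack(list(self.buffer))

    def as_flat_vector(self):
        """Shape (depth * (obs_dim + 1),): all time steps stacked into one vector."""
        return self.as_sequence().ravel()

# Hypothetical example: one temperature reading and one on/off action per step.
history = HistoryState(obs_dim=1, depth=3)
for temperature, action in [(20.0, 0), (20.5, 1), (21.0, 1), (21.4, 0)]:
    history.push([temperature], action)

seq = history.as_sequence()      # three most recent steps, oldest first
flat = history.as_flat_vector()  # same values, flattened
```

A larger `depth` gives the learner more context to compensate for sparse observations, at the price of the higher-dimensional state that the abstract mentions.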

## Patents

**Claessens, B and Vrancx, P. European patent application 16167240.7. Methods, controllers and systems for the control of distribution systems using a neural network architecture.**

Abstract:

A deep approximation neural network architecture is described which extrapolates data over unseen states for demand response applications in order to control distribution systems, such as product distribution systems, of which energy distribution systems (e.g., heat or electrical power distribution) are one example. The present invention describes a model-free control technique, mainly in the form of reinforcement learning (RL), whereby a controller learns from interaction with the system to be controlled to control product distribution systems, of which energy distribution systems (e.g., heat or electrical power distribution) are one example.