On A Study in Direct Policy Search

Faculty of Informatics - Academic Studies Administration

Start date: 10 May 2010

End date: 11 May 2010

The Faculty of Informatics is pleased to announce a seminar given by Daniel Wierstra, Technische Universität München, Germany.

DATE: Monday, May 10th 2010
PLACE: USI Università della Svizzera italiana, room A33 red building (Via G. Buffi 13)
TIME: 10.00

ABSTRACT:
Reinforcement learning in partially observable environments constitutes an important and challenging
problem. Since many value function and temporal difference methods have been shown to perform poorly
and even to diverge in non-Markovian settings, direct policy search methods may hold more promise.

The aim of this thesis is to advance the state of the art in direct policy search
and black box optimization as applied to reinforcement learning. Its contributions include
a taxonomy of reinforcement learning algorithms and four new algorithms:

(1) a novel algorithm that backpropagates recurrent policy gradients through time, thereby learning both memory
and a policy simultaneously using recurrent neural networks, in particular Long Short-Term
Memory (LSTM);

(2) an instantiation of the well-known Expectation-Maximization algorithm adapted to
learning policies in partially observable environments;

(3) Fitness Expectation-Maximization, a new black box search method derived from first principles;

(4) Natural Evolution Strategies, an alternative to conventional evolutionary methods that uses a Monte Carlo-estimated
natural gradient to incrementally update its search distribution.

Experimental results with these four methods demonstrate highly competitive performance on a variety of test problems
ranging from standard benchmarks to deep memory tasks to fine motor control in a car driving simulation.

HOST: Prof. Jürgen Schmidhuber