
On Generation of Representations for Reinforcement Learning

Dean's Office - Faculty of Informatics

Start date: September 4, 2012

End date: September 5, 2012

You are cordially invited to attend the PhD Dissertation Defense of Yi SUN on Tuesday, September 4, 2012, at 09:00 in room A24 (Red building).

Abstract:
Creating autonomous agents that learn to act from sequential interactions has long been perceived as one of the ultimate goals of Artificial Intelligence (AI). Reinforcement Learning (RL), a subfield of Machine Learning (ML), addresses important aspects of this objective. This dissertation investigates a particular problem encountered in RL called representation generation. Two related sub-problems are considered, namely basis generation and model learning, for which we present three original studies.

In the first study, we consider a particular basis generation method called online kernel sparsification (OKS). OKS was originally proposed for recursive least-squares regression and was soon extended to RL. Despite the popularity of the method, important theoretical questions remain open. In particular, it was unclear how fast the size of the OKS dictionary, or equivalently the number of basis functions constructed, grows with the number of data points. Characterizing this growth rate is crucial for understanding OKS, both in terms of its computational complexity and, perhaps more importantly, the generalization properties of the resulting linear regressor or value function estimator. We investigate this problem using a novel formula expressing the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Based on this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we prove that under certain technical conditions, the size of the dictionary always grows sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor or value function estimator constructed from the resulting dictionary is consistent.
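The abstract does not give implementation details; the following is a minimal sketch of how an OKS-style dictionary can be grown with an approximate-linear-dependence test, so that the sub-linear growth of the dictionary can be observed empirically. The Gaussian kernel, bandwidth, and threshold `nu` are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

# Minimal sketch of online kernel sparsification (OKS) via an approximate
# linear dependence (ALD) test. Kernel choice, bandwidth, and threshold `nu`
# are illustrative assumptions.

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def oks_dictionary(samples, nu=0.1, bandwidth=1.0):
    """Add a sample to the dictionary only if it cannot be approximated
    (in feature space) by the current dictionary elements."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        # Kernel Gram matrix of the dictionary and cross-kernel vector.
        K = np.array([[rbf_kernel(a, b, bandwidth) for b in dictionary]
                      for a in dictionary])
        k = np.array([rbf_kernel(a, x, bandwidth) for a in dictionary])
        # ALD residual: k(x, x) - k^T K^{-1} k (regularized solve for stability).
        coeffs = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        residual = rbf_kernel(x, x, bandwidth) - k @ coeffs
        if residual > nu:
            dictionary.append(x)
    return dictionary

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(2000, 2))
    d = oks_dictionary(data, nu=0.1)
    # The dictionary size typically grows much more slowly than the data.
    print(f"{len(data)} samples -> dictionary of size {len(d)}")
```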

In the second study, we turn to a different class of basis generation methods, which make use of the reward information. Previous approaches in this setting construct a series of basis functions which, in sufficient number, can eventually represent the value function. In contrast, we show theoretically that there is a single, ideal basis function whose addition to the set of basis functions immediately reduces the error to zero, without changing existing weights. Moreover, this ideal basis function is simply the value function that results from replacing the MDP's reward function with its Bellman error. This result suggests a novel method for improving value function estimation: a primary reinforcement learner estimates its value function using its present basis functions; it then sends its TD error to a secondary learner, which interprets that error as a reward function and estimates the corresponding value function; the resulting value function then becomes the primary learner's new basis function. We present both batch and online versions in combination with incremental basis projection, and demonstrate performance superior to existing methods, especially in the case of large discount factors.
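As a sanity check of the stated identity, the following sketch builds a small random MDP under a fixed policy and verifies numerically that the value function of the Bellman error, added to the current estimate, recovers the true value function exactly. The random MDP and the crude initial estimate are illustrative assumptions, not examples from the dissertation.

```python
import numpy as np

# Sketch verifying the "ideal basis function" identity on a random MDP with a
# fixed policy: the value function of the Bellman error, added to the current
# estimate, recovers the true value function.

rng = np.random.default_rng(1)
n_states, gamma = 6, 0.95

# Random transition matrix under a fixed policy, and a random reward vector.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
R = rng.random(n_states)

# True value function: V* = (I - gamma * P)^{-1} R.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, R)

# Some imperfect current estimate (e.g. from a limited basis).
V_approx = np.zeros(n_states)

# Bellman error of the current estimate.
bellman_error = R + gamma * P @ V_approx - V_approx

# Ideal basis function: value function of the MDP whose reward is the Bellman error.
phi = np.linalg.solve(np.eye(n_states) - gamma * P, bellman_error)

# Adding phi (with weight 1) to the current estimate recovers V* exactly.
assert np.allclose(V_approx + phi, V_true)
print("V_approx + phi matches the true value function.")
```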

In the last study, we focus on the problem of model learning, specifically the intelligent learning of the transition model of the environment. The problem is investigated under a Bayesian framework, where learning is performed through probabilistic inference, and learning progress is measured using Shannon information gain. In this setting, we show that the problem can be formulated as an RL problem in which the reward is given by the immediate information gain from performing the next action. This shows that the model-learning problem can in principle be solved using algorithms developed for RL. In particular, we show theoretically that if the environment is an MDP, then near-optimal model learning can be achieved following this approach.
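The following is a minimal sketch of an information-gain reward under one common modelling assumption, namely independent Dirichlet posteriors over each transition distribution P(.|s, a); the dissertation's exact formulation may differ. The small environment, the uniform prior, and the helper names are illustrative.

```python
import numpy as np
from scipy.special import gammaln, digamma

# Sketch of an information-gain reward for Bayesian model learning, assuming
# independent Dirichlet posteriors over each row P(.|s, a) of an MDP's
# transition model.

def dirichlet_kl(alpha_new, alpha_old):
    """KL divergence KL(Dir(alpha_new) || Dir(alpha_old))."""
    a0_new, a0_old = alpha_new.sum(), alpha_old.sum()
    return (gammaln(a0_new) - gammaln(a0_old)
            - np.sum(gammaln(alpha_new) - gammaln(alpha_old))
            + np.sum((alpha_new - alpha_old)
                     * (digamma(alpha_new) - digamma(a0_new))))

def information_gain(counts, s, a, s_next):
    """Reward = information gained about P(.|s, a) from observing s_next."""
    alpha_old = counts[s, a].copy()
    alpha_new = alpha_old.copy()
    alpha_new[s_next] += 1.0
    return dirichlet_kl(alpha_new, alpha_old)

if __name__ == "__main__":
    n_states, n_actions = 4, 2
    counts = np.ones((n_states, n_actions, n_states))  # uniform Dirichlet prior
    # A first observation of a transition is relatively informative ...
    print(information_gain(counts, s=0, a=0, s_next=1))
    # ... while repeating the same observation yields diminishing gains.
    counts[0, 0, 1] += 50
    print(information_gain(counts, s=0, a=0, s_next=1))
```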

Dissertation Committee:

  • Prof. Jürgen Schmidhuber, Università della Svizzera italiana/IDSIA, Switzerland (Research Advisor)
  • Prof. Rolf Krause, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Kai Hormann, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Marcus Hutter, Australian National University, Australia (External Member)
  • Prof. Richard S. Sutton, University of Alberta, Canada (External Member)
  • Prof. Marco Wiering, University of Groningen, The Netherlands (External Member)