Reinforcement Learning with General Evaluators and Generators of Policies

Dean's Office - Faculty of Informatics

Date: 15 February 2024 / 11:00 - 12:30

USI East Campus, Room D0.03

You are cordially invited to attend the PhD Dissertation Defence of Francesco Faccio on Thursday 15 February 2024 at 11:00 in room D0.03, East Campus.

Abstract:
Reinforcement Learning (RL) is a subfield of Artificial Intelligence that studies how machines can make decisions by learning from their interactions with an environment. The key aspect of RL is evaluating and improving policies, which dictate the behavior of artificial agents by mapping sensory input to actions. Typically, RL algorithms evaluate these policies using a value function that is specific to one policy. However, when value functions are updated to track the learned policy, they can forget potentially useful information about previous policies.

To address the problem of generalization across many policies, we introduce Parameter-Based Value Functions (PBVFs), a class of value functions that take policy parameters as inputs. A PBVF is a single model capable of evaluating the performance of any policy, given a state, a state-action pair, or a distribution over the RL agent's initial states, and it generalizes across different policies. We derive off-policy actor-critic algorithms based on PBVFs. To feed the policy into the value function, we employ a technique called policy fingerprinting. This method compresses the policy parameters, making PBVFs invariant to changes in the policy architecture. The resulting policy embedding extracts crucial abstract knowledge about the environment, distilled into a small set of states that suffices to fully characterize the behavior of various policies. A policy can be improved solely by modifying its actions in these states, following the gradient of the value function's predictions.

Extensive experiments show that our method outperforms evolutionary algorithms, providing a more efficient direct search in policy space, and achieves performance comparable to that of competitive continuous control algorithms. We apply this technique to learn useful representations of Recurrent Neural Network weight matrices, showing its effectiveness in several supervised learning tasks. Lastly, we empirically demonstrate how this approach can be integrated with HyperNetworks to train a single goal-conditioned neural network (NN) capable of generating deep NN policies that achieve any desired return observed during training.

Zoom link: https://kaust.zoom.us/j/92012946881
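For readers unfamiliar with the idea, the sketch below illustrates one plausible reading of a parameter-based value function with policy fingerprinting: a critic keeps a small set of learnable probe states, queries the policy on them, and predicts the policy's return from the resulting probe actions. This is a minimal PyTorch example written for this announcement, not the dissertation's implementation; names such as ParameterBasedValueFunction and n_probe_states are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParameterBasedValueFunction(nn.Module):
    """Evaluates any policy by probing it on a small set of learnable states."""
    def __init__(self, state_dim, action_dim, n_probe_states=8, hidden=256):
        super().__init__()
        # Learnable probe states: the "fingerprint" inputs shown to the policy.
        self.probe_states = nn.Parameter(torch.randn(n_probe_states, state_dim))
        # MLP mapping the policy's probe actions to a predicted return.
        self.head = nn.Sequential(
            nn.Linear(n_probe_states * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, policy):
        # The policy's parameters enter the critic only through the actions it
        # takes on the probe states, so the critic does not depend on the
        # policy's architecture.
        probe_actions = policy(self.probe_states)                # (n_probe, action_dim)
        return self.head(probe_actions.flatten().unsqueeze(0))   # predicted return

# Illustrative usage: improve a policy by ascending the critic's prediction.
state_dim, action_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
critic = ParameterBasedValueFunction(state_dim, action_dim)

policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
predicted_return = critic(policy)
(-predicted_return).backward()   # gradient ascent on the predicted return
policy_opt.step()
```

In a full actor-critic loop the critic itself would also be trained, e.g. by regressing its predictions onto observed returns of previously seen policies; the snippet above only shows the policy-improvement direction implied by the abstract.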

Dissertation Committee:
- Prof. Jürgen Schmidhuber, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Cesare Alippi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Rolf Krause, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Alex Graves, NNAISENSE, United Kingdom (External Member)
- Prof. Marcello Restelli, Politecnico di Milano, Italy (External Member)