
Thema4568: Difference between revisions

(Page created with: "{{Abschlussarbeit |Titel=Constrained Model-Based Policy Optimization for Real World Applications |Vorname=Moritz |Nachname=Zanger |Abschlussarbeitstyp=Master |…")

(One intermediate revision by the same user not shown)
Line 1:
 {{Abschlussarbeit
− |Titel=Constrained Model-Based Policy Optimization for Real World Applications
+ |Titel=Uncertainty-Aware Constrained Model-Based Policy Optimization
 |Vorname=Moritz
 |Nachname=Zanger
Line 7:
 |Partner=FZI Forschungszentrum Informatik
 |Forschungsgruppe=Angewandte Technisch-Kognitive Systeme
− |Abschlussarbeitsstatus=In Bearbeitung
+ |Abschlussarbeitsstatus=Abgeschlossen
 |Beginn=2020/03/30
− |Beschreibung DE=Safety and sample efficiency are among the most urgent challenges faced by real-world applications of current reinforcement learning algorithms. Recent developments in model-based reinforcement learning have made significant progress in both sample efficiency and asymptotic performance, which is often limited by the model bias problem.
− The overall goal of this thesis is to derive a sample-efficient, safe policy search algorithm by leveraging recent results in model-based reinforcement learning and safety-driven reinforcement learning algorithms. The combination of model-based methods with safety-driven approaches is motivated by the notion that the agent can safely explore and improve within model rollouts, thus reducing the total amount of risky real-domain interactions needed.
+ |Abgabe=2020/11/01
+ |Beschreibung DE=Safety and sample efficiency are among the most urgent challenges faced by current reinforcement learning (RL) algorithms in real-world applications.
+ The goal of this thesis is to derive a sample-efficient, safe policy search algorithm based on recent developments in model-based reinforcement learning and safety-driven RL algorithms. The combination of model-based methods with safety-driven approaches is motivated by the idea that the agent can safely explore and improve with imagined, model-generated interactions, thus reducing the amount of risky real-domain interactions needed.
+ 
+ Furthermore, emphasis should be put on dealing with model inaccuracies, often referred to as model bias, which may be exploited by a learning agent, leading to worse performance and safety concerns.
+ 
+ In particular, the key idea of the proposed approach is to extend the performance bounds described in Constrained Policy Optimization to uncertain state transition probabilities. To maintain the theoretical monotonic improvement guarantees described by Achiam et al., an approach for explicitly quantifying model-related epistemic uncertainty and bounding the resulting model errors shall be derived.
 }}

Current revision as of 9 March 2021, 16:06



Uncertainty-Aware Constrained Model-Based Policy Optimization


Moritz Zanger



Information on the thesis

Thesis type: Master
Supervisor: Mohammad Karam Daaboul
Research group: Angewandte Technisch-Kognitive Systeme
Partner: FZI Forschungszentrum Informatik
Archive number: 4568
Thesis status: Completed
Start: 30 March 2020
Submission: 1 November 2020

Further information

Safety and sample efficiency are among the most urgent challenges faced by current reinforcement learning (RL) algorithms in real-world applications. The goal of this thesis is to derive a sample-efficient, safe policy search algorithm based on recent developments in model-based reinforcement learning and safety-driven RL algorithms. The combination of model-based methods with safety-driven approaches is motivated by the idea that the agent can safely explore and improve with imagined, model-generated interactions, thus reducing the amount of risky real-domain interactions needed.
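As an illustration of this idea, the following minimal Python sketch generates short "imagined" rollouts by branching a learned dynamics model off real states; the object names (dynamics_model, policy, their methods) and the rollout horizon are illustrative assumptions, not the implementation used in the thesis.

# Minimal sketch (not the thesis' implementation): collect model-generated
# transitions so the policy can be improved with few risky real interactions.
# DynamicsModel/Policy interfaces and rollout_horizon are assumptions.
import numpy as np

def imagined_rollouts(dynamics_model, policy, real_states, rollout_horizon=5):
    """Branch short model rollouts from real states and return the
    imagined (state, action, reward, cost, next_state) transitions."""
    model_transitions = []
    states = np.asarray(real_states)
    for _ in range(rollout_horizon):
        actions = policy.act(states)                               # a ~ pi(.|s)
        next_states, rewards, costs = dynamics_model.predict(states, actions)
        model_transitions.extend(zip(states, actions, rewards, costs, next_states))
        states = next_states
    return model_transitions

Transitions collected this way can augment the scarce and potentially risky real-world data used for policy improvement.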

Furthermore, emphasis should be put on dealing with model inaccuracies, often referred to as model bias, which may be exploited by a learning agent, leading to worse performance and safety concerns.
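One common way to make such model inaccuracies explicit is to train an ensemble of dynamics models and use their disagreement as a proxy for epistemic uncertainty; the sketch below assumes hypothetical ensemble members with a predict_mean method and is not taken from the thesis.

# Illustrative sketch only: ensemble disagreement as an epistemic-uncertainty
# proxy. The ensemble and its predict_mean() signature are assumptions.
import numpy as np

def epistemic_uncertainty(ensemble, states, actions):
    """Std. deviation of next-state predictions across ensemble members,
    averaged over state dimensions; large values flag unreliable model regions."""
    preds = np.stack([m.predict_mean(states, actions) for m in ensemble], axis=0)
    return preds.std(axis=0).mean(axis=-1)

Model rollouts that enter regions of high disagreement can then be truncated or down-weighted so that the agent cannot exploit model errors.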

In particular, the key idea of the proposed approach is to extend the performance bounds derived in Constrained Policy Optimization to settings with uncertain state transition probabilities. To maintain the theoretical monotonic improvement guarantees described by Achiam et al., an approach for explicitly quantifying model-related epistemic uncertainty and bounding the resulting model errors shall be derived.
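For reference, the trust-region-style improvement bound of Achiam et al. (2017), which the thesis sets out to extend to learned, uncertain transition models, has roughly the following form; the notation is standard but may differ from the thesis, and the uncertainty-dependent correction term the thesis aims to derive is not reproduced here.

% CPO-style lower bound on policy improvement (after Achiam et al., 2017);
% d^pi denotes the discounted state distribution under the current policy pi.
\[
J(\pi') - J(\pi) \;\ge\; \frac{1}{1-\gamma}\,
\mathbb{E}_{s \sim d^{\pi},\, a \sim \pi'}\!\left[
  A^{\pi}(s,a) \;-\; \frac{2\gamma\,\epsilon^{\pi'}}{1-\gamma}\,
  D_{\mathrm{TV}}\bigl(\pi'(\cdot \mid s)\,\|\,\pi(\cdot \mid s)\bigr)
\right],
\qquad
\epsilon^{\pi'} = \max_{s}\bigl|\mathbb{E}_{a \sim \pi'}[A^{\pi}(s,a)]\bigr|.
\]

Analogous bounds hold for the auxiliary cost functions, which is what enables constrained updates; accounting for the mismatch between the learned model and the true dynamics would add a further, uncertainty-dependent error term to bounds of this kind.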