Supervisor: Mohammd Karam Daaboul
Research group: Angewandte Technisch-Kognitive Systeme
Partner: FZI Forschungszentrum Informatik
Start: 30 March 2020
Safety and sample efficiency are among the most urgent challenges that current reinforcement learning (RL) algorithms face in real-world applications. The goal of this thesis is to derive a sample-efficient, safe policy search algorithm based on recent developments in model-based reinforcement learning and safety-driven RL algorithms. Combining model-based methods with safety-driven approaches is motivated by the idea that the agent can explore and improve safely using imagined, model-generated interactions, which reduces the number of risky interactions required in the real domain.
Furthermore, emphasis should be put on dealing with model inaccuracies, often referred to as model bias. A learning agent may exploit these inaccuracies, which degrades performance and raises safety concerns.
In particular, the key idea of the proposed approach is to extend the performance bounds described in Constrained Policy Optimization to account for uncertain state-transition probabilities. To maintain the theoretical monotonic improvement guarantees described by Achiam et al., an approach for explicitly quantifying model-related epistemic uncertainty and constraining the resulting model errors shall be derived.
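The announcement does not say how epistemic uncertainty is to be quantified. As a purely illustrative sketch, one common proxy is the disagreement within a bootstrap ensemble of learned dynamics models. Everything below is an assumption made for the example and not part of the thesis: the 1-D toy system, the polynomial models, and the helper names `fit_ensemble` and `epistemic_std`.

```python
import numpy as np

def fit_ensemble(states, next_states, n_models=5, degree=3, seed=0):
    """Fit polynomial dynamics models on bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: sample the dataset with replacement.
        idx = rng.integers(0, len(states), size=len(states))
        coeffs = np.polyfit(states[idx], next_states[idx], deg=degree)
        models.append(coeffs)
    return models

def epistemic_std(models, query_states):
    """Standard deviation of the ensemble predictions at each query state."""
    preds = np.stack([np.polyval(c, query_states) for c in models])
    return preds.std(axis=0)

# Hypothetical 1-D system: s' = sin(s) + noise, observed only for s in [-1, 1].
rng = np.random.default_rng(42)
states = rng.uniform(-1.0, 1.0, size=200)
next_states = np.sin(states) + 0.05 * rng.normal(size=200)

models = fit_ensemble(states, next_states)

# Disagreement inside the data range versus far outside it.
in_data = epistemic_std(models, np.array([0.0]))[0]
out_of_data = epistemic_std(models, np.array([3.0]))[0]
print(f"disagreement at s=0: {in_data:.4f}, at s=3: {out_of_data:.4f}")
```

In this toy setting the ensemble members agree closely where data was collected and diverge when extrapolating. A safe model-based learner could, for instance, penalize or discard imagined transitions whose disagreement exceeds a threshold, although how such a signal would enter the performance bounds is exactly what the thesis is meant to work out.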