
Domain Shifts in Reinforcement Learning: Identifying Disturbances in Environments

Authors: Haider, Tom; Schmoeller Roza, Felippe; Eilers, Dirk; Roscher, Karsten; Günnemann, Stephan

Fulltext urn:nbn:de:0011-n-6404410 (3.1 MByte PDF)
MD5 Fingerprint: 2561cc4fe4bf880de1dfeac247372752
License: CC BY
Created on: 21.9.2021

Espinoza, H.:
Workshop on Artificial Intelligence Safety, AISafety 2021. Proceedings. Online resource : Co-located with the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI 2021), Virtual, August 2021
Online on the WWW, 2021 (CEUR Workshop Proceedings 2916)
Paper 11, 7 pp.
Workshop on Artificial Intelligence Safety (AISafety) <2021, Online>
International Joint Conference on Artificial Intelligence (IJCAI) <30, 2021, Online>
Bayerisches Staatsministerium für Wirtschaft, Landesentwicklung und Energie StMWi

Conference Paper, Electronic Publication
Fraunhofer IKS
reinforcement learning; RL; safety critical; Markov Decision Process; MDP; safety; Safe Intelligence; robustness; domain shift; out of distribution

A significant drawback of End-to-End Deep Reinforcement Learning (RL) systems is that they return an action no matter what situation they are confronted with. This is true even for situations that differ entirely from those an agent has been trained for. Although crucial in safety-critical applications, dealing with such situations is inherently difficult. Various approaches have been proposed in this direction, such as robustness, domain adaptation, domain generalization, and out-of-distribution detection. In this work, we provide an overview of approaches towards the more general problem of dealing with disturbances to the environment of RL agents and show how they struggle to provide clear boundaries when mapped to safety-critical problems. To mitigate this, we propose to formalize the changes in the environment in terms of the Markov Decision Process (MDP), resulting in a more formal framework when dealing with such problems. We apply this framework to an example real-world scenario and show how it helps to isolate safety concerns.