Title: Domain Shifts in Reinforcement Learning: Identifying Disturbances in Environments
Authors: Haider, Tom; Schmoeller Roza, Felippe; Eilers, Dirk; Roscher, Karsten; Günnemann, Stephan
Document type: Conference paper
Language: en
License: CC BY 4.0
Dates: 2022-03-15; 21.9.2021; 2021
Handle: https://publica.fraunhofer.de/handle/publica/412092
DOI: 10.24406/publica-fhg-412092
Keywords: reinforcement learning; RL; safety critical; Markov Decision Process; MDP; safety; Safe Intelligence; robustness; domain shift; out of distribution

Abstract: A significant drawback of end-to-end Deep Reinforcement Learning (RL) systems is that they return an action no matter what situation they are confronted with. This is true even for situations that differ entirely from those the agent has been trained for. Although crucial in safety-critical applications, dealing with such situations is inherently difficult. Various approaches have been proposed in this direction, such as robustness, domain adaptation, domain generalization, and out-of-distribution detection. In this work, we provide an overview of approaches to the more general problem of dealing with disturbances to the environment of RL agents and show how they struggle to provide clear boundaries when mapped to safety-critical problems. To mitigate this, we propose to formalize the changes in the environment in terms of the Markov Decision Process (MDP), resulting in a more formal framework for dealing with such problems. We apply this framework to an example real-world scenario and show how it helps to isolate safety concerns.
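Note on the MDP formalization (illustrative sketch only, not the paper's actual decomposition): the abstract states that changes in the environment are formalized in terms of the MDP. Assuming the standard tuple notation, one way such a disturbance can be expressed is as a perturbation of a single component of the tuple, for instance the transition kernel:

\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)
\quad\longrightarrow\quad
\tilde{\mathcal{M}} = (\mathcal{S}, \mathcal{A}, \tilde{P}, R, \gamma),
\qquad
\tilde{P}(s' \mid s, a) \neq P(s' \mid s, a) \ \text{for some } (s, a) \in \mathcal{S} \times \mathcal{A},
\]

where a policy \(\pi\) trained on \(\mathcal{M}\) is then deployed in the disturbed environment \(\tilde{\mathcal{M}}\). Analogous perturbations of the state/observation space \(\mathcal{S}\) or the reward function \(R\) would capture other kinds of domain shift; the paper's specific framework may differ from this sketch.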