  • Publication
    Out-of-Distribution Detection for Reinforcement Learning Agents with Probabilistic Dynamics Models
    (2023)
    Schmoeller da Roza, Felippe; Günnemann, Stephan
    Reliability of reinforcement learning (RL) agents is a largely unsolved problem. Especially in situations that substantially differ from their training environment, RL agents often exhibit unpredictable behavior, potentially leading to performance loss, safety violations, or catastrophic failure. Reliable decision-making agents should therefore be able to raise an alert whenever they encounter situations they have never seen before and do not know how to handle. While the problem, also known as out-of-distribution (OOD) detection, has received considerable attention in other domains such as image classification or sensory data analysis, it is less frequently studied in the context of RL. In fact, there is not even a common understanding of what OOD actually means in RL. In this work, we want to bridge this gap and approach the topic of OOD in RL from a general perspective. For this, we formulate OOD in RL as severe perturbations of the Markov decision process (MDP). To detect such perturbations, we introduce a predictive algorithm utilizing probabilistic dynamics models and bootstrapped ensembles. Since existing benchmarks are sparse and limited in their complexity, we also propose a set of evaluation scenarios with OOD occurrences. A detailed analysis of our approach shows superior detection performance compared to existing baselines from related fields.
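    A minimal sketch of the kind of detector the abstract describes: a bootstrapped ensemble of probabilistic (Gaussian) dynamics models whose predictive likelihood and mutual disagreement are turned into an OOD score. The model class (linear-Gaussian stand-in), the score combination, and all names and hyperparameters below are illustrative assumptions, not the paper's implementation.

```python
# Sketch: OOD scoring for RL transitions with a bootstrapped ensemble of
# probabilistic dynamics models. Everything here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

class GaussianDynamicsModel:
    """Linear-Gaussian stand-in for a probabilistic dynamics network."""
    def fit(self, sa, s_next):
        # Least-squares fit of the next state; residual variance serves as
        # the (homoscedastic) predictive variance of this ensemble member.
        X = np.hstack([sa, np.ones((len(sa), 1))])
        self.W, *_ = np.linalg.lstsq(X, s_next, rcond=None)
        resid = s_next - X @ self.W
        self.var = resid.var(axis=0) + 1e-6
        return self

    def predict(self, sa):
        X = np.hstack([sa, np.ones((len(sa), 1))])
        return X @ self.W, np.broadcast_to(self.var, (len(sa), self.var.size))

def fit_bootstrapped_ensemble(sa, s_next, n_models=5):
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(sa), size=len(sa))   # bootstrap resample
        models.append(GaussianDynamicsModel().fit(sa[idx], s_next[idx]))
    return models

def ood_score(models, sa, s_next):
    """Higher = more out-of-distribution. Combines per-model negative
    log-likelihood of the observed transition with ensemble disagreement."""
    means, variances = zip(*(m.predict(sa) for m in models))
    means, variances = np.stack(means), np.stack(variances)
    nll = 0.5 * (np.log(2 * np.pi * variances)
                 + (s_next - means) ** 2 / variances).sum(-1).mean(0)
    disagreement = means.var(axis=0).sum(-1)
    return nll + disagreement

# Toy usage: fit on in-distribution transitions, score a shifted batch.
sa_train = rng.normal(size=(1000, 6))                  # state-action pairs
s_next_train = sa_train[:, :4] + 0.1 * rng.normal(size=(1000, 4))
ensemble = fit_bootstrapped_ensemble(sa_train, s_next_train)
sa_test = rng.normal(size=(10, 6)) + 5.0               # perturbed inputs
print(ood_score(ensemble, sa_test, sa_test[:, :4]))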
  • Publication
    Safe and Efficient Operation with Constrained Hierarchical Reinforcement Learning
    (2023)
    Schmoeller da Roza, Felippe; Günnemann, Stephan
    Hierarchical Reinforcement Learning (HRL) holds the promise of enhancing the sample efficiency and generalization capabilities of Reinforcement Learning (RL) agents by leveraging task decomposition and temporal abstraction, which aligns with human reasoning. However, the adoption of HRL (and RL in general) for real-world problems has been limited by, among other reasons, the lack of effective techniques for making agents adhere to safety requirements encoded as constraints, a common way of specifying the functional safety of safety-critical systems. While some constrained Reinforcement Learning methods exist in the literature, we show that regular flat policies can suffer performance degradation when dealing with safety constraints. To overcome this limitation, we propose a constrained HRL topology that separates planning and control, with constraint optimization carried out at the lower level of abstraction. Simulation experiments show that our approach maintains its performance while adhering to safety constraints, even in scenarios where the flat policy’s performance deteriorates when it tries to prioritize safety.
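    A structural sketch of the topology the abstract describes: a high-level planner proposes subgoals, and a low-level controller handles the safety constraint while tracking them. The Lagrangian multiplier update, the class names, and all numbers are assumptions chosen for illustration; the abstract only states that constraint optimization happens at the lower level, not how.

```python
# Sketch (not the paper's code): constrained hierarchical RL with planning
# separated from control; the low-level controller handles the cost constraint
# via a simple Lagrangian dual-ascent update (an assumed mechanism).
import numpy as np

rng = np.random.default_rng(0)

class HighLevelPlanner:
    """Proposes a subgoal for the next k steps; optimizes task return only."""
    def propose_subgoal(self, state):
        return state + rng.normal(scale=0.5, size=state.shape)  # placeholder heuristic

class ConstrainedController:
    """Low-level policy: trades off reward and cost through a multiplier."""
    def __init__(self, cost_limit=0.1, lr=0.05):
        self.cost_limit = cost_limit
        self.lam = 0.0          # Lagrange multiplier for the safety constraint
        self.lr = lr

    def act(self, state, subgoal):
        # Placeholder controller: move toward the subgoal, damped by lambda
        # so that a large multiplier yields more conservative actions.
        return (subgoal - state) / (1.0 + self.lam)

    def update_multiplier(self, avg_cost):
        # Dual ascent: grow lambda on constraint violation, shrink otherwise.
        self.lam = max(0.0, self.lam + self.lr * (avg_cost - self.cost_limit))

def run_episode(planner, controller, horizon=50, subgoal_every=10):
    state, episode_cost = np.zeros(2), 0.0
    subgoal = planner.propose_subgoal(state)
    for t in range(horizon):
        if t % subgoal_every == 0:
            subgoal = planner.propose_subgoal(state)       # temporal abstraction
        action = controller.act(state, subgoal)
        state = state + 0.1 * action + 0.01 * rng.normal(size=2)
        episode_cost += float(np.linalg.norm(state) > 1.0)  # toy safety cost
    controller.update_multiplier(episode_cost / horizon)
    return episode_cost

planner, controller = HighLevelPlanner(), ConstrainedController()
for ep in range(5):
    cost = run_episode(planner, controller)
    print(f"episode {ep}: cost={cost:.1f}, lambda={controller.lam:.3f}")
```

    The structural point mirrored here is that the constraint is enforced where low-level actions are generated, so the planner can continue to optimize for task return.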
  • Publication
    Domain Shifts in Reinforcement Learning: Identifying Disturbances in Environments
    (2021)
    Schmoeller Roza, Felippe; Günnemann, Stephan
    A significant drawback of End-to-End Deep Reinforcement Learning (RL) systems is that they return an action no matter what situation they are confronted with. This is true even for situations that differ entirely from those an agent has been trained for. Although crucial in safety-critical applications, dealing with such situations is inherently difficult. Various approaches have been proposed in this direction, such as robustness, domain adaptation, domain generalization, and out-of-distribution detection. In this work, we provide an overview of approaches to the more general problem of dealing with disturbances to the environment of RL agents and show how they struggle to provide clear boundaries when mapped to safety-critical problems. To mitigate this, we propose to formalize the changes in the environment in terms of the Markov Decision Process (MDP), resulting in a more formal framework for dealing with such problems. We apply this framework to an example real-world scenario and show how it helps to isolate safety concerns.
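    One plausible way to write down the formalization the abstract alludes to, as a sketch with assumed notation (the paper's exact definitions are not given here): an MDP tuple and a disturbance as a map that alters some of its components.

```latex
% Sketch of a disturbance formalization in MDP terms (assumed notation).
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
  \mathcal{M}' = d(\mathcal{M}) = (\mathcal{S}', \mathcal{A}', P', R', \gamma),
\]
\[
  \text{where the disturbance } d \text{ changes at least one of }
  \mathcal{S}, \mathcal{A}, P, R .
\]
% Example: a perturbed transition kernel P' models a physical change to the
% environment, while a change confined to the observed state space models a
% sensor fault; the reward R and dynamics P may remain untouched in the latter.
```

    Locating a disturbance in a specific component of the tuple is what allows different safety concerns to be mapped to different kinds of perturbation, which is the isolation the abstract refers to.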