
Towards Probabilistic Safety Guarantees for Model-Free Reinforcement Learning

2023, Schmoeller da Roza, Felippe; Roscher, Karsten; Günnemann, Stephan

Improving safety in model-free Reinforcement Learning is necessary if we expect to deploy such systems in safety-critical scenarios. However, most existing constrained Reinforcement Learning methods offer no formal guarantees for their constraint satisfaction properties. In this paper, we present the theoretical formulation of a safety layer that encapsulates model epistemic uncertainty over a distribution of constraint model approximations and can provide probabilistic guarantees of constraint satisfaction.
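
A minimal sketch of how such a safety layer might act at decision time, assuming an ensemble of learned constraint-cost models whose spread reflects epistemic uncertainty; the `predict` interface, the budget, and the quantile are illustrative assumptions, not the formulation from the paper:

```python
import numpy as np

def filter_safe_actions(state, candidate_actions, cost_models, budget, quantile=0.95):
    """Keep only actions whose predicted constraint cost stays within the
    budget with high probability, estimated over an ensemble of cost models."""
    safe = []
    for action in candidate_actions:
        # Each ensemble member provides one estimate of the constraint cost,
        # so the spread across members captures epistemic uncertainty.
        costs = np.array([m.predict(state, action) for m in cost_models])
        # A pessimistic upper quantile of the ensemble predictions must stay
        # below the budget before the action is considered admissible.
        if np.quantile(costs, quantile) <= budget:
            safe.append(action)
    return safe
```

Tightening the quantile trades task performance for a higher probability of constraint satisfaction, which is the kind of knob a probabilistic guarantee would be stated over.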


Safety Assurance with Ensemble-based Uncertainty Estimation and overlapping alternative Predictions in Reinforcement Learning

2023, Eilers, Dirk; Burton, Simon; Schmoeller da Roza, Felippe; Roscher, Karsten

A number of challenges are associated with the use of machine learning technologies in safety-related applications. These include the difficulty of specifying adequately safe behaviour in complex environments (specification uncertainty), ensuring predictably safe behaviour under all operating conditions (technical uncertainty) and arguing that the safety goals of the system have been met with sufficient confidence (assurance uncertainty). An assurance argument is therefore required that demonstrates that the effects of these uncertainties do not lead to an unacceptable level of risk during operation. A reinforcement learning model will predict an action in whatever state it is in - even in previously unseen states for which a valid (safe) outcome cannot be determined due to lack of training. Uncertainty estimation is a well-understood approach in machine learning to identify states with a high probability of an invalid action due to a lack of training experience, thus addressing technical uncertainty. However, the impact of alternative possible predictions, which may be equally valid (and represent a safe state), on uncertainty estimation in reinforcement learning is less clear and, to our knowledge, not well documented in the current literature. In this paper we build on previous work in which we investigated uncertainty estimation on simplified scenarios in a gridworld environment. Using model ensemble-based uncertainty estimation, we proposed an algorithm based on action count variance to deal with discrete action spaces, while considering an in-distribution action variance calculation to handle the overlap with alternative predictions. The method indicates potentially unsafe states when the agent is near out-of-distribution elements and can distinguish them from overlapping but equally valid alternative predictions. Here, we present these results within the context of a safety assurance framework and highlight the activities and evidence required to build a convincing safety argument. We show that our previous approach is able to act as an external observer and can fulfil the requirements of an assurance argumentation for systems based on machine learning with ontological uncertainty.
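
Viewed as an external observer, the approach described above can be pictured as a thin monitoring wrapper around the trained ensemble; the class below is a hypothetical sketch of that pattern (names and interfaces are assumptions, not the authors' implementation):

```python
class UncertaintyObserver:
    """Monitors an ensemble of trained policies and vetoes actions in states
    where the estimated uncertainty suggests the agent is out of distribution."""

    def __init__(self, ensemble, uncertainty_fn, threshold, fallback_action):
        self.ensemble = ensemble              # list of trained policy/Q models
        self.uncertainty_fn = uncertainty_fn  # e.g. an action-count-variance measure
        self.threshold = threshold            # calibrated on in-distribution states
        self.fallback_action = fallback_action

    def act(self, state):
        actions = [policy.select_action(state) for policy in self.ensemble]
        if self.uncertainty_fn(actions) > self.threshold:
            # Uncertainty too high: hand control to a safe fallback
            # (e.g. stop, or defer to a conventional controller).
            return self.fallback_action
        # Otherwise follow the ensemble's majority vote.
        return max(set(actions), key=actions.count)
```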


Ensemble-based Uncertainty Estimation with overlapping alternative Predictions

2022, Eilers, Dirk; Schmoeller da Roza, Felippe; Roscher, Karsten

A reinforcement learning model will predict an action in whatever state it is in. Even in previously unseen states, for which no distinct outcome can be determined, the model may not indicate that its prediction is unreliable. Methods for uncertainty estimation can be used to indicate this. Although uncertainty estimation is a well-known approach in Machine Learning, most of the available methods are not able to deal with the choice overlap that occurs in states where a reinforcement learning agent can take multiple actions with a similar performance outcome. In this work, we investigate uncertainty estimation on simplified scenarios in a gridworld environment. Using ensemble-based uncertainty estimation, we propose an algorithm based on action count variance (ACV) to deal with discrete action spaces and a calculation based on the in-distribution delta (IDD) of the action count variance to handle overlapping alternative predictions. To visualize the expressiveness of the model uncertainty we create heatmaps for different in-distribution (ID) and out-of-distribution (OOD) scenarios and propose an indicator for uncertainty. We show that the method is able to indicate potentially unsafe states when the agent faces novel elements in the OOD scenarios, while being able to distinguish uncertainty resulting from OOD instances from uncertainty caused by overlapping alternative predictions.
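
A rough sketch of the two quantities, assuming an ensemble of trained Q-networks with a `predict` method returning per-action values for a state; the exact ACV and IDD computations in the paper may differ, this only illustrates the idea of comparing the observed vote spread against an in-distribution reference:

```python
import numpy as np

def action_count_variance(q_ensemble, state, n_actions):
    """Variance of the per-action vote counts across the ensemble (ACV).
    If all members pick the same action the variance is maximal; if the
    votes are spread over several actions it drops."""
    votes = [int(np.argmax(q.predict(state))) for q in q_ensemble]
    counts = np.bincount(votes, minlength=n_actions)
    return counts.var()

def in_distribution_delta(acv_now, acv_reference):
    """Difference between the current ACV and a reference value measured on
    known in-distribution states (IDD), so that vote spread caused by equally
    valid alternative actions is not mistaken for OOD uncertainty."""
    return acv_now - acv_reference
```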


Domain Shifts in Reinforcement Learning: Identifying Disturbances in Environments

2021, Haider, Tom; Schmoeller Roza, Felippe; Eilers, Dirk; Roscher, Karsten; Günnemann, Stephan

A significant drawback of End-to-End Deep Reinforcement Learning (RL) systems is that they return an action no matter what situation they are confronted with. This is true even for situations that differ entirely from those an agent has been trained for. Although crucial in safety-critical applications, dealing with such situations is inherently difficult. Various approaches have been proposed in this direction, such as robustness, domain adaptation, domain generalization, and out-of-distribution detection. In this work, we provide an overview of approaches towards the more general problem of dealing with disturbances to the environment of RL agents and show how they struggle to provide clear boundaries when mapped to safety-critical problems. To mitigate this, we propose to formalize the changes in the environment in terms of the Markov Decision Process (MDP), resulting in a more formal framework when dealing with such problems. We apply this framework to an example real-world scenario and show how it helps to isolate safety concerns.
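
One way to make this formalization concrete is to treat a disturbance as replacing individual components of the MDP tuple; the dataclass below is only an illustrative sketch under that reading, not the exact framework from the paper:

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence

@dataclass(frozen=True)
class MDP:
    states: Sequence        # state space S
    actions: Sequence       # action space A
    transition: Callable    # P(s' | s, a)
    reward: Callable        # R(s, a)

def perturb(mdp: MDP, **changed) -> MDP:
    """A disturbance is expressed as a new MDP that differs from the training
    MDP in one or more components, e.g. a changed transition function
    (dynamics shift) or a changed state space (novel objects)."""
    return replace(mdp, **changed)

# Example: the agent was trained on `train_mdp`, but at deployment the
# dynamics differ -> a domain shift confined to the transition component.
# deployed_mdp = perturb(train_mdp, transition=new_transition_fn)
```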


Safe and Efficient Operation with Constrained Hierarchical Reinforcement Learning

2023, Schmoeller da Roza, Felippe; Roscher, Karsten; Günnemann, Stephan

Hierarchical Reinforcement Learning (HRL) holds the promise of enhancing the sample efficiency and generalization capabilities of Reinforcement Learning (RL) agents by leveraging task decomposition and temporal abstraction, which aligns with human reasoning. However, the adoption of HRL (and RL in general) to solve problems in the real world has been limited due to, among other reasons, the lack of effective techniques that make agents adhere to safety requirements encoded as constraints, a common practice to define the functional safety of safety-critical systems. While some constrained Reinforcement Learning methods exist in the literature, we show that regular flat policies can face performance degradation when dealing with safety constraints. To overcome this limitation, we propose a constrained HRL topology that separates planning and control, with constraint optimization achieved at the lower level of abstraction. Simulation experiments show that our approach maintains its performance while adhering to safety constraints, even in scenarios where the flat policy’s performance deteriorates when trying to prioritize safety.
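
A high-level sketch of this separation of planning and control, assuming a high-level policy that emits subgoals and a low-level policy trained with a constrained (e.g. Lagrangian) objective against a safety-cost signal; all names and the environment interface are placeholders:

```python
def run_episode(env, high_level_policy, low_level_policy, subgoal_horizon=10):
    """The high level plans in terms of subgoals; the low level tracks each
    subgoal and is the only level optimized against the safety constraint."""
    state, done = env.reset(), False
    total_reward, total_cost = 0.0, 0.0
    while not done:
        subgoal = high_level_policy.select_subgoal(state)
        for _ in range(subgoal_horizon):
            # The low-level policy conditions on (state, subgoal) and was
            # trained to keep its expected constraint cost below a budget.
            action = low_level_policy.select_action(state, subgoal)
            state, reward, cost, done = env.step(action)  # cost = safety signal
            total_reward += reward
            total_cost += cost
            if done:
                break
    return total_reward, total_cost
```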


Towards Safety Assurance of Uncertainty-Aware Reinforcement Learning Agents

2023, Schmoeller da Roza, Felippe; Hadwiger, Simon; Thorn, Ingo; Roscher, Karsten

The necessity of demonstrating that Machine Learning (ML) systems can be safe escalates with the ever-increasing expectation of deploying such systems to solve real-world tasks. While recent advancements in Deep Learning reignited the conviction that ML can perform at the human level of reasoning, the dimensionality and complexity added by Deep Neural Networks pose a challenge to using classical safety verification methods. While some progress has been made towards making verification and validation possible in the supervised learning landscape, works focusing on sequential decision-making tasks are still sparse. A particularly popular approach consists of building uncertainty-aware models, able to identify situations where their predictions might be unreliable. In this paper, we provide evidence, obtained in simulation, that uncertainty estimation can also help to identify scenarios in which Reinforcement Learning (RL) agents can cause accidents when facing obstacles semantically different from those experienced during training, with a focus on industrial-grade applications. We also discuss the aspects we consider necessary for building a safety assurance case for uncertainty-aware RL models.


AI in MedTech Production. Visual Inspection for Quality Assurance

2021, Roscher, Karsten

Automated visual inspection based on machine learning and computer vision algorithms is a promising approach to ensure the quality of critical medical implants and equipment. However, the limited availability of data and potentially unpredictable deep learning models pose major challenges to bringing such solutions to life and to the market. This talk addresses the open challenges as well as current research directions for dependable visual inspection in quality assurance of medical products.


Out-of-Distribution Detection for Reinforcement Learning Agents with Probabilistic Dynamics Models

2023, Haider, Tom; Roscher, Karsten; Schmoeller da Roza, Felippe; Günnemann, Stephan

Reliability of reinforcement learning (RL) agents is a largely unsolved problem. Especially in situations that substantially differ from their training environment, RL agents often exhibit unpredictable behavior, potentially leading to performance loss, safety violations or catastrophic failure. Reliable decision making agents should therefore be able to cast an alert whenever they encounter situations they have never seen before and do not know how to handle. While the problem, also known as out-of-distribution (OOD) detection, has received considerable attention in other domains such as image classification or sensory data analysis, it is less frequently studied in the context of RL. In fact, there is not even a common understanding of what OOD actually means in RL. In this work, we want to bridge this gap and approach the topic of OOD in RL from a general perspective. For this, we formulate OOD in RL as severe perturbations of the Markov decision process (MDP). To detect such perturbations, we introduce a predictive algorithm utilizing probabilistic dynamics models and bootstrapped ensembles. Since existing benchmarks are sparse and limited in their complexity, we also propose a set of evaluation scenarios with OOD occurrences. A detailed analysis of our approach shows superior detection performance compared to existing baselines from related fields.
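
A condensed sketch of this detection idea, assuming a bootstrapped ensemble of probabilistic dynamics models that each predict a distribution over the next state; the interface and the disagreement measure below are illustrative assumptions, not the exact algorithm from the paper:

```python
import numpy as np

def ood_score(dynamics_ensemble, state, action):
    """Disagreement between ensemble members about the next state. Large
    disagreement suggests (state, action) lies outside the training data."""
    # Each member predicts the mean of a Gaussian over the next state.
    means = np.stack([m.predict_mean(state, action) for m in dynamics_ensemble])
    # Epistemic uncertainty: spread of the member means, averaged over dimensions.
    return float(np.mean(np.var(means, axis=0)))

def is_ood(dynamics_ensemble, state, action, threshold):
    """Cast an alert when the score exceeds a threshold calibrated on
    in-distribution rollouts."""
    return ood_score(dynamics_ensemble, state, action) > threshold
```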


Safe Robot Navigation Using Constrained Hierarchical Reinforcement Learning

2022, Schmoeller da Roza, Felippe; Rasheed, Hassan; Roscher, Karsten; Ning, Xiangyu; Günnemann, Stephan

Safe navigation is one of the steps necessary for achieving autonomous control of robots. Among different algorithms that focus on robot navigation, Reinforcement Learning (and more specifically Deep Reinforcement Learning) has shown impressive results for controlling robots with complex and high-dimensional state representations. However, when methods for satisfying safety constraints are integrated into flat Reinforcement Learning policies, system performance can suffer. In this paper, we propose a constrained Hierarchical Reinforcement Learning framework with a safety layer used to modify the low-level policy to achieve a safer operation of the robot. Results obtained in simulation show that, compared to a constrained flat model, the proposed method is better at retaining performance while keeping the system in a safe region.
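
The safety layer that modifies the low-level policy could, for instance, minimally adjust a proposed action whenever a learned cost model predicts a constraint violation; the sampling-based sketch below only illustrates that pattern (a common alternative is an analytical projection), with hypothetical names and a continuous action vector assumed:

```python
import numpy as np

def safety_layer(state, proposed_action, cost_model, cost_limit,
                 n_samples=64, noise_scale=0.1):
    """If the proposed low-level action is predicted to violate the constraint,
    search nearby actions and return the closest one predicted to be safe."""
    if cost_model.predict(state, proposed_action) <= cost_limit:
        return proposed_action
    # Sample perturbations around the proposed action.
    candidates = proposed_action + noise_scale * np.random.randn(
        n_samples, proposed_action.shape[0])
    costs = np.array([cost_model.predict(state, a) for a in candidates])
    feasible = candidates[costs <= cost_limit]
    if len(feasible) == 0:
        return proposed_action  # no safer alternative found nearby
    # Return the feasible action closest to the original proposal.
    return feasible[np.argmin(np.linalg.norm(feasible - proposed_action, axis=-1))]
```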
