Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Benchmarking Uncertainty Estimation Methods for Deep Learning with Safety-Related Metrics

 
Authors: Henne, Maximilian; Schwaiger, Adrian; Roscher, Karsten; Weiß, Gereon

Fulltext urn:nbn:de:0011-n-5827235 (679 KByte PDF)
MD5 Fingerprint: 15bb9376d6d18bf67b35f7dee32b6db2
(CC) by
Created on: 25.3.2020


Espinoza, H.:
Workshop on Artificial Intelligence Safety, SafeAI 2020. Proceedings. Online resource : Co-located with 34th AAAI Conference on Artificial Intelligence (AAAI 2020), New York, USA, Feb 7, 2020
New York/N.Y., 2020 (CEUR Workshop Proceedings 2560)
http://ceur-ws.org/Vol-2560/
pp. 83-90
Workshop on Artificial Intelligence Safety (SafeAI) <2020, New York/NY>
Conference on Artificial Intelligence (AAAI) <34, 2020, New York/NY>
Bayerisches Staatsministerium für Wirtschaft, Landesentwicklung und Energie StMWi
BAYERN DIGITAL II; ADA-Center
ADA Lovelace Center for Analytics, Data and Applications
English
Conference Paper, Electronic Publication
Fraunhofer IKS
uncertainty estimation; deep learning; safety metrics; computer vision; safety; artificial intelligence; AI

Abstract
Deep neural networks generally give accurate predictions, but they often fail to recognize when these predictions may be wrong. This lack of awareness regarding the reliability of their outputs is a major obstacle to deploying such models in safety-critical applications. Some approaches try to address this problem by designing the models to give more reliable values for their uncertainty. However, even though the performance of these models is compared in various ways, there is no thorough evaluation comparing them in a safety-critical context using metrics that are designed to describe trade-offs between performance and safe system behavior. In this paper we attempt to fill this gap by evaluating and comparing several state-of-the-art methods for estimating uncertainty for image classification with respect to safety-related requirements and metrics that are suitable for describing the models' performance in safety-critical domains. We show the relationship between the remaining error for predictions with high confidence and its impact on performance for three common datasets. In particular, Deep Ensembles and Learned Confidence show high potential to significantly reduce the remaining error with only moderate performance penalties.

URL: http://publica.fraunhofer.de/documents/N-582723.html