  • Publication
    A Quantitative Human-Grounded Evaluation Process for Explainable ML
    (2022)
    Müller, Sebastian
    Methods from explainable machine learning are increasingly applied. However, their evaluation is often anecdotal rather than systematic. Prior work has identified properties of explanation quality, and we argue that evaluation should be based on them. In this work, we provide an evaluation process that follows the idea of property testing. The process acknowledges the central role of the human, yet argues for a quantitative approach to the evaluation. We find that the properties can be divided into two groups: one to ensure trustworthiness, the other to assess comprehensibility. Options for quantitative property tests are discussed. Future research should focus on the standardization of testing procedures.
  • Publication
    Aligning Subjective Ratings in Clinical Decision Making
    (2020)
    Foldenauer, Ann Christina; Köhm, Michaela
    In addition to objective indicators (e.g. laboratory values), clinical data often contain subjective evaluations by experts (e.g. disease severity assessments). While objective indicators are more transparent and robust, the subjective evaluation contains a wealth of expert knowledge and intuition. In this work, we demonstrate the potential of pairwise ranking methods to align the subjective evaluation with objective indicators, creating a new score that combines their advantages and facilitates diagnosis. In a case study on patients at risk for developing Psoriatic Arthritis, we illustrate that the resulting score (1) increases classification accuracy when detecting disease presence/absence, (2) is sparse and (3) provides a nuanced assessment of severity for subsequent analysis.
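    As a rough illustration of the idea behind this publication, the sketch below derives a sparse linear score from pairwise comparisons of expert ratings using a generic RankSVM-style approach on simulated data. The data, variable names, and model choice are assumptions for illustration only, not the method used in the publication.

```python
# Hedged sketch (not the authors' exact method): learn a sparse linear score that
# ranks samples consistently with subjective expert ratings, using pairwise
# difference vectors of objective indicators. All data here is simulated.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # objective indicators (e.g. lab values)
expert = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)  # subjective severity ratings

# Build pairwise difference vectors: if expert[i] > expert[j],
# the learned score should also rank sample i above sample j.
pairs_X, pairs_y = [], []
for i, j in rng.choice(200, size=(500, 2)):
    if expert[i] == expert[j]:
        continue
    pairs_X.append(X[i] - X[j])
    pairs_y.append(1 if expert[i] > expert[j] else -1)

# An L1-penalised linear model on the differences yields a sparse weight vector w;
# the new score for a sample x is simply x @ w.
model = LinearSVC(penalty="l1", dual=False, C=0.1)
model.fit(np.array(pairs_X), np.array(pairs_y))
w = model.coef_.ravel()
score = X @ w
print("non-zero weights:", int(np.sum(np.abs(w) > 1e-8)))
```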
  • Publication
    Approaching Neural Network Uncertainty Realism
    (2019)
    Sicking, Joachim; Fahrland, Matthias; Hüger, Fabian; Schlicht, Peter
    Statistical models are inherently uncertain. Quantifying or at least upper-bounding their uncertainties is vital for safety-critical systems such as autonomous vehicles. While standard neural networks do not report this information, several approaches exist to integrate uncertainty estimates into them. Assessing the quality of these uncertainty estimates is not straightforward, as no direct ground truth labels are available. Instead, implicit statistical assessments are required. For regression, we propose to evaluate uncertainty realism, a strict quality criterion, with a Mahalanobis distance-based statistical test. An empirical evaluation reveals the need for uncertainty measures that are appropriate to upper-bound heavy-tailed empirical errors. In addition, we transfer the variational U-Net classification architecture to standard supervised image-to-image tasks. We adapt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
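    The criterion named in the abstract can be illustrated with a small sketch: under uncertainty realism, the squared Mahalanobis distances between true targets and a regression model's predicted Gaussian distributions should follow a chi-squared distribution, which a goodness-of-fit test can check. The example below is a minimal, hypothetical illustration (simulated data, diagonal covariances, a Kolmogorov-Smirnov test), not the exact procedure from the publication.

```python
# Hedged illustration of the uncertainty-realism criterion (assumptions, not the
# paper's exact test): if predicted Gaussian uncertainties are realistic, the
# squared Mahalanobis distances of the true targets follow a chi-squared
# distribution with d degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n = 2, 1000

# Hypothetical model outputs: predicted means and diagonal covariances per sample.
mu = rng.normal(size=(n, d))
sigma2 = rng.uniform(0.5, 2.0, size=(n, d))          # predicted variances
y = mu + rng.normal(size=(n, d)) * np.sqrt(sigma2)   # targets, simulated here as well calibrated

# Squared Mahalanobis distance per sample (diagonal-covariance case).
m2 = np.sum((y - mu) ** 2 / sigma2, axis=1)

# Under uncertainty realism, m2 should be chi-squared distributed with d degrees of freedom.
ks_stat, p_value = stats.kstest(m2, "chi2", args=(d,))
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```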