  • Publication
    Beyond Test Accuracy: The Effects of Model Compression on CNNs
    (2022); Schwienbacher, Kristian
    Model compression is widely employed to deploy convolutional neural networks on devices with limited computational resources or power budgets. For high-stakes applications, such as autonomous driving, it is, however, important that compression techniques do not impair the safety of the system. In this paper, we therefore investigate the changes introduced by three compression methods - post-training quantization, global unstructured pruning, and the combination of both - that go beyond the test accuracy. To this end, we trained three image classifiers on two datasets and compared them regarding their performance on the class level and regarding their attention to different input regions. Although the deviations in test accuracy were minimal, our results show that the considered compression techniques introduce substantial changes to the models that are reflected in the quality of predictions of individual classes and in the salience of input regions. While we did not observe the introduction of systematic errors or biases towards certain classes, these changes can significantly impact the failure modes of CNNs and thus are highly relevant for safety analyses. We therefore conclude that it is important to be aware of the changes caused by model compression and to consider them early in the development process.
  • Publication
    Measuring Ensemble Diversity and its Effects on Model Robustness
    Deep ensembles have been shown to perform well on a variety of tasks in terms of accuracy, uncertainty estimation, and further robustness metrics. The diversity among ensemble members is often named as the main reason for this. Due to its complex and indefinite nature, diversity can be expressed by a multitude of metrics. In this paper, we aim to explore how a selection of these diversity metrics relate to one another, as well as their link to different measures of robustness. Specifically, we address two questions: To what extent can ensembles with the same training conditions differ in their performance and robustness? And are diversity metrics suitable for selecting members to form a more robust ensemble? To this end, we independently train 20 models for each task and compare all possible ensembles of 5 members on several robustness metrics, including the performance on corrupted images, out-of-distribution detection, and quality of uncertainty estimation. Our findings reveal that ensembles trained with the same conditions can differ significantly in their robustness, especially regarding out-of-distribution detection capabilities. Across all setups, using different datasets and model architectures, we see that, in terms of robustness metrics, choosing ensemble members based on the considered diversity metrics seldom outperforms the baseline of selecting members by accuracy. We conclude that there is significant potential to improve the formation of robust deep ensembles and that novel and more sophisticated diversity metrics could be beneficial in that regard.
  • Publication
    Benchmarking Uncertainty Estimation Methods for Deep Learning with Safety-Related Metrics
    Deep neural networks generally perform very well at giving accurate predictions, but they often fail to recognize when these predictions may be wrong. This lack of awareness regarding the reliability of given outputs is a big obstacle to deploying such models in safety-critical applications. There are certain approaches that try to address this problem by designing the models to give more reliable values for their uncertainty. However, even though the performance of these models is compared in various ways, there is no thorough evaluation comparing them in a safety-critical context using metrics that are designed to describe trade-offs between performance and safe system behavior. In this paper, we attempt to fill this gap by evaluating and comparing several state-of-the-art methods for estimating uncertainty for image classification with respect to safety-related requirements and metrics that are suitable to describe the models' performance in safety-critical domains. We show the relationship of remaining error for predictions with high confidence and its impact on the performance for three common datasets. In particular, Deep Ensembles and Learned Confidence show high potential to significantly reduce the remaining error with only moderate performance penalties.
  • Publication
    Is Uncertainty Quantification in Deep Learning Sufficient for Out-of-Distribution Detection?
    Reliable information about the uncertainty of predictions from deep neural networks could greatly facilitate their utilization in safety-critical applications. Current approaches for uncertainty quantification usually focus on in-distribution data, where a high uncertainty should be assigned to incorrect predictions. In contrast, we focus on out-of-distribution data where a network cannot make correct predictions and therefore should always report high uncertainty. In this paper, we compare several state-of-the-art uncertainty quantification methods for deep neural networks regarding their ability to detect novel inputs. We evaluate them on image classification tasks with regard to metrics reflecting requirements important for safety-critical applications. Our results show that a portion of out-of-distribution inputs can be detected with reasonable loss in overall accuracy. However, current uncertainty quantification approaches alone are not sufficient for an overall reliable out-of-distribution detection.
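
The trade-off described in the last abstract, rejecting uncertain inputs to catch out-of-distribution data at the cost of some accuracy, can be illustrated with a small sketch. This is not the method used in the listed papers: predictive entropy as the uncertainty score, the threshold value, the helper names, and the softmax outputs are all assumptions made up for illustration.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one softmax output vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ood_detection_rate(ood_probs, threshold):
    """Fraction of out-of-distribution inputs rejected as too uncertain."""
    rejected = [predictive_entropy(p) > threshold for p in ood_probs]
    return sum(rejected) / len(rejected)


def accuracy_with_rejection(probs, labels, threshold):
    """Fraction of in-distribution inputs that are accepted AND correct.

    Rejected inputs count as lost, so this can only be lower than or
    equal to the plain accuracy without rejection.
    """
    correct = 0
    for p, label in zip(probs, labels):
        accepted = predictive_entropy(p) <= threshold
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax
        if accepted and predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical softmax outputs for a three-class classifier.
in_dist = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.40, 0.35, 0.25]]
in_dist_labels = [0, 0, 0]
out_of_dist = [[0.34, 0.33, 0.33], [0.85, 0.10, 0.05]]

threshold = 0.9  # assumed value; in practice tuned on validation data

print("OOD detected:", ood_detection_rate(out_of_dist, threshold))
print("Accuracy kept:", accuracy_with_rejection(in_dist, in_dist_labels, threshold))
```

In this toy data, one out-of-distribution input receives a confident prediction and slips through, while one correct in-distribution prediction is rejected. These are the two kinds of cost that a threshold on uncertainty trades against each other.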