  • Publication
    Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey
    Deep Learning is a state-of-the-art technique for making inferences on extensive or complex data. Due to their multilayer nonlinear structure, Deep Neural Networks are black box models and are often criticized as non-transparent, with predictions that humans cannot trace. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to a lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between input and output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks applied to Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.
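    Many of the surveyed explainers probe the connection between input and output by perturbation. A minimal occlusion-based sketch of that idea (not any specific method from the survey; the toy `model` and the `occlusion_map` helper are illustrative assumptions) is:

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    # slide an occluding patch over the image and record how much the
    # model's score drops: large drops mark regions the model relies on
    h, w = image.shape
    ref = model(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = ref - model(occluded)
    return heat

# toy "black box": its score is the mean brightness of the top-left quadrant
model = lambda x: float(x[:8, :8].mean())
img = np.zeros((16, 16))
img[:8, :8] = 1.0
heat = occlusion_map(model, img, patch=8)  # only heat[0, 0] is nonzero
```

    The heat map exposes where the black box is looking without any access to its internals, which is exactly the input-output perspective described above.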
  • Publication
    A comparison of deep saliency map generators on multispectral data in object detection
    Deep neural networks, especially convolutional ones, are state-of-the-art methods to classify, segment, or even generate images, movies, or sounds. However, these methods lack a good semantic understanding of what happens internally. The question of why a COVID-19 detector has classified a stack of lung CT images as positive is sometimes more interesting than the overall specificity and sensitivity, especially when human domain expert knowledge disagrees with the given output. In such cases, human domain experts could be advised to reconsider their decision in light of the information pointed out by the system. In addition, the deep learning model can be checked, and a present dataset bias can be found. Currently, most explainable AI methods in the computer vision domain are applied only to image classification, where the images are ordinary images in the visible spectrum. As a result, there is no comparison of how the methods behave on multimodal image data, and most methods have not been investigated when used for object detection. This work tries to close these gaps by investigating how the maps of three saliency map generator methods differ across the different spectra, achieved via accurate and systematic training. Additionally, we examine how they perform when used for object detection. As a practical problem, we chose object detection in the infrared and visual spectrum for autonomous driving. The dataset used in this work is the Multispectral Object Detection Dataset, where each scene is available in the long-wave (FIR), mid-wave (MIR), and short-wave (NIR) infrared as well as the visual (RGB) spectrum. The results show that there are differences between the infrared and visual activation maps. Further, an advanced training with both the infrared and the visual data not only improves the network's output, it also leads to more focused spots in the saliency maps.
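    The core idea behind saliency map generators is assigning each input element the sensitivity of the model's score to it; when the same idea is applied to models trained on different spectra, the maps can highlight different cues. A finite-difference toy illustration (the `rgb_score` and `ir_score` functions below are hypothetical stand-ins, not the paper's detectors or its three generator methods):

```python
import numpy as np

def gradient_saliency(score_fn, x, eps=1e-3):
    # finite-difference approximation of |d score / d input| per element
    base = score_fn(x)
    sal = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        bumped = x.copy()
        bumped[idx] += eps
        sal[idx] = abs(score_fn(bumped) - base) / eps
    return sal

# toy stand-ins for detectors trained on different spectra: each weighs
# the two input features (think: color cue vs. thermal cue) differently
rgb_score = lambda x: float(x @ np.array([0.9, 0.1]))
ir_score = lambda x: float(x @ np.array([0.2, 0.8]))
x = np.ones(2)
sal_rgb = gradient_saliency(rgb_score, x)  # dominated by the first feature
sal_ir = gradient_saliency(ir_score, x)    # dominated by the second feature
```

    Comparing `sal_rgb` and `sal_ir` mirrors, in miniature, the paper's comparison of activation maps between the visual and infrared spectra.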
  • Publication
    Automated license plate detection for image anonymization
    Images or videos recorded in public areas may contain personal data such as license plates. Under German law, such data may not be stored without either the permission of the affected people or an immediate anonymization of the personal information in the recordings. As asking for and obtaining permission is practically impossible, and manual anonymization is time consuming, an automated license plate detection and localization system is developed. For the implementation, a two-stage neural network approach is chosen that hierarchically combines a YOLOv3 model for vehicle detection with another YOLOv3 model for license plate detection. The model is trained on a specifically composed dataset that includes synthesized images, low-quality or non-annotated datasets, and data augmentation methods. The license plate detection system is evaluated quantitatively and qualitatively, yielding an average precision (AP) of 98.73% at an intersection-over-union threshold of 0.3 on the openALPR dataset and showing outstanding robustness even for rotated, small-scale, or partly covered license plates.
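    The hierarchical two-stage scheme can be sketched as follows. `detect_vehicles` and `detect_plates` stand in for the two YOLOv3 models, boxes are `(x, y, w, h)` tuples, and the helper maps plate boxes from vehicle-crop coordinates back to full-image coordinates (all names are illustrative, not the paper's actual code):

```python
def two_stage_plates(image, detect_vehicles, detect_plates):
    # stage 1: find vehicle boxes in the full image
    # stage 2: search each vehicle crop for plates, then shift the plate
    # boxes back into full-image coordinates
    plates = []
    for (vx, vy, vw, vh) in detect_vehicles(image):
        crop = [row[vx:vx + vw] for row in image[vy:vy + vh]]
        for (px, py, pw, ph) in detect_plates(crop):
            plates.append((vx + px, vy + py, pw, ph))
    return plates

# stub detectors standing in for the two trained YOLOv3 models
find_vehicles = lambda img: [(10, 20, 50, 40)]
find_plates = lambda crop: [(5, 30, 20, 8)]
image = [[0] * 100 for _ in range(100)]
boxes = two_stage_plates(image, find_vehicles, find_plates)  # [(15, 50, 20, 8)]
```

    Restricting the plate search to vehicle crops is what makes the second stage robust to small or low-resolution plates: the plate detector only ever sees magnified vehicle regions.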
  • Publication
    Augmentation techniques for video surveillance in the visible and thermal spectral range
    In intelligent video surveillance, cameras record image sequences during day and night, which commonly demands different sensors; it is not unusual to combine them to achieve better performance. We focus on the case where a long-wave infrared camera records continuously, an additional camera records in the visible spectral range during daytime, and an intelligent algorithm supervises the captured imagery. More precisely, our task is multispectral CNN-based object detection. At first glance, images from the visible spectral range differ from thermal infrared ones in containing color and distinct texture information on the one hand, and in lacking information about the thermal radiation emitted by objects on the other. Although color can provide valuable information for classification tasks, effects such as varying illumination and the specialties of different sensors still represent significant problems. Moreover, obtaining sufficient and practical thermal infrared datasets for training a deep neural network still poses a challenge. That is why training with the help of data from the visible spectral range could be advantageous, particularly if the data that has to be evaluated contains both visible and infrared imagery. However, there is no clear evidence of how strongly variations in thermal radiation, shape, or color information influence classification accuracy. To gain deeper insight into how Convolutional Neural Networks make decisions and what they learn from different sensor input data, we investigate the suitability and robustness of different augmentation techniques. We use the publicly available large-scale multispectral ThermalWorld dataset, consisting of images in the long-wave infrared and visible spectral range showing persons, vehicles, buildings, and pets, and train a Convolutional Neural Network for image classification. The training data is augmented with several modifications based on the different properties of the two spectra to find out which modifications have which impact and which lead to the best classification performance.
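    One way such modality-aware augmentation could look in code, as a sketch under the assumption of images scaled to [0, 1] (the `augment` function, its parameters, and the chosen operations are illustrative, not the paper's actual augmentation set):

```python
import numpy as np

def augment(img, spectrum, flip=True, gain=1.1, shuffle=(2, 0, 1)):
    # geometric flips and intensity gain make sense in any spectrum;
    # channel shuffling only applies to visible (RGB) images, since
    # thermal imagery carries no color channels to permute
    out = img[:, ::-1].copy() if flip else img.copy()
    out = np.clip(out * gain, 0.0, 1.0)
    if spectrum == "visible" and out.ndim == 3:
        out = out[:, :, list(shuffle)]
    return out

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [0.1, 0.2, 0.3]
vis = augment(rgb, "visible")          # flipped, brightened, channels shuffled
ir = augment(np.ones((2, 2)), "lwir")  # flipped and clipped, no color ops
```

    Keeping the augmentation set conditional on the spectrum is the point of the comparison: an operation that helps in one modality may be meaningless, or harmful, in the other.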
  • Publication
    An architecture for automatic multimodal video data anonymization to ensure data protection
    To implement a data protection concept for our mobile sensor platform (MODISSA), we designed and implemented an anonymization pipeline. This pipeline contains plugins for reading, modifying, and writing different image formats, as well as methods to detect the regions that should be anonymized. These include a method to determine head positions and an object detector for license plates, both based on state-of-the-art deep learning methods. The methods are applied to all image sensors on the platform, whether panoramic RGB, thermal IR, or grayscale cameras. In this paper we focus on the whole face anonymization process. We determine the face region to anonymize on the basis of body pose estimates from OpenPose, which proved to lead to robust results. Our anonymization pipeline achieves nearly human performance with almost no human resources spent. To obtain perfect anonymization, however, a quick additional interactive human postprocessing step can be performed. We evaluated our pipeline quantitatively and qualitatively on urban example data recorded with MODISSA.
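    Deriving the region to anonymize from pose keypoints might be sketched as follows. This is illustrative only: the `anonymize_heads` helper, the margin factor, and blacking out instead of blurring are assumptions, not the pipeline's actual implementation:

```python
import numpy as np

def anonymize_heads(image, head_keypoints, margin=1.6):
    # derive a head box from pose keypoints (e.g. nose and ears, as a pose
    # estimator like OpenPose would emit) and black it out; the margin
    # enlarges the box so the whole head is covered, not just the keypoints
    out = image.copy()
    for kp in head_keypoints:  # kp: array of (x, y) head keypoints
        xs, ys = kp[:, 0], kp[:, 1]
        half = margin * max(np.ptp(xs), np.ptp(ys), 1) / 2
        x0 = int(max(xs.mean() - half, 0))
        x1 = int(min(xs.mean() + half, image.shape[1]))
        y0 = int(max(ys.mean() - half, 0))
        y1 = int(min(ys.mean() + half, image.shape[0]))
        out[y0:y1, x0:x1] = 0  # blackout; blurring the region would also work
    return out

frame = np.ones((20, 20))
anon = anonymize_heads(frame, [np.array([[8.0, 8.0], [12.0, 8.0]])])
```

    Anchoring the region on body pose rather than a face detector is what makes the approach robust: the head box can be derived even when the face itself is turned away or too small to detect.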
  • Publication
    Evaluating the Impact of Color Information in Deep Neural Networks
    Color images are omnipresent in everyday life. In particular, they often provide the only input for deep neural network pipelines, which are continuously being employed for image classification and object recognition tasks. Although color can provide valuable information, effects like varying illumination and the specialties of different sensors still pose significant problems. However, there is no clear evidence of how strongly variations in color information influence classification performance throughout the rearward layers. To gain deeper insight into how Convolutional Neural Networks make decisions and what they learn from input images, we investigate in this work the suitability and robustness of different color augmentation techniques. We consider several established benchmark sets as well as custom-made pedestrian and background datasets. While decreasing color or saturation information, we explore the activation differences in the rear layers and the stability of confidence values. We show that Luminance is most robust against changing the color system in test images, irrespective of whether the texture is degraded. Finally, we present the relationship between color dependence and the properties of the regarded datasets and classes.
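    Decreasing saturation, as in the experiments above, can be illustrated by blending each pixel toward its luminance (a sketch; the BT.601 luminance weights are a common convention, not necessarily the paper's exact choice):

```python
import numpy as np

def desaturate(img, strength):
    # blend each RGB pixel toward its scalar luminance: strength=0 keeps
    # the original colors, strength=1 yields a pure grayscale image
    lum = img @ np.array([0.299, 0.587, 0.114])  # BT.601 luminance weights
    return (1 - strength) * img + strength * lum[..., None]

red = np.array([[[1.0, 0.0, 0.0]]])  # a single pure-red pixel
gray = desaturate(red, 1.0)  # every channel becomes the luminance 0.299
half = desaturate(red, 0.5)  # halfway between red and its luminance
```

    Sweeping `strength` from 0 to 1 produces a family of test images with progressively less color information, against which rear-layer activations and confidence values can be compared.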