Here you will find scientific publications from the Fraunhofer Institutes.

Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities

Rosenzweig, Julia; Sicking, Joachim; Houben, Sebastian; Mock, Michael; Akila, Maram


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society; Computer Vision Foundation -CVF-:
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021. Proceedings : Virtual, 19 - 25 June 2021
Los Alamitos, Calif.: IEEE Computer Society Conference Publishing Services (CPS), 2021
ISBN: 978-1-6654-4899-4
ISBN: 978-1-6654-4900-7
Conference on Computer Vision and Pattern Recognition (CVPR) <2021, Online>
Bundesministerium für Wirtschaft und Energie BMWi (Deutschland)
VDA Leitinitiative autonomes und vernetztes Fahren; 19A19005X; KI Absicherung - Safe AI for Automated Driving
Bundesministerium für Bildung und Forschung BMBF (Deutschland)
01IS18038B; Machine-Learning Rhein-Ruhr (ML2R)
Fraunhofer IAIS

An important pillar of safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks, which enables their deployment in critical applications. A ubiquitous class of safety risks consists of learned shortcuts, i.e., spurious correlations that a network exploits for its decisions but that have no semantic connection to the actual task. Networks relying on such shortcuts risk generalizing poorly to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable when access to the network is constrained, as in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy for the black-box model of interest. Leveraging the proxy's guarantees on introspection, we automatically extract candidates for learned shortcuts. Their transferability to the black box is validated in a systematic fashion. Concretely, as the proxy model we choose a BagNet, which bases its decisions purely on local image patches. We demonstrate on the autonomous driving dataset A2D2 that extracted patch shortcuts significantly influence the black-box model. By efficiently identifying such patch-based vulnerabilities, we contribute to safer ML models.
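
The workflow outlined in the abstract (rank patches by proxy evidence, then check whether they transfer to the black box) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the proxy's per-patch class evidence is already available as a 2D array, treats the black box as any callable that returns a label, and uses a simple label flip rate as a stand-in for the paper's validation procedure. All names (`top_patches`, `paste_patch`, `flip_rate`, `toy_black_box`) are hypothetical.

```python
import numpy as np

def top_patches(evidence, k):
    """Return (row, col) grid positions of the k patches with the highest evidence."""
    order = np.argsort(evidence, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, evidence.shape))
            for idx in order]

def paste_patch(image, patch, top, left):
    """Return a copy of image with patch pasted at pixel position (top, left)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

def flip_rate(black_box, images, patch, top, left):
    """Fraction of images whose black-box label changes after pasting the patch."""
    before = np.array([black_box(img) for img in images])
    after = np.array([black_box(paste_patch(img, patch, top, left))
                      for img in images])
    return float(np.mean(before != after))

# Toy stand-in for a black box (assumption): label 1 if the top-left
# 4x4 region is bright, otherwise label 0.
def toy_black_box(img):
    return int(img[:4, :4].mean() > 0.5)

# Made-up evidence map standing in for the proxy's per-patch output.
evidence = np.array([[0.1, 0.9],
                     [0.5, 0.3]])
candidates = top_patches(evidence, k=2)

images = [np.zeros((8, 8)) for _ in range(3)]
patch = np.ones((4, 4))
rate_hit = flip_rate(toy_black_box, images, patch, top=0, left=0)
rate_miss = flip_rate(toy_black_box, images, patch, top=4, left=4)
```

In this toy setup the patch changes the black-box label only when it is pasted over the region the black box actually looks at, which is the kind of transfer check the abstract refers to. A real study would use the proxy's actual evidence maps and a more careful validation metric.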