Explainable Acoustic Scene Classification: Making Decisions Audible

Authors: Bicer, H. Nazim; Götz, Philipp; Tuna, Cagdas; Habets, Emanuël A.P.
Record date: 2023-07-13
Publication year: 2022
URL: https://publica.fraunhofer.de/handle/publica/445602
DOI: 10.1109/IWAENC53105.2022.9914699
Scopus ID: 2-s2.0-85141355779
Language: en
Type: conference paper
Keywords: acoustic scene classification; convolutional neural networks; explainable artificial intelligence; Grad-CAM; guided backpropagation; interpretability; ResNet; spectrogram

Abstract: This study presents a sonification method that provides "audible explanations" to improve the transparency of the decision-making processes of convolutional neural networks designed for acoustic scene classification (ASC). First, a deep neural network (DNN) based on the ResNet architecture is proposed. Secondly, Grad-CAM and guided backpropagation images are computed for a given input signal. These images are then used to produce frequency-selective filters that retain signal components in the input that contribute to the decision of the trained DNN. The test results demonstrate that the proposed model outperforms two baseline models. The reconstructed audio waveform is interpretable by the human ear, serving as a valuable tool to examine and possibly improve ASC models.
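
The filtering step described in the abstract (a saliency image turned into a frequency-selective filter, followed by waveform reconstruction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filter design, so the sketch assumes the saliency image has already been resized to the STFT grid and simply applies it as a soft time-frequency mask. The names `stft`, `istft`, and `apply_saliency_mask` are hypothetical helpers, and the saliency map below is a hand-made placeholder rather than an actual Grad-CAM or guided backpropagation output.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (n_freq, n_frames)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft=512, hop=256):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    out = np.zeros(hop * (frames.shape[0] - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def apply_saliency_mask(spec, saliency):
    """Scale each time-frequency bin by a saliency value normalised to [0, 1].

    Assumption: `saliency` has the same shape as `spec`; negative values are
    treated as irrelevant and clipped to zero.
    """
    mask = np.clip(saliency, 0.0, None)
    mask = mask / max(mask.max(), 1e-12)
    return spec * mask

# Toy input: a 440 Hz tone plus a 3000 Hz tone, one second at 16 kHz.
sr, n_fft, hop = 16000, 512, 256
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 3000 * t)

# Zero-pad so that every input sample is fully covered by overlapping frames.
spec = stft(np.pad(x, n_fft), n_fft, hop)

# Placeholder saliency (NOT a real Grad-CAM map): relevant below 1 kHz only.
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
saliency = np.where(freqs[:, None] < 1000.0, 1.0, 0.0) * np.ones(spec.shape)

# Filter, reconstruct, and trim the padding to get the audible explanation.
y = istft(apply_saliency_mask(spec, saliency), n_fft, hop)[n_fft:n_fft + len(x)]
```

In this toy case the reconstructed signal `y` keeps the 440 Hz tone and suppresses the 3000 Hz tone, which is what listening to the "relevant" part of the input amounts to. With a real model, the placeholder map would be replaced by a saliency image computed for the predicted class.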