Acoustic scene classification by combining autoencoder-based dimensionality reduction and convolutional neural networks

Abeßer, Jakob; Gräfe, Robert; Lukashevich, Hanna; Mimilakis, Stylianos-Ioannis

2017

Conference Paper

Abstract

Motivated by the recent success of deep learning techniques in various audio analysis tasks, this work presents a distributed sensor-server system for acoustic scene classification in urban environments based on deep convolutional neural networks (CNNs). Stacked autoencoders compress the extracted spectrogram patches on the sensor side before they are transmitted to the server, where they are classified. In our experiments, we compare two state-of-the-art CNN architectures with respect to their classification accuracy in the presence of environmental noise, under dimensionality reduction in the encoding stage, and with a reduced number of filters in the convolution layers. Our results show that the best model configuration achieves a classification accuracy of 75% for 5 acoustic scenes. We furthermore discuss which confusions among particular classes can be ascribed to particular sound event types that are present in multiple acoustic scene classes.
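The sensor-side compression step described in the abstract can be illustrated with a minimal sketch. It assumes a spectrogram patch of 40 frequency bands by 50 time frames and a stack of two dense encoder layers with random, untrained weights; the patch shape, layer sizes, activation function, and all function names are hypothetical choices for illustration and are not taken from the paper. The decoder used during autoencoder training and the server-side CNN are omitted.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
PATCH_SHAPE = (40, 50)           # hypothetical: 40 frequency bands x 50 time frames
LAYER_SIZES = [2000, 512, 128]   # flattened input -> hidden layer -> transmitted code


def init_stacked_encoder(layer_sizes, seed=0):
    """Create random (untrained) weights for a stack of dense encoder layers."""
    rng = np.random.default_rng(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        bias = np.zeros(n_out)
        layers.append((weights, bias))
    return layers


def encode(patch, layers):
    """Sensor side: flatten the patch and pass it through each encoder layer."""
    code = patch.reshape(-1)
    for weights, bias in layers:
        code = np.tanh(code @ weights + bias)
    # Cast to float32 to keep the transmitted payload small.
    return code.astype(np.float32)


def compression_ratio(patch, code):
    """Number of input values divided by number of transmitted values."""
    return patch.size / code.size


encoder = init_stacked_encoder(LAYER_SIZES)
patch = np.random.default_rng(1).random(PATCH_SHAPE)  # stand-in for a real spectrogram patch
code = encode(patch, encoder)
```

In this sketch, only `code` would be sent to the server, so the size of the last encoder layer sets the trade-off between transmission cost and the information available to the classifier.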

Author(s)

Abeßer, Jakob

Gräfe, Robert

Lukashevich, Hanna

Mimilakis, Stylianos-Ioannis

Mainwork

Detection and Classification of Acoustic Scenes and Events Workshop, DCASE 2017. Proceedings

Conference

Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) 2017
