
General-Purpose Audio Tagging by Ensembling Convolutional Neural Networks based on Multiple Features

Wilkinghoff, Kevin


Plumbley, M.D.; Tampere University of Technology:
Detection and Classification of Acoustic Scenes and Events Workshop, DCASE 2018. Proceedings: Woking, Surrey, UK, 19-20 November 2018
Tampere: Tampere University of Technology, 2018
ISBN: 978-952-15-4262-6
5 pp.
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) <3, 2018, Surrey>
Conference Paper, Electronic Publication
Fraunhofer FKIE

This paper describes an audio tagging system submitted to Task 2, “General-purpose audio tagging of Freesound content with AudioSet labels”, of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. The system is an ensemble of five convolutional neural networks whose inputs are Mel-frequency Cepstral Coefficients, Perceptual Linear Prediction features, Mel-spectrograms, and the raw audio data. The models are combined by score-based fusion using Logistic Regression, implemented as an additional neural network. Experimental evaluations show that the ensemble performs significantly better than each of the individual models. The final system achieved a Mean Average Precision with Cutoff 3 (MAP@3) of 0.9414 on the private leaderboard of the challenge.
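The two ideas in the abstract, the MAP@3 metric and score-based fusion by logistic regression, can be illustrated with a small pure-Python sketch. MAP@3 is computed as in the challenge: each clip has one true label, and it contributes 1/rank if that label is among the top three predicted classes, and 0 otherwise. The fusion part is an assumption. It treats the fusion network as a single softmax layer (multinomial logistic regression) over the concatenated class scores of all models, trained here with plain full-batch gradient descent. The helper names (`fit_fusion`, `predict_fusion`), the learning rate, and the epoch count are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def map_at_3(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean average precision with cutoff 3 for single-label clips.

    A clip contributes 1/rank when its true label is at rank 1, 2 or 3
    of the score-sorted class list, and 0 otherwise.
    """
    total = 0.0
    for clip_scores, label in zip(scores, labels):
        # Class indices ordered by descending score (stable for ties).
        ranking = sorted(range(len(clip_scores)), key=lambda c: -clip_scores[c])
        top3 = ranking[:3]
        if label in top3:
            total += 1.0 / (top3.index(label) + 1)
    return total / len(labels)


def _softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def _stack(model_scores, i: int) -> List[float]:
    """Concatenate the class-score vectors of all models for clip i."""
    return [s for per_model in model_scores for s in per_model[i]]


def fit_fusion(model_scores, labels, n_classes, lr=0.5, epochs=200):
    """Hypothetical fusion: multinomial logistic regression on stacked scores.

    model_scores[m][i][c] is the score of model m for clip i and class c.
    Trained by full-batch gradient descent on the cross-entropy loss.
    """
    feats = [_stack(model_scores, i) for i in range(len(labels))]
    dim, n = len(feats[0]), len(labels)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(feats, labels):
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = _softmax(logits)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # d(loss)/d(logit_c)
                gb[c] += err
                for j, v in enumerate(x):
                    gW[c][j] += err * v
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict_fusion(W, b, model_scores) -> List[List[float]]:
    """Fused class probabilities for every clip."""
    out = []
    for i in range(len(model_scores[0])):
        x = _stack(model_scores, i)
        out.append(_softmax([sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]))
    return out


if __name__ == "__main__":
    # Toy data: one informative model and one uninformative (constant) model.
    labels = [0, 1, 2, 0, 1, 2]
    good = [[1.0 if c == y else 0.0 for c in range(3)] for y in labels]
    flat = [[1 / 3, 1 / 3, 1 / 3] for _ in labels]
    W, b = fit_fusion([good, flat], labels, n_classes=3)
    print(map_at_3(predict_fusion(W, b, [good, flat]), labels))  # 1.0
```

On this toy data the fusion layer learns to rely on the informative model and to ignore the constant one. With five real CNNs, the same layer would learn how much weight to give each model's score for each class. A production version would more likely use a deep learning framework or an off-the-shelf logistic regression implementation rather than hand-written gradient descent.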