Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

On open-set classification with L3-net embeddings for machine listening applications

 
: Wilkinghoff, K.

:

Heusdens, Richard (Ed.) ; European Association for Signal Processing -EURASIP-:
28th European Signal Processing Conference, EUSIPCO 2020. Proceedings : 24-28 August 2020, Amsterdam, the Netherlands
Amsterdam: EURASIP, 2020
ISBN: 978-9-0827-9705-3
ISBN: 978-9-08279-704-6
ISBN: 978-1-7281-5001-7
pp. 800-804
European Signal Processing Conference (EUSIPCO) <28, 2020, Amsterdam/cancelled>
European Signal Processing Conference (EUSIPCO) <28, 2021, Online>
English
Conference Paper
Fraunhofer FKIE

Abstract
Obtaining labeled data for machine listening applications is expensive because labeling audio data requires humans to listen to the recordings. However, state-of-the-art deep-learning-based systems usually require large amounts of labeled training data. One solution to this problem is to train a neural network on a large collection of unlabeled data to extract embeddings and then to use these embeddings to train a shallow classifier on a small, labeled dataset suited to the application. One example is Look, Listen, and Learn (L3-Net) embeddings, which are trained in a self-supervised manner to capture audio-visual correspondence in videos. Since shallow classifiers are trained discriminatively and thus tacitly assume a closed-set classification task, they do not perform well in open-set classification tasks. This paper presents a neural network that combines all L3-Net embeddings belonging to one recording into a single vector by using an x-vector mechanism, as well as an open-set classification system based on this network. In experiments conducted on the open-set acoustic scene classification task of the DCASE 2019 challenge, the proposed system significantly outperforms a shallow discriminative classifier and all other previously published systems, while performing as well as a shallow classifier on multiple closed-set machine listening datasets.
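
The two ideas named in the abstract, pooling all embeddings of one recording into a single vector and rejecting inputs from unknown classes, can be illustrated with a minimal sketch. This is not the system from the paper: the x-vector mechanism is approximated here by plain statistics pooling (mean and standard deviation over frames, without any trained layers), and the open-set decision is a simple score threshold chosen only for illustration. The function names, the embedding size of 512, and the threshold value are all assumptions.

```python
import numpy as np


def stats_pool(frame_embeddings):
    """Collapse (num_frames, dim) embeddings into one (2 * dim,) vector.

    Assumption: mean and standard deviation over frames stand in for the
    pooling step of an x-vector network; no learned layers are involved.
    """
    x = np.asarray(frame_embeddings, dtype=float)
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])


def open_set_predict(class_scores, threshold):
    """Return the best class index, or -1 ("unknown") if no score reaches the threshold.

    Assumption: thresholding the top score is a generic open-set baseline,
    not the decision rule used in the paper.
    """
    scores = np.asarray(class_scores, dtype=float)
    best = int(scores.argmax())
    return best if scores[best] >= threshold else -1


# Hypothetical input: 10 frames of 512-dimensional embeddings for one recording.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 512))
recording_vector = stats_pool(frames)  # one fixed-size vector per recording

# Hypothetical class scores for a known and for an ambiguous recording.
known = open_set_predict([0.1, 0.7, 0.2], threshold=0.5)
unknown = open_set_predict([0.4, 0.35, 0.25], threshold=0.5)
```

The pooled vector has a fixed length regardless of how many frames the recording contains, which is what allows a single classifier to handle recordings of different durations.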

URL: http://publica.fraunhofer.de/documents/N-637578.html