Here you will find scientific publications from the Fraunhofer Institutes.

Should deep neural nets have ears? The role of auditory features in deep learning approaches
Authors: Martinez, Angel Mario Castro; Moritz, Niko; Meyer, Bernd T.

International Speech Communication Association -ISCA-:
INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association. Online resource : Singapore, September 14-18, 2014
Singapore, 2014
International Speech Communication Association (INTERSPEECH Annual Conference) <15, 2014, Singapore>
Conference Paper, Electronic Publication
Fraunhofer IDMT

Features inspired by the auditory system have previously been shown to improve automatic speech recognition (ASR). Similarly, deep neural networks (DNNs) have been found to outperform classic approaches to ASR in many conditions. Since DNNs have the potential to learn task-relevant features from a conventional filter bank output, we investigate whether the combination of auditory features and deep learning should be preferred over self-learned patterns. Specifically, noise-robust Gabor features and Amplitude Modulation Filter-Bank (AMFB) features, which are highly invariant to reverberation, are used as input to a state-of-the-art ASR system incorporating DNN processing. On the Aurora-4 task, auditory processing outperforms both mel-frequency cepstral coefficients (MFCC) and filter bank (FBank) features in many acoustic conditions, yielding average relative improvements of up to 69% over MFCC and 21% over the commonly used DNN-FBank setup. This highlights the mutual benefit of auditory signal processing and recent advances in machine learning.
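
The relative improvements quoted in the abstract can be read as error reductions measured against a baseline system. The following is a minimal sketch of that arithmetic, assuming the underlying metric is word error rate (the abstract does not name it). The function name and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_wer, system_wer):
    """Return the relative error reduction over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical word error rates in percent (illustrative only, not from the paper).
baseline = 20.0
system = 15.0
print(relative_improvement(baseline, system))  # 25.0
```

Under this reading, a relative improvement of 21% would mean the error rate drops to 79% of the baseline value, whatever the absolute figures are.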