
Integration of Optimized Modulation Filter Sets into Deep Neural Networks for Automatic Speech Recognition

Moritz, Niko; Kollmeier, Birger; Anemüller, Jörn


IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (2016), No. 12, pp. 2439-2452
ISSN: 2329-9290
ISSN: 2329-9304
Fraunhofer IDMT

Inspired by physiological studies on the human auditory system and by results from psychoacoustics, an amplitude modulation filter bank (AMFB) has been developed and successfully applied to feature extraction for automatic speech recognition (ASR) in earlier work. Here, we address the question as to which amplitude modulation (AM) frequency decomposition leads to optimal ASR performance by proposing a parameterized functional relationship between modulation center frequency and modulation bandwidth. Word error rates (WERs) of ASR experiments with 1551 different AMFBs are systematically evaluated and compared, resulting in the identification of a comparatively narrow range of optimal modulation frequency to modulation bandwidth characteristics. To integrate modulation processing with deep neural network (DNN) acoustic modeling, we propose (1) merging of modulation filter coefficients with DNN weights prior to a final training step and (2) an improved mean-variance normalization scheme for AMFBs.
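The abstract does not say how the filter coefficients are merged with the DNN weights. As a minimal sketch of the underlying linear-algebra idea only, assume the modulation filtering is a purely linear map F applied directly before the first affine layer (weights W, bias b), with no normalization or nonlinearity in between. Under that assumption the two steps collapse into a single weight matrix W·F, which could then be trained further. All names, sizes, and the random stand-in values below are hypothetical and are not taken from the paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]

random.seed(0)

# Hypothetical sizes: input features, modulation filter outputs, hidden units.
n_in, n_mod, n_hidden = 5, 7, 3

# F stands in for linear modulation filter coefficients (random here, not a real AMFB).
F = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_mod)]
# W and b stand in for the first DNN layer that consumes the filter outputs.
W = [[random.gauss(0, 1) for _ in range(n_mod)] for _ in range(n_hidden)]
b = [random.gauss(0, 1) for _ in range(n_hidden)]

# Merge: W (F x) + b == (W F) x + b, so the filter can be folded into the layer.
W_merged = matmul(W, F)

# Check on one random input that both formulations agree before any nonlinearity.
x = [random.gauss(0, 1) for _ in range(n_in)]
separate = [s + bi for s, bi in zip(matvec(W, matvec(F, x)), b)]
merged = [s + bi for s, bi in zip(matvec(W_merged, x), b)]
max_abs_diff = max(abs(u - v) for u, v in zip(separate, merged))
```

After such a merge the layer maps the unfiltered input directly to the hidden pre-activations, so a subsequent training step can adjust the combined weights freely rather than keeping the filter coefficients fixed.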