2016
Journal Article
Title
Integration of Optimized Modulation Filter Sets into Deep Neural Networks for Automatic Speech Recognition
Abstract
Inspired by physiological studies on the human auditory system and by results from psychoacoustics, an amplitude modulation filter bank (AMFB) has been developed and successfully applied to feature extraction for automatic speech recognition (ASR) in earlier work. Here, we address the question as to which amplitude modulation (AM) frequency decomposition leads to optimal ASR performance by proposing a parameterized functional relationship between modulation center frequency and modulation bandwidth. Word error rates (WERs) of ASR experiments with 1551 different AMFBs are systematically evaluated and compared, resulting in the identification of a comparatively narrow range of optimal modulation frequency to modulation bandwidth characteristics. To integrate modulation processing with deep neural network (DNN) acoustic modeling, we propose (1) merging of modulation filter coefficients with DNN weights prior to a final training step and (2) an improved mean-variance normalization scheme for AMFBs.
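The merging step in (1) rests on a general property of linear maps: a linear filtering stage followed by an affine layer can be collapsed into a single affine layer. The sketch below illustrates only that property and is not the authors' implementation. It assumes the modulation filter bank can be written as a matrix F acting on a fixed-length temporal context of one feature channel; the function name, the array sizes, and the random values are hypothetical.

```python
import numpy as np


def merge_filters_into_layer(filters, weights, bias):
    """Fold a linear filtering stage into the affine layer that follows it.

    filters: (n_filtered, n_input)  filter matrix F (assumed linear, no bias)
    weights: (n_hidden, n_filtered) first-layer weights W
    bias:    (n_hidden,)            first-layer bias b
    Returns (W @ F, b), since W @ (F @ x) + b == (W @ F) @ x + b.
    """
    return weights @ filters, bias.copy()


# Hypothetical sizes: 31-frame context, 5 modulation filters, 8 hidden units.
rng = np.random.default_rng(0)
n_taps, n_filters, n_hidden = 31, 5, 8
F = rng.standard_normal((n_filters, n_taps))    # stand-in for filter coefficients
W = rng.standard_normal((n_hidden, n_filters))  # stand-in for trained weights
b = rng.standard_normal(n_hidden)
x = rng.standard_normal(n_taps)                 # one temporal context window

W_merged, b_merged = merge_filters_into_layer(F, W, b)

# Pre-activation of the first hidden layer, computed both ways.
two_stage = W @ (F @ x) + b
one_stage = W_merged @ x + b_merged
max_abs_diff = float(np.max(np.abs(two_stage - one_stage)))
```

After such a merge, the combined matrix is an ordinary weight matrix, so a subsequent training step can update the former filter coefficients together with the rest of the network. Any normalization placed between the filters and the first layer would also have to be affine for the merge to remain exact.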