Probabilistic spatial filter estimation for signal enhancement in multi-channel automatic speech recognition

Kayser, Hendrik; Moritz, Niko; Anemüller, Jörn

doi:10.21437/Interspeech.2016-1340

2016

Conference Paper

Abstract

Speech recognition in multi-channel environments requires target speaker localization, multi-channel signal enhancement, and robust speech recognition. We propose a system that addresses all three problems. Localization is performed with a recently introduced probabilistic method that is based on support vector machine learning of GCC-PHAT weights and that estimates a spatial source probability map. The main contribution of the present work is a probabilistic approach to (re-)estimating location-specific steering vectors by weighting observed inter-channel phase differences with the spatial source probability map derived in the localization step. Subsequent speech recognition is carried out with a DNN-HMM system using amplitude modulation filter bank (AMFB) acoustic features, which are robust to spectral distortions introduced during spatial filtering. The system was evaluated on the CHiME-3 multi-channel ASR dataset. Recognition was carried out with and without probabilistic steering vector re-estimation, and with MVDR and delay-and-sum beamforming, respectively. On real-world evaluation data, the system attains a relative improvement of 31.98% over the baseline and of 21.44% over a modified baseline. This improvement is achieved without exploiting oracle knowledge about speech/non-speech intervals for noise covariance estimation, which is, however, assumed for baseline processing.
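
The steering vector re-estimation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that the spatial source probability map has already been reduced to one weight per time frame for the target location, that channel 0 serves as the phase reference, and that the STFT is stored as an array of shape (channels, frames, frequencies). The function names `estimate_steering_vector` and `delay_and_sum` are hypothetical.

```python
import numpy as np


def estimate_steering_vector(X, p, eps=1e-12):
    """Probability-weighted steering vector estimate (illustrative sketch).

    X : complex STFT, shape (channels, frames, freqs)
    p : non-negative weight per frame, shape (frames,); assumed here to be
        the probability that the target location is active in that frame
    Returns a unit-magnitude steering vector of shape (channels, freqs).
    """
    # Inter-channel phase differences relative to channel 0,
    # expressed as unit-magnitude complex factors.
    cross = X * np.conj(X[0:1])
    phase = cross / np.maximum(np.abs(cross), eps)

    # Average the phase factors over frames, weighted by the probabilities.
    d = np.einsum("t,ctf->cf", p, phase) / max(float(p.sum()), eps)

    # Keep only the phase of the average.
    return d / np.maximum(np.abs(d), eps)


def delay_and_sum(X, d):
    """Delay-and-sum beamformer: undo each channel's phase shift, then average."""
    return np.mean(np.conj(d)[:, None, :] * X, axis=0)
```

Frames with a low target probability contribute little to the average, so phase differences observed while other sources dominate have less influence on the estimate. An MVDR beamformer would additionally require a noise covariance estimate, which this sketch leaves out.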

Author(s)

Kayser, Hendrik

Moritz, Niko

Anemüller, Jörn

Mainwork

Understanding speech processing in humans and machines. Vol.4

Conference

International Speech Communication Association (Interspeech Annual Conference) 2016
