Here you will find scientific publications from the Fraunhofer Institutes.

Combination strategy based on relative performance monitoring for multi-stream reverberant speech recognition

Xiong, F.; Goetze, S.; Meyer, B.T.


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Signal Processing Society:
IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017. Proceedings : March 5-9, 2017, Hilton New Orleans Riverside, New Orleans, Louisiana, USA
Piscataway, NJ: IEEE, 2017
ISBN: 978-1-5090-4117-6
ISBN: 978-1-5090-4116-9
ISBN: 978-1-5090-4118-3
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) <42, 2017, New Orleans/La.>
Fraunhofer IDMT

A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy that operates on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron on auditory-inspired features so that it distinguishes between, and generalizes to, various reverberant conditions. The model output is shown to be highly correlated with the relative ASR performance of the individual streams (relative performance monitoring), in contrast to conventional performance monitoring based on mean temporal distance, which considers a single stream. Compared to traditional multi-condition training, the proposed combination strategies achieve average relative word error rate improvements of 7.7% (posteriors) and 9.4% (lattices) when the multi-stream ASR system is tested in known and unknown simulated reverberant environments as well as in realistically recorded conditions taken from the REVERB Challenge evaluation set.
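
As a rough illustration of what a posterior-level stream combination and a relative word error rate improvement look like, the following Python sketch may help. It is not the method of the paper: the abstract does not state the combination rule, so a weighted linear sum is assumed here, and the per-stream weights are hypothetical stand-ins for the output of the room parameter estimation model. The function names and array shapes are likewise assumptions made for this example.

```python
import numpy as np


def combine_posteriors(stream_posteriors, stream_weights):
    """Weighted linear combination of per-stream DNN posteriors.

    stream_posteriors: array of shape (S, T, K) with S streams,
        T frames and K classes (assumed layout).
    stream_weights: array of shape (S,) with non-negative weights,
        here a hypothetical stand-in for a performance-monitoring score.
    """
    p = np.asarray(stream_posteriors, dtype=float)
    w = np.asarray(stream_weights, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to one
    combined = np.tensordot(w, p, axes=(0, 0))  # result has shape (T, K)
    # Renormalize each frame so it remains a probability distribution.
    return combined / combined.sum(axis=-1, keepdims=True)


def relative_wer_improvement(wer_baseline, wer_system):
    """Relative word error rate improvement in percent."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline


# Toy example: two streams, one frame, two classes (made-up numbers).
posteriors = [[[0.8, 0.2]], [[0.4, 0.6]]]
weights = [3.0, 1.0]  # hypothetical: stream 1 is judged more reliable
combined = combine_posteriors(posteriors, weights)
```

In this toy setting the more heavily weighted stream dominates the combined posterior. A lattice-level combination, which the abstract also mentions, would require a decoder and is not sketched here.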