Here you will find scientific publications from the Fraunhofer Institutes.

Joint estimation of pitch and direction of arrival: Improving robustness and accuracy for multi-speaker scenarios

Authors: Gerlach, S.; Bitzer, J.; Goetze, S.; Doclo, S.


EURASIP Journal on Audio, Speech, and Music Processing (EURASIP JASMP) 2014 (2014), Art. 31, 17 pp.
ISSN: 1687-4714
ISSN: 1687-4722
Journal article, electronic publication
Fraunhofer IDMT

In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed in the last decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method, increasing its robustness in multi-speaker scenarios, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, using multiple microphone pairs, the multi-channel cross-correlation approach is incorporated to improve the robustness against noise and reverberation. In order to improve tracking for moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time RT60 of 560 ms.
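
As an illustration of the GCC-PHAT weighting mentioned in the abstract, the following is a minimal sketch of a two-microphone time-delay estimate and its conversion to a DOA angle. It is not the authors' joint pitch and DOA method. The function names `gcc_phat_delay` and `delay_to_doa_deg`, the far-field assumption, the speed of sound of 343 m/s, and the microphone spacing are assumptions made for this example only.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 in seconds (hypothetical helper).

    A positive result means x2 lags behind x1.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, normalized to unit magnitude (phase transform).
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def delay_to_doa_deg(tau, mic_distance, c=343.0):
    """Convert a delay to a far-field DOA in degrees (assumed two-mic geometry)."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)  # guard against |s| > 1
    return float(np.degrees(np.arcsin(s)))


# Usage: a noise signal delayed by 5 samples at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(4096)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau = gcc_phat_delay(x1, x2, fs)
angle = delay_to_doa_deg(tau, mic_distance=0.2)
```

The phase transform discards the magnitude of the cross-power spectrum and keeps only its phase, which sharpens the correlation peak. This is why it is commonly used as a weighting under reverberation.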