Here you will find scientific publications from the Fraunhofer Institutes.

Harmonic cues for number of simultaneous speakers estimation

Rafi, U.; Bardeli, R.

Dittmar, C. ; Audio Engineering Society -AES-:
53rd International Conference on Semantic Audio 2014 : London, United Kingdom 26 – 29 January 2014
Red Hook, NY: Curran, 2014
ISBN: 978-1-63266-284-2
ISBN: 1-63266-284-1
International Conference on Semantic Audio <53, 2014, London>
Fraunhofer IAIS

Overlapped speech, where several speakers talk simultaneously, is a common occurrence in multiparty discussions such as meetings. It presents a great challenge to automatic speech processing systems such as speech recognition and speaker diarisation systems. In recent speaker diarisation systems, a large portion of the remaining error comes from overlapped speech. So far, little work has been done on detecting overlapped speech and estimating the number of speakers present in it. In this paper, we first describe a model-based approach for estimating the number of simultaneous speakers. We then propose a new approach called Spectral Peak Clustering: instead of training statistical models, we extract spectral peaks from the input data and cluster them into components using a similarity measure between peaks, where each component represents a speaker present in the input data.
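
The peak-then-cluster idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering procedure, so everything below is an assumption made for illustration only: the function names (`pick_peaks`, `harmonic_similarity`, `cluster_peaks`), the thresholds, the harmonicity-based similarity (suggested by the title's "harmonic cues") and the greedy grouping are hypothetical and are not the authors' method.

```python
def pick_peaks(magnitudes, bin_hz, threshold):
    """Return the frequencies (Hz) of local maxima above `threshold`.

    `magnitudes` is one frame of a magnitude spectrum and `bin_hz` is the
    width of one frequency bin. Both are assumed inputs for this sketch.
    """
    peaks = []
    for i in range(1, len(magnitudes) - 1):
        m = magnitudes[i]
        if m > threshold and m > magnitudes[i - 1] and m >= magnitudes[i + 1]:
            peaks.append(i * bin_hz)
    return peaks


def harmonic_similarity(f_low, f_high):
    """Assumed similarity: 1.0 if f_high is an exact integer multiple of
    f_low, falling linearly to 0.0 halfway between two multiples."""
    ratio = f_high / f_low
    deviation = abs(ratio - round(ratio))
    return 1.0 - 2.0 * deviation


def cluster_peaks(peak_freqs, min_similarity=0.9, min_peaks=2):
    """Greedily group peaks into harmonic series.

    The lowest unassigned peak is treated as a candidate fundamental and
    collects every remaining peak that is sufficiently similar to it.
    Groups with fewer than `min_peaks` members are discarded as spurious.
    """
    remaining = sorted(peak_freqs)
    clusters = []
    while remaining:
        f0 = remaining[0]
        members = [f for f in remaining
                   if harmonic_similarity(f0, f) >= min_similarity]
        remaining = [f for f in remaining if f not in members]
        if len(members) >= min_peaks:
            clusters.append(members)
    return clusters


# Synthetic peaks from two harmonic series (fundamentals 110 Hz and 150 Hz).
peaks = [110, 220, 330, 440, 150, 300, 450, 600]
clusters = cluster_peaks(peaks)
estimated_speakers = len(clusters)
```

In this toy setting, the number of clusters serves as the estimate of the number of simultaneous speakers. The sketch ignores issues that real speech raises, such as harmonics shared between speakers, unvoiced segments, and peaks drifting across frames.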