Accurately capturing speech feature distributions by extending supervectors for robust speaker recognition

Author: Wilkinghoff, Kevin

Doclo, S. ; Informationstechnische Gesellschaft -ITG-, Fachausschuss Sprachakustik; Informationstechnische Gesellschaft -ITG-:
Speech communication. 13. ITG-Fachtagung Sprachkommunikation 2018 : 10.- 12. Oktober 2018, Oldenburg, CD-ROM
Berlin: VDE-Verlag, 2018 (ITG-Fachbericht 282)
ISBN: 978-3-8007-4767-2
Fachtagung Sprachkommunikation <13, 2018, Oldenburg>
Conference Paper
Fraunhofer FKIE

Supervectors represent speaker-specific Gaussian Mixture Models (GMMs) that are enrolled from a Universal Background Model (UBM) and approximate the unknown underlying speech feature distributions. However, because supervectors consist only of the stacked means of the Gaussian components, the low-dimensional i-vectors derived from them do not completely capture the true feature distributions. In this work, the classical supervectors are extended with additional parameters before their dimension is reduced, in order to capture the feature distributions more accurately and to complement the i-vectors more effectively. A supervector is extended with the mixture weights, the log-likelihood values of the UBM, a Bhattacharyya-distance-based kernel and the Hellinger distance between each enrolled Gaussian component and the corresponding component of the UBM. In closed-set speaker identification experiments on the NTIMIT corpus, which consists of telephone-quality speech, the extended supervectors yield significantly lower error rates than the standard supervectors, even after fusing them with i-vectors and the UBM.
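
The extension described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or the layout of the extended vector, so several points here are assumptions: diagonal covariances, a kernel of the form exp(-D_B) (the Bhattacharyya coefficient), per-component UBM log-likelihood values supplied as a precomputed input, and the ordering of the appended blocks. The function names `bhattacharyya_distance`, `hellinger_distance` and `extend_supervector` are hypothetical and not taken from the paper.

```python
import math


def bhattacharyya_distance(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)                            # averaged variance
        dist += 0.125 * (ma - mb) ** 2 / v             # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(va * vb)) # covariance-mismatch term
    return dist


def hellinger_distance(mean_a, var_a, mean_b, var_b):
    """Hellinger distance, using H^2 = 1 - exp(-D_B)."""
    coeff = math.exp(-bhattacharyya_distance(mean_a, var_a, mean_b, var_b))
    return math.sqrt(max(0.0, 1.0 - coeff))


def extend_supervector(speaker_gmm, ubm, ubm_loglik):
    """Build an extended supervector (assumed layout, not the paper's).

    speaker_gmm, ubm: lists of (weight, mean, variance) per component,
                      with components in corresponding order.
    ubm_loglik:       one precomputed log-likelihood value per component
                      (how these are obtained is left open here).
    """
    means, weights, kernels, hellingers = [], [], [], []
    for (w, mean, var), (_, u_mean, u_var) in zip(speaker_gmm, ubm):
        means.extend(mean)  # classical supervector: stacked means
        weights.append(w)
        kernels.append(math.exp(-bhattacharyya_distance(mean, var, u_mean, u_var)))
        hellingers.append(hellinger_distance(mean, var, u_mean, u_var))
    return means + weights + list(ubm_loglik) + kernels + hellingers
```

For C components of dimension D, this sketch produces a vector of length C·D + 4·C: the classical supervector followed by four per-component blocks. Dimension reduction would then be applied to this longer vector.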