Accurately capturing speech feature distributions by extending supervectors for robust speaker recognition

Wilkinghoff, Kevin

2018

Conference Paper

Abstract

Supervectors represent speaker-specific Gaussian Mixture Models (GMMs) that are enrolled from a Universal Background Model (UBM) and approximate the unknown underlying speech feature distributions. However, because supervectors consist only of the stacked means of the Gaussian components, the low-dimensional i-vectors derived from them do not completely capture the true feature distributions. In this work, the classical supervectors are extended with additional parameters before their dimension is reduced, so that they capture the feature distributions more accurately and complement the i-vectors more effectively. To extend a supervector, the mixture weights, the log-likelihood values of the UBM, a Bhattacharyya-distance-based kernel, and the Hellinger distance between each enrolled Gaussian component and the corresponding UBM component are used. In closed-set speaker identification experiments conducted on the NTIMIT corpus, which consists of telephone-quality speech, the extended supervectors provide significantly lower error rates than the standard supervectors, even after fusing them with i-vectors and the UBM.
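
The extension step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal-covariance Gaussians, uses the standard closed-form Bhattacharyya and Hellinger distances between two Gaussians, and appends only the mixture weights and the per-component Hellinger distances. The UBM log-likelihood values and the Bhattacharyya-distance-based kernel are omitted because the abstract does not define them. All function and parameter names are hypothetical.

```python
import math


def bhattacharyya_distance(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)  # averaged variance for this dimension
        dist += 0.125 * (ma - mb) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def hellinger_distance(mean_a, var_a, mean_b, var_b):
    """Hellinger distance, computed as sqrt(1 - exp(-Bhattacharyya distance))."""
    coefficient = math.exp(-bhattacharyya_distance(mean_a, var_a, mean_b, var_b))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - coefficient))


def extend_supervector(weights, means, variances, ubm_means, ubm_variances):
    """Stack the component means, then append weights and Hellinger distances.

    Assumption: one distance per enrolled component, measured against the
    UBM component with the same index.
    """
    supervector = [x for mean in means for x in mean]
    distances = [
        hellinger_distance(m, v, um, uv)
        for m, v, um, uv in zip(means, variances, ubm_means, ubm_variances)
    ]
    return supervector + list(weights) + distances
```

For a model with C components and D feature dimensions, this sketch yields a vector of length C·D + 2C; the dimensionality reduction mentioned in the abstract would then be applied to that longer vector.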

Author(s)

Wilkinghoff, Kevin

Mainwork

Speech communication. 13. ITG-Fachtagung Sprachkommunikation 2018

Conference

Fachtagung Sprachkommunikation 2018
