2018
Conference Paper
Title
Accurately capturing speech feature distributions by extending supervectors for robust speaker recognition
Abstract
Supervectors represent speaker-specific Gaussian Mixture Models which are enrolled from a Universal Background Model (UBM) and approximate the unknown, underlying speech feature distributions. However, since supervectors consist only of the stacked means of the Gaussian components, the low-dimensional i-vectors derived from them do not completely capture the true feature distributions. In this work, the classical supervectors are extended with additional parameters before their dimension is reduced, so that they capture the feature distributions more accurately and complement the i-vectors more effectively. To extend a supervector, the mixture weights, the log-likelihood values of the UBM, a Bhattacharyya-distance based kernel and the Hellinger distance between each enrolled Gaussian component and the corresponding component of the UBM are used. In closed-set speaker identification experiments on the NTIMIT corpus, which consists of telephone-quality speech, the extended supervectors yield significantly lower error rates than the standard supervectors, even after fusion with i-vectors and the UBM.
Conference
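The extensions named in the abstract can be illustrated with a short, hedged sketch. The Python snippet below is not the authors' implementation: diagonal covariances, the array shapes, the concatenation order, and all function and variable names are assumptions made purely for illustration. It shows how a classical mean supervector could be augmented with per-component mixture weights, an average UBM log-likelihood, and Bhattacharyya- and Hellinger-based distances between each adapted Gaussian component and its UBM counterpart.

    # Illustrative sketch only; assumes diagonal-covariance GMMs.
    import numpy as np

    def bhattacharyya_distance(mu1, var1, mu2, var2):
        # Bhattacharyya distance between two diagonal-covariance Gaussians.
        var_avg = 0.5 * (var1 + var2)
        term_mean = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
        term_cov = 0.5 * np.sum(np.log(var_avg)
                                - 0.5 * (np.log(var1) + np.log(var2)))
        return term_mean + term_cov

    def hellinger_distance(mu1, var1, mu2, var2):
        # Hellinger distance via the Bhattacharyya coefficient BC = exp(-D_B).
        bc = np.exp(-bhattacharyya_distance(mu1, var1, mu2, var2))
        return np.sqrt(max(0.0, 1.0 - bc))

    def extended_supervector(adapted_means, adapted_vars, weights,
                             ubm_means, ubm_vars, avg_loglik):
        # adapted_means/adapted_vars and ubm_means/ubm_vars: (C, D) arrays,
        # weights: (C,) mixture weights, avg_loglik: scalar UBM log-likelihood
        # averaged over the enrollment frames (an assumed summary statistic).
        n_comp = adapted_means.shape[0]
        bhatt = np.array([bhattacharyya_distance(adapted_means[c], adapted_vars[c],
                                                 ubm_means[c], ubm_vars[c])
                          for c in range(n_comp)])
        hell = np.array([hellinger_distance(adapted_means[c], adapted_vars[c],
                                            ubm_means[c], ubm_vars[c])
                         for c in range(n_comp)])
        classic = adapted_means.ravel()  # classical stacked-mean supervector
        # Append the extra parameters to the classical supervector before any
        # dimensionality reduction (e.g., projection to an i-vector-like space).
        return np.concatenate([classic, weights, bhatt, hell, [avg_loglik]])

Under these assumptions, the extended vector simply concatenates the extra per-component statistics onto the stacked means; how the paper weights, normalizes, or projects these parameters is not specified here.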