Mapping representations of speaker characteristics using deep learning

Tanuadji, Maureen

doi:10.24406/publica-fhg-281512

2016

Master Thesis

Abstract

An automatic model-based system is proposed to estimate the corner vowel formant frequencies and the acoustic measure known as the triangular Vowel Space Area (tVSA) directly from unlabeled natural speech. The proposed algorithm estimates the tVSA automatically from the speech signal, without phonetic or vowel transcriptions. i-vector features are employed as the representation of speaker characteristics, from which regression models estimate the formant frequencies of the speaker's corner vowels. Two regression models, Deep Neural Networks (DNN) and Support Vector Regression (SVR), are investigated in this thesis. The best configuration uses SVR, which predicts the formant frequencies of the test speakers with a coefficient of determination R² of up to 0.56719 and a correlation coefficient ρ of up to 0.76485.
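Once the corner vowel formants have been predicted, the tVSA follows from them as the area of the triangle spanned by /a/, /i/ and /u/ in the F1-F2 plane. The sketch below illustrates that final step and the two evaluation measures under stated assumptions: formants are given as (F1, F2) pairs in Hz keyed by the hypothetical labels "a", "i", "u"; R² is taken as the usual coefficient of determination; and ρ is assumed to be Pearson's correlation coefficient (the abstract does not say whether Pearson or Spearman was used). The i-vector extraction and the SVR/DNN regression themselves are not reproduced here.

```python
from math import sqrt


def tvsa(formants):
    """Triangular Vowel Space Area in Hz^2 via the shoelace formula.

    `formants` maps the hypothetical keys "a", "i", "u" to (F1, F2) pairs in Hz.
    """
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = formants["a"], formants["i"], formants["u"]
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2.0


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_rho(x, y):
    """Pearson correlation coefficient (assumed meaning of rho)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative (made-up) corner vowel formants for one speaker.
example = {"a": (700, 1200), "i": (300, 2300), "u": (350, 900)}
area = tvsa(example)  # 252500.0 Hz^2
```

In practice the same `tvsa` function would be applied to both the reference and the predicted formants of each test speaker, and `r_squared` and `pearson_rho` would be computed per formant across speakers.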

Thesis Note

Aachen, TH, Master Thesis, 2016

Author(s)

Tanuadji, Maureen

Publishing Place

Aachen

Project(s)

i-Prognosis

Funder

European Commission
