Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Mapping representations of speaker characteristics using deep learning

Author: Tanuadji, Maureen
Full text urn:nbn:de:0011-n-4424897 (1.0 MByte PDF)
MD5 Fingerprint: b8a6a7cd81c58fa099cec07c5064e93d
Created on: 3 May 2017


Aachen, 2016, IV, 44 pp.
Aachen, TH, Master Thesis, 2016
European Commission EC
H2020; 690494; i-Prognosis
English
Master Thesis, Electronic Publication
Fraunhofer IAIS

Abstract
An automatic model-based system is proposed to estimate the corner vowel formant frequencies and the acoustic measure known as the triangular Vowel Space Area (tVSA) directly from unlabeled natural speech. The proposed algorithm estimates the tVSA automatically from the speech signal, without phonetic or vowel transcriptions. i-Vector features serve as the representation of speaker characteristics, from which regression models estimate the formant frequencies of the speaker's corner vowels. Two regression models, Deep Neural Networks (DNN) and Support Vector Regression (SVR), are investigated in this thesis. The best configuration uses SVR, which predicts the formant frequencies of the test speakers with an R² of up to 0.56719 and a ρ of up to 0.76485.
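The abstract does not spell out how the tVSA follows from the estimated formants. A common definition takes it as the area of the triangle whose vertices are the corner vowels /a/, /i/ and /u/ in the F1-F2 plane. The sketch below assumes that definition and computes the area with the shoelace formula. The function name `triangle_vowel_space_area` and the dictionary layout are hypothetical, and the example formant values are illustrative only, not results from the thesis. The i-Vector extraction and the DNN/SVR regression that would supply the formants are not shown.

```python
from typing import Dict, Tuple

# (F1, F2) pair in Hz for one corner vowel (hypothetical input layout)
Formants = Tuple[float, float]


def triangle_vowel_space_area(corners: Dict[str, Formants]) -> float:
    """Area in Hz^2 of the triangle spanned by /a/, /i/, /u/ in the F1-F2 plane.

    Assumes the common tVSA definition; `corners` maps "a", "i", "u"
    to (F1, F2) values, e.g. as predicted by a regression model.
    """
    f1_a, f2_a = corners["a"]
    f1_i, f2_i = corners["i"]
    f1_u, f2_u = corners["u"]
    # Shoelace formula for a triangle; abs() makes the result
    # independent of the order in which the vertices are visited.
    return abs(
        f1_i * (f2_a - f2_u)
        + f1_a * (f2_u - f2_i)
        + f1_u * (f2_i - f2_a)
    ) / 2.0


if __name__ == "__main__":
    # Illustrative (hypothetical) formant values in Hz
    example = {"a": (800, 1300), "i": (300, 2300), "u": (350, 800)}
    print(triangle_vowel_space_area(example))  # 350000.0
```

Because the area is a product of two frequency axes, it is reported in Hz²; a collinear set of corner vowels yields an area of zero.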

URL: http://publica.fraunhofer.de/dokumente/N-442489.html