
Mapping representations of speaker characteristics using deep learning

Author: Tanuadji, Maureen

Full text: urn:nbn:de:0011-n-4424897 (1.0 MB PDF)
MD5 Fingerprint: b8a6a7cd81c58fa099cec07c5064e93d
Created on: 3 May 2017

Aachen, 2016, IV, 44 pp.
Aachen, TH, Master Thesis, 2016
European Commission EC
H2020; 690494; i-Prognosis
Master Thesis, Electronic Publication
Fraunhofer IAIS

An automatic, model-based system is proposed to estimate the formant frequencies of the corner vowels and the acoustic measure known as the triangular Vowel Space Area (tVSA) directly from unlabeled natural speech. The proposed algorithm estimates the tVSA from the speech signal alone, without phonetic or vowel transcriptions. i-Vectors serve as the representation of speaker characteristics, from which regression models estimate the formant frequencies of the speaker's corner vowels. Two regression methods are investigated in this thesis: Deep Neural Networks (DNN) and Support Vector Regression (SVR). The best configuration uses SVR, which predicts the formant frequencies of the test speakers with an R² of up to 0.56719 and a ρ of up to 0.76485.
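For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch, not the thesis's implementation. It assumes the tVSA is the area of the triangle spanned by the corner vowels /a/, /i/ and /u/ in the F1-F2 plane, given as (F1, F2) pairs in Hz, computed here with the shoelace formula. It also assumes that R² is the coefficient of determination and that ρ is Pearson's correlation coefficient; the abstract does not say which correlation coefficient is meant. The function names `tvsa`, `r_squared` and `pearson_rho` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (F1, F2) formant pair in Hz for one corner vowel (assumed input format).
Formants = Tuple[float, float]


def tvsa(a: Formants, i: Formants, u: Formants) -> float:
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed meaning of rho in the abstract)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Illustrative formant values only; not data from the thesis.
    print(tvsa((800.0, 1300.0), (300.0, 2300.0), (300.0, 800.0)))  # 375000.0
```

In the system described above, the three (F1, F2) pairs passed to `tvsa` would be the formants predicted by the regression model from a speaker's i-vector, and `r_squared` and `pearson_rho` would compare predicted against reference formants across test speakers.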