Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Automatic estimation of the triangular vowel space area from i-Vectors

 
Tanuadji, Maureen; Stadtschnitzer, Michael; Bardeli, Rolf; Jaeger, Hagen

Doclo, S. ; Informationstechnische Gesellschaft -ITG-, Fachausschuss Sprachakustik; Informationstechnische Gesellschaft -ITG-:
Speech communication. 13. ITG-Fachtagung Sprachkommunikation 2018 : 10.- 12. Oktober 2018, Oldenburg, CD-ROM
Berlin: VDE-Verlag, 2018 (ITG-Fachbericht 282)
ISBN: 978-3-8007-4767-2
5 pp.
Fachtagung Sprachkommunikation <13, 2018, Oldenburg>
European Commission EC
H2020; 690494; i-Prognosis
Intelligent Parkinson eaRly detectiOn Guiding NOvel Supportive InterventionS
English
Conference paper
Fraunhofer IAIS
triangular vowel space area; i-Vectors; Parkinson's disease

Abstract
Parkinson's disease (PD) is a neurodegenerative disorder that gradually affects the neurological condition of the patient. In many cases the disease impairs the reliability of the articulatory system and the ability to pronounce vowels normally. One prominent measure of how well the articulatory system functions is the Vowel Space Area (VSA). The typical way to measure it is to manually annotate sustained vowel recordings or phonetically annotated speech utterances of a speaker and then analyze the signals. It is often desirable, however, to measure the VSA directly from unlabeled natural speech. This paper therefore proposes an automatic, model-based system that estimates the triangular Vowel Space Area (tVSA) and the underlying corner vowel formant frequencies directly from unlabeled natural speech, without the need for phonetic or vowel transcriptions. i-Vectors are extracted from the signals as a characteristic representation of each speaker, and the speaker's corner vowel formant frequencies are estimated from them by regression models. Two regression models, namely Deep Neural Networks (DNN) and Support Vector Regression (SVR), are investigated in this work. The proposed configuration employs the SVR model, which is able to predict the corner vowel formant frequencies of the test speakers with R² up to 0.56719 and p up to 0.76485.
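
As an illustration of the quantity being estimated, the tVSA is commonly computed as the area of the triangle spanned by the three corner vowels in the (F1, F2) formant plane. The sketch below assumes the corner vowels are /a/, /i/ and /u/ and applies the standard shoelace formula. The function name and the formant values are hypothetical and chosen for illustration only; they are not taken from the paper, and the paper's exact definition may differ.

```python
def triangular_vsa(corners):
    """Area of the triangle spanned by three corner vowels.

    corners: dict mapping the vowel labels "a", "i", "u" to (F1, F2)
    formant pairs in Hz. Returns the area in Hz^2 (shoelace formula).
    """
    (f1_a, f2_a) = corners["a"]
    (f1_i, f2_i) = corners["i"]
    (f1_u, f2_u) = corners["u"]
    return 0.5 * abs(
        f1_a * (f2_i - f2_u)
        + f1_i * (f2_u - f2_a)
        + f1_u * (f2_a - f2_i)
    )


# Illustrative formant values only (not measurements from the paper).
example_corners = {
    "a": (750.0, 1300.0),
    "i": (300.0, 2300.0),
    "u": (350.0, 800.0),
}
example_area = triangular_vsa(example_corners)
print(example_area)  # 312500.0 (Hz^2)
```

In a pipeline like the one described in the abstract, the six formant values would come from the regression stage rather than from manual annotation, and the area would then follow from them directly.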

http://publica.fraunhofer.de/dokumente/N-531369.html