Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Data-driven pronunciation modeling of Swiss German dialectal speech for automatic speech recognition

 
Authors: Stadtschnitzer, Michael; Schmidt, Christoph Andreas

Full text: urn:nbn:de:0011-n-4942031 (286 KByte PDF)
MD5 fingerprint: 9cd593bab52dde56b9c25a8198571828
(CC) by-nc
Created on: 23.5.2018


Calzolari, N. ; European Language Resources Association -ELRA-, Paris:
LREC 2018, Eleventh International Conference on Language Resources and Evaluation. Proceedings. Online resource : May 7-12, 2018, Phoenix Seagaia Conference Center Miyazaki, Japan
Paris: ELRA, 2018
ISBN: 979-10-95546-00-9
pp. 3152-3156
International Conference on Language Resources and Evaluation (LREC) <11, 2018, Miyazaki>
English
Conference paper, electronic publication
Fraunhofer IAIS
robust speech recognition; Swiss German; dialectal speech; data-driven pronunciation modeling

Abstract
Automatic speech recognition is in demand in many fields, such as automatic subtitling, dialogue systems, and information retrieval. Training an automatic speech recognition system is usually straightforward given a large annotated speech corpus for acoustic modeling, a phonetic lexicon, and a text corpus for training a language model. However, in some use cases these resources are not available. In this work, we discuss the training of a Swiss German speech recognition system. The only available resource is a small audio corpus containing utterances of highly dialectal Swiss German speakers, annotated with a standard German transcription. The desired output of the speech recognizer is again standard German, since there is no other official or standardized way to write Swiss German. We explore strategies to cope with the mismatch between the dialectal pronunciation and the standard German annotation. A Swiss German speech recognizer is trained by adapting a standard German model, based on a Swiss German grapheme-to-phoneme conversion model that was learned in a data-driven manner. In addition, Swiss German speech recognition systems are created with three pronunciation variants: grapheme-based, standard German, and a data-driven Swiss German pronunciation model. The results of the experiments are promising for this challenging task.
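
As an illustration of the grapheme-based pronunciation variant mentioned in the abstract, the following minimal Python sketch builds a lexicon in which each word is "pronounced" as the sequence of its own letters, so that no phonetic lexicon is needed. The function name `grapheme_lexicon`, the lowercasing step, the one-letter-per-unit inventory, and the example words are assumptions made for illustration; the abstract does not state how the authors defined their grapheme units.

```python
def grapheme_lexicon(words):
    """Map each word to a pseudo-pronunciation made of its own letters.

    Assumption: one lowercase letter equals one modeling unit. A real
    system may treat digraphs or umlauts differently.
    """
    lexicon = {}
    for word in words:
        # Keep alphabetic characters only; umlauts and 'ß' count as letters.
        units = [ch for ch in word.lower() if ch.isalpha()]
        if units:  # skip entries with no letters at all, e.g. "123"
            lexicon[word] = units
    return lexicon


if __name__ == "__main__":
    # Hypothetical standard German words from a transcription.
    for word, units in grapheme_lexicon(["Haus", "Straße"]).items():
        print(word, " ".join(units))
```

With such a lexicon, the acoustic model has to absorb the mismatch between the standard German spelling and the dialectal pronunciation on its own, which is the gap a data-driven pronunciation model aims to narrow.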

URL: http://publica.fraunhofer.de/dokumente/N-494203.html