Improving Grapheme-to-Phoneme Conversion for Anglicisms in German Speech Recognition

Author: Pritzen, Julia Maria
: Zühlke, Dietlind; Gref, Michael

Full text urn:nbn:de:0011-n-6346777 (2.8 MByte PDF)
MD5 Fingerprint: d705b2277829e4999e49c9b2b0597b92
Created on: 6 May 2021

Köln, 2021, XI, 128 pp.
Köln, TH, Master Thesis, 2021
Master Thesis, Electronic Publication
Fraunhofer IAIS
automatic speech recognition; ASR; Anglicisms; loanwords; grapheme-to-phoneme; G2P; phoneme-to-phoneme; P2P; sequence-to-sequence; Seq2Seq; multitask learning; MTL

This work designs and evaluates methods for improving the recognition of anglicisms in German speech recognition. Focusing on the pronunciation dictionary of an ASR system, three approaches were designed and implemented for creating supplementary anglicism pronunciation dictionaries. In the first approach, anglicism pronunciations were derived directly from the German Wiktionary. In the second approach, anglicism pronunciations were generated with both a German and an English G2P model; by comparing their confidence measures, the better pronunciation for each word was chosen and added to the resulting anglicism pronunciation dictionary. For this approach, an additional P2P model was created that maps English phonemes to their German equivalents. In the third approach, multitask learning was utilized by adding an anglicism classification task to a German Seq2Seq G2P model. By distinguishing anglicisms from native German words, the G2P model was able to generate different pronunciations for each case. For each resulting anglicism pronunciation dictionary, a dedicated ASR model was created with similar settings. All ASR models, including a baseline model, were evaluated on a dedicated anglicism test set and on two additional German test sets from the broadcast domain to check for performance degradation in other use cases. Ten out of thirteen models performed better than the baseline. The best model resulted from the second, comparative approach. On the anglicism test set, it decreased the WER by 0.21 percentage points and recognized 22 more anglicisms than the baseline model. The mean WER across all test sets decreased by 0.08 percentage points. More anglicism data of better quality and refined model implementations are needed to further improve anglicism recognition.
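The selection step of the second (comparative) approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The trained German and English G2P models are replaced by hypothetical toy functions (`toy_g2p_de`, `toy_g2p_en`) with invented outputs and confidence scores. The learned P2P model is replaced by a small hypothetical lookup table (`EN_TO_DE`). Ties are resolved in favor of the German hypothesis. All names (`Hypothesis`, `choose_pronunciation`, `build_anglicism_lexicon`) are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the learned P2P model: a fixed lookup table mapping a few
# English phonemes (IPA) to German approximations. Unmapped phonemes pass through unchanged.
EN_TO_DE: Dict[str, str] = {
    "ɹ": "ʁ",  # English approximant r -> German uvular r
    "æ": "ɛ",  # near-open front vowel -> open-mid front vowel
    "θ": "s",
    "ð": "d",
}


@dataclass
class Hypothesis:
    phonemes: List[str]  # pronunciation as a sequence of phoneme symbols
    confidence: float    # model confidence, e.g. a sequence probability (assumed comparable)


G2PModel = Callable[[str], Hypothesis]


def map_en_to_de(phonemes: List[str]) -> List[str]:
    """Map an English phoneme sequence into the German phoneme inventory."""
    return [EN_TO_DE.get(p, p) for p in phonemes]


def choose_pronunciation(de_hyp: Hypothesis, en_hyp: Hypothesis) -> List[str]:
    """Keep the more confident hypothesis; English output is mapped to German phonemes first.

    Ties go to the German G2P model (an assumption of this sketch).
    """
    if en_hyp.confidence > de_hyp.confidence:
        return map_en_to_de(en_hyp.phonemes)
    return list(de_hyp.phonemes)


def build_anglicism_lexicon(
    words: List[str], g2p_de: G2PModel, g2p_en: G2PModel
) -> Dict[str, List[str]]:
    """Build a supplementary pronunciation dictionary for a list of anglicisms."""
    return {w: choose_pronunciation(g2p_de(w), g2p_en(w)) for w in words}


# Toy G2P "models" with invented outputs and scores, for illustration only.
_TOY_DE = {
    "Meeting": Hypothesis(["m", "eː", "t", "ɪ", "ŋ"], 0.41),
    "Rap": Hypothesis(["ʁ", "a", "p"], 0.35),
}
_TOY_EN = {
    "Meeting": Hypothesis(["m", "iː", "t", "ɪ", "ŋ"], 0.87),
    "Rap": Hypothesis(["ɹ", "æ", "p"], 0.72),
}


def toy_g2p_de(word: str) -> Hypothesis:
    return _TOY_DE[word]


def toy_g2p_en(word: str) -> Hypothesis:
    return _TOY_EN[word]


lexicon = build_anglicism_lexicon(["Meeting", "Rap"], toy_g2p_de, toy_g2p_en)
for word, phones in lexicon.items():
    print(word, " ".join(phones))
```

Comparing raw confidences from two separately trained models assumes their scores are on a comparable scale; in a real system, some form of calibration may be needed before the comparison is meaningful.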