Here you will find scientific publications from the Fraunhofer Institutes.

Multitask sequence-to-sequence models for grapheme-to-phoneme conversion

Authors: Milde, Benjamin; Schmidt, Christoph Andreas; Köhler, Joachim

Fulltext urn:nbn:de:0011-n-4769796 (179 KByte PDF)
MD5 Fingerprint: 0141c47eb23cb1254657f0065028b315
Created on: 15.12.2017

Lacerda, F. ; International Speech Communication Association -ISCA-:
Interspeech 2017. Online resource : 20-24 August 2017, Stockholm
Stockholm, 2017
DOI: 10.21437/Interspeech.2017
International Speech Communication Association (Interspeech Annual Conference) <2017, Stockholm>
Conference Paper, Electronic Publication
Fraunhofer IAIS
grapheme to phoneme; G2P; sequence to sequence; Seq2Seq; multitask learning

Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input tokens to output tokens in an end-to-end fashion. However, until now this approach on its own has not shown improved error rates compared to traditional joint-sequence-based n-gram models for G2P. In this paper, we investigate how multitask learning can improve the performance of Seq2Seq G2P models. A single Seq2Seq model is trained on multiple phoneme lexicon datasets covering multiple languages and phonetic alphabets. Although multi-language learning does not show improved error rates, combining standard datasets with crawled data that uses different phonetic alphabets for the same language shows promising error reductions on English and German Seq2Seq G2P conversion. Finally, combining Seq2Seq G2P models with standard n-gram-based models yields significant improvements over using either model alone.
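The abstract does not say how a single model is told which lexicon or phonetic alphabet a training pair belongs to. One common option, assumed here purely for illustration, is to prepend a task token to the grapheme sequence before merging the lexicons. The function name, the token format, and the two toy entries below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# One training pair: (input tokens, output phoneme tokens).
Example = Tuple[List[str], List[str]]


def make_multitask_examples(
    lexicons: Dict[str, Dict[str, List[str]]],
) -> List[Example]:
    """Merge several pronunciation lexicons into one training set.

    Each word is split into lowercase graphemes and prefixed with a
    task token such as "<en_arpabet>" that names its source lexicon.
    The token scheme is an assumption, not the paper's stated method.
    """
    examples: List[Example] = []
    for task, lexicon in lexicons.items():
        for word, phonemes in lexicon.items():
            source = [f"<{task}>"] + list(word.lower())
            examples.append((source, list(phonemes)))
    return examples


# Toy lexicons invented for illustration: the same English word
# transcribed in two different phonetic alphabets.
toy_lexicons = {
    "en_arpabet": {"cat": ["K", "AE", "T"]},
    "en_ipa": {"cat": ["k", "æ", "t"]},
}

examples = make_multitask_examples(toy_lexicons)
for source, target in examples:
    print(" ".join(source), "->", " ".join(target))
```

With this layout, the same word appears once per alphabet, so a shared encoder sees identical graphemes while the task token indicates which output alphabet is expected.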