
Multitask sequence-to-sequence models for grapheme-to-phoneme conversion
Fulltext: urn:nbn:de:0011-n-4769796 (179 KByte PDF); MD5 fingerprint: 0141c47eb23cb1254657f0065028b315; created on: 15.12.2017
Source: Lacerda, F.; International Speech Communication Association -ISCA-: Interspeech 2017, 20-24 August 2017, Stockholm. Online resource. Stockholm, 2017. http://www.isca-speech.org/archive/Interspeech_2017/ DOI: 10.21437/Interspeech.2017, pp. 2536-2540
Conference: International Speech Communication Association (Interspeech Annual Conference) <2017, Stockholm>
Language: English
Document type: Conference Paper, Electronic Publication
Institute: Fraunhofer IAIS
Keywords: grapheme to phoneme; G2P; sequence to sequence; Seq2Seq; multitask learning
Abstract
Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input to output tokens in an end-to-end fashion. However, until now this approach has not shown improved error rates on its own compared to traditional joint-sequence-based n-gram models for G2P. In this paper, we investigate how multitask learning can improve the performance of Seq2Seq G2P models. A single Seq2Seq model is trained on multiple phoneme lexicon datasets covering multiple languages and phonetic alphabets. Although multi-language learning does not show improved error rates, combining standard datasets and crawled data with different phonetic alphabets of the same language shows promising error reductions for English and German Seq2Seq G2P conversion. Finally, combining Seq2Seq G2P models with standard n-gram-based models yields significant improvements over using either model alone.
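For illustration only: the abstract does not specify the network architecture, so the following is a minimal sketch of a character-level encoder-decoder G2P model, assuming PyTorch. The class name Seq2SeqG2P, the vocabulary and hidden sizes, and the way the multitask setup is indicated in the comments are all hypothetical choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Seq2SeqG2P(nn.Module):
    """Toy encoder-decoder that maps grapheme IDs to phoneme logits."""
    def __init__(self, n_graphemes, n_phonemes, hidden=256):
        super().__init__()
        self.g_emb = nn.Embedding(n_graphemes, hidden)   # grapheme embeddings
        self.p_emb = nn.Embedding(n_phonemes, hidden)    # phoneme embeddings
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_phonemes)         # per-step phoneme logits

    def forward(self, graphemes, phonemes_in):
        # Encode the grapheme sequence; the final hidden state initializes
        # the decoder (no attention in this toy sketch).
        _, h = self.encoder(self.g_emb(graphemes))
        dec_out, _ = self.decoder(self.p_emb(phonemes_in), h)
        return self.out(dec_out)

# A multitask flavour, in the spirit of the abstract: one shared model is
# trained on batches drawn from several lexicons (e.g. different phonetic
# alphabets of the same language), with the output vocabulary being the
# union of all phoneme symbols.
model = Seq2SeqG2P(n_graphemes=60, n_phonemes=120)
graphemes = torch.randint(0, 60, (8, 12))      # batch of 8 words, 12 characters each
phonemes_in = torch.randint(0, 120, (8, 10))   # shifted target phoneme sequences
logits = model(graphemes, phonemes_in)         # shape: (8, 10, 120)
```

In such a setup, the later combination with a joint-sequence n-gram model mentioned in the abstract would typically be done at the hypothesis level (e.g. rescoring or system combination); the details of that combination are not given in the abstract.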