2025
Master Thesis
Title
Multilingual Automatic Phonetic Transcription - a Linguistic Investigation of its Performance on German and Approaches to Improving the State of the Art
Abstract
Phonetic transcription represents the pronunciation of speech in a language-independent script. Accurate manual transcriptions require expert knowledge and are time-consuming. Automatic phonetic transcription (APT) can help reduce the high cost of phonetic transcription, but it is still limited by training data scarcity and quality.
This work investigates the highest-performing multilingual APT models to find ways to improve them. To this end, we seek to improve the modeling of a single target language, German, while aiming to maintain language-independent accuracy. After the initial investigation, we develop the transcription bootstrapping approach "selective augmentation" and apply it to a model based on the state-of-the-art "MultIPA". As an example application, we improve plosive phonation recognition, including adding aspiration recognition, by selectively transferring plosive phonation information from a helper model trained on Hindi. We propose criteria for judging the improvement and conduct an acoustic-phonetic analysis of voice onset time (VOT).
The results show the efficacy of selective augmentation: voicing recognition accuracy increases by 17.56%, and aspiration recognition accuracy rises from 0% to 61.17%. In addition, the tenuis class is reduced by 32.21%, which reduces conflations between German phonemes. Finally, we discuss how selective augmentation may be further improved.
Thesis Note
Bonn, Univ., Master Thesis, 2025
Author(s)
Rights
Under Copyright
Language
English