Automatic sentence boundary detection for German broadcast news
In this work, we aim to enrich the transcript of an automatic speech recognition system with punctuation by automatically detecting sentence ends. We use a simple word-based language model and combine it with a decision tree trained on acoustic features of speech. The focus lies on selecting robust acoustic features that best reflect the prosodic characteristics of the German language. Using a comparable detection system, we arrive at a Sentence Unit (SU) Error Rate of 54%, compared to the state-of-the-art rate of 61% for English. This is a sound indication that prosody provides a stronger cue to the perception of sentence boundaries in German than in English. To our knowledge, ours is the first system developed for sentence boundary detection in the broadcast news domain for German. Our results can therefore serve as a baseline for further studies in this scenario.
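The abstract does not say how the language model and the prosodic decision tree are combined, or exactly how the SU Error Rate is computed. The sketch below shows one plausible setup under stated assumptions. First, each inter-word boundary gets a sentence-end posterior from each model. Second, the two posteriors are combined by linear interpolation with a hypothetical `weight`. This is a common choice in the sentence-segmentation literature, not necessarily the authors' method. Third, the SU Error Rate is taken as the number of missed plus falsely inserted boundaries, divided by the number of reference sentence boundaries, in percent. All function names and parameters here are illustrative.

```python
from typing import List, Sequence


def combine_posteriors(p_lm: Sequence[float], p_prosody: Sequence[float],
                       weight: float = 0.5) -> List[float]:
    """Linearly interpolate per-boundary sentence-end posteriors.

    Assumption: each list holds P(sentence end | boundary i) from the word-based
    language model and from the prosodic decision tree, respectively.
    """
    if len(p_lm) != len(p_prosody):
        raise ValueError("posterior sequences must have equal length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_lm, p_prosody)]


def detect_boundaries(posteriors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothesize a sentence end wherever the combined posterior exceeds the threshold."""
    return [p > threshold for p in posteriors]


def su_error_rate(reference: Sequence[bool], hypothesis: Sequence[bool]) -> float:
    """(misses + false alarms) / number of reference sentence ends, in percent."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    n_ref = sum(reference)
    if n_ref == 0:
        raise ValueError("reference contains no sentence boundaries")
    misses = sum(r and not h for r, h in zip(reference, hypothesis))
    false_alarms = sum(h and not r for r, h in zip(reference, hypothesis))
    return 100.0 * (misses + false_alarms) / n_ref


if __name__ == "__main__":
    # Toy data for five word boundaries (made up for illustration only).
    reference = [False, True, False, False, True]
    p_lm = [0.1, 0.8, 0.4, 0.2, 0.3]
    p_prosody = [0.2, 0.6, 0.7, 0.1, 0.9]

    hypothesis = detect_boundaries(combine_posteriors(p_lm, p_prosody, weight=0.5))
    print(hypothesis)                            # [False, True, True, False, True]
    print(su_error_rate(reference, hypothesis))  # 50.0 (one false alarm, two reference ends)
```

Because the rate is normalized by reference boundaries rather than by word boundaries, it can exceed 100% when a system inserts many false sentence ends. This is why values such as 54% and 61% are less dramatic than they may first appear.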