Improving robust speech recognition for German oral history interviews using multi-condition training

Gref, Michael; Schmidt, Christoph Andreas; Köhler, Joachim

Doclo, S.; Informationstechnische Gesellschaft -ITG-, Fachausschuss Sprachakustik; Informationstechnische Gesellschaft -ITG-:
Speech communication. 13. ITG-Fachtagung Sprachkommunikation 2018 : 10.- 12. Oktober 2018, Oldenburg, CD-ROM
Berlin: VDE-Verlag, 2018 (ITG-Fachbericht 282)
ISBN: 978-3-8007-4767-2
5 pp.
Fachtagung Sprachkommunikation <13, 2018, Oldenburg>
Bundesministerium für Bildung und Forschung BMBF (Germany)
Forschungsinfrastrukturen für die Geistes- und qualitativen Sozialwissenschaften; 01UG1511B; KA3
Kölner Zentrum für Analyse und Archivierung audiovisueller Daten
Fraunhofer IAIS
robust speech recognition; multi-condition training; data augmentation; acoustic modeling; oral history

In the historical sciences, the term oral history refers to conducting and analyzing interviews with contemporary witnesses. To significantly reduce the resources needed to transcribe these interviews, we are adapting our speech recognition system to oral history interviews. In this work, we build on our previous experiments by using 1000 hours of training data from the broadcast domain. Using the Kaldi ASR toolkit, we show that advanced chain acoustic models benefit greatly from large data sets and achieve remarkable performance on several test sets. To further improve speech recognition performance on oral history interviews, we add artificially created multi-condition data to the chain model training. Compared to a chain model trained on clean data, this reduces the word error rate (WER) on the oral history test set by 4.8% absolute and 13.9% relative.