Efficient subword-based lattice retrieval of broadcast news speech
Searching for keywords in a collection of spoken documents is a challenging task. Transcripts produced by Automatic Speech Recognition (ASR) are not reliable enough for standard Information Retrieval techniques to be applied to them directly. Although several methods address the problem of searching erroneous transcripts, only a few are suitable for large-scale spoken document retrieval. Moreover, while much work has been done in the field of Spoken Term Detection (STD), most of it has concentrated on English or Mandarin. This thesis presents a prototype system for STD on German broadcast news that aims to achieve a high level of accuracy while processing queries in a reasonable amount of time. Lattice-based indexing and retrieval methods were chosen as suitable approaches to STD and are presented and discussed in this work. Indexing on subword units is of special interest because word transcriptions produced by large-vocabulary continuous speech recognition are known to miss important out-of-vocabulary (OOV) keywords. As this project was carried out in collaboration with Fraunhofer IAIS, the developed system is evaluated against the Fraunhofer AudioMining baseline system. On a 3.5-hour corpus of broadcast news, the best configuration of the lattice-based system, using words and syllables as indexing units, achieved 94% precision and 79% recall, an absolute improvement of 6 percentage points in precision over the baseline system. Syllable lattice search was found to be six times faster than fuzzy syllable search using a confusion matrix.
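The argument for subword indexing can be illustrated with a toy example. A keyword that is out of vocabulary for the word recognizer may still be found as a chain of consecutive syllable arcs in a syllable lattice. The following Python sketch is not the thesis's implementation. The lattice, the syllable labels, the functions build_index and search, and the scoring are all hypothetical assumptions made for illustration. In particular, the score (the product of arc posteriors) is only a crude stand-in for proper forward-backward path posteriors.

```python
from collections import defaultdict

# Hypothetical toy syllable lattice: arcs are (start_node, end_node, syllable, posterior).
# Two competing hypotheses, "mer" and "mär", span nodes 3 -> 4.
EXAMPLE_ARCS = [
    (0, 1, "an", 0.9),
    (1, 2, "ge", 0.8),
    (2, 3, "la", 0.7),
    (3, 4, "mer", 0.6),
    (3, 4, "mär", 0.3),
    (4, 5, "kel", 0.9),
]


def build_index(arcs):
    """Inverted index: syllable label -> list of (start_node, end_node, posterior)."""
    index = defaultdict(list)
    for start, end, label, posterior in arcs:
        index[label].append((start, end, posterior))
    return index


def search(index, query_units):
    """Find chains of arcs whose labels match query_units in order.

    Consecutive arcs must connect (end node of one == start node of the next).
    Returns (start_node, end_node, score) tuples, where score is the product of
    arc posteriors (a simplifying assumption, not a true path posterior).
    """
    if not query_units:
        return []
    hits = list(index.get(query_units[0], []))
    for unit in query_units[1:]:
        extended = []
        for start, end, score in hits:
            for next_start, next_end, posterior in index.get(unit, []):
                if next_start == end:
                    extended.append((start, next_end, score * posterior))
        hits = extended
    return hits


if __name__ == "__main__":
    idx = build_index(EXAMPLE_ARCS)
    # An OOV keyword such as "Merkel" is searched as its syllable sequence.
    print(search(idx, ["mer", "kel"]))
```

Because the query is decomposed into syllables, a match does not depend on the whole word being in the recognizer's vocabulary. A real system would also need a word-to-syllable decomposition step for the query and proper handling of time overlaps between hits.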
Edinburgh, Univ., Master Thesis, 2008