Towards large scale vocabulary independent spoken term detection: Advances in the Fraunhofer IAIS audiomining system
This contribution presents the advances made in the Fraunhofer IAIS Audiomining system for vocabulary-independent spoken term detection since the last SIGIR workshop on searching spontaneous conversational speech in 2007. Feedback from archivists involved in developing the prototype was used to establish a set of requirements for spoken term detection systems, and these requirements guided the development of the overall system. After the automatic speech recognition (ASR) baseline was improved with data from the broadcast domain, the syllable error rate on a set of broadcast news and broadcast conversation shows was reduced by 45.6% relative, and the time required for analyzing the data was reduced by 90%. With the new ASR results, the F1 value of the fuzzy syllable search used for open-vocabulary spoken term detection increased by 49% relative. The best results were achieved with a hybrid word and syllable system, which improved F1 by 58% relative compared to the 2007 prototype. With the better ASR baseline, exact string search on the syllable transcripts becomes a promising alternative, yielding precise results on large audiovisual archives with only small reductions in recall.
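
The contrast between exact and fuzzy search on syllable transcripts can be illustrated with a minimal sketch. It is not the Audiomining implementation: the function names (`edit_distance`, `exact_search`, `fuzzy_search`), the fixed-length sliding window, the `max_dist` threshold, and the toy syllable sequences are all assumptions made for illustration. Exact search only returns positions where every syllable matches, which favors precision, whereas the fuzzy variant tolerates a bounded number of syllable errors and can recover a term that the recognizer transcribed imperfectly.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def exact_search(transcript: Sequence[str], query: Sequence[str]) -> List[int]:
    """Start indices where the query syllables occur verbatim."""
    n = len(query)
    return [i for i in range(len(transcript) - n + 1)
            if list(transcript[i:i + n]) == list(query)]


def fuzzy_search(transcript: Sequence[str], query: Sequence[str],
                 max_dist: int) -> List[int]:
    """Start indices of query-length windows within max_dist edits.

    Simplification (assumption): windows have the same length as the query,
    so hits that differ in length from the query are not modelled.
    """
    n = len(query)
    return [i for i in range(len(transcript) - n + 1)
            if edit_distance(transcript[i:i + n], query) <= max_dist]


# Hypothetical ASR output in which the first syllable of the term is misrecognized.
transcript = ["gu", "ten", "mor", "gen", "frau", "ho", "fer"]
query = ["fraun", "ho", "fer"]

print(exact_search(transcript, query))     # no verbatim match
print(fuzzy_search(transcript, query, 1))  # one window within one syllable edit
```

In this toy case the exact search misses the term because of a single misrecognized syllable, while the fuzzy search finds it. As the syllable error rate falls, fewer such misses occur, which is consistent with exact search becoming a viable option at only a small cost in recall.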