
Towards large scale vocabulary independent spoken term detection: Advances in the Fraunhofer IAIS audiomining system

Schneider, D.; Schon, J.; Eickeler, S.

Köhler, J.; Larson, M.; Jong, F. de; Kraaij, W.; Ordelman, R.; Association for Computing Machinery -ACM-, Special Interest Group on Information Retrieval -SIGIR-; Centre for Telematics and Information Technology -CTIT-, Enschede:
ACM SIGIR Workshop "Searching Spontaneous Conversational Speech" 2008. Proceedings : Held in conjunction with the 31st Annual International ACM SIGIR Conference 24 July 2008, Singapore
Enschede: CTIT, 2008
ISBN: 978-90-365-2697-5
Workshop "Searching Spontaneous Conversational Speech" (SSCS) <2008, Singapore>
International Conference on Research and Development in Information Retrieval (SIGIR) <31, 2008, Singapore>
Fraunhofer IAIS

This contribution presents the advances made in the Fraunhofer IAIS Audiomining system for vocabulary-independent spoken term detection since the 2007 SIGIR workshop on searching spontaneous conversational speech. Based on feedback from archivists involved in developing the prototype, a set of requirements for spoken term detection systems was established to guide the development of the overall system. After the automatic speech recognition (ASR) baseline was improved with data from the broadcast domain, the syllable error rate on a set of broadcast news and broadcast conversation shows was reduced by 45.6% relative, and the time required for analyzing the data was reduced by 90%. Based on the new ASR results, the F1 value of the fuzzy syllable search used for open-vocabulary spoken term detection increased by 49% relative. The best results were achieved with a hybrid word and syllable system, with a relative F1 improvement of 58% compared to the 2007 prototype. With the better ASR baseline, exact string search on the syllable transcripts becomes a promising alternative, yielding precise results on large audiovisual archives with only small reductions in recall.
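
The difference between exact and fuzzy search on syllable transcripts can be illustrated with a minimal Python sketch. It is not the Audiomining implementation: the abstract does not describe the matching algorithm, so the sliding-window Levenshtein distance, the function names `edit_distance` and `search`, the `max_dist` parameter, and the example syllables are all assumptions made for illustration.

```python
from typing import List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two syllable sequences (hypothetical helper)."""
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, start=1):
        curr = [i]
        for j, syl_b in enumerate(b, start=1):
            cost = 0 if syl_a == syl_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def search(transcript: List[str], query: List[str],
           max_dist: int = 0) -> List[Tuple[int, int]]:
    """Return (start_index, distance) for each window within max_dist.

    max_dist=0 corresponds to exact string search on the syllable
    transcript; max_dist>0 is one simple, assumed form of fuzzy search.
    """
    n = len(query)
    hits = []
    for start in range(len(transcript) - n + 1):
        d = edit_distance(transcript[start:start + n], query)
        if d <= max_dist:
            hits.append((start, d))
    return hits


# Invented example: the recognizer output contains "ken" instead of "gen".
clean = ["gu", "ten", "mor", "gen", "ber", "lin"]
noisy = ["gu", "ten", "mor", "ken", "ber", "lin"]
query = ["mor", "gen"]

exact_on_clean = search(clean, query)              # exact match found
exact_on_noisy = search(noisy, query)              # missed by exact search
fuzzy_on_noisy = search(noisy, query, max_dist=1)  # recovered by fuzzy search
```

In this toy setting, exact search misses the query once a single syllable is misrecognized, while the fuzzy variant recovers it at the cost of admitting near matches. This mirrors the trade-off described in the abstract: a lower syllable error rate leaves fewer such misses, which makes exact search more attractive when precision matters.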