Rieber, Joscha SimonJoscha SimonRieberBardeli, RolfRolfBardeli2022-03-132022-03-132013https://publica.fraunhofer.de/handle/publica/393343Common approaches to automatic speech recognition (ASR) are based on training statistical models for the acoustics of speech. In our work, a retrieval-based ASR system is developed that does not rely on training and thus provides more flexible application. It is based on a set of known reference word utterances for each possibly occurring word in a test string. A test word string is identified by finding the most similar reference for each word by using an approach based on dynamic time warping (DTW). The DTW variant suitable for recognizing strings of connected words is called level-building DTW, proposed by Myers and Rabiner in 1981. It is using a level-wise iteration to match each word in the test utterance with the most similar reference. In our work, an ASR system for connected digit recognition based on level-building DTW is developed, evaluated and compared with a state-of-the-art HMM recognizer.en005006629Speech recognition as a retrieval problemconference paper