Retrieval of textual song lyrics from sung inputs

Kruspe, Anna M.

International Speech Communication Association -ISCA-:
Understanding speech processing in humans and machines. Vol. 3: 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016); San Francisco, California, USA, 8-12 September 2016
Red Hook, NY: Curran, 2016
ISBN: 978-1-5108-3313-5
International Speech Communication Association (Interspeech Annual Conference) <17, 2016, San Francisco/Calif.>
Fraunhofer IDMT
vocal analysis; phoneme recognition; lyrics retrieval; lyrics alignment

Retrieving the lyrics of a sung recording from a database of text documents is a research topic that has not yet received attention. Such a retrieval system has many practical applications, e.g. in karaoke systems or for indexing large song databases by their lyrical content. In this paper, we present such a lyrics retrieval system. In the first step, phoneme posteriorgrams are extracted from sung recordings using various acoustic models trained on TIMIT, on a variation thereof, and on subsets of a large database of recordings of unaccompanied singing (DAMP). On the text side, we generate binary templates from the available textual lyrics. Since these lyrics carry no temporal information, we then employ an approach based on Dynamic Time Warping (DTW) to retrieve the most likely lyrics document for each recording. The approach is tested on a separate subset of the unaccompanied singing database that includes 601 recordings of 301 different songs (12,000 lines of lyrics). Retrieval is evaluated both song-wise and line-wise. The results are highly encouraging and could serve as a basis for automatic lyrics alignment and keyword spotting in large song databases.
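The retrieval pipeline summarized in the abstract (binary templates from text, DTW against a phoneme posteriorgram, pick the lowest-cost document) can be illustrated with a minimal sketch. The abstract does not specify the phoneme set, local distance, DTW step pattern, or cost normalization, so all of these are assumptions here: the `PHONEMES` inventory is hypothetical, the local cost is taken as the negative log of the posterior mass on the template phoneme, and the accumulated cost is divided by the sum of both sequence lengths. Conversion of lyrics text into phonemes (e.g. via a pronunciation dictionary) is assumed to have happened already, and `toy_posteriorgram` simulates acoustic-model output instead of using a model trained on TIMIT or DAMP.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical phoneme inventory for illustration; a real system would use the
# phoneme set its acoustic model was trained on.
PHONEMES = ["sil", "ah", "b", "d", "iy", "l", "n", "ow", "s", "t"]
INDEX = {p: i for i, p in enumerate(PHONEMES)}


def binary_template(phoneme_seq: Sequence[str]) -> List[List[float]]:
    """Build a binary template: one one-hot row per phoneme of the lyrics.

    Text carries no timing, so each phoneme occupies a single row; DTW later
    stretches each row over as many audio frames as needed.
    """
    template = []
    for p in phoneme_seq:
        row = [0.0] * len(PHONEMES)
        row[INDEX[p]] = 1.0
        template.append(row)
    return template


def local_cost(frame: Sequence[float], row: Sequence[float]) -> float:
    """Assumed distance: negative log of the posterior mass on the template phoneme."""
    mass = sum(f * r for f, r in zip(frame, row))
    return -math.log(mass + 1e-10)


def dtw_cost(posteriorgram: Sequence[Sequence[float]],
             template: Sequence[Sequence[float]]) -> float:
    """Classic DTW (steps (1,0), (0,1), (1,1)); total cost divided by n + m."""
    n, m = len(posteriorgram), len(template)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(posteriorgram[i - 1], template[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def retrieve(posteriorgram: Sequence[Sequence[float]],
             lyrics_db: Dict[str, Sequence[str]]) -> List[str]:
    """Rank lyrics documents by ascending normalized DTW cost (best match first)."""
    scores = {doc: dtw_cost(posteriorgram, binary_template(seq))
              for doc, seq in lyrics_db.items()}
    return sorted(scores, key=lambda doc: scores[doc])


def toy_posteriorgram(phoneme_seq: Sequence[str], frames_per_phoneme: int = 3,
                      peak: float = 0.8) -> List[List[float]]:
    """Simulated acoustic-model output: `peak` on the sung phoneme, rest spread evenly."""
    rest = (1.0 - peak) / (len(PHONEMES) - 1)
    frames = []
    for p in phoneme_seq:
        for _ in range(frames_per_phoneme):
            frame = [rest] * len(PHONEMES)
            frame[INDEX[p]] = peak
            frames.append(frame)
    return frames


if __name__ == "__main__":
    # Hypothetical lyrics database, already converted from text to phonemes.
    lyrics_db = {
        "song_a": ["b", "ow", "t"],
        "song_b": ["s", "iy", "l"],
        "song_c": ["d", "ah", "n"],
    }
    query = toy_posteriorgram(["s", "iy", "l"])
    print(retrieve(query, lyrics_db))  # "song_b" should be ranked first
```

Dividing by the combined length is one common way to keep long and short lyrics documents comparable; other normalizations or step constraints would work just as well in this sketch. Because full DTW is quadratic in the two sequence lengths, ranking every document in a large database this way is costly, which is one reason a real system might restrict the search space first.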