2014
Conference Paper
Title
Keyword spotting in a-capella singing
Abstract
Keyword spotting (or spoken term detection) is an interesting task in Music Information Retrieval that can be applied to a number of problems. Its purposes include topical search and improvements to genre classification. Keyword spotting is a well-researched task on pure speech, but state-of-the-art approaches cannot easily be transferred to singing because phoneme durations vary much more in singing. To our knowledge, no keyword spotting system for singing has been presented yet. We present a keyword spotting approach based on keyword-filler Hidden Markov Models (HMMs) and test it on a-capella singing and spoken lyrics. We test Mel-Frequency Cepstral Coefficients (MFCCs), Perceptual Linear Predictive Features (PLPs), and Temporal Patterns (TRAPs) as front ends. These features are used to generate phoneme posteriors with Multilayer Perceptrons (MLPs) trained on speech data, and these posteriors serve as the input to the keyword spotting system. Our approach produces useful results on a-capella singing, but the results depend heavily on the chosen keyword. We show that results can be further improved by training the MLP on a-capella data. We also test two post-processing methods applied to the phoneme posteriors before the keyword spotting step. First, we average the posteriors of all three feature sets. Second, we run the concatenated posteriors of all three feature sets through a fusion classifier.
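To make the two post-processing steps concrete, the following is an illustrative sketch only, not the authors' implementation. It assumes each front end (MFCC, PLP, TRAP) already yields a frame-by-phoneme posterior matrix of identical shape, and it stands in a hypothetical scikit-learn MLPClassifier for the fusion classifier, whose actual form is not specified in the abstract.

import numpy as np
from sklearn.neural_network import MLPClassifier

def average_posteriors(post_mfcc, post_plp, post_trap):
    # Post-processing 1: frame-wise average of the three posterior streams
    # (each array has shape n_frames x n_phonemes).
    return (post_mfcc + post_plp + post_trap) / 3.0

def train_fusion_classifier(post_mfcc, post_plp, post_trap, frame_phoneme_labels):
    # Post-processing 2 (training): concatenate the three posterior streams
    # per frame and fit a classifier that maps them to a single refined
    # phoneme posterior. The MLP settings here are placeholders.
    fused = np.hstack([post_mfcc, post_plp, post_trap])
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
    clf.fit(fused, frame_phoneme_labels)
    return clf

def apply_fusion_classifier(clf, post_mfcc, post_plp, post_trap):
    # Post-processing 2 (inference): refined posteriors for unseen frames.
    fused = np.hstack([post_mfcc, post_plp, post_trap])
    return clf.predict_proba(fused)

In either variant, the post-processed posteriors would replace the raw MLP outputs as the observations fed to the keyword-filler HMM stage.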