Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

A GMM Approach to Singing Language Identification

Authors: Kruspe, Anna M.; Abeßer, Jakob; Dittmar, Christian

Dittmar, C.; Audio Engineering Society -AES-:
53rd International Conference on Semantic Audio 2014: London, United Kingdom, 26–29 January 2014
Red Hook, NY: Curran, 2014
ISBN: 978-1-63266-284-2
ISBN: 1-63266-284-1
pp. 140–148
International Conference on Semantic Audio <53, 2014, London>
English
Conference Paper
Fraunhofer IDMT
singing language identification; music classification

Abstract
Automatic language identification for singing has received little attention in recent years. Possible application scenarios include searching for musical pieces in a certain language, improving similarity search algorithms for music, and improving regional music classification and genre classification. It could also help to mitigate the "glass ceiling" effect. Most existing approaches employ PPRLM (Parallel Phone Recognition followed by Language Modelling). Recent publications show that approaches based on GMMs (Gaussian Mixture Models) can now produce results comparable to PPRLM systems when certain audio features are used. Their advantages are simplicity of implementation and reduced training data requirements. So far, however, this has only been tested on speech data. In this paper, we therefore apply such a GMM-based approach to singing language identification. We test our system on speech data and on a cappella singing, using MFCC (Mel-Frequency Cepstral Coefficients), TRAP (Temporal Pattern), and SDC (Shifted Delta Cepstrum) features. The results are comparable to the state of the art for singing language identification, but the approach is considerably simpler to implement because no phoneme-level annotations are required. We obtain an accuracy of 75% on speech data and 67.5% on a cappella data. To our knowledge, neither the GMM-based approach nor this feature combination has been used for singing language identification before.
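The abstract names the two ingredients of the approach, per-language GMMs and cepstral features such as SDC, without spelling them out. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes MFCCs have already been extracted frame by frame, uses the widely used 7-1-3-7 SDC configuration, diagonal covariances and 8 mixture components trained with plain EM, omits TRAP features entirely, and assigns a clip to the language whose GMM gives the highest average per-frame log-likelihood. All function names (`sdc`, `features`, `fit_gmm`, `identify`) are hypothetical.

```python
import numpy as np

def sdc(cep, N=7, d=1, P=3, k=7):
    """Shifted Delta Cepstra (SDC), assuming the common N-d-P-k = 7-1-3-7 setup.

    cep: array of shape (T, >= N) holding frame-wise cepstra such as MFCCs.
    Returns shape (T, N * k): for frame t, block i holds
    c[t + i*P + d] - c[t + i*P - d] for i = 0..k-1.
    """
    c = cep[:, :N]
    T = c.shape[0]
    # Repeat edge frames so every shifted index stays in range;
    # original frame t sits at padded index t + d.
    cp = np.pad(c, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    blocks = [cp[i * P + 2 * d: i * P + 2 * d + T] - cp[i * P: i * P + T]
              for i in range(k)]
    return np.hstack(blocks)

def features(cep, N=7):
    # Static cepstra stacked with their SDC features (hypothetical combination).
    return np.hstack([cep[:, :N], sdc(cep, N=N)])

def _log_joint(X, weights, means, var):
    # log w_k + log N(x | mu_k, diag(var_k)) for every frame/component pair: (T, K).
    diff2 = ((X[:, None, :] - means[None]) ** 2 / var[None]).sum(axis=2)
    log_norm = np.log(2 * np.pi * var).sum(axis=1)
    return np.log(weights)[None] - 0.5 * (diff2 + log_norm[None])

def _logsumexp(a):
    # Numerically stable log(sum(exp(a))) over components, kept as a column.
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))

def fit_gmm(X, n_components=8, n_iter=30, var_floor=1e-3, seed=0):
    """Diagonal-covariance GMM trained with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, n_components, replace=False)]
    var = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        logp = _log_joint(X, weights, means, var)
        resp = np.exp(logp - _logsumexp(logp))
        # M-step: re-estimate weights, means and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        var = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, var_floor)
    return weights, means, var

def avg_loglik(X, gmm):
    # Mean per-frame log-likelihood, so clips of different lengths compare fairly.
    return float(_logsumexp(_log_joint(X, *gmm)).mean())

def identify(X, models):
    # Choose the language whose GMM explains the clip best.
    return max(models, key=lambda lang: avg_loglik(X, models[lang]))
```

For example, given a hypothetical dictionary `train` mapping language labels to MFCC matrices, one would build `models = {lang: fit_gmm(features(mfcc)) for lang, mfcc in train.items()}` and then call `identify(features(clip_mfcc), models)`. Training only needs a language label per recording, which reflects the abstract's point that no phoneme-level annotations are required.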

URL: http://publica.fraunhofer.de/documents/N-298243.html