Two level discriminative training for audio events recognition in sport broadcasts

Biatov, Konstantin

2007

Conference Paper

Abstract

This paper describes two-level discriminative learning for audio event recognition in a sport broadcast archive. The recognition is based on the idea that audio events are composed of basic units, which are elementary events. Audio events used for semantic interpretation (mid-level concepts) are represented as combinations of the basic units. The basic units are modeled with Gaussian mixture models (GMMs). Each group of 5 frames of audio data is recognized using the basic-unit models. Each mid-level concept is described by the distribution of the basic units: the distribution of the basic units in each class of segments corresponding to a mid-level concept is treated as the macro model of that class. A tree-based framework is used for event recognition, and at each level of the tree two macro models are compared. Two-level discriminative learning is applied to the macro models: the first level of discriminative training operates on the basic units, the second on the macro models. The suggested approach is compared with a maximum likelihood decision and with an SVM with a polynomial kernel. The experimental results indicate a significant improvement over these conventional approaches in the task of recognizing acoustically close audio events.
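The recognition pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation and it omits both discriminative training levels: it only shows how 5-frame groups might be labeled with basic units, how a unit distribution could serve as a macro model, and how two macro models could be compared with a plain maximum likelihood decision (the baseline named in the abstract). All function names, the non-overlapping grouping of frames, the additive smoothing, and the multinomial scoring are assumptions; the per-frame log-likelihoods are assumed to come from already trained basic-unit GMMs.

```python
import numpy as np


def label_blocks(frame_loglik, block=5):
    """Assign one basic-unit label to each non-overlapping block of frames.

    frame_loglik: array of shape (T, K) holding the log-likelihood of each
    of T frames under each of K basic-unit models (assumed precomputed).
    Trailing frames that do not fill a whole block are dropped.
    """
    n_frames, n_units = frame_loglik.shape
    n_blocks = n_frames // block
    trimmed = frame_loglik[: n_blocks * block]
    # Sum frame log-likelihoods inside each block, then pick the best unit.
    block_scores = trimmed.reshape(n_blocks, block, n_units).sum(axis=1)
    return block_scores.argmax(axis=1)


def unit_distribution(labels, n_units, smoothing=1.0):
    """Estimate a macro model as a smoothed distribution over basic units."""
    counts = np.bincount(labels, minlength=n_units).astype(float) + smoothing
    return counts / counts.sum()


def compare_macro_models(labels, model_a, model_b):
    """Pick the macro model under which the unit labels are more likely.

    This is a maximum likelihood decision (hypothetical helper), used here
    in place of the discriminatively trained comparison of the paper.
    """
    score_a = np.log(model_a[labels]).sum()
    score_b = np.log(model_b[labels]).sum()
    return "a" if score_a >= score_b else "b"
```

In a tree-based setup of the kind the abstract mentions, a comparison like `compare_macro_models` would be applied once per tree level, each time choosing between two macro models until a single class remains.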

Author(s)

Biatov, Konstantin

Mainwork

Twelfth International Conference SPEECH and COMPUTER, SPECOM 2007. Proceedings

Conference

International Conference SPEECH and COMPUTER (SPECOM) 2007
