Large text and audio data alignment for multimedia applications

Biatov, K.

2003

Conference Paper

Abstract

This paper describes a technique for aligning large text with large audio data. The technique segments the audio into homogeneous speech segments, recognizes each speech fragment with a speech recognizer, and describes each fragment by keywords selected from the recognizer output on the basis of acoustic confidence score and of salience with respect to the other speech fragments. The sentences of the text are described by the same keywords. A global alignment between the text and the audio that uses only these keywords gives a rough correspondence between the sentences of the text and the audio fragments. A second recognition pass, based on a finite state automaton generated from the roughly aligned sentences that correspond to each speech fragment, gives a more precise alignment. The suggested technique gives an accurate alignment between the text and the audio.
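
The keyword-based rough alignment described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the confidence threshold `min_conf`, the use of fragment frequency (`max_df`) as a stand-in for salience, and the monotonic dynamic-programming alignment are all assumptions made for illustration. The segmentation, speech recognition, and finite state automaton passes are not shown.

```python
from collections import Counter


def select_keywords(fragments, min_conf=0.7, max_df=1):
    """Pick keywords per fragment from (word, confidence) recognizer output.

    Assumption: a word is salient if it occurs, with sufficient confidence,
    in at most `max_df` fragments. The paper's actual criterion may differ.
    """
    confident = [{w for w, c in frag if c >= min_conf} for frag in fragments]
    df = Counter(w for words in confident for w in words)
    return [{w for w in words if df[w] <= max_df} for words in confident]


def describe_sentences(sentences, fragment_keywords):
    """Describe each sentence by the keywords it shares with the audio side."""
    vocabulary = set().union(*fragment_keywords) if fragment_keywords else set()
    return [set(s.lower().split()) & vocabulary for s in sentences]


def align(sentence_keywords, fragment_keywords):
    """Assign each sentence to one fragment, in order, maximizing overlap.

    Returns a list with one fragment index per sentence. Indices never
    decrease, since text and audio are assumed to run in the same order.
    """
    n, m = len(sentence_keywords), len(fragment_keywords)
    if n == 0 or m == 0:
        return []
    score = [[len(s & f) for f in fragment_keywords] for s in sentence_keywords]
    best = [score[0][:]] + [[0] * m for _ in range(n - 1)]
    back = [[0] * m for _ in range(n)]
    for i in range(1, n):
        arg = 0  # best predecessor fragment among 0..j
        for j in range(m):
            if best[i - 1][j] > best[i - 1][arg]:
                arg = j
            best[i][j] = best[i - 1][arg] + score[i][j]
            back[i][j] = arg
    j = max(range(m), key=lambda k: best[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


# Toy data (invented for illustration, not from the paper).
hypotheses = [
    [("the", 0.95), ("budget", 0.9), ("tax", 0.8), ("uh", 0.3)],
    [("the", 0.9), ("storm", 0.85), ("weather", 0.75)],
    [("football", 0.9), ("goal", 0.8)],
]
sentences = [
    "parliament passed the budget",
    "the tax will rise",
    "a storm hit the coast",
    "the football match ended with one goal",
]
fragment_keywords = select_keywords(hypotheses)
sentence_keywords = describe_sentences(sentences, fragment_keywords)
alignment = align(sentence_keywords, fragment_keywords)
```

In this toy run each entry of `alignment` gives the index of the audio fragment roughly matched to the corresponding sentence; a refinement pass such as the one the abstract describes would then work within each fragment.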

Author(s)

Biatov, K.

Mainwork

Text, speech and dialogue

Conference

International Conference on Text, Speech and Dialogue (TSD) 2003
