  • Publication
    Using topic cues for speaker recognition in broadcast multimedia archives
    (2013)
    Baum, Doris
    Current speaker recognition research focuses on features which capture the voice, but, especially in the broadcast domain, other cues can be used to identify speakers, such as the topic of an utterance: speakers often appear in programmes on topics for which they are considered experts. The goal of this thesis is to explore the idea of topic-based speaker recognition for broadcast data. It tries to answer the question of how the topic information of spoken utterances can be captured automatically with little manual effort, and how it can be used to learn speakers' topic preferences. Two approaches which automatically learn the speakers' topic preferences from audio training data are presented, one based on implicit and one based on explicit topic representation. Automatic speech recognition is used to produce word transcripts for the audio material, as the topic information is carried mostly by the words in an utterance. In the first approach, the speakers' topic preferences are implicitly represented by learning their idiosyncratic words, which will consist mostly of topic marker words for their preferred topics. In the second approach, topics are explicitly modelled using an unsupervised probabilistic topic modelling algorithm (Latent Dirichlet Allocation) which automatically identifies prevalent topics and their marker words without the need for manually labelled topic training data. With the explicitly trained topics, the utterances' word transcripts can be converted into topic probability vectors which are then used to model and recognise speakers. Both approaches are evaluated to see how speaker identification based on topic cues performs. As no large speaker recognition evaluation benchmark containing topic preferences was available for the evaluation, a new corpus based on recordings from the German parliament was created. The evaluation compares topic-based with traditional voice-based and idiolectal speaker recognition systems.
Also, topic, voice, and idiolectal systems are fused to see how well topic combines with other cues.
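
    The implicit approach described above can be illustrated with a minimal sketch. The abstract does not state how idiosyncratic words are learned or scored, so everything here is an assumption made for illustration: each word is weighted by a smoothed log-ratio of its frequency in a speaker's transcripts to its frequency across all speakers, and an utterance is assigned to the speaker with the highest summed weight. The function names, the smoothing parameter alpha, and the toy transcripts are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def train_speaker_models(transcripts, alpha=1.0):
    """Learn per-speaker word weights (assumed scheme, not the thesis method).

    transcripts: dict mapping speaker name -> list of tokenised utterances.
    Returns a dict mapping speaker name -> {word: log-ratio weight}.
    """
    background = Counter()
    per_speaker = {}
    for speaker, utterances in transcripts.items():
        counts = Counter(tok for utt in utterances for tok in utt)
        per_speaker[speaker] = counts
        background.update(counts)

    vocab = set(background)
    vocab_size = len(vocab)
    bg_total = sum(background.values())

    models = {}
    for speaker, counts in per_speaker.items():
        total = sum(counts.values())
        weights = {}
        for word in vocab:
            # Add-alpha smoothed probability under the speaker model.
            p_speaker = (counts[word] + alpha) / (total + alpha * vocab_size)
            # Add-alpha smoothed probability under the pooled background model.
            p_background = (background[word] + alpha) / (bg_total + alpha * vocab_size)
            # Positive weight: the word is more typical of this speaker.
            weights[word] = math.log(p_speaker) - math.log(p_background)
        models[speaker] = weights
    return models


def identify_speaker(models, tokens):
    """Return (best speaker, all scores); unseen words contribute 0."""
    scores = {
        speaker: sum(weights.get(tok, 0.0) for tok in tokens)
        for speaker, weights in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy transcripts standing in for speech recogniser output.
transcripts = {
    "speaker_a": [["budget", "tax", "deficit", "tax"], ["tax", "reform", "budget"]],
    "speaker_b": [["climate", "energy", "emissions"], ["energy", "wind", "climate", "energy"]],
}
models = train_speaker_models(transcripts)
best, scores = identify_speaker(models, ["the", "tax", "budget"])
```

    In this toy setting the words that receive high weights are exactly the topic marker words of each speaker, which is the effect the abstract describes for the implicit representation. A real system would work on noisy recogniser output and far larger vocabularies.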