
Detecting double-talk (overlapping speech) in conversations using deep learning

Author: Abdullah

Full text urn:nbn:de:0011-n-4770042 (1.4 MByte PDF)
MD5 Fingerprint: 600b6f9f3d5d97eefc5951965056d7b2
Created on: 16.12.2017

Aachen, 2017, 75 pp.
Aachen, TH, Master Thesis, 2017
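Bundesministerium für Bildung und Forschung BMBF (German Federal Ministry of Education and Research)
Forschungsinfrastrukturen für die Geistes- und qualitativen Sozialwissenschaften (Research Infrastructures for the Humanities and Qualitative Social Sciences); 01UG1511B; KA3
Kölner Zentrum für Analyse und Archivierung audiovisueller Daten (Cologne Center for Analysis and Archiving of Audiovisual Data)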
Master Thesis, Electronic Publication
Fraunhofer IAIS

The work presented in this thesis aims to automatically detect double-talk (overlapping speech) in audio recordings of natural conversations using a deep convolutional neural network. In doing so, it avoids the manual engineering of problem-specific acoustic features that is prevalent in classical approaches. The characteristic challenges arising from the ephemeral nature of natural double-talk, in addition to the standard issues faced in developing a pattern recognition system, are addressed with several methods. In particular, careful rebalancing of the training data to tackle the inherent class imbalance, pre-removal of silence, and two standard normalization procedures for reducing the mismatch between training and testing conditions are each systematically evaluated for their respective impact. Furthermore, the proposed network's shortcoming in modelling long-term temporal dependencies is documented, and an attempt to fix it with Viterbi decoding is reported. Satisfactory results have been achieved on a large and representative testing set, and several avenues for future work are identified.
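The abstract mentions Viterbi decoding as a way to compensate for the network's limited modelling of long-term temporal context. A minimal sketch of that idea, assuming (not taken from the thesis) a two-state HMM (0 = no double-talk, 1 = double-talk) with a fixed self-transition probability `p_stay`, and frame-wise overlap posteriors from the CNN used as emission scores after division by a class prior (the usual hybrid NN/HMM "scaled likelihood" convention). The function name `viterbi_smooth` and the parameters `p_stay` and `prior_overlap` are hypothetical illustrations, not the thesis's actual implementation.

```python
import math
from typing import List, Sequence


def viterbi_smooth(
    posteriors: Sequence[float],
    p_stay: float = 0.9,         # hypothetical self-transition probability
    prior_overlap: float = 0.5,  # hypothetical class prior for "double-talk"
    eps: float = 1e-10,
) -> List[int]:
    """Most likely 0/1 label sequence (1 = double-talk) for per-frame
    overlap posteriors under a two-state HMM with sticky transitions."""
    if not 0.0 < p_stay < 1.0 or not 0.0 < prior_overlap < 1.0:
        raise ValueError("p_stay and prior_overlap must lie in (0, 1)")
    if len(posteriors) == 0:
        return []

    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    priors = (1.0 - prior_overlap, prior_overlap)

    def emission(p: float, state: int) -> float:
        # Scaled likelihood: posterior of the state divided by its prior,
        # in the log domain; eps guards against log(0).
        prob = p if state == 1 else 1.0 - p
        return math.log(max(prob, eps)) - math.log(priors[state])

    # delta[s]: best log score of any path ending in state s at this frame.
    delta = [math.log(priors[s]) + emission(posteriors[0], s) for s in (0, 1)]
    backptr: List[List[int]] = []

    for p in posteriors[1:]:
        new_delta = [0.0, 0.0]
        ptr = [0, 0]
        for s in (0, 1):
            # Score of entering state s from each previous state r.
            cands = [delta[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_delta[s] = cands[best] + emission(p, s)
            ptr[s] = best
        delta = new_delta
        backptr.append(ptr)

    # Backtrack from the best final state to recover the full path.
    state = 0 if delta[0] >= delta[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    posts = [0.1, 0.2, 0.85, 0.1, 0.2, 0.8, 0.9, 0.95, 0.85, 0.1, 0.05, 0.1]
    print([int(p > 0.5) for p in posts])   # plain thresholding
    print(viterbi_smooth(posts, p_stay=0.9))
```

In this example, plain thresholding keeps the isolated spike at frame 2, whereas the decoded path drops it and keeps only the sustained run at frames 5 to 8, because two state switches cost more than a single confident frame gains. The trade-off is governed by `p_stay`: values closer to 1 suppress progressively longer overlaps, which matters for the short, ephemeral double-talk segments the abstract describes, so this parameter would have to be tuned against the frame rate and the typical overlap duration.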