Zeddelmann, D. von
Published: 2012
Added to repository: 2022-03-12
https://publica.fraunhofer.de/handle/publica/378325
Scopus IDs: 2-s2.0-85091342722, 2-s2.0-84947969282

Abstract: We propose a robust and easy-to-realize method for unsupervised speech detection (SD) in the context of audio monitoring applications. SD is posed as a binary classification task with the goal of localizing speech in an acoustic monitoring recording. In realistic monitoring settings, speech is usually masked by interfering noise components. The proposed method overcomes this problem to a certain extent by using a parametric, mel-frequency cepstral coefficient (MFCC)-like feature extraction process, explicitly guided by human speech production and the perceptual characteristics of the human ear. The resulting feature sequence is subsequently interpreted as a set of subband signals. Due to the speech-specific frequency adaptation in the feature extraction process, the energy content of the averaged subband signals strongly emphasizes relevant speech components. An experimental performance evaluation on both synthetic and real data shows a significant improvement, especially in bad SNR conditions, compared to short-time energy-based methods for unsupervised voice activity detection (VAD).

Language: en
DDC: 004
Title: A feature-based approach to noise robust speech detection
Type: conference paper
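The abstract's pipeline (perceptually motivated subband features, averaged subband energy, binary speech/non-speech decision) can be illustrated with a minimal sketch. This is not the paper's exact parametric feature; it stands in a plain triangular mel filterbank, frame sizes, and a fixed normalized threshold as assumptions, and all function names (`mel_filterbank`, `speech_detect`) are hypothetical:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced uniformly on the mel scale,
    # mimicking the ear's frequency resolution.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def speech_detect(signal, sr, frame_len=400, hop=160,
                  n_filters=24, threshold=0.5):
    """Binary speech mask from averaged mel-subband frame energies."""
    fb = mel_filterbank(n_filters, frame_len, sr)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(n_frames)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        spec = np.abs(np.fft.rfft(frame)) ** 2      # power spectrum
        subband = fb @ spec                          # per-band energies
        energies[t] = subband.mean()                 # average over subbands
    # Min-max normalize, then threshold to a binary speech/non-speech mask.
    span = energies.max() - energies.min()
    e = (energies - energies.min()) / (span + 1e-12)
    return e > threshold
```

On a toy signal (one second of silence followed by a 300 Hz tone at 8 kHz), the mask is False over the silent frames and True over the tonal ones; a realistic use would replace the fixed threshold with one adapted to the recording's noise floor, which is where short-time-energy baselines degrade at low SNR.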