2015
Journal Article
Title

Spectro-temporal gabor filterbank features for acoustic event detection

Abstract
Algorithms for the automatic detection and recognition of acoustic events are gaining relevance for the reliable and robust functioning of consumer, assistive and monitoring systems. The extraction of appropriate task-relevant acoustic features from the raw sound signal clearly influences the performance of subsequent statistical classification, in particular in adverse acoustic situations. The present contribution investigates the use of biologically inspired features, derived from a filterbank of two-dimensional Gabor functions, that decompose the spectro-temporal power density into components which capture spectral, temporal and joint spectro-temporal modulation patterns. It is hypothesized that the comparably large joint spectral and temporal extent of these Gabor functions results in features that allow for robust classification. The proposed feature extraction scheme is evaluated together with a hidden Markov model (HMM) classifier on two corpora comprising acoustic events in realistic adverse conditions from the D-CASE and CLEAR'07 evaluation campaigns. The relevance of each Gabor filter for classification is analyzed and an optimized parameter set for the Gabor filterbank (GFB) is identified. Performance of the optimized GFB is compared with other state-of-the-art algorithms on isolated event classification and on full acoustic event detection (AED), which includes joint classification and temporal segmentation of events. Results show that Gabor features yield a signal representation that exhibits separated average class-specific patterns. An improvement in classification accuracy of up to 26% relative to the Mel-frequency cepstral coefficient (MFCC) baseline is obtained with the optimized GFB. Further experiments demonstrate that this improvement cannot be explained by purely temporal or purely spectral Gabor basis functions. Rather, a GFB with features extending in joint spectro-temporal directions is required to obtain optimum performance. Performance on AED with the D-CASE challenge dataset is shown to improve on previous algorithms from the recent literature.
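To illustrate the idea of a two-dimensional Gabor function applied to a spectro-temporal representation, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian envelope and a plain list-of-rows spectrogram (channels by frames); the function names `gabor_kernel_2d` and `filter_spectrogram` and all parameter values are hypothetical. Setting the spectral half-width to zero gives a purely temporal filter, and setting the temporal half-width to zero gives a purely spectral one, which corresponds to the distinction drawn in the abstract.

```python
import cmath
import math


def gabor_kernel_2d(omega_k, omega_n, half_k, half_n):
    """Build a complex 2D Gabor kernel as a list of rows (channels x frames).

    omega_k, omega_n: spectral and temporal modulation frequencies, in
    radians per channel and radians per frame (illustrative values only).
    half_k, half_n: kernel half-widths in channels and frames.
    """
    # Assumption: Gaussian envelope whose width is tied to the half-width.
    sigma_k = max(half_k, 1) / 2.0
    sigma_n = max(half_n, 1) / 2.0
    kernel = []
    for k in range(-half_k, half_k + 1):
        row = []
        for n in range(-half_n, half_n + 1):
            envelope = math.exp(-0.5 * ((k / sigma_k) ** 2 + (n / sigma_n) ** 2))
            # Complex sinusoidal carrier oriented in the joint (k, n) plane.
            carrier = cmath.exp(1j * (omega_k * k + omega_n * n))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_spectrogram(spec, kernel):
    """Correlate a spectrogram with the kernel (zero-padded edges).

    Returns the real part of the filter output, same shape as `spec`.
    """
    n_ch, n_fr = len(spec), len(spec[0])
    half_k, half_n = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            acc = 0j
            for dk in range(-half_k, half_k + 1):
                for dn in range(-half_n, half_n + 1):
                    cc, tt = c + dk, t + dn
                    if 0 <= cc < n_ch and 0 <= tt < n_fr:
                        acc += spec[cc][tt] * kernel[dk + half_k][dn + half_n]
            out[c][t] = acc.real
    return out


# Toy input: a single impulse in a 6-channel, 9-frame "spectrogram".
spec = [[0.0] * 9 for _ in range(6)]
spec[3][4] = 1.0
joint_kernel = gabor_kernel_2d(0.5, 0.25, 2, 3)     # joint spectro-temporal
temporal_kernel = gabor_kernel_2d(0.0, 0.25, 0, 3)  # purely temporal
features = filter_spectrogram(spec, joint_kernel)
```

In a filterbank, several such kernels with different modulation frequencies would be applied and their outputs stacked into a feature vector per frame; the selection of those frequencies is what the paper optimizes and is not reproduced here.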
Author(s)
Schröder, J.
Goetze, S.
Anemüller, J.
Journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing
DOI
10.1109/TASLP.2015.2467964
Language
English
Fraunhofer-Institut für Digitale Medientechnologie IDMT