2023
Journal Article
Title

How Robust are Audio Embeddings for Polyphonic Sound Event Tagging?

Abstract
Sound classification algorithms are challenged by the natural variability of everyday sounds, particularly for large sound class taxonomies. In order to be applicable in real-life environments, such algorithms must also be able to handle polyphonic scenarios, where simultaneously occurring and overlapping sound events need to be classified. With the rapid progress of deep learning, several deep audio embeddings (DAEs) have been proposed as pre-trained feature representations for sound classification. In this article, we analyze the embedding spaces of two non-trainable audio representations (NTARs) and five DAEs for sound classification in polyphonic scenarios (sound event tagging) and make several contributions. First, we compare general properties like the inter-correlation between feature dimensions and the scattering of sound classes in the embedding spaces. Second, we test the robustness of the embeddings against several audio degradations and propose two sensitivity measures based on a class-agnostic and a class-centric view on the resulting drift in the embedding space. Finally, as a central contribution, we study how a blending between pairs of sounds maps to embedding space trajectories and how the path of these trajectories can cause classification errors due to their proximity to other sound classes. Throughout our analyses, the PANN embeddings have shown the best overall performance for low-polyphony sound event tagging.
Author(s)
Abeßer, Jakob  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Grollmisch, Sascha  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Müller, Meinard
Journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Open Access
DOI
10.1109/TASLP.2023.3293032
Language
English
Keyword(s)
  • deep audio embeddings
  • degradation
  • embedding space
  • robustness
  • sound event tagging
  • sound polyphony
  • tagging
  • task analysis
  • training
  • trajectory
  • transfer learning
  • environmental sound analysis
