Classifying Sounds in Polyphonic Urban Sound Scenes

Abeßer, Jakob

2022

Journal Article

Abstract

The deployment of machine listening algorithms in real-world application scenarios is challenging. In this paper, we investigate how the superposition of multiple sound events within complex sound scenes affects their recognition. As a basis for our research, we introduce the Urban Sound Monitoring (USM) dataset, a novel public benchmark dataset for urban sound monitoring tasks. It includes 24,000 sound scenes that are mixed from isolated sounds using different loudness levels, sound polyphony levels, and stereo panorama placements. In a benchmark experiment, we evaluate three deep neural network architectures for sound event tagging (SET) on the USM dataset. In addition to counting the overall number of sounds in a sound scene, we introduce a local sound polyphony measure as well as temporal and frequency coverage measures of sounds, which make it possible to characterize complex sound scenes. The analysis of these measures confirms that SET performance decreases for higher sound polyphony levels and larger temporal coverage of sounds.
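The scene measures named in the abstract can be illustrated with a small sketch. The abstract does not give their exact definitions, so the code below rests on assumptions: each sound is represented only by a hypothetical (onset, offset) pair in seconds, local polyphony is taken to be the peak number of simultaneously active sounds, and temporal coverage is taken to be the fraction of the scene in which at least one sound is active. The function names are illustrative and not taken from the paper or the USM dataset tools; the frequency coverage measure is left out because it would need spectral information.

```python
from typing import List, Tuple

# Assumption: a sound event is described only by (onset, offset) in seconds.
Event = Tuple[float, float]


def overall_polyphony(events: List[Event]) -> int:
    """Total number of sounds mixed into the scene."""
    return len(events)


def local_polyphony(events: List[Event]) -> int:
    """Peak number of simultaneously active sounds (assumed definition)."""
    boundaries = []
    for onset, offset in events:
        boundaries.append((onset, 1))    # a sound starts
        boundaries.append((offset, -1))  # a sound ends
    # At equal times, process ends (-1) before starts (+1) so that
    # back-to-back sounds are not counted as overlapping.
    boundaries.sort(key=lambda b: (b[0], b[1]))
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


def temporal_coverage(events: List[Event], scene_duration: float) -> float:
    """Fraction of the scene with at least one active sound (assumed definition)."""
    covered = 0.0
    current_start = current_end = None
    for onset, offset in sorted(events):
        if current_end is None or onset > current_end:
            # Close the previous merged interval, if any, and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = onset, offset
        else:
            # Overlapping or touching interval: extend the current one.
            current_end = max(current_end, offset)
    if current_end is not None:
        covered += current_end - current_start
    return covered / scene_duration


# Hypothetical 10-second scene with four sounds.
example_events = [(0.0, 2.0), (1.0, 3.0), (2.5, 4.0), (6.0, 7.0)]
example_overall = overall_polyphony(example_events)
example_local = local_polyphony(example_events)
example_coverage = temporal_coverage(example_events, 10.0)
```

In this toy scene, four sounds are present overall but no more than two overlap at any moment, which shows why an overall count and a local measure can describe the same scene quite differently.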

Author(s)

Abeßer, Jakob

Fraunhofer-Institut für Digitale Medientechnologie IDMT

Journal

AES E-Library. Online resource

Conference

Audio Engineering Society (AES Europe Spring Convention) 2022
