
Harmonic-percussive-residual sound separation using the structure tensor on spectrograms

Füg, Richard; Niedermeier, Andreas; Driedger, Jonathan; Disch, Sascha; Müller, Meinard


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Signal Processing Society:
IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016. Proceedings : March 20-25, 2016, Shanghai International Convention Center, Shanghai, China
Piscataway, NJ: IEEE, 2016
ISBN: 978-1-4799-9988-0 (electronic)
ISBN: 978-1-4799-9987-3 (USB)
ISBN: 978-1-4799-9989-7 (print)
International Conference on Acoustics, Speech and Signal Processing (ICASSP) <2016, Shanghai>
Conference Paper
Fraunhofer IIS
semantic audio processing; segmentation; audio analysis; audio; analysis; decomposition; separation

Harmonic-percussive-residual (HPR) sound separation is a useful preprocessing tool for applications such as pitched instrument transcription or rhythm extraction. Recent methods rely on the observation that in a spectrogram representation, harmonic sounds lead to horizontal structures and percussive sounds lead to vertical structures. Furthermore, these methods associate structures that are neither horizontal nor vertical (i.e., non-harmonic, non-percussive sounds) with a residual category. However, this assumption does not hold for signals such as frequency-modulated tones, which show fluctuating spectral structures while nevertheless carrying tonal information. Therefore, a strict classification into horizontal and vertical is inappropriate for these signals and might lead to leakage of tonal information into the residual component. In this work, we propose a novel method that instead uses the structure tensor, a mathematical tool known from image processing, to calculate predominant orientation angles in the magnitude spectrogram. We show how this orientation information can be used to distinguish between harmonic, percussive, and residual signal components, even in the case of frequency-modulated signals. Finally, we verify the effectiveness of our method by means of both objective evaluation measures and audio examples.
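
The orientation idea described in the abstract can be illustrated with a minimal sketch. It uses the textbook structure-tensor construction (smoothed products of image gradients) and is not the authors' implementation. The function names `structure_tensor_orientation` and `classify_bins`, the smoothing width `sigma`, and all thresholds are hypothetical choices made for illustration, since the abstract does not specify them. Angles are measured in spectrogram bins per frame, not in Hz per second.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel


def structure_tensor_orientation(S, sigma=1.5, eps=1e-12):
    """Per-bin orientation angle and anisotropy of a spectrogram.

    S is a magnitude spectrogram with shape (frequency bins, time frames).
    sigma is an assumed smoothing width, not a value from the paper.
    """
    S = np.asarray(S, dtype=float)
    # Partial derivatives along time (axis 1) and frequency (axis 0).
    S_t = sobel(S, axis=1, mode="nearest")
    S_f = sobel(S, axis=0, mode="nearest")
    # Structure tensor entries: locally averaged gradient products.
    T11 = gaussian_filter(S_t * S_t, sigma)
    T12 = gaussian_filter(S_t * S_f, sigma)
    T22 = gaussian_filter(S_f * S_f, sigma)
    # Direction of the dominant gradient (largest eigenvalue).
    theta = 0.5 * np.arctan2(2.0 * T12, T11 - T22)
    # The structure runs perpendicular to the gradient; wrap the
    # result into [-pi/2, pi/2). 0 is horizontal, +-pi/2 is vertical.
    alpha = (theta + np.pi) % np.pi - np.pi / 2
    # Eigenvalues of the 2x2 tensor and a simple anisotropy measure:
    # close to 1 for a clear orientation, close to 0 for none.
    root = np.sqrt((T11 - T22) ** 2 + 4.0 * T12 ** 2)
    lam1 = 0.5 * (T11 + T22 + root)
    lam2 = 0.5 * (T11 + T22 - root)
    anisotropy = ((lam1 - lam2) / (lam1 + lam2 + eps)) ** 2
    return alpha, anisotropy


HARMONIC, PERCUSSIVE, RESIDUAL = 0, 1, 2


def classify_bins(alpha, anisotropy, harmonic_max_deg=50.0,
                  percussive_min_deg=80.0, min_anisotropy=0.5):
    """Label each bin; all three thresholds are hypothetical."""
    a = np.abs(np.degrees(alpha))
    labels = np.full(alpha.shape, RESIDUAL, dtype=int)
    oriented = anisotropy >= min_anisotropy
    # Near-horizontal and moderately sloped structures count as harmonic,
    # so frequency-modulated tones are not pushed into the residual.
    labels[oriented & (a <= harmonic_max_deg)] = HARMONIC
    # Near-vertical structures count as percussive.
    labels[oriented & (a >= percussive_min_deg)] = PERCUSSIVE
    return labels


# Synthetic example: a steady tone, a click, and a rising tone.
f = np.arange(64)[:, None]
t = np.arange(64)[None, :]
tone = np.exp(-(f - 32.0) ** 2 / 18.0) * np.ones((1, 64))
click = tone.T
chirp = np.exp(-(f - t) ** 2 / 18.0)
for name, S in [("tone", tone), ("click", click), ("chirp", chirp)]:
    alpha, anis = structure_tensor_orientation(S)
    print(name, classify_bins(alpha, anis)[30, 32])
```

In this sketch the rising tone has a slope of one bin per frame (45 degrees) and is still labeled harmonic because the assumed harmonic range extends beyond strictly horizontal structures. For real audio, `S` would be the magnitude of a short-time Fourier transform, and the thresholds would need to be tuned to the analysis parameters.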