Harmonic-percussive-residual sound separation using the structure tensor on spectrograms
Harmonic-percussive-residual (HPR) sound separation is a useful preprocessing tool for applications such as pitched instrument transcription or rhythm extraction. Recent methods rely on the observation that in a spectrogram representation, harmonic sounds lead to horizontal structures and percussive sounds lead to vertical structures. Furthermore, these methods associate structures that are neither horizontal nor vertical (i.e., non-harmonic, non-percussive sounds) with a residual category. However, this assumption does not hold for signals like frequency modulated tones that show fluctuating spectral structures, while nevertheless carrying tonal information. Therefore, a strict classification into horizontal and vertical is inappropriate for these signals and might lead to leakage of tonal information into the residual component. In this work, we propose a novel method that instead uses the structure tensor-a mathematical tool known from image processing-to calculate predominant orientation angles in the magnitude spectrogram. We show how this orientation information can be used to distinguish between harmonic, percussive, and residual signal components, even in the case of frequency modulated signals. Finally, we verify the effectiveness of our method by means of both objective evaluation measures as well as audio examples.