Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Apparatus and method for harmonic-percussive-residual sound separation using a structure tensor on spectrograms

 
Niedermeier, Andreas; Füg, Richard; Disch, Sascha; Müller, Meinard; Driedger, Jonathan


EP 3220386 A1: 20160318
English
Patent, Electronic Publication
Fraunhofer IIS

Abstract
Apparatus and method for analysing a magnitude spectrogram of an audio signal for harmonic-percussive-residual sound separation (HPSS), comprising: determining a change of a frequency for each time-frequency bin of a plurality of time-frequency bins of the magnitude spectrogram of the audio signal; and classifying each time-frequency bin into a signal component group depending on the change of the frequency.

A structure tensor is applied to the spectrogram, treated as an image, for preprocessing or feature extraction by edge and corner detection, in particular by calculating predominant orientation angles in the spectrogram. The structure tensor can be considered a black box whose input is a grayscale image and whose outputs are, for each pixel, an angle corresponding to the direction of lowest change and a certainty or anisotropy measure for this direction. A local frequency change is extracted from the angles, so that it can be determined whether a time-frequency bin in the spectrogram belongs to a harmonic component (low local frequency change) or to a percussive component (high or infinite local frequency change).

Examples of application:
(figure 1) Distinguish between harmonic, percussive, and residual signal components by employing this orientation information.
(figure 5) Analyse an audio signal for upmixing to five audio output channels (front left, center, right, left surround, and right surround):
- The harmonic weighting factor may be greater for generating the left, center, and right output channels than for generating the left surround and right surround output channels.
- The percussive weighting factor may be smaller for generating the left, center, and right output channels than for generating the left surround and right surround output channels.
(figure 6) Compute source separation metrics (source-to-distortion ratio SDR, source-to-interference ratio SIR, and source-to-artifacts ratio SAR) for a recorded audio signal. For example, a vibrato in a singing voice has a high instantaneous frequency change rate; the assignment of a bin in the spectrogram to "residual" depends on the bin's anisotropy.
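
The classification idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (box_smooth, structure_tensor_features, classify_bins), the box-filter smoothing, and all threshold values are assumptions chosen for illustration. The sketch also thresholds the orientation angle directly instead of converting it to a physical frequency change rate; because the tangent is monotonic between 0 and 90 degrees, a small angle corresponds to a low local frequency change and an angle near 90 degrees to a very high one.

```python
import numpy as np


def box_smooth(a, radius=1):
    """Average each entry over a (2*radius+1)^2 window (edge-padded).
    Assumption: a plain box filter stands in for the tensor smoothing step."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += padded[i:i + a.shape[0], j:j + a.shape[1]]
    return out / (k * k)


def structure_tensor_features(S, radius=1, eps=1e-12):
    """S[b, n] is a magnitude spectrogram (rows: frequency bins, columns: frames).
    Returns (alpha, anisotropy) per bin, where alpha is the angle in
    (-pi/2, pi/2] between the direction of lowest change and the time axis."""
    S_f, S_t = np.gradient(np.asarray(S, dtype=float))  # d/d(frequency), d/d(time)
    T_tt = box_smooth(S_t * S_t, radius)
    T_tf = box_smooth(S_t * S_f, radius)
    T_ff = box_smooth(S_f * S_f, radius)

    # Eigenvalues of the symmetric 2x2 tensor are trace/2 -/+ root.
    trace = T_tt + T_ff
    root = np.sqrt(0.25 * (T_tt - T_ff) ** 2 + T_tf ** 2)

    # theta: orientation of largest change; lowest change is perpendicular to it.
    theta = 0.5 * np.arctan2(2.0 * T_tf, T_tt - T_ff)
    alpha = theta + np.pi / 2
    alpha = np.where(alpha > np.pi / 2, alpha - np.pi, alpha)

    # Anisotropy ((mu - lam) / (mu + lam))^2, set to 0 where the tensor vanishes.
    safe_trace = np.where(trace > eps, trace, 1.0)
    anisotropy = np.where(trace > eps, (2.0 * root / safe_trace) ** 2, 0.0)
    return alpha, anisotropy


def classify_bins(alpha, anisotropy, alpha_h=np.deg2rad(10.0),
                  alpha_p=np.deg2rad(80.0), c_min=0.2):
    """Label each bin; alpha_h, alpha_p and c_min are hypothetical thresholds."""
    labels = np.full(alpha.shape, "residual", dtype="<U10")
    confident = anisotropy > c_min
    labels[confident & (np.abs(alpha) <= alpha_h)] = "harmonic"
    labels[confident & (np.abs(alpha) >= alpha_p)] = "percussive"
    return labels


if __name__ == "__main__":
    tone = np.zeros((9, 9))
    tone[4, :] = 1.0    # horizontal ridge: constant frequency over time
    click = np.zeros((9, 9))
    click[:, 4] = 1.0   # vertical ridge: all frequencies at one instant
    for name, spec in (("tone", tone), ("click", click)):
        a, c = structure_tensor_features(spec)
        print(name, classify_bins(a, c)[4, 4])
```

On these toy inputs the center bin of the horizontal ridge is labelled harmonic and the center bin of the vertical ridge percussive, while bins in flat regions have zero anisotropy and fall into the residual group. A real system would operate on a short-time Fourier transform of the audio signal and would tune the smoothing and thresholds to the analysis parameters.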

http://publica.fraunhofer.de/documents/N-487420.html