Here you can find scientific publications from the Fraunhofer Institutes.

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Authors: Drossos, K.; Mimilakis, S.I.; Serdyuk, D.; Schuller, G.; Virtanen, T.; Bengio, Y.


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computational Intelligence Society; International Neural Network Society:
International Joint Conference on Neural Networks, IJCNN 2018. Proceedings : 8-13 July 2018, Rio de Janeiro, Brazil
Piscataway, NJ: IEEE, 2018
ISBN: 978-1-5090-6014-6
ISBN: 978-1-5090-6015-3
International Joint Conference on Neural Networks (IJCNN) <2018, Rio de Janeiro>
Fraunhofer IDMT

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work, we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique that regularizes a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an increase in signal-to-distortion ratio (SDR) of 0.37 dB and in signal-to-interference ratio (SIR) of 0.23 dB over previous SOTA results.
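
To give a feel for what a gain measured in dB means, the following is a minimal sketch of a simplified signal-to-distortion ratio, defined here as 10*log10(||s||^2 / ||s - s_hat||^2). This is an assumption made for illustration only: the abstract does not say how SDR was computed, and separation results are commonly reported with a fuller decomposition that also distinguishes interference and artifacts. The function name `simple_sdr_db` and the toy signals are hypothetical.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative only; not the evaluation procedure used in the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Energy of the true source and of the estimation error.
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))

    if signal_energy == 0.0:
        raise ValueError("reference signal is silent; SDR is undefined")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction

    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: the estimate is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
sdr = simple_sdr_db(reference, estimate)  # close to 20 dB
```

Because the scale is logarithmic, a difference of 0.37 dB corresponds to a factor of 10^(0.037), or roughly a 9% higher signal-to-distortion energy ratio under this simplified definition.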