MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Drossos, K.; Serdyuk, D.; Virtanen, T.; Bengio, Y.; Mimilakis, S.I.; Schuller, G.

doi:10.1109/IJCNN.2018.8489565

2018

Conference Paper

Abstract

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique for regularizing a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain improvements of 0.37 dB in signal-to-distortion ratio (SDR) and 0.23 dB in signal-to-interference ratio (SIR) over the previous SOTA results.
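The Twin Networks regularization described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the scalar-input tanh RNN, the helper names (`init_params`, `rnn_states`, `twin_loss`, `twin_regularizer`), the affine map g, and the squared-error distance are hypothetical choices made for illustration. It shows the core idea: a forward RNN and a second RNN reading the same sequence backward, with an affine map of each forward hidden state pushed toward the backward hidden state at the same time step. The resulting penalty would be added to the main training loss, with the backward states treated as fixed targets for this term.

```python
import math
import random


def init_params(hidden, seed):
    """Hypothetical helper: random weights for a scalar-input tanh RNN."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    b = [0.0] * hidden
    return w_in, w_rec, b


def rnn_states(xs, w_in, w_rec, b):
    """Run a simple tanh RNN over xs and return the hidden state at every step."""
    size = len(b)
    h = [0.0] * size
    states = []
    for x in xs:
        h = [
            math.tanh(w_in[i] * x + sum(w_rec[i][j] * h[j] for j in range(size)) + b[i])
            for i in range(size)
        ]
        states.append(h)
    return states


def twin_loss(forward_states, backward_states, g_w, g_b):
    """Mean squared distance between g(forward state) and the aligned backward state.

    Assumption: in real training, no gradient would flow into the backward
    network through this term (pure Python here, so no gradients at all).
    """
    total = 0.0
    for hf, hb in zip(forward_states, backward_states):
        pred = [sum(g_w[i][j] * hf[j] for j in range(len(hf))) + g_b[i] for i in range(len(hb))]
        total += sum((p - q) ** 2 for p, q in zip(pred, hb))
    return total / len(forward_states)


def twin_regularizer(xs, fwd_params, bwd_params, g_w, g_b):
    """Forward RNN over xs, backward RNN over reversed xs, then the twin penalty."""
    fwd = rnn_states(xs, *fwd_params)
    bwd = rnn_states(list(reversed(xs)), *bwd_params)
    bwd.reverse()  # bwd[t] now summarizes xs[t:], aligned with fwd[t]
    return twin_loss(fwd, bwd, g_w, g_b)


if __name__ == "__main__":
    hidden = 4
    xs = [math.sin(0.3 * t) for t in range(20)]
    identity = [[1.0 if i == j else 0.0 for j in range(hidden)] for i in range(hidden)]
    reg = twin_regularizer(xs, init_params(hidden, 0), init_params(hidden, 1), identity, [0.0] * hidden)
    # A training objective would combine this as: main_loss + weight * reg
    print(f"twin penalty: {reg:.4f}")
```

The affine map g is kept separate from the backward network because, in the Twin Networks idea, the forward states are only asked to be predictive of the backward states, not identical to them. At inference time only the forward network is needed.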

Mainwork

International Joint Conference on Neural Networks, IJCNN 2018. Proceedings

Conference

International Joint Conference on Neural Networks (IJCNN) 2018
