Fraunhofer-Gesellschaft
2019
Conference Paper
Title

Declipping speech using deep filtering

Abstract
Recorded signals can be clipped when the sound pressure or the analog signal amplification is too high. Clipping is a non-linear distortion that limits the maximal magnitude of the signal and changes its energy distribution in the frequency domain, thereby degrading the quality of the recording. Consequently, for declipping, some frequencies have to be amplified and others attenuated. We propose a declipping method based on the recently proposed deep filtering technique, which is capable of extracting and reconstructing a desired signal from a degraded input. Deep filtering operates in the short-time Fourier transform (STFT) domain, estimating a complex multidimensional filter for each desired STFT bin. Each filter is applied to a defined area of the clipped STFT to obtain a single complex STFT bin estimate of the declipped STFT. The filter estimation is performed via a deep neural network trained with simulated data degraded by soft- or hard-clipping. The loss function minimizes the reconstruction mean-squared error between the non-clipped and the declipped STFTs. We evaluated our approach on simulated data degraded by hard- and soft-clipping, and conducted a pairwise-comparison listening test with measured signals, comparing our approach to one commercial and one open-source declipping method. Our approach outperformed the baselines for declipping speech on measured data with strong and medium clipping.
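The hard- and soft-clipping distortions mentioned in the abstract can be illustrated with a minimal sketch. Here, hard clipping clamps the waveform at a threshold, and soft clipping is modeled with a scaled tanh, a common choice; the paper's exact soft-clipping function and the threshold value are assumptions for illustration.

```python
import numpy as np

def hard_clip(x, limit=0.5):
    """Hard clipping: clamp the waveform at +/- limit."""
    return np.clip(x, -limit, limit)

def soft_clip(x, limit=0.5):
    """Soft clipping via a scaled tanh (a common model; the exact
    function used in the paper is an assumption here)."""
    return limit * np.tanh(x / limit)

# Example: a sine wave whose peaks exceed the clipping threshold.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)

y_hard = hard_clip(x)  # flat tops at +/- 0.5
y_soft = soft_clip(x)  # smoothly saturated peaks below 0.5
```

Both distortions reduce the peak magnitude but spread energy to other frequencies, which is why declipping must amplify some frequency bins and attenuate others, as the abstract notes.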
Author(s)
Mack, W.
Habets, E.A.P.
Mainwork
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019  
Conference
Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019  
DOI
10.1109/WASPAA.2019.8937287
Language
English
Fraunhofer-Institut für Integrierte Schaltungen IIS  