
Multi-scale aggregation of phase information for complexity reduction of CNN based DOA estimation

Chakrabarty, S.; Habets, E.A.P.


Bugallo, Mónica F. (General Chair) ; Institute of Electrical and Electronics Engineers -IEEE-; European Association for Signal Processing -EURASIP-:
27th European Signal Processing Conference, EUSIPCO 2019 : A Coruña, Spain, September 2-6, 2019
Piscataway, NJ: IEEE, 2019
ISBN: 978-9-0827-9703-9
ISBN: 978-90-827970-2-2
ISBN: 978-1-5386-7300-3
European Signal Processing Conference (EUSIPCO) <27, 2019, A Coruña/Spain>
Fraunhofer IIS

In recent work on direction-of-arrival (DOA) estimation of multiple speakers with convolutional neural networks (CNNs), the phase component of the short-time Fourier transform (STFT) coefficients of the microphone signals is given as input, and small filters are used to learn the phase relations between neighboring microphones. Due to the chosen filter size, M − 1 convolution layers are required to achieve the best performance for a microphone array with M microphones. For arrays with a large number of microphones, this requirement leads to a high computational cost, making the method practically infeasible. In this work, we propose to expand the receptive field of the filters to reduce the computational cost of our previously proposed method. To realize this expansion, we use systematic dilations of the filters in each of the convolution layers. Different systematic dilation strategies for a specific microphone array are explored. Experimental analysis of the different strategies shows that an aggressive expansion strategy results in a considerable reduction in computational cost, while a relatively gradual expansion of the receptive field exhibits the best DOA estimation performance along with a reduction in computational cost.
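The relation between filter size, dilation, and the number of layers can be illustrated with a small receptive-field calculation. This is a minimal sketch and not the authors' implementation: the filter length of 2 along the microphone axis, the array size M = 8, and the three dilation schedules are assumptions chosen for illustration only, and the function name `receptive_field` is hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in microphones) of stacked 1-D convolutions.

    Each layer with stride 1 and dilation d widens the receptive field
    by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed values for illustration (not taken from the paper).
M = 8            # number of microphones
KERNEL_SIZE = 2  # filter length along the microphone axis

# Hypothetical dilation schedules, one dilation factor per layer.
schedules = {
    "no dilation": [1] * (M - 1),  # M - 1 layers, as in the abstract
    "gradual": [1, 1, 2, 3],       # assumed example of slower expansion
    "aggressive": [1, 2, 4],       # assumed example of fast expansion
}

for name, dilations in schedules.items():
    rf = receptive_field(KERNEL_SIZE, dilations)
    print(f"{name}: {len(dilations)} layers, receptive field = {rf}")
```

Under these assumptions, all three schedules cover the full array of 8 microphones, but the dilated variants need 4 or 3 layers instead of 7. This mirrors the trade-off described in the abstract between the rate of receptive-field expansion and the number of convolution layers.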