Spotforming: Spatial filtering with distributed arrays for position-selective sound acquisition
Hands-free capture of speech often requires extracting sources from a specific spot of interest (SOI) while reducing interferers and background noise. Although state-of-the-art spatial filters are fully data-dependent and computed using the power spectral density (PSD) matrices of the desired and the undesired signals, the existing solutions for extracting sources from an SOI are only partially data-dependent. Estimating the time-varying PSD matrices from the data is a challenging problem, especially in rapidly changing acoustic scenes. Hence, the spot signal statistics are often pre-computed based on a near-field propagation model, resulting in suboptimal filters. In this work, we propose a fully data-dependent spatial filtering framework for extracting speech signals that originate from an SOI. To achieve position-based spatial selectivity, distributed arrays are used, which offer greater spatial diversity than arrays of closely spaced microphones. The PSD matrices of the desired and the undesired signals are updated at each time-frequency bin by using a minimum Bayes risk detector that is based on a probabilistic model of narrowband position estimates. The proposed framework is applicable in challenging multitalk situations and requires no prior information other than the geometry, location, and orientation of the arrays.
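To illustrate the detector-driven PSD update described above, the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It assumes a hard binary decision per time-frequency bin, a two-hypothesis Bayes risk rule with zero cost for correct decisions, and first-order recursive averaging. The function names, the cost parameters, the forgetting factor `alpha`, and the posterior probability `p_spot` (taken here as already computed from the position-estimate model) are all hypothetical.

```python
import numpy as np


def bayes_risk_decision(p_spot, cost_miss=1.0, cost_false_alarm=1.0):
    """Return True if the bin is attributed to the spot of interest.

    p_spot is an assumed posterior probability that the narrowband position
    estimate of this bin originates from the SOI. The rule picks the
    hypothesis with the lower expected cost.
    """
    return cost_miss * p_spot > cost_false_alarm * (1.0 - p_spot)


def update_psd_matrices(phi_active, phi_undesired, y, spot_active, alpha=0.9):
    """Recursively update the PSD matrices for one time-frequency bin.

    y is the vector of microphone signals (one complex STFT coefficient per
    microphone). phi_active accumulates bins flagged as spot-active and so
    contains desired plus undesired power; phi_undesired accumulates the
    remaining bins. Only the matrix selected by the decision is updated.
    """
    instantaneous = np.outer(y, np.conj(y))  # rank-one estimate y y^H
    if spot_active:
        phi_active = alpha * phi_active + (1.0 - alpha) * instantaneous
    else:
        phi_undesired = alpha * phi_undesired + (1.0 - alpha) * instantaneous
    return phi_active, phi_undesired
```

Under these assumptions, a desired-signal PSD matrix could be approximated as `phi_active - phi_undesired`, and raising `cost_false_alarm` makes the detector more conservative about attributing a bin to the SOI.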