A class of multichannel sparse linear prediction algorithms for time delay estimation of speech sources

2020 
Abstract: Time delay estimation (TDE), also called time difference of arrival estimation, is an important yet challenging problem in room acoustic environments where reverberation and noise coexist. The multichannel cross-correlation coefficient algorithm extends the traditional cross-correlation method from the two-channel to the multichannel case and exploits the spatial information among all microphones to improve the robustness of TDE to noise. The multichannel spatiotemporal prediction algorithm generalizes the multichannel cross-correlation coefficient algorithm by incorporating both spatial and temporal information to make TDE robust to reverberation. This multichannel spatiotemporal prediction algorithm, however, is sensitive to noise. In this work, we attempt to make this algorithm robust to both reverberation and noise. Based on the sparsity of the prediction coefficient matrix of speech signals, a class of multichannel sparse linear prediction algorithms, including multichannel spatiotemporal sparse prediction and multichannel spatiotemporal group sparse prediction, is developed for TDE. The multichannel cross-correlation coefficient and multichannel spatiotemporal prediction algorithms are unified from a TDE performance perspective via an F/ℓ1-norm (or F/ℓ1,2-norm) optimization model, which is solved by the alternating direction method of multipliers. Each of the two new algorithms also yields a set of time delay estimators, which make different tradeoffs between prewhitening and non-prewhitening by adjusting a regularization parameter.
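For orientation, the traditional two-channel cross-correlation method that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration only, not the multichannel or sparse prediction algorithms proposed in the paper; the function name `estimate_delay`, the synthetic white-noise source, and the noise level are all assumptions made for the example, and no reverberation is simulated.

```python
import numpy as np

def estimate_delay(x1, x2, max_lag):
    """Estimate how many samples x2 lags behind x1 (negative if x2 leads).

    Plain two-channel cross-correlation: pick the lag k that maximizes
    r[k] = sum_n x1[n] * x2[n + k]. Assumes x1 and x2 have equal length.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for k in lags:
        if k >= 0:
            scores.append(np.dot(x1[:len(x1) - k], x2[k:]))
        else:
            scores.append(np.dot(x1[-k:], x2[:len(x2) + k]))
    return int(lags[int(np.argmax(scores))])

# Hypothetical test signals: one white-noise source observed at two
# microphones, the second delayed by 7 samples, plus independent sensor noise.
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
true_delay = 7
x1 = source[20:3020] + 0.5 * rng.standard_normal(3000)
x2 = source[20 - true_delay:3020 - true_delay] + 0.5 * rng.standard_normal(3000)

estimated = estimate_delay(x1, x2, max_lag=20)
```

With an anechoic white-noise source like this one, the correlation peak sits at the true delay. Real speech in a reverberant room produces broader and spurious peaks, which is the difficulty the multichannel and sparse prediction approaches in the abstract are meant to address.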