Visual localization of non-stationary sound sources

2009 
A sound source can be visually localized by analyzing the correlation between audio and visual data. To date, analyzing this correlation correctly has required the sound source to be stationary in the scene. We introduce a technique that overcomes this limitation by localizing non-stationary sound sources. The problem is formulated as finding the optimal visual trajectories that best represent the movement of the sound source over the pixels in a spatio-temporal volume. Using a beam search, we find these optimal trajectories by maximizing the correlation between newly introduced audiovisual features of inconsistency. We also develop an incremental correlation evaluation based on mutual information, which significantly reduces the computational cost. Finally, the correlations computed along the optimal trajectories are incorporated into a segmentation technique to localize the sound source region in the first visual frame of the current time window. Experimental results demonstrate the effectiveness of our method.
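The trajectory search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`extend`, `gaussian_mi`, `beam_search`), the scalar per-pixel visual feature, the one-pixel motion limit, and the Gaussian form of mutual information, I = -1/2 log(1 - rho^2), are all assumptions made for illustration. The paper's inconsistency features and its segmentation step are not reproduced here.

```python
import math

# Running sums (n, sum_a, sum_v, sum_aa, sum_vv, sum_av) for one trajectory.
# Hypothetical representation chosen so that a trajectory can be extended
# by one frame without revisiting earlier frames.
EMPTY = (0, 0.0, 0.0, 0.0, 0.0, 0.0)


def extend(stats, a, v):
    """Add one (audio, visual) sample to the running sums in O(1)."""
    n, sa, sv, saa, svv, sav = stats
    return (n + 1, sa + a, sv + v, saa + a * a, svv + v * v, sav + a * v)


def gaussian_mi(stats):
    """Mutual information in nats, ASSUMING jointly Gaussian features."""
    n, sa, sv, saa, svv, sav = stats
    if n < 2:
        return 0.0
    mean_a, mean_v = sa / n, sv / n
    var_a = saa / n - mean_a * mean_a
    var_v = svv / n - mean_v * mean_v
    if var_a <= 1e-12 or var_v <= 1e-12:
        return 0.0  # a constant signal carries no information
    cov = sav / n - mean_a * mean_v
    # Cap rho^2 just below 1 so the logarithm stays finite.
    rho2 = min(cov * cov / (var_a * var_v), 1.0 - 1e-9)
    return -0.5 * math.log(1.0 - rho2)


def beam_search(audio, video, start, beam_width=3, max_step=1):
    """Find a high-scoring pixel trajectory that begins at `start`.

    audio: list of per-frame audio feature values, length T.
    video: video[t][y][x] is a scalar visual feature, T frames.
    Returns (path, score), where path is a list of (y, x) per frame.
    """
    height, width = len(video[0]), len(video[0][0])
    y0, x0 = start
    beam = [([start], extend(EMPTY, audio[0], video[0][y0][x0]))]
    for t in range(1, len(audio)):
        candidates = []
        for path, stats in beam:
            py, px = path[-1]
            # Assumed motion model: at most `max_step` pixels per frame.
            for dy in range(-max_step, max_step + 1):
                for dx in range(-max_step, max_step + 1):
                    y, x = py + dy, px + dx
                    if 0 <= y < height and 0 <= x < width:
                        new_stats = extend(stats, audio[t], video[t][y][x])
                        candidates.append((path + [(y, x)], new_stats))
        # Keep only the `beam_width` best partial trajectories.
        candidates.sort(key=lambda c: gaussian_mi(c[1]), reverse=True)
        beam = candidates[:beam_width]
    best_path, best_stats = beam[0]
    return best_path, gaussian_mi(best_stats)
```

In this sketch the incremental evaluation comes from the running sums: extending a trajectory by one frame updates six numbers rather than recomputing the statistic over the whole time window. On a toy volume in which a single pixel carries the audio value and shifts one pixel to the right in each frame, a search started at the source's first position recovers that path. With very short windows the score is poorly constrained (any two non-constant samples are perfectly correlated), so early pruning in this sketch is only as reliable as the beam is wide.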