Ambisonic Signal Processing DNNs Guaranteeing Rotation, Scale and Time Translation Equivariance

2021 
We propose a novel framework for designing Ambisonic signal processing deep neural networks (DNNs) that guarantee physical symmetries. In general, spatial acoustic signal processing DNNs for tasks such as sound event detection should perform with equivalent accuracy regardless of the directions of arrival (DOA) of the sound sources. In the natural sciences, this property is known as rotation symmetry. However, most conventional multi-channel signal processing DNNs do not explicitly incorporate rotation symmetry into their model structure. Instead, they acquire approximate rotation symmetry by being trained on large datasets of signals arriving from various directions. As a result, conventional methods will not perform sufficiently well when the training dataset is small or statistically biased, e.g., when the directions of arrival of the sound events are inhomogeneously distributed. Furthermore, to handle time-series acoustic signals efficiently in DNNs, several additional symmetries must be considered, such as amplitude scaling and time translation of the signals. In this paper, we formulate these symmetry assumptions, which are called equivariance, in a unified manner as constraints on the targeted DNN design. We propose a new DNN design method called Clebsch–Gordan Nets with Scale and Time translation Symmetry (CGNets-STS), which is guaranteed to simultaneously satisfy three types of equivariance (3D rotation, amplitude scaling, and time translation). As an instance of this method, we design a DNN model for sound event localization and detection from Ambisonic signals. Experimental results on a realistic dataset show that this model is highly robust to spatial rotations of the input data.
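To make the three equivariance constraints concrete: a layer f is rotation equivariant if f(rotate(x, R)) = rotate(f(x), R), scale equivariant if f(a·x) = a·f(x), and time-translation equivariant if f(shift(x, s)) = shift(f(x), s). The sketch below is a hypothetical toy layer, not CGNets-STS, and it only checks these three properties numerically. Its assumptions are stated loudly: the first-order Ambisonic channels are stored as a Cartesian vector (x, y, z) alongside the omnidirectional channel w (real ACN-ordered data, W, Y, Z, X, would need reordering), the amplitude gain a is positive, and time shifts are circular. All function names (`rotate`, `scale`, `shift`, `toy_equivariant_layer`, `rotation_zyx`, `max_abs_diff`) are illustrative.

```python
import math
import random

# ASSUMPTION: a first-order Ambisonic frame is (w, (x, y, z)): the omnidirectional
# channel w (rotation invariant) and the first-order channels as a Cartesian
# vector (rotates as v -> R v). A signal is a list of frames over time.

def rotate(signal, R):
    """Rotate the sound field by R: w is unchanged, each vector v -> R v."""
    return [(w, tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3)))
            for w, v in signal]

def scale(signal, a):
    """Apply a (positive) amplitude gain a to every channel."""
    return [(a * w, tuple(a * c for c in v)) for w, v in signal]

def shift(signal, s):
    """Circularly delay the signal by s frames (circularity is an assumption)."""
    n = len(signal)
    return [signal[(t - s) % n] for t in range(n)]

def toy_equivariant_layer(signal, kernel):
    """Hypothetical toy layer (NOT CGNets-STS).

    1. Circular temporal convolution with one shared scalar kernel: linear,
       identical weights on all vector components, and commutes with shifts.
    2. Scalar path: ReLU, positively homogeneous (relu(a*u) = a*relu(u), a > 0).
    3. Vector path: scaled by a sigmoid gate of w / sqrt(w^2 + |v|^2), a ratio
       that is invariant to both rotations and positive gains.
    """
    n = len(signal)
    out = []
    for t in range(n):
        w = sum(k * signal[(t - i) % n][0] for i, k in enumerate(kernel))
        v = tuple(sum(k * signal[(t - i) % n][1][c] for i, k in enumerate(kernel))
                  for c in range(3))
        norm = math.sqrt(w * w + sum(c * c for c in v))
        gate = 0.0 if norm == 0.0 else 1.0 / (1.0 + math.exp(-w / norm))
        out.append((max(w, 0.0), tuple(gate * c for c in v)))
    return out

def rotation_zyx(yaw, pitch, roll):
    """3x3 rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]

    def mm(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return mm(rz, mm(ry, rx))

def max_abs_diff(s1, s2):
    """Largest absolute difference between two signals, over all channels."""
    return max(
        max(abs(w1 - w2), *(abs(c1 - c2) for c1, c2 in zip(v1, v2)))
        for (w1, v1), (w2, v2) in zip(s1, s2)
    )
```

A usage check: for random input, each of the three differences should be at floating-point rounding level.

```python
random.seed(0)
x = [(random.gauss(0, 1), tuple(random.gauss(0, 1) for _ in range(3))) for _ in range(16)]
f = lambda s: toy_equivariant_layer(s, [0.5, -0.25, 0.125])
R = rotation_zyx(0.7, -0.3, 1.1)
print(max_abs_diff(f(rotate(x, R)), rotate(f(x), R)))
print(max_abs_diff(f(scale(x, 2.5)), scale(f(x), 2.5)))
print(max_abs_diff(f(shift(x, 3)), shift(f(x), 3)))
```

The design choice that makes this work is that every nonlinearity either acts on rotation-invariant scalars or multiplies a vector by a rotation- and scale-invariant gate; a bias term or a gate depending on the raw norm would break scale equivariance.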