Early vs Late Fusion in Multimodal Convolutional Neural Networks

Konrad Gadzicki, Razieh Khamsehashari, Christoph Zetzsche

2020

Combining neural-network-based machine learning with multimodal fusion strategies holds considerable potential for classification tasks, but the optimal fusion strategy for many applications has yet to be determined. Here we address this issue in the context of human activity recognition, using a state-of-the-art convolutional network architecture (Inception I3D) and a large-scale dataset (NTU RGB+D). As modalities we consider RGB video, optical flow, and skeleton data. We investigate whether fusing different modalities provides an advantage over unimodal approaches, and whether a more complex early fusion strategy can outperform the simpler late fusion strategy by exploiting statistical correlations between the modalities. Our results show a clear performance improvement from multimodal fusion and a substantial advantage for the early fusion strategy.
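
The difference between the two strategies can be illustrated with a minimal Python sketch. This is not the authors' I3D implementation: the linear classifiers, the feature vectors, the weights, and the function names `late_fusion` and `early_fusion` are all hypothetical and chosen only for illustration. Late fusion is shown here as averaging per-modality class probabilities, and early fusion as concatenating modality features before a single joint classifier; the paper may combine modalities at a different point in the network.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights):
    """Toy classifier head: one weight row per class, no bias term."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def late_fusion(modality_features, modality_weights):
    """Classify each modality separately, then average the probabilities.

    Each classifier sees only its own modality, so cross-modal
    correlations cannot influence the individual predictions.
    """
    probs = [
        softmax(linear(f, w))
        for f, w in zip(modality_features, modality_weights)
    ]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def early_fusion(modality_features, joint_weights):
    """Concatenate all modality features and classify them jointly.

    The joint classifier has weights over every modality at once, so it
    can in principle exploit correlations between them.
    """
    joint = [x for f in modality_features for x in f]
    return softmax(linear(joint, joint_weights))


# Hypothetical toy features for RGB, optical flow, and skeleton data.
rgb, flow, skeleton = [0.2, 0.7], [0.5], [0.1, 0.3, 0.9]
features = [rgb, flow, skeleton]

# Hypothetical weights for a two-class problem (one row per class).
separate_weights = [
    [[1.0, -1.0], [-1.0, 1.0]],            # RGB classifier
    [[0.5], [-0.5]],                       # optical flow classifier
    [[0.2, 0.1, -0.3], [-0.2, 0.0, 0.4]],  # skeleton classifier
]
joint_weights = [
    [1.0, -1.0, 0.5, 0.2, 0.1, -0.3],
    [-1.0, 1.0, -0.5, -0.2, 0.0, 0.4],
]

late_probs = late_fusion(features, separate_weights)
early_probs = early_fusion(features, joint_weights)
```

Both functions return one probability per class. The structural difference is where the modalities meet: after classification in `late_fusion`, before classification in `early_fusion`.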
