Lightweight Models for Multimodal Sequential Data

Soumya Sourav, Jessica Ouyang

2021

Human language encompasses more than just text; it also conveys emotion through tone and gesture. We present a case study of three simple and efficient Transformer-based architectures for predicting sentiment and emotion in multimodal data. The Late Fusion model merges unimodal features into a single multimodal feature sequence, the Round Robin model iteratively combines bimodal features using cross-modal attention, and the Hybrid Fusion model combines trimodal and unimodal features into a final feature sequence for predicting sentiment. Our experiments show that these small models are effective and outperform the publicly released versions of much larger, state-of-the-art multimodal sentiment analysis systems.
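
The abstract does not specify layer details, so the following is only a minimal sketch of the two ideas it names: cross-modal attention between pairs of modalities, and concatenation of unimodal features. It uses plain scaled dot-product attention without learned projections. The function names, the cyclic pairing order in `round_robin`, and the toy feature values are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(target, source):
    """Each target step attends over all source steps (no learned projections).

    target: list of T_t vectors, source: list of T_s vectors, same dimension d.
    Returns T_t vectors of dimension d, each a weighted average of source steps.
    """
    d = len(target[0])
    attended = []
    for query in target:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in source]
        weights = softmax(scores)
        attended.append([sum(w * vec[i] for w, vec in zip(weights, source))
                         for i in range(d)])
    return attended

def round_robin(text, audio, video):
    """Hypothetical cyclic pairing: text<-audio, audio<-video, video<-text."""
    return (
        cross_modal_attention(text, audio),
        cross_modal_attention(audio, video),
        cross_modal_attention(video, text),
    )

def late_fusion(text, audio, video):
    """Concatenate per-step unimodal features; assumes aligned sequences."""
    return [t + a + v for t, a, v in zip(text, audio, video)]

# Toy 2-dimensional features; sequence lengths may differ across modalities.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
video = [[2.0, 0.0], [0.0, 0.0]]

ta, av, vt = round_robin(text, audio, video)
# Audio is truncated to two steps here only so the toy sequences align.
fused = late_fusion(text, audio[:2], video)
```

Each attended sequence keeps the length of its target modality, while the concatenated sequence triples the feature dimension. A real model would add learned query, key, and value projections, multiple heads, and a prediction layer on top.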

Keywords:

Multimodal data
Transformer
Gesture
Sequence
Artificial intelligence
Feature (machine learning)
Sentiment analysis
Human language
Sequential data
Machine learning
Computer science
