Cross-Active Connection for Image-Text Multimodal Feature Fusion

2021 
Recent research tackles high-level machine learning tasks that often involve heterogeneous datasets. Image-text multimodal learning is one of the more challenging domains in Natural Language Processing. In this paper, we propose a novel method for fusing and training image-text multimodal features. The proposed architecture follows a multi-step training scheme to train a neural network for image-text multimodal classification. During training, different groups of weights in the network are updated hierarchically, reflecting both the importance of each individual modality and their mutual relationship. The effectiveness of the Cross-Active Connection in image-text multimodal NLP tasks was verified through extensive experiments on multimodal hashtag prediction and image-text feature fusion.
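The abstract describes a multi-step scheme in which different groups of network weights are updated hierarchically. The paper's exact procedure is not given here, so the following is only a minimal toy sketch of the general idea, assuming three illustrative weight groups (image branch, text branch, and a fusion layer) and a hypothetical three-stage schedule; the group names, stage order, and placeholder gradient are all assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight groups for an image branch, a text branch, and a fusion layer.
# Sizes are arbitrary; these stand in for real network parameters.
weights = {
    "image": rng.normal(size=4),
    "text": rng.normal(size=4),
    "fusion": rng.normal(size=8),
}
init_norms = {k: np.linalg.norm(v) for k, v in weights.items()}

# Hypothetical stage schedule: which weight groups receive updates per stage.
schedule = [
    ("image",),                    # stage 1: update the image branch alone
    ("text",),                     # stage 2: update the text branch alone
    ("image", "text", "fusion"),   # stage 3: joint update including fusion
]

def placeholder_grad(w):
    # Stand-in for a real backpropagated gradient: pulls weights toward zero.
    return w

lr = 0.1
for active_groups in schedule:
    for _ in range(10):  # a few gradient steps per stage
        for name in active_groups:
            # Only the groups active in the current stage are updated;
            # all other groups stay frozen, mimicking hierarchical training.
            weights[name] -= lr * placeholder_grad(weights[name])
```

In a real framework this staging is typically implemented by toggling which parameter groups the optimizer updates (e.g. freezing and unfreezing modules between stages); the sketch above only illustrates the control flow of such a schedule.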