Fine-Grained Unbalanced Interaction Network for Visual Question Answering

2021 
Learning an effective interaction mechanism is important for Visual Question Answering (VQA), which requires understanding both the visual content of images and the textual content of questions. Existing approaches model both inter-modal and intra-modal interactions but neglect the irrelevant information those interactions carry. In this paper, we propose a novel Fine-grained Unbalanced Interaction Network (FUIN) that adaptively captures the most useful information from interactions. It contains a parallel interaction module to model two-way inter-modal interactions and a fine-grained adaptive activation module that activates the interaction for each component according to its specific context. Experimental results on the benchmark VQA-v2 dataset demonstrate that FUIN achieves state-of-the-art VQA performance, with an overall accuracy of 71.14% on the test-std set.
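The abstract names two mechanisms: a parallel interaction module that models two-way (question-to-image and image-to-question) interactions, and a fine-grained module that adaptively activates each component's interaction from its own context. The sketch below is not the authors' implementation; it is a minimal illustration of that design, assuming standard cross-attention for the two-way interaction and a per-component sigmoid gate for the adaptive activation. All class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class ParallelInteraction(nn.Module):
    """Sketch of a two-way interaction block with fine-grained gating."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # Two-way inter-modal attention: each modality attends to the other.
        self.q2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2q = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Fine-grained gates: one scalar in [0, 1] per component
        # (word or region), computed from that component's own context.
        self.gate_q = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.gate_v = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, q: torch.Tensor, v: torch.Tensor):
        # q: (B, Lq, dim) question word features
        # v: (B, Lv, dim) image region features
        q_ctx, _ = self.q2v(q, v, v)  # words attend to image regions
        v_ctx, _ = self.v2q(v, q, q)  # regions attend to question words
        # Gate each component's interaction by its [self, context] pair,
        # so irrelevant interactions are suppressed rather than passed on.
        gq = self.gate_q(torch.cat([q, q_ctx], dim=-1))
        gv = self.gate_v(torch.cat([v, v_ctx], dim=-1))
        return q + gq * q_ctx, v + gv * v_ctx

# Usage on dummy features: 14 question words, 36 image regions, dim 512.
block = ParallelInteraction(dim=512)
q = torch.randn(2, 14, 512)
v = torch.randn(2, 36, 512)
q_out, v_out = block(q, v)
print(q_out.shape, v_out.shape)  # (2, 14, 512) and (2, 36, 512)
```

The gate gives each word and region its own activation strength, which is one plausible reading of "fine-grained unbalanced": the two directions of interaction need not contribute equally for every component.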