GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

Weixin Liang,Yanhao Jiang,Zixuan Liu

GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

2021

Weixin Liang
Yanhao Jiang
Zixuan Liu

Images are more than a collection of objects or attributes — they represent a web of relationships among interconnected objects. Scene Graph has emerged as a new modality as a structured graphical representation of images. Scene Graph encodes objects as nodes connected via pairwise relations as edges. To support question answering on scene graphs, we propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes. We explore the design space of GraphVQA framework, and discuss the trade-off of different design choices. Our experiments on GQA dataset show that GraphVQA outperforms the state-of-the-art accuracy by a large margin (88.43% vs. 94.78%).

Keywords:

Message passing
Scene graph
Natural language
Computer science
Margin (machine learning)
Theoretical computer science
Pairwise comparison
Question answering
Representation (mathematics)
modality

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations