Fact-based visual question answering via dual-process system

2021 
Abstract Fact-based visual question answering (FVQA) requires a model to answer questions based on observed images and external knowledge. The key is to enable the agent to understand the question and the image and then reason over the knowledge base to find the correct answer. Founded on the dual-process theory in cognitive science, this study proposes an effective framework for FVQA that coordinates a perception module (System 1) and an explicit reasoning module (System 2). Given a question and an image, System 1 first learns their joint representation, and System 2 then predicts the answer by reasoning on a fact graph and a semantic graph. Specifically, System 1 is implemented as a two-parallel BERT-style model, and System 2 as a graph neural network (GNN) with a dual-level attention mechanism. Experiments on two public datasets, FVQA and OK-VQA, show that our model outperforms the compared baselines. Moreover, in addition to the answer, the proposed model provides an interpretation of its reasoning process.
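
The two-level attention idea attributed to System 2 can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the abstract does not specify the attention form, so scaled dot-product attention is swapped in, GNN message passing is omitted entirely, and the names `node_attention` and `dual_level_attention`, the vector shapes, and the final scoring of fact nodes are all hypothetical. The query vector stands in for the joint question-image representation produced by System 1.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def node_attention(query, nodes):
    # Level 1 (assumed form): weight the nodes of one graph by their
    # scaled dot-product similarity to the query, then pool them.
    scores = nodes @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights, weights @ nodes

def dual_level_attention(query, fact_nodes, semantic_nodes):
    # Level 1: attend over nodes inside each graph separately.
    fact_w, fact_summary = node_attention(query, fact_nodes)
    sem_w, sem_summary = node_attention(query, semantic_nodes)

    # Level 2 (assumed form): attend over the two graph summaries.
    summaries = np.stack([fact_summary, sem_summary])
    graph_w = softmax(summaries @ query / np.sqrt(query.shape[0]))
    context = graph_w @ summaries

    # Hypothetical answer selection: pick the fact node that best
    # matches the fused context vector.
    answer_index = int(np.argmax(fact_nodes @ context))
    return answer_index, fact_w, sem_w, graph_w
```

The returned attention weights (`fact_w`, `sem_w`, `graph_w`) are the kind of quantity one could inspect to see which nodes and which graph influenced the prediction, which is one plausible reading of the interpretability claim in the abstract.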