Chain Of Reasoning For Visual Question Answering

Authors:
Chenfei Wu, Beijing University of Posts and Telecommunications
Jinlai Liu, Beijing University of Posts and Telecommunications
Xiaojie Wang, Beijing University of Posts and Telecommunications
Xuan Dong, Beijing University of Posts and Telecommunications

Introduction:

Reasoning plays an essential role in Visual Question Answering (VQA). This paper proposes a novel reasoning model that supports multi-step, dynamic reasoning over relations and compound objects.

Abstract:

Reasoning plays an essential role in Visual Question Answering (VQA). Multi-step and dynamic reasoning is often necessary for answering complex questions. For example, the question "What is placed next to the bus on the right of the picture?" refers to a compound object, "bus on the right," which is generated by the relation <bus, on the right of, picture>. A new relation involving this compound object is then required to infer the answer. However, previous methods support only one-step or static reasoning, without updating relations or generating compound objects. This paper proposes a novel reasoning model to address these problems. A chain of reasoning (CoR) is constructed to support multi-step and dynamic reasoning over changing relations and objects. Specifically, the model iterates two operations: relational reasoning operations form new relations between objects, and object refining operations generate new compound objects from those relations. We achieve new state-of-the-art results on four publicly available datasets. Visualization of the chain of reasoning shows how the CoR generates, step by step, new compound objects that lead to the answer to the question.
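The alternation between relational reasoning and object refining described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration and not the authors' implementation: the function name chain_of_reasoning, the tanh pairing of initial and current objects, the question-guided softmax attention, and the random matrices standing in for learned weights.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def chain_of_reasoning(objects, question, steps=3, seed=0):
    """Toy chain: alternate relational reasoning and object refining.

    objects:  (m, d) array of initial object features
    question: (d,) question feature
    Returns a list of object sets, the initial one plus one per step.
    """
    rng = np.random.default_rng(seed)
    m, d = objects.shape
    current = objects
    chain = [current]
    for _ in range(steps):
        # Hypothetical projections; random stand-ins for learned weights.
        w_init = rng.standard_normal((d, d)) / np.sqrt(d)
        w_cur = rng.standard_normal((d, d)) / np.sqrt(d)

        # Relational reasoning (assumed form): relations[i, j] pairs
        # initial object i with current (possibly compound) object j.
        relations = np.tanh(
            (objects @ w_init)[:, None, :] + (current @ w_cur)[None, :, :]
        )  # shape (m, m, d)

        # Object refining (assumed form): question-guided attention over
        # the current objects collapses each row of relations into one
        # new compound object.
        attention = softmax(current @ question)  # shape (m,)
        current = np.einsum("j,ijd->id", attention, relations)  # shape (m, d)
        chain.append(current)
    return chain


rng = np.random.default_rng(1)
chain = chain_of_reasoning(rng.standard_normal((4, 8)), rng.standard_normal(8), steps=2)
print(len(chain), chain[-1].shape)
```

Each pass through the loop forms relations between the original objects and the objects produced by the previous step, then refines them into a new object set, so compound objects from earlier steps can take part in later relations. That is the multi-step, dynamic behavior the abstract contrasts with one-step or static reasoning.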
