Object-based classification for Visual Question Answering

2020 
The Visual Question Answering (VQA) problem requires providing an accurate natural-language answer to a given natural-language question about an image. This paper addresses the visual question answering problem using a deep learning based architecture. The proposed model builds on two popular neural network architectures: Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). We use a CNN to encode the given image and extract its features, and word embeddings to encode the questions. Moreover, we use an LSTM for question understanding. Finally, the results are compared with those of the Multi-Layer Perceptron (MLP) and Stacked Attention Network (SAN) models to assess the effectiveness of the proposed model.
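The abstract describes a two-branch architecture (CNN image encoder, embedding + LSTM question encoder, fused for answer classification) but gives no implementation details. The following is a minimal illustrative sketch in Keras under assumed settings; the vocabulary size, embedding dimension, LSTM units, answer-set size, choice of VGG16 as the image backbone, and the element-wise fusion are all assumptions for illustration, not values taken from the paper.

    # Sketch of a CNN + LSTM VQA classifier of the kind described above.
    # All sizes and the VGG16 backbone are illustrative assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    VOCAB_SIZE = 10000   # assumed question vocabulary size
    MAX_Q_LEN = 25       # assumed maximum question length (tokens)
    NUM_ANSWERS = 1000   # assumed size of the fixed answer set

    # Image branch: a pretrained CNN encodes the image into a feature vector.
    cnn = tf.keras.applications.VGG16(include_top=False, pooling="avg",
                                      input_shape=(224, 224, 3))
    cnn.trainable = False
    image_input = layers.Input(shape=(224, 224, 3), name="image")
    image_features = layers.Dense(512, activation="tanh")(cnn(image_input))

    # Question branch: word embeddings followed by an LSTM encode the question.
    question_input = layers.Input(shape=(MAX_Q_LEN,), name="question")
    embedded = layers.Embedding(VOCAB_SIZE, 300)(question_input)
    question_features = layers.LSTM(512)(embedded)

    # Fuse the two modalities (element-wise product, an assumed choice)
    # and classify over the fixed answer set.
    fused = layers.Multiply()([image_features, question_features])
    hidden = layers.Dense(1024, activation="relu")(fused)
    output = layers.Dense(NUM_ANSWERS, activation="softmax")(hidden)

    model = Model(inputs=[image_input, question_input], outputs=output)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

Treating VQA as classification over a fixed answer set, as in this sketch, is the common setup against which MLP and SAN baselines are typically compared.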