A Sequence-to-Sequence Model Approach for ImageCLEF 2018 Medical Domain Visual Question Answering

2018 
Numerous attempts have been made in recent years at the task of free-form, open-ended Visual Question Answering (VQA). Solving the VQA problem typically requires techniques from both computer vision, for a deeper understanding of the images, and natural language processing, for understanding the semantics of the question and generating appropriate answers. The task has attracted considerable research attention because of its many real-world applications. However, none of the existing approaches are designed for medical image-question pairs that require a sequence of words as an answer. We propose a novel approach that combines the tasks of image captioning and machine translation, and we provide a comprehensive model that takes a medical image-question pair as input and generates a sequence of words as an answer. We evaluate our model on the dataset provided by ImageCLEF as part of the ImageCLEF 2018 VQA-Med challenge. Our model outperformed all other participants in the challenge, achieving the best BLEU and WBSS scores. Furthermore, we provide additional insights that can be used to further develop our baseline model, and we discuss the challenges that lie ahead in building machine learning models for medical datasets.
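Since the results above are reported in terms of BLEU, the following is a minimal sketch of how a sentence-level BLEU score can be computed for a generated answer against a single reference answer. It is a simplified illustration, not the official ImageCLEF evaluation script: it assumes whitespace tokenization, lowercasing, uniform n-gram weights, and no smoothing, and the function names `ngrams` and `sentence_bleu` as well as the example answers are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU with uniform n-gram weights.

    Assumptions (not taken from the paper): whitespace tokenization,
    lowercasing, and no smoothing of zero n-gram counts.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty order zeroes the score
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Made-up answers, for illustration only.
    reference = "mri of the brain shows a lesion"
    candidate = "mri of the brain shows a mass"
    print(sentence_bleu(reference, candidate))
```

Because no smoothing is applied, a short answer with no matching 4-gram scores zero; evaluating short answers in practice usually means lowering `max_n` or adding a smoothing scheme.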