Automatic Caption Generation via Attention Based Deep Neural Network Model

2021 
The ever-increasing volume of visual and multimedia data on the internet has created a need for visual content understanding in multimedia analysis and computer vision. Natural language descriptions of visual content can contribute substantially to this area. Image captioning aims to generate textual descriptions of an image, which can then be used for visual analysis and for understanding the semantics of the content. Various approaches and techniques have been proposed for this problem, and in recent years deep learning-based models, particularly those incorporating an attention mechanism, have produced better caption generators. Attention-based models learn to focus on the most salient regions of an image and are therefore capable of producing better captions. In this work, an automatic caption generator based on an attention mechanism has been implemented and its experimental results are discussed. The model consists of a Convolutional Neural Network (CNN) encoder together with a Gated Recurrent Unit (GRU) Recurrent Neural Network (RNN) decoder equipped with a local attention module.
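The sketch below illustrates the kind of encoder-decoder captioner the abstract describes: CNN image features attended over by an attention module that conditions a GRU decoder emitting one caption token per step. It is not the authors' code; it uses soft (Bahdanau-style) attention as a stand-in for the paper's local attention module, and names such as `units`, `embedding_dim`, and `vocab_size` are illustrative assumptions.

```python
# Minimal sketch, assuming TensorFlow/Keras and precomputed CNN features
# of shape (batch, num_locations, feature_dim), e.g. a spatial feature map
# flattened into a set of image regions.
import tensorflow as tf

class BahdanauAttention(tf.keras.Model):
    """Soft attention over CNN feature locations (stand-in for local attention)."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, num_locations, feature_dim); hidden: (batch, units)
        hidden_with_time = tf.expand_dims(hidden, 1)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        attention_weights = tf.nn.softmax(score, axis=1)  # where the model "looks"
        context = tf.reduce_sum(attention_weights * features, axis=1)
        return context, attention_weights

class Decoder(tf.keras.Model):
    """GRU decoder that consumes the attended context plus the previous token."""
    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True,
                                       return_state=True)
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, x, features, hidden):
        # x: previous token ids, shape (batch, 1)
        context, attn = self.attention(features, hidden)
        x = self.embedding(x)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x)
        logits = self.fc2(self.fc1(tf.reshape(output, (-1, output.shape[2]))))
        return logits, state, attn

# One decoding step (shapes are illustrative assumptions):
# decoder = Decoder(embedding_dim=256, units=512, vocab_size=5000)
# logits, hidden, attn = decoder(prev_token_ids, cnn_features, hidden)
```

A local attention variant would restrict the softmax to a window of feature locations around a predicted alignment position rather than attending over all locations at once.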