Evolution of automatic visual description techniques - a methodological survey

2021 
Describing the contents and activities of an image or video in semantically and syntactically correct sentences is known as captioning. Automated captioning is one of the most actively researched topics today, with new, increasingly sophisticated models appearing regularly. Captioning models require intensive training and perform complex computations before successfully generating a caption, and hence take considerable time even on machines with high specifications. In this survey, we review recent state-of-the-art advancements in automatic image and video description methodologies based on deep neural networks and summarize the concepts inferred from them. The summarization is based on a systematic, detailed, and critical analysis of the latest methodologies published in high-impact proceedings and journals. Our investigation focuses on techniques that can optimize existing concepts and incorporate new methods of visual attention for generating captions. This survey emphasizes the applicability and effectiveness of existing works in real-life applications and highlights computationally feasible and optimized techniques that can be supported on multiple devices, including lightweight devices such as smartphones. Furthermore, we propose possible improvements and a model architecture to support online video captioning.