Video Captioning with External Knowledge Assistance and Multi-feature Fusion

2021 
Video captioning aims to describe the main content of a given video in natural language and has become a research hotspot because of its broad range of potential applications. Semantic information is often used as prior knowledge to improve caption quality, but the scope of this semantic information is relatively narrow, so it covers video attributes insufficiently. In this paper, we introduce external knowledge from ConceptNet to expand the semantic coverage so that the model can draw on more semantic information. In addition, we propose a multi-feature fusion method to obtain more informative video features and higher-quality semantic features. Experimental results on the MSVD and MSR-VTT datasets show that the proposed method greatly improves caption diversity and model performance, surpasses all previous models on all evaluation metrics, and achieves new state-of-the-art results.