Image captioning with transformer and knowledge graph

2021 
Abstract The Transformer model has achieved very good results in machine translation tasks. In this paper, we adopt the Transformer model for the image captioning task. To promote the performance of image captioning, we improve the Transformer model from two aspects. First, we augment the maximum likelihood estimation (MLE) with an extra Kullback-Leibler (KL) divergence term to distinguish the difference between incorrect predictions. Second, we introduce a method to help the Transformer model generate captions by leveraging the knowledge graph. Experiments on benchmark datasets demonstrate the effectiveness of our method.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    33
    References
    4
    Citations
    NaN
    KQI
    []