Image captioning with transformer and knowledge graph

Yu Zhang,Xinyu Shi,Siya Mi,Xu Yang

2021

Abstract The Transformer model has achieved strong results in machine translation. In this paper, we adapt the Transformer model to the image captioning task and improve it in two respects. First, we augment the maximum likelihood estimation (MLE) objective with an extra Kullback-Leibler (KL) divergence term to distinguish among incorrect predictions. Second, we introduce a method that leverages a knowledge graph to help the Transformer model generate captions. Experiments on benchmark datasets demonstrate the effectiveness of our method.
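The first improvement can be illustrated with a minimal sketch of an MLE loss combined with a KL divergence term. The abstract does not specify the reference distribution, the direction of the divergence, or the weighting, so everything below is an assumption made for illustration: the function names `log_softmax` and `mle_with_kl_loss`, the `soft_targets` distribution over the vocabulary (for example, one that gives some mass to words similar to the ground truth), the direction KL(q || p), and the `kl_weight` coefficient are hypothetical and not taken from the paper.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def mle_with_kl_loss(step_logits, target_ids, soft_targets, kl_weight=0.1):
    """Average per-token loss: negative log-likelihood plus a weighted KL term.

    step_logits  : list of per-step logit lists, shape (T, V)
    target_ids   : ground-truth word index for each step, length T
    soft_targets : assumed reference distribution q per step, shape (T, V)
    kl_weight    : assumed trade-off coefficient (hypothetical, not from the paper)
    """
    total = 0.0
    for logits, target, q in zip(step_logits, target_ids, soft_targets):
        log_p = log_softmax(logits)
        # Standard MLE term: only the ground-truth word contributes.
        nll = -log_p[target]
        # KL(q || p): wrong words that q considers plausible are penalized less
        # than wrong words that q considers implausible.
        kl = sum(qi * (math.log(qi) - lp) for qi, lp in zip(q, log_p) if qi > 0.0)
        total += nll + kl_weight * kl
    return total / len(target_ids)
```

With `kl_weight=0.0` the function reduces to ordinary cross-entropy training. The KL term vanishes when the model distribution already matches `soft_targets` and grows as the two distributions diverge, which is one way a loss can treat different incorrect predictions differently.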

Keywords:

Knowledge graph
Machine translation
Maximum likelihood
Data mining
Computer science
Transformer
Closed captioning
