Structural Semantic Adversarial Active Learning for Image Captioning

2020 
Most image captioning models achieve superior performance with the help of large-scale supervised training data, but it is prohibitively costly to label image captions. To solve this problem, we propose a structural semantic adversarial active learning (SSAAL) model that leverages both visual and textual information to select the most representative samples while maximizing image captioning performance. SSAAL consists of a semantic constructor, a snapshot & caption (SC) supervisor, and a labeled/unlabeled state discriminator. The constructor is designed to generate a structural semantic representation describing the objects, attributes, and object relationships in the image. The SC supervisor supervises this representation at the word level and sentence level in a multi-task learning manner, which directly relates the representation to ground-truth captions and updates it during caption generation. Finally, we introduce a state discriminator to predict each sample's state and select images with sufficient semantic and fine-grained diversity. Extensive experiments on a standard captioning dataset show that our model outperforms other active learning methods and achieves competitive performance even when selecting a small number of samples.
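The abstract does not give implementation details, but the core active learning step it describes (a state discriminator scoring unlabeled images so the least "labeled-like" ones are sent for annotation) can be sketched as follows. All names here (`select_for_labeling`, the toy centroid-based discriminator) are hypothetical illustrations, not the paper's actual method:

```python
import numpy as np

def select_for_labeling(unlabeled_feats, discriminator, budget):
    """Hypothetical selection step: rank unlabeled images by the state
    discriminator's predicted probability of being 'labeled-like', then
    pick the least labeled-like samples, i.e. those farthest from the
    current labeled pool and so most informative to annotate."""
    scores = np.array([discriminator(f) for f in unlabeled_feats])
    # Lowest 'labeled' probability = most novel with respect to the labeled set.
    return np.argsort(scores)[:budget]

# Toy stand-in for a trained discriminator (an assumption, not the paper's
# network): score decays with distance from a labeled-pool centroid.
labeled_centroid = np.array([1.0, 1.0])
discriminator = lambda f: 1.0 / (1.0 + np.linalg.norm(f - labeled_centroid))

pool = [np.array([1.1, 0.9]), np.array([5.0, 5.0]), np.array([0.9, 1.0])]
picked = select_for_labeling(pool, discriminator, budget=1)
# The outlier at (5, 5) is chosen for labeling.
```

In the paper's full pipeline this scoring would come from the adversarially trained labeled/unlabeled discriminator rather than a distance heuristic; the sketch only shows how such scores drive sample selection.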