Semi-Autoregressive Image Captioning
2021
Current state-of-the-art approaches to image captioning typically adopt an
autoregressive manner, i.e., generating descriptions word by word, which
suffers from slow decoding and becomes a bottleneck in real-time
applications. Non-autoregressive image captioning with continuous iterative
refinement, which eliminates the sequential dependence within a sentence,
can achieve performance comparable to its autoregressive counterparts with
considerable acceleration. Nevertheless, through a well-designed experiment,
we empirically demonstrate that the number of refinement iterations can be
effectively reduced when sufficient prior knowledge is provided to the
language decoder. Toward that end, we propose a novel two-stage framework,
referred to as Semi-Autoregressive Image Captioning (SAIC), to achieve a
better trade-off between performance and speed. The proposed SAIC model
maintains the autoregressive property globally but relaxes it locally.
Specifically, the SAIC model first generates an intermittent sequence in an
autoregressive manner by skipping ahead: it predicts the first word of every
word group in order. Then, with the help of this partially determined prior
information and the image features, the SAIC model non-autoregressively
fills in all the skipped words in a single iteration.
Experimental results on the MS COCO benchmark demonstrate that our SAIC model
outperforms the preceding non-autoregressive image captioning models while
obtaining a competitive inference speedup. Code is available at
https://github.com/feizc/SAIC.
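
The two-stage decoding described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation (see the repository above for that); the function names, the stand-in decoders, and the placeholder token strings are all hypothetical, and only the control flow mirrors the described scheme: an autoregressive pass over group-head positions, followed by one non-autoregressive pass that fills every skipped slot.

```python
# Toy sketch of SAIC-style two-stage decoding. All names and the stand-in
# "decoders" are hypothetical; only the control flow follows the paper's
# description.

def predict_head(prefix, image_feat):
    # Stand-in for the autoregressive decoder: in the real model this would
    # condition on the image features and previously generated heads.
    return f"h{len(prefix)}"

def fill_groups(heads, image_feat, group_size):
    # Stand-in for the non-autoregressive decoder: fills the skipped slots
    # of every group conditioned on all heads at once, in one iteration.
    filled = []
    for i, head in enumerate(heads):
        filled.append(head)
        filled.extend(f"w{i}_{j}" for j in range(1, group_size))
    return filled

def saic_decode(image_feat, num_groups=3, group_size=3):
    # Stage 1: autoregressively predict the first word of each word group.
    heads = []
    for _ in range(num_groups):
        heads.append(predict_head(heads, image_feat))
    # Stage 2: non-autoregressively fill all skipped words in a single pass.
    return fill_groups(heads, image_feat, group_size)

caption = saic_decode(image_feat=None)
print(caption)  # 3 groups of 3 tokens; heads at positions 0, 3, 6
```

The speedup intuition is visible in the structure: stage 1 makes only `num_groups` sequential steps instead of one per word, and stage 2 is a single parallel pass.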