Associating Images with Sentences Using Recurrent Canonical Correlation Analysis

2020 
Associating images with sentences has drawn much attention recently. Existing methods commonly represent an image by indiscriminately describing all of its contents in a one-time, static way, which ignores two facts: (1) association analysis is effective only between partial salient contents and the associated sentence, and (2) visual information acquisition is a dynamic rather than static process. To address these issues, we propose a recurrent canonical correlation analysis (RCCA) method for associating images with sentences. RCCA includes a contextual attention-based LSTM-RNN that selectively attends to salient regions of an image at each time step and thereby represents all the salient contents within a few steps. Unlike existing attention-based models, our model focuses on modelling a contextual visual attention mechanism for the task of association analysis. RCCA also includes a conventional LSTM-RNN for sentence representation learning. The resulting image and sentence representations are fed into CCA to maximize their linear correlation, and the parameters of the LSTM-RNNs and CCA are learned jointly. Owing to this effective image representation learning, our model can associate images having complex contents with sentences, and it achieves better performance on image annotation and retrieval.
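The CCA step above maximizes the linear correlation between the two learned views. The following is a minimal NumPy sketch of plain linear CCA on paired feature matrices; the ridge regularizer `reg` and the synthetic data are illustrative assumptions, not the paper's exact formulation, and the LSTM encoders that would produce `X` and `Y` are omitted.

```python
import numpy as np

def cca_correlations(X, Y, reg=1e-4):
    """Canonical correlations between paired views X (n x dx) and Y (n x dy).

    A plain linear-CCA sketch of the objective RCCA maximizes; `reg` is a
    small ridge term added for numerical stability (an assumption here).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False)

# Synthetic paired views sharing a 4-dimensional latent factor.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))
X = Z @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(200, 8))
Y = Z @ rng.normal(size=(4, 6)) + 0.1 * rng.normal(size=(200, 6))
corrs = cca_correlations(X, Y)
print(corrs)  # leading correlations close to 1 for the shared factors
```

In RCCA, gradients of this correlation objective would flow back into both LSTM-RNN encoders so that the representations and the CCA projections are learned jointly.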