Meta Self-Paced Learning for Cross-Modal Matching

2021 
Cross-modal matching has attracted growing attention due to the rapid emergence of multimedia data on the web and in social applications. Recently, many re-weighting methods have been proposed to accelerate model training by designing a mapping function from similarity scores to weights. However, these re-weighting methods are difficult to apply universally in practice, since manually pre-set weighting functions inevitably involve hyper-parameters. In this paper, we propose a Meta Self-Paced Network (Meta-SPN) that automatically learns a weighting scheme from data for cross-modal matching. Specifically, a meta self-paced network composed of a fully connected neural network is designed to fit the weighting function: it takes the similarity scores of sample pairs as input and outputs the corresponding weight values. Our meta self-paced network considers not only the self-similarity scores, but also their potential interactions (e.g., relative similarity) when learning the weights. Motivated by the success of meta-learning, we use the validation set to update the meta self-paced network during the training of the matching network. Experiments on two image-text matching benchmarks and two video-text matching benchmarks demonstrate the generalization ability and effectiveness of our method.
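As a rough illustration (not the authors' released code), the sketch below shows the core idea described in the abstract: a small fully connected network that maps per-pair similarity features to weights, which then re-weight the matching loss. The class name `MetaSPN`, the two-feature input (self-similarity plus a relative-similarity term), and the layer sizes are all assumptions for illustration; the bi-level meta-update on the validation set is only noted in a comment.

```python
import torch
import torch.nn as nn

class MetaSPN(nn.Module):
    """Small MLP mapping per-pair similarity features to a weight in (0, 1).
    The two-feature input and layer sizes are illustrative assumptions."""
    def __init__(self, in_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, feats):               # feats: (batch, 2)
        return self.net(feats).squeeze(-1)  # per-pair weights in (0, 1)

# Toy usage: weight a batch of per-pair matching losses.
spn = MetaSPN()
sim = torch.rand(8)                    # self-similarity score of each pair
rel = sim - sim.mean()                 # crude relative-similarity feature (assumed)
weights = spn(torch.stack([sim, rel], dim=-1))
pair_losses = torch.rand(8)            # stand-in for the matching network's losses
weighted_loss = (weights * pair_losses).mean()
weighted_loss.backward()               # in the full method, a separate validation-set
                                       # loss would update spn via meta-learning
```

In the full method, the weighting network would be updated by backpropagating a validation loss through a virtual update of the matching network, so that the learned weighting scheme generalizes rather than overfitting the training pairs.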