Multi-view collective tensor decomposition for cross-modal hashing

2019 
With the development of social media, data often come from a variety of sources in different modalities. These data contain complementary information that can be used to produce better learning algorithms. Such data exhibit dual heterogeneity: On the one hand, data obtained from multiple modalities are intrinsically different; on the other hand, features obtained from different disciplines are usually heterogeneous. Existing methods often consider the first facet while ignoring the second. Thus, in this paper, we propose a novel multi-view cross-modal hashing method named Multi-view Collective Tensor Decomposition (MCTD) to mitigate the dual heterogeneity at the same time, which can fully exploit the multimodal multi-view feature while simultaneously discovering multiple separated subspaces by leveraging the data categories as supervision information. We propose a novel cross-modal retrieval framework which consists of three components: (1) two tensors which model the multi-view features from different modalities in order to get better representation of the complementary features and a latent representation space; (2) a block-diagonal loss which is used to explicitly enforce a more discriminative latent space by leveraging supervision information; and (3) two feature projection matrices which characterize the data and generate the latent representation for incoming new queries. We use an iterative updating optimization algorithm to solve the objective function designed for MCTD. Extensive experiments prove the effectiveness of MCTD compared with state-of-the-art methods.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    52
    References
    1
    Citations
    NaN
    KQI
    []