Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing

2018 
Cross-modal hashing has become a popular research topic in recent years due to the efficiency of storing and retrieving high-dimensional multimodal data represented by compact binary codes. While most cross-modal hash functions use binary space partitioning functions (e.g. the sign function), our method uses ranking-based hashing, which is based on numerically stable and scale-invariant rank correlation measures. In this paper, we propose a novel deep learning architecture called Deep De-correlated Subspace Ranking Hashing (DDSRH) that uses feature-ranking methods to determine the hash codes for the image and text modalities in a common hamming space. Specifically, DDSRH learns a set of de-correlated nonlinear subspaces on which to project the original features, so that the hash code can be determined by the relative ordering of projected feature values in a given optimized subspace. The network relies upon a pre-trained deep feature learning network for each modality, and a hashing network responsible for optimizing the hash codes based on the known similarity of the training image-text pairs. Our proposed method includes both architectural and mathematical techniques designed specifically for ranking-based hashing in order to achieve de-correlation between the bits, bit balancing, and quantization. Finally, through extensive experimental studies on two widely-used multimodal datasets, we show that the combination of these techniques can achieve state-of the-art performance on several benchmarks.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    28
    References
    5
    Citations
    NaN
    KQI
    []