LASH: Large-scale Academic Deep Semantic Hashing

2021 
With the explosive growth in the number of academic papers, efficient academic document retrieval is becoming an essential requirement for large-scale information retrieval systems. Given the success of deep semantic hashing in general document retrieval, it is a promising approach for academic document retrieval, since it maps academic documents into efficient hash codes. However, when applied to academic document retrieval, existing deep semantic hashing methods suffer from the following two problems: (1) they cannot differentiate the importance of different field labels; (2) they cannot fully utilize the structure information in paper citations. To address these problems, we propose a novel Large-scale Academic deep Semantic Hashing method, called LASH. Specifically, LASH first treats paper citations as a citation network and then employs a multi-input variational deep autoencoder to directly encode both the structure information of the citation network and the semantic information of academic documents into unified hash codes. Moreover, we design a weighted percentage similarity, a linear combination of Jaccard and cosine similarity, to measure the importance of different field labels. Supervised by this similarity, the learned unified hash codes can further preserve the importance of different field labels. Extensive experiments show that LASH significantly outperforms state-of-the-art baselines on three proposed real-world large-scale academic datasets.
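The abstract describes the weighted percentage similarity only as a linear combination of Jaccard and cosine similarity over field labels. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes each document's field labels are given as a dictionary mapping label to a hypothetical importance weight, that Jaccard is computed on the label sets and cosine on the weight vectors, and that the mixing coefficient `alpha` is a free hyperparameter. The function names are hypothetical.

```python
import math
from typing import Dict

# Hypothetical representation: field label -> importance weight (assumption).
Labels = Dict[str, float]


def jaccard(a: Labels, b: Labels) -> float:
    """Jaccard similarity of the two label sets (weights are ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: Labels, b: Labels) -> float:
    """Cosine similarity of the two sparse label-weight vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_label_similarity(a: Labels, b: Labels, alpha: float = 0.5) -> float:
    """Assumed form: alpha * Jaccard + (1 - alpha) * cosine, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * jaccard(a, b) + (1.0 - alpha) * cosine(a, b)


if __name__ == "__main__":
    doc_a = {"machine learning": 0.9, "information retrieval": 0.6}
    doc_b = {"information retrieval": 0.8, "databases": 0.4}
    print(round(weighted_label_similarity(doc_a, doc_b), 4))
```

In this sketch, the Jaccard term rewards shared labels regardless of their weights, while the cosine term reflects how closely the label weights align; blending the two lets both label overlap and label importance influence the similarity target that supervises the hash codes.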