Fast Approximate All Pairwise CoSimRanks via Random Projection

2021 
Given a graph G with n nodes, and two nodes \(u,v\in G\), the CoSimRank value s(u, v) quantifies the similarity between u and v based on graph topology. Compared to SimRank, CoSimRank is shown to be more accurate and effective in many real-world applications including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. The computation of all pairwise CoSimRanks in G is highly expensive and challenging. Existing solutions all focus on devising approximate algorithms for the computation of all pairwise CoSimRanks. To attain a desired absolute accuracy guarantee \(\epsilon \), the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires \(O(n^3\log _2(\ln (\frac{1}{\epsilon })))\) time, which is prohibitively expensive even \(\epsilon \) is large. In this paper, we propose \(\mathtt {RPCS}\), a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of \(\mathtt {RPCS}\) is to approximate the \(n\times n\) matrix multiplications in CoSimRank computation via random projection. Theoretically, \(\mathtt {RPCS}\) runs in \(O(\frac{n^2\ln (n)}{\epsilon ^2}\ln (\frac{1}{\epsilon }))\) time and meanwhile ensures an absolute error of at most \(\epsilon \) in each CoSimRank value in G with a high probability. Extensive experiments using six real graphs demonstrate that \(\mathtt {RPCS}\) is more than up to orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, \(\mathtt {RPCS}\) answers the \(\epsilon \)-approximate (\(\epsilon =0.1\)) all pairwise CoSimRank query within 4 h, using a single commodity server, while existing solutions fail to terminate within a day.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []