UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

2017 
SimRank is an effective structural similarity measurement between two vertices in a graph, which can be used in many applications like recommender systems. Although progresses have been achieved, existing methods still face challenges to handle large graphs. Besides huge index construction and maintenance cost, the existing methods require considerable search space and time overheads in the online SimRank query. In this paper, we design a Monte Carlo based method, Uni-Walk, to enable the fast top-k SimRank computation over large undirected graphs without indexing. UniWalk directly locates the top-k similar vertices for any single source vertex u via O(R) sampling paths originating from u only, which avoids the selection of candidate vertex set C and the following O(|C|R) bidirectional sampling paths starting from u and each candidate respectively in existing methods. We also design a space-efficient method to reduce intermediate results, and a path-sharing strategy to optimize path sampling for multiple source vertices. Furthermore, we extend UniWalk to existing distributed graph processing frameworks to improve its scalability. We conduct extensive experiments to illustrate that UniWalk has high scalability, and outperforms the state-of-the-art methods by orders of magnitude, and such an improvement is achieved without any indexing overheads.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    5
    Citations
    NaN
    KQI
    []