Inference of Beliefs on Billion-Scale Graphs

2010 
How do we scale up the inference of graphical models to billions of nodes and edges? How do we, or can we even, implement an inference algorithm for graphs that do not fit in main memory? Can we easily implement such an algorithm on top of an existing framework? How would we run it, and how much time would it save us? In this paper, we tackle this collection of problems through an efficient parallel algorithm for Belief Propagation (BP) that we developed for sparse billion-scale graphs using the Hadoop platform. Inference problems on graphical models arise in many scientific domains; BP is an efficient algorithm that has successfully solved many of those problems. We demonstrate that this useful algorithm can be implemented on top of an existing framework: the crucial observation is that the message update process in BP is essentially a special case of GIM-V (Generalized Iterative Matrix-Vector multiplication) [10], a primitive for large-scale graph mining, applied to a line graph induced from the original graph. We show how we formulate the BP algorithm as a variant of GIM-V, and present an efficient algorithm. We experiment with our parallelized algorithm on the largest publicly available Web graph from Yahoo!, with about 6.7 billion edges, on M45, one of the 50 fastest supercomputers in the world, and compare its running time with that of a single-machine, disk-based BP algorithm.
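The key observation in the abstract is that a BP message update can be viewed as a generalized matrix-vector multiply over the line graph: the "vector" holds one message per directed edge, and edge (k→i) feeds edge (i→j) whenever k ≠ j. The following is a minimal single-machine sketch of that idea, not the paper's Hadoop implementation; the function name, toy chain graph, and pairwise potential `psi` are illustrative assumptions.

```python
import numpy as np

def bp_as_line_graph_gimv(edges, psi, n_iter=10):
    """Toy sketch: sum-product BP message updates phrased as a
    GIM-V-style iterate over the line graph of an undirected graph.
    edges: list of undirected (u, v) pairs; psi: pairwise potential
    (n_states x n_states). Not the paper's Hadoop implementation."""
    # Each undirected edge yields two directed "darts" carrying messages.
    darts = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    idx = {d: t for t, d in enumerate(darts)}
    n_states = psi.shape[0]
    msgs = np.full((len(darts), n_states), 1.0 / n_states)
    # Line-graph adjacency: dart (k -> i) feeds dart (i -> j) when k != j.
    feeds = [[idx[(k, m)] for (k, m) in darts if m == i and k != j]
             for (i, j) in darts]
    for _ in range(n_iter):
        new = np.empty_like(msgs)
        for t in range(len(darts)):
            prod = np.ones(n_states)
            for s in feeds[t]:
                prod *= msgs[s]       # combine incoming messages
            new[t] = psi @ prod       # sum over the sender's states
            new[t] /= new[t].sum()    # normalize for numerical stability
        msgs = new
    return msgs, darts

# On a symmetric 3-node chain with no node potentials, every message
# converges to the uniform distribution.
msgs, darts = bp_as_line_graph_gimv([(0, 1), (1, 2)],
                                    np.array([[2.0, 1.0], [1.0, 2.0]]))
```

The per-dart loop is exactly the structure GIM-V parallelizes: each iteration reads the message vector, combines along line-graph edges, and writes a new vector, which maps naturally onto a MapReduce pass.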