RStream: Marrying Relational Algebra With Streaming For Efficient Graph Mining On A Single Machine

Authors:
Kai Wang UC Irvine
Zhiqiang Zuo Nanjing University
John Thorpe UC Irvine
Tim Nguyen UC Irvine
Guoqing Harry Xu UC Irvine and Microsoft Research

Introduction:

Graph mining is an important category of graph algorithms that aim to discover structural patterns such as cliques and motifs in a graph.the authors built RStream, the first single-machine, out-of-core mining system that leverages disk support to store intermediate data. Graph mining is an important category of graph algorithms that aim to discover structural patterns such as cliques and motifs in a graph.

Abstract:

Graph mining is an important category of graph algorithms that aim to discover structural patterns such as cliques and motifs in a graph. While a great deal of work has been done recently on graph computation such as PageRank, systems support for scalable graph mining is still limited. Existing mining systems such as Arabesque focus on distributed computing and need large amounts of compute and memory resources.We built RStream, the first single-machine, out-of-core mining system that leverages disk support to store intermediate data. At its core are two innovations: (1) a rich programming model that exposes relational algebra for developers to express a wide variety of mining tasks; and (2) a runtime engine that implements relational algebra efficiently with tuple streaming. A comparison between RStream and four state-of-the-art distributed mining/Datalog systems---Arabesque, ScaleMine, DistGraph, and BigDatalog---demonstrates that RStream outperforms all of them, running on a 10-node cluster, e.g., by at least a factor of 1.7, and can process large graphs on an inexpensive machine.

You may want to know: