|Shuhao Liu||University of Toronto, Canada|
|Li Chen||University of Toronto, Canada|
|Baochun Li||University of Toronto, Canada|
|Aiden Carnegie||University of Toronto, Canada|
Graph analytics has emerged as one of the fundamental techniques to support modern Internet applications. As real-world graph data is generated and stored globally, the scale of the graph that needs to be processed keeps growing. It is critical to efficiently process graphs across multiple geographically distributed datacenters, running wide-area graph analytics. Existing graph analytics frameworks are not designed to run across multiple datacenters well, as they implement a Bulk Synchronous Parallel model that requires excessive wide-area data transfers. In this paper, we present a new Hierarchical Synchronous Parallel model designed and implemented for synchronization across datacenters with a much improved efficiency in inter-datacenter communication. Our new model requires no modifications to graph analytics applications, yet guarantees their convergence and correctness. Our prototype implementation on Apache Spark can achieve up to 32% lower WAN bandwidth usage, 49% faster convergence, and 30% less total cost for benchmark graph algorithms, with input data stored across five geographically distributed datacenters.