DRASH: A Data Replication-Aware Scheduler in Geo-Distributed Data Centers

Moïse W. Convolbo, Jerry Chou, Shihyu Lu, Yeh-Ching Chung

2016

Driven by the trends of big data and cloud computing, there is a growing demand for processing and analyzing data that are generated and stored across geo-distributed data centers. However, because network bandwidth between data centers is limited and the data volume spread across different locations keeps growing, it has become increasingly inefficient to aggregate data and perform computations at a single data center. A common approach in data-intensive cluster computing systems such as Hadoop is to distribute computations based on data locality, so that data can be processed locally, which reduces network overhead and improves performance. However, limited work has been done to adapt and evaluate this technique for geo-distributed data centers. In this paper, we propose DRASH (Data Replication-Aware Scheduler), a job scheduling algorithm that enforces data locality to prevent data transfer and exploits data replication to improve overall system performance. Our evaluation using simulations with realistic workload traces shows that DRASH outperforms existing approaches by 16% to 60% in average job completion time and achieves greater improvements under higher data replication factors.
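
The general idea of locality-enforcing, replication-aware placement can be illustrated with a minimal sketch. This is not the DRASH algorithm itself, whose details the abstract does not give. It is a simple greedy policy under stated assumptions: each job reads one dataset, a job may run only at a data center that holds a replica of that dataset (so no data crosses sites), and among those candidates the job goes to the site with the least queued work. The names `Job`, `schedule`, `replicas`, and the data center labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    dataset: str      # the single dataset this job reads (assumption)
    duration: float   # estimated processing time, arbitrary units


def schedule(jobs, replicas, initial_load=None):
    """Greedy replica-aware placement (illustrative, not the paper's method).

    jobs:         jobs in arrival order
    replicas:     dataset name -> list of data centers holding a copy
    initial_load: optional data center -> already queued work
    Returns (placement, load): job name -> data center, and final queued work.
    """
    load = dict(initial_load or {})
    placement = {}
    for job in jobs:
        # Enforce locality: only sites that already store the data qualify.
        candidates = replicas[job.dataset]
        # Exploit replication: pick the least-loaded candidate.
        # Ties are broken by site name to keep the result deterministic.
        site = min(candidates, key=lambda dc: (load.get(dc, 0.0), dc))
        placement[job.name] = site
        load[site] = load.get(site, 0.0) + job.duration
    return placement, load


if __name__ == "__main__":
    replicas = {"logs": ["dc-east", "dc-west"], "clicks": ["dc-east"]}
    jobs = [Job("j1", "clicks", 4.0), Job("j2", "logs", 3.0), Job("j3", "logs", 2.0)]
    placement, load = schedule(jobs, replicas)
    print(placement)
    print(load)
```

In this sketch a higher replication factor gives each job more candidate sites, which is one plausible intuition for why replication could help a locality-enforcing scheduler balance load. Whether DRASH uses this or a different selection rule is not stated in the abstract.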
