Shortest Processing Time First and Hadoop

2016 
Big data has emerged as a powerful tool for many sectors, ranging from science to business. Distributed data-parallel computing is now commonplace: using a large number of computing and storage resources makes it possible to process data at a previously unattainable scale. But building large-scale distributed big data processing systems requires tackling many challenges, and one of the most complex is scheduling. Shortest Processing Time First (SPT) is a classic scheduling policy used in many systems, as it is known to be an optimal online policy for minimizing average flowtime. We therefore integrated this policy into Hadoop, a framework for big data processing, and built a prototype implementation. This paper describes this integration, as well as the test results obtained on our testbed.
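
The abstract does not reproduce the prototype's code, but the SPT policy itself is simple to illustrate: pending jobs are kept in a queue ordered by estimated processing time, and the shortest job is always dispatched first. Below is a minimal Java sketch of that ordering; the Job class, its estimatedTimeMs field, and the job names are hypothetical illustrations, not details from the paper or from Hadoop's scheduler API.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Minimal sketch of SPT ordering: jobs are dequeued shortest first.
    public class SptQueue {
        public static class Job {
            final String id;
            final long estimatedTimeMs; // assumed a-priori estimate of processing time

            Job(String id, long estimatedTimeMs) {
                this.id = id;
                this.estimatedTimeMs = estimatedTimeMs;
            }
        }

        // Priority queue ordered by estimated processing time, ascending.
        private final PriorityQueue<Job> queue =
                new PriorityQueue<>(Comparator.comparingLong((Job j) -> j.estimatedTimeMs));

        public void submit(Job job) { queue.add(job); }

        // Returns the shortest pending job, or null if none is waiting.
        public Job nextJob() { return queue.poll(); }

        public static void main(String[] args) {
            SptQueue scheduler = new SptQueue();
            scheduler.submit(new Job("long-scan", 90_000));
            scheduler.submit(new Job("small-join", 5_000));
            scheduler.submit(new Job("medium-sort", 30_000));
            // Dequeues in SPT order: small-join, medium-sort, long-scan.
            for (Job j; (j = scheduler.nextJob()) != null; ) {
                System.out.println(j.id + " (" + j.estimatedTimeMs + " ms)");
            }
        }
    }

Serving short jobs first is what minimizes average flowtime: a long job delayed by a short one adds little waiting time overall, whereas a short job stuck behind a long one waits disproportionately. A real Hadoop integration would additionally need a source for the processing-time estimates, which this sketch simply assumes are given.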