Shortest Processing Time First and Hadoop

2016 
Big data has emerged as a powerful tool for many sectors, ranging from science to business. Distributed data-parallel computing is now commonplace: using a large number of computing and storage resources makes it possible to process data at a previously unattainable scale. But building large-scale distributed big data processing systems requires tackling many challenges, and one of the most complex is scheduling. Shortest Processing Time First (SPT) is a classic scheduling policy used in many systems, as it is known to be an optimal online policy for minimizing average flowtime. We therefore integrated this policy into Hadoop, a framework for big data processing, and built a prototype implementation. This paper describes this integration, as well as the test results obtained on our testbed.
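
The abstract does not reproduce the prototype's code, but the SPT policy itself is simple to illustrate: pending jobs are kept in a queue ordered by estimated processing time, and the shortest job is always dispatched first. Below is a minimal Java sketch of that ordering; the Job class, its estimatedTimeMs field, and the job names are hypothetical illustrations, not details from the paper or from Hadoop's scheduler API.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Minimal sketch of SPT ordering: jobs are dequeued shortest first.
    public class SptQueue {
        public static class Job {
            final String id;
            final long estimatedTimeMs; // assumed a-priori estimate of processing time

            Job(String id, long estimatedTimeMs) {
                this.id = id;
                this.estimatedTimeMs = estimatedTimeMs;
            }
        }

        // Priority queue ordered by estimated processing time, ascending.
        private final PriorityQueue<Job> queue =
                new PriorityQueue<>(Comparator.comparingLong((Job j) -> j.estimatedTimeMs));

        public void submit(Job job) { queue.add(job); }

        // Returns the shortest pending job, or null if none is waiting.
        public Job nextJob() { return queue.poll(); }

        public static void main(String[] args) {
            SptQueue scheduler = new SptQueue();
            scheduler.submit(new Job("long-scan", 90_000));
            scheduler.submit(new Job("small-join", 5_000));
            scheduler.submit(new Job("medium-sort", 30_000));
            // Dequeues in SPT order: small-join, medium-sort, long-scan.
            for (Job j; (j = scheduler.nextJob()) != null; ) {
                System.out.println(j.id + " (" + j.estimatedTimeMs + " ms)");
            }
        }
    }

Serving short jobs first is what minimizes average flowtime: a long job delayed by a short one adds little waiting time overall, whereas a short job stuck behind a long one waits disproportionately. A real Hadoop integration would additionally need a source for the processing-time estimates, which this sketch simply assumes are given.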