Minimizing total job completion time in MapReduce scheduling

2021 
Abstract We follow up an earlier studied multiple-task parallel-machine scheduling model that captures the core challenges in MapReduce scheduling, with the optimization goal to minimize the total job completion time. The problem is a novel generalization of the classic two-stage flow-shop scheduling, in which we are given a set of jobs each is associated with multiple map tasks and multiple reduce tasks. All these tasks are non-preemptive to be processed on multiple parallel identical map machines and multiple parallel identical reduce machines, respectively, under the strict precedence constraints that, for each job, any reduce task cannot start before all its map tasks have been finished. We prove a new lower bound on the total job completion time, and based on which we present an O ( n log n + N + m 1 + m 2 ) -time ( 9 - 3 m 1 ) -approximation algorithm, where n and N are the number of jobs and the total number of tasks, respectively, m 1 and m 2 are the numbers of map and reduce machines, respectively, and they can even be part of the input. Our algorithm improves the previous best 12-approximation, and it reduces to the best approximation algorithms for several interesting or well studied special cases. We confirm through numerical experiments on 828 , 100 instances that both our lower bound and our algorithm significantly outperform: the empirical mean improvement ratio of the new lower bound is as high as 37.75 % , and the empirical mean approximation ratio of our algorithm is only 1.7582 .
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    21
    References
    2
    Citations
    NaN
    KQI
    []