Intermediate Value Size Aware Coded MapReduce

2020 
MapReduce is a commonly used framework for parallel processing of data-intensive tasks, but its performance usually suffers from heavy communication load incurred by the shuffling of intermediate values (IVs) among computing servers. Recently, the Coded MapReduce framework is proposed which uses a coding scheme named coded distributed computing (CDC) to trade the communication load with extra computation in MapReduce. CDC can achieve the optimal computation-communication tradeoff when all the IVs have the same size. However, in many practical applications, the sizes of IVs can vary over a large range, leading to inferior performance. In this paper, we introduce a generalized CDC scheme which takes the sizes of IVs into account and then propose a combinatorial optimization problem aiming to minimize the communication load when the computation load is fixed. We show that the problem is NP-hard, and further propose a very efficient algorithm which achieves an approximation ratio of 2. Experiments conducted on Alibaba Cloud show that, compared to the original CDC scheme, our proposed IV size aware approach can significantly reduce the communication load and achieve a lower total execution time.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    23
    References
    0
    Citations
    NaN
    KQI
    []