Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

2021 
Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical for implementing autonomic, self-managing big data systems. Building such a model is challenging because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution, which makes it difficult to estimate execution time accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck in order to accurately predict task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the resource allocation property among stages to estimate the overall execution plan. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation.