DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters

2020 
Data dependency, often represented as a directed acyclic graph (DAG), is an application semantic that is crucial to the performance of data analytics platforms such as Spark. Spark comes with two built-in schedulers, the FIFO scheduler and the Fair scheduler, neither of which takes advantage of data dependency structures. Recently proposed DAG-aware task scheduling approaches, notably GRAPHENE, have achieved significant performance improvements but have paid little attention to cache management. The resulting data access patterns interact poorly with the built-in LRU caching, leading to significant cache misses and performance degradation. On the other hand, DAG-aware caching schemes, such as Most Reference Distance (MRD), are designed for the FIFO scheduler rather than for DAG-aware task schedulers. In this paper, we propose and develop a middleware, Dagon, which leverages the complexity and heterogeneity of DAGs to jointly perform task scheduling and cache management. Dagon relies on three key mechanisms: DAG-aware task assignment, which considers the dependency structure and heterogeneous resource demands to reduce potential resource fragmentation; sensitivity-aware delay scheduling, which prevents executors from waiting a long time for tasks that are insensitive to locality; and priority-aware caching, which makes cache eviction and prefetching decisions based on the stage priority determined by DAG-aware task assignment. We have implemented Dagon in Apache Spark. Evaluation on a testbed shows that Dagon improves job completion time by up to 42% and CPU utilization by up to 46%, compared to GRAPHENE plus MRD.
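The abstract does not spell out how priority-aware caching chooses eviction victims, so the following is only a minimal sketch of the general idea under stated assumptions, not Dagon's implementation. It assumes each cached block carries a numeric priority inherited from the stage that will read it (larger means more important), and that a new block may only displace blocks of strictly lower priority. The class `PriorityCache` and its method `put` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PriorityCache:
    """Toy cache that evicts the lowest-priority blocks first (illustrative only)."""
    capacity: int                                # total size the cache can hold
    used: int = 0                                # size currently occupied
    blocks: dict = field(default_factory=dict)   # block_id -> (size, priority)

    def put(self, block_id, size, priority):
        """Try to cache a block.

        Returns the list of evicted block ids on success, or None if the
        block cannot be admitted without evicting an equally or more
        important block. Assumption: larger priority = more important.
        """
        if block_id in self.blocks:
            # Already cached: just refresh its priority, nothing is evicted.
            self.blocks[block_id] = (self.blocks[block_id][0], priority)
            return []
        if size > self.capacity:
            return None

        need = self.used + size - self.capacity  # space that must be freed
        chosen, freed = [], 0
        # Walk the cached blocks from least to most important.
        for victim_id, (v_size, v_prio) in sorted(
            self.blocks.items(), key=lambda kv: kv[1][1]
        ):
            if freed >= need:
                break
            if v_prio >= priority:
                break  # never displace blocks at least as important
            chosen.append(victim_id)
            freed += v_size

        if freed < need:
            return None  # admission refused; the cache is left unchanged

        for victim_id in chosen:
            self.used -= self.blocks.pop(victim_id)[0]
        self.blocks[block_id] = (size, priority)
        self.used += size
        return chosen


cache = PriorityCache(capacity=10)
cache.put("a", 4, priority=1)
cache.put("b", 4, priority=3)
evicted = cache.put("c", 4, priority=2)    # displaces "a", the least important block
rejected = cache.put("d", 4, priority=1)   # nothing less important is left to evict
```

In contrast to LRU, which would evict whichever block was touched least recently, this policy keeps blocks needed by high-priority stages even if they have not been read for a while. A real system would derive the priorities from the scheduler's stage ordering and would also handle prefetching, which this sketch omits.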