Elastic Executor Provisioning for Iterative Workloads on Apache Spark

2019 
In-memory data analytics frameworks like Apache Spark are employed by an increasing number of diverse applications, such as machine learning, graph computation, and scientific computing, which benefit from the long-running process (i.e., executor) programming model that avoids repeated system I/O overhead. However, existing resource allocation strategies mainly rely on the peak demand, normally specified by users. Since the resource usage of long-running applications such as iterative computations varies significantly over time, we find that peak-demand-based resource allocation policies lead to low cloud utilization in production environments. In this paper, we present an elastic, utilization-aware executor provisioning approach for iterative workloads on Apache Spark (iSpark). It identifies the causes of resource underutilization due to an inflexible resource policy and elastically adjusts the allocated executors over time according to real-time resource usage. In general, iterative applications require more computation resources in their early stages, and their demand for resources diminishes as more iterations complete. iSpark aims to scale the number of executors up or down in a timely manner so as to fully utilize the allocated resources, taking the dominant resource factor into consideration. It further preempts underutilized executors while preserving their cached intermediate data to ensure data consistency. Testbed evaluations show that iSpark improves the resource utilization of individual executors by an average of 35.2% compared to vanilla Spark. At the same time, it increases cluster utilization from 32.1% to 51.3% and reduces overall job completion time by 20.8% for a set of representative iterative applications.
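The abstract does not specify iSpark's actual scaling rule, so the following is only an illustrative sketch of a utilization-aware scaling decision driven by the dominant resource; the function name, thresholds, and inputs are all assumptions, not the paper's algorithm.

```python
def desired_executors(current: int, cpu_util: float, mem_util: float,
                      low: float = 0.3, high: float = 0.8) -> int:
    """Hypothetical sketch: decide the next executor count from
    real-time CPU and memory utilization (both in [0, 1])."""
    # The "dominant factor": the more heavily used resource drives
    # the decision, so scale-down never starves the bottleneck resource.
    dominant = max(cpu_util, mem_util)
    if dominant > high:
        # Executors are saturated: scale up to absorb the load.
        return current + 1
    if dominant < low and current > 1:
        # Executors are underutilized: preempt one. In iSpark's design,
        # its cached intermediate data would be preserved before removal.
        return current - 1
    return current
```

In this sketch, the low and high watermarks bound a target utilization band; the abstract's early-iteration/late-iteration pattern would then naturally grow the executor pool at the start of a job and shrink it as iterations complete.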