COBRA: Toward Provably Efficient Semi-clairvoyant Scheduling In Data Analytics Systems

Xiaoda Zhang Nanjing University, P.R. China
Zhuzhong Qian Nanjing University, P.R. China
Sheng Zhang Nanjing University, P.R. China
Xiangbo Li Nanjing University, P.R. China
Xiaoliang Wang Nanjing University, P.R. China
Sanglu Lu Nanjing University, P.R. China


Typical data analytics systems abstract jobs as directed acyclic graphs (DAGs). It is crucial to maximize throughput and speedup completions for DAG jobs in practice. Existing works propose clairvoyant schedulers optimizing these goals, however, they assume complete job information as a prior knowledge which limits their applicability. Instead, we remove the complete prior knowledge assumption and rely solely on a partial prior information, which is more practical. And we design a semi-clairvoyant task scheduler COBRA working within each job. COBRA adaptively adjusts its resource desires in a multiplicative-increase multiplicative-decrease (MIMD) manner according to nearly past resource utilizations and the current waiting tasks. On the other hand, COBRA seeks to satisfy task locality preferences by allowing each task to wait for some time that is bounded by a parameterized threshold. Surprisingly, even with the partial prior job information, we theoretically prove, COBRA, when working with the widely used fair job scheduler, is O(1)-competitive with respect to both makespan and average job response time. We experimentally validate that the performance promotion of COBRA in both real system deployment and trace-driven simulations.

You may want to know: