|Xiaoda Zhang||Nanjing University, P.R. China|
|Zhuzhong Qian||Nanjing University, P.R. China|
|Sheng Zhang||Nanjing University, P.R. China|
|Xiangbo Li||Nanjing University, P.R. China|
|Xiaoliang Wang||Nanjing University, P.R. China|
|Sanglu Lu||Nanjing University, P.R. China|
Typical data analytics systems abstract jobs as directed acyclic graphs (DAGs). It is crucial to maximize throughput and speedup completions for DAG jobs in practice. Existing works propose clairvoyant schedulers optimizing these goals, however, they assume complete job information as a prior knowledge which limits their applicability. Instead, we remove the complete prior knowledge assumption and rely solely on a partial prior information, which is more practical. And we design a semi-clairvoyant task scheduler COBRA working within each job. COBRA adaptively adjusts its resource desires in a multiplicative-increase multiplicative-decrease (MIMD) manner according to nearly past resource utilizations and the current waiting tasks. On the other hand, COBRA seeks to satisfy task locality preferences by allowing each task to wait for some time that is bounded by a parameterized threshold. Surprisingly, even with the partial prior job information, we theoretically prove, COBRA, when working with the widely used fair job scheduler, is O(1)-competitive with respect to both makespan and average job response time. We experimentally validate that the performance promotion of COBRA in both real system deployment and trace-driven simulations.