DyBatch: Efficient Batching and Fair Scheduling for Deep Learning Inference on Time-sharing Devices

2020 
Deep Learning (DL) is increasingly applied in intelligent systems equipped with resource-constrained hardware accelerators. When multiple DL applications share such a device, the execution model can be divided into two stages: (i) batching independent inference tasks initiated by each application, and (ii) scheduling the batches to run in a time-sharing manner. State-of-the-art DL serving systems follow this execution model by organizing sequential tasks into batches and then scheduling the batches, grouped by their target deep neural network (DNN) models, in a round-robin manner. However, we demonstrate that these practices fail to alleviate task slowdown, and that batching and scheduling need to be revisited in terms of efficiency and fairness. To this end, we formulate batching as a resource allocation problem and investigate scheduling in terms of each application's utilization of the device. We then propose a fine-grained batching scheme and a fairness-driven scheduling scheme for DL serving, implemented in a prototype system called DyBatch. Specifically, DyBatch achieves efficient batching by taking into account Pareto efficiency of, and envy between, batches. In addition, DyBatch's fair scheduler monitors the resource utilization of all applications and first assigns for execution a batch from the application with the lowest utilization. Evaluation on various benchmarks against the baseline system TensorFlow Serving (TFS) shows the superiority of DyBatch, which achieves up to a 55% reduction in slowdown and up to a 12% improvement in throughput.
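To make the fairness-driven scheduling idea concrete, below is a minimal sketch of a lowest-utilization-first scheduler in the spirit of what the abstract describes: per-application device utilization is tracked, and each scheduling decision dispatches one batch from the application with the lowest accumulated utilization. The `Application` class, `run_fair_scheduler`, `execute_batch`, and the heap-based selection are illustrative assumptions, not DyBatch's actual implementation.

```python
# Hypothetical sketch of lowest-utilization-first scheduling (not DyBatch's code).
import heapq
import time
from collections import deque


class Application:
    """One DL application with a queue of pending inference batches."""

    def __init__(self, name):
        self.name = name
        self.utilization = 0.0   # accumulated device time consumed so far
        self.batches = deque()   # pending inference batches

    def enqueue(self, batch):
        self.batches.append(batch)


def run_fair_scheduler(apps, execute_batch):
    """Repeatedly run one batch from the application with the lowest
    accumulated utilization, charging it the measured execution time
    on the time-shared device."""
    # Heap entries are (utilization, tie_breaker, app); the unique
    # tie_breaker index keeps tuple comparison from reaching `app`.
    heap = [(app.utilization, i, app) for i, app in enumerate(apps)]
    heapq.heapify(heap)
    while heap:
        _, i, app = heapq.heappop(heap)
        if not app.batches:
            continue  # this application has drained its queue
        batch = app.batches.popleft()
        start = time.monotonic()
        execute_batch(batch)  # occupy the device exclusively for this batch
        app.utilization += time.monotonic() - start
        heapq.heappush(heap, (app.utilization, i, app))


if __name__ == "__main__":
    # Toy usage: execute_batch is stubbed with a sleep proportional to batch size.
    apps = [Application("resnet_app"), Application("bert_app")]
    apps[0].enqueue(["img0", "img1"])
    apps[0].enqueue(["img2"])
    apps[1].enqueue(["query0"])
    run_fair_scheduler(apps, execute_batch=lambda b: time.sleep(0.01 * len(b)))
    for app in apps:
        print(app.name, round(app.utilization, 3))
```

With a priority queue, each scheduling decision costs O(log n) in the number of applications, and applications that have consumed little device time are served first, which is one way to bound the slowdown of lightly loaded applications in line with the fairness goal stated above.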