DyBatch: Efficient Batching and Fair Scheduling for Deep Learning Inference on Time-sharing Devices

2020 
Deep Learning (DL) is increasingly applied in intelligent systems equipped with resource-constrained hardware accelerators. When multiple DL applications share such a device, the execution model can be divided into two stages: (i) batching independent inference tasks initiated by each application, and (ii) scheduling the batches to run in a time-sharing manner. State-of-the-art DL serving systems follow this execution model by organizing sequential tasks into batches and then scheduling the batches, grouped by their target deep neural network (DNN) models, in a round-robin manner. However, we demonstrate that these practices fail to alleviate task slowdown, and that batching and scheduling need to be revisited in terms of efficiency and fairness. To this end, we formulate batching as a resource allocation problem and investigate scheduling in terms of each application's utilization of the device. We then propose a fine-grained batching scheme and a fairness-driven scheduling scheme for DL serving, implemented in a prototype system called DyBatch. Specifically, DyBatch achieves efficient batching by taking into account Pareto efficiency of, and envy between, batches. In addition, DyBatch's fair scheduler monitors the resource utilization of all applications and first assigns for execution a batch from the application with the lowest utilization. Evaluation on various benchmarks against the baseline system TensorFlow Serving (TFS) shows the superiority of DyBatch, which achieves up to a 55% reduction in slowdown and up to a 12% improvement in throughput.
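To make the fairness-driven scheduling idea concrete, below is a minimal sketch of a lowest-utilization-first scheduler in the spirit of what the abstract describes: per-application device utilization is tracked, and each scheduling decision dispatches one batch from the application with the lowest accumulated utilization. The `Application` class, `run_fair_scheduler`, `execute_batch`, and the heap-based selection are illustrative assumptions, not DyBatch's actual implementation.

```python
# Hypothetical sketch of lowest-utilization-first scheduling (not DyBatch's code).
import heapq
import time
from collections import deque


class Application:
    """One DL application with a queue of pending inference batches."""

    def __init__(self, name):
        self.name = name
        self.utilization = 0.0   # accumulated device time consumed so far
        self.batches = deque()   # pending inference batches

    def enqueue(self, batch):
        self.batches.append(batch)


def run_fair_scheduler(apps, execute_batch):
    """Repeatedly run one batch from the application with the lowest
    accumulated utilization, charging it the measured execution time
    on the time-shared device."""
    # Heap entries are (utilization, tie_breaker, app); the unique
    # tie_breaker index keeps tuple comparison from reaching `app`.
    heap = [(app.utilization, i, app) for i, app in enumerate(apps)]
    heapq.heapify(heap)
    while heap:
        _, i, app = heapq.heappop(heap)
        if not app.batches:
            continue  # this application has drained its queue
        batch = app.batches.popleft()
        start = time.monotonic()
        execute_batch(batch)  # occupy the device exclusively for this batch
        app.utilization += time.monotonic() - start
        heapq.heappush(heap, (app.utilization, i, app))


if __name__ == "__main__":
    # Toy usage: execute_batch is stubbed with a sleep proportional to batch size.
    apps = [Application("resnet_app"), Application("bert_app")]
    apps[0].enqueue(["img0", "img1"])
    apps[0].enqueue(["img2"])
    apps[1].enqueue(["query0"])
    run_fair_scheduler(apps, execute_batch=lambda b: time.sleep(0.01 * len(b)))
    for app in apps:
        print(app.name, round(app.utilization, 3))
```

With a priority queue, each scheduling decision costs O(log n) in the number of applications, and applications that have consumed little device time are served first, which is one way to bound the slowdown of lightly loaded applications in line with the fairness goal stated above.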