BatchSizer: Power-Performance Trade-off for DNN Inference

2021 
GPU accelerators can deliver significant speedups for DNN processing; however, their performance is limited by both internal and external parameters. A well-known parameter that restricts the performance of various computing platforms in real-world setups, including GPU accelerators, is the power cap, usually imposed by an external power controller. A common approach to meeting the power cap constraint is the Dynamic Voltage and Frequency Scaling (DVFS) technique. However, the functionality of this technique is limited and platform-dependent. To improve the performance of DNN inference on GPU accelerators, we propose a new control knob: the size of the input batches fed to the GPU accelerator in DNN inference applications. After evaluating the impact of this control knob on the power consumption and performance of GPU accelerators running DNN inference applications, we present the design and implementation of a fast and lightweight runtime system, called BatchSizer. This runtime system leverages the new control knob to manage the power consumption of GPU accelerators in the presence of a power cap. Through several experiments using a modern GPU and several DNN models and input datasets, we show that BatchSizer significantly surpasses the conventional DVFS technique in performance (by up to 29%) while successfully meeting the power cap.
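The core idea can be illustrated with a minimal sketch: choose the largest input batch size whose measured GPU power stays under the cap. The power model below is a hypothetical stand-in for real GPU telemetry (e.g., NVML power readings), and the function names and candidate batch sizes are assumptions for illustration; the abstract does not specify BatchSizer's exact algorithm.

```python
# Hypothetical sketch of the batch-size control knob: among a set of
# candidate batch sizes, pick the largest one whose (modeled) power
# draw does not exceed the power cap.

def measured_power_watts(batch_size):
    # Assumed toy power model: a baseline plus a term that grows with
    # batch size. A real system would query GPU telemetry instead.
    return 120.0 + 1.5 * batch_size

def choose_batch_size(power_cap_watts, candidates=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Return the largest candidate batch size whose modeled power
    consumption does not exceed the cap; fall back to the smallest."""
    best = candidates[0]
    for b in candidates:
        if measured_power_watts(b) <= power_cap_watts:
            best = b
    return best

if __name__ == "__main__":
    cap = 200.0
    b = choose_batch_size(cap)
    print(f"batch size {b} at {measured_power_watts(b):.1f} W under a {cap:.0f} W cap")
```

Unlike DVFS, which changes the clock frequency of the whole device, this knob only changes how much work is submitted per inference call, so it remains platform-independent.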