Talos: A Weighted Speedup-Aware Device Placement of Deep Learning Models

Yuanjia Xu, Heng Wu, Wenbo Zhang, Chen Yang, Yuewen Wu, Heran Gao, Tao Wang

2021

Efficient device placement of deep learning (DL) models, which consist of many operations, is a major challenge when heterogeneous devices (e.g., CPU, GPU) are considered. Existing average-speedup and transient-speedup approaches do not make full use of operation-level speedups, so the Total Operation Completion Time (TOCT) cannot be optimized efficiently. To address this challenge, we present Talos, a weighted speedup-aware approach to optimizing the device placement of multiple DL models. Talos shows that operations within and across DL models have diverse speedups (from 10^-1 to 10^2) on heterogeneous devices. In addition, the execution times of operations vary widely (from 0.1 ms to 100 ms). Talos considers these two features together as weighted speedups and treats them as costs in an incremental minimum-cost flow. Compared with state-of-the-art efforts, experimental results show that Talos can reduce TOCT by up to 50%.
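
To make the idea of weighting a speedup by execution time concrete, the following is a minimal toy sketch, not the Talos algorithm itself. It assumes a single free GPU slot, one operation per slot, and made-up timings; the names `Op`, `place`, and `saved_ms` are hypothetical and do not come from the paper. It compares ranking operations by raw speedup ratio against ranking them by the absolute time saved (the speedup weighted by how long the operation runs), and reports the resulting total time in each case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical operation with assumed per-device execution times."""
    name: str
    cpu_ms: float
    gpu_ms: float

    @property
    def speedup(self) -> float:
        # Raw speedup ratio of GPU over CPU (values below 1 mean the GPU is slower).
        return self.cpu_ms / self.gpu_ms

    @property
    def saved_ms(self) -> float:
        # Time saved by moving to the GPU: the speedup weighted by execution time.
        return self.cpu_ms - self.gpu_ms


def place(ops, gpu_slots, key):
    """Give the GPU slots to the top-ranked ops; everything else stays on CPU."""
    ranked = sorted(ops, key=key, reverse=True)
    # Never move an operation that would run slower on the GPU.
    on_gpu = {op.name for op in ranked[:gpu_slots] if op.saved_ms > 0}
    total_ms = sum(op.gpu_ms if op.name in on_gpu else op.cpu_ms for op in ops)
    return on_gpu, total_ms


# Illustrative timings only, chosen to span the ranges mentioned in the abstract.
OPS = [
    Op("tiny_fast", cpu_ms=0.5, gpu_ms=0.1),      # high ratio, little time saved
    Op("big_modest", cpu_ms=100.0, gpu_ms=50.0),  # modest ratio, much time saved
    Op("slow_on_gpu", cpu_ms=1.0, gpu_ms=10.0),   # speedup below 1
]

by_ratio = place(OPS, gpu_slots=1, key=lambda op: op.speedup)
by_weight = place(OPS, gpu_slots=1, key=lambda op: op.saved_ms)

if __name__ == "__main__":
    print("ranked by speedup ratio:", by_ratio)
    print("ranked by time saved:   ", by_weight)
```

In this toy setting the ratio-based ranking spends the GPU slot on a sub-millisecond operation, while the time-weighted ranking spends it on the long-running one. A full minimum-cost flow formulation, as the abstract describes, would generalize this to many devices and capacities; the greedy choice here is only a stand-in under the stated assumptions.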
