Talos: A Weighted Speedup-Aware Device Placement of Deep Learning Models

2021 
Efficient device placement of deep learning (DL) models, which consist of many operations, is a major challenge when heterogeneous devices (e.g., CPU, GPU) are considered. Existing average-speedup and transient-speedup approaches do not make full use of operation-level speedups, so the Total Operation Completion Time (TOCT) cannot be optimized efficiently. To address this challenge, we present Talos, a weighted speedup-aware approach to optimizing the device placement of multiple DL models. Talos reveals that operations within or across DL models have diverse speedups (from 10^-1 to 10^2) on heterogeneous devices. In addition, the execution times of operations vary widely (from 0.1 ms to 100 ms). Talos considers these two features simultaneously as weighted speedups and treats them as costs in an incremental minimum-cost flow. Experimental results show that, compared with state-of-the-art efforts, Talos can reduce TOCT by up to 50%.
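To make the idea of treating placement as a minimum-cost flow concrete, the sketch below uses a classical textbook formulation, not Talos's own method: each operation is matched to a (device, position-from-last) slot, and an operation in the k-th-from-last position on a device contributes k times its execution time to the sum of completion times. The abstract does not define the weighted-speedup cost or the incremental solver, so neither is reproduced here. The function names, the cost definition, and the sample timings are illustrative assumptions.

```python
from collections import deque


def min_cost_flow(n, edges, s, t):
    """Successive shortest paths; edges are (u, v, capacity, cost) tuples."""
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        # Each entry: [target, residual capacity, cost, index of reverse edge]
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total = 0
    while True:
        dist = [float("inf")] * n
        prev = [None] * n
        queued = [False] * n
        dist[s] = 0
        queue = deque([s])
        while queue:  # Bellman-Ford with a queue (handles negative residual costs)
            u = queue.popleft()
            queued[u] = False
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if prev[t] is None:  # no augmenting path left
            return total, graph
        push, v = float("inf"), t
        while v != s:  # bottleneck capacity along the shortest path
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = t
        while v != s:  # augment along the path
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total += push * dist[t]


def place_operations(times):
    """times[j][d]: execution time of operation j on device d (hypothetical data).

    Returns (sum of completion times, {operation: device}).
    """
    n_ops, n_dev = len(times), len(times[0])
    first_slot = 1 + n_ops  # node 0 is the source, nodes 1..n_ops are operations
    sink = first_slot + n_dev * n_ops
    edges = [(0, 1 + j, 1, 0) for j in range(n_ops)]
    for d in range(n_dev):
        for k in range(n_ops):  # k = 0 is the last position on device d
            slot = first_slot + d * n_ops + k
            edges.append((slot, sink, 1, 0))
            for j in range(n_ops):
                # An operation in the (k+1)-th-from-last position delays
                # itself and the k operations after it.
                edges.append((1 + j, slot, 1, (k + 1) * times[j][d]))
    total, graph = min_cost_flow(sink + 1, edges, 0, sink)
    placement = {}
    for j in range(n_ops):
        for v, cap, _, _ in graph[1 + j]:
            if v >= first_slot and cap == 0:  # saturated operation-to-slot edge
                placement[j] = (v - first_slot) // n_ops
    return total, placement


# Made-up timings in ms; columns are [CPU, GPU].
times = [[4, 1], [2, 3], [6, 2]]
total, placement = place_operations(times)
print(total, placement)
```

In this toy input, the second operation is slower on the GPU than on the CPU (a speedup below 1), which is the kind of operation-level diversity the abstract describes. A single average speedup per model would hide it.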