Automating System Configuration of Distributed Machine Learning

2019 
The performance of distributed machine learning systems depends on their system configuration. However, configuring such a system for optimal performance is challenging and time-consuming, even for experts, because of diverse runtime factors such as the workload and the system environment. We present a cost-based optimization technique that automatically finds a good system configuration for parameter server (PS) machine learning (ML) frameworks. We design and implement Cruise, a system that applies this technique to tune distributed PS ML execution automatically. Evaluation results on three ML applications show that Cruise automates the system configuration of these applications and achieves good performance with minor reconfiguration costs.
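The abstract does not spell out Cruise's cost model. The sketch below only illustrates the general idea of cost-based configuration search for a PS deployment, using a deliberately simple model that is an assumption here and not Cruise's. A fixed pool of machines is split into workers and servers. Per-iteration compute is divided among the workers. Each server's link carries its share of the model for every worker's push and pull. Every split is enumerated, and the cheapest one is chosen. A second check switches configurations only when the projected savings exceed a one-off reconfiguration cost. All names (`Workload`, `estimate_iteration_time`, `best_config`, `should_reconfigure`) and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical inputs to a toy cost model (not Cruise's actual parameters).
    compute_seconds: float  # single-machine compute time for one iteration
    model_bytes: float      # size of the model parameters exchanged per iteration
    bandwidth_bps: float    # per-machine network bandwidth in bytes per second


def estimate_iteration_time(wl: Workload, workers: int, servers: int) -> float:
    """Toy cost model: compute is split evenly across workers; each server's
    link carries its 1/servers share of the model twice (push + pull) per worker."""
    compute = wl.compute_seconds / workers
    comm = 2 * workers * (wl.model_bytes / servers) / wl.bandwidth_bps
    return compute + comm


def best_config(wl: Workload, machines: int) -> tuple[int, int, float]:
    """Enumerate every worker/server split (at least one of each) and return
    (workers, servers, estimated_seconds) for the cheapest one.
    Requires machines >= 2; otherwise min() raises ValueError."""
    candidates = (
        (w, machines - w, estimate_iteration_time(wl, w, machines - w))
        for w in range(1, machines)
    )
    return min(candidates, key=lambda c: c[2])


def should_reconfigure(current_seconds: float, best_seconds: float,
                       remaining_iters: int, reconfig_seconds: float) -> bool:
    """Switch only if the time saved over the remaining iterations
    outweighs the one-off cost of reconfiguring."""
    return (current_seconds - best_seconds) * remaining_iters > reconfig_seconds


if __name__ == "__main__":
    wl = Workload(compute_seconds=80.0, model_bytes=1e9, bandwidth_bps=1e9)
    workers, servers, t = best_config(wl, machines=10)
    print(workers, servers, round(t, 3))  # 7 3 16.095
    current = estimate_iteration_time(wl, 5, 5)  # an even 5/5 split: 18.0 s
    print(should_reconfigure(current, t, remaining_iters=100, reconfig_seconds=60.0))  # True
```

With these made-up numbers, adding workers cuts compute time but raises per-server network load. The search therefore settles on an interior split (7 workers, 3 servers) rather than either extreme. Exhaustive enumeration is fine for a single integer knob like this. A real optimizer with many interacting knobs would need a richer cost model and a smarter search.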