Globe2Train: A Framework for Distributed ML Model Training using IoT Devices Across the Globe

2021 
Training a problem-solving Machine Learning (ML) model on large datasets is computationally expensive and requires a scalable distributed training platform to complete training within a reasonable time frame. In this paper, we propose a novel concept where, instead of distributed training within a GPU cluster, we train one ML model by utilizing the idle hardware of numerous resource-constrained IoT devices spread across the globe. In such a global setting, staleness and real-world network uncertainties such as congestion, latency, and bandwidth fluctuations are known to impair model convergence speed and training scalability. To implement this concept while simultaneously addressing these real-world global distributed training challenges, we present Globe2Train (G2T), a framework with two components, G2T-Cloud (G2T-C) and G2T-Device (G2T-D), that can efficiently connect multiple IoT devices and collectively train them to produce the target ML models at high speed. The evaluation results and analysis show how the framework components jointly eliminate staleness and improve training scalability and speed by tolerating real-world network uncertainties and by reducing the communication-to-computation ratio.
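The abstract does not spell out the aggregation mechanics, but the staleness elimination it describes can be illustrated with a minimal sketch of staleness-aware gradient aggregation at a central coordinator (playing a role similar to G2T-C). All names, the staleness threshold, and the learning rate below are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch: a central aggregator averages only sufficiently
# fresh gradients reported by IoT workers and drops stale ones, so slow
# or congested devices cannot drag the global model backward.
# The staleness bound and learning rate are assumed values.
import numpy as np

MAX_STALENESS = 2  # assumed bound on how old a worker's model version may be


def aggregate(weights, updates, current_version, max_staleness=MAX_STALENESS):
    """Apply one averaged SGD step using only fresh worker gradients.

    `updates` is a list of (gradient, model_version) pairs, where
    model_version is the global version the worker computed against.
    Gradients older than `max_staleness` versions are discarded.
    """
    fresh = [g for g, v in updates if current_version - v <= max_staleness]
    if not fresh:
        return weights, current_version  # nothing fresh: keep weights unchanged
    step = np.mean(fresh, axis=0)       # average the surviving gradients
    return weights - 0.1 * step, current_version + 1  # lr = 0.1 (assumed)


# Usage: at global version 5, one worker reports a fresh gradient and
# another reports one computed against the long-outdated version 1.
w = np.zeros(3)
updates = [(np.ones(3), 5), (3.0 * np.ones(3), 1)]
w, v = aggregate(w, updates, current_version=5)
# Only the fresh gradient survives, so w == [-0.1, -0.1, -0.1] and v == 6.
```

A real deployment would also have to weight contributions by device speed and handle dropped connections, which is where the tolerance to network uncertainties described above would come in.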