A Comprehensive Survey on Training Acceleration for Large Machine Learning Models in IoTs

2021 
The ever-growing Artificial Intelligence (AI) applications have greatly reshaped our world in many areas, e.g., smart home, computer vision, natural language processing, etc. Behind these applications are usually machine learning (ML) models with extremely large size, which require huge datasets for accurate training to mine the value contained in the big data. Large machine learning models, however, can consume tremendous computing resources to achieve decent performance and thus it is difficult to train them in resource-constrained Internet of Things (IoT) environments, which would prevent further development and application of AI techniques in the future. To deal with such challenges, there are many efforts on accelerating the training process for large machine learning models in IoTs. In this paper, we provide a comprehensive review on the recent advances toward reducing the computing cost during the training stage while maintaining comparable model accuracy. Specifically, the optimization algorithms that aim to improve the convergence rate are emphasized over various distributed learning architectures that exploit ubiquitous computing resources. Then, the paper elaborates the computation hardware acceleration and communication optimization for collaborative training among multiple learning entities. Finally, the remaining challenges, future opportunities and possible directions are discussed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []