Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training

2021 
With the growth of both data and parameter volumes, efficiently training large-scale DNN models on distributed platforms has become a major challenge. Ordinary parallelism modes, i.e., data parallelism, model parallelism, and pipeline parallelism, can no longer scale large DNN model training efficiently across multiple nodes. Meanwhile, excessive memory consumption severely restricts GPU computing efficiency and training throughput. In this paper, we propose Hippie, a hybrid parallel training framework that integrates pipeline parallelism and data parallelism to improve the memory efficiency and scalability of large DNN training. Hippie adopts a hybrid parallel method based on hiding gradient communication, which improves training throughput and scalability. Meanwhile, Hippie introduces last-stage pipeline scheduling and recomputation for specific layers, which effectively reduce memory overhead and ease the difficulty of training large DNN models on memory-constrained devices. To evaluate the optimization effect more reasonably, we propose a memory efficiency (ME) metric that captures the tradeoff between throughput and memory overhead. We implement Hippie on top of PyTorch and NCCL. Experiments on various models show that Hippie achieves above 90% scaling efficiency on a 16-GPU platform. Moreover, Hippie increases throughput by up to 80% while saving 57% of memory overhead, achieving 4.18× memory efficiency.
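The reported numbers are self-consistent if ME is read as the ratio of throughput to memory overhead: an 80% throughput gain with a 57% memory saving gives 1.80 / 0.43 ≈ 4.18, though the paper's exact definition of ME is not given in the abstract. The sketch below is a minimal, hypothetical PyTorch illustration of two techniques the abstract names, hiding data-parallel gradient communication behind backward computation and recomputing activations for a specific layer. It is not Hippie's implementation; the model, layer sizes, and the helper names StageModel, overlap_grad_allreduce, and finish_grad_allreduce are invented for the example, and it assumes torch.distributed is initialized with the NCCL backend on a PyTorch version that provides register_post_accumulate_grad_hook.

```python
# Minimal sketch (assumptions, not Hippie's actual code): it combines
# (a) hiding data-parallel gradient communication behind backward computation,
#     using asynchronous NCCL all-reduces launched from per-parameter hooks, and
# (b) recomputation (activation checkpointing) for a specific memory-heavy block.
# Assumes torch.distributed has already been initialized with the NCCL backend
# and that PyTorch >= 2.1 is available for register_post_accumulate_grad_hook.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.checkpoint import checkpoint


class StageModel(nn.Module):
    """One pipeline stage; the middle block is recomputed during backward."""

    def __init__(self):
        super().__init__()
        self.front = nn.Linear(1024, 1024)
        self.heavy = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(),
                                   nn.Linear(4096, 1024))
        self.back = nn.Linear(1024, 1024)

    def forward(self, x):
        x = self.front(x)
        # Recompute this block in backward instead of storing its activations.
        x = checkpoint(self.heavy, x, use_reentrant=False)
        return self.back(x)


def overlap_grad_allreduce(model, handles):
    """Launch an async all-reduce as soon as each parameter's gradient is
    accumulated, so communication overlaps with the rest of the backward pass."""
    def hook(param):
        work = dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
        handles.append((work, param))

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(hook)


def finish_grad_allreduce(handles, world_size):
    """Wait for in-flight all-reduces and average the gradients."""
    for work, param in handles:
        work.wait()
        param.grad.div_(world_size)
    handles.clear()


def train_step(model, optimizer, x, target, handles):
    optimizer.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()   # per-parameter all-reduces are launched during backward
    finish_grad_allreduce(handles, dist.get_world_size())
    optimizer.step()
    return loss.item()
```

Overlapping per-parameter all-reduces with the remainder of backward is roughly what PyTorch's DistributedDataParallel does internally with bucketed gradients; the explicit hooks here simply make the communication hiding visible in a few lines.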