CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract)

2021 
In this paper, we propose cuWide, an efficient GPU training framework for large-scale sparse wide models. To fully benefit from the memory hierarchy of the GPU, cuWide applies a new flow-based schema for training that leverages the spatial and temporal locality of wide models to drastically reduce communication with GPU global memory. Comprehensive experiments show that cuWide can be more than 20× faster than state-of-the-art GPU and multi-core CPU solutions.
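
The abstract states the core idea (exploiting locality to cut traffic to GPU global memory) without implementation details. As a purely illustrative sketch, the CUDA kernel below shows one generic way temporal locality can reduce global-memory communication during sparse gradient accumulation; the kernel name, the HOT_FEATURES on-chip cache, and the assumption that hot features are remapped to low ids are hypothetical, not cuWide's actual flow-based schema.

```cuda
// Hypothetical sketch only: accumulate gradients for frequent ("hot") features
// in on-chip shared memory and flush them to global memory once per block,
// instead of issuing one global-memory atomic per nonzero.

#include <cstdio>
#include <cuda_runtime.h>

#define HOT_FEATURES 1024  // assumed per-block cache of the hottest feature ids

__global__ void accumulate_gradients(const int *feat_idx,  // feature id per nonzero
                                     const float *grad,    // gradient per nonzero
                                     int nnz,              // number of nonzeros
                                     float *global_acc)    // dense gradient buffer
{
    __shared__ float hot_acc[HOT_FEATURES];

    // Cooperatively zero the shared-memory accumulator.
    for (int i = threadIdx.x; i < HOT_FEATURES; i += blockDim.x)
        hot_acc[i] = 0.0f;
    __syncthreads();

    // Phase 1: hot features (assumed remapped to ids < HOT_FEATURES) hit cheap
    // on-chip atomics; cold features fall through to global-memory atomics.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < nnz;
         i += gridDim.x * blockDim.x) {
        int f = feat_idx[i];
        if (f < HOT_FEATURES)
            atomicAdd(&hot_acc[f], grad[i]);
        else
            atomicAdd(&global_acc[f], grad[i]);
    }
    __syncthreads();

    // Phase 2: flush per-block partial sums to global memory once.
    for (int i = threadIdx.x; i < HOT_FEATURES; i += blockDim.x)
        if (hot_acc[i] != 0.0f)
            atomicAdd(&global_acc[i], hot_acc[i]);
}

int main() {
    const int nnz = 1 << 20, dim = 1 << 16;  // synthetic workload
    int *h_idx = new int[nnz];
    float *h_grad = new float[nnz];
    for (int i = 0; i < nnz; ++i) { h_idx[i] = i % dim; h_grad[i] = 1.0f; }

    int *d_idx; float *d_grad, *d_acc;
    cudaMalloc(&d_idx, nnz * sizeof(int));
    cudaMalloc(&d_grad, nnz * sizeof(float));
    cudaMalloc(&d_acc, dim * sizeof(float));
    cudaMemcpy(d_idx, h_idx, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_grad, h_grad, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_acc, 0, dim * sizeof(float));

    accumulate_gradients<<<64, 256>>>(d_idx, d_grad, nnz, d_acc);
    cudaDeviceSynchronize();

    float out;
    cudaMemcpy(&out, d_acc, sizeof(float), cudaMemcpyDeviceToHost);
    printf("accumulated gradient for feature 0: %f\n", out);  // expect 16.0
    return 0;
}
```

In this sketch, each hot feature costs at most one global-memory atomic per block rather than one per occurrence, which mirrors the abstract's locality argument: the more skewed the feature distribution, the more global-memory traffic the on-chip accumulation absorbs.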