HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs
2016
Their massively parallel computation capability has made GPGPUs a promising platform for convolutional neural networks (CNNs). In this paper, we present HppCnn, a CNN library that achieves both high performance and portability on GPGPUs. In HppCnn, we propose a novel three-step approach to implementing convolutional kernels efficiently with Nvidia cuBLAS. To overcome the limitations of this three-step approach, we improve cuBLAS by enabling nested parallelism, and we implement a low-cost auto-tuning module that leverages existing libraries at runtime. Experiments show that HppCnn achieves significant speedups over both other cuBLAS-based solutions and hand-optimized ones. The results also show that our solution delivers near-optimal performance on GPUs while retaining portability.
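The abstract does not spell out the three-step approach, but the standard way to map convolution onto a GEMM library such as cuBLAS is im2col lowering: unfold input patches into a matrix, then perform one large matrix multiply. A minimal NumPy sketch of that general technique follows (function names and shapes are our own illustration, not HppCnn's API; on a GPU the final matrix multiply would be a single `cublasSgemm` call):

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input; gather every kh x kw patch (stride 1, no padding)
    # into the columns of a (C*kh*kw, out_h*out_w) matrix.
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[c, i:i + out_h, j:j + out_w].ravel()
                row += 1
    return cols

def conv2d_gemm(x, w):
    # w: (K, C, kh, kw) filter bank; convolution lowered to one GEMM.
    K, C, kh, kw = w.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)          # (C*kh*kw, out_h*out_w)
    y = w.reshape(K, -1) @ cols       # the single large matrix multiply
    return y.reshape(K, out_h, out_w)
```

The payoff of this lowering is that a highly tuned GEMM kernel does all the arithmetic; its cost is the extra memory traffic of materializing the patch matrix, which is one motivation for the kind of tuning the paper describes.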
Keywords:
- Deep learning
- Parallel computing
- Distributed computing
- Parallel processing
- Convolutional neural network
- Software portability
- Massively parallel computation
- General-purpose computing on graphics processing units
- Artificial intelligence
- Nested parallelism
- Convolution
- Kernel (linear algebra)