HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs

2016 
The massively parallel computation capability of GPGPUs has made them a promising platform for convolutional neural networks (CNNs). In this paper, we present HppCnn, a CNN library that achieves both high performance and portability on GPGPUs. In HppCnn, we propose a novel three-step approach to implementing convolutional kernels efficiently with Nvidia cuBLAS. To overcome the limitations of this three-step approach, we improve cuBLAS by enabling nested parallelism, and we implement a low-cost auto-tuning module that leverages existing libraries at runtime. Experiments show that HppCnn achieves significant speedups over both other cuBLAS-based and hand-optimized solutions. The results also show that our solution delivers near-optimal performance on GPUs while preserving portability.
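The abstract does not spell out the three steps, but the standard way to express a convolution as a cuBLAS GEMM (the Caffe-style im2col lowering) follows the same unfold / multiply / fold pattern. The sketch below is an illustrative NumPy version of that lowering, not HppCnn's actual implementation; on a GPU, the matrix multiply in step 2 would be a single `cublasSgemm` call.

```python
import numpy as np

def im2col(x, kh, kw):
    """Step 1: unfold kh x kw patches of x (C, H, W) into columns."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[c, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv2d_gemm(x, w):
    """Lower a convolution to one big matrix multiply.

    x: (C, H, W) input feature maps; w: (K, C, kh, kw) filters.
    """
    K, C, kh, kw = w.shape
    cols = im2col(x, kh, kw)             # step 1: unfold input patches
    out = w.reshape(K, -1) @ cols        # step 2: single GEMM (cuBLAS on GPU)
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    return out.reshape(K, out_h, out_w)  # step 3: fold result back to maps
```

The trade-off this illustrates is the one the paper targets: the GEMM runs at near-peak throughput via a vendor BLAS, but the unfolded `cols` matrix duplicates input data `kh * kw` times, which motivates refinements such as HppCnn's nested-parallel cuBLAS calls.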