HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs

2016 
The massively parallel computation capability of GPGPUs has made them a promising platform for convolutional neural networks (CNNs). In this paper, we present HppCnn, a CNN library that achieves both high performance and portability on GPGPUs. In HppCnn, we propose a novel three-step approach to implementing convolutional kernels efficiently with Nvidia cuBLAS. To overcome the limitations of this three-step approach, we improve cuBLAS by enabling nested parallelism, and we implement a low-cost auto-tuning module that leverages existing libraries at runtime. Experiments show that HppCnn achieves significant speedups over both other cuBLAS-based and hand-optimized solutions. The results also show that our solution delivers near-optimal performance on GPUs while preserving portability.
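The abstract does not spell out the three steps, but the standard way to express a convolution as a cuBLAS GEMM (the Caffe-style im2col lowering) follows the same unfold / multiply / fold pattern. The sketch below is an illustrative NumPy version of that lowering, not HppCnn's actual implementation; on a GPU, the matrix multiply in step 2 would be a single `cublasSgemm` call.

```python
import numpy as np

def im2col(x, kh, kw):
    """Step 1: unfold kh x kw patches of x (C, H, W) into columns."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[c, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv2d_gemm(x, w):
    """Lower a convolution to one big matrix multiply.

    x: (C, H, W) input feature maps; w: (K, C, kh, kw) filters.
    """
    K, C, kh, kw = w.shape
    cols = im2col(x, kh, kw)             # step 1: unfold input patches
    out = w.reshape(K, -1) @ cols        # step 2: single GEMM (cuBLAS on GPU)
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    return out.reshape(K, out_h, out_w)  # step 3: fold result back to maps
```

The trade-off this illustrates is the one the paper targets: the GEMM runs at near-peak throughput via a vendor BLAS, but the unfolded `cols` matrix duplicates input data `kh * kw` times, which motivates refinements such as HppCnn's nested-parallel cuBLAS calls.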