FTConv: FPGA Acceleration for Transposed Convolution Layers in Deep Neural Networks

2019 
Transposed convolution, the structural inverse of convolution, is often used to upscale feature maps in computer vision tasks. Where present, convolution and transposed convolution layers account for the majority of the computation in deep neural network inference. While convolution has been studied extensively, there has been little work on accelerating transposed convolution. In this paper, we propose a fast algorithm, FTConv, that reduces the computation of transposed convolution using the Winograd algorithm, which is widely applied to convolutions with small kernels. Specifically, a transposed convolution can be converted into multiple convolutions by dividing the kernel into several congruence classes; these convolutions are then accelerated with a modified Winograd algorithm, and the transposed-convolution output is recovered by interleaving the output feature elements of the congruence classes. We also design a four-stage pipelined Winograd ALU to further accelerate the computation on FPGA. By carefully designing a sliding window for on-chip buffer reuse that matches the memory access pattern of transposed convolution, we reduce memory bandwidth by 88.2% compared with a straightforward method. We evaluate FTConv on FSRCNN-s, a neural network for super-resolution: the number of multiplications in its transposed convolution layer is reduced by 69% relative to direct computation.
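To make the congruence-class decomposition concrete, here is a minimal NumPy sketch of the 1-D case (the paper targets 2-D kernels on FPGA; the function names and sizes below are illustrative, not from the paper). For stride s, output position i draws only on kernel taps k with k ≡ i (mod s), so the kernel splits into s sub-kernels, each driving an ordinary small convolution whose outputs interleave into the upscaled feature map.

```python
import numpy as np

def transposed_conv1d_direct(x, w, stride):
    """Direct transposed convolution: scatter-add each input element
    times the full kernel (no padding, no output cropping)."""
    out_len = (len(x) - 1) * stride + len(w)
    y = np.zeros(out_len)
    for n, xn in enumerate(x):
        y[n * stride : n * stride + len(w)] += xn * w
    return y

def transposed_conv1d_decomposed(x, w, stride):
    """Congruence-class decomposition: output index i = j*stride + r uses
    only kernel taps k = m*stride + r, so y[j*stride + r] equals
    sum_m x[j - m] * w[m*stride + r], i.e. an ordinary convolution of x
    with the r-th sub-kernel; the sub-outputs interleave into y."""
    out_len = (len(x) - 1) * stride + len(w)
    y = np.zeros(out_len)
    for r in range(stride):
        w_r = w[r::stride]                      # taps with index ≡ r (mod stride)
        y_r = np.convolve(x, w_r, mode="full")  # ordinary small convolution
        y[r::stride] = y_r                      # interleave: output phase r
    return y

if __name__ == "__main__":
    # Sanity check with arbitrary sizes: both paths must agree exactly.
    rng = np.random.default_rng(0)
    x, w = rng.standard_normal(8), rng.standard_normal(9)
    assert np.allclose(transposed_conv1d_direct(x, w, 2),
                       transposed_conv1d_decomposed(x, w, 2))
```

Each sub-kernel has roughly 1/s of the original taps per dimension, so the resulting convolutions are exactly the small-kernel case where Winograd transforms pay off; for instance, the classic F(2,3) form computes two outputs of a 3-tap filter with 4 multiplications instead of 6.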