A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen

2021

Matrix multiplication is the bedrock of deep learning inference. In hardware acceleration on edge computing devices, it often accounts for the great majority of execution time. To improve performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a non-uniform quantization methodology. The design is implemented on Field-programmable Gate Array (FPGA) devices, and its performance is tested on handwritten digit classification and Q-learning tasks. Results show that our method achieves better performance with lower power consumption.
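
The abstract does not say which non-uniform quantization scheme is used. As a rough illustration only, the sketch below assumes power-of-two quantization, one common non-uniform scheme in which each weight is snapped to a signed power of two so that a hardware multiply can become a bit shift. The function names (`quantize_pow2`, `quantize_matrix`, `matvec`) and the exponent range are hypothetical and are not taken from the paper.

```python
import math

def quantize_pow2(w, min_exp=-6, max_exp=0):
    """Snap a weight to a signed power of two (assumed scheme, not the paper's).

    The exponent is rounded in the log2 domain. Weights whose rounded
    exponent falls below min_exp become 0; larger ones are clamped to max_exp.
    """
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    if exp < min_exp:
        return 0.0
    exp = min(exp, max_exp)
    return math.copysign(2.0 ** exp, w)

def quantize_matrix(W, **kwargs):
    """Quantize every entry of a weight matrix given as a list of rows."""
    return [[quantize_pow2(w, **kwargs) for w in row] for row in W]

def matvec(W, x):
    """Plain matrix-vector product: one MLP layer without bias or activation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical 2x2 layer weights and input vector.
W = [[0.9, -0.3], [0.05, 0.0]]
Wq = quantize_matrix(W)       # [[1.0, -0.25], [0.0625, 0.0]]
y = matvec(Wq, [1.0, 2.0])    # [0.5, 0.0625]
```

The quantization levels are denser near zero than near one, which is what makes the scheme non-uniform. A real accelerator would store only the sign and exponent of each weight rather than a floating-point value.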

Keywords:

Deep learning
Computer science
Edge computing
Matrix multiplication
Quantization (signal processing)
Computational science
Field-programmable gate array
Artificial intelligence
Perceptron
Hardware acceleration
Gate array
