Pruning and Quantization for Deep Neural Network Acceleration: A Survey

2021 
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and incur significant computation and energy costs. These challenges can be overcome through optimizations such as network compression. This paper provides a survey of two types of network compression: pruning and quantization. We compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
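As an illustration of the two compression families the survey covers, the sketch below shows unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric uniform quantization (rounding weights to a low-bit integer grid). This is a minimal NumPy example, not code from the surveyed methods; the weight matrix, function names, and parameters are hypothetical.

```python
import numpy as np

# Toy weight matrix standing in for a trained layer (hypothetical values).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=(4, 4)).astype(np.float32)

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(np.floor(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

def uniform_quantize(w, num_bits=8):
    """Symmetric uniform quantization to num_bits integers, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized approximation of w

pruned = magnitude_prune(weights, sparsity=0.5)
quantized = uniform_quantize(pruned, num_bits=8)

print("sparsity:", np.mean(pruned == 0))                   # fraction of zeroed weights
print("max abs error:", np.max(np.abs(pruned - quantized)))  # bounded by scale / 2
```

In practice the two techniques compose: pruning yields a sparse tensor whose surviving weights are then quantized, and both are typically followed by fine-tuning to recover accuracy.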