Pruning and Quantization for Deep Neural Network Acceleration: A Survey.
2021
Deep neural networks have been applied across many applications and exhibit extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and incur significant computation and energy costs. These challenges can be overcome through optimizations such as network compression. This paper provides a survey of two types of network compression: pruning and quantization. We compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
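To make the two compression families concrete, here is a minimal sketch of unstructured magnitude pruning and symmetric uniform quantization applied to a weight tensor. The function names and the NumPy-based formulation are illustrative assumptions, not methods specified by the survey itself.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction (`sparsity`) of the weights."""
    w = np.asarray(weights, dtype=np.float64)
    k = int(sparsity * w.size)  # number of weights to remove
    if k == 0:
        return w.copy()
    # Threshold is the k-th smallest absolute value.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def uniform_quantize(weights, num_bits=8):
    """Symmetric uniform quantization: map weights to signed num_bits
    integers, then dequantize back to floats (simulated quantization)."""
    w = np.asarray(weights, dtype=np.float64)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer grid
    return q * scale  # dequantized approximation
```

For example, pruning a six-element vector at 50% sparsity zeros its three smallest-magnitude entries, and 8-bit quantization bounds the per-weight error by half the quantization step (`scale / 2`).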