Pruning and Quantization for Deep Neural Network Acceleration: A Survey

2021 
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and incur significant computation and energy costs. These challenges can be overcome through optimizations such as network compression. This paper provides a survey of two types of network compression: pruning and quantization. We compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
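As an illustration of the two compression families the survey covers, the sketch below shows unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric uniform quantization (rounding weights to a low-bit integer grid). This is a minimal NumPy example, not code from the surveyed methods; the weight matrix, function names, and parameters are hypothetical.

```python
import numpy as np

# Toy weight matrix standing in for a trained layer (hypothetical values).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=(4, 4)).astype(np.float32)

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(np.floor(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

def uniform_quantize(w, num_bits=8):
    """Symmetric uniform quantization to num_bits integers, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized approximation of w

pruned = magnitude_prune(weights, sparsity=0.5)
quantized = uniform_quantize(pruned, num_bits=8)

print("sparsity:", np.mean(pruned == 0))                   # fraction of zeroed weights
print("max abs error:", np.max(np.abs(pruned - quantized)))  # bounded by scale / 2
```

In practice the two techniques compose: pruning yields a sparse tensor whose surviving weights are then quantized, and both are typically followed by fine-tuning to recover accuracy.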