GPQ: Greedy Partial Quantization of Convolutional Neural Networks Inspired by Submodular Optimization

2020 
Recent work has revealed that the effect of neural network quantization on inference accuracy differs from layer to layer. Therefore, partial quantization and mixed-precision quantization have been studied for neural network accelerators with multi-precision designs. However, these quantization methods generally require network training that entails a high computational cost, or they exhibit a significant loss of inference accuracy. In this paper, we propose a greedy search algorithm for partial quantization that can derive optimal combinations of quantized layers; notably, the proposed method has a low computational complexity of O(N²), where N denotes the number of layers. The proposed Greedy Partial Quantization (GPQ) achieved 4.2× model size compression with only a 0.03% accuracy loss on ResNet50, and 2.5× compression with a 0.015% accuracy gain on Xception. The computational cost of GPQ is only 2.5 GPU-hours for 8-bit quantization of EfficientNet-B0 on ImageNet classification.
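The abstract only names the greedy O(N²) layer-selection search; the sketch below illustrates how such a greedy partial-quantization loop could be organized. It is not the authors' published implementation: the names `layers`, `evaluate_accuracy` (a callback returning validation accuracy for a given set of quantized layers), and `accuracy_budget` (the maximum tolerated accuracy drop) are hypothetical placeholders.

```python
def greedy_partial_quantization(layers, evaluate_accuracy, accuracy_budget):
    """Greedily grow the set of quantized layers, at each step picking the
    layer whose quantization hurts validation accuracy the least.
    With N layers this needs at most N + (N-1) + ... + 1 = O(N^2) evaluations.
    Hypothetical sketch, not the authors' code."""
    quantized = set()
    remaining = set(layers)
    baseline = evaluate_accuracy(frozenset())  # full-precision reference accuracy

    while remaining:
        # Try quantizing each remaining layer on top of the current selection.
        best_layer, best_acc = None, float("-inf")
        for layer in remaining:
            acc = evaluate_accuracy(frozenset(quantized | {layer}))
            if acc > best_acc:
                best_layer, best_acc = layer, acc

        # Stop once even the best candidate exceeds the allowed accuracy drop.
        if baseline - best_acc > accuracy_budget:
            break

        quantized.add(best_layer)
        remaining.remove(best_layer)

    return quantized
```

The outer loop runs at most N times and each iteration evaluates at most N candidate layers, which gives the quadratic evaluation count mentioned in the abstract; no retraining step is assumed between evaluations.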