CDP: Towards Optimal Filter Pruning via Class-wise Discriminative Power

Tianshuo Xu, Yuhang Wu, Xiawu Zheng, Teng Xi, Gang Zhang, Errui Ding, Fei Chao, Rongrong Ji


2021

Neural network pruning has shown promising performance in reducing computational complexity and facilitating the deployment of deep neural networks on resource-limited edge devices. Most existing pruning methods focus on indicators derived from a filter's weights, gradients, or feature maps, and regard weak or similar filters as network redundancy. In contrast, discriminative power is also a fundamental attribute that allows neural networks to achieve extraordinary performance in various tasks, yet it is neglected in existing works. To address this, we propose a novel filter pruning strategy via class-wise discriminative power (CDP). Unlike previous methods, CDP treats filters that always yield large or always yield small activation values as redundant, and retains filters whose activation magnitudes differ across classes, since these carry high discriminative power. We further propose to measure this discriminative power by applying the widely used Term Frequency-Inverse Document Frequency (TF-IDF) to feature representations across classes. Specifically, the output of a filter is treated as a word, and the whole feature map is treated as a document. TF-IDF is then used to generate a relevance score between words and all documents. A filter with low TF-IDF scores is less discriminative and can be pruned, while filters with high TF-IDF scores are retained. To the best of our knowledge, this is the first work that prunes neural networks through class-wise discriminative power and measures that power by introducing TF-IDF into feature representations across different classes. Without any iterative process, CDP achieves better compression trade-offs compared to state-of-the-art compression algorithms. For instance, on VGG-16 we achieve a 68.05% FLOPs reduction with 94.86% Top-1 accuracy on CIFAR-10. Even at a 90.12% FLOPs reduction, the compressed VGG-16 retains 93.30% Top-1 accuracy on CIFAR-10.
The code is available at https://github.com/Tianshuo-Xu/CDP-Towards-Optimal-Filter-Pruning-via-Class-wise-Discriminative-Power.git
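
To make the word/document analogy concrete, the following is a minimal NumPy sketch of a TF-IDF style filter score. It is not the authors' implementation, and the abstract does not specify the exact formulation, so several choices here are assumptions: each filter's output is reduced to one non-negative value per sample, a filter counts as "firing" when that value exceeds the layer-wide mean, each class plays the role of a document, and per-class scores are averaged into one score per filter. The names `tfidf_filter_scores` and `select_filters` are hypothetical.

```python
import numpy as np


def tfidf_filter_scores(acts, labels, num_classes):
    """Score filters with a TF-IDF style statistic (illustrative assumption).

    acts:   (N, F) array, one non-negative activation per sample and filter,
            e.g. a spatially averaged feature map.
    labels: (N,) array of integer class ids in [0, num_classes).
    Returns an (F,) array; higher means more class-discriminative.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels)
    n_samples, n_filters = acts.shape

    # Assumption: a filter "fires" when it exceeds the layer-wide mean.
    fired = acts > acts.mean()

    # Term frequency: fraction of a class's samples on which the filter fires.
    tf = np.zeros((num_classes, n_filters))
    for c in range(num_classes):
        in_class = fired[labels == c]
        if in_class.shape[0] > 0:
            tf[c] = in_class.mean(axis=0)

    # Inverse document frequency: filters that fire on every sample get 0.
    df = fired.sum(axis=0)
    idf = np.log((1.0 + n_samples) / (1.0 + df))

    # Assumption: average the per-class TF-IDF values into one score.
    return (tf * idf).mean(axis=0)


def select_filters(scores, keep_ratio):
    """Return sorted indices of the highest-scoring filters to keep."""
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    top = np.argsort(scores)[::-1][:n_keep]
    return np.sort(top)


if __name__ == "__main__":
    # Filter 0 is always large, filter 1 always small, filter 2 is class-selective.
    acts = np.array([[1.0, 0.0, 1.0],
                     [1.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0]])
    labels = np.array([0, 0, 1, 1])
    scores = tfidf_filter_scores(acts, labels, num_classes=2)
    print(scores, select_filters(scores, keep_ratio=1 / 3))
```

Under these assumptions, a filter that fires on every sample has an inverse document frequency of zero, and a filter that never fires has a term frequency of zero, so both receive the lowest score, matching the abstract's description of always-large and always-small filters as redundant.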
