AntiDoteX: Attention-Based Dynamic Optimization for Neural Network Runtime Efficiency

Fuxun Yu, Zirui Xu, Chenchen Liu, Dimitrios Stamoulis, Di Wang, Yanzhi Wang, Xiang Chen

2022

Deep neural networks (DNNs) have achieved great cognitive performance at the expense of a considerable computation workload. To relieve this computational burden, many optimization methods have been developed to reduce model redundancy by identifying and removing insignificant model components, for example through weight sparsification and filter pruning. However, these methods evaluate only the static significance of model components from parameter information, ignoring their dynamic interaction with external inputs. Specifically, because features differ from input to input, the significance of model components changes dynamically, so static methods can achieve only suboptimal performance. Focusing on this aspect, we propose a comprehensive dynamic DNN optimization framework based on the neural network attention mechanism. It includes 1) testing-phase dynamic feature map pruning; 2) training-phase optimization by training with targeted dropout; and 3) deployment-phase one-for-all (OFA) model adaptability enhancement. By providing a holistic co-optimization framework across testing, training, and deployment, our work has the following benefits. First, it can accurately identify and aggressively remove per-input feature redundancy by considering the model-input interaction and exploiting channel-wise and column-wise pruning flexibility. Second, the training-testing co-optimization favors dynamic pruning and helps maintain model accuracy even at a very high feature pruning ratio. Finally, the deployment enhancement provides one unified OFA model that supports full-spectrum feature sparsity ratios. The unified model can be dynamically reconfigured to meet different resource budgets without any retraining cost, and thus provides significant deployment flexibility. Extensive experiments show that our method can reduce floating-point operations by 37.4%–54.5% with a negligible accuracy drop on various test benchmarks. Meanwhile, the OFA deployment optimization enables one model to support up to ten different resource constraints without any retraining cost.
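
To make the idea of testing-phase dynamic feature map pruning concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The abstract does not specify the attention criterion, so the sketch assumes each channel is scored by its mean absolute activation, and the names `channel_attention`, `prune_channels`, and `prune_ratio` are hypothetical. Only channel-wise pruning is shown; column-wise pruning is omitted.

```python
from typing import List

FeatureMap = List[List[float]]  # one 2-D channel: rows of activations


def channel_attention(feature_maps: List[FeatureMap]) -> List[float]:
    """Score each channel by its mean absolute activation.

    Assumption: this stands in for the attention criterion, which the
    abstract does not define.
    """
    scores = []
    for channel in feature_maps:
        values = [abs(v) for row in channel for v in row]
        scores.append(sum(values) / len(values))
    return scores


def prune_channels(feature_maps: List[FeatureMap],
                   prune_ratio: float) -> List[FeatureMap]:
    """Zero out the lowest-scoring channels for this particular input."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    scores = channel_attention(feature_maps)
    num_pruned = int(len(feature_maps) * prune_ratio)
    # Channel indices ordered from least to most attended.
    order = sorted(range(len(feature_maps)), key=lambda i: scores[i])
    pruned = set(order[:num_pruned])
    return [
        [[0.0 for _ in row] for row in channel] if i in pruned else channel
        for i, channel in enumerate(feature_maps)
    ]


# Four 2x2 channels computed for one hypothetical input.
example_maps = [
    [[0.1, 0.0], [0.0, 0.1]],
    [[2.0, 1.0], [1.5, 0.5]],
    [[0.0, 0.0], [0.4, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
example_pruned = prune_channels(example_maps, prune_ratio=0.5)
```

Because the scores are recomputed for every input, a different set of channels may be removed each time, which is the model-input interaction that static pruning ignores. In this sketch, calling `prune_channels` with different `prune_ratio` values on the same model loosely mirrors the idea of reconfiguring one model for different resource budgets. Zeroing a channel only marks it as prunable; actual savings in floating-point operations would require skipping the downstream computation on the zeroed channels.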