Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao

2021

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, a DNN model with a large number of parameters and a high computational cost is difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method with three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt Kullback–Leibler (KL) divergence in an offline environment so that the student model finds a wider, more robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach a student model. To address the dimension mismatch between teacher and student models, we adopt a computation rate.
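The abstract does not give the loss formula, so the following is only a minimal sketch of how a KL-divergence distillation loss against an offline ensemble of teachers might look. The uniform averaging of teacher outputs, the softmax temperature, and all function names are assumptions made for illustration and are not taken from the paper. The cross-layer matrix and the computation rate are not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def ensemble_soft_targets(teacher_logits_list, temperature=1.0):
    """Assumption: the ensemble target is the uniform mean of the teachers'
    softened class probabilities (the paper may combine them differently)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    count = len(probs)
    return [sum(column) / count for column in zip(*probs)]


def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL divergence from the ensemble soft target to the student output.
    The teacher logits are precomputed (offline), so only the student
    would be updated during training."""
    target = ensemble_soft_targets(teacher_logits_list, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(target, student)


# Hypothetical logits for one sample with three classes.
teachers = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3]]
student = [1.0, 1.0, 1.0]
loss = distillation_loss(student, teachers)
```

In this sketch the loss is zero when the student reproduces the ensemble distribution exactly and grows as the two distributions diverge. A higher temperature softens both distributions, which is a common choice in distillation but is again an assumption here.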

Keywords:

Artificial neural network
Algorithm
Cross layer
Distillation
Kullback–Leibler divergence
Computer science
