Progressive Mimic Learning: A New Perspective to Train Lightweight CNN Models
Abstract Knowledge distillation (KD) builds a lightweight Student Model (SM) and trains it to approximate a large Teacher Model (TM) by exploiting the knowledge learned by the TM, an approach that has proven effective for training lightweight CNN models. However, training a small SM to achieve better performance remains a challenging problem. Recent research on human learning behavior shows that both the knowledge held by teachers and the process by which teachers acquired that knowledge are significant for students. Inspired by this observation, we propose a new perspective, called Progressive Mimic Learning (PML), which trains lightweight CNN models by mimicking the learning trajectory of the TM. To obtain a more powerful SM, PML explores the useful hints contained in the learning process of the TM. First, the TM learning process is divided into multiple stages, and the last state of the TM in each stage is recorded as a landmark. Together, these landmarks compose the learning trajectory of the TM. Then, a landmark loss is defined that constrains the SM to progressively mimic the learning process of the TM, using the landmarks in the learning trajectory as training hints for the SM. Experiments are conducted on four benchmark data sets, CIFAR-10, CIFAR-100, Fashion-MNIST, and ImageNet-10, to evaluate the performance of PML. The results show that SMs trained with PML generate more accurate predictions than SMs trained with competing methods.
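The two steps outlined in the abstract, recording one landmark per stage and penalizing the SM for deviating from the current landmark, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `record_landmarks` and `landmark_loss` are hypothetical, the TM "state" is represented here simply as a list of output values, stages are assumed to be (nearly) equal in length, and the mean squared error form of the loss is an assumption, since the abstract does not give the exact definition.

```python
from typing import List, Sequence


def record_landmarks(teacher_states: Sequence, num_stages: int) -> List:
    """Split the TM's per-epoch states into stages and keep the last state of each.

    `teacher_states` holds one entry per training epoch of the TM (assumed).
    Stages are assumed to be contiguous and as equal in length as possible.
    """
    n = len(teacher_states)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of states")
    landmarks = []
    for stage in range(1, num_stages + 1):
        end = (stage * n) // num_stages  # exclusive end index of this stage
        landmarks.append(teacher_states[end - 1])  # last state in the stage
    return landmarks


def landmark_loss(student_out: Sequence[float], landmark_out: Sequence[float]) -> float:
    """Mean squared error between SM outputs and the landmark's outputs.

    The MSE form is an illustrative assumption, not the paper's definition.
    """
    if len(student_out) != len(landmark_out):
        raise ValueError("outputs must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_out, landmark_out)]
    return sum(squared) / len(squared)


# Toy TM trajectory: one (fake) output vector per epoch, 6 epochs in total.
trajectory = [[0.1 * e, 1.0 - 0.1 * e] for e in range(6)]
landmarks = record_landmarks(trajectory, num_stages=3)  # epochs 1, 3, 5

# While the SM trains in stage k, it is pulled toward landmark k.
student_output = [0.0, 1.0]
loss_stage_0 = landmark_loss(student_output, landmarks[0])
```

In an actual training loop, this term would presumably be combined with the usual task loss and the active landmark would advance from one stage to the next, so that the SM follows the TM's trajectory rather than only its final state.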