Contrastive Adversarial Knowledge Distillation for Deep Model Compression in Time-Series Regression Tasks

2021 
Abstract Knowledge distillation (KD) attempts to compress a deep teacher model into a shallow student model by letting the student mimic the teacher's outputs. However, conventional KD approaches have the following shortcomings. First, existing KD approaches align the global distribution between the teacher and student models and overlook fine-grained features. Second, most existing approaches focus on classification tasks and require the architectures of the teacher and student models to be similar. To address these limitations, we propose a contrastive adversarial knowledge distillation method, called CAKD, for time-series regression tasks where the student and teacher use different architectures. Specifically, we first propose adversarial adaptation to automatically align the feature distributions of the student and teacher networks. However, adversarial adaptation can only align the global feature distribution without considering fine-grained features. To mitigate this issue, we employ a novel contrastive loss for instance-wise alignment between the student and teacher; in particular, we maximize the similarity between teacher and student features that originate from the same sample. Lastly, a KD loss is used for knowledge distillation between the two different architectures. We used a turbofan engine dataset consisting of four sub-datasets to evaluate model performance. The results show that the proposed CAKD method consistently outperforms state-of-the-art methods in terms of two different metrics.
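As a rough illustration of how the three loss terms described in the abstract (adversarial adaptation, instance-wise contrastive alignment, and the KD loss) might fit together, the following is a minimal PyTorch-style sketch. The function names, the InfoNCE-style formulation of the contrastive term, the MSE-based KD term, the discriminator architecture, and the weighting scheme are all illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the three CAKD loss terms described in the abstract.
# All names and formulations here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def contrastive_alignment_loss(feat_s, feat_t, temperature=0.1):
    """InfoNCE-style instance-wise alignment: student/teacher features from the
    same sample are pulled together; features from different samples are pushed apart."""
    feat_s = F.normalize(feat_s, dim=1)           # (B, D) student features
    feat_t = F.normalize(feat_t, dim=1)           # (B, D) teacher features
    logits = feat_s @ feat_t.t() / temperature    # (B, B) pairwise similarities
    labels = torch.arange(feat_s.size(0), device=feat_s.device)
    return F.cross_entropy(logits, labels)        # positives lie on the diagonal


class Discriminator(nn.Module):
    """Small MLP that tries to tell teacher features from student features;
    the student is trained adversarially to fool it (global alignment)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x):
        return self.net(x)


def cakd_student_loss(pred_s, pred_t, feat_s, feat_t, disc, lambdas=(1.0, 1.0, 1.0)):
    """Combine the three distillation terms for the student update in a
    regression setting (weights `lambdas` are placeholders)."""
    l_kd = F.mse_loss(pred_s, pred_t.detach())               # mimic teacher outputs
    l_con = contrastive_alignment_loss(feat_s, feat_t.detach())
    # Adversarial term: student features should be scored as "teacher" (label 1).
    l_adv = F.binary_cross_entropy_with_logits(
        disc(feat_s), torch.ones(feat_s.size(0), 1, device=feat_s.device))
    a, b, c = lambdas
    return a * l_kd + b * l_con + c * l_adv
```

In a full training loop, the discriminator would be updated in alternation to distinguish teacher from student features, as in standard adversarial adaptation, and a task (regression) loss on the ground-truth targets would typically be added to the student objective.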