CNNPC: End-Edge-Cloud Collaborative CNN Inference With Joint Model Partition and Compression

2022 
Edge Intelligence (EI) aims at addressing concerns such as the high response latency arising from the conflict between the predominant Cloud-based deployment of computation-intensive AI applications and the costly uploading of explosively growing end-device data. Convolutional Neural Networks (CNNs), which lead the latest flourishing of AI, inevitably suffer from this conflict. Increasing EI-driven attempts target fast CNN inference with high accuracy in the End-Edge-Cloud (EEC) collaborative computing paradigm; however, neither model compression approaches for on-device inference nor collaborative inference methods across devices can effectively achieve the trade-off between latency and accuracy of End-to-End (E2E) inference. In this article, we present CNNPC, which jointly partitions and compresses CNNs for fast inference with high accuracy in collaborative EEC systems. We implemented CNNPC (source code available at https://github.com/IoTDATALab/CNNPC ) and evaluated its performance in extensive real-world EEC scenarios. Experimental results demonstrate that, compared with state-of-the-art single-end and collaborative approaches, and without obvious accuracy loss, collaborative inference based on CNNPC is up to $1.6\times$ and $5.6\times$ faster, and requires as little as $4.30\%$ and $6.48\%$ of the communication, respectively. Besides, when determining the optimal strategy, CNNPC requires as few as $0.1\%$ of the actual compression operations that the traversal method (the only viable method providing the theoretically optimal strategy) requires.
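The joint strategy search the abstract describes, picking a partition point and a compression level together to minimize E2E latency, can be illustrated with a toy cost model. This is a minimal sketch under assumed per-layer timings and feature sizes, not CNNPC's actual algorithm or cost model; all names and numbers below are illustrative:

```python
# Toy sketch of joint partition/compression strategy selection
# (illustrative assumptions; CNNPC's real method and cost model differ).

def best_strategy(layer_ms_device, layer_ms_cloud, feat_kb, bw_kbps, ratios):
    """Pick (partition index, compression ratio) minimizing a simple
    end-to-end latency estimate: device compute up to the cut, transfer
    of the compressed intermediate feature, then cloud compute."""
    n = len(layer_ms_device)
    best = None
    for cut in range(n + 1):            # cut = number of layers run on device
        dev = sum(layer_ms_device[:cut])
        cloud = sum(layer_ms_cloud[cut:])
        size = feat_kb[cut]             # feature size at the cut point (KB)
        for r in ratios:                # candidate compression ratios
            tx = (size / r) / bw_kbps * 1000.0  # transfer time in ms
            total = dev + tx + cloud
            if best is None or total < best[0]:
                best = (total, cut, r)
    return best

# feat_kb has n+1 entries: the input size, then each layer's output size.
latency, cut, ratio = best_strategy(
    layer_ms_device=[5.0, 8.0, 12.0],
    layer_ms_cloud=[0.5, 0.8, 1.2],
    feat_kb=[600.0, 300.0, 80.0, 4.0],
    bw_kbps=1000.0,
    ratios=[1, 4, 16],
)
# With these numbers the search cuts after layer 2 at ratio 16.
```

In practice, as the abstract notes, exhaustively compressing and measuring every candidate (the traversal method) is what CNNPC avoids; a cheap analytical model like the one above can prune the candidate space before any actual compression is performed.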