Enabling Deep Neural Networks with Oversized Working Memory on Resource-Constrained MCUs

2021 
Deep neural networks (DNNs) are highly effective at extracting features and making predictions from noisy input data, which has made them the most widely used class of algorithms in machine learning applications. Meanwhile, microcontroller units (MCUs) have become the most common processors in everyday devices, so deploying DNNs on MCUs could have a substantial real-world impact. Despite its importance, this deployment problem has received little attention. DNNs are resource-intensive while MCUs are resource-constrained, which often makes it infeasible to run DNNs on MCUs directly. Besides low clock frequencies (1-16 MHz) and limited storage (e.g., 64 KB to 256 KB of ROM), one of the biggest challenges is the small RAM (e.g., 2 KB to 16 KB), which must hold the intermediate feature maps of a DNN at runtime. Most existing DNN compression algorithms aim to reduce the model size so that the model fits into the limited storage; however, they do not significantly reduce the size of the intermediate feature maps, referred to as the working memory, which may still exceed the RAM capacity. Consequently, a DNN may be unable to run on an MCU even after compression. To address this problem, this work proposes a technique that dynamically prunes the activation values of the output feature maps at runtime when necessary, so that the intermediate feature maps fit into the limited RAM. Experimental results on SVHN and CIFAR-10 show that the proposed algorithm significantly reduces the working memory of a DNN to satisfy the hard RAM-size constraint while maintaining satisfactory accuracy, with relatively low overhead in memory and runtime latency.
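
The following is a minimal sketch of the general idea of runtime activation pruning under a RAM budget, not the authors' implementation: an output feature map is reduced to its largest-magnitude activations and stored in sparse (index, value) form so that it fits a fixed budget. All function names, the top-k-by-magnitude policy, and the data layout are illustrative assumptions.

```c
/* Illustrative sketch only; names, policy, and layout are assumptions. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

typedef struct {
    uint16_t index;   /* flat position inside the feature map */
    float    value;   /* surviving activation value            */
} sparse_entry_t;

/* Sort helper: descending order of absolute value. */
static int cmp_desc_mag(const void *a, const void *b)
{
    float ma = fabsf(*(const float *)a);
    float mb = fabsf(*(const float *)b);
    return (ma < mb) - (ma > mb);
}

/*
 * Prune `dense` (n activations) down to at most `budget` entries and write
 * them to `out`. Returns the number of entries kept. `scratch` must hold
 * n floats; on a real MCU it could reuse an already-allocated layer buffer,
 * and a streaming selection would avoid the full sort used here.
 */
size_t prune_feature_map(const float *dense, size_t n,
                         sparse_entry_t *out, size_t budget,
                         float *scratch)
{
    if (n == 0 || budget == 0) return 0;
    if (budget > n) budget = n;

    /* Magnitude of the budget-th largest activation becomes the cutoff. */
    memcpy(scratch, dense, n * sizeof(float));
    qsort(scratch, n, sizeof(float), cmp_desc_mag);
    float threshold = fabsf(scratch[budget - 1]);

    /* Keep activations at or above the cutoff, never exceeding the budget. */
    size_t kept = 0;
    for (size_t i = 0; i < n && kept < budget; ++i) {
        if (fabsf(dense[i]) >= threshold) {
            out[kept].index = (uint16_t)i;
            out[kept].value = dense[i];
            ++kept;
        }
    }
    return kept;
}
```

In this sketch the budget would be chosen per layer so that the sparse buffer of one layer plus the dense buffer of the next stays within the available RAM; the trade-off between the budget and accuracy corresponds to the accuracy-versus-working-memory trade-off the abstract describes.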