A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors

Wei Hao Chen, Kai-Xiang Li, Wei Yu Lin, K. C. Hsu, Pin-Yi Li, Cheng-Han Yang, Cheng-Xin Xue, En Yu Yang, Yen-Kai Chen, Yun-Sheng Chang, Tzu-Hsiang Hsu, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, Meng-Fan Chang

2018

Many artificial intelligence (AI) edge devices use nonvolatile memory (NVM) to store the weights of a neural network (trained off-line on an AI server), and require low-energy, fast I/O accesses. The deep neural networks (DNNs) used by AI processors [1,2] commonly require p layers of a convolutional neural network (CNN) and q layers of a fully-connected network (FCN). Current DNN processors that use a conventional (von Neumann) memory structure are limited by high access latencies, I/O energy consumption, and hardware costs. Large working data sets result in heavy traffic across the memory hierarchy; moreover, the large number of multiply-and-accumulate (MAC) operations in both the CNN and the FCN generates large amounts of intermediate data. Even when binary-based DNNs [3] are used, the required CNN and FCN operations result in a major memory I/O bottleneck for AI edge devices.

