Compute-in-Memory Technologies and Architectures for Deep Learning Workloads

2022 
The application of deep learning (DL) to real-world tasks such as computer vision, speech recognition, and robotics has become ubiquitous. This can be largely attributed to a virtuous cycle between algorithms, data, and computing and storage capacity, which has driven rapid advances along all of these dimensions. The ever-increasing demand for computation and memory from DL workloads presents challenges across the entire spectrum of computing platforms, from edge devices to the cloud. Hence, there is a need to explore new hardware paradigms that go well beyond the current mainstays such as graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs). A key bottleneck of current platforms is the so-called memory wall, which arises from the need to move large amounts of data between memory and compute units, expending considerable time and energy. One promising solution to this challenge is to move some computations closer to memory, within the memory subsystem, or even within individual memory arrays. This approach, broadly referred to as compute-in-memory (CiM), has the potential to break the memory wall and thereby greatly improve speed and energy efficiency. In this article, we provide an overview of CiM techniques used at different levels of the memory hierarchy and based on different memory technologies, including static random access memories (SRAMs), nonvolatile memories (NVMs), and dynamic random access memories (DRAMs). We also discuss architectural approaches to designing CiM-based DL accelerators. Finally, we discuss the challenges associated with adopting CiM in future DL accelerators.
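To make the CiM idea concrete, the sketch below models, in idealized NumPy form, how an NVM crossbar array can perform a matrix-vector multiplication inside the memory itself: weights are stored as differential device conductances, input activations are applied as word-line voltages, and bit-line currents accumulate the products via Ohm's and Kirchhoff's laws before being quantized by an ADC. This is a minimal illustrative model, not a circuit from the article; the function name, conductance range, and ADC resolution are assumptions chosen for clarity.

```python
import numpy as np

def crossbar_mvm(weights, x, g_min=1e-6, g_max=1e-4, adc_bits=8):
    """Idealized analog CiM matrix-vector multiply on a resistive crossbar.

    Each signed weight is encoded as a pair of device conductances (G+ and G-),
    inputs drive the word lines as voltages, and each bit line sums the
    per-cell currents, performing the multiply-accumulate in the array.
    (Illustrative sketch only; ignores device noise, IR drop, and drift.)
    """
    w_max = np.max(np.abs(weights)) + 1e-12
    # Map positive and negative weight parts onto a differential conductance pair.
    g_pos = g_min + (g_max - g_min) * np.clip(weights, 0, None) / w_max
    g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0, None) / w_max

    v = x                              # word-line voltages (inputs assumed pre-scaled)
    i_out = g_pos @ v - g_neg @ v      # bit-line current summation (analog MAC)

    # ADC stage: quantize the analog column currents back to digital values.
    i_max = np.max(np.abs(i_out)) + 1e-12
    levels = 2 ** (adc_bits - 1) - 1
    i_q = np.round(i_out / i_max * levels) / levels * i_max

    # Rescale currents back into the original weight domain.
    return i_q * w_max / (g_max - g_min)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))
    x = rng.standard_normal(8)
    print("digital MVM :", W @ x)
    print("CiM estimate:", crossbar_mvm(W, x))
```

Because the multiply-accumulate happens where the weights are stored, only the input vector and the quantized outputs cross the memory boundary, which is the data-movement saving that motivates CiM.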