pLUTo: Enabling Massively Parallel Computation In DRAM via Lookup Tables.

2021 
Data movement between main memory and the processor is a key contributor to the execution time and energy consumption of memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high throughput and efficiency, but supports a limited range of operations. As a result, PuM architectures cannot efficiently perform some complex operations (e.g., multiplication, division, exponentiation) without sizeable increases in chip area and design complexity. To overcome this limitation in DRAM-based PuM architectures, we introduce pLUTo (processing-using-memory with lookup table [LUT] operations), a DRAM-based PuM architecture that leverages the high area density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs). The use of LUTs enables pLUTo to efficiently execute complex operations in memory via memory reads (i.e., LUT queries) instead of relying on complex extra logic or performing long sequences of DRAM commands. pLUTo outperforms the optimized CPU and GPU baselines in performance/energy efficiency by an average of 1960$\times$/307$\times$ and 4.2$\times$/4$\times$, respectively, across the evaluated workloads, and by 33$\times$/8$\times$ and 110$\times$/80$\times$, respectively, for the LeNet-5 quantized neural network. pLUTo outperforms a state-of-the-art PiM baseline by 50$\times$/342$\times$ in performance/energy efficiency.
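
The idea of replacing a complex operation with LUT queries can be illustrated in software. The following is a minimal sketch only, not the DRAM-level mechanism described in the paper: it assumes 4-bit operands packed into an 8-bit index, and the names `build_lut`, `lut_query`, `mul4`, and `pack` are hypothetical helpers introduced for this illustration.

```python
def build_lut(func, input_bits):
    """Precompute func for every possible input of the given bit width."""
    return [func(x) for x in range(1 << input_bits)]


def lut_query(lut, inputs):
    """Replace each input element with its table entry (one read per element)."""
    return [lut[x] for x in inputs]


def mul4(index):
    """Multiply two 4-bit operands packed into one 8-bit index (assumed layout)."""
    a, b = index >> 4, index & 0xF
    return a * b


def pack(a, b):
    """Pack two 4-bit operands into a single 8-bit LUT index."""
    return (a << 4) | b


# The table is built once; afterwards, multiplication is only a read.
MUL4_LUT = build_lut(mul4, 8)  # 256 entries

operands = [(3, 5), (15, 15), (0, 9), (7, 2)]
indices = [pack(a, b) for a, b in operands]
products = lut_query(MUL4_LUT, indices)
print(products)
```

In this sketch the cost of the operation is paid when the table is built, and every later query is an independent read, which is why the lookups can in principle proceed in parallel across many elements.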