HBM3 RAS: Enhancing Resilience at Scale

2021 
HBM3 is the next-generation technology in the JEDEC High Bandwidth Memory die-stacked DRAM standard. HBM3 is expected to be widely used in future SoCs to accelerate data center and automotive workloads. Reliability, Availability, and Serviceability (RAS) are key requirements in most of these computing domains and use cases, and memory reliability in particular is essential to attaining resilience at scale. This paper presents the RAS challenges facing HBM3 and how they are addressed by a novel memory RAS architecture that is now part of the HBM3 standard. The paper shows how this novel HBM3 RAS architecture can reduce the uncorrected memory error rate by 7X compared to HBM2 in future large-scale systems for assumed DRAM fault rates and modes. HBM3 also provides architected metadata to further enhance RAS or enable innovations in memory system design.