OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dual-mode Weight Compression and On-chip Sparse Weight Transposer

Juhyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Donghyeon Han, Jinsu Lee, Hoi-Jun Yoo

2021

This paper presents OmniDRL, a 4.18 TFLOPS, 29.3 TFLOPS/W deep reinforcement learning (DRL) processor. A group-sparse training core and exponent mean delta encoding are proposed to enable weight and feature-map compression in every iteration of DRL training. A sparse weight transposer transposes compressed weights on-chip, reducing external memory access. The processor is fabricated in 28 nm CMOS technology and occupies a 3.6 × 3.6 mm² die area. It achieves 7.16 TFLOPS/W energy efficiency when training a robot agent (MuJoCo HalfCheetah, TD3), which is 2.4× higher than the previous state of the art.
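
The abstract names exponent mean delta encoding but does not spell out its format. As a rough illustration only, the Python sketch below shows one plausible reading: store a single mean exponent for a group of floating-point values and keep only each value's small offset from that mean. The function names, the grouping, and the bit-width estimate are assumptions made for this example and are not taken from the paper or the chip.

```python
import math

def encode_exponent_mean_delta(values):
    """Split floats into mantissas, one shared mean exponent, and per-value deltas.

    Assumed scheme for illustration; the paper's actual format may differ.
    """
    if not values:
        return 0, [], []
    parts = [math.frexp(v) for v in values]   # v == m * 2**e, 0.5 <= |m| < 1
    exponents = [e for _, e in parts]
    mean_exp = round(sum(exponents) / len(exponents))
    deltas = [e - mean_exp for e in exponents]
    mantissas = [m for m, _ in parts]
    return mean_exp, deltas, mantissas

def decode_exponent_mean_delta(mean_exp, deltas, mantissas):
    """Rebuild the original floats from the shared mean and the deltas."""
    return [math.ldexp(m, mean_exp + d) for m, d in zip(mantissas, deltas)]

def bits_for_deltas(deltas):
    """Width of a signed two's-complement field that holds every delta."""
    if not deltas:
        return 0
    lo, hi = min(deltas), max(deltas)
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits

# Hypothetical weight group whose magnitudes are close together.
weights = [0.75, 0.5, 1.5, 0.3125]
mean_exp, deltas, mantissas = encode_exponent_mean_delta(weights)
restored = decode_exponent_mean_delta(mean_exp, deltas, mantissas)
delta_bits = bits_for_deltas(deltas)
```

When the values in a group have similar magnitudes, the deltas cluster near zero and fit in fewer bits than a full exponent field, which is the intuition behind compressing the exponent this way. The round trip in this sketch is lossless because only the exponent is re-expressed.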

Keywords:

System on a chip
Very-large-scale integration
Encoding (memory)
Transposer
Auxiliary memory
CMOS
Transpose
Delta encoding
Computational science
Computer science
