A Many-Core Accelerator Design for On-Chip Deep Reinforcement Learning
2020
Deep Reinforcement Learning (DRL) is substantially resource-consuming, and it requires large-scale distributed computing-nodes to learn complicated tasks, like video-game and Go play. This work attempts to down-scale a distributed DRL system into a specialized many-core chip and achieve energy-efficient on-chip DRL. With the customized Network-on-Chip that handles the communication of on-chip data and control-signals, we proposed a Synchronous Asynchronous RL Architecture (SARLA) and the according many-core chip that completely avoids the unnecessary data duplication and synchronization activities in multi-node RL systems. In evaluation, the SARLA system achieves considerable energy-efficiency boost over the GPU-based implementations for typical DRL workloads built with OpenAI-gym.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
23
References
1
Citations
NaN
KQI