A Many-Core Accelerator Design for On-Chip Deep Reinforcement Learning

2020 
Deep Reinforcement Learning (DRL) is substantially resource-consuming: learning complicated tasks such as video games or the game of Go typically requires large-scale distributed computing nodes. This work attempts to scale a distributed DRL system down to a specialized many-core chip and achieve energy-efficient on-chip DRL. With a customized Network-on-Chip that handles the communication of on-chip data and control signals, we propose a Synchronous Asynchronous RL Architecture (SARLA) and the corresponding many-core chip, which completely avoids the unnecessary data duplication and synchronization activities of multi-node RL systems. In our evaluation, the SARLA system achieves a considerable energy-efficiency improvement over GPU-based implementations on typical DRL workloads built with OpenAI Gym.
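To make the avoided overhead concrete, the sketch below illustrates the conventional distributed actor-learner pattern (A3C/IMPALA-style) that the paper targets: each actor keeps its own copy of the policy parameters and repeatedly synchronizes with a central learner, shipping trajectories back for updates. This is a minimal illustrative sketch, not the paper's implementation; the CartPole environment, linear policy, toy REINFORCE update, and the gym>=0.26 API are all assumptions chosen only to show where the data duplication and synchronization traffic arise.

```python
# Illustrative sketch of the distributed actor-learner loop (NOT the
# paper's SARLA design). Assumes gym>=0.26 and a toy linear policy.
import numpy as np
import gym

OBS_DIM, N_ACTIONS = 4, 2                          # CartPole-v1 dimensions
learner_params = np.zeros((OBS_DIM, N_ACTIONS))    # central policy weights


def actor_rollout(env, local_params, horizon=32):
    """Run one rollout with a *local copy* of the parameters; return a trajectory."""
    obs, _ = env.reset()
    trajectory = []
    for _ in range(horizon):
        logits = obs @ local_params
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = np.random.choice(N_ACTIONS, p=probs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return trajectory


def learner_update(params, trajectory, lr=1e-3):
    """Toy REINFORCE-style update on a trajectory shipped back by an actor."""
    returns = np.cumsum([r for _, _, r in trajectory][::-1])[::-1]
    for (obs, action, _), ret in zip(trajectory, returns):
        logits = obs @ params
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad = -probs
        grad[action] += 1.0                         # grad of log pi(action|obs)
        params += lr * ret * np.outer(obs, grad)
    return params


env = gym.make("CartPole-v1")
for step in range(10):
    # Synchronization: the actor copies the latest learner parameters ...
    local_copy = learner_params.copy()
    # ... generates experience with that (possibly stale) copy ...
    traj = actor_rollout(env, local_copy)
    # ... and ships the trajectory back for a centralized update.
    learner_params = learner_update(learner_params, traj)
```

In a multi-node deployment these parameter copies and trajectory transfers cross the network on every iteration; SARLA's premise is that folding actors and learner onto one many-core chip, with a customized NoC carrying the data and control signals, removes this duplication and synchronization traffic entirely.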