Deep Reinforcement Learning with Dual Targeting Algorithm

2019 
Recently, deep reinforcement learning with the Deep Q-Network (DQN) algorithm has attracted attention, and extensions of it continue to improve its learning performance. Multi-step DQN, one such extension, applies an n-step TD method and contributes to faster learning. However, with n-step TD methods the improvement in learning speed is larger for intermediate-horizon predictions than for long-term predictions. To accelerate learning further, methods that can exploit long-term predictions effectively are therefore required. A learning-accelerated DQN learns faster than DQN by training the neural network with both bootstrap targets computed up to the next positive reward and 1-step bootstrap targets. However, that method cannot use long-term predictions in tasks where rewards are observed continuously, and its use of two independent updates destabilizes the convergence of the neural network. We therefore propose a dual targeting algorithm that performs a single update using bootstrap targets computed up to the last reward of the next run of consecutive positive rewards together with 1-step bootstrap targets. The proposed method aims to reduce instability in the convergence of the neural network by calculating both targets from the same sampled experience. We apply the proposed method to several classic control problems from OpenAI Gym, compare it with DQN and multi-step DQN, and verify its effectiveness.
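
The dual-target idea can be illustrated with a short sketch. The fragment below is not the paper's implementation: it only shows, under stated assumptions, how a 1-step bootstrap target and a bootstrap target extending to the last reward of the next run of consecutive positive rewards could both be computed from the same sampled segment and folded into a single update value. The `q_target` interface, the mixing weight `beta`, and the discount factor are assumptions introduced for illustration.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def dual_target(transitions, q_target, beta=0.5):
    """Sketch: compute one combined target from a single sampled segment.

    transitions: list of (state, action, reward, next_state, done) tuples
                 starting at the sampled time step t.
    q_target:    callable returning the target network's Q-values over all
                 actions for a given state (hypothetical interface).
    beta:        hypothetical weight mixing the two targets into one update.
    """
    _, _, r, s_next, done = transitions[0]

    # 1-step bootstrap target: r_t + gamma * max_a' Q_target(s_{t+1}, a')
    one_step = r + (0.0 if done else GAMMA * np.max(q_target(s_next)))

    # Long-horizon target: accumulate discounted rewards up to the last
    # reward of the next run of consecutive positive rewards, then bootstrap.
    g, discount = 0.0, 1.0
    in_positive_run = False
    boot_state, boot_done = s_next, done
    for _, _, r_i, s_i_next, done_i in transitions:
        if in_positive_run and r_i <= 0:
            break  # the consecutive positive-reward run has just ended
        g += discount * r_i
        discount *= GAMMA
        boot_state, boot_done = s_i_next, done_i
        in_positive_run = in_positive_run or r_i > 0
        if done_i:
            break
    long_term = g + (0.0 if boot_done else discount * np.max(q_target(boot_state)))

    # Single update toward a combination of both targets (the exact
    # combination used by the paper is not specified here; a convex mix
    # is assumed purely for illustration).
    return beta * one_step + (1.0 - beta) * long_term
```

Because both targets are derived from the same sampled experience and merged into one update, the sketch reflects the stated aim of avoiding the two independent updates that destabilize convergence in the learning-accelerated DQN.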