Proximal Deterministic Policy Gradient

Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari

2020

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration: the target network plays the role of the optimization variable, and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algorithms to provide an improved action-value estimate through bootstrapping, with only a limited increase in computational cost. Finally, we demonstrate significant performance improvements over state-of-the-art algorithms on standard continuous-control RL benchmarks.
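To make the proximal point idea concrete, the following is a minimal sketch of a generic (deterministic) proximal point iteration on a toy scalar loss. It is not the paper's algorithm: the loss `f`, the helper names `grad_f`, `approx_prox`, and `proximal_point`, and all step sizes are assumptions chosen for illustration. In the abstract's analogy, the outer iterate (`anchor`) would loosely correspond to the target network, and the inner loop that approximately solves the regularized problem would correspond to the value network computing the proximal operator.

```python
def grad_f(x, a=2.0, c=3.0):
    """Gradient of the toy loss f(x) = 0.5 * a * (x - c)**2 (hypothetical example)."""
    return a * (x - c)


def approx_prox(anchor, lam=0.5, lr=0.1, inner_steps=200):
    """Approximate prox_{lam * f}(anchor) by gradient descent.

    Minimizes f(x) + (x - anchor)**2 / (2 * lam) starting from the anchor.
    """
    x = anchor
    for _ in range(inner_steps):
        # Gradient of the loss plus gradient of the proximity penalty.
        x -= lr * (grad_f(x) + (x - anchor) / lam)
    return x


def proximal_point(x0, outer_steps=50):
    """Outer loop: repeatedly replace the anchor with its proximal update."""
    anchor = x0
    for _ in range(outer_steps):
        anchor = approx_prox(anchor)
    return anchor


result = proximal_point(x0=-5.0)
print(result)  # approaches the minimizer c = 3.0 of the toy loss
```

The penalty weight `lam` controls how far each outer step may move from the previous iterate: smaller values give more conservative updates, larger values approach direct minimization of the loss.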

Keywords:

Reinforcement learning
Mathematical optimization
Computer science
Proximal point
Operator (computer programming)
Performance improvement
Bootstrapping
Value network
