Faster Policy Learning with Continuous-Time Gradients.

Samuel K. Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, Siddhartha S. Srinivasa

2020

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
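
The contrast drawn in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical scalar system dx/dt = -theta * x with cost J = integral of x^2 over [0, T], where theta plays the role of a policy gain. It compares (a) the exact derivative of a coarse fixed-step Euler discretization, which has the same value BPTT would return for that discretization, with (b) an adaptively discretized estimate of the continuous-time gradient. For simplicity the adaptive estimate integrates forward sensitivity equations with step-doubling RK4 error control; this is a stand-in chosen for brevity, and the abstract does not specify the solver or differentiation scheme used by CTPG. All names and constants (THETA, X0, T, tol) are assumptions made for illustration.

```python
import math

THETA, X0, T = 2.0, 1.0, 5.0  # hypothetical policy gain, initial state, horizon


def exact_grad(theta):
    """Closed-form dJ/dtheta for dx/dt = -theta*x, J = int_0^T x^2 dt."""
    e = math.exp(-2.0 * theta * T)
    return X0**2 * (2.0 * theta * T * e - (1.0 - e)) / (2.0 * theta**2)


def euler_discretized_grad(theta, n_steps):
    """Exact derivative of the fixed-step Euler discretization.

    Computed in forward mode; BPTT would give the same number in reverse mode.
    """
    h = T / n_steps
    x, s, g = X0, 0.0, 0.0  # state, sensitivity dx/dtheta, gradient accumulator
    for _ in range(n_steps):
        g += h * 2.0 * x * s  # derivative of the cost term h * x^2
        x, s = x + h * (-theta * x), s + h * (-x - theta * s)
    return g


def rhs(y, theta):
    """Augmented ODE: state x, sensitivity s = dx/dtheta, running gradient g."""
    x, s, _ = y
    return (-theta * x, -x - theta * s, 2.0 * x * s)


def rk4_step(y, h, theta):
    """One classical RK4 step of size h on the augmented system."""
    def shift(a, k, c):
        return tuple(ai + c * ki for ai, ki in zip(a, k))
    k1 = rhs(y, theta)
    k2 = rhs(shift(y, k1, h / 2), theta)
    k3 = rhs(shift(y, k2, h / 2), theta)
    k4 = rhs(shift(y, k3, h), theta)
    return tuple(yi + h / 6 * (a + 2 * b + 2 * c + d)
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4))


def adaptive_grad(theta, tol=1e-8):
    """Continuous-time gradient estimate with step-doubling error control."""
    t, h, y, n_evals = 0.0, 0.1, (X0, 0.0, 0.0), 0
    while t < T:
        h = min(h, T - t)
        full = rk4_step(y, h, theta)
        half = rk4_step(rk4_step(y, h / 2, theta), h / 2, theta)
        n_evals += 12  # three RK4 steps, four rhs evaluations each
        err = max(abs(a - b) for a, b in zip(full, half))
        if err <= tol:  # accept the more accurate two-half-step result
            t, y = t + h, half
        # Shrink after a rejection, grow (at most 2x) when the error is small.
        h *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.2))
    return y[2], n_evals


if __name__ == "__main__":
    grad, evals = adaptive_grad(THETA)
    print("closed form      :", exact_grad(THETA))
    print("Euler, 20 steps  :", euler_discretized_grad(THETA, 20))
    print("adaptive estimate:", grad, "using", evals, "rhs evaluations")
```

On this toy problem the coarse Euler gradient is exact for the discretized system yet visibly off from the closed-form continuous-time gradient, while the adaptive estimate tracks the closed form to roughly the requested tolerance. The adaptive integrator takes small steps early, where the state changes quickly, and larger steps once the state has decayed.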

Keywords:

robust learning
Policy learning
Discretization
Estimator
Mathematical optimization
Contrast (statistics)
Computer science
