Parallel Online Temporal Difference Learning for Motor Control

2016 
Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. In real systems, however, this method is often avoided in favor of policy search methods because of its long learning time. Policy search suffers from its own drawbacks, such as the need for informed policy parameterization and initialization. In this paper, we show that TD learning can also work effectively in real robotic systems, using parallel model learning and planning. Using locally weighted linear regression and trajectory-sampled planning with 14 concurrent threads, we achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of $20\times$ to $60\times$, with real-time learning completed in less than half a minute. The results are competitive with state-of-the-art policy search.
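
The abstract combines TD control with model learning and planning, an idea in the spirit of Dyna-style architectures. As a rough, hypothetical illustration of that idea (not the paper's implementation), the sketch below runs tabular Q-learning on a toy chain MDP and, after each real transition, replays extra TD updates on transitions sampled from a learned model; this planning loop is what buys the speedup, since each real-world step funds many cheap simulated updates. All names here (ToyMDP, n_planning_steps, the single-threaded loop) are assumptions; the paper itself uses locally weighted linear regression models, trajectory-sampled planning, and 14 concurrent threads on continuous-control tasks.

```python
"""Minimal Dyna-style TD control sketch (illustrative only)."""
import random
from collections import defaultdict


class ToyMDP:
    """A tiny deterministic chain MDP used only for illustration."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def step(self, s, a):
        s2 = max(0, min(self.n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s2 == self.n_states - 1 else 0.0  # reward at right end
        return s2, r


def dyna_q(env, episodes=200, alpha=0.1, gamma=0.95,
           eps=0.1, n_planning_steps=10):
    q = defaultdict(float)   # Q-values keyed by (state, action)
    model = {}               # learned model: (s, a) -> (s', r)
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap episode length
            # Epsilon-greedy action selection.
            a = (random.randrange(env.n_actions) if random.random() < eps
                 else max(range(env.n_actions), key=lambda a: q[(s, a)]))
            s2, r = env.step(s, a)
            # TD (Q-learning) update from real experience.
            target = r + gamma * max(q[(s2, b)] for b in range(env.n_actions))
            q[(s, a)] += alpha * (target - q[(s, a)])
            model[(s, a)] = (s2, r)
            # Planning: replay TD updates on model-sampled transitions.
            for _ in range(n_planning_steps):
                (ps, pa), (ps2, pr) = random.choice(list(model.items()))
                ptarget = pr + gamma * max(q[(ps2, b)]
                                           for b in range(env.n_actions))
                q[(ps, pa)] += alpha * (ptarget - q[(ps, pa)])
            s = s2
            if s == env.n_states - 1:
                break
    return q


if __name__ == "__main__":
    q = dyna_q(ToyMDP())
    # Greedy policy should point right (action 1) everywhere.
    print({s: max(range(2), key=lambda a: q[(s, a)]) for s in range(5)})
```

In the paper's setting the model is continuous (locally weighted linear regression) and the planning updates run in parallel threads rather than inline, but the division of labor is the same: real experience trains the model, and planning amortizes it into many TD updates.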