Evolved Policy Gradients

Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel
OpenAI

2018

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function such that an agent that optimizes its policy to minimize this loss achieves high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take the agent's history into account, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) learns faster on several randomized environments than an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test-time tasks, and that EPG exhibits qualitatively different behavior from other popular metalearning algorithms.
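
The two-level structure described above (an outer loop that evolves a loss, an inner loop in which the agent follows that loss's gradient) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the paper's setup: the "policy" is a single scalar `theta`, the learned loss is a two-parameter quadratic rather than temporal convolutions over experience, the environment is a toy function `task_return`, and the outer loop is a plain evolution-strategies update with normalized returns. The names `inner_loop`, `task_return`, and `evolve_loss` are hypothetical.

```python
import random


def inner_loop(phi, theta0=0.0, steps=20, lr=0.1):
    """Agent update: gradient descent on the learned loss
    L_phi(theta) = phi[0] * theta**2 + phi[1] * theta (an assumed toy form)."""
    theta = theta0
    for _ in range(steps):
        grad = 2.0 * phi[0] * theta + phi[1]  # dL_phi / dtheta
        theta -= lr * grad
    return theta


def task_return(theta, target=2.0):
    """Toy environment (assumption): return peaks when theta hits the target."""
    return -(theta - target) ** 2


def evolve_loss(generations=150, population=32, sigma=0.1, outer_lr=0.02, seed=0):
    """Outer loop: evolution strategies over the loss parameters phi.
    The agent never sees the return directly; it only minimizes L_phi."""
    rng = random.Random(seed)
    phi = [1.0, 0.0]
    best_phi, best_ret = list(phi), task_return(inner_loop(phi))
    for _ in range(generations):
        # Sample Gaussian perturbations of the loss parameters.
        noise = [[rng.gauss(0.0, 1.0) for _ in phi] for _ in range(population)]
        # Train one agent per perturbed loss and record its final return.
        rets = [
            task_return(inner_loop([p + sigma * e for p, e in zip(phi, eps)]))
            for eps in noise
        ]
        mean = sum(rets) / population
        std = (sum((r - mean) ** 2 for r in rets) / population) ** 0.5 + 1e-8
        # ES update: move phi toward perturbations that led to higher returns.
        for j in range(len(phi)):
            grad_j = sum(eps[j] * (r - mean) / std for eps, r in zip(noise, rets))
            phi[j] += outer_lr / (population * sigma) * grad_j
        # Keep the best unperturbed loss parameters seen so far.
        ret = task_return(inner_loop(phi))
        if ret > best_ret:
            best_phi, best_ret = list(phi), ret
    return best_phi, best_ret


if __name__ == "__main__":
    phi, ret = evolve_loss()
    print("evolved loss parameters:", phi)
    print("return of an agent trained on the evolved loss:", ret)
```

In this sketch the reward signal reaches the agent only through the evolved parameters `phi`, which mirrors the idea that the loss itself, rather than the policy, is what gets metalearned. Replacing the quadratic with a richer function of the agent's experience would be the step toward the temporal-convolution parametrization the abstract describes.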

Keywords:

Reinforcement learning
Parametrization
Mathematical optimization
Computer science
Convolution
Gradient method
Differentiable function
Metalearning
Task learning
Machine learning
Artificial intelligence
