Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright


2018


We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
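The two-point scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2-state, 1-input system (`A`, `B`), the cost weights (Q = I, R = 1), the truncated horizon, the step size, the smoothing radius, and all function names. The sketch estimates a gradient from two cost evaluations at symmetric perturbations of the gain `K`, using the same random initial state for both evaluations, and then takes a descent step.

```python
import math
import random

# Hypothetical 2-state, 1-input system (a discretized double integrator),
# chosen only for illustration; it is not taken from the paper.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [0.0, 0.1]


def rollout_cost(K, x0, horizon=300):
    """Truncated cost sum_t (|x_t|^2 + u_t^2) under u_t = -K x_t (Q = I, R = 1)."""
    x1, x2 = x0
    cost = 0.0
    for _ in range(horizon):
        u = -(K[0] * x1 + K[1] * x2)
        cost += x1 * x1 + x2 * x2 + u * u
        x1, x2 = (A[0][0] * x1 + A[0][1] * x2 + B[0] * u,
                  A[1][0] * x1 + A[1][1] * x2 + B[1] * u)
    return cost


def random_unit_vector(dim, rng):
    """Uniform direction on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def random_initial_state(rng):
    """Random x0 on a circle of radius sqrt(2), so that E[x0 x0^T] = I."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (math.sqrt(2.0) * math.cos(theta), math.sqrt(2.0) * math.sin(theta))


def two_point_gradient(cost, K, radius, rng):
    """Two-point estimate: (D / (2 r)) * (f(K + r u) - f(K - r u)) * u."""
    dim = len(K)
    u = random_unit_vector(dim, rng)
    k_plus = [k + radius * c for k, c in zip(K, u)]
    k_minus = [k - radius * c for k, c in zip(K, u)]
    scale = dim * (cost(k_plus) - cost(k_minus)) / (2.0 * radius)
    return [scale * c for c in u]


def optimize(K0, steps=2000, step_size=2e-4, radius=0.05, seed=0):
    """Stochastic zero-order descent on the gain K (assumed hyperparameters)."""
    rng = random.Random(seed)
    K = list(K0)
    for _ in range(steps):
        x0 = random_initial_state(rng)  # same x0 for both cost evaluations
        grad = two_point_gradient(lambda G: rollout_cost(G, x0), K, radius, rng)
        K = [k - step_size * g for k, g in zip(K, grad)]
    return K


def mean_cost(K, n_angles=8):
    """Average cost over evenly spaced initial states of norm sqrt(2)."""
    total = 0.0
    for i in range(n_angles):
        theta = 2.0 * math.pi * i / n_angles
        x0 = (math.sqrt(2.0) * math.cos(theta), math.sqrt(2.0) * math.sin(theta))
        total += rollout_cost(K, x0)
    return total / n_angles


if __name__ == "__main__":
    K0 = [0.5, 0.5]  # assumed stabilizing initial gain
    K = optimize(K0)
    print("initial mean cost:", mean_cost(K0))
    print("final gain:", K, "final mean cost:", mean_cost(K))
```

The initial gain must be stabilizing for the truncated cost to be a meaningful proxy, and the step size is kept small so that iterates stay inside the stabilizing region. The rates stated in the abstract concern the authors' analysis, not this toy setup.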

Keywords:

Rate of convergence
Mathematical optimization
Effective dimension
Applied mathematics
Quadratic equation
Multiplicative function
Mathematics
Curvature
Derivative (finance)
Polynomial
Optimization problem
Computer science
Linear quadratic
