Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

2018 
We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    37
    References
    77
    Citations
    NaN
    KQI
    []