Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, Barnabás Póczos

2017

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015; Jin et al., 2017] is not slowed down by saddle points: it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
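To make the contrast between plain and perturbed GD concrete, here is a minimal sketch on a toy problem. It is not the construction or the experiments from the paper. The function f(x, y) = x^2/2 - y^2/2 + y^4/4, the step size, the tolerances, and the perturbation rule are all assumptions chosen for illustration. The perturbation step is a simplified stand-in for the schemes of Ge et al. and Jin et al.: noise is drawn from a small square rather than a ball, and the stopping test uses the known curvature of this particular function. The toy shows only the mechanism: GD started very close to the saddle's stable manifold lingers there, while a small random kick lets the iterate leave quickly. It does not reproduce the exponential-time behavior established in the paper.

```python
import math
import random

def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2/2 - y**2/2 + y**4/4 (an assumption,
    # not the paper's function). It has a strict saddle at (0, 0) and local minima
    # at (0, 1) and (0, -1).
    return x, -y + y**3

def is_approx_local_min(x, y, tol):
    # Small gradient and positive curvature. For this f the Hessian is
    # diag(1, 3*y**2 - 1), so only the y-curvature needs checking.
    gx, gy = grad(x, y)
    return math.hypot(gx, gy) < tol and 3 * y * y - 1 > 0

def gradient_descent(x, y, step=0.1, tol=1e-3, max_iters=10_000):
    """Plain GD. Returns (iterations used, final point)."""
    for t in range(max_iters):
        if is_approx_local_min(x, y, tol):
            return t, (x, y)
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return max_iters, (x, y)

def perturbed_gradient_descent(x, y, step=0.1, tol=1e-3, radius=1e-2,
                               wait=20, max_iters=10_000, seed=0):
    """GD plus a small random kick whenever the gradient is tiny (simplified rule)."""
    rng = random.Random(seed)
    last_kick = -wait  # allow a kick on the first eligible iteration
    for t in range(max_iters):
        if is_approx_local_min(x, y, tol):
            return t, (x, y)
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol and t - last_kick >= wait:
            # Near a stationary point that is not a local minimum: perturb.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            last_kick = t
            gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return max_iters, (x, y)

if __name__ == "__main__":
    # Start almost exactly on the stable manifold of the saddle (y is tiny).
    start = (1.0, 1e-30)
    print("plain GD:    ", gradient_descent(*start))
    print("perturbed GD:", perturbed_gradient_descent(*start))
```

In this sketch plain GD must amplify the tiny y-component by a constant factor per step before it leaves the saddle, so its iteration count grows with log(1/|y0|); the perturbed variant replaces that tiny component with one of order `radius` as soon as the gradient becomes small.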

Keywords:

Mathematical optimization
Computer science
Almost surely
Applied mathematics
Saddle
Time complexity
Initialization
Gradient descent
Saddle point
Perturbation
Exponential function

