Efficient Full-Matrix Adaptive Regularization.

Naman Agarwal,Brian Bullins,Xinyi Chen,Elad Hazan,Karan Singh,Cyril Zhang,Yi Zhang

Efficient Full-Matrix Adaptive Regularization.

2020

Naman Agarwal
Brian Bullins
Xinyi Chen
Elad Hazan
Karan Singh
Cyril Zhang
Yi Zhang

Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefully-preconditioned steps sometimes lead to a better solution.

Keywords:

full matrix
Deep learning
Convergence (routing)
adaptive regularization
Computer science
Artificial intelligence
Descent direction
Matrix (mathematics)
Fast inverse square root
Computation
Algorithm

Correction
Cite
Save
Machine Reading By IdeaReader

References

Citations