Gradient Descent: The Ultimate Optimizer

2019 
Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as the learning rate. There exist many techniques for automated hyperparameter optimization, but they typically introduce even more hyperparameters to control the hyperparameter optimization process. We propose to instead learn the hyperparameters themselves by gradient descent, and furthermore to learn the hyper-hyperparameters by gradient descent as well, and so on ad infinitum. As these towers of gradient-based optimizers grow, they become significantly less sensitive to the choice of top-level hyperparameters, hence decreasing the burden on the user to search for optimal values.
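The core idea, learning the optimizer's hyperparameters by gradient descent, can be illustrated with a minimal sketch. The example below is an assumption for illustration only (a toy quadratic loss and plain SGD with one level of hyperoptimization), not the paper's reference implementation; the names `alpha`, `kappa`, `loss`, and `grad` are hypothetical. It adapts the step size alpha by differentiating the loss through the previous weight update, so the only remaining top-level choice is the hyper-learning rate kappa.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): one level of
# hyperoptimization for SGD on a toy quadratic. The step size alpha is
# itself updated by gradient descent, via the chain rule through the
# previous weight update:
#   w_t = w_{t-1} - alpha * g_{t-1}
#   dL(w_t)/dalpha = g_t . (dw_t/dalpha) = -(g_t . g_{t-1})

def loss(w):
    return 0.5 * np.dot(w, w)

def grad(w):
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=10)

alpha = 1e-4   # deliberately poor initial learning rate
kappa = 1e-2   # hyper-learning rate (the new top-level hyperparameter)

g_prev = grad(w)
w = w - alpha * g_prev

for step in range(200):
    g = grad(w)
    # hypergradient of the loss w.r.t. alpha, then a gradient step on alpha
    d_loss_d_alpha = -np.dot(g, g_prev)
    alpha = alpha - kappa * d_loss_d_alpha
    # ordinary gradient step on the weights with the adapted alpha
    w = w - alpha * g
    g_prev = g

print(f"final loss {loss(w):.3e}, adapted alpha {alpha:.4f}")
```

Even with a badly chosen initial alpha, the adapted step size grows toward a useful value within a few iterations, which mirrors the paper's claim that stacking optimizers reduces sensitivity to the top-level hyperparameter choice.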