Annealed Gradient Descent for Deep Learning

2019 
Abstract In this paper, we propose a novel annealed gradient descent (AGD) algorithm for deep learning. AGD optimizes a sequence of gradually improving, smoother mosaic functions that approximate the original non-convex objective function according to an annealing schedule during the optimization process. We present a theoretical analysis of AGD's convergence properties and learning speed, and use visualization methods to illustrate its advantages. The proposed AGD algorithm is applied to learn both deep neural networks (DNNs) and convolutional neural networks (CNNs) for a variety of tasks, including image recognition and speech recognition. Experimental results on several widely used databases, such as Switchboard, CIFAR-10 and Pascal VOC 2012, show that AGD yields better classification accuracy than SGD and noticeably accelerates the training of DNNs and CNNs.
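To make the annealing idea in the abstract concrete, the following is a minimal Python sketch of optimizing a sequence of progressively sharper smoothed surrogates of a non-convex objective. The paper's mosaic-function construction is not reproduced here; a Monte-Carlo Gaussian smoothing is used as a stand-in, and all names (objective, smoothed_grad, annealed_gradient_descent, the sigma schedule) are hypothetical illustrations rather than the authors' method.

```python
# Hedged sketch of annealed gradient descent on a 1-D non-convex toy
# objective. Gaussian smoothing replaces the paper's mosaic functions:
# start with a heavily smoothed surrogate, then anneal toward the
# original objective according to a schedule of decreasing sigma.
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Non-convex toy objective with many local minima.
    return np.sin(5.0 * x) + 0.1 * x ** 2

def smoothed_grad(x, sigma, n_samples=256):
    # Finite-difference gradient of the smoothed surrogate
    # E_eps[f(x + sigma * eps)], estimated by Monte-Carlo sampling.
    eps = rng.normal(size=n_samples)
    h = 1e-3
    f_plus = objective(x + h + sigma * eps).mean()
    f_minus = objective(x - h + sigma * eps).mean()
    return (f_plus - f_minus) / (2.0 * h)

def annealed_gradient_descent(x0, sigmas, steps_per_stage=200, lr=0.05):
    # Optimize a sequence of gradually less-smoothed objectives:
    # large sigma first (very smooth), then anneal sigma toward 0.
    x = x0
    for sigma in sigmas:
        for _ in range(steps_per_stage):
            x -= lr * smoothed_grad(x, sigma)
    return x

if __name__ == "__main__":
    schedule = [2.0, 1.0, 0.5, 0.1, 0.0]  # annealing schedule for sigma
    x_star = annealed_gradient_descent(x0=3.0, sigmas=schedule)
    print(f"solution x = {x_star:.4f}, f(x) = {objective(x_star):.4f}")
```

Running the early stages on a smooth surrogate helps the iterate escape shallow local minima before the final, unsmoothed stage refines the solution, which is the intuition behind the annealing schedule described in the abstract.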