Accelerating SGD for Distributed Deep-Learning Using an Approximated Hessian Matrix

2017 
We introduce a novel method to compute a rank-$m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters across multiple workers, we efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
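To make the idea concrete, here is a minimal sketch of how a rank-$m$ inverse-Hessian approximation can be assembled from per-worker parameter differences and gradient differences and applied to a gradient, using the standard L-BFGS-style two-loop recursion as a stand-in. The paper's exact combination rule is not reproduced here; the function names, the toy quadratic objective, and the choice of initial scaling are illustrative assumptions.

```python
# Hedged sketch: build a rank-m inverse-Hessian approximation from
# per-worker parameter differences s_i = w_i - w_ref and gradient
# differences y_i = g_i - g_ref, and apply it to a gradient via the
# standard L-BFGS two-loop recursion. Names and details are illustrative,
# not the paper's exact method.
import numpy as np

def approx_inverse_hessian_times(grad, s_list, y_list):
    """Return an approximation of H^{-1} @ grad using the two-loop
    recursion over m (s, y) pairs collected from the workers."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: traverse pairs from newest to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial scaling H_0 = gamma * I (a common heuristic choice).
    s_m, y_m = s_list[-1], y_list[-1]
    gamma = (s_m @ y_m) / (y_m @ y_m)
    r = gamma * q
    # Second loop: traverse pairs from oldest to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return r

# Toy usage: m workers evaluate gradients of a quadratic at perturbed points.
def grad_f(w, A, b):
    return A @ w - b

rng = np.random.default_rng(0)
n, m = 20, 5
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)          # positive-definite Hessian
b = rng.standard_normal(n)
w_ref = rng.standard_normal(n)       # reference parameters
g_ref = grad_f(w_ref, A, b)          # reference gradient

s_list, y_list = [], []
for _ in range(m):
    w_i = w_ref + 0.1 * rng.standard_normal(n)    # worker's local parameters
    s_list.append(w_i - w_ref)                    # parameter difference
    y_list.append(grad_f(w_i, A, b) - g_ref)      # gradient difference

step = approx_inverse_hessian_times(g_ref, s_list, y_list)
# Distance to the true minimizer after one quasi-Newton step.
print(np.linalg.norm(w_ref - step - np.linalg.solve(A, b)))
```

On a quadratic toy problem like this, one such step typically moves much closer to the minimizer than a plain gradient step, which is the intuition behind replacing SGD updates with curvature-informed ones; in a real distributed setting the $(s, y)$ pairs would come from the workers' actual parameter and gradient states rather than synthetic perturbations.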