Provable convergence of Nesterov’s accelerated gradient method for over-parameterized neural networks

2022 
We show that Nesterov's accelerated gradient method (NAG) converges to a global minimum at a non-asymptotic linear rate $(1-\Theta(1/\sqrt{\kappa}))^t$, where $\kappa$ is the condition number of a Gram matrix and $t$ is the number of iterations. Compared to the convergence rate of gradient descent (GD), our result provides theoretical guarantees for the acceleration of NAG in neural network training. Furthermore, our findings suggest that NAG and the heavy ball method (HB) have similar convergence rates. Finally, extensive experiments on six benchmark datasets validate our theoretical results.
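To make the rate comparison concrete, here is a minimal sketch (not the paper's two-layer ReLU setting) contrasting the GD and NAG update rules on a strongly convex quadratic, where the condition number $\kappa$ plays the same role as the Gram-matrix condition number above. The step size `eta = 1/L` and momentum `beta = (sqrt(kappa)-1)/(sqrt(kappa)+1)` are textbook choices assumed for illustration, not parameters taken from the paper.

```python
# Sketch: GD vs. NAG on f(w) = 0.5 * w^T A w, a strongly convex quadratic.
# GD contracts roughly like (1 - 1/kappa)^t, NAG like (1 - 1/sqrt(kappa))^t.
import numpy as np

rng = np.random.default_rng(0)
L, mu = 100.0, 1.0                      # largest / smallest curvature
kappa = L / mu                          # condition number
A = np.diag(np.linspace(mu, L, 50))     # diagonal quadratic for simplicity
grad = lambda w: A @ w                  # gradient of f

eta = 1.0 / L                           # assumed textbook step size
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # assumed momentum

w_gd = w_nag = w_prev = rng.standard_normal(50)
for t in range(300):
    # Plain gradient descent step.
    w_gd = w_gd - eta * grad(w_gd)
    # NAG: take the gradient step at the momentum look-ahead point.
    lookahead = w_nag + beta * (w_nag - w_prev)
    w_prev = w_nag
    w_nag = lookahead - eta * grad(lookahead)

print(f"kappa = {kappa:.0f}")
print(f"GD  error after 300 steps: {np.linalg.norm(w_gd):.3e}")
print(f"NAG error after 300 steps: {np.linalg.norm(w_nag):.3e}")
```

Running this, NAG's distance to the minimizer shrinks by orders of magnitude more than GD's over the same iteration budget, which is the acceleration the linear rate $(1-\Theta(1/\sqrt{\kappa}))^t$ versus $(1-\Theta(1/\kappa))^t$ predicts.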