A convergence analysis of Nesterov’s accelerated gradient method in training deep linear neural networks
2022
rate when the width is near-linear in the depth of the network, where t is the number of iterations and κ is a constant depending on the condition number of the feature matrix. Compared to the