A convergence analysis of Nesterov’s accelerated gradient method in training deep linear neural networks

2022 
rate when the width is near-linear in the depth of the network, where is the number of iterations and is a constant depending on the condition number of the feature matrix. Compared to the