InterAdam: Interpolating Dull Intervention to Adaptive Gradient Method

2021 
Most state-of-the-art first-order optimization methods, such as AMSGrad, SWATS, and AdaBound, adopt constant learning rates at the final training stage to achieve fast convergence. However, the generalizability of these methods can hardly outperform that of Adam. In this paper, we first demonstrate that, owing to their SGD-like strategy, these methods have difficulty escaping from the neighborhood of saddle points during optimization, which leads to generalizability problems. Our further analysis suggests that this issue can be addressed by imposing a weaker constraint on the learning rates. Based on these findings, we propose InterAdam, a new variant of Adam with non-constant learning rates at the final training stage. Experimental results show that InterAdam outperforms SWATS-like methods, especially in escaping saddle points and alleviating overfitting. Meanwhile, InterAdam maintains better convergence and robustness compared to Adam.
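The abstract does not state InterAdam's update rule. As a rough illustration of the idea it describes, the sketch below assumes an AdaBound-style clipping of Adam's per-coordinate step size, but with a clipping band whose width never shrinks to zero, so the learning rates remain non-constant at the final training stage (the "weaker constraint"). The function name and hyperparameters (`gamma`, `band`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bounded_adam_step(param, grad, m, v, t,
                      alpha=1e-3, beta1=0.9, beta2=0.999,
                      eps=1e-8, gamma=1e-3, band=0.5):
    """One Adam-like update whose per-coordinate step size is clipped into a
    band that narrows over time but keeps a non-zero width.

    Hypothetical sketch; gamma and band are illustrative, not from the paper.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)

    step = alpha / (np.sqrt(v_hat) + eps)       # Adam's per-coordinate rate

    # AdaBound-style bounds converge to a single constant; here the band keeps
    # a residual width (0.1 * alpha), so late-stage rates stay mildly adaptive.
    width = band * alpha / (1 + gamma * t) + 0.1 * alpha
    step = np.clip(step, alpha - width, alpha + width)

    param = param - step * m_hat
    return param, m, v
```

Under these assumptions, the clipping band at step t is [alpha - width, alpha + width]; as t grows, width approaches 0.1 * alpha rather than 0, so the per-coordinate rates never collapse to a single SGD-like constant.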