Adaptive Learning Rate Adjustment with Short-Term Pre-Training in Data-Parallel Deep Learning

2018 
This paper introduces a method to adaptively choose a learning rate (LR) using short-term pre-training (STPT), which is useful for quick model prototyping in data-parallel deep learning. For an unknown model, numerous hyperparameters must be tuned. The proposed method reduces computational time and improves the efficiency of finding an appropriate LR: multiple candidate LRs are evaluated by STPT in a data-parallel setting, where STPT means training only on the initial iterations of an epoch. When eight LRs are evaluated on eight parallel workers, the proposed method reduces computational time by 87.5% compared with the conventional method. Accuracy also improves by 4.8% over the conventional method with a reference LR of 0.1, so no deterioration in accuracy is observed. For an unknown model, the method shows a better training-curve trend than fixed-LR baselines.
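The selection procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy quadratic loss, the function names, and the sequential loop are all assumptions made for clarity (in the paper's setting the candidate LRs would run concurrently on parallel workers on the real model).

```python
import numpy as np

def short_term_pretrain(lr, steps=20, seed=0):
    """Run a few SGD steps on a toy quadratic loss f(w) = ||w||^2
    and return the final loss. This stands in for training on only
    the initial iterations of an epoch of the real model."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=4)
    for _ in range(steps):
        grad = 2 * w        # gradient of ||w||^2
        w -= lr * grad      # plain SGD update
    return float(np.sum(w ** 2))

def select_lr(candidates, steps=20):
    """Evaluate each candidate LR by short-term pre-training and
    return the LR with the lowest loss, plus all measured losses.
    Evaluated sequentially here for simplicity; with k parallel
    workers the k candidates would run simultaneously."""
    losses = {lr: short_term_pretrain(lr, steps) for lr in candidates}
    best = min(losses, key=losses.get)
    return best, losses

if __name__ == "__main__":
    # Eight candidate LRs, mirroring the eight-worker example.
    lrs = [0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.9, 1.1]
    best, losses = select_lr(lrs)
    print("selected LR:", best)
```

On this toy loss the SGD update contracts `w` by a factor `(1 - 2*lr)` per step, so candidates near 0.5 win while candidates above 1.0 diverge; the same ranking-by-short-run-loss idea is what STPT applies to a real model.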