Training Deep Models Faster With Robust, Approximate Importance Sampling

Authors:
Tyler Johnson University of Washington
Carlos Guestrin University of Washington

Abstract:

In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training examples. In practice, the cost of computing importances greatly limits the impact of importance sampling. We propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. By approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead. Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes much more.
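To make the idea concrete, here is a minimal sketch of importance-sampled SGD on a least-squares problem. This is not the authors' RAIS procedure: the sampling probabilities below use a simple per-example gradient-norm proxy recomputed at every step (the very overhead RAIS is designed to avoid), and the mixing with a uniform distribution is an illustrative stabilizer, not part of the paper. The key mechanic it shows is sampling examples non-uniformly and reweighting each gradient by 1/(n·p_i) so the update stays an unbiased estimate of the full gradient.

```python
import numpy as np

# Synthetic least-squares data (all names and constants are illustrative).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.05
for step in range(2000):
    # Per-example importance proxy: ||x_i|| * |residual_i|, an upper bound
    # on the gradient norm of the i-th squared loss. Recomputing this for
    # all n examples each step is O(n) -- exactly the cost RAIS avoids.
    scores = np.linalg.norm(X, axis=1) * np.abs(X @ w - y) + 1e-8
    p = scores / scores.sum()
    p = 0.5 * p + 0.5 / n          # mix with uniform to bound the weights
    i = rng.choice(n, p=p)
    g = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
    w -= lr * g / (n * p[i])       # 1/(n p_i) reweighting keeps E[update] unbiased
```

Because E[g_i / (n p_i)] equals the full-batch gradient for any valid p, changing the sampling distribution changes only the variance of the updates, which is where the speed-up comes from.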
