SWALP: Stochastic Weight Averaging in Low-Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

2019

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than that of low-precision SGD in strongly convex settings.
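The core idea of averaging low-precision SGD iterates can be illustrated with a minimal sketch on a one-dimensional quadratic. Everything specific here is an assumption made for illustration and is not taken from the abstract: the fixed-point grid with stochastic rounding, the grid spacing `delta`, the constant learning rate, the warm-up length, the averaging frequency `cycle`, and the function names `quantize` and `swalp_1d`. The weight is stored on an 8-bit grid after every update, while the running average is kept in higher precision.

```python
import math
import random


def quantize(x, delta=2 ** -6, bits=8, rng=random):
    """Stochastically round x onto a fixed-point grid (assumed scheme)."""
    lo = -(2 ** (bits - 1)) * delta
    hi = (2 ** (bits - 1) - 1) * delta
    scaled = x / delta
    floor = math.floor(scaled)
    # Round up with probability equal to the fractional part,
    # so the rounding is unbiased inside the representable range.
    q = floor + (1 if rng.random() < scaled - floor else 0)
    return min(max(q * delta, lo), hi)


def swalp_1d(steps=5000, warmup=1000, cycle=1, lr=0.1, seed=0):
    """Low-precision SGD on 0.5 * (w - w_star)^2 with iterate averaging."""
    rng = random.Random(seed)
    w_star = 0.3141  # hypothetical optimum that does not lie on the grid
    w = quantize(1.0, rng=rng)
    avg, n = 0.0, 0
    for t in range(steps):
        # Noisy gradient of the quadratic objective.
        grad = (w - w_star) + rng.gauss(0.0, 0.1)
        # The updated weight is quantized, so it always lives on the grid.
        w = quantize(w - lr * grad, rng=rng)
        # After warm-up, fold every `cycle`-th iterate into a running mean
        # that is kept in higher precision.
        if t >= warmup and (t - warmup) % cycle == 0:
            n += 1
            avg += (w - avg) / n
    return w, avg


if __name__ == "__main__":
    last, averaged = swalp_1d()
    print("last low-precision iterate:", last)
    print("averaged iterate:", averaged)
```

In this toy setting the last iterate is confined to grid points and keeps moving because of gradient and rounding noise, whereas the average is not restricted to the grid and can settle between grid points. That is the intuition behind the quadratic-objective claim in the abstract; the sketch is not a reproduction of the paper's experiments.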

Keywords:

Quantization (physics)
Efficient energy use
Mathematical optimization
Scalability
Software portability
Iterated function
Hydraulic accumulator
Mathematics
Quadratic equation
Convex function
Machine learning
Pattern recognition
Artificial intelligence
Computer science
