Heterogeneous Bitwidth Binarization in Convolutional Neural Networks

Authors:
Joshua Fromm, University of Washington
Shwetak Patel, University of Washington
Matthai Philipose, Microsoft Research

Introduction:

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, the authors show that it is feasible and useful to select bitwidths at the parameter granularity during training.

Abstract:

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. These networks use homogeneous binarization: all parameters in each layer or (more commonly) the whole model have the same low bitwidth (e.g., 2 bits). However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance, a heterogeneously quantized version of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2- and 3-bit parameters that average to just 1.4 bits, can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses to show that the heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient in both circuit area and energy efficiency than their homogeneous counterparts.
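The abstract's central idea, giving each parameter its own bitwidth so that a mix of 1-, 2- and 3-bit weights averages well below 2 bits, can be illustrated with a small sketch. The NumPy code below is a minimal illustration and not the paper's implementation: it assumes a residual-binarization scheme (each additional bit encodes the sign of the remaining approximation error, scaled by the mean residual magnitude), and it spends extra bits on the weights with the largest 1-bit error, a greedy heuristic chosen here for clarity rather than the selection rule used by the authors. The function name heterogeneous_binarize and the avg_bits budget parameter are hypothetical.

```python
import numpy as np

def heterogeneous_binarize(w, avg_bits=1.4, max_bits=3):
    """Approximate the weight tensor `w` with per-parameter bitwidths.

    Each weight is encoded with residual binarization: bit k stores the
    sign of the remaining error, scaled by the mean magnitude of that
    residual. Extra bits (beyond the first) go to the weights whose
    current approximation error is largest, until the average-bitwidth
    budget `avg_bits` is spent. This greedy assignment is illustrative
    only, not necessarily the paper's bit-selection criterion.
    """
    w = w.astype(np.float32)
    flat = w.ravel()
    n = flat.size
    bits = np.ones(n, dtype=np.int32)            # every weight gets >= 1 bit
    approx = np.zeros_like(flat)
    residual = flat.copy()

    # First (mandatory) bit for every parameter.
    scale = np.abs(residual).mean()
    approx += scale * np.sign(residual)
    residual = flat - approx

    # Spend the remaining budget on the worst-approximated weights.
    extra_budget = int(round((avg_bits - 1.0) * n))
    for b in range(2, max_bits + 1):
        if extra_budget <= 0:
            break
        candidates = np.argsort(-np.abs(residual))       # largest error first
        chosen = candidates[:min(extra_budget, n)]
        chosen = chosen[bits[chosen] == b - 1]           # extend only contiguous bit counts
        if chosen.size == 0:
            break
        scale = np.abs(residual[chosen]).mean()
        approx[chosen] += scale * np.sign(residual[chosen])
        residual[chosen] = flat[chosen] - approx[chosen]
        bits[chosen] += 1
        extra_budget -= chosen.size

    return approx.reshape(w.shape), bits.reshape(w.shape)

# Example: binarize a random "layer" and report the realized average bitwidth.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
w_hat, bit_map = heterogeneous_binarize(weights, avg_bits=1.4)
print("average bits:", bit_map.mean())
print("mean abs error:", np.abs(weights - w_hat).mean())
```

Under a 1.4-bit average budget, roughly 40% of the weights receive a second bit in this sketch; the abstract's claim is that a comparable heterogeneous mix can match the accuracy of a homogeneous 2-bit network while using fewer bits on average.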
