BitAllocation: A Resource Allocation Algorithm For Fixed-Point Quantization


The growing size and computational complexity of state-of-the-art deep convolutional networks (DCNs) have greatly increased the memory, time, and power requirements of inference in many computer vision applications. Fixed-point quantization is an effective method that can alleviate some of these issues, but it comes at the cost of reduced classification accuracy. To minimize this accuracy degradation, we developed BitAllocation, a quantization pipeline that can aggressively compress DCNs to fixed-point data types without the need for retraining. Our key insight is to formulate quantization as a variation of the discrete resource allocation problem, where a budget of bits is allocated across the weights and activations in a way that minimizes the total quantization error. Although this problem is NP-hard in general, we develop a near-linear-time algorithm that solves it optimally for practical applications. Using this algorithm and no further retraining, we quantized 7 ImageNet DCNs to an average bitwidth of 5.5-6.25 bits with a 1-3% drop in top-1 accuracy. This corresponded to a 5.51x reduction in model size and a 27.5x reduction in the cost of multiplications. Although this paper presents an application in machine learning quantization, our algorithm can be used in other fields that involve resource allocation, such as economics, project management, and computer systems.
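To make the resource-allocation framing concrete, here is a minimal sketch of greedy bit allocation under a simplifying assumption that each extra bit cuts a layer's quantization error by 4x (the standard uniform-quantizer MSE model). This is an illustration of the problem structure, not the paper's actual algorithm; the function name, the error model, and the per-layer inputs are all assumptions for the example. With diminishing marginal returns like this, the greedy strategy that repeatedly grants a bit to the layer with the largest error reduction is optimal for the separable convex case.

```python
import heapq

def allocate_bits(layer_errors, budget, min_bits=1, max_bits=16):
    """Greedy bit-allocation sketch (illustrative, not the paper's method).

    layer_errors: quantization error of each layer when given min_bits.
    budget: total number of bits to distribute across all layers.
    Assumes each additional bit reduces a layer's MSE by a factor of 4,
    so marginal gains are diminishing and greedy allocation is optimal
    for this separable convex model.
    """
    bits = [min_bits] * len(layer_errors)
    errors = list(layer_errors)
    # Max-heap keyed on the error reduction from granting one more bit.
    heap = [(-(e - e / 4.0), i) for i, e in enumerate(errors)]
    heapq.heapify(heap)
    remaining = budget - min_bits * len(layer_errors)
    while remaining > 0 and heap:
        _gain, i = heapq.heappop(heap)
        if bits[i] >= max_bits:
            continue
        bits[i] += 1
        errors[i] /= 4.0
        remaining -= 1
        heapq.heappush(heap, (-(errors[i] - errors[i] / 4.0), i))
    return bits, sum(errors)
```

For example, allocating a budget of 6 bits across two layers with base errors `[8.0, 1.0]` yields the allocation `[4, 2]`: the noisier layer receives more precision, which is the behavior the formulation is designed to capture.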

Adam Wei
EECS Ph.D. Candidate

I am interested in robotics, manipulation, machine learning, and optimization.