QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks

Vladimir Kryzhanovskiy, Gleb Balitskiy, Nikolay Kozyrskiy, Aleksandr Zuruev

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Research › Peer-reviewed


Modern deep neural networks (DNNs) cannot be effectively used in mobile and embedded devices due to strict requirements for computational complexity, memory, and power consumption. The quantization of weights and feature maps (activations) is a popular approach to solve this problem. Training-aware quantization often shows excellent results but requires a full dataset, which is not always available. Post-training quantization methods, in turn, are applied without fine-tuning but still work well for many classes of tasks like classification, segmentation, and so on. However, they either imply a big overhead for quantization parameters (QPs) calculation at runtime (dynamic methods) or lead to an accuracy drop if pre-computed static QPs are used (static methods). Moreover, most inference frameworks don't support dynamic quantization. Thus we propose a novel quantization approach called QPP: quantization parameter prediction. With a small subset of a training dataset or unlabeled data from the same domain, we find the predictor that can accurately estimate QPs of activations given only the NN's input data. Such a predictor allows us to avoid complex calculation of precise values of QPs while maintaining the quality of the model. To illustrate our method's efficiency, we added QPP into two dynamic approaches: 1) Dense+Sparse quantization, where the predetermined percentage of activations are not quantized, 2) standard quantization with equal quantization steps. We provide experiments on a wide set of tasks including super-resolution, facial landmark, segmentation, and classification.
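The contrast the abstract draws between dynamic QPs, static QPs, and predicted QPs can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_layer` is a hypothetical one-layer stand-in for a network, the quantizer is a plain symmetric uniform quantizer with equal steps, and the predictor is an ordinary least-squares fit from a single input statistic (the maximum absolute input value) to the activation scale. The abstract does not describe the actual form of the QPP predictor.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Symmetric uniform quantization with equal steps, followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def dynamic_scale(act, bits=8):
    """Dynamic QP: derived from the actual activation tensor at runtime."""
    qmax = 2 ** (bits - 1) - 1
    return np.abs(act).max() / qmax

def toy_layer(x, w):
    """Hypothetical stand-in for a network layer (linear map + ReLU)."""
    return np.maximum(x @ w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 32))

# Small calibration set: inputs of varying magnitude (assumed, synthetic data).
calib = [rng.normal(size=(8, 16)) * s for s in rng.uniform(0.5, 4.0, size=64)]

# Predictor features: one input statistic plus a bias term (an assumption).
feats = np.array([[np.abs(x).max(), 1.0] for x in calib])
# Targets: the exact dynamic scales of the activations on the calibration set.
targets = np.array([dynamic_scale(toy_layer(x, w)) for x in calib])

# Least-squares fit of: scale ~ a * max|x| + b
coef, *_ = np.linalg.lstsq(feats, targets, rcond=None)

def predicted_scale(x):
    """Predicted QP: estimated from the network input only, no pass over activations."""
    return float(np.array([np.abs(x).max(), 1.0]) @ coef)

# Static QP: a single pre-computed value reused for every input.
static_scale = float(targets.mean())

x_new = rng.normal(size=(8, 16)) * 3.0
act = toy_layer(x_new, w)
for name, s in [("dynamic", dynamic_scale(act)),
                ("static", static_scale),
                ("predicted", predicted_scale(x_new))]:
    mse = np.mean((act - quantize(act, s)) ** 2)
    print(f"{name:9s} scale={s:.5f} mse={mse:.6f}")
```

In this toy setting the dynamic scale needs a full pass over the activation tensor, the static scale ignores the input entirely, and the predicted scale is computed from the input alone. How closely the predicted value tracks the dynamic one depends on the chosen features and predictor, which are placeholders here.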

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Publisher: IEEE Computer Society
Number of pages: 9
ISBN (electronic): 9781665445092
Publication status: Published - 2021
Event: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States
Duration: 19 Jun 2021 to 25 Jun 2021

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (print): 1063-6919


Conference: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country: United States
City: Virtual, Online



