Quantization noise in low bit quantization and iterative adaptation to quantization noise in quantizable neural networks

D. Chudakov, A. Goncharenko, S. Alyamkin, A. Densidov

Research output: Publications in periodicals › Conference article › peer-review

Abstract

Quantization is one of the most popular and widely used methods of speeding up a neural network. At the moment, the standard is 8-bit uniform quantization. Nevertheless, the use of uniform low-bit quantization (4- and 6-bit quantization) has significant advantages in speed and resource requirements for inference. We present our quantization algorithm that offers advantages when using uniform low-bit quantization. It is faster than quantization-aware training from scratch and more accurate than methods aimed only at selecting thresholds and reducing noise from quantization. We also investigated quantization noise in neural networks for low-bit quantization and concluded that quantization noise is not always a good metric for quantization quality.
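
To make the terms in the abstract concrete, the sketch below applies symmetric uniform quantization at 4, 6 and 8 bits to a random tensor and measures the resulting quantization noise as mean squared error. It is only an illustration under stated assumptions and does not reproduce the authors' algorithm: the function names `uniform_quantize` and `quantization_noise`, the Gaussian stand-in for network weights, and the choice of the maximum absolute value as the clipping threshold are all hypothetical.

```python
import numpy as np


def uniform_quantize(x, num_bits, threshold):
    """Symmetric uniform quantization followed by dequantization.

    Values are clipped to [-threshold, threshold] and rounded to an evenly
    spaced grid with 2**num_bits - 1 levels (an assumed, common scheme).
    """
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 for 4-bit, 127 for 8-bit
    scale = threshold / qmax                 # width of one quantization step
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale


def quantization_noise(x, x_q):
    """Mean squared error between the original and quantized values."""
    return float(np.mean((x - x_q) ** 2))


# Hypothetical stand-in for a layer's weights.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=10_000)

# Assumed threshold choice: the largest absolute value, so nothing is clipped.
threshold = float(np.max(np.abs(weights)))

noise = {
    bits: quantization_noise(weights, uniform_quantize(weights, bits, threshold))
    for bits in (4, 6, 8)
}

for bits, mse in noise.items():
    print(f"{bits}-bit quantization noise (MSE): {mse:.3e}")
```

The noise grows as the bit width shrinks, which is why threshold selection and fine-tuning matter more at 4 and 6 bits. As the abstract notes, a lower noise value does not by itself guarantee better accuracy of the quantized network.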

Original language: English
Article number: 012004
Journal: Journal of Physics: Conference Series
Volume: 2134
Issue number: 1
Status: Published - 20 Dec 2021
Event: 8th International Young Scientists Conference on Information Technologies, Telecommunications and Control Systems, ITTCS 2021 - Innopolis, Russian Federation
Duration: 16 Dec 2021 to 17 Dec 2021

OECD FOS+WOS subject areas

  • 1.03 PHYSICAL SCIENCES AND ASTRONOMY
