| Author | Chen, Chih-Chieh (陳志傑) |
|---|---|
| Title | Post-Training Quantization by Adjusting Zero Points (通過調整零點進行後訓練量化) |
| Advisor | Chang, Shih-Chieh (張世杰) |
| Committee members | Ho, Tsung-Yi (何宗易); Shieh, Ming-Der (謝明得) |
| Degree | Master |
| Department | Department of Computer Science, College of Electrical Engineering and Computer Science |
| Publication year | 2023 |
| Graduation academic year | 111 |
| Language | Chinese |
| Pages | 30 |
| Keywords | post-training quantization; mixed precision; zero-point adjustment |
Quantization is a common technique for model compression; post-training quantization refers to quantizing a pre-trained model without further training. In this thesis, we propose two novel methods for post-training quantization. First, we compare the similarity of output feature maps (OFMs) to perform mixed-precision quantization on the model. Second, we introduce an effective zero-point adjustment method that further improves the accuracy of the quantized model. Experimental results demonstrate the superiority of our approach over previous work: when compressing the ResNet-18 model to the same size, our method achieves 1.7% higher accuracy, and for the ResNet-50 model it achieves a 3% accuracy improvement. These results highlight the effectiveness of our methods in improving the accuracy of quantized models.
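The abstract names the OFM-similarity criterion for mixed precision but not the procedure. Below is a minimal PyTorch sketch of one plausible reading: quantize a layer's weights at a candidate bit-width, measure how close the resulting OFM stays to the full-precision OFM on calibration inputs, and keep the low bit-width only where the similarity remains high. The cosine-similarity metric, the min-max calibration in `quantize_tensor`, the 0.99 cutoff, and the greedy per-layer assignment are all assumptions for illustration, not the thesis's actual algorithm.

```python
import torch
import torch.nn.functional as F

def quantize_tensor(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Uniform asymmetric fake-quantization with min-max calibration:
    q = clamp(round(x / s) + z, 0, 2^b - 1), then x_hat = s * (q - z)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = x.min().item(), x.max().item()
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)  # avoid division by zero
    zero_point = qmin - round(x_min / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return scale * (q - zero_point)

@torch.no_grad()
def ofm_similarity(layer: torch.nn.Module, ifm: torch.Tensor, n_bits: int) -> float:
    """Cosine similarity between the full-precision OFM and the OFM produced
    with fake-quantized weights (a hypothetical per-layer sensitivity proxy)."""
    ofm_fp = layer(ifm)
    w_fp = layer.weight.data.clone()
    layer.weight.data = quantize_tensor(w_fp, n_bits)   # quantize weights in place
    ofm_q = layer(ifm)
    layer.weight.data = w_fp                            # restore full precision
    return F.cosine_similarity(ofm_fp.flatten(), ofm_q.flatten(), dim=0).item()

@torch.no_grad()
def assign_bit_widths(named_layers, calib_ifms, low_bits=4, high_bits=8, thr=0.99):
    """Keep the low bit-width only where OFM similarity stays above `thr`
    (an assumed cutoff); otherwise fall back to the higher precision."""
    return {name: low_bits if ofm_similarity(layer, ifm, low_bits) >= thr else high_bits
            for (name, layer), ifm in zip(named_layers, calib_ifms)}
```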
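For the zero-point adjustment, the usual starting point is the standard asymmetric scheme q = clamp(round(x/s) + z, 0, 2^b − 1) with reconstruction x̂ = s·(q − z). The abstract does not spell out the thesis's adjustment rule, so the sketch below substitutes a simple hypothetical alternative: a grid search over integer offsets around the min-max zero point, keeping the offset that minimizes reconstruction MSE on a calibration tensor.

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int, n_bits: int = 8):
    """Asymmetric uniform quantization: q = clamp(round(x/s) + z, 0, 2^b - 1),
    followed by dequantization x_hat = s * (q - z)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return scale * (q - zero_point)

def adjust_zero_point(x: torch.Tensor, scale: float, zero_point: int,
                      n_bits: int = 8, radius: int = 8) -> int:
    """Try integer offsets around the min-max zero point and keep the one with
    the lowest reconstruction MSE on the calibration tensor x. The search
    radius and the MSE objective are assumptions, not the thesis's rule."""
    best_z, best_err = zero_point, float("inf")
    for dz in range(-radius, radius + 1):
        x_hat = fake_quantize(x, scale, zero_point + dz, n_bits)
        err = torch.mean((x - x_hat) ** 2).item()
        if err < best_err:
            best_z, best_err = zero_point + dz, err
    return best_z
```

Shifting z changes which real values land exactly on integer grid points, so for skewed weight or activation distributions a small search around the min-max choice can reduce rounding error without retraining, which is consistent with the post-training setting the abstract describes.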