
Graduate Student: Chen, Chih-Chieh (陳志傑)
Thesis Title: Post-Training Quantization by Adjusting Zero Points (通過調整零點進行後訓練量化)
Advisor: Chang, Shih-Chieh (張世杰)
Oral Examination Committee: Ho, Tsung-Yi (何宗易); Shieh, Ming-Der (謝明得)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 30
Keywords (Chinese): post-training quantization, mixed precision, zero-point adjustment
Keywords (English): post-training quantization, mixed precision, zero point
Abstract (Chinese)

Quantization is a common model-compression technique; post-training quantization refers to quantizing a pre-trained model without any further training. In this thesis, we propose two novel post-training quantization methods. First, we perform mixed-precision quantization by comparing the similarity of activation values. Second, we introduce an effective zero-point adjustment method that further improves the accuracy of quantized models. Experimental results show that our methods outperform previous approaches: when compressing a ResNet-18 model to the same size, our method improves accuracy by 1.7%, and on ResNet-50 it improves accuracy by 3%. These results highlight the effectiveness of our methods in improving the accuracy of quantized models.


Abstract

Quantization is a common technique for model compression; post-training quantization refers to quantizing a pre-trained model without further training. In this thesis, we propose two novel methods for post-training quantization. First, we compare the similarity of output feature maps (OFMs) to perform mixed-precision quantization of the model. Second, we introduce an effective zero-point adjustment method to further improve the accuracy of quantized models. The experimental results demonstrate the superiority of our approach over previous work: when compressing the ResNet-18 model to the same size, our method achieves 1.7% higher accuracy, and for the ResNet-50 model it achieves a 3% improvement. These results highlight the effectiveness of our methods in improving the accuracy of quantized models.
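
The abstract's two ideas, affine quantization with an explicit zero point and choosing per-layer bit-widths by how closely a quantized layer's output feature map matches the full-precision one, can be illustrated with standard building blocks. The sketch below is a minimal illustration only, assuming per-tensor asymmetric quantization and a cosine-similarity score as the OFM comparison; the function names (affine_quantize, ofm_similarity) and these specific choices are assumptions for illustration, not the thesis's actual zero-point adjustment or bit-width assignment procedure.

# Minimal sketch of asymmetric (zero-point) post-training quantization in NumPy.
# Per-tensor granularity and the cosine-similarity OFM score are assumptions for
# illustration; they are not the thesis's exact method.
import numpy as np

def affine_quantize(x, num_bits=8):
    """Quantize a float tensor to unsigned integers with a scale and zero point."""
    qmin, qmax = 0, (1 << num_bits) - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep the zero point representable
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integers back to approximate float values: x ~= scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

def ofm_similarity(fp_out, quant_out):
    """Cosine similarity between full-precision and quantized output feature maps,
    usable as a per-layer sensitivity score when assigning mixed-precision bit-widths."""
    a, b = fp_out.ravel(), quant_out.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Example: score how well 4-bit vs. 8-bit quantization preserves one layer's output.
x = np.random.randn(1, 64, 56, 56).astype(np.float32)  # stand-in output feature map
for bits in (4, 8):
    q, s, z = affine_quantize(x, num_bits=bits)
    print(bits, "bits -> cosine similarity", round(ofm_similarity(x, dequantize(q, s, z)), 4))

In a mixed-precision setting, a score like this could be computed per layer and the layers whose OFMs degrade most would keep higher bit-widths, while robust layers drop to fewer bits; the thesis's concrete selection rule is not given in this abstract.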

Contents

Acknowledgements (Chinese)
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
2 Previous Works
  2.1 Mixed Precision
  2.2 Clipping
3 Methodology
  3.1 Mixed Precision - Output Feature Map Comparison
  3.2 Zero Point Adjustment
4 Experiments
  4.1 Calibration data for post-training quantization
  4.2 Datasets
  4.3 Experimental setting
  4.4 Experimental results
5 Conclusions
References

