
Graduate Student: Chen, Chih-Chieh (陳志傑)
Thesis Title: Post-Training Quantization by Adjusting Zero Points (通過調整零點進行後訓練量化)
Advisor: Chang, Shih-Chieh (張世杰)
Oral Examination Committee: Ho, Tsung-Yi (何宗易); Shieh, Ming-Der (謝明得)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 30
Keywords (Chinese): post-training quantization, mixed precision, zero-point adjustment
Keywords (English): post-training quantization, mixed precision, zero point
Abstract (Chinese)

Quantization is a common model-compression technique; post-training quantization refers to quantizing a pre-trained model without any further training. In this thesis, we propose two novel post-training quantization methods. First, we perform mixed-precision quantization by comparing the similarity of activation values. Second, we introduce an effective zero-point adjustment method that further improves the accuracy of quantized models. Experimental results show that our methods outperform previous approaches: when compressing a ResNet-18 model to the same size, our method improves accuracy by 1.7%, and on ResNet-50 it improves accuracy by 3%. These results highlight the effectiveness of our methods in improving the accuracy of quantized models.


Abstract

Quantization is a common technique for model compression; post-training quantization refers to quantizing a pre-trained model without further training. In this thesis, we propose two novel methods for post-training quantization. First, we compare the similarity of output feature maps (OFMs) to perform mixed-precision quantization of the model. Second, we introduce an effective zero-point adjustment method to further improve the accuracy of quantized models. The experimental results demonstrate the superiority of our approach over previous work: when compressing the ResNet-18 model to the same size, our method achieves 1.7% higher accuracy, and for the ResNet-50 model it achieves a 3% improvement. These results highlight the effectiveness of our methods in improving the accuracy of quantized models.
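
The abstract's two ideas, affine quantization with an explicit zero point and choosing per-layer bit-widths by how closely a quantized layer's output feature map matches the full-precision one, can be illustrated with standard building blocks. The sketch below is a minimal illustration only, assuming per-tensor asymmetric quantization and a cosine-similarity score as the OFM comparison; the function names (affine_quantize, ofm_similarity) and these specific choices are assumptions for illustration, not the thesis's actual zero-point adjustment or bit-width assignment procedure.

# Minimal sketch of asymmetric (zero-point) post-training quantization in NumPy.
# Per-tensor granularity and the cosine-similarity OFM score are assumptions for
# illustration; they are not the thesis's exact method.
import numpy as np

def affine_quantize(x, num_bits=8):
    """Quantize a float tensor to unsigned integers with a scale and zero point."""
    qmin, qmax = 0, (1 << num_bits) - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep the zero point representable
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integers back to approximate float values: x ~= scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

def ofm_similarity(fp_out, quant_out):
    """Cosine similarity between full-precision and quantized output feature maps,
    usable as a per-layer sensitivity score when assigning mixed-precision bit-widths."""
    a, b = fp_out.ravel(), quant_out.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Example: score how well 4-bit vs. 8-bit quantization preserves one layer's output.
x = np.random.randn(1, 64, 56, 56).astype(np.float32)  # stand-in output feature map
for bits in (4, 8):
    q, s, z = affine_quantize(x, num_bits=bits)
    print(bits, "bits -> cosine similarity", round(ofm_similarity(x, dequantize(q, s, z)), 4))

In a mixed-precision setting, a score like this could be computed per layer and the layers whose OFMs degrade most would keep higher bit-widths, while robust layers drop to fewer bits; the thesis's concrete selection rule is not given in this abstract.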

Contents

Acknowledgements (Chinese)
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
2 Previous Works
  2.1 Mixed Precision
  2.2 Clipping
3 Methodology
  3.1 Mixed Precision - Output Feature Map Comparison
  3.2 Zero Point Adjustment
4 Experiments
  4.1 Calibration data for post-training quantization
  4.2 Datasets
  4.3 Experimental setting
  4.4 Experimental results
5 Conclusions
References

