Graduate Student: Wei, Tzu-Hsuan (魏子軒)
Thesis Title: Mitigate Distribution Shift with Roughness Aware Update (藉由粗糙度感知更新緩解分佈偏移)
Advisor: Chang, Shih-Chieh (張世杰)
Committee Members: Chen, Yun-Nung (陳縕儂); Ho, Tsung-Yi (何宗義); Chang, Shih-Chieh (張世杰)
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 26
Chinese Keywords: 分布偏移 (distribution shift), 泛化性 (generalization), 模型優化 (model optimization)
English Keywords: Distribution shift, Generalization, Optimization
Distribution shift is a common phenomenon during the training and deployment of neural networks, and its occurrence has a large impact on network performance. Previous works have discussed two cases of distribution shift separately: the first concerns shifts in the network weights, and the second concerns changes in the input data, with methods proposed to address each problem individually. In this thesis, we consider both kinds of distribution shift together and propose two optimization techniques for the training process, Roughness-Aware Update and Gradient Masking, which improve the generalization ability of the network by guiding training to converge to flatter regions of the loss surface, thereby mitigating the impact of distribution shift. Finally, we demonstrate the effectiveness of our methods with experimental results, and show that they can be combined with existing improvement techniques to further enhance generalization and achieve better accuracy on each dataset.
Distribution shift is a common phenomenon in training and deploying neural networks, and it largely affects model performance. Previous works have considered two cases of distribution shift, on either model weights or input data, and have proposed methods that address these issues separately.
We propose two optimization techniques, Roughness-Aware Update and Gradient Masking, to mitigate the effect of distribution shift by improving network generalization, guiding the optimization to converge to solutions located in flatter regions of the loss surface.
Our experiments on corrupted image datasets and a simulated environment with noisy weights show that, when combining our techniques with existing leading optimization methods, we can further improve the generalization of the model solution and achieve even better performance.
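The abstract names the two techniques but not their update rules, which are defined in the thesis body rather than on this record page. Below is a minimal, hypothetical sketch of what a flatness-seeking training step of this kind could look like, assuming a sharpness-aware two-step scheme (perturb the weights along the normalized gradient, re-evaluate the gradient at the perturbed point, then update the original weights) and illustrating gradient masking as zeroing the smallest-magnitude gradient entries. The function name `roughness_aware_step` and the parameters `rho` and `mask_ratio` are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch of a flatness-seeking update with gradient masking.
# Not the thesis' actual algorithm; the assumptions are noted inline.
import torch


def roughness_aware_step(model, loss_fn, x, y, optimizer, rho=0.05, mask_ratio=0.5):
    optimizer.zero_grad()

    # 1) Gradient at the current weights.
    loss_fn(model(x), y).backward()

    # 2) Probe local roughness: perturb the weights by a small step `rho`
    #    along the normalized ascent direction (an assumption, in the spirit
    #    of sharpness-aware training schemes).
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)

    # 3) Gradient at the perturbed point.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # 4) Restore the original weights, then mask the smallest-magnitude
    #    gradient entries (illustrating "gradient masking") before stepping.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
            if p.grad is None:
                continue
            k = max(1, int(mask_ratio * p.grad.numel()))
            threshold = p.grad.abs().flatten().kthvalue(k).values
            p.grad.mul_((p.grad.abs() >= threshold).to(p.grad.dtype))

    optimizer.step()
    optimizer.zero_grad()
```

Under these assumptions, a training loop would call `roughness_aware_step(model, loss_fn, x, y, optimizer)` once per mini-batch in place of the usual forward/backward/step sequence.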