Graduate Student: Hsu, Yu-Hung (徐郁閎)
Thesis Title: Empirical Study of The Robustness in Transfer Learning on Small Dataset (遷移學習於小資料集上強健性的實證研究)
Advisor: Wu, Shan-Hung (吳尚鴻)
Committee Members: Chiu, Wei-Chen (邱維辰), Liu, Yi-Wen (劉奕汶), Shen, Chih-Ya (沈之涯)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Computer Science
Publication Year: 2023
Graduation Academic Year: 112
Language: English
Pages: 31
Keywords: Transfer Learning, Robustness, Neural Tangent Kernel, Black-box Attack
In recent years, deep neural networks have achieved remarkable progress but with the trade-off of larger and deeper architectures, which pose challenges in scenarios with limited data and high training costs. To address these issues, the concepts of foundation models and transfer learning have become crucial.
Transfer learning harnesses the knowledge from extensively pre-trained models to enhance performance on smaller target datasets, but it struggles with ensuring robustness, particularly against adversarial attacks. Conventional Adversarial Training (AT) is costly and unsuitable for small datasets, and the quest for robustness often leads to overfitting on limited data.
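To make the kind of adversarial attack at issue concrete, the following is a minimal sketch of the Fast Gradient Sign Method (FGSM) against a toy linear classifier. The model, data, and perturbation budget `eps` are illustrative assumptions, not the thesis's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
w = rng.standard_normal(d)  # a fixed linear classifier, illustrative only

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps):
    """FGSM: step in the sign of the input-gradient of the logistic loss.
    For a linear model, grad_x loss = (sigmoid(w.x) - y) * w, with y in {0, 1}."""
    grad = (sigmoid(w @ x) - y) * w
    return x + eps * np.sign(grad)

x = rng.standard_normal(d)
y = 1.0
x_adv = fgsm(x, y, eps=0.1)
# x_adv stays within an L-inf ball of radius eps around x, yet raises the loss,
# i.e. lowers the predicted probability of the true class.
```

A single gradient-sign step like this is cheap; AT becomes costly because it re-generates such perturbations (often multi-step variants) for every batch throughout training.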
This paper aims to address the gap in understanding the transfer learning of clean pre-trained models on limited target datasets. We introduce the Neural Tangent Kernel (NTK), a theory that describes neural networks as their widths approach infinity and that has shown promise in improving small-dataset performance. The paper presents NTK-attack, a novel black-box adversarial attack that eliminates the need to train a substitute model.
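The NTK idea can be illustrated with a minimal sketch: compute the empirical NTK of a wide one-hidden-layer ReLU network and use it for kernel ridge regression, so predictions come from a kernel solve rather than gradient training. This is an illustrative toy, not the thesis's NTK-attack; the network, toy data, width `m`, and ridge term are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 4096  # input dim; large hidden width to approach the NTK regime

# Random init of f(x) = a . relu(W x) / sqrt(m)
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def ntk_features(x):
    """Gradient of f at x w.r.t. all parameters (the NTK feature map)."""
    pre = W @ x                     # pre-activations, shape (m,)
    act = np.maximum(pre, 0.0)      # ReLU
    grad_a = act / np.sqrt(m)                              # df/da
    grad_W = ((a * (pre > 0))[:, None] * x) / np.sqrt(m)   # df/dW
    return np.concatenate([grad_a, grad_W.ravel()])

def ntk_gram(X1, X2):
    """Empirical NTK Gram matrix K[i, j] = <grad f(x_i), grad f(x_j)>."""
    F1 = np.stack([ntk_features(x) for x in X1])
    F2 = np.stack([ntk_features(x) for x in X2])
    return F1 @ F2.T

# Kernel ridge regression with the NTK: no gradient training of the net itself.
X_train = rng.standard_normal((20, d))
y_train = np.sign(X_train[:, 0])        # toy binary labels
K = ntk_gram(X_train, X_train)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)

X_test = rng.standard_normal((5, d))
y_pred = ntk_gram(X_test, X_train) @ alpha
```

Because inference reduces to a closed-form kernel solve on the small target set, an NTK-based attacker needs no substitute-model training, which is the property the abstract alludes to.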
We conduct multiple experiments to study NTK-attack in transfer learning and attempt to explain why NTK-attack fails to improve robustness in this setting.