| Field | Value |
|---|---|
| Student | 吳奕萱 Wu, Yi-Hsuan |
| Thesis Title | AdaDefense: 利用執行期適配增進對攻擊例的穩健性 (AdaDefense: Improving Adversarial Robustness via Runtime Adaptation) |
| Advisor | 吳尚鴻 Wu, Shan-Hung |
| Committee Members | 李哲榮 Lee, Che-Rung; 邱維辰 Chiu, Wei-Chen |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Publication Year | 2019 |
| Graduation Academic Year | 108 |
| Language | Chinese |
| Pages | 30 |
| Keywords (Chinese) | 攻擊例 (adversarial examples), 執行期適配 (runtime adaptation), 穩健性 (robustness) |
Deep neural networks have achieved remarkable success across many application domains. However, multiple recent studies have found that deep neural networks can be attacked with adversarial examples. These examples, carrying perturbations imperceptible to the human eye, fool a deep neural network into classifying them as the wrong class with very high confidence. Although defenses against adversarial examples have been studied extensively, no existing defense can fully withstand every attack.
Recent work suggests that adversarial examples are unavoidable because we learn a high-dimensional manifold from data whose density in that high-dimensional space is very low. One way to increase data density is adversarial training, which perturbs the training data into adversarial examples in every training epoch and trains the model on these adversarial examples to improve its robustness.
Even with adversarial training, learning the complete high-dimensional manifold is impractical at current data scales, so we instead try to improve model robustness at runtime. After generating a large set of adversarial examples, as adversarial training does, we adapt the model to the k adversarial examples nearest to the test example in feature space, and only then classify it. This adaptation exploits information from the test example to shrink the region of the manifold that must be learned, and to unlearn harmful patterns that the model inevitably picks up during training.
Our experiments on image classification show that, under a variety of settings, runtime adaptation allows our algorithm to raise model robustness to state-of-the-art performance.
Deep neural networks have recently achieved great success in multiple domains. However, they have been shown to be adversarially vulnerable: adversarial examples with imperceptible perturbations can fool a neural network into making wrong predictions with high confidence.
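To make the threat concrete, below is a minimal sketch of one standard way to craft such perturbations, the fast gradient sign method (FGSM). It assumes a PyTorch classifier with inputs in [0, 1]; the model, data, and step size `eps` are placeholders, not the exact setup evaluated in this thesis.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Craft adversarial examples with one signed-gradient step of size eps (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # loss of the clean prediction
    loss.backward()                           # gradient of the loss w.r.t. the input
    # Move every pixel slightly in the direction that increases the loss,
    # then clip back to the valid pixel range so the change stays imperceptible.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```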
Although defending against adversarial examples has drawn a lot of attention, none of the current defense strategies is satisfactory. Recent studies show that adversarial vulnerability may be unavoidable, since we learn a high-dimensional data manifold from data of low density. To increase data density, adversarial training generates more data by augmenting the training data into adversarial examples in each training epoch, and improves adversarial robustness by training the neural network on these adversarial examples.
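The adversarial-training recipe described above can be sketched roughly as follows. It reuses the hypothetical `fgsm_attack` helper from the previous snippet; the thesis's actual attack, optimizer, and hyperparameters may differ (a multi-step PGD attack is a common alternative).

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=0.03):
    """One epoch of adversarial training: perturb each batch into adversarial
    examples, then update the model on the perturbed batch."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_attack(model, x, y, eps)    # augment the batch into adversarial examples
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)  # fit the model to the adversarial batch
        loss.backward()
        optimizer.step()
```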
Because learning the real-world data manifold is impossible at current dataset scales even with adversarial training, we instead try to improve adversarial robustness at runtime. After augmenting all training examples into adversarial examples, as adversarial training does, our model adapts to the k nearest neighboring adversarial examples of the test example in some feature space before making a prediction. The adaptation exploits information from the test example, which shrinks the region of the manifold that needs to be learned, and quickly unlearns harmful patterns that are inevitably picked up during training.
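The runtime-adaptation idea can be sketched roughly as below. This illustrates the description above rather than the thesis's exact procedure: the names `feature_fn` and `adv_bank_*` (precomputed adversarial examples, their labels, and their features), the Euclidean nearest-neighbor search, the SGD optimizer, and the number of adaptation steps are all assumptions made for this example.

```python
import copy
import torch
import torch.nn.functional as F

def adapt_and_predict(model, feature_fn, x_test,
                      adv_bank_x, adv_bank_y, adv_bank_feat,
                      k=16, steps=5, lr=1e-3):
    """Adapt a copy of the model to the k adversarial training examples nearest to
    the test example in feature space, then classify the test example with it."""
    with torch.no_grad():
        f = feature_fn(x_test.unsqueeze(0))               # feature of the test example, shape (1, d)
        dists = torch.cdist(f, adv_bank_feat).squeeze(0)  # distance to every stored adversarial example
        idx = dists.topk(k, largest=False).indices        # indices of the k nearest neighbors

    adapted = copy.deepcopy(model)                        # adapt a throwaway copy, keep the original intact
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    adapted.train()
    for _ in range(steps):                                # a few fine-tuning steps on the neighbors
        opt.zero_grad()
        loss = F.cross_entropy(adapted(adv_bank_x[idx]), adv_bank_y[idx])
        loss.backward()
        opt.step()

    adapted.eval()
    with torch.no_grad():
        return adapted(x_test.unsqueeze(0)).argmax(dim=1)  # final prediction after adaptation
```

Adapting a throwaway copy of the model for each test example keeps the adaptations independent of one another, at the cost of extra computation at inference time.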
In experiments on the image domain, we show that, via runtime adaptation to adversarial examples, our algorithm improves model robustness to state-of-the-art performance under different settings.