
Author: Liang, Jhao-Hong (梁肇宏)
Title: Improving Model Robustness on Adversarial Examples via Guided Complement Entropy (Chinese title: 引導式互補熵之增強神經網路架構於對抗性樣本防護性)
Advisor: Chang, Shih-Chieh (張世杰)
Committee Members: Chen, Hwann-Tzong (陳煥宗); Chen, Yong-Sheng (陳永昇)
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (2018-2019)
Language: English
Number of Pages: 32
Keywords: Neural network, Machine learning, Adversarial example
Abstract (Chinese, translated): Current neural networks are highly vulnerable to adversarial examples. Adding small perturbations to the features of an original image can produce adversarial examples that drastically degrade a model's classification accuracy, so defending neural network architectures against adversarial examples has become an important issue. In this thesis we propose a new loss function, Guided Complement Entropy (GCE). Models trained with GCE have two main properties: (a) the predicted probabilities of all non-ground-truth classes are flattened, and (b) the predicted probability of the ground-truth class is maximized. Through these two properties, models trained with GCE learn representations that better separate different classes, which improves their robustness against adversarial examples. Moreover, under white-box attacks, whether the network is trained normally or adversarially, models trained with GCE defend against adversarial examples better than models trained with cross-entropy; on standard image classification tasks, GCE-trained models also achieve higher accuracy than cross-entropy-trained models.
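As a hedged illustration of these two properties, one way such an objective can be written for a single example is sketched below, where \hat{y} is the softmax output over K classes, g is the ground-truth class, and \alpha is a guiding exponent; the exact normalization and scaling used in the thesis may differ.

    % One possible form of the guided complement entropy objective (to be minimized).
    % The guiding exponent \alpha and the normalization are illustrative assumptions.
    \mathcal{L}_{\mathrm{GCE}}(\hat{y}, g)
      = -\,\hat{y}_{g}^{\,\alpha}
        \sum_{j \neq g}
          \frac{\hat{y}_{j}}{1 - \hat{y}_{g}}
          \log\!\left( \frac{\hat{y}_{j}}{1 - \hat{y}_{g}} \right)

The inner sum is the entropy of the normalized non-ground-truth (complement) distribution, which is largest when those probabilities are flat (property (a)); the factor \hat{y}_{g}^{\alpha} grows as the ground-truth probability approaches one (property (b)), so minimizing the negative product encourages both effects jointly.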


    Abstract (English): Improving model robustness against adversarial attacks has been shown to be an essential issue. Model predictions can be drastically misled by adding small adversarial perturbations to images. In this thesis, we propose a new training objective, "Guided Complement Entropy" (GCE), that has desirable dual effects: (a) neutralizing the predicted probabilities of incorrect classes (non-ground-truth classes), and (b) maximizing the predicted probability of the ground-truth class, particularly when (a) is achieved. Training with GCE encourages models to learn latent representations in which samples of different classes form distinct clusters, which, we argue, improves model robustness against adversarial perturbations. Furthermore, compared with models trained with cross-entropy, the same models trained with GCE achieve significantly better robustness against white-box adversarial attacks, both with and without adversarial training. When no attack is present, models trained with GCE also outperform those trained with cross-entropy in terms of accuracy.
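    A minimal PyTorch-style sketch of such a training objective is given below, assuming the formulation sketched above; the function name, the guiding exponent alpha, and the numerical-stability constants are illustrative choices rather than the thesis' exact implementation.

    # Hedged sketch of a Guided Complement Entropy style loss, assuming the
    # formulation above; alpha and the epsilons are illustrative, not the thesis' settings.
    import torch
    import torch.nn.functional as F

    def guided_complement_entropy(logits, targets, alpha=0.2):
        """logits: (N, K) raw scores; targets: (N,) ground-truth class indices."""
        probs = F.softmax(logits, dim=1)                    # predicted distribution
        yg = probs.gather(1, targets.unsqueeze(1))          # ground-truth probability, shape (N, 1)

        # Normalized distribution over the non-ground-truth (complement) classes.
        comp = probs / (1.0 - yg + 1e-7)
        comp = comp.scatter(1, targets.unsqueeze(1), 0.0)   # drop the ground-truth column

        # Entropy of the complement distribution: maximal when it is flat (effect (a)).
        comp_entropy = -(comp * torch.log(comp + 1e-10)).sum(dim=1)

        # Guiding factor couples flattening with a confident ground-truth prediction (effect (b)).
        guide = yg.squeeze(1).pow(alpha)

        # Minimizing the negative guided complement entropy maximizes both terms.
        return -(guide * comp_entropy).mean()

    In training, a loss of this kind would take the place of cross-entropy when computing gradients for the model parameters.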

    Table of Contents:
    1 Introduction
    2 Related Work
      2.1 Adversarial Attacks
      2.2 Adversarial Defenses
      2.3 Complement Objective Training
    3 Guided Complement Entropy
      3.1 Guided Complement Entropy
      3.2 Synthetic Data Analysis
    4 Experiments
      4.1 Balancing scaling objective
      4.2 Adversarial setting
      4.3 Performance on natural examples
      4.4 Robustness to White-box attacks
      4.5 Robustness to adversarial training
    5 Conclusion
    References
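    Sections 4.4 and 4.5 evaluate robustness under white-box attacks, with and without adversarial training. As a hedged illustration of that setting, the sketch below generates FGSM adversarial examples (Goodfellow et al., 2015) against a trained classifier; the epsilon value and the use of cross-entropy as the attack loss are illustrative assumptions, not the thesis' exact configuration.

    # Hedged sketch of a one-step white-box FGSM attack of the kind evaluated in
    # Section 4.4; epsilon and the attack loss are illustrative, not the thesis' settings.
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, images, labels, epsilon=8.0 / 255):
        """Return x' = clip(x + epsilon * sign(grad_x loss(model(x), y)), 0, 1)."""
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        # One signed-gradient step, then clip back to the valid pixel range [0, 1].
        adv = images + epsilon * images.grad.sign()
        return adv.clamp(0.0, 1.0).detach()

    Robustness is then typically reported as the model's accuracy on these adversarial examples, comparing models trained with cross-entropy against the same models trained with GCE.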

