Student: 許瀞云 (Hsu, Ching-Yun)
Thesis Title: 利用神經正切泛化攻擊以防禦聲音驗證碼 (Defending an Audio CAPTCHA system by Neural Tangent Generalization Attack)
Advisor: 劉奕汶 (Liu, Yi-Wen)
Committee Members: 張正尚 (Chang, Cheng-Shang), 吳尚鴻 (Wu, Shan-Hung), 冀泰石 (Chi, Tai-Shih)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2022
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 44
Keywords: Audio CAPTCHA (聲音驗證碼), Adversarial machine learning (對抗式機器學習), Neural tangent kernel (神經正切核), Generalization attack (泛化攻擊)
Abstract (Chinese, translated):
A CAPTCHA is a security measure for protecting personal privacy. A well-designed CAPTCHA system should be convenient for human users while effectively preventing automated programs from easily breaking it. This thesis focuses on audio CAPTCHAs, i.e., CAPTCHAs presented in the form of audio files. In recent years, deep learning and neural network techniques have become widely popular and are applied throughout many fields. As deep learning continues to mature, the potential risks it poses to personal privacy and security have drawn increasing attention. For these reasons, this study aims to develop an audio CAPTCHA that can effectively withstand deep-learning-based attacks. The Neural Tangent Generalization Attack (NTGA) is a form of adversarial machine learning that aims to produce datasets that cannot be exploited by neural networks, and its effectiveness has been confirmed experimentally. Our goal is to use NTGA to generate an audio dataset that neural networks cannot learn from, and to build audio CAPTCHAs from this dataset; such CAPTCHAs should therefore be able to resist deep-learning-based attacks. Our experimental results on the effectiveness of NTGA-generated audio datasets show that their resistance to adversarial machine learning remains to be demonstrated. Nevertheless, this thesis underscores the importance of research on adversarial machine learning in the audio domain.
Abstract (English):
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a security measure designed to protect data privacy. A well-designed CAPTCHA should reliably differentiate legitimate users from online bots. In this work, we focus on audio CAPTCHAs, in which the CAPTCHA challenge is presented in audio format. Over the years, deep learning and neural networks have become widely popular and are commonly used in various fields. The rapid development of deep learning, however, can potentially cause security threats and privacy issues. Hence, we focus on presenting an audio CAPTCHA system that is resistant to deep learning attacks. Neural Tangent Generalization Attack (NTGA) is an adversarial machine learning algorithm that aims to output datasets that are unlearnable by deep neural networks; it has been proven effective on datasets such as MNIST [1], CIFAR-10 [2], and ImageNet [3]. Our goal is to utilize NTGA to create an unlearnable audio dataset, so that an audio CAPTCHA system generated from such a dataset would be highly secure. Experiments are conducted to evaluate whether the effectiveness of NTGA can be replicated in the audio domain. Current empirical results show that the proposed audio CAPTCHA system is still vulnerable to deep learning attacks. Nevertheless, adversarial machine learning in the audio domain warrants further investigation.
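For context, the core mechanism behind NTGA, as formulated in [19], can be summarized in one bi-level objective; the notation below is ours rather than the thesis's (P is the perturbation added to the training inputs, ε its budget, η the learning rate, and t the training time). NTGA uses the fact that an infinitely wide network trained by gradient descent on a mean-squared-error loss has a closed-form predictor governed by the neural tangent kernel K [22, 29], and searches for the bounded perturbation that maximizes this surrogate's loss on clean validation data:

$$
\max_{\|P\|_\infty \le \epsilon} \; \mathcal{L}\Big( \bar{f}\big(\mathcal{X}_{\mathrm{val}};\ \mathcal{X}_{\mathrm{train}} + P,\ \mathcal{Y}_{\mathrm{train}}\big),\ \mathcal{Y}_{\mathrm{val}} \Big),
\quad \text{where} \quad
\bar{f}(x;\ \mathcal{X}, \mathcal{Y}) = K(x, \mathcal{X})\, K(\mathcal{X}, \mathcal{X})^{-1} \big( I - e^{-\eta K(\mathcal{X}, \mathcal{X})\, t} \big)\, \mathcal{Y}.
$$

The perturbed data looks essentially unchanged to humans, yet networks trained on it generalize poorly. Judging from the cited references on the mel scale and the Griffin-Lim algorithm [31]-[33], the audio version of this pipeline presumably applies the perturbation to (mel-)spectrograms and inverts them back to waveforms, though the abstract itself does not spell this out.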
[1] Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database,” ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
[2] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009.
[3] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and F. F. Li, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
[4] C. H. Huang, P. H. Wu, Y. W. Liu, and S. H. Wu, “Attacking and defending behind a psychoacoustics-based CAPTCHA,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 895–899, IEEE, 2021.
[5] V. P. Singh and P. Pal, “Survey of different types of CAPTCHA,” International Journal of Computer Science and Information Technologies, vol. 5, no. 2, pp. 2242–2245, 2014.
[6] S. Kulkarni and H. Fadewar, “Audio CAPTCHA techniques: A review,” in Proceedings of the Second International Conference on Computational Intelligence and Informatics, pp. 359–368, Springer, 2018.
[7] J. Holman, J. Lazar, J. Feng, and J. D’Arcy, “Developing usable CAPTCHAs for blind users,” in ASSETS’07: Proceedings of the Ninth International ACM SIGACCESS Conference on Computers and Accessibility, pp. 245–246, January 2007.
[8] N. Tariq and F. A. Khan, “Match-the-sound CAPTCHA,” in Information Technology-New Generations, pp. 803–808, Springer, 2018.
[9] H. Meutzner and D. Kolossa, “A non-speech audio CAPTCHA based on acoustic event detection and classification,” in 2016 24th European Signal Processing Conference (EUSIPCO), pp. 2250–2254, IEEE, 2016.
[10] J. Lazar, J. Feng, T. Brooks, G. Melamed, B. Wentz, J. Holman, A. Olalere, and N. Ekedebe, “The SoundsRight CAPTCHA: An improved approach to audio human interaction proofs for blind users,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2267–2276, Association for Computing Machinery, 2012.
[11] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial machine learning,” in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58, Association for Computing Machinery, 2011.
[12] G. Xu, H. Li, H. Ren, K. Yang, and R. H. Deng, “Data security issues in deep learning: Attacks, countermeasures, and opportunities,” IEEE Communications Magazine, vol. 57, no. 11, pp. 116–122, 2019.
[13] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack method against neural networks,” 2017. arXiv:1703.01340.
[14] K. J. Piczak, “ESC: Dataset for environmental sound classification,” in Proceedings of the 23rd Annual ACM Conference on Multimedia, pp. 1015–1018, ACM Press, 2015.
[15] S. Becker, M. Ackermann, S. Lapuschkin, K.-R. Müller, and W. Samek, “Interpreting and explaining deep neural networks for classification of audio signals,” 2018. arXiv:1807.03418.
[16] N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool, “Deep convolutional neural networks and data augmentation for acoustic event detection,” 2016. arXiv:1604.07160.
[17] T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, and K. Takeda, “Duration-controlled LSTM for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2059–2070, 2017.
[18] E. Çakır, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, “Convolutional recurrent neural networks for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1291–1303, 2017.
[19] C. H. Yuan and S. H. Wu, “Neural tangent generalization attacks,” in International Conference on Machine Learning, pp. 12230–12240, PMLR, 2021.
[20] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[21] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[22] A. Jacot, F. Gabriel, and C. Hongler, “Neural tangent kernel: Convergence and generalization in neural networks,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[23] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows: in Metric Spaces and in the Space of Probability Measures. Springer Science & Business Media, 2005.
[24] S. Arora, S. S. Du, W. Hu, Z. Li, R. Salakhutdinov, and R. Wang, “On exact computation with an infinitely wide neural net,” 2019. arXiv:1904.11955.
[25] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” 2012. arXiv:1206.6389.
[26] S. Mei and X. Zhu, “Using machine teaching to identify optimal training-set attacks on machine learners,” in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[27] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli, “Is feature selection secure against training data poisoning?” in International Conference on Machine Learning, pp. 1689–1698, PMLR, 2015.
[28] A. Demontis, M. Melis, M. Pintor, M. Jagielski, B. Biggio, A. Oprea, C. Nita-Rotaru, and F. Roli, “Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks,” in 28th USENIX Security Symposium (USENIX Security 19), pp. 321–338, 2019.
[29] J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, J. Sohl-Dickstein, and J. Pennington, “Wide neural networks of any depth evolve as linear models under gradient descent,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[30] L. Chizat, E. Oyallon, and F. Bach, “On lazy training in differentiable programming,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[31] S. S. Stevens, J. Volkmann, and E. B. Newman, “A scale for the measurement of the psychological magnitude pitch,” The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185–190, 1937.
[32] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984.
[33] N. Perraudin, P. Balazs, and P. L. Søndergaard, “A fast Griffin-Lim algorithm,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 1–4, 2013.
[34] R. Gerchberg and W. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” SPIE Milestone Series MS, vol. 94, p. 646, 1994.
[35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014. arXiv:1412.6980.