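Student: | 簡易 Chien, Yi |
---|---|
Thesis title: | Predicting Injection Vulnerabilities in Web Applications (Chinese title: A Static Detection Method for Web Pages Based on Machine Learning) |
Advisor: | 孫宏民 Sun, Hung-Min |
Oral defense committee: | 曾文貴 Tzeng, Wen-Guey; 顏嵩銘 Yen, Sung-Ming |
Degree: | Master |
Department: | |
Year of publication: | 2017 |
Academic year of graduation: | 105 (ROC calendar, i.e., 2016–2017) |
Language: | English |
Pages: | 71 |
Chinese keywords: | Static analysis, injection-type web attacks, PHP, JavaScript, machine learning |
English keywords: | Static analysis, Injection type vulnerability, PHP, JavaScript, Machine learning |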
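The web has become inseparable from modern life. Booking flights, shopping online and
browsing Facebook are now part of our daily routines. These activities also expose a
great deal of users' private personal information, which makes web security all the
more important.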
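Today, a large share of websites use PHP for back-end development. Major sites and
projects such as Facebook, Wikipedia and WordPress develop and maintain their back ends
in PHP. Node.js, meanwhile, has become popular in recent years. Its advantage is that a
single language, JavaScript, covers both the front end and the back end, and more and
more developers are choosing Node.js for web development.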
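In this thesis, we propose a static analysis method for detecting injection-type
vulnerabilities in PHP and JavaScript. The method extracts features from PHP and
JavaScript files to represent each file's vulnerability behavior. These features are
then used with machine learning to train vulnerability detection models. Given a PHP or
JavaScript file, our system returns the injection-type vulnerabilities the file may
contain and reports them to the developer. Developers can then check their web pages
before they go live and patch the potential vulnerabilities.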
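The abstract does not say which features are extracted from each file. As a rough illustration only, the Python sketch below assumes a simple keyword-count approach. It counts occurrences of common PHP and Node.js input sources, injection sinks and sanitizers. The names `PHP_FEATURES`, `JS_FEATURES` and `extract_features`, and every regular expression, are hypothetical and not taken from the thesis.

```python
import re

# Hypothetical feature vocabulary. The thesis abstract does not list its actual
# features; these input sources, injection sinks, and sanitizers are
# illustrative assumptions only.
PHP_FEATURES = {
    "source_get": r"\$_GET\b",
    "source_post": r"\$_POST\b",
    "source_cookie": r"\$_COOKIE\b",
    "sink_echo": r"\becho\b",
    "sink_mysql_query": r"\bmysql_query\s*\(",
    "sink_include": r"\b(?:include|require)(?:_once)?\b",
    "sanitizer_htmlspecialchars": r"\bhtmlspecialchars\s*\(",
    "sanitizer_intval": r"\bintval\s*\(",
}

JS_FEATURES = {
    "source_req_query": r"\breq\.query\b",
    "source_req_body": r"\breq\.body\b",
    "sink_eval": r"\beval\s*\(",
    "sink_res_send": r"\bres\.send\s*\(",
    "sink_inner_html": r"\.innerHTML\b",
    "sanitizer_escape": r"\bescape\w*\s*\(",
}

PATTERNS = {"php": PHP_FEATURES, "javascript": JS_FEATURES}


def extract_features(source: str, language: str) -> dict:
    """Count how often each hypothetical feature pattern occurs in `source`.

    Plain regular expressions keep the sketch short; they will also match
    inside comments and string literals, which an AST-based extractor avoids.
    """
    patterns = PATTERNS[language]  # raises KeyError for unsupported languages
    return {name: len(re.findall(regex, source)) for name, regex in patterns.items()}


if __name__ == "__main__":
    php_code = '<?php $id = $_GET["id"]; mysql_query("SELECT * FROM t WHERE id=" . $id); ?>'
    print(extract_features(php_code, "php"))
```

The resulting count vectors can serve as one row of training data per file. The design choice here is simplicity, not precision.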
Surfing the web has become part of modern life, through activities such as online
shopping, booking flight tickets and browsing Facebook. Our daily lives have become
inseparable from the Internet and websites, and we upload our personal and private data
to web services as well. Securing websites has therefore become an important issue.
Vulnerabilities often stem from subtle program flaws that go unnoticed. It is the
developers' responsibility to make sure that web projects are safe and secure.
Developers can choose from numerous languages when building a website. For example,
many major websites, such as Facebook and Wikipedia, are built on PHP. Node.js, on the
other hand, is becoming increasingly popular with developers. If developers can examine
a website for security flaws and repair them before release, the service will be more
secure, and users can browse the web without worrying about leaks of their personal
data.
In this thesis, we propose a machine-learning-based static analysis system that predicts
injection-type vulnerabilities in PHP and JavaScript code. We propose a feature
extraction algorithm for source code and use machine learning techniques to learn the
possible vulnerabilities. Given source code written in PHP or JavaScript, our system
uses the trained models to predict the injection-type vulnerabilities it may contain.
It then reports them to the developers. Developers can therefore detect potential
vulnerabilities in a website project and repair weak points before the website is
released.
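The abstract says the extracted features are used to train models that predict likely vulnerabilities, but it does not name the learning algorithm. The sketch below substitutes a Bernoulli naive Bayes classifier, chosen here only because it fits in a few lines of standard-library Python. The four binary features, the labels `sqli`, `xss` and `safe`, and the training rows are hypothetical toy data. They are not the thesis's dataset or results.

```python
import math

# Hypothetical binary features per file, in this order:
# [reads user input, reaches SQL sink, reaches HTML output sink, applies sanitizer]
# Toy rows and labels are made up for illustration only.
X_TRAIN = [
    [1, 1, 0, 0],  # unsanitized input flows into a SQL query
    [1, 1, 0, 0],
    [1, 0, 1, 0],  # unsanitized input written into HTML output
    [1, 0, 1, 0],
    [1, 1, 0, 1],  # input sanitized before the SQL query
    [1, 0, 1, 1],  # input sanitized before output
    [0, 0, 0, 0],  # no user input at all
]
Y_TRAIN = ["sqli", "sqli", "xss", "xss", "safe", "safe", "safe"]


class BernoulliNaiveBayes:
    """Minimal Bernoulli naive Bayes with add-one smoothing (stdlib only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p_present = {}  # p_present[c][j] = P(feature j == 1 | class c)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Add-one smoothing keeps every probability strictly inside (0, 1).
            self.p_present[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x):
        """Return the class with the highest log-posterior for one feature vector."""
        def log_posterior(c):
            score = self.log_prior[c]
            for xj, p in zip(x, self.p_present[c]):
                score += math.log(p) if xj else math.log(1.0 - p)
            return score

        return max(self.classes, key=log_posterior)


if __name__ == "__main__":
    model = BernoulliNaiveBayes().fit(X_TRAIN, Y_TRAIN)
    print(model.predict([1, 1, 0, 0]))  # sqli
    print(model.predict([1, 1, 0, 1]))  # safe
```

The second query differs from the first only in the sanitizer feature, and the toy model flips its prediction to `safe`. A real evaluation would use a much larger labeled corpus and held-out data such as cross-validation.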