An Android Static-Analysis Malware Detection Method using Machine Learning

Graduate student:	李季維 Lee, Chi-Wei
Thesis title:	基於靜態分析與機器學習的Android惡意軟體檢測方法 An Android Static-Analysis Malware Detection Method using Machine Learning
Advisor:	孫宏民 Sun, Hung-Min
Committee members:	許富皓 Hsu, Fu-Hau; 黃育綸 Huang, Yu-Lun
Degree:	Master
Department:
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	29
Keywords (Chinese):	機器學習、靜態分析、安卓系統、惡意軟體檢測
Keywords (English):	Machine Learning, Static-Analysis, Android, Malware Detection Method

In recent years, the spread of smartphones has drawn large numbers of developers into application development, and a wide variety of applications has emerged. According to a Strategy Analytics report for the third quarter of 2016, Android's global market share reached a record high of 87.5 percent; in other words, nearly nine out of ten smartphone users run Android. As smartphones grow more powerful, more and more people use them in place of personal computers. Tens of thousands of applications are downloaded every day, which has attracted malicious developers who hide malicious behavior inside applications as a means of attack. Such malware may steal users' private data, such as names, phone numbers, and address books. Once this data is stolen and leaked, users may face advertising harassment in mild cases and fraud in serious ones. Malware may also send premium-rate text messages without permission or intercept billing-notification messages. Detecting malicious applications has therefore become an important problem.
Malware analysis falls into two categories: static analysis and dynamic analysis. Static analysis disassembles the application to obtain its internal information, extracts features from both benign and malicious applications, and then processes and filters them to find the features that clearly indicate malicious behavior. Dynamic analysis triggers the malware, then records and analyzes its run-time behavior to extract its characteristics. This thesis uses static analysis. Previous research used only the permissions in the Manifest file to detect malware and achieved low accuracy. To improve accuracy, we also analyze the other components in the Android Manifest file, and we apply an n-gram model to the opcode information in the smali files. We then select the features that clearly indicate malware and use different machine learning algorithms to classify applications. We run experiments on a large number of malware and benign application samples to verify that the proposed method can effectively identify malware.
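
The two static feature sources named in the abstract (Manifest entries and opcode n-grams from smali files) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the set of Manifest tags, the opcode heuristic, and the default n = 3 are all assumptions made for illustration, and the input is assumed to be an already decoded AndroidManifest.xml and disassembled smali text.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Attribute namespace used by AndroidManifest.xml (android:name and so on).
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical choice of Manifest tags to turn into features; the abstract
# only mentions "permission" and "other components".
MANIFEST_TAGS = {"uses-permission", "activity", "service", "receiver", "provider"}

# Heuristic pattern for a Dalvik opcode mnemonic, e.g. "invoke-virtual/range".
OPCODE_RE = re.compile(r"[a-z][a-z0-9/-]*")


def manifest_features(manifest_xml):
    """Return a set of 'tag:name' strings from a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    features = set()
    for elem in root.iter():
        name = elem.get(ANDROID_NS + "name")
        if elem.tag in MANIFEST_TAGS and name:
            features.add(f"{elem.tag}:{name}")
    return features


def smali_opcodes(smali_text):
    """Return the opcode mnemonics found inside .method bodies, in order.

    Heuristic: the first token of a line counts as an opcode if it looks
    like a mnemonic. Directives (.locals), labels (:cond_0), comments (#)
    and numeric data lines are skipped; annotation bodies are not handled.
    """
    opcodes = []
    in_method = False
    for raw in smali_text.splitlines():
        line = raw.strip()
        if line.startswith(".method"):
            in_method = True
        elif line.startswith(".end method"):
            in_method = False
        elif in_method and line:
            token = line.split(None, 1)[0]
            if OPCODE_RE.fullmatch(token):
                opcodes.append(token)
    return opcodes


def opcode_ngrams(opcodes, n=3):
    """Count every run of n consecutive opcodes (n = 3 is an assumption)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


if __name__ == "__main__":
    sample_smali = """
.method public static main()V
    .locals 1
    const-string v0, "hi"
    invoke-static {v0}, LFoo;->log(Ljava/lang/String;)V
    return-void
.end method
"""
    print(opcode_ngrams(smali_opcodes(sample_smali), n=2))
```

In a pipeline like the one the abstract describes, the Manifest strings and the n-gram counts of each application would be merged into one feature vector, filtered, and then passed to a classifier.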

Contents
List of Figures
Chapter 1 Introduction
  1 Motivation
  2 Our Contribution
  3 Organization
Chapter 2 Background
  1 Android System
    1.1 Android System Architecture
    1.2 Android Permission Mechanism
  2 Malware Detection Method
    2.1 Static Analysis
    2.2 Dynamic Analysis
  3 WEKA
    3.1 WEKA Input Format – ARFF
    3.2 Machine Learning Algorithm
Chapter 3 Related Works
  1 Android Manifest file structure
  2 Preprocessing
  3 Feature extraction
    3.1 Selection of key permission feature
    3.2 Feature vector generation algorithm
  4 Architecture Diagram
Chapter 4 Design Framework
  1 Proposed Method
    1.1 Preprocessing
    1.2 Feature Elimination
    1.3 Feature selection
  2 Architecture Diagram
Chapter 5 Experimental Result
Reference

                                


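Full-text release date: 2022/08/09 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)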
