Realization of Application Identification System Based on Machine Learning and DNS Responses


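Graduate student:	李哲銓 Li, Che-Chuan
Thesis title:	Realization of Application Identification System Based on Machine Learning and DNS Responses (original title: 基於機器學習與網域名稱系統回應之應用程式辨識系統研製)
Advisor:	黃能富 Huang, Nen-Fu
Oral examination committee:	石維寬 Shih, Wei-Kuan; 陳俊良 Chen, Jiann-Liang
Degree:	Master
Department:
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	59
Keywords (translated from Chinese):	traffic identification, machine learning, Domain Name System, software-defined networking
Keywords (English):	flow classification, machine learning, DNS, SDN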

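Abstract (translated from the Chinese): In recent years, the demand for bandwidth management targeting specific applications has been growing. To manage bandwidth effectively for specific applications, a flow classification system that can classify traffic down to the application level is necessary. However, most studies can only classify traffic into protocols or coarse categories. Some studies can classify traffic by application, but the data sets used for testing contain fewer than 50 applications. Other studies can classify traffic by application name, with more than 100 applications in their testing data sets, but they cannot generate training data for both mobile applications and desktop applications. For an application with no training data, correctly identifying its traffic by application name is difficult. In our previous research, we were able to generate training data for Windows desktop applications, native Android mobile applications, and websites. Recently, we designed a method and wrote programs that let the system generate training data for Mac OS X desktop applications and native iOS mobile applications. In addition, we designed a method that links the IP addresses in DNS responses captured by the flow classification system with the application names obtained from the operating system, and uses only those IP addresses linked to a single application name to improve classification accuracy. In our experiment, the testing data set contains 294 applications, where each platform-specific version or each executable file of an application is counted as one application. Because some applications have different names on different operating systems, we map the different names that refer to the same application to a single name. After mapping, there are 152 applications in total, including Skype, Facebook, and others. The average F-measure over all applications reaches 93.49%, demonstrating that the system can accurately identify application traffic.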

In recent years, the demand for application-specific quality of service (QoS) management has grown. Performing application-specific QoS effectively requires a system that can classify flows at the application level. However, most studies can only classify flows into protocols or coarse categories. Some studies can classify flows by application name, but their testing data sets contained fewer than 50 applications. Several machine-learning-based studies can classify flows by application and used testing data sets with more than 100 applications, but they cannot generate training data sets for both native mobile applications and desktop applications. Without a training data set for an application, it is hard to correctly label its flows with its application name. In our previous work, we were able to generate reliable training data sets for desktop applications on Windows, native mobile applications on Android, and website applications. Recently, we designed a method and implemented programs to generate reliable training data sets for desktop applications on Mac OS X and native mobile applications on iOS. In addition, we designed a method to link IP addresses in DNS responses captured by the flow classification system to application names obtained from operating systems (OSs). To improve classification accuracy, we use only IP addresses that have been linked to exactly one application name. In our experiment, the testing data set contained 294 applications, counting each platform version or executable file of an application as one application. Because some applications have different names on different OSs, we mapped the names referring to the same application to a single name. After name mapping, there were 152 applications, including Skype, Facebook, and others. The average F-measure over all applications reached 93.49%, showing that this system can identify application traffic with high accuracy.
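
The DNS-based step in the abstract (link IP addresses from DNS responses to application names, then keep only addresses tied to a single name) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the (ip, app_name) input format, the example addresses, and the fallback to another classifier for unmatched addresses are all assumptions made for illustration.

```python
from collections import defaultdict


def link_ips_to_apps(observations):
    """Group application names by IP address.

    `observations` is a hypothetical list of (ip, app_name) pairs: an IP
    address seen in a DNS response, paired with the application name that
    the operating system reported for it.
    """
    apps_by_ip = defaultdict(set)
    for ip, app_name in observations:
        apps_by_ip[ip].add(app_name)
    return apps_by_ip


def unambiguous_ips(apps_by_ip):
    """Keep only IP addresses linked to exactly one application name."""
    return {
        ip: next(iter(apps))
        for ip, apps in apps_by_ip.items()
        if len(apps) == 1
    }


def classify_flow(server_ip, ip_to_app, fallback):
    """Label a flow by its server IP if that IP is unambiguous.

    Calling `fallback` (for example, a machine learning classifier) for all
    other addresses is an assumption of this sketch.
    """
    app_name = ip_to_app.get(server_ip)
    return app_name if app_name is not None else fallback(server_ip)


# Example with documentation-range addresses (made-up data).
observations = [
    ("192.0.2.10", "Skype"),
    ("192.0.2.10", "Skype"),
    ("198.51.100.7", "Skype"),
    ("198.51.100.7", "Facebook"),  # shared address: linked to two names
    ("203.0.113.5", "Facebook"),
]
table = unambiguous_ips(link_ips_to_apps(observations))
```

In this example the shared address 198.51.100.7 is dropped from the lookup table because it is linked to two application names, so a flow to that address would be passed to the fallback rather than labeled from DNS data.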

Abstract
Chinese Abstract
Figure List
Table List
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Overview
2.2 Port-based Methods
2.3 DPI Methods
2.4 Machine Learning Methods
2.5 DNS-based Methods
2.6 Hybrid Methods
Chapter 3 System Core Design
3.1 Desired Type of Classification Results of Flows
3.2 Overall Procedure of Application Identification
3.3 Machine Learning
3.3.1 Training and Classifying Phases
3.3.2 Adopted Machine Learning Algorithms
3.3.3 Adopted Flow Attributes
3.4 DNS
3.4.1 Classification Based on Application in Server
3.4.2 Classification Based on Application in Client
3.4.3 Comparison
Chapter 4 System Implementation
4.1 Architecture of Application Identification System
4.2 Programs in Procedure of Supervised ML
4.3 Programs to Get Labels of Flows
4.4 Implementation of the Method Based on Inspection of DNS Responses
4.5 Online Training and Real Time Classifying
4.6 Offline Training and Classifying
4.7 Suitable for Integration with QoS Management Systems in SDNs
Chapter 5 Experiment
5.1 Testing Data Set
5.2 Mapping of Category Names
5.3 Performance Analysis
5.4 Accuracy Matrices
5.5 Accuracy Analysis
Chapter 6 Conclusion and Future Works
References
Appendix A  Table of Name Mappings of Categories
Appendix B  Table of F-Measure of Categories

