
Graduate student: 周佳慶
Chou, Chia-Chin
Thesis title: 基於機器學習演算法之雲端網路應用辨識服務平台
A Cloud based Application Classification Service Platform with Machine Learning Algorithms
Advisor: 黃能富
Huang, Nen-Fu
Oral defense committee: 黃崇銘
Huang, Chung-Ming
李維聰
Lee, Wei-Tsong
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Communications Engineering
Year of publication: 2013
Academic year of graduation: 101
Language: Chinese
Number of pages: 47
Keywords (Chinese): 基於機器學習演算法之雲端網路應用辨識服務平台
Keywords (English): A Cloud based Application Classification Service Platform with Machine Learning Algorithms
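  • To identify network application traffic at the start of an application's communication, this work builds on the 2012 thesis by Gin-Yuan Jai (翟敬源), a master's student in our laboratory, titled 「基於網路應用程式通訊回合特徵之網路應用程式流量早期分類法」 (an early classification method for network application traffic based on the characteristics of application communication rounds). That thesis proposed a machine-learning-based, high-accuracy algorithm, the "application round" method, which defines, for each TCP/UDP flow, statistical attributes that give high accuracy and are suitable for real-time traffic identification. However, its identification rate on online traffic reached only 60%. The main contribution of this thesis is to improve the identification rate and performance of that work on online traffic. A state machine is added to the "application round" algorithm to raise the identification rate, and a pre-filter is added to improve system performance. From an application-layer perspective, this thesis adds statistical attributes that capture the characteristics of the negotiation rounds at the start of each flow's application-layer communication, and uses them to classify traffic. The method uses the pruned C4.5 decision tree algorithm; for this algorithm, the thesis also proposes a dimension increment to improve its accuracy in traffic classification. On campus online traffic, the identification accuracy reaches up to 91.2%, with an average of 87.55%. Compared with not using the proposed methods, accuracy on raw online traffic data is 27% to 30% higher than in the earlier work. The proposed method can complete traffic testing in a short time; its advantages are that it applies to encrypted traffic, has high accuracy, and can be used for real-time traffic classification.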
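    The abstract names the pruned C4.5 decision tree as the classifier. As background, C4.5 is generally described as choosing splits by gain ratio (information gain divided by split information). The following is a minimal sketch of that criterion for a single binary split on one numeric attribute; it is not the thesis's implementation, it omits pruning and C4.5's other heuristics, and the attribute (first-round payload size) and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info

# Hypothetical toy data: first-round payload size (bytes) and application label.
sizes = [40, 52, 48, 900, 1100, 1300]
apps = ["chat", "chat", "chat", "p2p", "p2p", "p2p"]
clean_split = gain_ratio(sizes, apps, threshold=52)   # separates the two classes
poor_split = gain_ratio(sizes, apps, threshold=40)    # isolates a single flow
```

    On this toy data the threshold that separates the two classes scores higher than the one that isolates a single flow, which is the behaviour the criterion is meant to reward.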


    This thesis builds on the work “On the Cloud-Based Network Traffic Classification and Applications Identification Services”, proposed by Gin-Yuan Jai in 2012. That work introduced a high-accuracy, machine-learning-based algorithm, the “APPlication Round method (APPR)”, to identify network application traffic at an early stage. For each TCP/UDP flow, it determines discriminators that are available at the early stage and support high-accuracy traffic classification. However, its accuracy for real-time traffic classification is only 60%. This thesis proposes methods to improve both the accuracy and the efficiency of real-time traffic classification. A state machine is added to APPR to improve accuracy, and a pre-filter, which filters out well-known applications in advance, is added to improve efficiency. Additional discriminators characterize the possible negotiation behaviors of each flow from an application-layer perspective. The method applies a pruned C4.5 decision tree to real traffic traces, and the thesis also proposes increasing the dimension of the algorithm to raise classification accuracy. On real-time campus network traffic, the accuracy reaches a maximum of 91.2%, with an average overall accuracy of 87.55%. Compared with the 2012 work, the proposed methods improve overall accuracy by 27% to 30% on real-time campus network traffic. Furthermore, the proposed method is also suitable for identifying encrypted protocols and offers high accuracy and support for real-time classification.
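    The abstract says only that the pre-filter removes well-known applications before classification; it does not say how. The sketch below shows one way such a two-stage pipeline could be arranged, assuming (this is an assumption, not something stated in the record) that the pre-filter matches on transport protocol and server port. The `Flow` type, the `WELL_KNOWN` table, and `toy_model` are hypothetical names introduced for illustration; `toy_model` merely stands in for a trained decision tree.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical pre-filter table: (transport, server port) -> application label.
WELL_KNOWN = {("tcp", 22): "ssh", ("tcp", 25): "smtp", ("udp", 53): "dns"}

@dataclass
class Flow:
    transport: str          # "tcp" or "udp"
    server_port: int
    round_sizes: List[int]  # payload bytes seen in each early application round

def pre_filter(flow: Flow) -> Optional[str]:
    """Return a label for a well-known application, or None to defer."""
    return WELL_KNOWN.get((flow.transport, flow.server_port))

def classify(flow: Flow, model: Callable[[Flow], str]) -> str:
    """Try the cheap pre-filter first; fall back to the trained model."""
    label = pre_filter(flow)
    return label if label is not None else model(flow)

def toy_model(flow: Flow) -> str:
    # Stand-in for a trained decision tree; the threshold is made up.
    return "p2p" if sum(flow.round_sizes) > 1000 else "chat"

dns_label = classify(Flow("udp", 53, [60]), toy_model)
other_label = classify(Flow("tcp", 6881, [800, 900]), toy_model)
```

    In this arrangement only flows that the pre-filter cannot label reach the model, so the more expensive classification step runs on fewer flows.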

    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Organization
    Chapter 2 Literature Review
    Chapter 3 Application Classification Service Platform on Cloud
      3.1 Training and Classification Service Platform
      3.2 The Method of Improving the Accuracy and the Efficiency
      3.3 Classify the Web Applications
      3.4 Training and Classification Module
      3.5 Redirect Threatening Flows with SDN Switch
    Chapter 4 Experiment Results and Analyses
      4.1 Experiment Architecture
      4.2 Experiment Training Rules
      4.3 Accuracy and Delay of the Experiment
    Chapter 5 Conclusion
    Bibliography


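    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)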
