
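Graduate student: 翟敬源
Jai, Gin-Yuan
Thesis title: 基於網路應用程式通訊回合特徵之網路應用程式流量早期分類法
Application Traffic Classification in Early Stage by Characterizing Application Rounds
Advisor: 黃能富
Huang, Nen-Fu
Oral defense committee: 張瑞雄
Chang, Ruay-Shiung
趙涵捷
Chao, Han-Chieh
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 66
Chinese keywords: 流量辨識 (traffic identification), 流量分類 (traffic classification), 網路應用軟體 (network application software), 機器學習演算法 (machine learning algorithm)
English keywords: Traffic identification, Traffic classification, Network application, Machine learning algorithm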
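  • To identify network application traffic at the start of an application's communication, this thesis proposes the "application round" method, a high-accuracy algorithm based on machine learning. For each TCP/UDP flow, the thesis defines statistical attributes that provide high accuracy and are suitable for real-time traffic identification. From an application-layer perspective, these attributes are designed to capture the characteristics of the negotiation rounds that take place when each flow's application-layer communication begins, and they are used to classify the traffic. Several machine learning algorithms are used to test the classification ability of the round-based attributes. Using the same sampled traffic data, the thesis also compares the accuracy of the proposed method experimentally with that of other real-time traffic classification methods. With the pruned C4.5 decision tree algorithm, the method achieves up to 99.21% accuracy, and 92.88% on average, on traffic captured at two campuses. Compared with other real-time traffic classification methods, its accuracy is 7% to 8% higher on traffic data sampled in the original proportions and 15% to 30% higher on traffic data sampled in fixed quantities. The proposed method completes traffic testing in a short time and can be used for online traffic identification. Its advantages are that it applies to encrypted traffic, has high accuracy, and supports real-time traffic classification.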


    This thesis proposes a high-accuracy, machine learning-based algorithm called the "APPlication Round method (APPR)" to identify network application traffic at an early stage. For each TCP/UDP flow, discriminators that are available at the early stage are defined to support high-accuracy, real-time traffic classification. These discriminators characterize the possible negotiation behaviors of each flow from an application-layer perspective. The classification ability of the flow attributes is tested with several machine learning algorithms. This study also compares the accuracy of the proposed method with that of related studies on real-time traffic classification, using identical sample traffic sets. When a pruned C4.5 decision tree is applied to real traffic traces, the proposed method achieves a maximum accuracy of 99.21% and an average overall accuracy of 92.88% across all traffic samples. Compared with other real-time traffic classification methods, the proposed method improves overall accuracy by approximately 7% to 8% on normal-ratio data sets and by 15% to 30% on fixed-ratio data samples. Its short flow test time also makes it suitable for online identification. Furthermore, the proposed method can identify encrypted protocols, and it combines high accuracy with support for real-time classification.
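    The abstract does not spell out how negotiation rounds are turned into discriminators, so the following Python sketch is only an illustration under stated assumptions. It assumes that a "talk block" (a term taken from the title of Section 3.2) is a run of consecutive payload-carrying packets sent in the same direction, and that the first few talk blocks of a flow are summarized by packet count and payload bytes. The names `TalkBlock`, `build_talk_blocks`, `round_features`, and the direction labels are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

# Hypothetical direction labels, relative to the host that opened the flow.
C2S = "c2s"  # client to server
S2C = "s2c"  # server to client


@dataclass
class TalkBlock:
    direction: str      # who is "talking" in this block
    packets: int        # number of payload-carrying packets in the block
    payload_bytes: int  # total application payload in the block


def build_talk_blocks(packets: List[Tuple[str, int]]) -> List[TalkBlock]:
    """Group consecutive same-direction payload packets into talk blocks.

    `packets` is a list of (direction, payload_size) pairs in arrival order.
    Packets without payload (handshake segments, pure ACKs) are ignored,
    which is an assumption of this sketch.
    """
    data = [(d, n) for d, n in packets if n > 0]
    blocks = []
    for direction, run in groupby(data, key=lambda p: p[0]):
        sizes = [n for _, n in run]
        blocks.append(TalkBlock(direction, len(sizes), sum(sizes)))
    return blocks


def round_features(packets: List[Tuple[str, int]], max_rounds: int = 2) -> List[int]:
    """Return a fixed-length vector describing the first `max_rounds` rounds.

    A round is assumed to be two successive talk blocks, and each block
    contributes [packet count, payload bytes]. Missing blocks are padded
    with zeros so every flow yields a vector of the same length.
    """
    blocks = build_talk_blocks(packets)
    features: List[int] = []
    for i in range(max_rounds * 2):
        if i < len(blocks):
            features += [blocks[i].packets, blocks[i].payload_bytes]
        else:
            features += [0, 0]
    return features


# Example flow: TCP handshake, one request, a two-packet reply, a second request.
example_flow = [
    (C2S, 0), (S2C, 0), (C2S, 0),    # handshake, no payload
    (C2S, 120),                      # first client talk block
    (S2C, 0), (S2C, 1460), (S2C, 300),  # ACK, then server talk block
    (C2S, 80),                       # second client talk block
]
example_vector = round_features(example_flow)
```

    A vector of this kind could then be passed to any supervised learner, such as a pruned C4.5 decision tree. Because it relies only on packet directions and sizes rather than payload contents, an approach of this shape remains usable when the payload is encrypted.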

    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Motivation
    1.2 Organization
    Chapter 2 Literature Review
    Chapter 3 Characterizing Application Negotiation
    3.1 Flow Attributes in Early Stage
    3.2 Reconstruction of Talk Blocks
    3.3 The Architecture of APPR-based Traffic Classification
    3.4 The Machine Learning Algorithms
    Chapter 4 Experiment Results and Analyses
    4.1 Traffic Traces for Experiment
    4.2 Training Label of Each Flow in Traffic Traces
    4.3 Data Sampling and Evaluation Metrics
    4.4 Accuracy, Full Model Build Time, and Test Time Evaluation with All Attributes
    4.5 Accuracy Comparison among Supervised, Unsupervised, and Semi-Supervised Methods
    Chapter 5 Conclusion
    Bibliography

