Graduate student: | 許閔翔 Hsu, Min-Hsiang |
---|---|
Thesis title: | 邊際修正非參數核密度判別分析之惡意網路偵測的應用 Marginally Adjusted Kernel Discriminant Analysis with an Application in Detecting Malicious Network Activity |
Advisor: | 黃禮珊 Huang, Li-Shan |
Oral defense committee: | 李育杰 Lee, Yuh-Jye; 謝叔蓉 Shieh, Shwu-Rong; 楊承道 Yang, Cheng-Tao |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2018 |
Academic year of graduation: | 106 |
Language: | Chinese |
Number of pages: | 84 |
Keywords (Chinese): | nonparametric kernel density estimation, semiparametric, misclassification cost, maximum likelihood function |
Keywords (English): | Semiparametric, Misclassification |
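In recent years, discriminant analysis has been widely applied in many fields, typically to data with multi-dimensional explanatory variables and a binary target variable. When analysts face multi-dimensional data, they often know little about its joint distribution but may have some idea of the distribution of each single dimension, for example that it is normal. This thesis combines nonparametric kernel discriminant analysis (KDA, Kernel Discriminant Analysis) with distributional assumptions on the marginal densities and proposes a method that adjusts the original data, improving classification results through the idea of denoising. We call the new method "Marginally Adjusted Kernel Discriminant Analysis (MAKDA)"; different adjustment schemes give rise to five MAKDA variants. In addition, to improve classification of points near the boundary, we propose a new discriminant rule called the "normal weight prediction rule." We then use simulated data of different types to examine the strengths and weaknesses of the proposed methods and compare them with conventional KDA. The simulated data types are the bivariate normal distribution, the bivariate dependent t-distribution, the bivariate normal distribution plus one categorical variable, and the bivariate normal distribution plus three categorical variables. The results show that, across these data types, three of the MAKDA variants have some advantage over conventional KDA. Finally, the proposed methods are applied to real cybersecurity data, with the aim of detecting network traffic sent by botnet (zombie) IP addresses; the results show that MAKDA performs well.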
In recent years, discriminant analysis has been widely used in many fields. The data usually involve multi-dimensional explanatory variables and one binary response variable. When analysts encounter multi-dimensional data, they often have little information about the multivariate distribution, but they may have some knowledge about the form of the marginal densities, such as normality. In this thesis, we explore kernel discriminant analysis (KDA) with marginal normality constraints and propose a data-tilting approach for discriminant analysis. This new approach is called "Marginally Adjusted Kernel Discriminant Analysis (MAKDA)." We consider five MAKDA methods that differ in how the data are tilted. In addition, to improve the discriminant results for data near the boundary, we propose a new discriminant rule called the "normal weight prediction rule." Extensive simulation studies compare the proposed methods with conventional KDA in four scenarios: a bivariate normal distribution, a bivariate dependent t-distribution, a bivariate normal distribution plus one discrete covariate, and a bivariate normal distribution plus three discrete covariates. The results show that three of the five MAKDA methods perform better than conventional KDA across these scenarios. Finally, the proposed methods are applied to a real dataset to detect malicious network activity.
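For orientation, the conventional KDA baseline mentioned in the abstract can be sketched as follows. This is only a minimal illustration, not the thesis's implementation and not MAKDA: it assumes a Gaussian kernel with SciPy's default bandwidth, empirical class priors, and equal misclassification costs. The function names `fit_kda` and `predict_kda` and the simulated class means are hypothetical choices made for this example.

```python
import numpy as np
from scipy.stats import gaussian_kde


def fit_kda(X, y):
    """Fit one Gaussian KDE per class and record empirical class priors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # gaussian_kde expects data with shape (n_dims, n_samples)
    kdes = {c: gaussian_kde(X[y == c].T) for c in classes}
    priors = {c: float(np.mean(y == c)) for c in classes}
    return classes, kdes, priors


def predict_kda(model, X_new):
    """Assign each point to the class with the largest prior * density."""
    classes, kdes, priors = model
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    scores = np.vstack([priors[c] * kdes[c](X_new.T) for c in classes])
    return classes[np.argmax(scores, axis=0)]


# Assumed toy data: two well-separated bivariate normal classes
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
X1 = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

model = fit_kda(X, y)
pred = predict_kda(model, [[0.0, 0.0], [3.0, 3.0]])
train_accuracy = float(np.mean(predict_kda(model, X) == y))
```

The marginal adjustment and the normal weight prediction rule described in the abstract would change how the class densities are estimated and how the scores are compared; their details are not given here, so the sketch stops at the unadjusted baseline.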