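Graduate student: | 陳玉玲 |
---|---|
Thesis title: | 架構於自我組織之聚類分析演算法 A Novel Clustering Algorithm Based on Self-Organization Procedure |
Advisor: | 洪文良 |
Oral defense committee: | |
Degree: | Master |
Department: | |
Year of publication: | 2011 |
Academic year of graduation: | 99 |
Language: | Chinese |
Number of pages: | 24 |
Keywords (Chinese): | 聚類分析演算法 (clustering algorithm), 穩健 (robust), 自我更新過程 (self-updating process) |
Keywords (English): | clustering algorithms, robust, self-updating process |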
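Abstract (Chinese)
This thesis proposes a method for selecting the parameter in the clustering algorithm of Chen and Shiu (2007). In that algorithm, neither the number of clusters nor the initial values need to be specified; the data self-organize into the clustering result. Numerical simulations indicate that the proposed modified algorithm is robust in two respects: (1) the clustering result is not affected by the initial values; (2) the clustering result is not affected by outliers. Results on simulated data and two real data sets show that the proposed algorithm gives good clustering results.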
Abstract
This study presents a method for selecting the parameter in Chen and Shiu's (2007) clustering algorithm. In the proposed algorithm, the data points self-organize into a locally optimal number of clusters without the use of cluster validity functions. Numerical experiments also indicate that the method is robust to outliers. The proposed algorithm therefore exhibits two robustness properties: (i) robustness to initialization (cluster number and initial guesses), and (ii) robustness to noise and outliers. Several simulated and real data sets are used to demonstrate these properties.
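As a rough illustration of what "self-organizing" means here, the sketch below moves every point to a weighted mean of all points until the positions stop changing; points that end up at the same location form one cluster, so no cluster number or initial centers are supplied. This is only a minimal sketch under stated assumptions: the truncated exponential weight, the names `r`, `T`, `merge_tol`, and `sup_cluster`, and the toy data are all hypothetical and are not taken from this record. It does not implement the thesis's parameter-selection method, which the abstract does not describe.

```python
import numpy as np

def sup_cluster(X, r, T, max_iter=200, tol=1e-8, merge_tol=1e-4):
    """Self-updating sketch: repeatedly replace each point by a weighted mean."""
    Z = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        # Pairwise Euclidean distances between the current positions.
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
        # Assumed weight: exponential decay in distance, cut off beyond radius r.
        W = np.where(D <= r, np.exp(-D / T), 0.0)
        # Each row sum is at least 1 because a point always weights itself.
        Z_new = (W @ Z) / W.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(Z_new - Z)) < tol
        Z = Z_new
        if converged:
            break
    # Points that ended up (almost) at the same location share a label.
    labels = np.full(len(Z), -1, dtype=int)
    centers = []
    for i, z in enumerate(Z):
        for k, c in enumerate(centers):
            if np.linalg.norm(z - c) < merge_tol:
                labels[i] = k
                break
        else:
            centers.append(z)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Toy data: two tight groups and one isolated point.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                   [20.0, 20.0]])
labels_demo, centers_demo = sup_cluster(X_demo, r=1.0, T=0.5)
print(labels_demo, len(centers_demo))
```

In this toy run the isolated point receives no weight from the other points and so stays where it is as its own group rather than pulling the other groups toward it, which is one informal way to read the outlier robustness claimed above.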
References
[1] Bezdek, J.C., 1974. Cluster Validity with Fuzzy Sets. J. Cybernetics 3, 58-73.
[2] Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press.
[3] Chen, C.H., 2002. Generalized Association Plots: Information Visualization via Iteratively Generated Correlation Matrices. Statistica Sinica 12, 7-29.
[4] Chen, T.L., Shiu, S.Y., 2007. A new clustering algorithm based on self-updating process. Proceedings of the American Statistical Association.
[5] Chen, T.L., 2009. Image segmentation by SUP clustering algorithm. Section on Statistical Learning and Data Mining-JSM 2009.
[6] Davies, D.L., Bouldin, D.W., 1979. A Cluster Separation Measure. IEEE Trans. Pattern Analysis and Machine Intelligence 1, 224-227.
[7] Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York.
[8] Forgy, E., 1965. Cluster analysis of multivariate data: efficiency vs. interpretability of classifications. Biometrics 21, 768-780.
[9] Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: a review. ACM Computing Surveys 31, 264-323.
[10] Hathaway, R., Bezdek, J., Hu, Y., 2000. Generalized fuzzy c-means clustering strategies using Lp norm distances. IEEE Transactions on Fuzzy Systems 8, 576-582.
[11] Hansen, P., Mladenović, N., 2001. J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recognition 34, 405-413.
[12] Hoeppner, F., Klawonn, F., Kruse, R., 1999. Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition, Wiley, New York.
[13] Hung, M., Yang, D., 2001. An efficient fuzzy c-means clustering algorithm. Proceedings of IEEE International Conference on Data Mining, 225-232.
[14] Huang, Z., 1998. Extensions to the K-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2, 283-304.
[15] Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A., 2002. An efficient K-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 881-892.
[16] Kolen, J., Hutcheson, T., 2002. Reducing the time complexity of the fuzzy c-means algorithm. IEEE Transactions on Fuzzy Systems 10, 263-267.
[17] MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297.
[18] McQuitty, L.L., 1968. Multiple clusters, types, and dimensions from iterative intercolumnar correlational analysis. Multivariate Behavioral Research 3, 465-477.
[19] Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159-179.
[20] Pal, N.R., Bezdek, J.C., 1995. On Cluster Validity for Fuzzy c-Means Model. IEEE Trans. Fuzzy Systems 1, 370-379.
[21] Selim, S.Z., Alsultan, K., 1991. A simulated annealing algorithm for the clustering problem. Pattern Recognition 24, 1003-1008.
[22] Su, M., Chou, C., 2001. A modified version of the K-means algorithm with a distance based on cluster symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 674-680.
[23] Tibshirani, R., Walther, G., Hastie, T., 2001. Estimating the number of clusters in a data set via the gap statistic. J. Roy. Statist. Soc. B 63, 411-423.
[24] Tseng, G.C., Wong, W.H., 2005. Tight clustering: a resampling-based approach for identifying stable and tight patterns in data. Biometrics 61, 10-16.
[25] Xie, X.L., Beni, G., 1991. A Validity Measure for Fuzzy Clustering. IEEE Trans. Pattern Analysis and Machine Intelligence 13, 841-847.
[26] Xu, R., Wunsch, D., 2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks 16, 645-678.