Applying Simplified Swarm Optimization for Solving Clustering Problem


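Author:	賴智明 Lai, Chyh Ming
Thesis title:	Applying Simplified Swarm Optimization for Solving Clustering Problem 應用簡群最佳化演算法求解資料分群問題
Advisor:	葉維彰 Yeh, Wei Chang
Committee members:	孟昭宇, 楊國隆, 劉培林, 張桂琥
Degree:	Doctor (博士)
Department:	College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Language:	English
Number of pages:	99
Keywords (Chinese):	資料分群 (data clustering), 簡群演算法 (simplified swarm optimization), 主成分分析 (principal component analysis), 隨機取樣 (random sampling), K平均分群 (K-means clustering), K調和平均分群 (K-harmonic-means clustering)
Keywords (English):	data clustering, K-means, K-harmonic-means, simplified swarm optimization, principal component analysis, random sampling technique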

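Data clustering, which is widely applied across many fields, is a method of grouping data according to a specific measure. K-means (KM) and K-harmonic-means (KHM) are common and fundamental clustering tools because they are simple and efficient. However, both KM and KHM have inherent defects that lead to poor effectiveness or efficiency. This thesis applies simplified swarm optimization to propose two clustering algorithms, one addressing the defects of KM and the other those of KHM, thereby improving the effectiveness and efficiency of KM and KHM in clustering.
Furthermore, the development of information technology and the rise of the internet have ushered in the era of big data, and many existing clustering algorithms, including KM and KHM, cannot process large datasets within a reasonable time. To improve the efficiency of the proposed clustering algorithm on large datasets, this study combines principal component analysis with a new random sampling technique to develop a new clustering framework. The framework greatly improves the efficiency of the clustering algorithm by simultaneously reducing the dimensionality of the dataset and the usage rate of its data points.
To validate the proposed algorithms and framework, all experiments use real-world datasets, and all experimental results are compared with the prior literature.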

Data clustering is commonly employed in many disciplines. The aim of clustering is to partition a set of data into clusters such that objects within the same cluster are similar to one another and dissimilar to objects in other clusters. K-means (KM) and K-harmonic-means (KHM) are two common and fundamental clustering methods because of their simplicity and efficiency. However, both have inherent drawbacks that degrade their effectiveness or efficiency. This study presents two novel algorithms based on simplified swarm optimization to deal with the drawbacks of KM and KHM, respectively.
In addition, with the advance of internet and information technologies, data size is increasing explosively, and many existing clustering approaches, including KM and KHM, are inefficient on large datasets. To address this, we propose a clustering framework that combines principal component analysis with a novel random sampling technique in a single procedure to increase the scalability of the proposed clustering algorithm.
To empirically evaluate the performance of the proposed methods, all experiments use real-world datasets, and the corresponding results are compared with recent works in the literature.
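As a point of reference for the two base methods named above, the following minimal sketch computes the standard K-means and K-harmonic-means cost functions for a given set of centers. It is not taken from the thesis: the function names, the default exponent `p`, and the small `eps` guard against zero distances are assumptions made for illustration only.

```python
import numpy as np

def pairwise_distances(X, C):
    """Euclidean distances between rows of X (n, d) and rows of C (k, d)."""
    diff = X[:, None, :] - C[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def km_objective(X, C):
    """K-means cost: sum of squared distances to the nearest center."""
    D = pairwise_distances(X, C)
    return float((D.min(axis=1) ** 2).sum())

def khm_objective(X, C, p=3.5, eps=1e-12):
    """K-harmonic-means cost: for each point, the harmonic mean of the
    p-th powers of its distances to all k centers, summed over points.
    The default p and the eps guard are illustrative assumptions."""
    D = np.maximum(pairwise_distances(X, C), eps)
    k = C.shape[0]
    return float((k / (1.0 / D ** p).sum(axis=1)).sum())
```

KM scores each point only against its nearest center, while KHM lets every center contribute to every point's cost through the harmonic mean.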
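The general idea behind the large-data framework described in the abstract, reducing dimensionality with principal component analysis and clustering only a random subset of points, could be sketched roughly as follows. This is a hedged illustration and not the thesis's procedure: plain Lloyd-style K-means stands in for the SSO-based clusterer, a single uniform sample without replacement stands in for the thesis's random sampling technique, and all function and parameter names (`pca_project`, `cluster_reduced_sample`, `m`, `n_sample`) are hypothetical.

```python
import numpy as np

def pca_project(X, m):
    """Project centered data onto its first m principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

def kmeans(X, k, iters, rng):
    """Plain Lloyd's K-means; a stand-in for the thesis's SSO-based method."""
    C = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                C[j] = members.mean(axis=0)
    return C

def cluster_reduced_sample(X, k, m, n_sample, iters=20, seed=0):
    """Fit centers on a random sample in PCA space, then label every point."""
    rng = np.random.default_rng(seed)
    Z = pca_project(X, m)
    idx = rng.choice(len(Z), size=n_sample, replace=False)
    C = kmeans(Z[idx], k, iters, rng)
    d = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), C
```

In this sketch the centers are fitted on `n_sample` points in `m` dimensions, so the cost of each fitting iteration depends on the sample size and reduced dimension instead of the full dataset.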

Chinese Abstract    I
Abstract    II
Acknowledgement    III
Table of Contents    IV
List of Tables    VII
List of Figures    X
Chapter 1.    Introduction    1
1    Motivation    1
1.1    The drawbacks of K-means-type method    1
1.2    The drawbacks of K-harmonic-means-type method    2
1.3    Inefficiency on large-size problem    4
2    Objective and methodology    6
2.1    Objective for KM-type problem    6
2.2    Objective for KHM-type problem    7
2.3    Objective for clustering on large-size problem    7
3    Framework and organization    8
Chapter 2.    Methodology    10
1    Clustering problem    10
2    K-means clustering    10
3    K-harmonic-means clustering    11
4    Taguchi method    12
5    Simplified swarm optimization    14
6    Principal component analysis    15
Chapter 3.    Improved SSO for KM-type problem    18
1    Proposed methods    18
1.1    SSO clustering algorithm    18
1.2    Variable vibrating search (VVS)    19
1.3    Rapid centralized strategy (RCS)    21
1.4    The overall procedure of proposed method    23
2    Experiment results and discussion    25
2.1    Datasets    25
2.2    Parameter settings    26
2.3    Ex-1: Evaluation of RCS    27
2.4    Ex-2: Comparing ISSOKM with existing algorithms    30
2.5    Statistical analysis    35
3    Summary    37
Chapter 4.    Improved SSO for KHM-type problem    40
1    Proposed methods    40
1.1    Initial population    40
1.2    Minimum movement strategy (MMS)    40
1.3    The overall procedure of proposed method    41
2    Experiment results and discussion    44
2.1    Datasets    44
2.2    Ex-1: Parameter settings    45
2.3    Ex-2: Comparing ISSOKHM with existing algorithms    49
2.4    Statistical analysis    55
3    Summary    58
Chapter 5.    A framework for large-size problem    59
1    Empirical analysis of ISSOKM efficiency    59
2    Proposed framework    62
2.1    The first m PCs determination    62
2.2    Rolling random sampling (RRS)    62
2.3    The overall procedure of proposed framework    63
3    Experiment results and discussion    65
3.1    Datasets    66
3.2    Ex-1: parameter setting for Nsub and Nsam    67
3.3    Ex-2: comparing ISSOKMPR with existing algorithms    72
3.4    Time complexity of proposed algorithm    81
3.5    Statistical analysis    81
4    Summary    91
Chapter 6.    Conclusion and future work    93
Reference        95

                                


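Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)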
