Graduate student: | 賴智明 Lai, Chyh Ming |
---|---|
Thesis title: | Applying Simplified Swarm Optimization for Solving Clustering Problem 應用簡群最佳化演算法求解資料分群問題 |
Advisor: | 葉維彰 Yeh, Wei Chang |
Oral defense committee: | 孟昭宇, 楊國隆, 劉培林, 張桂琥 |
Degree: | Doctor |
Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
Year of publication: | 2016 |
Academic year of graduation: | 104 |
Language: | English |
Number of pages: | 99 |
Keywords (Chinese): | 資料分群, 簡群演算法, 主成分分析, 隨機取樣, K平均分群, K調和平均分群 |
Keywords (English): | data clustering, K-means, K-harmonic-means, simplified swarm optimization, principal component analysis, random sampling technique |
Data clustering is widely applied across many disciplines. It partitions a set of data into clusters according to a given similarity measure, so that objects within the same cluster are similar to one another and dissimilar to objects in other clusters. K-means (KM) and K-harmonic-means (KHM) are two common and fundamental clustering methods because of their simplicity and efficiency. However, both have inherent drawbacks that degrade their effectiveness or efficiency. This study presents two novel clustering algorithms based on simplified swarm optimization, one addressing the drawbacks of KM and the other those of KHM, to improve their clustering effectiveness and efficiency.
In addition, with the advance of the internet and information technologies, data sizes are growing explosively, and many existing clustering approaches, including KM and KHM, cannot handle large datasets within a reasonable time. To improve the efficiency of the proposed clustering algorithms on large datasets, this study develops a new clustering framework that combines principal component analysis with a novel random sampling technique. The framework greatly improves the efficiency of a clustering algorithm by simultaneously reducing the dimensionality of the dataset and the proportion of data points that are used.
To empirically evaluate the proposed algorithms and framework, all experiments use real-world datasets, and the results are compared with recent works in the literature.
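The general idea of pairing dimensionality reduction with sampling can be illustrated with a minimal sketch. It is not the thesis's algorithm: plain uniform random sampling stands in for the novel sampling technique, ordinary Lloyd-style K-means stands in for the simplified-swarm-optimization-based clustering, and every function and parameter name (`pca_reduce`, `kmeans`, `assign`, `cluster_large`, `sample_frac`) is a hypothetical choice made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (scores only)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, rng=None):
    """Plain Lloyd-style K-means (assumed stand-in, not the SSO variant)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def cluster_large(X, k, n_components, sample_frac, seed=0):
    """Reduce dimensions, fit centers on a random subset, label all points."""
    rng = np.random.default_rng(seed)
    Z = pca_reduce(X, n_components)
    n_sample = max(k, int(sample_frac * len(Z)))
    # Uniform sampling without replacement (assumption; the thesis uses
    # its own sampling technique, which is not described in the abstract).
    idx = rng.choice(len(Z), size=n_sample, replace=False)
    centers = kmeans(Z[idx], k, rng=rng)
    return assign(Z, centers), centers
```

Under these assumptions, a call such as `labels, centers = cluster_large(X, k=3, n_components=2, sample_frac=0.1)` fits the centers on roughly 10% of the points in a 2-dimensional projection and then labels every point, so the iterative part of the work scales with the sample size and the reduced dimension rather than with the full dataset.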