| Graduate student: | 陳麗妃 Li-Fei Chen |
|---|---|
| Thesis title: | A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies (整合約略集合論、支援向量機與決策樹之資料挖礦架構及其個案研究) |
| Advisor: | 簡禎富 Chen-Fu Chien |
| Oral defense committee: | |
| Degree: | Doctor |
| Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | English |
| Number of pages: | 136 |
| Keywords: | data mining, classification, rough set theory, support vector machine, decision tree, human resource management |
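For classification and prediction problems, accuracy is not the only requirement: whether a model can provide concise, understandable rules from which managerial implications can be drawn is often an important consideration for decision makers. Human resource management is one such case. Among data mining techniques for classification, rough set theory (RST), support vector machines (SVM), and decision trees (DT) have attracted many researchers. RST and DT can generate rules, whereas SVM cannot. On the other hand, RST is notable for its attribute selection ability, while SVM and DT are valued for their predictive power. This study combines the strengths of the three techniques into a four-stage hybrid data mining framework that improves prediction accuracy and the quality of the generated rules. The first stage uses RST to select important attributes, removing redundant attributes without losing classification information. The second stage uses SVM with cross validation to reduce noise in the samples. The distribution of the resulting samples is then examined; if class imbalance is present, a third stage of class adjustment is carried out, in which the rules generated by RST are used to adjust the imbalanced classes. After these three stages, the attributes and samples are more representative and the class distribution is more even. Finally, DT is applied to these samples to build the prediction model and generate the associated rules. In addition, the data involved in human resource management prediction are typically high-dimensional, complex, and highly uncertain, which leaves traditional statistical prediction methods with low statistical power. This study applies the proposed hybrid approach in empirical analyses of personnel selection data for direct and indirect employees at two high-tech companies in Hsinchu. The results show that the proposed method effectively improves prediction accuracy and rule quality, and that it outperforms traditional RST, SVM, and DT.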
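The first stage described above, removing redundant attributes without losing classification information, can be sketched on a toy decision table. This is a minimal illustration rather than the thesis's procedure: the table, the attribute names (`degree`, `experience`, `referral`, `outcome`), and the function names `dependency` and `greedy_reduct` are all hypothetical, and the greedy backward elimination shown here is only one simple way to reach a reduct.

```python
from collections import defaultdict

# Hypothetical toy decision table (not data from the thesis).
rows = [
    {"degree": "college",    "experience": "low",  "referral": "yes", "outcome": "stay"},
    {"degree": "college",    "experience": "high", "referral": "yes", "outcome": "stay"},
    {"degree": "highschool", "experience": "low",  "referral": "no",  "outcome": "leave"},
    {"degree": "highschool", "experience": "high", "referral": "no",  "outcome": "stay"},
    {"degree": "college",    "experience": "low",  "referral": "no",  "outcome": "stay"},
    {"degree": "highschool", "experience": "low",  "referral": "yes", "outcome": "leave"},
]
attrs = ["degree", "experience", "referral"]


def dependency(rows, attrs, decision):
    """Share of rows whose decision is fully determined by their values on
    attrs (size of the rough-set positive region over the table size)."""
    blocks = defaultdict(list)
    for row in rows:
        # Rows with identical values on attrs are indiscernible.
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Drop attributes one at a time as long as the dependency degree
    of the full attribute set is preserved."""
    full = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if dependency(rows, trial, decision) == full:
            kept = trial
    return kept


print(greedy_reduct(rows, attrs, "outcome"))  # ['degree', 'experience']
```

In this toy table `referral` is dropped because `degree` and `experience` alone already determine every outcome, which is the sense in which no classification information is lost.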
Support vector machine (SVM), rough set theory (RST), and decision tree (DT) are methodologies applied to various data mining problems, especially classification and prediction tasks. Studies have shown the ability of RST to select features, while SVM and DT are noted for their predictive power. This research integrates the advantages of the SVM, RST, and DT approaches into a hybrid framework that enhances the quality of both class prediction and rule generation. Beyond building a classification model with acceptable accuracy, the capability to explain how a decision is made through simple, understandable, and useful rules is a critical issue for human resource management. DT and RST can generate such rules; SVM cannot. The framework consists of four main stages. The first stage selects the most important attributes: RST is applied to eliminate redundant and irrelevant attributes without losing any classification information. The second stage reduces noisy objects through cross validation with SVM. If the resulting data set exhibits a class imbalance problem, the rules generated by RST are used to adjust the class distribution (stage 3). Through these stages, a data set with fewer dimensions, a higher degree of purity, and a more even class distribution is obtained; in the last stage, it is used to generate rules with DT. In addition, decisions concerning personnel selection prediction typically involve data with high dimensionality, uncertainty, and complexity, which causes traditional statistical methods to suffer from low statistical power. For validation, real cases of personnel selection at two high-tech companies in Hsinchu, Taiwan, covering direct and indirect labor, are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and performs better than traditional SVM, RST, and DT.
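The second stage, reducing noisy objects by cross validation, can be illustrated with a short sketch. Several things here are assumptions made only to keep the example self-contained: the classifier is a 1-nearest-neighbour stand-in rather than the SVM the abstract names, folds are assigned by index modulo `k`, and the sample data and the names `nearest_label` and `cv_noise_filter` are hypothetical.

```python
def nearest_label(train, x):
    """Stand-in classifier (1-nearest neighbour by squared Euclidean
    distance). The abstract uses an SVM at this step; any classifier with
    the same call shape could be passed to cv_noise_filter instead."""
    best = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return best[1]


def cv_noise_filter(samples, k=3, classify=nearest_label):
    """Keep only samples whose label matches the prediction of a model
    trained on the other k-1 folds (fold of sample i is i % k)."""
    kept = []
    for i, (x, y) in enumerate(samples):
        # Train on every sample that is not in the same fold as sample i.
        train = [s for j, s in enumerate(samples) if j % k != i % k]
        if classify(train, x) == y:
            kept.append((x, y))
    return kept


# Hypothetical data: two well-separated groups plus one point, (2, 2),
# that sits in group "A" but carries the label "B".
samples = [
    ((0, 0), "A"), ((10, 10), "B"), ((1, 0), "A"),
    ((11, 10), "B"), ((0, 1), "A"), ((10, 11), "B"),
    ((2, 2), "B"), ((1, 1), "A"), ((11, 11), "B"),
]

cleaned = cv_noise_filter(samples)
print(len(samples), "->", len(cleaned))  # 9 -> 8
```

Only the suspicious point is removed; the filtered sample would then go on to the class adjustment and decision tree stages described above.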