| Graduate Student | 蔡政育 Tsai, Cheng-Yu |
|---|---|
| Thesis Title | 結合馬氏-田口系統與支持向量機於資料分類之研究 Integrating MTS with SVM for Data Classification |
| Advisor | 蘇朝墩 Su, Chao-Ton |
| Oral Examination Committee | 蕭宇翔 Hsiao, Yu-Hsiang; 陳麗妃 Chen, Li-Fei; 許俊欽 Hsu, Chun-Chin |
| Degree | Master (碩士) |
| Department | College of Engineering, Department of Industrial Engineering and Engineering Management (工學院 - 工業工程與工程管理學系) |
| Year of Publication | 2018 |
| Academic Year of Graduation | 106 |
| Language | Chinese |
| Number of Pages | 59 |
| Keywords (Chinese) | 資料探勘 (data mining)、屬性篩選 (feature selection)、分類 (classification)、馬氏-田口方法 (Mahalanobis-Taguchi method)、支持向量機 (support vector machine) |
| Keywords (English) | data mining, feature selection, classification, MTS, SVM |
As technology advances, the volume of survey data keeps growing, making it difficult for traditional statistical analysis methods to uncover the knowledge hidden in such massive survey data; classification and prediction techniques from data mining are applied instead. The Mahalanobis-Taguchi system (MTS), proposed by Dr. Genichi Taguchi, is a classification and prediction technique: when executing MTS, we can not only classify and predict the data but also select important attributes. However, MTS runs into many difficulties when the classification threshold has to be determined, so this study replaces the classification step of MTS with the support vector machine (SVM). SVM is a classification tool built on mathematical optimization and delivers good classification performance. This study uses the MTS feature selection procedure to identify the important attributes and then runs SVM on those attributes; following this logic, the MTS+SVM model is constructed.
Analyses of 10 UCI datasets and a case study on employee turnover intention show that the MTS+SVM method indeed improves the classification accuracy of traditional MTS, and that, while using fewer attributes, MTS+SVM achieves almost the same or even better classification performance than GA+SVM and SA+SVM. In other words, the MTS+SVM model retains the good feature selection capability of traditional MTS, remedies the traditional difficulty of deciding the threshold, and classifies at least as well as commonly used methods such as GA+SVM. Finally, the case study demonstrates the feasibility of the MTS+SVM method, which can be applied to practical problems.
As technology advances, the amount of survey data has been increasing substantially; as a result, analyzing survey data has become a big-data issue. Because the data are so voluminous, it is difficult for traditional statistical methods to extract information from them, so data mining algorithms are often applied instead to classify and predict the data. The Mahalanobis-Taguchi system (MTS), proposed by Dr. Taguchi, is a classification and prediction technique. When executing MTS, we can not only classify and predict the data but also perform feature selection. However, MTS may encounter difficulties when it comes to deciding the classification threshold. To avoid this problem, this study replaces the threshold-based classification step of MTS with the support vector machine (SVM). SVM is based on the theory of mathematical optimization and performs effectively in classification. This study combines MTS feature selection with SVM classification, and the resulting model is called the MTS+SVM model.
Based on the analysis of 10 UCI datasets and a real case of employees' turnover intention, the MTS+SVM model indeed achieves higher classification accuracy than MTS. Moreover, while using fewer features, the integrated model demonstrates the same or even better classification performance than GA+SVM and SA+SVM. In other words, the MTS+SVM model not only retains the good feature selection capability of MTS but also improves classification accuracy. Finally, this study demonstrates that the MTS+SVM model is workable and can be successfully applied to a real case to achieve better performance.
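The two-stage logic described in the abstract (MTS for feature selection, then SVM for classification) can be sketched as follows. This is a minimal illustration rather than the thesis's actual implementation: it assumes a NumPy feature matrix `X` and binary labels `y` in which one class plays the role of the MTS "normal" group, it substitutes a Hadamard (Plackett-Burman-style) two-level design for the Taguchi orthogonal array, and every function name (`mts_feature_selection`, `mts_svm_accuracy`, and so on) is illustrative.

```python
# A minimal sketch of the MTS+SVM idea, assuming a NumPy feature matrix X and
# binary labels y in which the class `normal_label` plays the role of the MTS
# "normal" group. The Taguchi two-level orthogonal array is stood in for by a
# Hadamard (Plackett-Burman-style) design, and every function name here is
# illustrative rather than taken from the thesis.
import numpy as np
from scipy.linalg import hadamard
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def mahalanobis_distance(X, mean, std, corr_inv):
    """Scaled Mahalanobis distance MD = z' R^{-1} z / k for each row of X."""
    Z = (X - mean) / std
    return np.einsum("ij,jk,ik->i", Z, corr_inv, Z) / X.shape[1]


def sn_larger_the_better(md):
    """Taguchi larger-the-better signal-to-noise ratio of the abnormal MDs."""
    return -10.0 * np.log10(np.mean(1.0 / np.maximum(md, 1e-12) ** 2))


def mts_feature_selection(X_normal, X_abnormal):
    """Return indices of features whose SN-ratio gain is positive."""
    k = X_normal.shape[1]
    n_runs = 2 ** int(np.ceil(np.log2(k + 1)))     # enough 2-level columns for k features
    design = hadamard(n_runs)[:, 1:k + 1]          # +1 = feature used, -1 = feature left out
    sn = np.full(n_runs, np.nan)
    for r, row in enumerate(design):
        idx = np.flatnonzero(row == 1)
        if idx.size == 0:                          # degenerate run: nothing to measure
            continue
        mean = X_normal[:, idx].mean(axis=0)
        std = X_normal[:, idx].std(axis=0, ddof=1)
        std[std == 0] = 1.0                        # guard against constant features
        corr = np.atleast_2d(np.corrcoef(X_normal[:, idx], rowvar=False))
        md = mahalanobis_distance(X_abnormal[:, idx], mean, std, np.linalg.pinv(corr))
        sn[r] = sn_larger_the_better(md)
    gain = np.array([np.nanmean(sn[design[:, j] == 1]) - np.nanmean(sn[design[:, j] == -1])
                     for j in range(k)])
    selected = np.flatnonzero(gain > 0)
    return selected if selected.size else np.arange(k)   # fall back to all features


def mts_svm_accuracy(X, y, normal_label=0):
    """Select features with the MTS step, then cross-validate an RBF SVM on them."""
    selected = mts_feature_selection(X[y == normal_label], X[y != normal_label])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return selected, cross_val_score(clf, X[:, selected], y, cv=5).mean()
```

Under these assumptions, `mts_svm_accuracy(X, y)` returns the MTS-selected feature indices together with a cross-validated accuracy for an RBF SVM trained on that subset; the kernel choice, the 5-fold split, and the keep-features-with-positive-gain rule are illustrative defaults, since the abstract does not specify the thesis's settings.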