
Graduate student: Tsai, Cheng-Yu (蔡政育)
Thesis title: Integrating MTS with SVM for Data Classification (結合馬氏-田口系統與支持向量機於資料分類之研究)
Advisor: Su, Chao-Ton (蘇朝墩)
Oral defense committee: Hsiao, Yu-Hsiang (蕭宇翔); Chen, Li-Fei (陳麗妃); Hsu, Chun-Chin (許俊欽)
Degree: Master
Department: College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 59
Keywords: data mining, feature selection, classification, Mahalanobis-Taguchi system (MTS), support vector machine (SVM)
    As technology advances, the volume of survey data keeps growing, and traditional statistical methods struggle to uncover the knowledge hidden in such large data sets. Classification and prediction techniques from data mining are therefore applied instead. The Mahalanobis-Taguchi system (MTS), proposed by Dr. Genichi Taguchi, is one such technique: besides classifying and predicting, it can also screen data attributes (feature selection). However, MTS runs into difficulties when its classification threshold has to be determined. This study therefore replaces the classification step of MTS with a support vector machine (SVM), a classifier grounded in mathematical optimization that offers good classification performance. MTS feature selection is used to pick the important features, and SVM is then run on those features; the resulting combination is called the MTS+SVM model.

    Analysis of 10 UCI datasets and a real case on employees' turnover intention shows that MTS+SVM improves on the classification accuracy of traditional MTS. In addition, while using fewer features, MTS+SVM classifies about as well as, or even better than, GA+SVM and SA+SVM. In other words, the MTS+SVM model retains the good feature selection ability of traditional MTS, compensates for the difficulty of determining a threshold, and performs no worse than commonly used methods such as GA+SVM. Finally, the case study demonstrates that the MTS+SVM method is feasible and can be applied to practical problems.
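    The feature-screening step described in the abstract can be sketched in a few lines. The Python example below is a minimal illustration, not the thesis's implementation: it assumes the common MTS formulation (scaled Mahalanobis distance against a normal reference group, a two-level orthogonal array, and the larger-the-better SN ratio), uses an L4 array for three features, and runs on made-up toy data. All function and variable names are hypothetical.

```python
import math

def column_stats(rows):
    """Mean and sample standard deviation of each column."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(k)]
    return means, stds

def standardize(rows, means, stds):
    """Standardize rows with the normal group's means and deviations."""
    return [[(r[j] - means[j]) / stds[j] for j in range(len(r))] for r in rows]

def correlation(z_rows):
    """Correlation matrix of already standardized rows."""
    n, k = len(z_rows), len(z_rows[0])
    return [[sum(r[a] * r[b] for r in z_rows) / (n - 1) for b in range(k)]
            for a in range(k)]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def scaled_md(z, corr):
    """Scaled Mahalanobis distance z' C^-1 z / k for one standardized row."""
    w = solve(corr, z)
    return sum(a * b for a, b in zip(z, w)) / len(z)

def sn_larger_the_better(mds):
    """Larger-the-better SN ratio over the abnormal group's distances."""
    return -10 * math.log10(sum(1 / d for d in mds) / len(mds))

# L4 orthogonal array for three features: level 1 = use feature, level 2 = drop it.
L4 = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]

def feature_gains(normal, abnormal):
    """SN-ratio gain of each feature: mean SN when used minus mean SN when dropped."""
    sn = []
    for run in L4:
        cols = [j for j, level in enumerate(run) if level == 1]
        sub_n = [[r[j] for j in cols] for r in normal]
        sub_a = [[r[j] for j in cols] for r in abnormal]
        means, stds = column_stats(sub_n)
        corr = correlation(standardize(sub_n, means, stds))
        mds = [scaled_md(z, corr) for z in standardize(sub_a, means, stds)]
        sn.append(sn_larger_the_better(mds))
    gains = []
    for j in range(3):
        used = [s for s, run in zip(sn, L4) if run[j] == 1]
        dropped = [s for s, run in zip(sn, L4) if run[j] == 2]
        gains.append(sum(used) / len(used) - sum(dropped) / len(dropped))
    return gains

# Toy data (invented for illustration): the abnormal rows deviate strongly
# in features 0 and 1, but stay inside the normal range for feature 2.
normal = [[10 + a, 20 + b, 30 + c]
          for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
abnormal = [[14, 24, 30.5], [15, 25, 31], [13, 26, 29], [16, 23, 29.5]]

gains = feature_gains(normal, abnormal)
selected = [j for j, g in enumerate(gains) if g > 0]  # keep positive-gain features
print("gains:", [round(g, 2) for g in gains])
print("selected feature indices:", selected)
```

    On this toy data the third feature lowers the SN ratio whenever it is included, so only the first two features are kept. In the MTS+SVM scheme, the retained columns would then be used to train an SVM classifier, which takes the place of choosing a threshold on the Mahalanobis distance.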

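    Chapter 1. Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Framework
    Chapter 2. Literature Review
      2.1 Data Mining
        2.1.1 Purpose and Steps of Data Mining
        2.1.2 Data Mining Functions
      2.2 Mahalanobis-Taguchi System
        2.2.1 Multivariate Systems
        2.2.2 Mahalanobis Distance
        2.2.3 Taguchi Robust Design
        2.2.4 Implementing the Mahalanobis-Taguchi System
        2.2.5 Threshold Selection in the Mahalanobis-Taguchi System
      2.3 Support Vector Machine
      2.4 Feature Selection
        2.4.1 Genetic Algorithm
        2.4.2 Simulated Annealing
        2.4.3 Feature Selection: Description and Applications
      2.5 Turnover Intention
        2.5.1 Definition of Turnover Intention
        2.5.2 Factors Affecting Turnover Intention
    Chapter 3. Research Method
      3.1 Basic Concept
      3.2 MTS+SVM Model
        3.2.1 Data Preprocessing
        3.2.2 Constructing and Validating the Full-Model Measurement Scale
        3.2.3 Feature Selection Using Orthogonal Arrays and SN Ratios
        3.2.4 Running the SVM Algorithm on the Important Feature Variables
    Chapter 4. Data Analysis
      4.1 Data
      4.2 Evaluation Metrics
      4.3 Experimental Design
        4.3.1 Experimental Techniques
        4.3.2 Experimental Platform
        4.3.3 Experimental Steps and Analysis Workflow
      4.4 Performance Evaluation
    Chapter 5. Case Study
      5.1 Case Background and Problem Definition
      5.2 Data Collection and Preparation
      5.3 Building the Data Mining System
        5.3.1 Analysis Workflow
        5.3.2 Performance Evaluation of Each Method
      5.4 Important Features
    Chapter 6. Conclusions and Recommendations
      6.1 Conclusions
      6.2 Suggestions for Future Research
    References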

