| Graduate Student | 林依祈 Yi-Chi Lin |
|---|---|
| Thesis Title | 改良非排序特徵選取過濾法於TFT-LCD Array製程檢測之應用 / An Improved Non-ranker Filter Feature Selection Method for TFT-LCD Array Process Inspection |
| Advisor | 蘇朝墩 Chao-Ton Su |
| Committee Members | |
| Degree | Master |
| Department | College of Engineering - Department of Industrial Engineering and Engineering Management |
| Year of Publication | 2008 |
| Academic Year of Graduation | 96 (ROC calendar) |
| Language | Chinese |
| Number of Pages | 88 |
| Keywords (Chinese) | 資料探勘 (data mining), 特徵選取 (feature selection), 分類技術 (classification techniques), 薄膜電晶體液晶顯示器 (TFT-LCD) |
| Keywords (English) | data mining, feature selection, classification, Thin-Film Transistor Liquid-Crystal Display (TFT-LCD) |
Feature selection is an effective technique for reducing data dimensionality. In the data mining process, performing feature selection identifies the relevant attributes in a data set and discards the irrelevant or redundant ones, which improves classification performance and shortens training time. Feature selection algorithms fall into three techniques: wrapper methods use the classification algorithm itself to evaluate the usefulness of attributes, embedded methods build the selection into the structure of the classifier, and filter methods evaluate only the intrinsic properties of the data without considering any classifier. For efficient use on high-dimensional data, filter methods are computationally faster than the other two techniques, but their classification performance does not reach the level achieved by the other two.
This study builds a combiner framework that merges the feature subsets selected under different algorithms into a single final subset, and proposes a new combiner method to improve the classification performance of existing filter methods. Experiments on data sets from the UCI repository show that the new combiner method significantly improves classification performance with the k-nearest-neighbor classifier, especially on qualitative data sets. As a practical application, array process inspection data from a TFT-LCD manufacturer in Taiwan were analyzed; the proposed feature selection method effectively reduces the number of test items and improves classification performance.
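To make the filter idea above concrete, the following is a minimal sketch of two filter-style selectors that score features using only intrinsic properties of the data (no classifier in the loop) and each return a feature subset. The scoring functions, thresholds, and the stand-in data set are illustrative assumptions, not the algorithms studied in the thesis.

```python
# Minimal sketch (assumed for illustration): two filter-style selectors that
# evaluate features from the data alone and return candidate feature subsets.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def mutual_info_filter(X, y, threshold=0.05):
    """Keep features whose mutual information with the class label exceeds a threshold."""
    scores = mutual_info_classif(X, y, random_state=0)
    return set(np.where(scores > threshold)[0])

def correlation_filter(X, y, threshold=0.3):
    """Keep features whose absolute correlation with the class label exceeds a threshold."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return set(np.where(corr > threshold)[0])

# A scikit-learn data set stands in for the UCI data sets used in the experiments.
X, y = load_breast_cancer(return_X_y=True)
subsets = [mutual_info_filter(X, y), correlation_filter(X, y)]
print([len(s) for s in subsets])  # size of each candidate subset
```

Different filters usually disagree on which features matter, which is what motivates combining their subsets into a single final subset.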
Feature selection is an effective technique for dealing with dimensionality reduction. Identifying the relevant features in a data set and discarding everything else as irrelevant or redundant can improve classifier performance. Algorithms for feature selection fall into three broad techniques: wrappers use the learning algorithm itself to evaluate the usefulness of features, embedded methods build the selection into the classifier construction, while filters assess the relevance of features by looking only at the intrinsic properties of the data. For application to large databases, filter techniques have proven more practical than the others because they are much faster; however, their classification performance falls short of the other two techniques.
In this study we present a general framework for creating several feature subsets and then combining them into a single subset. A new combiner is proposed for selecting features to improve the performance of existing filter techniques. Experimental results on UCI data sets demonstrate that the new combiner approach gives a significant improvement for the k-nearest-neighbor classifier, especially on qualitative data. Finally, the proposed method was employed to analyze TFT-LCD array process inspection data. Implementation results show that the test items are significantly reduced and the classification performance is improved.
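This record does not spell out the thesis's combiner, so the sketch below assumes a simple majority-vote combiner over the subsets produced by several filters, followed by the k-nearest-neighbor evaluation described in the abstracts; the example subsets and all parameters are hypothetical.

```python
# Hedged sketch of a combiner: keep a feature if a majority of the individual
# filter subsets selected it (an assumption, not necessarily the thesis's method),
# then evaluate the combined subset with a k-nearest-neighbor classifier.
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def majority_vote_combiner(subsets, min_votes=None):
    """Return the features chosen by at least `min_votes` of the given subsets."""
    if min_votes is None:
        min_votes = len(subsets) // 2 + 1  # simple majority by default
    votes = Counter(f for s in subsets for f in s)
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical subsets, as if produced by three different non-ranker filters.
subsets = [{0, 2, 3, 7, 21, 22}, {0, 3, 7, 20, 22, 27}, {2, 3, 7, 22, 23}]
combined = majority_vote_combiner(subsets)  # features backed by at least two filters

# Evaluate the combined subset with k-NN, mirroring the evaluation in the abstract.
X, y = load_breast_cancer(return_X_y=True)  # stand-in for a UCI-style data set
knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X[:, combined], y, cv=10).mean()
print(f"{len(combined)} features kept, 10-fold CV accuracy = {acc:.3f}")
```

A vote threshold of this kind trades off between the union (any filter suffices) and the intersection (all filters must agree) of the candidate subsets.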