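Develop A Yield Enhancement Framework based on Main Branch Decision Tree Algorithm for Mining Semiconductor Data – An Empirical Study of A DRAM Fab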

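Graduate student:	賴彥中 Yen-Chung, Lai
Thesis title:	發展主幹式決策樹法則以提昇半導體良率之研究-以DRAM廠為實證 Develop A Yield Enhancement Framework based on Main Branch Decision Tree Algorithm for Mining Semiconductor Data – An Empirical Study of A DRAM Fab
Advisor:	簡禎富 Chen-Fu, Chien
Oral defense committee:
Degree:	Master
Department:	College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication:	2005
Academic year of graduation:	93 (ROC calendar)
Language:	English
Number of pages:	122
Chinese keywords:	故障排除 (troubleshooting)、良率提昇 (yield enhancement)、決策樹 (decision tree)、資料挖礦 (data mining)、類別不平衡 (class imbalance)、半導體製程 (semiconductor manufacturing process)、主幹式決策樹法則 (main branch decision tree algorithm)
Foreign-language keywords:	trouble shooting, yield enhancement, decision tree, data mining, class imbalanced problem, semiconductor manufacturing, main branch decision tree algorithm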

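To raise production efficiency, reduce cost, and improve quality, engineers in the semiconductor industry need effective analytical tools for the related data analysis and decision problems.
In troubleshooting, engineers cannot rely on specialized physical or electrical knowledge alone; that is not enough to solve the problem, because the causes are too complex. Few researchers currently study how to extract knowledge from process data or how to build decision rules for production systems, and in semiconductor yield enhancement, studies that discuss the class imbalanced problem are rarer still.
This thesis therefore develops a novel decision tree algorithm, the "Main Branch Decision Tree Algorithm". The proposed method takes a Focus Class into account, which the analyst can define to suit the setting in which it is used, for example by designating low-yield wafers as the Focus Class. On top of this algorithm we build a yield enhancement analysis framework and validate it with actual data from a DRAM wafer fab.
We combine domain experts' experience with data mining methods to analyze process data and report the suspected causes to engineers as a basis for action. This helps engineers narrow the range of suspected causes and shorten troubleshooting time, thereby raising yield and reducing product loss.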

To meet the semiconductor industry's demands for higher productivity, lower cost, and better quality, engineers need effective analytical methods for the relevant data analysis and decision problems.
When troubleshooting in the semiconductor industry, engineers cannot rely solely on expert knowledge of physics or electronics, because the number of relevant factors is too large. At present, few researchers study how to acquire knowledge from manufacturing data and describe the characteristics of a production system in the form of decision rules. Almost no research on yield enhancement in semiconductor manufacturing addresses the class imbalanced problem encountered in data analysis.
This thesis therefore develops a new decision tree algorithm, the “Main Branch Decision Tree Algorithm”, which differs from general decision trees. The proposed algorithm concentrates on a user-defined focus class in the dataset that suits the specific situation, such as finding the root cause of yield-loss wafers among a very large number of instances. We also propose a framework based on this algorithm and validate it through an empirical study of yield enhancement in a DRAM fab.
We combine domain experts' experience with data mining methodology to summarize the assignable root causes in the manufacturing system and give engineers a reference basis for solving the problem. This helps engineers narrow the range of possible causes and shorten troubleshooting time, so as to improve yield and prevent more wafers from being affected.
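
The class imbalanced problem named in the abstract can be illustrated with a minimal sketch. It is not the Main Branch Decision Tree Algorithm, whose details this record does not give. The toy wafer counts, the label names, and the helper functions `accuracy` and `focus_recall` are all assumptions made up for illustration. The sketch only shows why overall accuracy can look good while a rare focus class, such as low-yield wafers, is missed entirely.

```python
from collections import Counter

# Hypothetical toy data: 95 normal wafers and 5 low-yield wafers.
labels = ["normal"] * 95 + ["low_yield"] * 5
FOCUS_CLASS = "low_yield"  # assumed label for the analyst-chosen focus class


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def focus_recall(y_true, y_pred, focus):
    """Fraction of focus-class instances that the predictions actually find."""
    focus_total = sum(t == focus for t in y_true)
    focus_hits = sum(t == focus and p == focus for t, p in zip(y_true, y_pred))
    return focus_hits / focus_total


# A trivial classifier that always predicts the majority class.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

print(accuracy(labels, predictions))                   # 0.95
print(focus_recall(labels, predictions, FOCUS_CLASS))  # 0.0
```

The always-majority classifier is 95% accurate yet finds none of the low-yield wafers, which is the kind of outcome a focus-class-oriented method is meant to avoid.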

Abstract (Chinese) i
ABSTRACT    ii
Table of Contents    iii
List of Figures    v
List of Tables    vii
Chapter 1 Introduction    1
1 Background, significance, and motivation    1
2 Research aims    2
3 Overview of this thesis    3
Chapter 2 Literature Review    5
1 Semiconductor manufacturing data    5
1.1 Semiconductor manufacturing process    5
1.2 Data property of semiconductor manufacturing    9
2 Data mining concepts and methods    12
2.1 Data mining and KDD    12
2.2 Data mining model functions    13
2.3 Data mining process    16
2.4 Data mining and its application    17
3 Decision tree    19
3.1 Decision tree construction    20
3.2 CART    24
3.3 C4.5    26
3.4 CHAID    27
4 Class imbalanced problem    29
4.1 Scenario of class imbalanced problem    30
4.2 Methods to address class imbalanced problem    34
Chapter 3 The Yield Enhancement Framework based on Main Branch Decision Tree Algorithm    37
1 Problem definition    40
1.1 Problem background    40
1.2 Data acquisition    41
2 Data preparation    43
2.1 Data cleaning    43
2.2 Data partition    44
2.3 Data clustering    44
2.4 Data treatment    45
3 Feature selection    48
3.1 Why we do feature selection    48
3.2 How we do feature selection    48
4 Decision tree construction    50
4.1 Main Branch Decision Tree Algorithm    51
4.2 The user-defined parameters in Main Branch Decision Tree    54
4.3 The steps of Main Branch Decision Tree Construction    55
4.4 The pruning methods of Main Branch Decision Tree Algorithm    59
4.5 Compare the effect of different set of RP and FCSIG    60
4.6 Numerical illustration    61
5 Results and validation    68
Chapter 4 Empirical study    69
1 A real case in UCI database    69
1.1 Description of real dataset “Hayes-roth”    70
1.2 Decision tree construction of real dataset “Hayes-roth”    71
2 A real case in semiconductor manufacturing DRAM fab    85
2.1 Problem definition    85
2.2 Data preparation    86
2.3 Feature selection    93
2.4 Decision Tree Construction    95
2.5 Results and Validation    98
Chapter 5 Conclusion and Future Research    105
References    108

                                


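Full-text release date: this full text is not authorized for public release (on-campus network)
Full-text release date: this full text is not authorized for public release (off-campus network)
Full-text release date: this full text is not authorized for public release (National Central Library: Taiwan theses and dissertations system)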
