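Graduate student: 徐婉綾
Thesis title: 主成份分析及其應用 (Principal Component Analysis and Its Applications)
Advisor: 陳朝欽
Oral defense committee: 陳朝欽, 陳宜欣, 石維寬
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 27
Keywords (Chinese): 主成份分析 (principal component analysis)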
      People generate large amounts of data in daily life. Many researchers hope that collecting and analyzing these data can improve everyday life or replace human labor, for example by predicting economic conditions or identifying diseases. As computing speed and storage capacity have grown rapidly, performing data analysis well has become an important issue. Such data are often complex and high-dimensional, which makes them harder to analyze.
      As a data analysis technique, principal component analysis (PCA) reduces the dimension of the data effectively while retaining most of the information in it. In this thesis, we use principal components to represent four data sets: 8OX data, colon cancer gene data, breast cancer gene data, and wine recognition data. To present the results visually, we use MATLAB to plot the experimental results in two and three dimensions. Finally, we discuss a caveat on the usage of PCA and a possible solution for obtaining accurate results.
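
      The thesis's MATLAB code is not part of this record. As a rough illustration of the dimension reduction described in the abstract, the following is a minimal Python sketch and not the author's implementation. It centers the data, forms the sample covariance matrix, and takes the leading eigenvectors as principal components. The function names (center, covariance, top_component, pca) and the toy data are hypothetical, and power iteration with deflation is swapped in for a library eigensolver only to keep the example dependency-free.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500):
    """Approximate the leading eigenpair of a symmetric matrix by power iteration.

    Assumption: the start vector is not orthogonal to the leading eigenvector;
    otherwise the iteration may not reach it.
    """
    d = len(cov)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v estimates the eigenvalue (variance along v).
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v

def pca(rows, k):
    """Return (variances, components, scores) for the first k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    variances, components = [], []
    for _ in range(k):
        lam, v = top_component(cov)
        variances.append(lam)
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Project each centered row onto the components to get reduced coordinates.
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return variances, components, scores

# Made-up two-dimensional toy data (not one of the thesis data sets).
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]]
variances, components, scores = pca(data, k=1)
print("variance along first component:", variances[0])
print("one-dimensional scores:", [s[0] for s in scores])
```

      Here each two-dimensional point is replaced by a single score along the direction of largest variance. For real data sets such as those named in the abstract, a library eigensolver or SVD routine would normally replace the power iteration.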

    Chapter 1 Introduction
        1.1 Review of Principal Component Analysis
        1.2 Principal Component Analysis
    Chapter 2 Background for Principal Component Analysis
        2.1 Problem Statement and the Solution
        2.2 Computing Principal Components
    Chapter 3 Experiment
        3.1 8OX
        3.2 Colon Cancer [Alon1999]
        3.3 Breast Cancer [Veer2002]
        3.4 Wine Recognition [Web02]
        3.5 MATLAB Code
    Chapter 4 A Notice on Usage of Principal Component Analysis
        MATLAB Code for Data Preprocessing
    Chapter 5 Conclusion
    References

    [Adle2001] N. Adler and B. Golany, “Evaluation of Deregulated Airline Networks Using Data Envelopment Analysis Combined with Principal Component Analysis with An Application to Western Europe,” European Journal of Operational Research, vol. 132, no. 2, 260-273, 2001.
    [Aebe1994] S. Aeberhard, D. Coomans, and O. de Vel, “Comparative Analysis of Statistical Pattern Recognition Methods in High Dimensional Settings,” Pattern Recognition, vol. 27, no. 8, 1065-1077, 1994.
    [Alon1999] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, “Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proceedings of the National Academy of Sciences, vol. 96, no. 12, 6745-6750, 1999.
    [Bald1989] P. Baldi and K. Hornik, “Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima,” Neural Networks, vol. 2, no. 1, 53-58, 1989.
    [Bitt2009] H.R. Bittencourt, B.P.O. Pasini, D.A. de O. Moraes, B.D. dos Santos, and V. Haertel, “Comparative Analysis of Two Classes Implementing Nominal Logistic Regression,” Revista Brasileira de Biometria, vol. 27, no. 1, 115-124, 2009.
    [DLBA2013] J.D. de la Bastida Castillo, “Software for Gene Expression Data Analysis,” MS Thesis, Institute of ISA, National Tsing Hua University, Hsinchu, Taiwan, May, 2013.
    [Haes1990] J.C. de Haes, F.C. van Knippenberg, and J.P. Neijt, “Measuring Psychological and Physical Distress in Cancer Patients: Structure and Application of the Rotterdam Symptom Checklist,” British Journal of Cancer, vol. 62, no. 6, 1034-1038, 1990.

    [Hote1933] H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology, vol. 24, no. 6, 417-441, 1933.
    [Jain1988] A.K. Jain and R.C. Dubes, “Algorithms for Clustering Data,” Englewood Cliffs, NJ: Prentice-Hall, 1988.
    [Joll1986] I.T. Jolliffe, “Principal Component Analysis,” Springer, 1st edition, 1986.
    [Joll2002] I.T. Jolliffe, “Principal Component Analysis,” Springer, 2nd edition, 2002.
    [Kram1991] M.A. Kramer, “Nonlinear Principal Component Analysis Using Autoassociative Neural Networks,” AIChE Journal, vol. 37, no. 2, 233-243, 1991.
    [Krey2006] E. Kreyszig, “Advanced Engineering Mathematics,” John Wiley & Sons, 9th edition, 2006.
    [Nove2008] J. Novembre, T. Johnson, K. Bryc, Z. Kutalik, A.R. Boyko, A. Auton, A. Indap, K.S. King, S. Bergmann, M. R. Nelson, M. Stephens, and C.D. Bustamante, “Genes Mirror Geography within Europe,” Nature, vol. 456, no. 7218, 98-101, 2008.
    [Pear1901] K. Pearson, “On Lines and Planes of Closest Fit to Systems of Points in Space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, 559-572, 1901.
    [Rask1988] R. Raskin and H. Terry, “A Principal-Components Analysis of the Narcissistic Personality Inventory and Further Evidence of Its Construct Validity,” Journal of Personality and Social Psychology, vol. 54, no. 5, 890-902, 1988.
    [Shyu2003] M. Shyu, S. Chen, K. Sarinnapakorn, and L. Chang, “A Novel Anomaly Detection Scheme Based on Principal Component Classifier,” Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining, 172-179, 2003.

    [Veer2002] L.J. van't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer,” Nature, vol. 415, no. 6871, 530-536, 2002.
    [Web01] http://levis.tongji.edu.cn/gzli/data/mirror-kentridge.html, last access on June 23, 2014.
    [Web02] http://archive.ics.uci.edu/ml/datasets/Wine, UCI Machine Learning Repository, last access on June 23, 2014.
    [Zuur2007] A.F. Zuur, E.N. Ieno, and G.M. Smith, “Analyzing Ecological Data,” Springer, 2007.

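    Full-text availability: the full text is not authorized for public release on the campus network, on off-campus networks, or through the National Central Library's Taiwan theses and dissertations system.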