
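Graduate student: Momodou Lamim Sanyang (牧摩度)
Thesis title: Data Visualization by Self-Organizing Map (以SOM做資料視覺化)
Advisor: Chen, Chaur-Chin (陳朝欽)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2010
Academic year of graduation: 98
Language: English
Number of pages: 38
Keywords (Chinese): SOM資料視覺化 (SOM data visualization)
Keywords (English): SOM, Data Visualization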
  • Data visualization has become essential because we have acquired huge, complex
    data sets and keep accumulating more as storage devices become cheaper. Most of
    these data are high-dimensional and therefore hard for humans to visualize. Efforts
    to alleviate this high-dimensional visualization problem led researchers to develop
    the Self-Organizing Map (SOM).
    The Self-Organizing Map is an unsupervised neural network algorithm that
    projects high-dimensional data onto a two-dimensional map that humans can
    easily visualize. The projection preserves the topology of the data, so similar data
    items are mapped to nearby locations on the map. It is a powerful method for data
    mining and cluster extraction, and it is very useful for processing data of high
    dimensionality and complexity.
    Several visualization methods present different aspects of the information
    learned by the SOM, helping to gain insight and guide segmentation of the data. In
    this thesis, common visualization methods such as the dendrogram, 2d-dendrogram,
    principal component projection, map labels, and U-matrix, together with more
    recently introduced methods such as the P-matrix and U*-matrix plots, are used to
    visualize the results on four data sets: IRIS, which has 150 patterns in 3 classes of
    50 patterns each, with 4 features per pattern; 8OX, which has 45 patterns in 3
    classes of 15 patterns each, with 8 features per pattern; the ALL-AML Leukemia
    microarray data set, with 38 patients in 2 classes (27 ALL, 11 AML) and 7129 genes
    per patient; and Colon Tumor, with 62 samples in 2 classes (22 normal, 40 tumor)
    and 2000 genes per sample.
    The visualization results for each of these data sets are reported using the
    aforementioned methods. The 2d-dendrogram appears to be the better tool for
    visualizing the microarray data, while all the methods perform well on IRIS and 8OX.
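
The SOM projection and the U-matrix mentioned in the abstract can be illustrated with a short NumPy sketch. This is not the thesis's implementation. The function names `train_som` and `u_matrix`, the 6 x 6 grid, the linearly decaying learning rate and neighborhood width, the Gaussian neighborhood, the 4-neighbor U-matrix, and the synthetic two-cluster data are all assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, rows=6, cols=6, epochs=30, lr0=0.5, sigma0=3.0, seed=0):
    """Sequential SOM training on a rows x cols rectangular grid (illustrative)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Assumption: initialize each unit from a randomly chosen data pattern.
    picks = rng.integers(0, n, size=rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, dim)
    # Grid coordinates of every unit, used by the neighborhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            # Best-matching unit: the unit whose weight vector is closest to x.
            dist = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood centered on the BMU, measured on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward x.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def u_matrix(weights):
    """Mean distance from each unit's weight vector to its 4 grid neighbors."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            um[r, c] = np.mean(dists)
    return um


# Synthetic stand-in data (not one of the thesis data sets):
# two 4-dimensional Gaussian clusters of 50 patterns each.
rng = np.random.default_rng(0)
toy = np.vstack([rng.normal(0.0, 0.3, (50, 4)), rng.normal(5.0, 0.3, (50, 4))])
weights = train_som(toy)
um = u_matrix(weights)
print(np.round(um, 2))
```

Large values in the printed matrix mark map units whose weight vectors differ strongly from those of their neighbors, which is how a U-matrix plot exposes boundaries between clusters.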


    Chapter 1 Introduction
    Chapter 2 Review of Clustering Algorithms
      2.1 Distance Measures
        2.1.1 The Minkowski distance
        2.1.2 The Vector angle measurement
        2.1.3 The Correlation measurement
      2.2 Hierarchical Clustering
        2.2.1 Single-Linkage versus Complete-Linkage
        2.2.2 Strength and Limitations
      2.3 Partitioning Clustering
        2.3.1 K-Means Clustering Algorithm
      2.4 Topology Preserving Mapping
        2.4.1 Self-Organizing Map (SOM)
    Chapter 3 Self-Organizing Map
      3.1 Algorithm for Kohonen’s Self-Organizing Map
      3.2 Batch Training Algorithm for SOM
      3.3 Efficient initialization schemes for SOM
    Chapter 4 The Data Sets
      4.1 Description of IRIS Data
      4.2 Description of 8OX Data
      4.3 Description of ALL-AML_Leukemia [Golu1999]
      4.4 Description of Colon Tumor [Alon1999]
    Chapter 5 Experimental Results
      5.1 U-Matrix
      5.2 P-Matrix
      5.3 U*-Matrix
      5.4 The Visualization Results
    Chapter 6 Conclusion and Future Work
    References

    [Alon1999] U. Alon et al., “Broad Patterns of Gene Expression Revealed by Clustering
    Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays”,
    Proceedings of the National Academy of Sciences of the United States of America, vol.
    96, 6745-6750, 1999.
    [Arab1994] P. Arabie and L. Hubert, “Cluster Analysis in Marketing Research”, in
    Advanced Methods in Marketing Research, Oxford: Blackwell, 1994.
    [Bald2002] P. Baldi and G. Hatfield, “DNA microarrays and gene expression”, Cambridge
    University Press, 2002.
    [Bere2005] M. Berens, H. Liu, L. Parsons, L. Yu, and Z. Zhao, “Fostering biological
    relevance in feature selection for microarray data”, IEEE Intelligent
    Systems, vol. 20, no. 6, 29-32, 2005.
    [Chen2005] C.C. Chen and H.T. Chu, “Similarity Measurement between Images”, IEEE
    Conference on Computer Software and Algorithm (Compsac 2005),
    41-42, Edinburgh, UK, 2005.
    [Gant2008] G. Eskelsen and F. John, “The Diverse and Exploding Digital Universe: An
    Updated Forecast of Worldwide Information Growth Through 2011”, March
    2008.
    [Golu1999] T.R. Golub et al., “Molecular Classification of Cancer: Class Discovery and Class
    Prediction by Gene Expression Monitoring”, Science, vol. 286,
    531-537, 1999.
    [Goog2009] Google Scholar. http://scholar.google.com, February, 2009.
    [Hart1979] J.A. Hartigan and M.A. Wong, “A K-means clustering algorithm”, Journal of the
    Royal Statistical Society, Series C, 1979.
    [Jain1988] A.K. Jain and R.C. Dubes, “Algorithms for Clustering Data”, Prentice Hall, New
    Jersey, 1988.
    [John1967] S.C. Johnson, “Hierarchical Clustering Schemes”, Psychometrika, vol. 32,
    241-254, 1967.
    [Juha1999] J. Vesanto, J. Himberg, E. Alhoniemi, and J. Parhankangas, “Self-organizing map
    in Matlab: the SOM Toolbox”, Laboratory of Computer and
    Information Science, Helsinki University of Technology, Finland, 1999.
    [Koho1990] T. Kohonen, “The Self-Organizing Map”, Proceedings of The IEEE, vol. 78, no.
    9, 1464-1480, 1990.
    [Meye2000] R.D. Meyer and D. Cook, “Visualization of data”, Mathematical and Statistical
    Sciences, Pfizer Central Research, Groton, Connecticut, USA, 2000.
    [Mika1999] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.R. Muller, “Fisher Discriminant
    Analysis with Kernels”, IEEE International Workshop on Neural Networks for
    Signal Processing, vol. 9, 41-48, 1999.
    [Theo2009] S. Theodoridis and K. Koutroumbas, “Pattern Recognition”, Academic Press, 4th
    edition, 2009.
    [Tuke1977] J.W. Tukey, “Exploratory Data Analysis”, Addison-Wesley, 1977.
    [Ults2003a] A. Ultsch, “U*-Matrix: a Tool to visualize Clusters in high dimensional
    Data”, Data Bionics Research Lab, Department of Computer Science,
    University of Marburg, Germany, 2003.
    [Ults2003b] A. Ultsch, “Pareto Density Estimation: Density Estimation for Knowledge
    Discovery”, Data Bionics Research Lab, Department of Computer Science,
    University of Marburg, Germany, 2003.
    [Ults2005] A. Ultsch, “Clustering with SOM: U*C”, Data Bionics Research Group,
    Department of Computer Science, University of Marburg, Germany, 2005.
    [Ults2007] A. Ultsch, “Emergence in Self Organizing Feature Maps”, Data Bionics
    Research Group, Department of Computer Science, University of Marburg,
    Germany, 2007.
    [Wang2007] T.Y. Wang, “A Study on Analyzing Microarray Data Using SVM and SOM”,
    Master Thesis, National Tsing Hua University, Taiwan, March, 2007.
    [Web01] http://www.cs.nthu.edu.tw/~cchen/ , last access on May 31, 2010.
    [Web02] http://www.ics.uci.edu/mlearn/MLRepository.html, last access on May 31, 2010.
    [Web03] http://www-genome.wi.mit.edu/cgi-bin/cancer/, last access on May 31, 2010.
    [Web04] http://microarray.princeton.edu/oncology/affydata/index.html, last access on May 31,
    2010.

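    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)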
