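Graduate student: | 牧摩度 Momodou Lamim Sanyang |
---|---|
Thesis title: | Data Visualization by Self-Organizing Map (以SOM做資料視覺化) |
Advisor: | 陳朝欽 Chen, Chaur-Chin |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | English |
Number of pages: | 38 |
Chinese keywords: | SOM, 資料視覺化 (data visualization) |
Foreign-language keywords: | SOM, Data Visualization |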

Data visualization has become increasingly important because we have acquired huge
and complex data sets and continue to accumulate more, helped by the recent
availability of cheap storage devices. Most of these data are high-dimensional and
therefore hard for humans to visualize. Efforts to alleviate this high-dimensional
visualization problem led researchers to develop the Self-Organizing Map (SOM).

The Self-Organizing Map is an unsupervised neural network algorithm that
projects high-dimensional data onto a two-dimensional map that humans can
easily visualize. The projection preserves the topology of the data, so that similar
data items are mapped to nearby locations on the map. It is a powerful method for
data mining and cluster extraction, and it is very useful for processing data of high
dimensionality and complexity.
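
The projection described above can be illustrated with a minimal sketch of the classic online SOM update rule. This is not the thesis's implementation: the function names `train_som` and `project`, the grid size, the decay schedules, and the synthetic two-cluster data are all assumptions chosen only for illustration.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small SOM with the online update rule (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    # Codebook: one weight vector per map node, initialised from random samples.
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    weights = weights.reshape(rows, cols, dim)
    # (row, col) coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # assumed linear learning-rate decay
        sigma = sigma0 * (1.0 - frac) + 0.5  # assumed shrinking neighbourhood radius
        x = data[rng.integers(0, n_samples)]
        # Best-matching unit (BMU): the node whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood centred on the BMU, measured on the map grid.
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its map neighbours towards the sample.
        weights += lr * h[..., None] * (x - weights)
    return weights

def project(data, weights):
    """Map each sample to the (row, col) position of its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    idx = np.argmin(dists, axis=1)
    return np.column_stack(np.unravel_index(idx, weights.shape[:2]))

# Synthetic 4-dimensional data with two well-separated groups (made up for the demo).
rng = np.random.default_rng(1)
demo = np.vstack([rng.normal(0.0, 0.1, (30, 4)), rng.normal(5.0, 0.1, (30, 4))])
weights = train_som(demo, rows=4, cols=4, n_iter=500)
coords = project(demo, weights)
```

Each row of `coords` is the two-dimensional map position assigned to one sample. Because the BMU's map neighbours are updated together with it, samples that are close in the input space tend to land on nearby nodes, which is the topology preservation the abstract refers to.
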

Several visualization methods present different aspects of the information learned
by the SOM, helping to gain insight and to guide segmentation of the data. In this
thesis, common visualization methods such as the dendrogram, 2d-dendrogram,
principal component projection, map labeling, and U-matrix, together with more
recently introduced methods such as the P-matrix and U*-matrix plots, are used to
visualize the results on four data sets: IRIS, which has 150 patterns in 3 classes of
50 patterns each, with four features per pattern; 8OX, which has 45 patterns in 3
classes of 15 patterns each, with 8 features per pattern; the microarray data set
ALL-AML Leukemia, with 38 patients in 2 classes (27 ALL, 11 AML) and 7129 genes
per patient; and Colon Tumor, with 62 samples in 2 classes (22 normal, 40 tumor)
and a total of 2000 genes. The visualization results for each of these data sets are
reported using the aforementioned methods. The 2d-dendrogram method appears to be
the better tool for visualizing the microarray data, while all the methods perform
well on IRIS and 8OX.
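
Of the methods listed above, the U-matrix is the simplest to sketch in code. The version below uses one common definition, the average distance between a node's weight vector and those of its 4-connected grid neighbours. The function name `u_matrix`, the rectangular grid, and the toy codebook are assumptions for illustration and may differ from the variant used in the thesis.

```python
import numpy as np

def u_matrix(weights):
    """Average distance from each node's weight vector to its grid neighbours.

    `weights` has shape (rows, cols, dim), and the map is assumed to have at
    least two nodes so that every node has a neighbour.
    """
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            # 4-connected neighbourhood on a rectangular grid (assumed topology).
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            umat[r, c] = np.mean(dists)
    return umat

# Toy codebook: the left half of the map holds one prototype, the right half another.
toy = np.zeros((4, 4, 3))
toy[:, 2:, :] = 1.0
umat = u_matrix(toy)
```

In this toy map the two outer columns get a value of zero and the two middle columns get positive values, so the boundary between the two groups shows up as a ridge. On a trained SOM, such ridges are what suggest cluster borders.
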