研究生: |
楊書欣 Yang, Shu-Hsin |
---|---|
論文名稱: |
利用基因演算法及圖型探勘來產生圖型原型以利圖型分類 Graph Prototype Generation for Graph Classification Using Genetic Algorithms and Graph Mining |
指導教授: |
蘇豐文
Soo, Von-Wun |
口試委員: | |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Computer Science |
論文出版年: | 2009 |
畢業學年度: | 97 |
語文別: | 英文 |
論文頁數: | 40 |
中文關鍵詞: | 圖型 、原型 、分類 、基因演算法 |
外文關鍵詞: | graph, prototype, classification, genetic algorithm |
相關次數: | 點閱:1 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,圖形被廣泛地運用在表示結構化的事物上。這裡所用的「圖形」是指一群有標號或沒有標號的點以及一群有標號或沒有標號,有向或無向的邊所形成的組合。舉例來說,一個圖形可以表示化合物的結構,網頁的連結狀態,以及其他數種結構化的物件。
雖然基於圖形的資料表示型態在近年來越來越流行,然而缺少強大的分析工具是它的最重大的弱點。為了改善這個弱點,有很多的研究都嘗試著把圖形化的資料映射至特徵向量上,如此就可以套用數值化的分類器,例如SVM、Boosting等等,在這些資料上。
在這篇論文中,我們提供了另一種觀點去處理圖形分類的問題。那就是原型產生的方法。透過直接運用圖形的結構特徵,利用基因演算法去產生能夠很好的表達類別間之差異的圖形原型。在這個方法中,我們提出了一種基於圖形的基因運算子來實作圖形間的交配繁殖以產生下一子代圖形的方法。另外,我們也利用gSpan來找出圖形之間的最大相同子圖(Maximal Common Subgraph),透過最大相同子圖算出圖形之間的距離,來作為圖形分類的依據以及圖形原型演化的合適度計算。
用我們的方法來輔助圖形分類的結果,可以一定程度的提高分類的正確率。在跟其他方法的比較中,是趨近於最佳的。而且,只要有一個演化的指標,這個圖形基因演算的方法就可以套用在任何其他的方法上。
Recently, graphs are widely used to represent structured objects. The term ‘graphs’ used here means a combination of labeled/unlabeled vertices and directed/undirected labeled/unlabeled edges. For example, a graph can represent the structure of chemical compounds , web page linking , and many other kinds of structured data.
Although graph-based data is becoming more and more popular lately, the lack of powerful analytic tools is its major weakness and bottleneck. To cope with the disadvantage, various approaches attempted to map graph-based data structure into feature vectors so that they can apply statistical classifiers such as Support Vector Machine (SVM) , Boosting, etc. to classify the graphs.
In this thesis, we propose a new point of view to deal with graph classification problems, i.e. prototype generation approach. That is, directly use the features of graphs (node labels, edge labels, connected components, subgraphs) to generate prototypes for each class that maximize the difference between intra-class similarity and inter-class similarity. In this approach, a graph-based genetic algorithm which includes genetic operators is used to generate offsprings, and gSpan (graph-based Substructure pattern mining) [Yan & Han 2002] is used to mining subgraphs to compute the fitness of selected prototypes.
The classification accuracy of our method is near the best compared with other approaches with statistical classifiers. And it can be applied on almost every approach if a precise objective is given.
Bringmann, B., Zimmermann, A., Raedt, L. D., Nijssen, S. Don’t Be Afraid of
Simpler Patterns. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases. 55-66, 2006.
Bunke, H. Error Correcting Graph Matching: On The Influence of The Underlying Cost
Function. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 21, No. 9, 917-922, 1999.
Bunke, H., Shearer, K., 1999. A Graph Distance Metric Based on The Maximal Common
Subgraph. Pattern Recognition Letters, Volume 19 , 255-259, 1998.
Borgwardt, K., Yan, X., Thoma, M., Cheng, H., Gretton, A., Song, L., Smola, A., Han, J.,
Yu, P., Kriegel, J. P. Combining Near-optimal Feature Selection with gSpan. 6th
International Workshop on Mining and Learning with Graphs, 2008.
Cheng, H., Yan, X., Han, J., Hsu, H. Discriminative Frequent Pattern Analysis for
Effective Classification. In Proceedings of ICDE 2007, 716-725, 2007.
Corneil, D. G., Gotlieb, C. C. An Efficient Algorithm for Graph Isomorphism. Journal of
The ACM, Volume 17, No. 1, 51-64, 1970.
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G. Frequent Substructure-Based
Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, Volume 17, No. 8, 1036-1050, 2005.
DTP, AIDS antiviral screen, 2004
http://dtp.nci.nih.gov/docs/aids/aids_data.html
Fernández, M., Valiente, G. A Graph Distance Metric Combining Maximum Common
Subgraph and Minimum Common Supergraph. Pattern Recognition Letters, Volume 22, No. 6, 753-758, 2001.
Fischer, A., Riesen, K., Bunke, H. An Experimental Study of Graph Classification Using
Prototype Selection. International Conference Pattern Recognition, 1-4, 2008.
Horaud, R., Skordas, T. Stereo Correspondence Through Feature Grouping and Maximal
Cliques. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 11, No. 11, 1168-1180, 1989.
Kashima, H., Tsuda, K., Inokuchi, A. Marginalized Kernels Between Labeled Graphs. In
Proceedings of the 21st International Conference on Machine Learning, 321-328 , 2003.
Levinson, R., Woods, W. A., Graham, T. W. Pattern Associativity and The Retrieval of
Semantic Networks, 1992.
Liu, C., Yan, X., Yu. H., Han, J., Yu, P. S. Mining Behavior Graphs for Backtrace of
Noncrashing Bugs. In Proceedings of SIAM International Conference on Data Mining, 2005.
Raveaux, R., Eugen, B., Locteau, H., Adam, S., Héroux, P., Trupin, E. A Graph
Classification Approach Using A Multi-objective Genetic Algorithm. Application to Symbol Recognition. Lecture Notes in Computer Science, Volume 4538, 361-370, 2007.
Riesen, K., Neuhaus, M., Bunke, H. Graph Embedding in Vector Spaces by Means of
Prototype Selection. Lecture Notes in Computer Science, Volume 4538, 383-393, 2007.
Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K. gBoost: A Mathematical
Programming Approach to Graph Classification and Regression. Machine Learning Journal, Volume 75, 69-89, 2008.
Shapiro, G., Haralick, M., Structural Descriptions and Inexact Matching. IEEE
Transactions on Pattern Analysis and Machine Intelligence, Volume PAMI-3, 504-519, 1981.
Yan, X., Han, J. gSpan: Graph-based Substructure Pattern Mining. In Proceedings of
the 2002 IEEE international conference on data mining, 721-724, 2002.
Yan, X., Han, J. gSpan: Graph-based Substructure Pattern Mining (Technical
report). Department of Computer Science, University of Illinois at Urbana-Champaign, 2002.