
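Graduate student: 楊書欣 (Yang, Shu-Hsin)
Thesis title: Graph Prototype Generation for Graph Classification Using Genetic Algorithms and Graph Mining (利用基因演算法及圖型探勘來產生圖型原型以利圖型分類)
Advisor: 蘇豐文 (Soo, Von-Wun)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97
Language: English
Number of pages: 40
Keywords (Chinese): 圖型 (graph), 原型 (prototype), 分類 (classification), 基因演算法 (genetic algorithm)
Keywords (English): graph, prototype, classification, genetic algorithm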
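  • In recent years, graphs have been widely used to represent structured objects. A "graph" here means a set of labeled or unlabeled vertices together with a set of labeled or unlabeled, directed or undirected edges. For example, a graph can represent the structure of a chemical compound, the link structure of web pages, and many other kinds of structured objects.
    Although graph-based data representations have become increasingly popular in recent years, their greatest weakness is the lack of powerful analysis tools. To address this weakness, many studies have tried to map graph data to feature vectors so that numerical classifiers such as SVM and Boosting can be applied to the data.
    In this thesis, we offer another perspective on the graph classification problem: prototype generation. By using the structural features of graphs directly, a genetic algorithm generates graph prototypes that express the differences between classes well. In this approach, we propose graph-based genetic operators that implement crossover between graphs to produce the next generation of graphs. In addition, we use gSpan to find the maximal common subgraph between graphs and compute the distance between graphs from it; this distance serves both as the basis for graph classification and as the fitness measure for prototype evolution.
    Using our method to assist graph classification improves classification accuracy to some extent. In comparison with other methods, its accuracy is close to the best. Moreover, given an evolution criterion, this graph-based genetic algorithm can be applied to any other method.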
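
    The abstract does not state the exact distance formula. As an illustration only, one common distance based on the maximal common subgraph is the one from the Bunke and Shearer entry in the reference list: d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|), where |g| counts vertices. The Python sketch below assumes that formula and, for very small graphs, finds the common subgraph by brute force instead of with gSpan. The graph encoding (a label dictionary plus a set of undirected edges) and the names `mcs_size` and `mcs_distance` are hypothetical, not taken from the thesis.

```python
from itertools import combinations, permutations


def mcs_size(g1, g2):
    """Vertex count of a maximum common induced subgraph (brute force).

    A graph is a pair (labels, edges): labels maps node -> label and
    edges is a set of frozenset({u, v}) pairs (undirected, unlabeled).
    Exponential time, so only suitable for tiny illustrative graphs.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    nodes1, nodes2 = list(labels1), list(labels2)
    # Try the largest candidate size first and stop at the first match.
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for sub1 in combinations(nodes1, k):
            for sub2 in permutations(nodes2, k):
                same_labels = all(
                    labels1[a] == labels2[b] for a, b in zip(sub1, sub2)
                )
                if not same_labels:
                    continue
                same_edges = all(
                    (frozenset((sub1[i], sub1[j])) in edges1)
                    == (frozenset((sub2[i], sub2[j])) in edges2)
                    for i in range(k)
                    for j in range(i + 1, k)
                )
                if same_edges:
                    return k
    return 0


def mcs_distance(g1, g2):
    """Assumed distance: 1 - |mcs| / max(|g1|, |g2|), sizes in vertices."""
    largest = max(len(g1[0]), len(g2[0]))
    return 1.0 - mcs_size(g1, g2) / largest if largest else 0.0


# Two three-vertex chains, C-C-O and C-C-N, sharing the C-C fragment.
chain_edges = {frozenset((0, 1)), frozenset((1, 2))}
g1 = ({0: "C", 1: "C", 2: "O"}, chain_edges)
g2 = ({0: "C", 1: "C", 2: "N"}, chain_edges)
print(mcs_distance(g1, g2))  # 1 - 2/3, about 0.333
```

    A distance of this kind can be used both to assign a graph to its nearest prototype and to score candidate prototypes during evolution, which is the double role the abstract describes.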


    In recent years, graphs have been widely used to represent structured objects. The term 'graph' here means a combination of labeled or unlabeled vertices and directed or undirected, labeled or unlabeled edges. For example, a graph can represent the structure of chemical compounds, web page links, and many other kinds of structured data.
    Although graph-based data has become increasingly popular, the lack of powerful analytic tools is its major weakness and bottleneck. To cope with this disadvantage, various approaches have attempted to map graph-based data structures into feature vectors so that statistical classifiers such as Support Vector Machine (SVM) and Boosting can be applied to classify the graphs.
    In this thesis, we propose a new point of view on graph classification problems: a prototype generation approach. That is, we directly use the features of graphs (node labels, edge labels, connected components, subgraphs) to generate prototypes for each class that maximize the difference between intra-class similarity and inter-class similarity. In this approach, a graph-based genetic algorithm with graph-specific genetic operators is used to generate offspring, and gSpan (graph-based Substructure pattern mining) [Yan & Han 2002] is used to mine subgraphs for computing the fitness of selected prototypes.
    The classification accuracy of our method is near the best compared with other approaches that use statistical classifiers. Moreover, it can be applied to almost any approach if a precise objective is given.
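
    To make the prototype idea concrete, here is a minimal Python sketch of a fitness function (intra-class similarity minus inter-class similarity), a genetic loop with elitism, and nearest-prototype classification. It is a simplification under stated assumptions and not the thesis's implementation: graphs are reduced to sets of labeled edges, a Jaccard distance stands in for the subgraph-based distance, and the `crossover` operator is a hypothetical placeholder for the graph-based genetic operators, which the abstract does not specify.

```python
import random


def jaccard_distance(g1, g2):
    """Toy stand-in for a graph distance; graphs are frozensets of edge labels."""
    union = g1 | g2
    return 1.0 - len(g1 & g2) / len(union) if union else 0.0


def fitness(candidate, own_class, other_graphs, dist=jaccard_distance):
    """Mean similarity to the own class minus mean similarity to other classes."""
    intra = sum(1.0 - dist(candidate, g) for g in own_class) / len(own_class)
    inter = sum(1.0 - dist(candidate, g) for g in other_graphs) / len(other_graphs)
    return intra - inter


def crossover(a, b, rng):
    """Hypothetical operator: keep shared edges, inherit each other edge with p = 0.5."""
    return frozenset(a & b) | frozenset(e for e in (a ^ b) if rng.random() < 0.5)


def evolve_prototype(own_class, other_graphs, generations=30, pop_size=8, seed=0):
    """Evolve one prototype for a class, seeding the population from its members."""
    rng = random.Random(seed)
    population = [rng.choice(own_class) for _ in range(pop_size)]

    def score(graph):
        return fitness(graph, own_class, other_graphs)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            crossover(rng.choice(parents), rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # elitism: the parents survive unchanged
    return max(population, key=score)


def classify(graph, prototypes, dist=jaccard_distance):
    """Return the class label whose prototype is nearest to the graph."""
    return min(prototypes, key=lambda label: dist(graph, prototypes[label]))


# Made-up toy data: each graph is a set of bond-like edge labels.
class_a = [frozenset({"C-C", "C-O"}), frozenset({"C-C", "C-O", "C-N"}),
           frozenset({"C-C", "C-O", "O-H"})]
class_b = [frozenset({"N-N", "N-H"}), frozenset({"N-N", "N-H", "C-N"}),
           frozenset({"N-N", "N-H", "N-O"})]
prototypes = {"A": evolve_prototype(class_a, class_b),
              "B": evolve_prototype(class_b, class_a)}
print(classify(frozenset({"C-C", "C-O"}), prototypes))  # prints A
```

    Because the parents are carried over unchanged in every generation, the best fitness in the population never decreases. Replacing `jaccard_distance` with a subgraph-based distance changes the objective without changing the loop, which matches the abstract's remark that the approach applies wherever a precise objective is given.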

    Chinese Abstract
    Abstract
    Acknowledgement
    Table of Contents
    List of Figures
    List of Tables
    Introduction
        1.1 Problem Statement
        1.2 Motivation
        1.3 Related Work
    Background
        2.1 Graphs
        2.2 Graph Distance Measure
        2.3 Prototype Classification in Graphs
        2.4 Genetic Algorithm
        2.5 gSpan
    Methodology
        3.1 Graph Distance Evaluation
        3.2 Prototype Generation using Graph-Based Genetic Algorithm
        3.3 Training and Classification Process
    Experimental Evaluation
        4.1 Dataset
        4.2 Data Cohesion Evaluation
        4.3 Prototype Generation Performance Evaluation
        4.4 Classification Result
    Conclusion & Future Work
    Reference

    Bringmann, B., Zimmermann, A., Raedt, L. D., Nijssen, S. Don’t Be Afraid of Simpler Patterns. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, 55-66, 2006.
    Bunke, H. Error Correcting Graph Matching: On The Influence of The Underlying Cost Function. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 21, No. 9, 917-922, 1999.
    Bunke, H., Shearer, K. A Graph Distance Metric Based on The Maximal Common Subgraph. Pattern Recognition Letters, Volume 19, 255-259, 1998.
    Borgwardt, K., Yan, X., Thoma, M., Cheng, H., Gretton, A., Song, L., Smola, A., Han, J., Yu, P., Kriegel, J. P. Combining Near-optimal Feature Selection with gSpan. 6th International Workshop on Mining and Learning with Graphs, 2008.
    Cheng, H., Yan, X., Han, J., Hsu, H. Discriminative Frequent Pattern Analysis for Effective Classification. In Proceedings of ICDE 2007, 716-725, 2007.
    Corneil, D. G., Gotlieb, C. C. An Efficient Algorithm for Graph Isomorphism. Journal of The ACM, Volume 17, No. 1, 51-64, 1970.
    Deshpande, M., Kuramochi, M., Wale, N., Karypis, G. Frequent Substructure-Based Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, Volume 17, No. 8, 1036-1050, 2005.
    DTP. AIDS Antiviral Screen, 2004. http://dtp.nci.nih.gov/docs/aids/aids_data.html
    Fernández, M., Valiente, G. A Graph Distance Metric Combining Maximum Common Subgraph and Minimum Common Supergraph. Pattern Recognition Letters, Volume 22, No. 6, 753-758, 2001.
    Fischer, A., Riesen, K., Bunke, H. An Experimental Study of Graph Classification Using Prototype Selection. International Conference Pattern Recognition, 1-4, 2008.
    Horaud, R., Skordas, T. Stereo Correspondence Through Feature Grouping and Maximal Cliques. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 11, No. 11, 1168-1180, 1989.
    Kashima, H., Tsuda, K., Inokuchi, A. Marginalized Kernels Between Labeled Graphs. In Proceedings of the 21st International Conference on Machine Learning, 321-328, 2003.
    Levinson, R., Woods, W. A., Graham, T. W. Pattern Associativity and The Retrieval of Semantic Networks, 1992.
    Liu, C., Yan, X., Yu, H., Han, J., Yu, P. S. Mining Behavior Graphs for Backtrace of Noncrashing Bugs. In Proceedings of SIAM International Conference on Data Mining, 2005.
    Raveaux, R., Eugen, B., Locteau, H., Adam, S., Héroux, P., Trupin, E. A Graph Classification Approach Using A Multi-objective Genetic Algorithm. Application to Symbol Recognition. Lecture Notes in Computer Science, Volume 4538, 361-370, 2007.
    Riesen, K., Neuhaus, M., Bunke, H. Graph Embedding in Vector Spaces by Means of Prototype Selection. Lecture Notes in Computer Science, Volume 4538, 383-393, 2007.
    Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K. gBoost: A Mathematical Programming Approach to Graph Classification and Regression. Machine Learning Journal, Volume 75, 69-89, 2008.
    Shapiro, G., Haralick, M. Structural Descriptions and Inexact Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume PAMI-3, 504-519, 1981.
    Yan, X., Han, J. gSpan: Graph-based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, 721-724, 2002.
    Yan, X., Han, J. gSpan: Graph-based Substructure Pattern Mining (Technical report). Department of Computer Science, University of Illinois at Urbana-Champaign, 2002.

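    Full-text release date: this full text is not authorized for public release (on-campus network)
    Full-text release date: this full text is not authorized for public release (off-campus network)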
