
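Graduate student: Chiao Yun Yang (楊巧筠)
Thesis title: Novel Strategy in Data Fusion Facilitates Protein Structure Classification (利用資料融合技術促進蛋白質結構分類問題之效能)
Advisors: C. Y. Tang (唐傳義), D. Frank Hsu (許德標)
Oral defense committee: (not listed)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, 2004-2005)
Language: English
Number of pages: 40
Chinese keywords (translated): protein structure, data fusion, multi-layer learning
English keywords: Protein Structure, Data fusion, HLA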
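  • Protein structure classification has long been a popular topic in bioinformatics. Classifying proteins from their primary structure already achieves high accuracy. Finer-grained classification, however, remains a challenge because of the sheer volume of data. Recent work has reached 65% accuracy in assigning proteins to 27 different folds. In this study, we combine data fusion with a hierarchical machine learning architecture and apply it to the protein classification data set proposed by Ding and Dubchak. We raise the accuracy to 69.6%. We show that data fusion is a simple and quite useful method that can be applied in many different fields.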


    The classification of protein structures is essential for determining their function in bioinformatics. At present, high prediction accuracy is easily achieved from primary amino acid sequences. However, further classification into folding categories remains a challenge because of the large number of folds. A recent study reported a prediction accuracy of 65% on an independent set covering the 27 most populated folds. In this work, we combine a data fusion scheme with a hierarchical learning architecture (HLA) and apply it to the data set gathered by Ding and Dubchak [12]. We achieve an overall accuracy of 69.6%. We demonstrate that data fusion is a simple and useful scheme that could be applied to various fields.

    Contents
    Abstract
    Acknowledgement
    1. Introduction
        1.1 Introduction
        1.2 Protein Structure
        1.3 Neural Network
        1.4 Data Fusion
        1.5 Data Fusion for Protein Structure Prediction
    2. Previous Work
        2.1 Ding and Dubchak
        2.2 Huang's work
    3. Method of Combination
        3.1 SCOP and Data set
        3.2 Features
            3.2.1 Direct Coding Methods
            3.2.2 Indirect Coding Methods
        3.3 Combination
            3.3.1 HLA embedded data fusion
            3.3.2 Diversity Graph and data fusion
                3.3.2.1 Method of Combination and Feature Selection
                3.3.2.2 The Diversity Rank/Score Graph
                3.3.2.3 Simple case for data fusion
    4. Results and Discussion
        4.1 Results
        4.2 Discussion
    Bibliography

    Bibliography
    [1] C. W. Hsu and C. J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Trans. Neural Networks, vol. 13, 2002, pp. 415-425.
    [2] J. Moody and C. J. Darken, “Fast learning in networks of locally tuned processing units,” Neural Computation, vol. 1, no. 2, 1989, pp. 281-294.
    [3] M. Gerstein and M. Levitt, “Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins,” Protein Science, vol. 7, 1998, pp. 445-456.
    [4] S. Chen, C. F. N. Cowan, and P. M. Grant, “Orthogonal least squares learning algorithm for radial basis function network,” IEEE Trans. Neural Networks, vol. 2, no. 2, 1991, pp. 302-309.
    [5] J. A. Leonard, M. A. Kramer, and L. H. Unger, “Using radial basis function to approximate a function and its error bounds,” IEEE Trans. Neural Networks, vol. 3, no. 4, 1992, pp. 624-627.
    [6] L. Lo Conte, B. Ailey, T. J. P. Hubbard, S. E. Brenner, A. G. Murzin, and C. Chothia, “SCOP: A structural classification of proteins database.” Nucleic Acids Res., vol. 28, no. 1, 2000, pp. 257-259.
    [7] F. M. Pearl, D. Lee, J. E. Bray, I. Sillitoe, A. E. Todd, A. P. Harrison, J. M. Thornton, C. A. Orengo, “Assigning genomic sequences to CATH,” Nucleic Acids Res., vol. 28, no. 2, 2000, pp. 584-599.
    [8] A. Andreeva, D. Howorth, S. E. Brenner, T. J. P. Hubbard, C. Chothia, and A. G. Murzin, “SCOP database in 2004: refinements integrate structure and sequence family data,” Nucleic Acids Res., vol. 32, 2004, pp. 226-229.
    [9] K. C. Chou and C. T. Zhang, “Prediction of protein structural classes,” Crit. Rev. in Biochem. Mol. Biol., vol. 30, no. 4, 1995, pp. 275-349.
    [10] I. Dubchak, I. Muchnik, S. R. Holbrook, and S. H. Kim, “Prediction of protein folding class using global description of amino acid sequence,” Proc. Natl. Acad. Sci. USA, vol. 92, 1995, pp. 8700-8704.
    [11] A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, “SCOP: A structural classification of proteins database for the investigation of sequence and structures,” J. Mol. Biol., vol. 247, 1995, pp. 536-540.
    [12] C. H. Q. Ding and I. Dubchak, “Multi-class protein fold recognition using support vector machines and neural networks,” Bioinformatics, vol. 17, no. 4, 2001, pp. 349-358.
    [13] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
    [14] C. D. Huang, C. T. Lin, and N. R. Pal, “Hierarchical Learning Architecture with Automatic Feature Selection for Multi-Class Protein Fold Classification,” IEEE Trans. NanoBioscience, vol. 2, no. 4, 2003, pp. 503-517.
    [15] A. Verikas and M. Bacauskiene, “Feature selection with neural networks,” Pattern Recognition Letters, vol. 23, 2002, pp. 1323-1335.
    [16] I. Dubchak, I. Muchnik, C. Mayor, I. Dralyuk, and S. H. Kim, “Recognition of a protein fold in the context of the SCOP classification,” Proteins, vol. 35, 1999, pp. 401-407.
    [17] L. Lo Conte, S. E. Brenner, T. J. Hubbard, A. G. Murzin, and C. Chothia, “SCOP database in 2002: Refinements accommodate structural genomics,” Nucleic Acids Res., vol. 30, no. 1, 2002, pp. 264-267.
    [18] P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA.
    [19] C. H. Wu, Neural Networks and Genome Informatics. Amsterdam, The Netherlands: Elsevier, 2000.
    [20] J. Moody and C. J. Darken, “Fast learning in networks of locally tuned processing units,” Neural Computation, vol. 1, no. 2, 1989, pp. 281-294.
    [21] N. J. Belkin, P. B. Kantor, E. A. Fox, and J. A. Shaw, “Combining evidence of multiple query representation for information retrieval,” Information Processing and Management, vol. 31, no. 3, 1995, pp. 431-448.
    [22] C. C. Vogt and G. W. Cottrell, “Fusion via a linear combination of scores,” Information Retrieval, vol. 1, 1999, pp. 151-172.
    [23] K. B. Ng and P. B. Kantor, “Predicting the effectiveness of naïve data fusion on the basis of system characteristics,” J. American Society for Information Sci., vol. 51, no. 13, 2000, pp. 1177-1189.
    [24] D. F. Hsu, J. Shapiro, and I. Taksa, “Methods of data fusion in information retrieval: rank vs. score combination,” DIMACS Technical Report 58, 2002.
    [25] D. F. Hsu and I. Taksa, “Comparing rank and score combination methods for data fusion in information retrieval,” Information Retrieval, in press, 2004.
    [26] L. Xu, A. Krzyzak, and C. Y. Suen, “Method of Combining Multiple Classifiers and their Application to Handwriting Recognition,” IEEE Trans. SMC, vol. 22, 1992, pp. 418-435.
    [27] C. M. R. Ginn, P. Willett, and J. Bradshaw, “Combination of Molecular Similarity Measures Using Data Fusion,” Perspectives in Drug Discovery and Design, vol. 20, 2000, pp. 1-16.
    [28] M. A. Kuriakose, W. T. Chen, Z. M. He, A. G. Sikora, P. Zhang, Z. Y. Zhang, W. L. Qiu, D. F. Hsu, C. M. Coffran, S. M. Brown, E. M. Elango, M. D. Delacure, and F. A. Chen, “Selection and Validation of Differentially Expressed Genes in Head and Neck Cancer,” Cellular and Mol. Life Sci., vol. 61, 2004, pp. 1372-1383.
    [29] H. Y. Chuang, H. F. Liu, S. Brown, C. M. Coffran, and D. F. Hsu, “Identifying significant genes from microarray Data,” Proc. IEEE Symp. Bioinformatics and Bioengineering, pp. 358-365.
    [30] H. Y. Chuang, H. F. Liu, F. A. Chen, C. Y. Kao, and D. F. Hsu, “Combination methods in microarray analysis,” Proc. 7th Intl. Symp. Parallel Architectures, Algorithms and Networks, pp. 625-630.
    [31] B. Rost and C. Sander, “Prediction of protein secondary structure at better than 70% accuracy,” J. Mol. Biol., vol. 232, 1993, pp. 584-599.
    [32] J.-M. Yang, Y.-F. Chen, T.-W. Shen, B. S. Kristal, and D. F. Hsu, “Consensus scoring criteria for improving enrichment in virtual screening,” Journal of Chemical Information and Modeling, 2005, in press.
    [33] D. F. Hsu and A. Palumbo, “A study of data fusion in Cayley graphs G(Sn,Pn),” Proc. 7th Intl. Symp. Parallel Architectures, Algorithms and Networks (I-SPAN ’04), IEEE Computer Society, 2004, pp. 557-562.

    Full text: not authorized for public release (on-campus or off-campus networks).
