
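Graduate student: Yu-Kwei Chang (張育魁)
Thesis title: 3D-SARST: To improve the accuracy of protein structural similarity search by three dimensional SARST maps
Thesis title (Chinese): 3D-SARST:藉由三維SARST圖譜來改進蛋白質結構搜尋之準確度
Advisor: Ping-Ching Lyu (呂平江)
Oral examination committee:
Degree: Master
Department: College of Life Sciences and Medicine - Institute of Bioinformatics and Structural Biology
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar, i.e. 2007-2008)
Language: English
Number of pages: 97
Chinese keywords (translated): protein, structure, database, search, alignment, accuracy
English keywords: 3D-SARST, database, protein, search, structure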
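  • Chinese abstract (translated)
    With advances in biotechnology, more and more protein structures are being solved. This research focuses on developing protein structure search software that is both fast and accurate, which we call SARST (Protein Structure Similarity Search by Ramachandran Codes). It applies the principle that the Ramachandran plot, formed by the phi and psi angles of the protein backbone, can detect protein secondary structure, with the aim of converting complex three-dimensional protein structures into simple one-dimensional sequences. How to define and generate sequences that retain the important structural features of proteins is our greatest difficulty and challenge. Compared with BLAST and CE, two programs commonly used today for protein structure search, SARST already achieves a speed nearly equal to that of BLAST and an accuracy only 4% below that of CE.

    This study focuses on incorporating more protein structural information into the Ramachandran plot concept: the two-dimensional map is expanded into a three-dimensional cube, and this cube is used to convert structures into sequences, in an attempt to produce more meaningful sequences. We call this method 3D-SARST.

    3D-SARST has a different best parameter combination for each situation. By classifying each protein in advance and then searching with the optimal parameters, we have so far improved accuracy by 2% while retaining a speed comparable to that of SARST, and we believe that 3D-SARST will become more efficient once the program is optimized. In this age of information explosion, protein structure data grow every day, and protein structures can be regarded as a cornerstone for unlocking the secrets of the life sciences. We believe that having good search software is like having the sharpest sword, and that 3D-SARST will be that sword, leading us to unravel the mysteries of biology.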


    Abstract
    The amount of protein structural data is growing so rapidly that a fast and accurate structural similarity search tool is in strong demand. We have developed a structural similarity search tool, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation), which uses a linear encoding methodology to perform extremely rapid database searches with accuracy comparable to that of CE (Combinatorial Extension). We now aim to modify the linear encoding strategy of SARST by integrating more protein structural information to improve its accuracy.
    SARST linearly encodes protein structures by utilizing a Ramachandran map organized by nearest-neighbor clustering. Traditionally, a Ramachandran map is a two-dimensional (2D) plot displaying the distribution of the backbone dihedral angles (φ, ψ) of residues. Different regions of this map represent different secondary structural preferences of local backbone structures; however, structural information can be lost when the three-dimensional (3D) protein structure is transformed into the 2D map. We speculate that if the Ramachandran plot is extended into a 3D map by adding an extra axis describing another structural property of the backbone conformation, more structural information can be preserved in the transformation, thus improving the performance of SARST. We call the new search tool developed on the basis of this speculation 3D-SARST.
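    The encoding idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the cluster centers, the one-letter codes, and the function names are hypothetical, and the third coordinate simply stands in for whichever extra angular property is chosen as the third axis. Each residue is assigned the code of the nearest cluster center, with distances measured on periodic angles.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d

def angular_distance(p, q):
    """Euclidean distance between two points whose coordinates are periodic angles."""
    return math.sqrt(sum(angular_diff(a, b) ** 2 for a, b in zip(p, q)))

# Hypothetical cluster centers (phi, psi, third angle) keyed by a one-letter code.
# Real centers would come from clustering a training set; these are made up.
CENTERS = {
    "A": (-60.0, -45.0, 50.0),     # roughly helical phi/psi
    "B": (-120.0, 130.0, -170.0),  # roughly extended phi/psi
    "C": (60.0, 45.0, 60.0),       # left-handed region
}

def encode(residues, centers=CENTERS):
    """Turn a list of (phi, psi, third) tuples into a string of cluster codes."""
    return "".join(
        min(centers, key=lambda code: angular_distance(r, centers[code]))
        for r in residues
    )

# Three residues with invented angles, one near each hypothetical center.
print(encode([(-62.0, -41.0, 52.0), (-118.0, 135.0, 175.0), (58.0, 40.0, 65.0)]))
```

    The resulting string can then be handled by sequence-search machinery, which is what makes a linear encoding fast; dropping the third coordinate from every tuple reduces the sketch to the 2D case.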
    3D-SARST, inheriting the advantages of SARST, is a rapid database search tool with a reasonable compromise in accuracy. Although we have not found a condition under which it generally outperforms SARST, we do find that 3D-SARST can achieve higher accuracy for various structural classes under specific conditions. Based on these results, we can first determine the structural class of the query protein and then run 3D-SARST under the condition and parameter settings appropriate for that class to increase the accuracy of database searching. This two-step strategy has improved the precision of SARST by 2%, bringing its accuracy closer to that of CE. As the amount of protein structural data is increasing ever more rapidly, we expect that an efficient database search engine such as 3D-SARST can be valuable in many post-genomic research fields.
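    The two-step strategy can be pictured as a simple dispatch: classify the query, then look up the settings for that class. The sketch below is only an illustration under stated assumptions. The class labels a to d follow the SCOP classes named in the figure list, but every parameter name, every value, and the `classify` and `search` callables are hypothetical placeholders, not settings reported in the thesis.

```python
# Placeholder settings; none of these values are taken from the thesis.
DEFAULT_SETTINGS = {"third_axis": None, "gap_open": 10, "gap_extend": 1}

CLASS_SETTINGS = {
    "a": {"third_axis": "alpha", "gap_open": 9, "gap_extend": 1},
    "b": {"third_axis": "kappa", "gap_open": 11, "gap_extend": 2},
    "c": {"third_axis": "TCO", "gap_open": 10, "gap_extend": 1},
    "d": {"third_axis": "A3", "gap_open": 10, "gap_extend": 2},
}

def settings_for(structural_class):
    """Return the settings for a class, falling back to the defaults."""
    return CLASS_SETTINGS.get(structural_class, DEFAULT_SETTINGS)

def two_step_search(query, classify, search):
    """Step 1: determine the structural class. Step 2: search with its settings."""
    structural_class = classify(query)
    return search(query, **settings_for(structural_class))

# Usage with stub callables standing in for a real classifier and search engine.
result = two_step_search(
    "query-structure",
    classify=lambda q: "a",
    search=lambda q, **settings: (q, settings["third_axis"]),
)
print(result)
```

    Keeping the per-class settings in a lookup table means a query whose class cannot be determined still falls back to one general configuration.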

    Contents
      Abstract
      Chinese Abstract
      Abbreviations
      Chapter 1. Introduction
        1.1 Motivation
        1.2 Background
        1.3 Purposes
      Chapter 2. Materials and Methods
        Materials
        Methods
        2.1 Background
          2.1.1 Conventional Ramachandran plots
          2.1.2 Angular parameters of 3D-SARST: omega, kappa, alpha, TCO and A3
          2.1.3 Distance parameters of 3D-SARST
        2.2 Preparation of the training set
        2.3 Map construction
          2.3.1 Parameter for map clustering: angular distance
          2.3.2 The parameters for map clustering, TN, TD and Group number
          2.3.3 Optimization of the 3D SARST Map
        2.4 Scoring matrix building
        2.5 Structure alignment searching by FSA-BLAST
        2.6 Information retrieval
        2.7 Optimization added by genetic algorithm
        2.8 Class Analysis
      Chapter 3. Results
        3.1 Adjustment of parameter
          3.1.1 Number of codes
          3.1.2 Scaling constant of the scoring matrix and the gap penalties
        3.2 Performance
          3.2.1 Speed evaluation
          3.2.2 Initial evaluation of the accuracy
          3.2.3 Improvement of the accuracy
        3.3 Other Experiments
          3.3.1 RMSD
          3.3.2 Contour map
          3.3.3 The partially fourth axis of SARST map
        3.4 Implementary
      Chapter 4. Discussions
        4.1 SARST maps
          4.1.1 From two dimensional map to three dimensional map
            4.1.1.1 Angular information
            4.1.1.2 Distance information
          4.1.2 From three dimensional Map to partially four dimensional Map
            4.1.2.1 Residue properties
            4.1.2.2 Other partially fourth axis
          4.1.3 Other map modification
        4.2 Genetic algorithm
        4.3 Overview
        4.4 Future perspectives
      Chapter 5. Conclusions
      Major Contributions
      References

    Contents of Tables and Figures
      Attached Tables and Figures
        Table 1. Summary of 108 queries selected from SCOP all and SCOP 95
        Table 2. Statistics of SARST Code for Different Structural Classes
        Table 3. Parameters of 3D-SARST
        Table 4. Better parameters under different condition
        Table 5. 3D-SARST Speed Evaluation
        Figure 1. Overview of 3D-SARST methodology
        Figure 2. Analysis of the training set: amino acid compositions
        Figure 3. Ramachandran plot with the third axes: alpha
        Figure 4. Ramachandran plot with the third axes: kappa
        Figure 5. Ramachandran plot with the third axes: TCO
        Figure 6. Ramachandran plot with the third axes: A3
        Figure 7. Nearest neighbor clustering method for RM Map Construct
        Figure 8. 3D-Maps
        Figure 9. The flowchart of 3D Map construction
        Figure 10. Optimization of the 3D SARST Map
        Figure 11. Code number Analyses
        Figure 12. The F-measures of 3D-SARST with different scaling constant fs
        Figure 13. The F-measures of 3D-SARST with different gap penalties
        Figure 14. Optimization of the performances of 3D-SARST by genetic algorithm
        Figure 15. Small scale IR performance of 3D-SARST under different condition
        Figure 16. Large scale IR performance of 3D-SARST under different condition
        Figure 17. IR experiment results of 3D-SARST in class a
        Figure 18. IR experiment results of 3D-SARST in class b
        Figure 19. IR experiment results of 3D-SARST in class c
        Figure 20. IR experiment results of 3D-SARST in class d
        Figure 21. Overall IR performance of 3D-SARST
        Figure 22. RMSD map
        Figure 23. Contour map
        Figure 24. Hydrophobicity used to be the partial 4th axis to construct maps
        Figure 25. The information of distance deviation. D7-3
        Figure 26. The analysis of SARST codes with the third axis: alpha
        Figure 27. The performances of other experiments
        Figure 28. Web service of 3D-SARST
        Figure 29. Code analysis of 3D-SARST with the third axis of alpha
      Appendix
        Appendix 1. Flowchart of SARST approaches
        Appendix 2. The Ramachandran map of SARST
        Appendix 3. SARST Speed Evaluation
        Appendix 4. IR of SARST and other tools
        Appendix 5. Angular information
        Appendix 6. The scoring matrix of SARST
        Appendix 7. Command of FSA-BLAST
        Appendix 8. The flowchart of GA for 3D-SARST
        Appendix 9. The manuscript of the web service of 3D-SARST


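    Full-text availability: the full text is not authorized for public release (on-campus network)
    Full-text availability: the full text is not authorized for public release (off-campus network)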
