Graduate student: | 張育魁 Yu-Kwei Chang |
---|---|
Thesis title: | 3D-SARST: To improve the accuracy of protein structural similarity search by three dimensional SARST maps (3D-SARST:藉由三維SARST圖譜來改進蛋白質結構搜尋之準確度) |
Advisor: | 呂平江 Ping-Ching Lyu |
Oral defense committee: | |
Degree: | Master |
Department: | College of Life Sciences and Medicine - Institute of Bioinformatics and Structural Biology |
Year of publication: | 2008 |
Academic year of graduation: | 96 (ROC calendar) |
Language: | English |
Number of pages: | 97 |
Keywords (Chinese): | protein, structure, database, search, alignment, accuracy |
Keywords (English): | 3D-SARST, database, protein, search, structure |
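Chinese Abstract
With advances in biotechnology, more and more protein structures are being solved. This research focuses on developing protein structure search software that is both fast and accurate, which we call SARST (Protein Structure Similarity Search by Ramachandran Codes). It applies the principle that the Ramachandran plot, composed of the phi and psi angles of the protein backbone, can reveal protein secondary structure, with the aim of converting complex three-dimensional protein structures into simple one-dimensional sequences. Defining and generating sequences that retain the important structural features of proteins is our greatest difficulty and challenge. Compared with BLAST and CE, two programs commonly used today for protein structure searches, SARST already offers nearly the speed of BLAST with an accuracy only 4% below that of CE.
The focus of this study is to incorporate more protein structural information into the Ramachandran plot concept, expanding the two-dimensional map into a three-dimensional cube and using this cube to convert structures into sequences, in an attempt to produce more meaningful sequences. We call this method 3D-SARST.
3D-SARST has a different best parameter combination for each situation. By first classifying each protein and then searching with the optimal parameters for its class, we have so far improved accuracy by 2% while retaining a speed comparable to that of SARST; we believe that with program optimization 3D-SARST will become even more efficient. In this age of information explosion, protein structure data grow daily, and protein structure can be regarded as a cornerstone for unlocking the secrets of life science. We believe that having good search software is like having the sharpest sword; 3D-SARST will be that sword, leading us to unravel the mysteries of biology.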
Abstract
The amount of protein structural data is growing so rapidly that a fast and accurate structure similarity search tool is in strong demand. We have developed a structural similarity search tool, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation), that uses a linear encoding methodology to perform extremely rapid database searches with accuracy comparable to CE (Combinatorial Extension). We now aim to modify the linear encoding strategy of SARST by integrating more protein structural information to improve its accuracy.
SARST linearly encodes protein structures by utilizing a Ramachandran map organized by nearest-neighbor clustering. Traditionally, the Ramachandran map is a two-dimensional (2D) plot displaying the distribution of the backbone dihedral angles (φ, ψ) of residues. Different regions on this map represent different secondary structural preferences of local backbone structures; however, structural information can be lost in the process of transforming the three-dimensional (3D) protein structure into the 2D map. Our speculation is that, if we extend the Ramachandran plot into a 3D map by adding an extra axis describing another structural property of backbone conformation, more structural information can be preserved in the transformation process, thus improving the performance of SARST. We call the new search tool developed on this speculation 3D-SARST.
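The linear encoding idea can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a uniform grid in place of the map organized by nearest-neighbor clustering, and the third coordinate `extra` is a hypothetical stand-in for the unspecified additional backbone property. The names `bin_index`, `encode_2d`, and `encode_3d` are likewise illustrative.

```python
import string

# One symbol per map cell; 52 letters limit how fine the grid can be.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase


def bin_index(value, low, high, n_bins):
    """Return the index (0..n_bins-1) of the bin that holds value in [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    idx = int((value - low) * n_bins / (high - low))
    return min(idx, n_bins - 1)  # the upper edge belongs to the last bin


def encode_2d(angles, n_bins=4):
    """Encode (phi, psi) pairs in degrees as one letter per residue (uniform grid)."""
    if n_bins ** 2 > len(ALPHABET):
        raise ValueError("grid has more cells than available symbols")
    letters = []
    for phi, psi in angles:
        i = bin_index(phi, -180.0, 180.0, n_bins)
        j = bin_index(psi, -180.0, 180.0, n_bins)
        letters.append(ALPHABET[i * n_bins + j])
    return "".join(letters)


def encode_3d(residues, n_bins=3, extra_range=(0.0, 1.0)):
    """Encode (phi, psi, extra) triples; `extra` is a hypothetical third axis."""
    if n_bins ** 3 > len(ALPHABET):
        raise ValueError("grid has more cells than available symbols")
    low, high = extra_range
    letters = []
    for phi, psi, extra in residues:
        i = bin_index(phi, -180.0, 180.0, n_bins)
        j = bin_index(psi, -180.0, 180.0, n_bins)
        k = bin_index(extra, low, high, n_bins)
        letters.append(ALPHABET[(i * n_bins + j) * n_bins + k])
    return "".join(letters)
```

Under these assumptions, two residues with identical (φ, ψ) receive the same letter from `encode_2d` but can receive different letters from `encode_3d` when their third coordinate differs, which is the sense in which the 3D map may preserve more structural information. The resulting strings could then be compared with ordinary sequence search tools.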
3D-SARST, inheriting the advantages of SARST, is a rapid database search tool with a reasonable compromise in accuracy. Although we have not found a condition under which it generally outperforms SARST, we do find that 3D-SARST can achieve higher accuracy for various structural classes under specific conditions. According to these results, we can first determine the structural class of the query protein and then run 3D-SARST under the condition and parameter settings appropriate for that class to increase the accuracy of database searching. This two-step strategy has improved the precision of SARST by 2%, bringing its accuracy closer to that of CE. As the amount of protein structural data increases ever more rapidly, we suppose that an efficient database search engine such as 3D-SARST can be valuable in many post-genomic research fields.
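The two-step strategy can be sketched as a simple dispatch: classify the query, then search with the settings chosen for that class. Everything below is hypothetical scaffolding, including the toy classifier, the toy search function, and the placeholder parameter values; the abstract does not state which classifier or which settings were actually used.

```python
def two_step_search(query, classify, search, params_by_class, default_params):
    """Step 1: find the query's structural class. Step 2: search with that class's settings."""
    structural_class = classify(query)
    params = params_by_class.get(structural_class, default_params)
    return structural_class, search(query, params)


# Toy stand-ins (all hypothetical) that make the sketch runnable.
def toy_classify(encoded_query):
    # Pretend "H" marks helical residues and call helix-rich queries "all-alpha".
    helix_fraction = encoded_query.count("H") / max(len(encoded_query), 1)
    return "all-alpha" if helix_fraction > 0.5 else "other"


def toy_search(encoded_query, params):
    # A real search would scan an encoded database; this only echoes the settings used.
    return {"query": encoded_query, "params": params}


PARAMS_BY_CLASS = {"all-alpha": {"n_bins": 4}}  # placeholder values
DEFAULT_PARAMS = {"n_bins": 3}                  # placeholder values
```

Classes without a dedicated entry fall back to `DEFAULT_PARAMS`, so a query of an unrecognized class is still searched rather than rejected.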