Author: | 張哲維 Chang, Che-Wei |
---|---|
Thesis title: | 應用Ramachandran 序列轉換法於多重蛋白質結構比對及酵素功能預測 Application of Ramachandran Sequential Transformation on Multiple Protein Structural Alignment and Enzymatic Prediction |
Advisor: | 呂平江 Lyu, Ping-Chiang |
Committee members: | |
Degree: | 碩士 Master |
Department: | 生命科學暨醫學院 - 生物資訊與結構生物研究所 Institute of Bioinformatics and Structural Biology |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | English |
Number of pages: | 137 |
Keywords (Chinese): | 生物資訊 、蛋白質結構 、多重結構比對 、酵素功能 、功能預測 、代謝物關係 、演化樹 |
Keywords (English): | Bioinformatics, Protein structure, Multiple structural alignment, Enzyme function, Functional prediction, Metabolite relationship, Phylogenetic tree |
In the post-genomic era, protein structure comparison is an active topic in bioinformatics. This thesis presents two tools derived from the linear encoding algorithm developed in our laboratory, together with demonstrations of these tools in comparative structural bioinformatics studies.
Protein structures are being determined rapidly, and a growing number of unfamiliar structures remain to be studied; more efficient structural comparison tools are therefore urgently needed. An algorithm named Ramachandran Sequential Transformation (RST) was previously proposed to linearly encode protein structural data into a sequence form. It has been applied to rapid protein structural similarity searches and to the determination of novel structural relationships. We take advantage of RST to develop a multiple protein structure alignment tool, MSARST (Multiple Structural Alignment aided by RST), and an enzyme function prediction tool, EC-SARST (Enzyme Commission number Search tool Aided by RST). MSARST can be used to analyze evolutionary relationships among protein structures, and EC-SARST can automatically annotate novel structures discovered by structural genomics projects.
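As a rough illustration of what linearly encoding a structure can mean, the sketch below maps each residue's backbone dihedral angles (phi, psi) to one letter by dividing the Ramachandran map into a regular grid. The 90-degree grid, the 16-letter alphabet, and the function names `ramachandran_code` and `encode_backbone` are assumptions made for this example; the actual RST alphabet and region boundaries are not given in this abstract and may differ.

```python
def ramachandran_code(phi, psi, bin_size=90):
    """Map one (phi, psi) pair, in degrees, to a single letter.

    Hypothetical scheme: the Ramachandran map is cut into square bins of
    `bin_size` degrees (4 x 4 = 16 bins for the default) labelled A..P.
    This is not the published RST alphabet.
    """
    cols = 360 // bin_size
    # Shift angles from [-180, 180] to [0, 360) so that 180 wraps to 0.
    col = int((phi + 180) % 360 // bin_size)
    row = int((psi + 180) % 360 // bin_size)
    return chr(ord("A") + row * cols + col)


def encode_backbone(dihedrals, bin_size=90):
    """Turn a list of (phi, psi) pairs into a one-dimensional string."""
    return "".join(ramachandran_code(phi, psi, bin_size) for phi, psi in dihedrals)


# Two helix-like residues followed by one strand-like residue.
encoded = encode_backbone([(-60, -45), (-60, -45), (-120, 130)])
print(encoded)
```

Once structures are written as strings like this, they can in principle be compared with ordinary sequence search and alignment tools, which is the property that makes a linear encoding attractive for fast searches.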
In the third chapter, we present our reconstruction of several recent phylogenetic trees using metabolic substrate-product relationships. Compared with whole-genome approaches, this method is simple and scalable, which makes it suitable for future phylogenetic reconstruction.
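One simple way to turn substrate-product relationships into input for tree building is to describe each organism by its set of (substrate, product) pairs and compute pairwise distances between those sets. The sketch below uses the Jaccard distance, which is an assumption chosen for illustration; the abstract does not state which distance measure or tree-building method the thesis uses, and the organism names and metabolite pairs are made up.

```python
def jaccard_distance(pairs_a, pairs_b):
    """Return 1 - |A & B| / |A | B| for two sets of (substrate, product) pairs."""
    union = pairs_a | pairs_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(pairs_a & pairs_b) / len(union)


def distance_matrix(organisms):
    """Build a symmetric distance matrix from {organism: set of pairs}."""
    names = sorted(organisms)
    matrix = [
        [jaccard_distance(organisms[x], organisms[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Hypothetical toy data: each organism is a set of substrate-product pairs.
organisms = {
    "org_a": {("glucose", "G6P"), ("G6P", "F6P")},
    "org_b": {("glucose", "G6P")},
}
names, matrix = distance_matrix(organisms)
print(names, matrix)
```

A matrix of this kind could then be passed to any standard distance-based clustering method to produce a tree.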
To summarize, we developed tools that reduce the time needed to explore large-scale structural data. We hope that the web services described in this thesis will help scientists compare and analyze protein structures and understand their evolutionary and functional relationships.
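In the post-genomic era, protein structure comparison is one of the popular topics in bioinformatics. This thesis guides the reader through two applications of the linear encoding of three-dimensional structures and demonstrates several tools for comparative structural analysis.
Protein structures are currently being solved rapidly, yet many of them remain to be studied, so structure comparison tools are urgently needed. We previously proposed Ramachandran Sequential Transformation, which converts three-dimensional protein structure data into a one-dimensional text string and has been applied to structure searches and to finding novel structural relationships. Taking advantage of its speed, we developed a multiple protein structure alignment tool (MSARST) and an enzyme function prediction tool (EC-SARST). With MSARST we can analyze the evolutionary relationships of protein structures, and the enzymatic functions of newly solved protein structures can be predicted with EC-SARST.
In the third chapter, we additionally propose reconstructing phylogenetic trees from the relationships between substrates and products in metabolic reactions. Compared with using information from the whole genome, the simplicity and scalability of this method make it well suited to phylogenetic tree reconstruction.
In summary, the tools we developed shorten the time needed to analyze large-scale data, and we expect that these web services will help scientists analyze and compare protein structures more quickly and understand their evolutionary and functional relationships.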