Graduate student: | 吳智遠 Wu, Chih-Yuan |
---|---|
Thesis title: | Using a Structural Alphabet to Find Structural Motifs across Protein Families (利用結構字母表尋找跨家族蛋白質模體) |
Advisor: | 林小喬 Lim, Carmay |
Oral defense committee: | |
Degree: | Doctor |
Department: | College of Science - Department of Chemistry |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | English |
Number of pages: | 45 |
Keywords: | structural motif (結構模體), structural alphabet (結構字母表), corner motif (角型模體), key component (關鍵元件) |
Identifying the characteristics and interaction networks of biological macromolecules is one of the main tasks of current biology. When solved structures are available, researchers are chiefly concerned with three aspects of biological macromolecules: fundamental principles, structures, and functions. The main tasks today are to reveal how fundamental physical principles drive macromolecules to fold into their correct structures, how structures acquire the characteristics needed to bind ligands and interact with each other, and how structures and fundamental principles together guide macromolecules to function correctly. Among biological macromolecules, proteins participate in virtually every process within cells, so research on related topics is in great demand. In this work, we sought to understand the physical and structural basis underlying interactions between proteins and other macromolecules or ligands.
In summary, on the technical side we developed two complementary structural-alphabet-based methods for structural motif discovery: (i) a fully automated strategy that finds structural motifs across protein families without requiring a query motif or essential residues, and (ii) a strategy that uses descriptors of key components defined from known motifs to find structurally and functionally equivalent regions in other protein families. We also combined the first strategy with a method based on electrostatic stabilization and evolutionary conservation to illustrate its usefulness for detecting binding sites. On the biological side, we showed that a local structural unit stabilized by conserved internal interactions, which serves as the core region for a specific function in a known motif, can also be found serving the same purpose in other proteins with different folds. We define such units as key components; examples are the ‘corner’ architecture in the helix-turn-helix motif and the ‘βα’ components in Rossmann fold domains. The results suggest that proteins with the same function but divergent sequences and overall structures may have originated from the same ancestral species, whose proteins had the simplest forms containing the key components. This may provide another viewpoint from which to approach questions of evolution.
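Identifying the characteristics and relational networks of biological macromolecules is one of the main research areas of modern biology. When structures are available, our main concern is the relationship among fundamental physical laws, structure, and function. The principal tasks are to reveal how physical laws drive biological macromolecules to fold into their correct structures, how the structures themselves retain the special features needed to bind ligands or interact with one another, and how structure and physical laws work together so that biological macromolecules function correctly. Among the various biological macromolecules, proteins take part in nearly every reaction within the cell, so several protein-centered research fields, such as protein folding, protein structural motif searching, and protein function annotation, are in great demand. In this study, we sought to understand the physical and structural basis of the interactions between proteins and other biological macromolecules or other ligands.
In summary, on the technical side we developed two complementary strategies based on a structural alphabet to find structural motifs in proteins: (i) a fully automated strategy for searching for structural motifs across protein families, which requires no known motif structure or essential residues as a reference, and (ii) a strategy for searching for key components of structural motifs, which defines key components from known structural motifs, builds descriptors for them using the structural alphabet, and then uses the descriptors to search other protein families for structurally and functionally equivalent regions. We also combined the first strategy with electrostatic and evolutionary information on the structures of DNA-binding proteins and applied it to detecting DNA-binding sites on proteins. On the biological side, we show that a local structure stabilized by conserved internal interactions can be used by a known structural motif as the core region needed to perform a specific function, and can also be found in other kinds of proteins playing the same role. Such structural units are the “key components”, for example the corner structure in the helix-turn-helix motif and the ‘βα’ structure in Rossmann fold proteins. This result suggests that some proteins with the same function but different sequences and overall structures may have evolved from the simplest protein forms containing the key components, and thus derive from the same ancestral species. This may offer another viewpoint for studying questions of evolution.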