Graduate student: | 張燕華 |
---|---|
Thesis title: | 以蛋白質序列、結構及固有動態來正確地預測酵素催化位點 Traits derived from protein sequence, structure and intrinsic dynamics facilitate accurate predictions of enzyme active sites |
Advisor: | 楊立威 |
Oral defense committee: | 楊立威, 林志侯, 蘇士哲 |
Degree: | Master |
Department: | College of Life Sciences and Medicine - Institute of Bioinformatics and Structural Biology |
Year of publication: | 2013 |
Academic year of graduation: | 102 |
Language: | Chinese |
Number of pages: | 49 |
Keywords (translated from Chinese): | enzyme, active site prediction, sequence, structure, intrinsic dynamics, conservation score, relative solvent accessible surface area, acid dissociation constant, spatial clustering score, multiple regression, catalytic site |
Keywords (English): | active site prediction, intrinsic dynamics, solvent accessibility, acid dissociation constant, spatial clustering score, Gaussian Network Model, GNM, partial least squares regression, PLS, catalytic propensity, catalytic site |
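In recent years, a succession of algorithms that predict active sites from enzyme sequence and structure has been proposed, and the enzymes they cover are no longer restricted to specific enzyme families. In this thesis, we propose a partial least squares (PLS) regression model for predicting the location of active sites, trained on 225 non-homologous enzymes.
We use a range of sequence, structural, and dynamic features as model inputs, including residue conservation scores, catalytic propensity, relative solvent accessibility (RSA), pKa changes, the intrinsic dynamics of the enzyme, the average RSA difference between adjacent residues, the distance from each residue to the domain center, and spatial clustering scores.
When the top 2 candidates ranked by the PLS regression model are selected, 1 of them contains a catalytic residue, and on average 1 of every 3 catalytic residues is predicted (specificity = 0.54, sensitivity = 0.35), with a Matthews correlation coefficient (MCC) of 0.38. When the top 7 candidates are selected, on average 1 of every 4 candidates contains a catalytic residue, and on average 2 of every 3 catalytic residues are predicted (specificity = 0.29, sensitivity = 0.66), with MCC = 0.35. In our model, the features that most influence the predictions are the residue conservation score, the spatial clustering score, the distance from each residue to the domain center, and the average RSA difference between adjacent residues.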
Research on active site prediction based on the analysis of sequence and structure has grown steadily over the past few years, and the scope of enzymes covered is no longer limited to specific enzyme families. In this thesis, we present a partial least squares (PLS) regression model, trained on 225 non-homologous enzymes, to predict the location of active sites. As model inputs we use conservation scores, catalytic propensity, the intrinsic dynamics of enzymes, relative solvent accessibility (RSA), pKa changes, the average RSA deviation between sequential residues, the distance between each residue and the domain center, and spatial clustering scores.
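As a rough illustration of the regression step, the sketch below fits a single-response PLS model (PLS1) with the NIPALS deflation scheme in plain Python and ranks residues by predicted score. The abstract does not state which PLS algorithm or software was used, so the choice of NIPALS, the function names `fit_pls1` and `predict_pls1`, the three feature columns, and every number in the toy data are assumptions made for illustration only.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_components):
    """Fit PLS1 by NIPALS. Returns (feature means, response mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centered copies that are deflated after each component.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in feature space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # latent scores
        tt = _dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt  # regression of y on the scores
        # Deflate features and response by the part this component explains.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the response for one feature vector by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [a - m for a, m in zip(x, x_mean)]
    score = y_mean
    for w, load, q in components:
        t = _dot(e, w)
        score += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return score


# Hypothetical toy data: one row per residue; the columns stand in for a
# conservation score, RSA, and a spatial clustering score (made-up values).
X_toy = [
    [0.9, 0.10, 0.8],
    [0.2, 0.60, 0.1],
    [0.8, 0.20, 0.7],
    [0.1, 0.70, 0.2],
    [0.5, 0.40, 0.4],
    [0.3, 0.50, 0.3],
]
y_toy = [1, 0, 1, 0, 0, 0]  # 1 = catalytic residue, 0 = other (assumed labels)

model = fit_pls1(X_toy, y_toy, n_components=2)
scores = [predict_pls1(model, row) for row in X_toy]
# Residue indices ordered from most to least likely catalytic.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Selecting the first k entries of `ranking` corresponds to the "top k candidates" used when reporting performance. The number of latent components is a free parameter here and would normally be chosen by cross-validation.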
When the top 2 candidates are selected, the predictions achieve sensitivity = 0.35, specificity = 0.54, and a Matthews correlation coefficient (MCC) of 0.38; when the top 7 candidates are selected, sensitivity = 0.66, specificity = 0.29, and MCC = 0.38. The dominant features in the PLS model are conservation, the spatial clustering scores of prediction candidates, the distance between each residue and the domain center, and the average RSA deviation between sequential residues.
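To make the top-k evaluation concrete, the following minimal sketch selects the k highest-scoring residues and computes sensitivity, precision, and MCC from the resulting confusion counts. The function name `top_k_metrics` and the example scores and labels are hypothetical. The sketch reports precision (the fraction of selected candidates that are catalytic); whether the value reported above as specificity was computed this way is an assumption, not something the abstract states.

```python
import math


def top_k_metrics(scores, is_catalytic, k):
    """Confusion counts and summary metrics when the k top-scoring residues are selected."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    tp = sum(1 for i in selected if is_catalytic[i])
    fp = len(selected) - tp
    fn = sum(1 for i, c in enumerate(is_catalytic) if c and i not in selected)
    tn = len(scores) - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": sensitivity, "precision": precision, "mcc": mcc,
    }


# Hypothetical scores for a 10-residue protein with 3 catalytic residues.
example_scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.05, 0.4, 0.15, 0.6]
example_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
result = top_k_metrics(example_scores, example_labels, k=2)
```

In this toy case the two selected residues are indices 0 and 2, giving one true positive and one false positive, so sensitivity is 1/3 and precision is 1/2. In practice the metrics would be averaged over all enzymes in the data set.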