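Graduate student: | 林柏亨 |
---|---|
Thesis title: | 運用特徵強化的投票計算方法在蛋白質超家族的功能分析上之研究 On the Study of Feature Amplified Voting Algorithm for Functional Analysis in Protein Superfamily |
Advisor: | 唐傳義 |
Committee members: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | Chinese |
Number of pages: | 31 |
Keywords (Chinese): | 投票 (voting), 特徵強化 (feature amplification), 蛋白質超家族 (protein superfamily) |
Keywords (English): | vote, voting, feature amplified, protein superfamily |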
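With genome sequencing completed for humans and many other organisms, the volume of sequenced protein data is accumulating rapidly. In the post-genomic era, identifying which parts of a protein sequence are functionally important is a major problem. One common approach starts from the protein's structure and then interprets the functional significance of each region; however, protein structure analysis is time-consuming and expensive, and much structural information is not yet available. In practice, one must therefore infer which positions hold the key amino acids affecting function when the sequence is known but the structure is not. Analyzing the sequences of a protein family or superfamily through multiple sequence alignment or from an evolutionary perspective is a common solution, but it performs poorly when evolutionary distance does not correspond consistently to the degree of functional variation.
Here we develop a feature-amplified voting method to address this problem. Given a target protein sequence and multiple related protein sequences (usually proteins of the same family or superfamily), we first divide the related sequences into two groups according to whether they have the function, then align them with the target sequence, and finally use voting to accumulate a score for the amino acid at each position; the score reflects how critical that amino acid is to function. The alignment is based mainly on simultaneous alignment of three sequences. We also apply the voting concept to several commonly used multiple sequence alignment methods, and then analyze and compare the results obtained from the different methods.
We applied this method to the imidase superfamily. Among the 519 amino acids of rat imidase, it predicted 10 candidates, 5 of which are involved in binding the metal in the active site. We also verified that the voting concept works well in combination with other multiple sequence alignment methods. With this method, functionally relevant candidate amino acids can be predicted from sequence information alone, helping biologists study the proteins they are interested in more efficiently.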
Abstract
With the genome sequencing projects for human and many other species completed, protein sequences are being discovered and accumulated rapidly. In the post-genome era, it is an urgent task to identify the key regions of a protein (enzyme) sequence that significantly affect protein (enzyme) function. One straightforward way is to analyze protein structures experimentally and then to infer functional mechanisms from the structural information. However, protein structure analysis is costly in both time and expense, so many biological studies need to identify the functional residues of an enzyme from its amino acid sequence, especially when related structural information is not available. In some protein superfamilies, functional residues can hardly be detected by multiple sequence alignment or evolutionary strategies because phylogenetic relationships do not parallel protein function.
In this study, a feature amplified voting algorithm (FAVA) is developed to address this problem. Given a target sequence and several related protein sequences (usually from the same protein superfamily), we divide the related sequences into two sets according to their functional properties and align them with the target sequence. After the alignment phase, functional residues of the target protein are extracted by a voting analysis. The main alignment method is simultaneous alignment of three sequences; in addition, the FAVA concept is applied to several widely used multiple sequence alignment methods. Finally, the results from the different methods are compared and analyzed.
The amidohydrolase superfamily is used as a case study because it contains divergent enzymes and proteins, which makes it an interesting case for developing such a method. FAVA is used to identify critical residues of mammalian imidase, a member of the amidohydrolase superfamily. Among the 519 amino acids of the rat imidase sequence, we predict 10 candidate residues, five of which are related to metal binding. We have also verified that the voting concept can be combined with other multiple sequence alignment methods and improves their performance in predicting functional residues.
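The voting step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual scoring rule, so the rule used here is an assumption made only for illustration: each (functional, non-functional) pair of related sequences casts one vote for a target residue when the functional sequence shares that residue and the non-functional sequence does not. The sketch also assumes the sequences are already aligned to the target in common columns, with `-` marking gaps; the alignment phase itself is not shown. The names `vote_scores` and `top_candidates` and the toy sequences are hypothetical.

```python
from itertools import product


def vote_scores(target, functional, nonfunctional):
    """Return one vote count per target residue (gap columns in the target are skipped).

    Assumed rule (not taken from the thesis): a (functional, non-functional)
    pair votes for a column when the functional sequence matches the target
    residue and the non-functional sequence differs from it.
    """
    scores = []
    for col, residue in enumerate(target):
        if residue == "-":
            continue  # a gap in the target is not a target residue
        votes = sum(
            1
            for f, n in product(functional, nonfunctional)
            if f[col] == residue and n[col] != residue
        )
        scores.append(votes)
    return scores


def top_candidates(scores, k):
    """Return the 1-based positions of the k highest-scoring target residues."""
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return [i + 1 for i in ranked[:k]]


# Toy, made-up aligned sequences (not real imidase data).
target = "HKDAG"
functional = ["HKDSG", "HRDAG"]
nonfunctional = ["QKNAG", "HKEAG"]

scores = vote_scores(target, functional, nonfunctional)
candidates = top_candidates(scores, 2)
```

Under this assumed rule, positions that are conserved among the functional sequences but altered in the non-functional ones collect the most votes, which is one plausible reading of "feature amplified". Selecting the ten highest-scoring positions of a 519-residue target would correspond to `top_candidates(scores, 10)`.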