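Graduate student: | 蔡金良 Tsai, Chin-Liang |
---|---|
Thesis title: | 建立醣苷水解酵素家族序列上的共同特徵 Finding Consistent Sequence Patterns in Glycoside Hydrolase (GH) Protein Families |
Advisor: | 唐傳義 Tang, Chuan Yi |
Oral defense committee: | 廖崇碩, 林俊淵 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 24 |
Keywords (Chinese): | 醣苷水解酵素 (glycoside hydrolases), 序列排比 (sequence alignment) |
Keywords (English): | Glycoside hydrolases, Multiple sequence alignment |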
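The main purpose of this thesis is to establish common sequence patterns in the catalytic domains of glycoside hydrolase families. The glycoside hydrolase database (CAZy) classifies these enzymes into 125 families according to sequence similarity, and each family can catalyze one or more hydrolysis reactions. However, PROSITE, the database that describes sequence patterns, covers only 18 glycoside hydrolase families. In addition, some multiple sequence alignment tools cannot process too many sequences at once when building sequence patterns. We therefore do not align all of the sequences. Instead, we cluster the sequences by sequence identity and randomly select one representative sequence from each cluster for the alignment. For some families, the patterns produced by this method can replace those built from all sequences and describe the family's own sequences better, and the results do not change much from run to run when different sequences are selected. We therefore use this method to build patterns for the glycoside hydrolase families.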
Finding Consistent Sequence Patterns in Glycoside Hydrolase (GH) Protein Families
Chin-Liang Tsai, Advisor: Professor Chuan Yi Tang
Master of Science in Computer Science, National Tsing Hua University, Hsinchu City, Taiwan
In this research, our major objective is to find consistent sequence patterns in GH protein families. GH sequences are classified into 125 families by sequence similarity, but the PROSITE database provides motif descriptions or patterns for only 18 GH families. In addition, some multiple sequence alignment (MSA) tools cannot handle a large number of sequences. We clustered GH proteins by sequence identity and randomly selected one sequence from every cluster to build the patterns. We compared our method with building patterns from all sequences. The sensitivity of our patterns is greater than that of patterns built from all sequences, and the patterns remained consistent across different random selections of sequences. We used this method to construct patterns for GH family proteins.
Key words: Glycoside hydrolases, Multiple sequence alignment, Pattern
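The selection step described in the abstract (cluster by sequence identity, then randomly pick one sequence per cluster) can be illustrated with a minimal Python sketch. The abstract does not say how identity is computed, which clustering procedure is used, or what threshold applies, so everything below is an assumption made for illustration: `approx_identity` is a rough stand-in built on `difflib` rather than an alignment-based identity, the greedy founder-based clustering is one simple choice among many, the 0.8 threshold is arbitrary, and the toy sequences are invented.

```python
import random
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Rough stand-in for pairwise sequence identity (assumption:
    a real pipeline would derive identity from an alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_identity(seqs, threshold=0.8, identity=approx_identity):
    """Greedy clustering: a sequence joins the first cluster whose founder
    it matches at >= threshold; otherwise it founds a new cluster."""
    clusters = []  # each cluster is a list of ids; index 0 is the founder
    for sid, seq in seqs.items():
        for cluster in clusters:
            if identity(seq, seqs[cluster[0]]) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])
    return clusters


def pick_representatives(clusters, rng):
    """Randomly select one sequence id from every cluster."""
    return [rng.choice(cluster) for cluster in clusters]


# Toy sequences, invented for illustration only.
family = {
    "s1": "MKTAYIAKQRQISFVKSHFSRQ",
    "s2": "MKTAYIAKQRQISFVKSHFSRA",
    "s3": "GSHMLEDPVDAFKWNNGETC",
    "s4": "GSHMLEDPVDAFKWNNGEAC",
}
clusters = cluster_by_identity(family, threshold=0.8)
reps = pick_representatives(clusters, random.Random(0))
print(clusters)
print(reps)
```

Only the representatives would then be passed to an MSA tool, which keeps the alignment input small. Repeating `pick_representatives` with different seeds is one way to check whether the resulting patterns stay consistent across random selections, as the abstract reports.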