
Graduate student: 李季青 (Chi-Ching Lee)
Thesis title: 利用DNA與蛋白質探針來建構全基因體及蛋白質體樹
Construction of whole genomic and proteomic trees based on DNA and Protein probes
Advisor: 呂平江 (Ping-Chiang Lyu)
Oral defense committee:
Degree: Master's
Department: College of Life Sciences and Medicine - Institute of Bioinformatics and Structural Biology
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: English
Number of pages: 66
Chinese keywords: 比較基因體, 生物資訊, 微生物基因體, 全基因體比較, 基因體樹, 蛋白質體樹
English keywords: Comparative genome, Bioinformatics, Microbial genome, Whole genome comparison, Genomic tree, Proteomic tree
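  • Microorganisms differ greatly in morphology and habitat, which has made it difficult for studies of their systematic classification and evolutionary relationships to reach consistent conclusions. Since the 1970s, researchers have worked to establish a microbial classification system grounded in molecular evolution, using stable biomarkers with shared evolutionary characteristics to determine the evolutionary relationships among microorganisms. For example, sequence similarity analysis of small subunit ribosomal RNA (SSU RNA) was the earliest biomarker applied to the study of prokaryotic evolutionary relationships and is still widely used today. However, inferring the evolutionary relationships of all species from only a small number of biomarkers is now considered insufficient. After 2000, as genome sequencing technology matured and more and more microbial genomes were completed, scientists began to examine the relationships among species from a whole-genome perspective.
    We established a clustering method based on whole genomes and proteomes to classify microorganisms and, on that basis, to analyze their position and importance in biological evolution. We parse genomes and proteomes using biologically meaningful amino acid and nucleotide sequence patterns: the amino acid patterns are taken from the Prosite database, and the nucleotide patterns are restriction enzyme recognition sequences obtained from REBASE (the Restriction Enzyme dataBASE). The occurrence probabilities of these patterns in whole genomes and proteomes are tallied, and the results are then analyzed with an unsupervised clustering method.
    The results show that our genomic tree groups together microorganisms with similar GC content. In addition, clustering with Prosite patterns separates archaea, bacteria, and fungi into two groups, with the latter two in the same group. The basal level of this proteomic tree resembles the traditional classification, while the more distal branches more appropriately group together microorganisms with similar biochemical and metabolic phenotypes, such as parasitic bacteria, thermophilic bacteria, methanogens, and photosynthetic bacteria. We have implemented this clustering and comparative analysis method as an online service built with PHP, a MySQL database, and graphical data presentation techniques, available at http://probac.life.nthu.edu.tw/.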
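
    The probe-counting step described in the abstract can be illustrated with a short Python sketch. It converts a Prosite pattern and a restriction enzyme recognition sequence into regular expressions and counts how often each occurs in a sequence. This is not the ProBAC implementation, which the abstract says was built with PHP and MySQL. The function names, the forward-strand-only search, the counting of overlapping matches, and the per-residue normalization are all assumptions made for this example. Prosite anchors written inside square brackets are not handled.

```python
import re

# IUPAC nucleotide ambiguity codes, as used in recognition sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> str:
    """Translate a recognition sequence such as 'GTYRAC' into a regex.

    A '^' cut-position marker, if present, is dropped (assumption).
    """
    return "".join(IUPAC[base] for base in site.upper().replace("^", ""))


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite pattern such as 'N-{P}-[ST]-{P}.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repetition counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


def count_overlapping(regex: str, sequence: str) -> int:
    """Count matches of regex in sequence, including overlapping ones."""
    return sum(1 for _ in re.finditer(f"(?=(?:{regex}))", sequence))


def probe_frequency(regex: str, sequence: str) -> float:
    """Occurrences per residue (hypothetical normalization)."""
    return count_overlapping(regex, sequence) / len(sequence) if sequence else 0.0
```

    Computing `probe_frequency` for every probe against one genome or proteome gives a numeric profile for that organism; profiles of this kind are what a clustering step would then compare.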


    The classification of microorganisms is difficult because they vary widely in morphology and environmental distribution. Since the 1970s, taxonomic systems have been developed based on stable, standard molecular biomarkers; for instance, sequence similarity of SSU RNA (small subunit ribosomal RNA) was the first biomarker used for prokaryotes and is still widely used today. However, classifying all kinds of organisms with one or only a few biomarkers has been reported to be insufficient. Since 2000, genome sequencing techniques have developed so rapidly that it is now possible to analyze the evolutionary relationships of organisms on the scale of whole genomes.
    We have developed a probe-based genome/proteome clustering approach that uses the frequencies of biologically meaningful restriction enzyme recognition elements and protein signatures. These elements and signatures are taken from REBASE, the Restriction Enzyme dataBASE, and from the Prosite database, a collection of annotated motif descriptors for protein families and domains. We compared bacteria, archaea and fungi to build genomic and proteomic trees with an unsupervised clustering method.
    Our results showed that the genomic tree grouped together microorganisms with similar GC content, and the proteomic tree clustered bacteria, archaea and fungi into two branches, where the latter two share the same node. Furthermore, the tree built from Prosite signatures agreed well with the traditional phylogeny at the basal branches, while the distal classifications seemed to reflect phenotypic features, such as parasitism, thermophilicity, and the capacity for methanogenesis or photosynthesis, better than traditional SSU RNA-based classifications. A web service has been set up and is available at http://probac.life.nthu.edu.tw/.
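
    The abstract names an unsupervised clustering method but does not state the distance measure or linkage rule. As a minimal sketch of how a tree can be built from probe-frequency profiles, the following Python example uses Euclidean distance with average linkage; both choices, the function names, and the toy profile values are assumptions for illustration and are not results from the thesis.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_tree(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps an organism name to its list of probe frequencies.
    Returns the tree as nested 2-tuples of names.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in profiles]

    def cluster_distance(c1, c2):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        total = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up profiles for four hypothetical organisms (two probes each).
toy_profiles = {
    "org_A": [0.10, 0.20],
    "org_B": [0.11, 0.21],
    "org_C": [0.50, 0.60],
    "org_D": [0.52, 0.58],
}
toy_tree = average_linkage_tree(toy_profiles)
```

    In this toy input, org_A and org_B are merged first because their profiles are closest, then org_C and org_D, and the two pairs are joined at the root. Recomputing every pairwise distance at each step is simple but slow, so a real analysis of hundreds of genomes would normally rely on an established clustering package.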

    Contents
    Contents of Tables and Figures
    Abstract
    Chinese Abstract
    Chapter 1 - Introduction
      1.1 Taxonomy of Microorganisms
        1.1.1 The History
        1.1.2 Identification and Classification of Microorganisms
        1.1.3 Phenetic Taxonomy and Cladistic Taxonomy
        1.1.4 Modern Taxonomy
      1.2 Archaea, Bacteria and Fungi
        1.2.1 Bacteria
        1.2.2 Archaea
        1.2.3 Fungi
      1.3 Motivations of This Study
    Chapter 2 - Materials and Methods
      2.1 Material
        2.1.1 Hardware
        2.1.2 Raw Genome sequences
        2.1.3 REBASE database
        2.1.4 The Prosite database
        2.1.5 The GOLD database
        2.1.6 Software
      2.2 Methods
        2.2.1 Overview of ProBAC (Probe-Based Alignment-free Clustering)
        2.2.2 Raw sequences
        2.2.3 Other Data Sources
        2.2.4 Data handling and storage
        2.2.5 Calculation of the Probe Occurrence
        2.2.6 The Hierarchical Clustering
    Chapter 3 - Results
      3.1 Usage of the Web Service
      3.2 The Genomic Tree Based on Restriction Enzyme Recognition Sites
      3.3 Proteomic tree based on functional motifs
    Chapter 4 - Discussions
      4.1 On Genomic Trees
      4.2 On Proteomic Trees
        4.2.1 Archaea proteomic tree
        4.2.2 Proteomic tree of 457 organisms
        4.2.3 Supporting results from recent studies
        4.2.4 Two interesting cases
    Chapter 5 - Conclusions
    Chapter 6 - Future works
      Aim 1. Locate functional specific domain/motif sets
      Aim 2. Develop an efficient microbial classification and identification system
      Aim 3. The incomplete/truncated genome classification
    References

    Contents of Tables and Figures
      Table 1. PHP regular expressions for Prosite and REBASE codes
      Figure 1. Enterotube® for bacteria rapid identification
      Figure 2. The 2D-PAGE for whole-cell proteins
      Figure 3. DNA-DNA hybridization technique
      Figure 4. The DNA probe technique
      Figure 5. The relationship between DNA-DNA hybridization and 16S rRNA homology
      Figure 6. The polyphasic taxonomy concepts
      Figure 7. The three-domain phylogenetic system
      Figure 8. Flowchart of the probe-based genome clustering
      Figure 9. Restriction enzyme recognition occurrences in different organisms
      Figure 10. Web site of the probe-based ProBAC
      Figure 11. The detailed view of microorganisms
      Figure 12. User interface of the Genome analyzer of ProBAC
      Figure 13. User interface of the Proteome analyzer of ProBAC
      Figure 14. The archaeal genomic tree based on restriction enzyme recognition elements
      Figure 15. Genomic tree of 457 organisms
      Figure 16. The proteomic tree
      Figure 17. A simplified proteomic tree
      Figure 18. The proteomic tree of 23 distinct archaeal genera
      Figure 19. The 16S rRNA tree and the ProBAC proteomic tree
      Figure 20. The intracellular parasite/endosymbiont
      Figure 21. The photosynthetic bacterial branch of the ProBAC proteomic tree
      Figure 22. The proteomic tree of Firmicutes
      Figure 23. The tree of Epsilon-proteobacteria
      Figure 24. The 16S rRNA tree (a) and Prosite-probe tree (b)
      Figure 25. The sub-branch of Lactic Acid Bacteria (LAB)
      Figure 26. The tree of Lactobacillus


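    Full text release date: full text not authorized for public release (on-campus network)
    Full text release date: full text not authorized for public release (off-campus network)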
