| Graduate student: | 賴思明 Szu-Ming Lai |
|---|---|
| Thesis title: | 微生物全基因體特性資料庫之建立與應用 GPDB (Genome Profile DataBase): Construction and Application of Complete Microbial Genome Analysis Database |
| Advisor: | 呂平江 Ping-Chiang Lyu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Life Sciences and Medicine - Institute of Bioinformatics and Structural Biology |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 (ROC calendar) |
| Language: | Chinese |
| Pages: | 120 |
| Chinese keywords: | 比較基因體, 基因體特性, 生物資訊, 微生物基因體, 全基因體比較, 虛擬二維電泳 |
| Foreign-language keywords: | Comparative genome, Genome profile, Bioinformatics, Microbial genome, Whole genome comparison, Virtual 2D gel |
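Chinese Abstract (English translation)
As more and more prokaryotic genomes are completely sequenced, it has become feasible to use complete sequences to explore the diversity among prokaryotes, and many comparative genomics studies are devoted to finding similarities and differences between species. Because prokaryotes are difficult to identify by morphology and live in diverse environments, consistent conclusions about their classification and evolutionary position have been hard to reach; only with molecular evolutionary methods did it become possible to analyze the phylogenetic relationships among prokaryotes. In particular, 16S rRNA analysis divided extant organisms into three major forms of life. Studies have also reported that microorganisms living in extreme environments, such as thermophiles, acidophiles, and halophiles, show preferences in nucleotide and amino acid composition, and many studies have used whole-genome features to reconstruct the evolutionary position of prokaryotes. We therefore built a database, GPDB (Genome Profile DataBase), so that biologists can use whole-genome information to explore the evolution and diversity of prokaryotes. The database currently contains 145 completely sequenced prokaryotic strains, comprising 223 chromosomes and plasmids and 429,177 ORFs. The raw sequences and taxonomic data come from the NCBI GenBank and Taxonomy databases. To analyze this information automatically, we wrote the "Genome Profile Pipeline" in Perl to compute different genome profiles, including nucleotide composition (GC & AT content, total GC & AT skew, N-nucleotide frequency, codon usage…), amino acid composition (N-peptide frequency distribution), and proteome composition (length & Mw & pI & transmembrane helix protein & fold…). With MySQL as the back end, the genome profiles of different organisms are presented and compared through graphical web pages, and hierarchical clustering is applied to help group organisms with similar features. We also provide a virtual two-dimensional gel electrophoresis view to assist proteomics analysis. The whole website is designed in a modular way so that new genome profiles can be added later, providing more whole-genome information for exploring prokaryotic diversity.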
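The nucleotide-composition profiles named in the abstract (GC content, total GC skew, N-nucleotide frequency) can be illustrated with a short sketch. This is not the thesis's Perl pipeline: the function names are hypothetical, the skew formula (G - C) / (G + C) is the commonly used definition and is assumed here, and ambiguous bases are simply ignored.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def gc_skew(seq):
    """Total GC skew, assumed here to be (G - C) / (G + C)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def kmer_frequencies(seq, n):
    """Relative frequency of each overlapping N-nucleotide word.

    Words containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    words = [w for w in words if set(w) <= set("ACGT")]
    counts = Counter(words)
    total = len(words)
    return {w: k / total for w, k in counts.items()} if total else {}
```

Applied to each chromosome or plasmid sequence, functions like these would yield one numeric profile per replicon, which could then be stored in a database and compared across organisms.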
Abstract
With whole-genome sequence data being generated rapidly, especially for microbial organisms, these data can be used to explore the diversity of ancient life. More and more comparative genomic methods are being used to investigate the similarities and dissimilarities between organisms. Phylogenetic trees based on 16S rRNA reveal prokaryotic evolutionary relationships that morphological characteristics do not. Other features, such as GC content and amino acid composition, are widely used to characterize organisms from extreme environments, such as thermophiles, acidophiles, and halophiles. Mining genome-wide information may suggest why microbes are so diverse. Here we constructed a database, GPDB (Genome Profile DataBase), with 145 microbial genomes including bacteria and archaea. The original sequence data and annotations are based on the NCBI GenBank and RefSeq databases. Nomenclature and classification follow the NCBI Taxonomy database for uniformity. To process so many features automatically, a program called "Genome Profile Pipeline" was developed in Perl. We present a variety of "Genome Profiles" in graphical form, such as basic information (taxonomy, genome size, ORF number…), nucleotide composition (GC & AT content, total GC & AT skew, N-nucleotide frequency, codon usage…), and amino acid composition (N-peptide frequency distribution, proteome distribution like length & Mw & pI & transmembrane helix protein & fold…). To evaluate different combinations of profiles interactively, an online graphical browsing interface, which uses Euclidean distance for hierarchical clustering, was built to compare and view the differences between these organisms. Furthermore, the website is modular so that more Genome Profiles can be included and compared in the future.
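The comparison step, hierarchical clustering on Euclidean distance between genome profiles, can be sketched as follows. The abstract does not state which linkage rule the website uses, so average linkage is an assumption here, and the function names and the input layout (organism name mapped to a list of numeric features) are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hierarchical_clustering(profiles):
    """Agglomerative clustering; average linkage is an assumption.

    profiles: dict mapping organism name -> list of numeric features.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of organism names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def linkage(a, b):
        # Average pairwise distance between the members of a and b.
        dists = [euclidean(profiles[x], profiles[y]) for x in a for y in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Pick the closest pair among the current clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

Each feature vector could hold any combination of profiles, for example GC content alongside amino acid frequencies. The returned merge history is the information a dendrogram would display.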