Graduate student: | 蔡鳴興 Tsai, Ming-Hsin |
---|---|
Thesis title: | 微生物的物種鑑定及重建代謝網絡 Species Identification and Metabolic Network Reconstructions of Microbes |
Advisor: | 蘇豐文 Soo, Von-Wun |
Oral examination committee: | 陳朝欽 Chen, Chaur-Chin; 陳宜欣 Chen, Yi-Shin; 曾新穆 Tseng, Vincent S; 林仲彥 Lin, Chung-Yen |
Degree: | Doctor |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC calendar, 2017–2018) |
Language: | English |
Number of pages: | 89 |
Keywords: | microorganism, species identification, metabolic network, next-generation sequencing, gene detection, sequence search |
Next-generation sequencing (NGS) is now a mature technology, and its steadily falling cost gives us the opportunity to examine sequence variation between strains of the same species in finer detail and to collect environmental genetic material extensively, so that the mechanisms of microbial symbiosis can be explored further. In recent years, the identification and classification of microorganisms have shifted from detecting and analyzing individual genes, such as the 16S ribosomal RNA gene, to whole-genome analysis, because the whole genome provides more information. This benefits public health and epidemiological research: researchers can observe subtle variation between pathogenic strains, cluster them precisely, and go on to infer the phylogenetic relationships and transmission routes of pathogens.
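One common way to make strain-level clustering concrete is to count single-nucleotide differences between aligned core-genome sequences and group isolates that fall within a distance threshold. The sketch below assumes that approach (single-linkage clustering via union-find); it is not presented as the thesis's or PathoBacTyper's actual pipeline, and the function names, toy sequences, and threshold values are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned core-genome sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(seqs: dict[str, str], threshold: int) -> list[list[str]]:
    """Join isolates linked by a chain of pairwise SNP distances <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair of isolates that are close enough.
    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy isolates (aligned SNP sites only).
isolates = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",  # 1 SNP from A
    "C": "ACGAACGTAT",  # 1 SNP from B, 2 from A
    "D": "TGCAACGTAC",  # 4 or more SNPs from every other isolate
}
print(single_linkage_clusters(isolates, threshold=1))  # [['A', 'B', 'C'], ['D']]
print(single_linkage_clusters(isolates, threshold=0))  # [['A'], ['B'], ['C'], ['D']]
```

Single linkage is used here only for simplicity: an isolate joins a cluster if it is within the threshold of any member, so A and C end up together through B. Suitable thresholds depend on the pathogen and the study and are not specified in the source.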
Sequence alignment is likewise a mature technology, and since the emergence of NGS in particular, many methods have been developed to accelerate searching in huge collections of short reads. Because NGS sequencing accuracy exceeds 99.9%, such tools often sacrifice sensitivity to gain search speed. To accommodate searches at different degrees of sequence similarity, we propose two methods for adjusting sensitivity and search speed: Fault-Tolerant Hash and the Local Maximum Hash Search Strategy. Both methods allow a trade-off among sensitivity, search space, and speed, so that the best parameters can be selected for different data conditions. For example, searches of short NGS reads can sacrifice sensitivity for speed, whereas long reads from third-generation sequencing technologies such as SMRT and Nanopore, whose accuracy is only 85%, require higher sensitivity to maintain search accuracy.
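The sensitivity/speed trade-off can be illustrated with a generic exact k-mer hash seed search. The sketch below is not the thesis's Fault-Tolerant Hash or Local Maximum Hash Search Strategy, whose details are not given here; it is an assumed minimal example in which the k-mer length `k` plays the role of the tunable parameter. The names `kmer_index`, `diagonal_votes`, and `p_error_free_kmer` are hypothetical, and the probability function assumes independent substitution errors.

```python
from collections import defaultdict


def kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def diagonal_votes(read: str, reference: str, k: int) -> dict[int, int]:
    """Count exact k-mer seed hits per candidate start position (diagonal)."""
    index = kmer_index(reference, k)
    votes: dict[int, int] = defaultdict(int)
    for offset in range(len(read) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return dict(votes)


def p_error_free_kmer(accuracy: float, k: int) -> float:
    """Probability that a k-mer has no sequencing error (independent errors)."""
    return accuracy ** k


reference = "GATTACACCGTA"
exact_read = reference[2:10]  # "TTACACCG"
noisy_read = "TTAGACCG"       # same read with one substitution at read position 3

print(diagonal_votes(exact_read, reference, k=5))  # {2: 4}
print(diagonal_votes(noisy_read, reference, k=5))  # {}  (every 5-mer covers the error)
print(diagonal_votes(noisy_read, reference, k=3))  # {2: 3}
print(round(p_error_free_kmer(0.999, 20), 3))      # 0.98
print(round(p_error_free_kmer(0.85, 20), 3))       # 0.039
```

With exact seeds, a read is found only if at least one of its k-mers avoids every error. Under the independent-error assumption, about 98% of 20-mers are error-free at 99.9% accuracy but only about 4% at 85% accuracy, so error-prone long reads call for shorter or error-tolerant seeds, which in turn enlarge the search space.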
To let public health and epidemiological researchers conveniently use NGS data for epidemiological analysis, we developed PathoBacTyper, a web service that automatically performs pathogen identification and phylogenetic analysis. The system lets researchers who are unfamiliar with computational analysis carry out these analyses easily. It also provides output in several formats so that researchers can readily interpret the results or use them for further analysis.
To study the mechanisms of symbiosis between microorganisms, we developed GEMSiRV, a visual software tool that allows researchers to edit microbial metabolic network models and run metabolic simulations. Its user-friendly interface lets researchers reconstruct microbial metabolic networks quickly, and its visual presentation gives them a clear understanding of the simulation results.