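Author: | Sun, Jing-Lun (孫敬倫) |
---|---|
Thesis title: | IRAP: An Iterative Reference-Guided Assembly Pipeline for Next Generation Sequencing DNA-seq analysis (整合參考基因體重組技術與序列遞迴修正方法 進行次世代基因體序列分析) |
Advisors: | Tang, Chuan-Yi (唐傳義); Wang, Wen-Ching (王雯靜) |
Oral defense committee: | Tang, Chuan-Yi (唐傳義); Wang, Wen-Ching (王雯靜); Liou, Ming-Li (劉明麗) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | English |
Number of pages: | 50 |
Keywords (Chinese): | sequence reconstruction (序列重組), gene annotation (基因註解), assembly (組序), alignment (貼序) |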
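In this thesis, our goal is to integrate two approaches, read alignment (aligner) and de novo assembly (assembler), in order to improve the accuracy of the final reconstructed sequence and to increase the number of annotated genes. Sequence reconstruction currently relies on two main approaches, alignment and assembly; the key difference is that alignment requires a given genome sequence (reference). Each approach has its strengths and weaknesses. Alignment can find highly similar sequences with fewer resources, but the quality of its results is limited by the similarity between the reference genome sequence and the sequencing reads. Assembly needs no reference genome sequence, so it is not restricted by the availability of a known, similar genome, but its results vary considerably with the error rate of the reads and with the parameter values chosen for the algorithm.
For gene annotation, we use known gene sequences, and for the genes annotated on the genome sequence we use the sequencing reads to detect whether mutations related to drug resistance have arisen. Among the genes with detected mutations, we filter by changes in protein structure and by gene function, and we use protein-protein interactions to select the most likely variant genes for biological validation. For validation we use polymerase chain reaction, gene knockout, and related biological experiments. These experiments verify the accuracy of the genome sequence we assembled and also test whether the mutations found in the genes are related to drug resistance.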
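The mutation-detection step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Gene` record, the `variants_in_genes` helper, and the toy sequences are hypothetical, and the sketch assumes the sample sequence is already aligned base-for-base to the reference, so it reports substitutions only.

```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical annotation record: half-open interval on the reference."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def variants_in_genes(
    reference: str, sample: str, genes: List[Gene]
) -> Dict[str, List[Tuple[int, str, str]]]:
    """Map gene name -> [(position, reference base, sample base), ...].

    Assumes both sequences have the same length and coordinate system,
    so only substitutions are detected (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("this toy example handles substitutions only")
    hits: Dict[str, List[Tuple[int, str, str]]] = {}
    for gene in genes:
        diffs = [
            (pos, reference[pos], sample[pos])
            for pos in range(gene.start, gene.end)
            if reference[pos] != sample[pos]
        ]
        if diffs:  # keep only genes that carry at least one variant
            hits[gene.name] = diffs
    return hits


# Toy data (made up for illustration only).
reference = "ATGCCGTAAGGCTTA"
sample = "ATGCAGTAAGGCTTT"
genes = [Gene("geneA", 0, 9), Gene("geneB", 9, 12)]
mutated = variants_in_genes(reference, sample, genes)
print(mutated)  # {'geneA': [(4, 'C', 'A')]}
```

The substitution at position 14 lies outside both annotated intervals, so it is not reported. Only variants on annotated genes become candidates for the later filtering by protein structure, gene function, and protein-protein interactions.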
Background:
Discovering antibiotic-resistance genes in functional genomics requires comprehensive variant detection between a target genome and a template genome, especially for variants in annotated genes. While next-generation sequencing (NGS) for high-throughput DNA sequencing (DNA-seq) has emerged as a powerful technology for addressing these problems, the success of each assembly tool depends on the availability and quality of the detected genes.
Results:
Here, we describe IRAP (an Iterative Reference-guided Assembly Pipeline for next-generation sequencing DNA-seq analysis), a pipeline that assembles reads into contigs and predicts gene models by combining iterative re-sequencing with de novo assembly of DNA-seq data. We applied IRAP to simulated reads for five distinct Helicobacter pylori lineages generated with ART and to real data from three Helicobacter pylori isolate pairs, and compared the results with those of several existing assembly tools. The contigs produced by IRAP are highly accurate (over 99%) and reconstruct more full-length genes for the majority of the existing reference gene sets (100%). All results show that IRAP matches or outperforms current popular assemblers.
Conclusions:
These results demonstrate that the IRAP pipeline reconstructs more full-length genes accurately and has a better chance of discovering potential genes that are directly or indirectly associated with antibiotic resistance.
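To make the idea of iterative reference-guided correction concrete, here is a toy Python sketch. It is not IRAP itself and makes no claim about how the thesis implements its pipeline. As assumptions for illustration, reads are placed by ungapped best-mismatch search, the consensus is a per-column majority vote, and the reference is replaced by that consensus until it stops changing. All function names, the `max_mismatches` threshold, and the sequences are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def best_offset(read: str, reference: str) -> Tuple[int, int]:
    """Ungapped placement with the fewest mismatches (ties: leftmost)."""
    best, best_mm = 0, len(read) + 1
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best, best_mm = off, mm
    return best, best_mm


def consensus_round(reads: List[str], reference: str, max_mismatches: int) -> str:
    """Place reads on the reference and return the majority-vote consensus.

    Reads whose best placement exceeds max_mismatches are skipped;
    columns with no coverage keep the current reference base.
    """
    columns = [Counter() for _ in reference]
    for read in reads:
        off, mm = best_offset(read, reference)
        if mm > max_mismatches:
            continue  # too divergent to place confidently in this round
        for i, base in enumerate(read):
            columns[off + i][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, reference)
    )


def iterative_correct(
    reads: List[str], reference: str, max_mismatches: int = 1, max_rounds: int = 10
) -> Tuple[str, int]:
    """Repeat consensus rounds until the reference no longer changes."""
    for round_no in range(1, max_rounds + 1):
        updated = consensus_round(reads, reference, max_mismatches)
        if updated == reference:
            return reference, round_no
        reference = updated
    return reference, max_rounds


# Toy data: the draft differs from the true sequence at positions 3, 7 and 11.
target = "ACGTTGCAAGGCTATC"
draft = "ACGATGCCAGGATATC"
reads = ["ACGTTG", "TTGCAA", "CAAGGC", "GCTATC"]  # error-free reads from target

first_pass = consensus_round(reads, draft, max_mismatches=1)
corrected, rounds = iterative_correct(reads, draft, max_mismatches=1)
print(first_pass)         # ACGTTGCCAGGCTATC
print(corrected, rounds)  # ACGTTGCAAGGCTATC 3
```

In this toy case a single pass leaves position 7 uncorrected, because the two reads covering it carry two mismatches against the draft and are skipped. Once the first round repairs the flanking errors, those reads fall within the threshold, the second round fixes position 7, and the third round confirms convergence. This is one way to see why iterating against an updated reference can recover more than a single alignment pass.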