Graduate student: | 盧振華 Lu, Chen-Hua |
---|---|
Thesis title: | A Re-sequencing Tool For Next Generation Sequencing Based On Burrows-Wheeler Transform (Chinese title: 基於Burrows-Wheeler轉換之次世代定序技術重定序工具) |
Advisor: | 唐傳義 Tang, Chuan-Yi |
Oral examination committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar, i.e. 2009-2010) |
Language: | English |
Number of pages: | 39 |
Keywords: | NGS, BWT, FM-index, re-sequencing, short read, DNA |
In this thesis, we present a re-sequencing tool designed for Next Generation Sequencing (NGS) data. Such data consist of a huge number of short reads, which are to be aligned onto a reference genome. We modified and implemented the Burrows-Wheeler Transform (BWT) and FM-index algorithms to build an index of the human genome, and proposed a method of segmenting each short read that allows reads to be aligned within a larger Hamming distance (that is, with more mismatches). Finally, we used 4 real data sets of different lengths from the 1000 Genomes Project to demonstrate the performance of our tool on a personal computer, and compared the results with bowtie, a widely used tool.
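The abstract does not spell out the thesis's modified BWT/FM-index or its exact read-segmentation scheme. As a rough illustration only, the Python sketch below builds a textbook BWT and FM-index over a toy "genome", answers exact queries by backward search, and approximates mismatch-tolerant alignment with a simple pigeonhole split: a read with at most k mismatches is cut into k + 1 segments, at least one of which must match exactly, and each candidate position is then verified by Hamming distance. The function names (`build_fm_index`, `backward_search`, `align_read`) and the equal-length splitting rule are hypothetical and may differ from the method actually used in the thesis.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index over `text`; a '$' sentinel is appended.

    Returns (bwt, sa, C, occ). Uses a naive suffix sort and full
    occurrence tables: fine for short strings, far too large for a genome.
    """
    text = text + "$"
    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding the i-th smallest suffix
    # (text[-1] == '$' wraps around for the suffix starting at 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in text lexicographically smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, genome, index, max_mm):
    """Positions where `read` aligns to `genome` with <= max_mm mismatches.

    Hypothetical pigeonhole segmentation: with max_mm + 1 disjoint segments,
    any alignment with at most max_mm mismatches leaves one segment exact.
    """
    if len(read) < max_mm + 1:
        raise ValueError("read too short for this many segments")
    _, sa, C, occ = index
    k = max_mm + 1
    seg_len = len(read) // k
    hits = set()
    for s in range(k):
        start = s * seg_len
        # The last segment absorbs any remainder of the read.
        end = len(read) if s == k - 1 else start + seg_len
        for pos in backward_search(read[start:end], sa, C, occ):
            cand = pos - start  # implied start of the whole read
            if 0 <= cand <= len(genome) - len(read):
                if hamming(read, genome[cand:cand + len(read)]) <= max_mm:
                    hits.add(cand)
    return sorted(hits)


if __name__ == "__main__":
    genome = "ACGTTGCAACGTAGGCTA"
    index = build_fm_index(genome)
    print(backward_search("ACG", *index[1:]))               # [0, 8]
    print(align_read("CAACCTAG", genome, index, max_mm=1))  # [6]
```

Real human-genome indexes avoid the naive suffix sort and full `occ` tables used here, typically by sampling occurrence counts and suffix-array entries; this sketch trades memory for readability.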