簡易檢索 / 詳目顯示

研究生: 吳珮如
Wu, Pei-Ju
論文名稱: 動態Burrows-Wheeler轉換方法設計與實踐
Dynamic Burrows-Wheeler Transform and Its Practical Implementation
指導教授: 韓永楷
Hon, Wing-Kai
口試委員: 李哲榮
Lee, Che-Rung
盧錦隆
Lu, Chin-Lung
學位類別: 碩士
Master
系所名稱:
論文出版年: 2017
畢業學年度: 105
語文別: 英文
論文頁數: 42
中文關鍵詞: 塊排序壓縮動態去氧核醣核酸
外文關鍵詞: Burrows–Wheeler
相關次數: 點閱:2下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Burrows-Wheeler 轉換是一種應用在資料壓縮和生物基因序列比對上的演算法,近年來,由於次世代定序的出現,基因序列比對成為一個受到大家關注的研究問題,次世代定序是目前最新的一種DNA定序方法,它大幅降低了傳統定序法所需的時間和空間的成本,目前許多序列比對工具的核心都使用Burrows-Wheeler 轉換演算法來壓縮資料,像是BWA、Bowtie、SOAP2。然而,在基因定序時,若是需要對小部分的序列做更動,或是加入新的序列,那麼就要將原本全部的序列重新做一次Burrows-Wheeler 轉換。為了改善重新轉換所需的時間和空間成本,我們設計並實作出一個動態Burrows-Wheeler 轉換的演算法,可以用較短的時間就能得到相同的結果,並且不需要全部重新做轉換,除此之外我們也提供一些使用者查詢的功能。


    Burrows-Wheeler Transform (BWT) is a popular algorithm applied in data compression and DNA sequence alignment. Due to the emergence of Next Generation Sequencing --- the latest DNA sequencing method that significantly reduces the cost of time and space required by traditional sequencing methods, DNA sequence alignment has become a subject of great concern in recent years. Currently, many of the sequence alignment tools, such as BWA, Bowtie and SOAP2, are based on BWT to reduce the memory requirement. However, it is not convenient for us to re-do the transform of whole collection of sequences while we only add or delete a sequence in the collection. In order to improve the time and space required for re-transform, we designed and implemented a dynamic BWT algorithm. With our method, we can get the same result as the static transform in a shorter time, and do not need to re-do the transform.

    Abstract Contents 1 Introduction 1 1.1 Organization . . . . . . . . . . . . . . . . . . . . 4 2 Preliminaries 5 2.1 Burrows-Wheeler Transform (BWT) . . . . . . . 6 2.1.1 BWT and Backward Search . . . . . . . . 6 2.1.2 Generalized BWT . . . . . . . . . . . . . 9 2.1.3 Rank and Select in BWT . . . . . . . . . 10 2.2 B-Tree . . . . . . . . . . . . . . . . . . . . . . . . 11 3 Methods 14 3.1 Data Structure . . . . . . . . . . . . . . . . . . . 15 3.2 Insertion of New Strings . . . . . . . . . . . . . . 17 3.3 Query . . . . . . . . . . . . . . . . . . . . . . . . 19 13.3.1 Rank . . . . . . . . . . . . . . . . . . . . 19 3.3.2 Select . . . . . . . . . . . . . . . . . . . . 20 3.4 External Memory Implementation . . . . . . . . . 21 4 Experiment 22 4.1 Optimization . . . . . . . . . . . . . . . . . . . . 23 4.1.1 Encoding of Characters . . . . . . . . . . 23 4.1.2 Bit Shift Operations . . . . . . . . . . . . 25 4.1.3 Fragment Size . . . . . . . . . . . . . . . . 27 4.1.4 Fanout Size . . . . . . . . . . . . . . . . . 30 4.2 Comparison between Static and Dynamic Versions 35 5 Conclusion and Future Work 39

    [1] R. Arnold and T. Bell: The Canterbury Corpus Home Page.
    http://corpus.canterbury.ac.nz/.
    [2] T. Beller, M. Zwerger, S. Gog, and E. Ohlebusch: Space-Efficient Construction of the Burrows-Wheeler Transform.
    Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 5–16, 2013.
    [3] M. Burrows and D. Wheeler: A Block Sorting Lossless Data Compression Algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
    [4] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro: An Alphabet-Friendly FM-Index. Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 150–160, 2004.
    [5] Y. Mori: An Implementation of the Induced Sorting Algorithm, 2010.
    https://sites.google.com/site/yuta256/.
    [6] G. Nong, S. Zhang, and W. H. Chan: Two Effi cient Algorithms for Linear Time Suffi x Array Construction. IEEE Transactions on Computers, 60:1471–1484, 2011.
    [7] J. Sirén: Burrows-Wheeler Transform for Terabases. Proceedings of IEEE Data Compression Conference (DCC), pages 211–220, 2016.
    [8] National Center for Biotechnology Information (NCBI).
    https://www.ncbi.nlm.nih.gov/
    [9] The 1000 Genomes Project Consortium: A Map of Human
    Genome Variation from Population-Scale Sequencing. Nature, 467(7319):1061–1073, 2010.
    [10] Wikipedia: B-tree. https://en.wikipedia.org/wiki/B-tree.

    QR CODE