
Graduate student: 陳盈妤
Chen, Ying-Yu
Thesis title: 在子演化樹上做字串搜尋的索引框架
Index Framework for the Subphylogeny Pattern Searching Problem
Advisor: 韓永楷
Hon, Wing-Kai
Committee members: 盧錦隆
Lu, Chin-Lung
李哲榮
Lee, Che-Rung
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 23
Chinese keywords: 子演化樹, 字串搜尋, DNA相似度, 索引結構
English keywords: Subphylogeny, Pattern Matching, DNA Similarity, Indexing
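  • This thesis studies how to perform efficient pattern searching on an arbitrary subphylogeny. We exploit the fact that species close to each other in the phylogeny have more similar DNA strings, which reduces the space needed to store identical strings. Combined with three data structures (the suffix tree, the wavelet tree, and heavy path decomposition), this lets us design an index structure whose space is proportional to the minimum amount of data, and thereby achieve efficient pattern searching.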


    This thesis studies how to answer subphylogeny pattern matching queries on a phylogeny efficiently. Species that are close in the phylogeny share many similarities in their DNA sequences, which lets us reduce the space needed to store these sequences. Combining this observation with existing data structures such as the suffix array, the wavelet tree, and heavy path decomposition, we design an index structure that takes asymptotically minimal space while supporting subphylogeny pattern matching queries efficiently.

    1. Introduction
    2. Preliminaries
    2.1 Suffix Array
    2.2 Heavy Path Decomposition
    2.3 Wavelet Tree
    2.3.1 Restricted 2D Orthogonal Range Query
    3. Our Framework
    3.1 Index Structure
    3.2 Query Algorithm
    4. Conclusion

    [1] R. J. Britten. Divergence between Samples of Chimpanzee and Human DNA Sequences is 5%, Counting Indels. Proc. National Academy of Science, 99(21):13633-13635, 2002.
    [2] R. Cole, L.-A. Gottlieb, and M. Lewenstein. Dictionary Matching and Indexing with Errors and Don't Cares. In Proc. of Symposium on Theory of Computing (STOC), pages 91-100, 2004.
    [3] P. Ferragina and G. Manzini. Indexing Compressed Text. Journal of the ACM (JACM), 52(4):552-581, 2005.
    [4] R. Grossi, A. Gupta, and J. S. Vitter. High-order Entropy-compressed Text Indexes. In Proc. of ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841-850, 2003.
    [5] M.-C. King and A. C. Wilson. Evolution at Two Levels in Humans and Chimpanzees. Science, 188(4184):107-116, 1975.
    [6] U. Manber and G. Myers. Suffix Arrays: A New Method for Online String Searches. SIAM Journal on Computing, 22(5):935-948, 1993.
    [7] E. M. McCreight. A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM), 23(2):262-272, 1976.
    [8] G. Navarro. Wavelet Trees for All. Journal of Discrete Algorithms (JDA), 25:2-20, 2014.
    [9] R. Raman, V. Raman, and S. R. Satti. Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. ACM Transactions on Algorithms (TALG), 3(4), article 43, 2007.
    [10] P. Weiner. Linear Pattern Matching Algorithms. In Proc. of Symposium on Switching and Automata Theory, pages 1-11, 1973.
    [11] Wikipedia entry for Metagenomics. Link at: https://en.wikipedia.org/wiki/Metagenomics
