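Graduate student: | 陳盈妤 Chen, Ying-Yu |
---|---|
Thesis title: | 在子演化樹上做字串搜尋的索引框架 Index Framework for the Subphylogeny Pattern Searching Problem |
Advisor: | 韓永楷 Hon, Wing-Kai |
Oral defense committee: | 盧錦隆 Lu, Chin-Lung; 李哲榮 Lee, Che-Rung |
Degree: | Master |
Department: | |
Year of publication: | 2017 |
Academic year of graduation: | 105 |
Language: | English |
Number of pages: | 23 |
Keywords (Chinese): | 子演化樹 (subphylogeny), 字串搜尋 (string searching), DNA相似度 (DNA similarity), 索引結構 (index structure) |
Keywords (English): | Subphylogeny, Pattern Matching, DNA Similarity, Indexing |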
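This thesis studies how to perform efficient string searching on an arbitrary subphylogeny. We exploit the fact that species close to each other in the phylogeny have similar DNA strings, which reduces the space needed to store the strings they share. Supported by three data structures, the suffix tree, the wavelet tree, and heavy path decomposition, we design an index structure whose space is proportional to the minimum amount of data that must be stored, and use it to achieve efficient string searching.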
This thesis studies how to answer subphylogeny pattern matching queries efficiently on a phylogeny. Species that are close in the phylogeny share many similarities in their DNA sequences, and we exploit this to reduce the space needed to store those sequences. Combining this observation with existing data structures, namely the suffix array, the wavelet tree, and heavy path decomposition, we design an indexing structure that takes asymptotically minimal space while still supporting subphylogeny pattern matching queries efficiently.
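As a rough illustration of one building block named in the abstract, the sketch below shows plain pattern matching with a suffix array over a single DNA string. It is not the thesis's index: it ignores the phylogeny, the wavelet tree, and heavy path decomposition, and it uses a naive construction that is far from space- or time-optimal. The function names `build_suffix_array` and `find_occurrences` and the sample string are hypothetical, chosen only for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffix copies; fine for a short demo,
    but not how a space-efficient index would be built.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return every start position of pattern in text, in increasing order.

    Suffixes that begin with pattern occupy one contiguous range of the
    suffix array, so two binary searches locate the range boundaries.
    """
    m = len(pattern)

    # Lower boundary: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper boundary: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "GATTACATTAG"  # hypothetical sample sequence
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "TTA"))  # [2, 7]
```

Each query costs O(m log n) character comparisons for a pattern of length m and a text of length n, which is the classic suffix array bound without auxiliary LCP information. How such a search is restricted to the species of a chosen subphylogeny is the subject of the thesis itself and is not attempted here.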