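Graduate student: | 楊易霖 Yang, Yilin |
---|---|
Thesis title: | 字典匹配問題的變形 Variants of Dictionary Matching |
Advisor: | 韓永楷 Wing-Kai Hon |
Oral defense committee: | 盧錦隆, 廖崇碩 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC calendar) |
Language: | English |
Number of pages: | 50 |
Keywords (Chinese): | 字典匹配問題 |
Keywords (English): | Dictionary matching |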
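This master's thesis studies the dictionary matching problem. In total, I examine three different variants.
In the first variant, every string in the dictionary may contain one gap. Amir et al. proposed a solution to this problem in 2014. Taking a different direction, I change the way their data is stored and improve on their method, so that under the same constraints both our space and time complexity are smaller. Our approach can also extend this problem to a more general one.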
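To make the gapped-pattern problem statement concrete, here is a minimal brute-force sketch in Python. It is only a reference definition of what counts as a match, not the index structure developed in the thesis or by Amir et al. The function name `gapped_matches`, the representation of a gapped pattern as a `(left, right)` pair of non-empty strings, and the shared bounds `alpha` and `beta` are assumptions made for illustration.

```python
def gapped_matches(text, patterns, alpha, beta):
    """Brute-force reference for dictionary matching with one gap.

    Each pattern is a hypothetical (left, right) pair of non-empty strings.
    It matches when `left` occurs in the text, followed by any `gap`
    characters with alpha <= gap <= beta, followed by `right`.
    Returns a list of (pattern_id, start_of_left, gap_length) triples.
    """
    results = []
    for pid, (left, right) in enumerate(patterns):
        start = text.find(left)
        while start != -1:
            after_left = start + len(left)
            # Try every allowed gap length after this occurrence of `left`.
            for gap in range(alpha, beta + 1):
                if text.startswith(right, after_left + gap):
                    results.append((pid, start, gap))
            # Continue with the next (possibly overlapping) occurrence.
            start = text.find(left, start + 1)
    return results


if __name__ == "__main__":
    print(gapped_matches("abxxcd", [("ab", "cd")], alpha=1, beta=3))
```

This scan rechecks every pattern at every occurrence, so its cost grows with the total dictionary size; an index-based solution aims to avoid exactly that.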
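The second variant closely resembles the first: any string in the dictionary may have one segment missing, but we do not know which segment is missing. Given a text, we want to find every position where some dictionary string could match. This is an entirely new problem that has not been discussed before. By applying tree chain decomposition on the failure tree, we propose a complete and workable solution.
The third variant differs from the first two: the dictionary strings are ordinary strings, with no gap and no missing segment, but two strings are treated as the same for matching if one can be obtained from the other by a one-to-one mapping of letters. For example, aab and xxy are regarded as the same string. We propose a new approach that modifies earlier work and reduces the space complexity of previous algorithms.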
Dictionary matching is a well-studied problem in computer science, which has found numerous applications, such as computer virus detection and bioinformatics analysis. In this thesis, we study three variants of the dictionary matching problem. The first variant, called dictionary matching with gapped patterns, was recently proposed by Amir et al.; in it, a pattern may be matched with a substring of a query text T with a gap of bounded length present in the pattern. We first give an alternative linear-space solution to Amir et al.'s problem, where the gap lengths of all patterns have the same lower bound α and upper bound β. The query time on any query text T is bounded by O((β − α + 1)|T| log d + occ), where d denotes the number of patterns and occ denotes the size of the output. After that, we show that the framework can be generalized to handle the case where gaps may have different bounds, thereby answering one of the open problems raised by Amir et al. The second variant, called dictionary matching with one missing substring, is a new problem in which a gap of bounded length may be present in the text substring when it is being matched. We show that this problem can be solved by using a similar framework. Furthermore, by applying a novel indexing technique on the failure tree, we obtain a space-time tradeoff result, which is suitable when the dictionary contains only short patterns or when index space is a critical concern. The third variant is called parameterized dictionary matching. Idury et al. proposed a linear-space solution for this problem using Baker's encoding method. Here, we come up with a new encoding method, which allows us to achieve a better space complexity for the index, with only a slight slowdown in the query time.
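As background for the parameterized variant, the sketch below shows the classical "prev" encoding attributed to Baker, in the simplified setting where every character is a renamable parameter symbol. It is not the new encoding proposed in the thesis, which this record does not describe. The function names `prev_encode` and `p_match` are chosen here for illustration.

```python
def prev_encode(s):
    """Baker-style prev encoding, assuming every symbol is a parameter.

    Position i holds 0 if s[i] has not appeared before, otherwise the
    distance back to the previous occurrence of the same symbol.
    """
    last_seen = {}
    encoded = []
    for i, ch in enumerate(s):
        encoded.append(i - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = i
    return encoded


def p_match(a, b):
    """Two strings parameterized-match iff their prev encodings are equal."""
    return prev_encode(a) == prev_encode(b)


if __name__ == "__main__":
    print(prev_encode("aab"), prev_encode("xxy"))
    print(p_match("aab", "xxy"), p_match("aab", "abb"))
```

Because the encoding depends only on where symbols repeat, strings that differ by a one-to-one renaming of letters get the same code, so matching can be done on the encoded strings.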
[1] A. Aho and M. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM (CACM), 18(6):333–340, 1975.
[2] A. Amir, D. Keselman, G. M. Landau, M. Lewenstein, N. Lewenstein, and M. Rodeh: Text Indexing and Dictionary Matching With One Error. Journal of Algorithms, 37(2):309–325, 2000.
[3] A. Amir, A. Levy, E. Porat, and B. R. Shalom: Dictionary Matching with One Gap. In Proc. of Symposium on Combinatorial Pattern Matching (CPM), pages 11–20, 2014.
[4] B. S. Baker: Parameterized Pattern Matching: Algorithms and Applications. Journal of Computer and System Sciences, 52(1):28–42, 1996.
[5] D. Belazzougui: Succinct Dictionary Matching with No Slowdown. In Proc. of Symposium on Combinatorial Pattern Matching (CPM), pages 88–100, 2010.
[6] T. M. Chan, K. G. Larsen, and M. Pătraşcu: Orthogonal Range Searching on the RAM, Revisited. In Proc. of Symposium on Computational Geometry (SoCG), pages 1–10, 2011.
[7] B. Chazelle: Filtering Search: A New Approach to Query Answering. SIAM Journal on Computing (SICOMP), 15(3):703–724, 1986.
[8] R. Cole, L.-A. Gottlieb, and M. Lewenstein: Dictionary Matching and Indexing with Errors and Don’t Cares. In Proc. of Symposium on Theory of Computing (STOC), pages 91–100, 2004.
[9] T. Haapasalo, P. Silvasti, S. Sippu, and E. Soisalon-Soininen: Online Dictionary Matching with Variable-Length Gaps. In Proc. of Symposium on Experimental Algorithms (SEA), pages 76–87, 2011.
[10] W. K. Hon, T. H. Ku, T. W. Lam, R. Shah, S. L. Tam, S. V. Thankachan, and J. S. Vitter: Compressing Dictionary Matching Index via Sparsification Technique. Algorithmica, 2014.
[11] W. K. Hon, T. H. Ku, R. Shah, S. V. Thankachan, and J. S. Vitter: Faster Compressed Dictionary Matching. Theoretical Computer Science (TCS), 475:113–119, 2013.
[12] R. M. Idury and A. A. Schaffer: Multiple Matching of Parameterized Patterns. Theoretical Computer Science (TCS), 154(2):203–224, 1996.
[13] G. Kucherov and M. Rusinowitch: Matching a Set of Strings with Variable Length Don’t Cares. Theoretical Computer Science (TCS), 178:129–154, 1997.
[14] E. M. McCreight: A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM), 23(2):262–272, 1976.
[15] S. Rahul: Improved Bounds for Orthogonal Point Enclosure Query and Point Location in Orthogonal Subdivisions in ℝ³. In Proc. of Symposium on Discrete Algorithms (SODA), pages 200–211, 2015.
[16] P. Weiner: Linear Pattern Matching Algorithms. In Proc. of Symposium on Switching and Automata Theory, pages 1–11, 1973.
[17] M. Zhang, Y. Zhang, and L. Hu: A Faster Algorithm for Matching a Set of Patterns with Variable Length Don’t Cares. Information Processing Letters (IPL), 110(6):216–220, 2010.