研究生: |
林政耀 Lin,Cheng Yao |
---|---|
論文名稱: |
一個計算非重疊反轉與轉位距離的有效率演算法 An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance |
指導教授: |
盧錦隆
Lu, Chin Lung |
口試委員: |
李家同
Lee, Chia Tung 唐傳義 Tang, Chuan Yi 邱顯泰 Chiu, Hsien Tai |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Computer Science |
論文出版年: | 2015 |
畢業學年度: | 103 |
語文別: | 英文 |
論文頁數: | 20 |
中文關鍵詞: | 演算法 、計算生物學 、反轉 、轉位 、突變距離 |
外文關鍵詞: | Algorithm, computational biology, inversion, transposition, mutation distance |
相關次數: | 點閱:2 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
從演化的觀點來看,DNA序列可能會藉由大規模突變(又稱為重組)演化,例如反轉 (用反向互補序列取代DNA序列的某一片斷)與轉位(把DNA序列的某一片段從原來的位置移到另一個位置,或等同於把兩個相鄰但不重疊的DNA序列片斷做互換) 。給予兩個長度皆為n的字串,他們之間的非重疊反轉與轉位距離(又稱為突變距離)被定義為使用最少次數的非重疊反轉與轉位把一個字串轉換成另一個字串。在本論文的研究中,我們提出一個O(n^3)時間與O(n^2)空間的演算法來計算出兩個輸入字串之間的突變距離。事實上,我們的演算法可以被用來設計出一個O(nm^3)時間與O(m^2)空間的演算法去解決非重疊反轉與轉位距離的近似字串比對問題,此問題的目標是要去找出在一個給定的內文中所有子字串,使得這些子字串與另一個給定的樣板之間的突變距離都小於或等於一個給定的門檻值k,其中n為內文長度, m為樣板長度。
From evolutionary point of view, DNA sequences may evolve by large-scale mutations (also called rearrangements), such as inversions (i.e., replacing a fragment of DNA sequence by its reverse complement) and transpositions (i.e., moving a fragment of DNA sequence from one location to another or, equivalently, exchanging two adjacent and non-overlapping fragments on DNA sequence). Given two strings of the same length n, the non-overlapping inversion and transposition distance (also called mutation distance) between them is defined as the minimum number of non-overlapping inversion and transposition operations used to transform one string into the other. In this study, we present an O(n^3) time and O(n^2) space algorithm to compute the mutation distance of two input strings. In fact, our algorithm can be used to design an algorithm that can solve the approximate string matching problem under non-overlapping inversion and transposition distance, which is to find all substrings of a given text whose mutation distances from a given pattern are less than or equal to a given threshold k, in O(nm^3) time and O(m^2) space, where n is the length of the text and m is the length of the pattern.
[1]A. Amir, T. Hartman, O. Kapah, A. Levy, E. Porat: On the cost of interchange rear-rangement in strings. SIAM Journal on Computing, 39 (2009) 1444-1461.
[2]D. Cantone, S. Cristofaro, S. Faro: Efficient string-matching allowing for non-overlapping inversions. Theoretical Computer Science, 483 (2013) 85-95
[3]D. Cantone, S. Faro, E. Giaquinta: Approximate string matching allowing for inversions and translocations. Proceedings of the Prague Stringology Conference 2010, pp. 37-51.
[4]D.-J. Cho, Y.-S. Han, H. Kim: Alignment with non-overlapping inversions on two strings. In: S.P. Pal, K. Sadakane (eds.) WALCOM 2014, LNCS , vol. 8344, Springer, 2014, pp. 261-272.
[5]D.-J. Cho, Y.-S. Han, H. Kim: Alignment with non-overlapping inversions and transloca-tions on two strings. Theoretical Computer Science, 575 (2015) 90-101
[6]S. Grabowski, S. Faro, E. Giaquinta: String matching with inversions and translocations in linear average time (most of the time). Information Processing Letters, 111 (2011) 516-520.
[7]S.-Y. Huang, C.-H. Yang, T.T. Ta, C.L. Lu: Approximate string matching problem under non-overlapping inversion distance. In: Workshop on Algorithms and Computation Theory in conjunction with 2014 International Computer Symposium, 2014, pp. 38-46.
[8]Y.-L. Huang, C.L. Lu: Sorting by reversals, generalized transpositions, and translocations using permutation groups. Journal of Computational Biology, 17 (2010) 685-705
[9]O. Kapah, G.M. Landau, A. Levy, N. Oz: Interchange rearrangement: the element-cost model. Theoretical Computer Science, 410 (2009) 4315-4326.
[10]F.T. Zohora, M.S. Rahman: Application of consensus string matching in the diagnosis of allelic heterogeneity. In: Basu, M., Pan, Y., Wang, J. (eds.) ISBRA 2014. LNBI, vol. 8344, Springer, 2014, pp. 163-175.