
Student: Wu, Bing-Huan (吳秉寰)
Thesis title: A Study on the Adjacent Swap Distance Between Strings (字串間相鄰交換距離之研究)
Advisor: Wang, Biing-Feng (王炳豐)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Publication year: 2009
Graduation academic year: 97 (ROC calendar, i.e., 2008-2009)
Language: English
Number of pages: 37
Keywords: strings, algorithms, swap distances, sorting by swaps, inversions, Kendall Tau distances
    Estimating the similarity between two strings is an important and practical problem in many fields, such as computational biology, sortedness measurement, rank aggregation and music theory. Given a string α = α1 α2 … αn, an adjacent swap σ(i, i+1) on α exchanges αi and αi+1, transforming α into the string α' = α1 α2 … αi-1 αi+1 αi αi+2 … αn. Given two strings of length n over an alphabet Σ, one a rearrangement of the other, the swap distance problem is to find the minimum number of adjacent swaps needed to transform one string into the other. This thesis focuses on the swap distance problem. Assume that |Σ| ≤ n. Chitturi et al. [12] showed that the swap distance problem can be solved with an algorithm that counts the inversions of a permutation. Dietz [15] gave a data structure that performs this counting in O(n lg n / lg lg n) time, so the swap distance problem can be solved within the same time bound. For small alphabets Σ, Chitturi et al. proposed an algorithm that takes O(n|Σ|) time and O(n|Σ|) space. In this thesis, we give an improved algorithm that requires O(n|Σ|) time and only O(n) space, making it more space-efficient than Chitturi et al.'s. The sorting by swaps problem, a special case of the swap distance problem, is to find the minimum number of adjacent swaps needed to sort a given string. We also study this problem and propose an O(n + n lg |Σ| / lg lg n)-time and O(n)-space algorithm. For this special case it is more efficient than applying either of the two best previous swap distance algorithms directly. In particular, when |Σ| = O((lg n)^c) for some constant c, our algorithm runs in linear time and space. Our algorithms extend easily to the signed versions of the corresponding problems.
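The reduction from swap distance to inversion counting can be sketched in Python. This is a minimal illustration and not the thesis's algorithm. It pairs the k-th occurrence of each symbol in α with the k-th occurrence of the same symbol in the target string, which is order-preserving because equal symbols never need to cross. It then counts the inversions of the resulting permutation with an ordinary O(n lg n) merge sort, which stands in for Dietz's data structure. For the sorting special case, it counts, at each position, the earlier symbols that are strictly greater. That takes O(n|Σ|) time and O(|Σ|) extra space, and it is a plain counting method rather than the O(n + n lg |Σ| / lg lg n) algorithm proposed here. The names `count_inversions`, `swap_distance` and `sorting_by_swaps` are hypothetical.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Tuple


def count_inversions(seq: List[int]) -> int:
    """Return the number of pairs i < j with seq[i] > seq[j].

    Uses merge sort, O(n lg n) time; this stands in for Dietz's
    O(n lg n / lg lg n) structure, which is not reproduced here.
    """
    def sort_count(a: List[int]) -> Tuple[List[int], int]:
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged: List[int] = []
        inv = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every element still in left[i:]
                merged.append(right[j])
                inv += len(left) - i
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]


def swap_distance(alpha: str, beta: str) -> int:
    """Minimum adjacent swaps turning alpha into beta (hypothetical helper).

    Pairs the k-th occurrence of each symbol in alpha with the k-th
    occurrence of the same symbol in beta, then counts inversions.
    """
    if Counter(alpha) != Counter(beta):
        raise ValueError("beta must be a rearrangement of alpha")
    targets: Dict[str, Deque[int]] = defaultdict(deque)
    for j, c in enumerate(beta):
        targets[c].append(j)
    # perm[i] = position in beta that alpha[i] must move to
    perm = [targets[c].popleft() for c in alpha]
    return count_inversions(perm)


def sorting_by_swaps(alpha: str) -> int:
    """Minimum adjacent swaps that sort alpha (hypothetical helper).

    Counts pairs i < j with alpha[i] > alpha[j] by keeping, for each
    symbol, how many times it has appeared so far: O(n|Σ|) time and
    O(|Σ|) extra space. A simple counting method, not the thesis's
    O(n + n lg|Σ| / lg lg n) algorithm.
    """
    symbols = sorted(set(alpha))
    rank = {c: r for r, c in enumerate(symbols)}
    seen = [0] * len(symbols)
    total = 0
    for c in alpha:
        r = rank[c]
        total += sum(seen[r + 1:])  # earlier symbols strictly greater than c
        seen[r] += 1
    return total
```

For example, `swap_distance("aab", "baa")` returns 2 (aab → aba → baa). `sorting_by_swaps("baba")` returns 3, which equals `swap_distance("baba", "aabb")`.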



    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
    Chapter 2. Preliminaries
        2.1. Notation and definitions
        2.2. Pairing Diagram
        2.3. Optimal Pairing
    Chapter 3. Improved Algorithm for the Swap Distance Problem
        3.1. Swap Distance between Unsigned Strings
        3.2. Swap Distance between Signed Strings
    Chapter 4. Improved Algorithm for the Sorting by Swaps Problem
        4.1. A Naïve Algorithm
        4.2. The Improved Algorithm
    Chapter 5. Concluding Remarks
    References

    [1] M. Ajtai, T. S. Jayram, R. Kumar and D. Sivakumar, "Approximate counting of inversions in a data stream," in Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing (STOC), ACM, Montreal, Quebec, Canada, 2002, pp. 370-379.
    [2] A. Amir, Y. Aumann, G. Benson, A. Levy, O. Lipsky, E. Porat, S. Skiena and U. Vishne, "Pattern matching with address errors: Rearrangement distances," Journal of Computer and System Sciences (JCSS), v. 75, 2009, pp. 359-370.
    [3] A. Andersson and O. Petersson, "Approximate indexed lists," Journal of Algorithms, v. 29, 1998, pp. 256-276.
    [4] Y. J. P. Ardila, R. Clifford and M. Mohamed, "Necklace Swap Problem for Rhythmic Similarity Measures," in String Processing and Information Retrieval (SPIRE), Lecture Notes in Computer Science, v. 3772, Springer-Verlag, 2005, pp. 234-245.
    [5] D. A. Bader, B. M. E. Moret and M. Yan, "A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study," Journal of Computational Biology, v. 8, 2001, pp. 483-491.
    [6] P. Berman and S. Hannenhalli, "Fast Sorting by Reversal," in Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science, v. 1075, Springer-Verlag, 1996, pp. 168-185.
    [7] A. Bookstein, S. T. Klein and T. Raita, "Fuzzy Hamming Distance: A New Dissimilarity Measure," in Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science, v. 2089, Springer-Verlag, 2001, pp. 86-97.
    [8] A. Bookstein, V. A. Kulyukin and T. Raita, "Generalized Hamming Distance," Information Retrieval, v. 5, 2002, pp. 353-375.
    [9] D. Bremner, T. M. Chan, E. D. Demaine, J. Erickson, F. Hurtado, J. Iacono, S. Langerman and P. Taslakian, "Necklaces, convolutions, and X + Y," in Proceedings of the 14th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, v. 4168, Springer-Verlag, Zurich, Switzerland, 2006, pp. 160-171.
    [10] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine," Computer Networks and ISDN Systems, v. 30, 1998, pp. 107-117.
    [11] A. Caprara, "Sorting by Reversals is Difficult," in Proceedings of the First Annual International Conference on Computational Molecular Biology (RECOMB), Santa Fe, New Mexico, United States, 1997, pp. 75-83.
    [12] B. Chitturi, H. Sudborough, W. Voit and X. Feng, "Adjacent Swaps on Strings," in Proceedings of the 14th Annual International Conference on Computing and Combinatorics (COCOON), Lecture Notes in Computer Science, v. 5092, Springer-Verlag, Dalian, China, 2008, pp. 299-308.
    [13] D. A. Christie and R. W. Irving, "Sorting Strings by Reversals and by Transpositions," SIAM Journal on Discrete Mathematics, v. 14, 2001, pp. 193-206.
    [14] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, "Introduction to Algorithms," The MIT Press, 2001.
    [15] P. F. Dietz, "Optimal Algorithms for List Indexing and Subset Rank," in Proceedings of the Workshop on Algorithms and Data Structures (WADS), Lecture Notes in Computer Science, v. 382, Springer-Verlag, 1989, pp. 39-46.
    [16] C. Dwork, R. Kumar, M. Naor and D. Sivakumar, "Rank aggregation methods for the Web," in Proceedings of the 10th International Conference on World Wide Web (WWW10), Hong Kong, 2001.
    [17] V. Estivill-Castro and D. Wood, "A survey of adaptive sorting algorithms," ACM Computing Surveys (CSUR), v. 24, 1992, pp. 441-476.
    [18] A. Gupta and F. X. Zane, "Counting inversions in lists," in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, Baltimore, Maryland, 2003, pp. 253-254.
    [19] S. Hannenhalli and P. A. Pevzner, "Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals," Journal of the ACM (JACM), v. 46, 1999, pp. 1-27.
    [20] M. Jiang, "A Linear-Time Algorithm for Hamming Distance with Shifts," Theory of Computing Systems (TCS), v. 44, 2009, pp. 349-355.
    [21] H. Kaplan, R. Shamir and R. E. Tarjan, "A Faster and Simpler Algorithm for Sorting Signed Permutations by Reversals," SIAM Journal on Computing, v. 29, 2000, pp. 880-892.
    [22] H. Kaplan and E. Verbin, "Efficient Data Structures and a New Randomized Approach for Sorting Signed Permutations by Reversals," in Proceedings of the 14th Annual Symposium on Combinatorial Pattern Matching (CPM), Lecture Notes in Computer Science, v. 2676, Springer-Verlag, 2003, pp. 1018.
    [23] M. G. Kendall, "A New Measure of Rank Correlation," Biometrika, v. 30, 1938, pp. 81-93.
    [24] M. G. Kendall, "Rank Correlation Methods," Charles Griffin & Company Limited, 1948.
    [25] D. E. Knuth, "The Art of Computer Programming, Volume 3: Sorting and Searching," Addison-Wesley Longman Publishing Co., Inc., 1998.
    [26] A. J. Radcliffe, A. D. Scott and E. L. Wilmer, "Reversals and Transpositions Over Finite Alphabets," SIAM Journal on Discrete Mathematics, v. 19, 2005, pp. 224-244.
    [27] E. Tannier, A. Bergeron and M.-F. Sagot, "Advances on Sorting by Reversals," Discrete Applied Mathematics, v. 155, 2007, pp. 881-888.
    [28] G. Toussaint, "Computational Geometric Aspects of Musical Rhythm," Massachusetts Institute of Technology, 2004, pp. 47-48.
    [29] G. T. Toussaint, "A Comparison of Rhythmic Similarity Measures," in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004.
    [30] X. Chen, J. Zheng, Z. Fu, P. Nan, Y. Zhong, S. Lonardi and T. Jiang, "Assignment of Orthologous Genes via Genome Rearrangement," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), v. 2, 2005, pp. 302-315.

    Full text: not authorized for public release (on-campus or off-campus network).
