研究生: |
黃振臺 Chen-Tai Huang |
---|---|
論文名稱: |
節省空間之三序列比對平行演算法 Parallel Three-Sequence Alignment with Space-efficiency |
指導教授: |
唐傳義
Chuan-Yi Tang |
口試委員: | |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Computer Science |
論文出版年: | 2006 |
畢業學年度: | 94 |
語文別: | 英文 |
論文頁數: | 29 |
中文關鍵詞: | 計算生物學 、序列比對 、動態規劃 、平行演算法 |
外文關鍵詞: | Computational biology, sequence alignment, dynamic programming, parallel algorithms |
相關次數: | 點閱:1 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
「序列比對」在分子生物學上是一個基本而重要的應用。很多比對序列的方法已經被提出,像是二序列比對(pair-wise sequence alignment)、三序列比對、限制條件序列比對(Constraint sequence alignment)、多序列比對(Multiple sequence alignment)…等,其中多序列比對近來常常被使用也因此顯得更重要。然而,多序列比對已被證明是一個NP-complete的問題,所以漸進式的多序列比對(progressive MSA)被發展出來。漸進式的多序列比對是用二序列比對做為基本步驟,但是這種方式產生出來的結果有時候品質並不是很好。要改善這個問題可以把三序列比對加入漸進式的多序列比對。三序列比對的時間複雜度為O(mnl),空間複雜度為O(mn),其中m, n 和 l 分別為三條序列的長度。由於三序列比對的複雜度很高,其應用性就會被限制。所以降低三序列比對的複雜度就顯得很重要,在這篇論文中,我們提出了一個平行的方法來降低三序列比對的複雜度以提升其應用性。我們用MPI+C語言來實作三序列比對,並且在平行環境上進行測試。我們所提出的平行方法把三序列比對的時間和空間複雜度降到O(mnl/p)以及O(mn/p),其中p代表處理器的數量。在此篇論文我們也分析了我們的平行方法同時也提出實驗數據。從實驗數據可以發現我們的方法有不錯的效能並且能使三序列比對的應用性提高不少。
The sequence alignment is a fundamental problem in computational biology. Many sequence alignment methods have been proposed such as pair-wise sequence alignment, multiple sequence alignment (MSA), syntenic alignment, constraint multiple sequence alignment, and so on. Among these methods, pair-wise sequence alignment is most commonly used. Recently, MSA is more and more important which has been used to solve many molecular biological problems. The sum-of-pairs MSA problem has been proved to be a NP-complete problem. A progressive algorithm is used to solve MSA problem which is based on the pair-wise sequence alignment. However, the quality of the result of progressive MSA is not satisfied sometimes. This problem may be solved by using three-sequence alignment as a basic step instead of pair-wise sequence alignment. Three-sequence alignment can be solved sequentially in O(mnl) time complexity and O(mn) space complexity, where m, n and l are the lengths of the sequences to be aligned. The complexities of three-sequence alignment limit its applicability. Hence, to reduce the complexities becomes an important issue. In this paper, we propose a parallel algorithm to solve this problem. Our algorithm requires O(mn/p) space complexity and O(mnl/p) time complexity, where p is number of processors. Both experimental results and theoretical analysis are presented. The experimental results show that our algorithm is applicable and achieves a good speed up.
[1] S. Aluru, N. Futamura, and K. Mehrotra, “Parallel biological sequence comparison using prefix computations,” J. Parallel and Distributed Computing, vol. 63, issue 3, pp. 264 – 272, 2003
[2] P. Bonizzoni and G. D. Vedova, “The complexity of multiple sequence alignment with SP-score that is a metric,” Theoretical Computer Science, vol. 259, issue1–2, pp. 63–79, 2001.
[3] H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM Journal on Applied Mathematics, 48 (1988) 1073-1082.
[4] S.C. Chan, A.K.C. Wong and D.K.Y. Chiu, A survey of multiple sequence comparison methods. Bulletin of Mathematical Biology, 54 (1992) 563-598
[5] D.Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridege University Press, NY.,1997
[6] O. Gotoh, “An improved algorithm for matching biological sequences,” J. Molecular Biology, vol. 162, pp. 705-708, 1982
[7] D.S. Hirschberg, A linear space algorithm for computing maximal common subsequences, Communications of the ACM, 18(6) (1975). 341-343.
[8] X. Huang, “A Space-Efficient Parallel Sequence Comparison Algorithm for a Message-Passing Multiprocessors,” Int’l j. Parallel Programming, vol. 18, no. 3, pp. 223-239, 1989.
[9] X. Huang, “A Space-Efficient Algorithm for Local Similarities,” Computer Applications in the Biosciences, vol. 6, no. 4, pp.373-381, 1990.
[10] X. Huang, “Alignment of Three Sequences in Quadratic Space,” ACM SIGAPP Applied Computing, vol. 1, issue 2, pp. 7-11, 1993
[11] E.W. Mayers and W. Miller, “Optimal Alignments in Linear Space,” Computer Applications in the Biosciences, vol. 4, no. 1, pp. 11-17, 1988
[12] H.B. Nicholas, A.J. Ropelewski and D.W. Deerfield, “Strategies for multiple sequence alignment,” Biotechniques, vol. 32, pp. 592-603, 2002.
[13] C. Notredame, Recent progresses in multiple sequence alignment: a survey. Pharmacogenomics, vol. 3, pp. 131-144, 2002.
[14] S. Rajko and S. Aluru, “Space and Time Optimal Parallel Sequence Alignments,” IEEE Transactions on Parallel and Distributed Systems, vol. 15, issue 12, pp. 1070-1081, 2004.
[15] J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology. PWS Publishing Company, 1997.
[16] L. Wang and T. Jiang, “On the complexity of multiple sequence alignment,” Journal of Computational Biology, vol. 1, no. 4, pp. 337–348, 1994.