
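Graduate student: 林世杰
Thesis title: 不同規模下多重解析度之序列排比
Multiple resolution and scale sequence alignment
Advisor: 唐傳義
CY Tang
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2001
Academic year of graduation: 89 (ROC calendar)
Language: Chinese
Number of pages: 34
Keywords (Chinese): 序列排比 (sequence alignment), longest common subsequence, suffix tree, hash table
Keywords (English): alignment, longest common subsequence, hash table, suffix tree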
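  • Traditional sequence alignment methods are based on dynamic programming, so their time and space requirements grow quadratically with sequence length. Later improvements reduce the space needed and so solve the memory problem, but at the cost of more running time. As a result, an ordinary computer cannot handle the computation and hardware demands of aligning very long sequences, for example sequences a million characters long. We therefore modify the whole-genome alignment method proposed by A. L. Delcher et al. in 1999 [1]. We use the LCS (longest common subsequence) concept in place of the substring matching provided by the suffix tree [5], which makes it possible to accept similar regions within an error tolerance chosen from biologists' expert experience. We also propose methods and properties for combining and improving LCS computations, so that aligning sequences a million characters long on an ordinary personal computer takes an acceptable amount of time. In our experiments, aligning a sequence of more than 1.2 million characters against a sequence of 1 million characters, with a range of 2000, a sliding window of 16, always accepting a similarity of 15 or higher and treating 14 as still acceptable, produced a result in 25 minutes on an ordinary personal computer. Although LCS-based alignment has a higher time complexity than the suffix-tree approach, the running time remains acceptable and the method gains flexibility. In practical use, the proposed method is therefore both flexible and efficient.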
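The window parameters quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that "similarity" means the LCS length of two 16-character windows, which the abstract does not state explicitly. The function names `lcs_length` and `classify_window` and the labels `"take"`, `"acceptable"`, and `"reject"` are hypothetical, and the code is not taken from the thesis.

```python
def lcs_length(a: str, b: str) -> int:
    """LCS length by textbook dynamic programming, keeping only two rows."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def classify_window(x: str, y: str, must_take: int = 15, acceptable: int = 14) -> str:
    """Classify a pair of windows by LCS length.

    Assumption: the thresholds 15 and 14 from the abstract are LCS lengths
    measured on windows of 16 characters.
    """
    score = lcs_length(x, y)
    if score >= must_take:
        return "take"
    if score >= acceptable:
        return "acceptable"
    return "reject"


if __name__ == "__main__":
    window = "ACGTACGTACGTACGT"                          # 16 characters
    print(classify_window(window, "ACGTACGTACGTACGA"))  # one substitution
    print(classify_window(window, "TCGTACGTACGTACGA"))  # two substitutions
```

Under this reading, a pair of windows that differ by a single substitution is always kept, while a pair that differs by two substitutions falls into the "still acceptable" band.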


    We design a new sequence alignment system for very long sequences, based on the idea of A. L. Delcher et al. [1]. Traditional sequence alignment methods [12] take too much time and memory when aligning very long sequences. We use the longest common subsequence (LCS) in place of the suffix tree [5], which lets us solve sequence alignment problems more flexibly, although it takes more time than the suffix-tree approach. We improve the LCS computation with a hash table, which trades off time against space. We also identify several properties of LCS that allow the results of two shorter LCS computations to be combined into an approximate result for a longer one. Our experiments show that the method is effective in practice.
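The two ideas in this abstract, a hash-table-assisted LCS and the combination of two shorter LCS results, can be sketched in Python as follows. The first function follows the well-known Hunt and Szymanski approach [8] (a table of match positions plus a longest-increasing-subsequence pass), which is not necessarily the improvement developed in the thesis. The second function relies only on the general fact that the LCS lengths of two adjacent pieces add up to a lower bound on the LCS length of the concatenated sequences. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_hashed(a: str, b: str) -> int:
    """LCS length using a hash table of match positions (Hunt-Szymanski style)."""
    positions = defaultdict(list)      # hash table: symbol -> indices in b
    for j, ch in enumerate(b):
        positions[ch].append(j)

    # tails[k] is the smallest index in b at which a common
    # subsequence of length k + 1 can end.
    tails = []
    for ch in a:
        # Visit matches from right to left so that one symbol of `a`
        # extends any given common subsequence at most once.
        for j in reversed(positions.get(ch, ())):
            k = bisect_left(tails, j)
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)


def combined_lower_bound(a1: str, a2: str, b1: str, b2: str) -> int:
    """Lower bound on the LCS length of (a1 + a2, b1 + b2) from the two halves."""
    return lcs_length_hashed(a1, b1) + lcs_length_hashed(a2, b2)


if __name__ == "__main__":
    print(lcs_length_hashed("ABCBDAB", "BDCABA"))
    print(combined_lower_bound("AB", "C", "A", "BC"),
          lcs_length_hashed("ABC", "ABC"))
```

The combined value can be smaller than the true LCS length, because a common subsequence may pair symbols across the cut points. That is why it is only an approximation of the longer result.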

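    Chinese abstract
    English abstract
    Table of contents
    List of figures and tables
    Chapter 1 Introduction
    Chapter 2 Background
      2.1 LCS (Longest Common Subsequence)
      2.2 Methods for improving LCS
      2.3 Proofs of theorems
        2.3.1 Theorem 1
        2.3.2 Theorem 2
        2.3.3 Theorem 3
    Chapter 3 Our method
      3.1 Algorithm
        3.1.1 Preprocessing
        3.1.2 Core part
      3.2 Algorithm analysis
    Chapter 4 Experimental results
    Chapter 5 Discussion and conclusion
    Chapter 6 Future work
      6.1 Algorithmic aspects
      6.2 Biological aspects
    Chapter 7 References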

    1. A. L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White and Steven L. Salzberg, Alignment of whole genomes, Nucleic Acids Research, 1999, Vol. 27, No. 11, pp. 2369-2376
    2. João Setubal and João Meidanis (1997) Introduction to computational molecular biology. University of Campinas, Brazil
    3. Pavel A. Pevzner (1999) Computational Molecular Biology: An Algorithmic Approach. The MIT Press
    4. Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press
    5. Dan Gusfield (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press
    6. Ellis Horowitz and Sartaj Sahni, Fundamentals of data structures in Pascal, fourth edition
    7. Claus Rick, Simple and fast linear space computation of longest common subsequences, Information Processing Letters 75 (6) (2000) pp. 275-281
    8. Hunt, J. W. and Szymanski, T. G. 1977. A fast algorithm for computing longest common subsequences. Communications of the ACM 20, 350-353.
    9. Apostolico, A. and C. Guerra, The longest common subsequence problem revisited, Algorithmica, Vol.2, 1987, pp.315-336.
    10. Chin, F. Y. L. and C. K. Poon, A fast algorithm for computing longest common subsequences of small alphabet size, J. of Info. Proc., Vol.13, No.4, 1990, pp.463-469
    11. J. Deken. Some limit results for longest common subsequences. Discrete Mathematics, 26:17-31, 1979
    12. S. B. Needleman and C. D. Wunsch (1970) A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48, 443-453
    13. T. F. Smith and M. S. Waterman (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
    14. S. F. Altschul (1989) Gap costs for multiple sequence alignment. J. theor. Biol. 138, 297-309.
    15. Osamu Gotoh (1999) Multiple sequence alignment: algorithms and applications. Adv. Biophys., Vol. 36, pp. 159-206
    16. K.-M. Chao (1998) On computing all suboptimal alignments. Information Sciences, Vol. 105, Issues 1-4, March, pp. 189-207
    17. K.-M. Chao and W. Miller (1995) Linear-space algorithms that build local alignments from fragments. Algorithmica
    18. D.-F. Feng and R. F. Doolittle (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
    19. X. Huang and W. Miller (1991) A time-efficient, linear-space local similarity algorithm. Adv. in Appl. Math. 12, 337-357.
    20. Laurent Marsan, Marie-France Sagot, Extracting structured motifs using a suffix tree: algorithms and application to promoter consensus identification, RECOMB 2000
    21. Randall F. Smith and Temple F. Smith, Automatic generation of primary sequence patterns from sets of related protein sequences, Proc. Natl. Acad. Sci. USA, Vol.87, pp.118-122, January 1990
    22. Fang-Cheng Leu, Yin-Te Tsai, Chuan Yi Tang, An efficient external sorting algorithm, Information Processing Letters 75(2000) 159-163
    23. Z. Galil and R. Giancarlo. Speeding up dynamic programming with applications to molecular biology. Theoretical Computer Science, 64:107-118, 1989
    24. R. Grossi and F. Luccio. Simple and efficient string matching with k mismatches. Information Processing Letters, 33:113-120, 1989
    25. J. Kececioglu and D. Sankoff. Exact and approximation algorithms for the inversion distance between two permutations. Algorithmica, 13:180-210, 1995
    26. M. Schoniger and M. Waterman. A local algorithm for DNA sequence alignment with inversions. Bulletin of Mathematical Biology, 54:521-536, 1992
    27. G. Landau and J. Schmidt. An algorithm for approximate tandem repeats. In Proc. 4th Annual Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 684, pages 120-133. 1993
    28. V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707-710, 1966.

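    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: Taiwan theses and dissertations system)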