研究生: |
張家銘 Jia-Ming Chang |
---|---|
論文名稱: |
緊湊集合關係及其在演化樹和多重序列排比之評估的應用 Compact Set Relation and Its Application in the Evaluation of the Evolution Tree and Multiple Sequence Alignment |
指導教授: |
唐傳義
Chuan Yi Tang |
口試委員: | |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Computer Science |
論文出版年: | 2002 |
畢業學年度: | 90 |
語文別: | 中文 |
中文關鍵詞: | Compact Set 、Multiple Sequence Alignment 、Evolution Tree |
相關次數: | 點閱:2 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
現有評估演化樹(Evolution Tree)或多重序列比對(Multiple Sequence Alignemnt)的理論,
都過於強調數學上的絕對最佳解,例如:兩兩距離和(Sum-of-Pair)、樹型大小(Tree Size),等等。
本篇論文中,我們將從相對遠近關係是否保持的角度,提出一個新的評估標準,即以緊湊集合定義的鄰近關係。
就一個好的演化樹或多重序列比對而言,任三個物種在演化樹上或比對後的遠近關係,應與建樹前或比對前的遠近關係一致;
然而有些物種間的遠近關係或許不是那麼重要,因此我們利用緊湊集合的特性
(即集合內任意兩元素之間最長的距離,仍小於該集合中任一元素與集合外元素之間最短的距離),選出較為重要的遠近關係,
並且利用這些關係是否獲得保持來評估演化樹或多重序列比對的優劣。
本論文最後進行一系列的實驗,利用在緊湊集合所定義的鄰近關係,評估現有一些知名的建立演化樹與多重序列比對的程式。
此外本論文也提出一個新的多重序列比對演算法--YAMA-MST。實驗數據證明,YAMA-MST比起現有MSA的程式,
更能符合評估演化樹的需要。
[1] V. Bafna, E.L. Lawler, and P.A. Pevzner. Approximation algorithms for
multiple sequence alignment. Theoretical Computer Science, 182:233–244,
1997.
[2] W. Bains. Multan: a program to align multiple dna sequences. Mucleic Acids
Research, 14:159–177, 1986.
[3] G.J. Barton and M.J.E. Sternberg. A stragtegy for the rapid multiple align-ment
of protein. Journal of Molecular Biology, 19:327–337, 1987.
[4] P. Bonizzoni and G. Della Vedova. The complexity of multiple sequence
alignment with SP-score that is a metric. Theoretical Computer Science,
259:63–79, 2001.
[5] J. Carlos and J. Meidanis. Introdution to computational molecular biology.
PWS publishing company, 1997.
[6] H. Carrillo and D. Lipman. The multiple sequence alignment problem in
biology. SIAM Journal on Applied Mathematics, 48(5):1073–1082, 1988.
[7] S.C. Chan, A.K.C. Wong, and D.K.Y. Chiu. A survey of multiple sequence
comparison methods. Bulletin of Mathematical Biology, 48:1073–1082, 1992.
[8] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to
Algorithms. MIT Press, Cambridge, MA, second edition, 2001.
[9] F. Corpet. Multiple sequence alignment with hierarchical clustering. Nucleic
Acids Research, 16:10881–10890, 1988.
25
[10] E. Dekel, J. Hu, and W. Ouyang. An optimal algorithm for finding compact
sets. Information Processing Letters, 44:285–289, 1992.
[11] D.J.Lipman and S.F.Altschul. A tool for multiple sequences alignment. In
Proc.Nat.Acad. Sci. U.S.A., 1989.
[12] D.Snakoff. Simultaneous solution of rna folding, alignment and protosequence
prolblems. SIAM J. Appl. Math., 1985.
[13] J.T. Fang. Constructing and application of evolutionary tree’s measurement
model. Master’s thesis, Department of Computer Science National Tsing Hua
University, 2000.
[14] D.F. Feng and R.F. Doolittle. Progressive sequence alignment as a prerequisite
to correct phylogenetic trees. Journal of Molecular Evolution, 25:351–360,
1987.
[15] H.N. Gabow and R.E. Tarjan. A linear-time algoritm for a special case of
disjoint set union. J. Comput. System Sci., 30:209–221, 1985.
[16] G.H. Gonnet, C. Korostensky, and S. Benner. Evaluation measures of multiple
sequence alinments. Journal of Computational Biology, 7:261–276, 2000.
[17] D. Gusfield. Efficient methods for multiple sequence alignment with guaran-teed
error bounds. Bulletin of Mathematical Biology, 55:141–154, 1993.
[18] D. Harel and R.E. Tarjan. Fast algorithm for finding nearest common ances-tor.
SIAM. J. Comput., 13:285–289, 1984.
[19] S. Henikoff and J.G. Henikoff. Amino acid substitution matrices from protein
blocks. In Proceedings of the National Academy of Sciences USA, volume 8,
pages 1154–1171, 1998.
[20] D.G. Higgins and P. Sharpe. CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene, 73:237–244, 1988.
[21] D.G. Higgins, J.D. Thompson, and T.J. Gibson. Using clustal for multiple
sequence alignments. Methods in Enzymology, 266:383–402, 1996.
26
[22] T. Jiang, P. Kearney, and M. Li. A polynomial time approximation scheme
for inferring evolutionary trees from quartet topologies and its application.
SIAM. J. Comput., 30:1942–1961, 2001.
[23] S.K. Kim. A note on finding compact sets in graphs represented by an adja-cency
list. Information Processing Letters, 57:335–338, 1996.
[24] M. Li, B. Ma, and L. Wang. Near optimal multiple alignment within a band
in polynomial time. In STOC, pages 425–434, Portland, Oregon, 2000.
[25] W. H. Li. Molecular Evolution. Sinauer Associates, Inc., 1997.
[26] S. Needleman and C. Wunsch. A general method applicable to the search for
similarities in the amino acid sequence of two proteins. Journal of Molecular
Evolution, 48:443–453, 1970.
[27] P.A. Pevzner. Multiple alignment, communication cost, and graph matching.
SIAM Journal on Applied Mathematics, 52:1763–1779, 1992.
[28] N. Saitou and M. Nei. The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Molecular Biology and Evolution, 4:406–
425, 1987.
[29] B. Schieber and U. Vishkin. On finding lowest common ancestor:simplification
and parallelization. SIAM J. Comput., 17:1253–1262, 1988.
[30] S.F.Altschul and D.J.Lipman. Trees,star and mutiple biological sequence alig-ment.
SIAM J. Appl. Math., 1989.
[31] D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. J. Comput.
System Sci., 13:362–391, 1983.
[32] P.H.A. Sneath and R.R. Sokal. Numerical Taxonomy, pages 230–234. Free-man,
San Francisco, CA, 1973.
[33] R.E. Tarjan. Data Structure and Network Algoritms. SIAM, Philadelphia,
PA, 1983.
27
[34] W.R. Taylor. Multiple sequence alignmnet by a pairwise alignment. Computer
Applications in Biosciences, 3:81–87, 1987.
[35] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment through sequence
weighting, position specific gap penalties, and weight matrix choice. Nucleic
Acids Research, 22:4673–4680, 1994.
[36] J.D. Thompson, F. Plewinak, and O. Poch. Balibase: a benchmark alignment
database for the evolution of multiple sequence alignments. Bioinformatics,
15:87–88, 1999.
[37] J.D. Thompson, F. Plewinak, and O. Poch. A comprehensive comparison
of multiple sequence alignment programs. Nucleic Acids Res., 27:2682–2690,
1999.
[38] T.K. Vintsyuk. Speech discrimination by dynamic programming. Comput.,
4:52–57, 1967.
[39] L. Wang and T. Jiang. On the complexity of multiple sequence alignment.
Journal of Computational Biology, 1(4):337–348, 1994.
[40] B.Y. Wu. Constructing the maximum consensus tree from rooted. In the
proceedings of National Computer Symposium, 2001.
[41] D. Zivkovic. A fast algorithm for finding the compact sets. Information
Processing Letters, 38:339–342, 1991.