Graduate student: |
黃彥菱 Huang, Yen-Lin |
---|---|
Thesis title: |
Solving Genome Rearrangement Problems Using Permutation Groups 利用排列群解基因體重組問題 |
Advisors: |
Tang, Chuan-Yi (唐傳義); Lu, Chin Lung (盧錦隆) |
Oral examination committee: | |
Degree: |
Doctor (Ph.D.) |
Department: |
College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar; 2009-2010) |
Language: | English |
Number of pages: | 66 |
Keywords: | genome rearrangement, permutation groups, sorting, algebra, reversal, block-interchange, fusion, fission, translocation |
With the growing availability of complete genome sequences, genome rearrangement studies based on genome-wide analysis of gene orders play an important role in phylogenetic tree reconstruction. In contrast to traditional alignment approaches, which detect point mutations (e.g., substitutions, insertions and deletions of nucleotides or amino acids), genome rearrangement studies compare gene orders to detect large-scale mutations such as reversals, transpositions, block-interchanges (also called generalized transpositions), fusions, fissions and translocations. Given the gene orders of two genomes with the same set of genes, the genome rearrangement problem asks for a minimum sequence of rearrangement operations that transforms one genome into the other. If the genomes are represented as permutations and one of them is the identity permutation (all genes positive and in sorted order), the problem can equivalently be viewed as sorting a permutation. In this thesis, using permutation groups from algebra, we first present an O(n + δ log δ)-time algorithm for sorting by block-interchanges, where n is the number of genes and δ is the minimum number of rearrangement operations required to sort a genome. We then present an O(δn)-time algorithm for sorting by reversals and block-interchanges, with reversals and block-interchanges weighted in the ratio 1:2. In addition, for multichromosomal genomes we also allow translocations (including fusions and fissions), each with weight 1, and propose two O(δn)-time algorithms, one for genomes with linear chromosomes and one for genomes with circular chromosomes.
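To make the permutation-group view concrete, here is a minimal Python sketch of the block-interchange distance only; it is not the thesis's O(n + δ log δ) sorting algorithm, which also produces the operations themselves. It assumes unsigned genomes over the same distinct genes, encodes a circular gene order as an n-cycle α (each gene maps to its successor), and uses the cycle-counting formula reported for block-interchanges on circular genomes, d = (n - f(βα^-1))/2, where f counts the disjoint cycles of the product including fixed points. Linear chromosomes are closed into a circle with a hypothetical cap gene 0. All function names are illustrative.

```python
from typing import Dict, List, Sequence


def block_interchange(order: Sequence[int], i: int, j: int, k: int, m: int) -> List[int]:
    """Swap the blocks order[i:j] and order[k:m], assuming i < j <= k < m."""
    if not (0 <= i < j <= k < m <= len(order)):
        raise ValueError("need 0 <= i < j <= k < m <= len(order)")
    return [*order[:i], *order[k:m], *order[j:k], *order[i:j], *order[m:]]


def successor_map(order: Sequence[int]) -> Dict[int, int]:
    """Encode a circular gene order as an n-cycle: each gene maps to the next one."""
    n = len(order)
    return {order[i]: order[(i + 1) % n] for i in range(n)}


def count_cycles(perm: Dict[int, int]) -> int:
    """Number of disjoint cycles of a permutation, fixed points included."""
    seen = set()
    cycles = 0
    for start in perm:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
    return cycles


def block_interchange_distance(source: Sequence[int], target: Sequence[int],
                               circular: bool = True) -> int:
    """Distance via the assumed formula d = (n - f(beta * alpha^-1)) / 2."""
    if sorted(source) != sorted(target) or len(set(source)) != len(source):
        raise ValueError("genomes must contain the same distinct genes")
    if not circular:
        # Assumption: 0 is not a gene label, so it can serve as a cap that
        # closes each linear chromosome into a circle of n + 1 elements.
        if 0 in source:
            raise ValueError("label 0 is reserved for the cap")
        source, target = [0, *source], [0, *target]
    alpha = successor_map(source)
    beta = successor_map(target)
    alpha_inv = {v: k for k, v in alpha.items()}
    product = {x: beta[alpha_inv[x]] for x in alpha}  # x -> beta(alpha^-1(x))
    return (len(source) - count_cycles(product)) // 2
```

For example, `block_interchange_distance([4, 3, 2, 1], [1, 2, 3, 4], circular=False)` returns 2: one block-interchange swaps `[4]` and `[1]`, a second swaps `[3]` and `[2]`. Because two n-cycles have the same parity, βα^-1 is an even permutation, so n - f is always even and the integer division is exact. Note that cycle counts are unchanged under conjugation, so it does not matter whether the product is read as βα^-1 or α^-1β.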
Based on the algorithms above, we have finally implemented a web server that allows biologists to perform genome rearrangement analyses involving reversals, block-interchanges and translocations (including fusions and fissions), and to infer phylogenetic trees of the genomes under consideration from their pairwise genome rearrangement distances. The web server also lets biologists perform a jackknife analysis to evaluate the statistical reliability of the constructed phylogenetic trees.
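The distance-based workflow described above (pairwise rearrangement distances, then a tree, then jackknife support) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's implementation: genomes are unsigned and circular over the same genes, block-interchange distance stands in for the full reversal/block-interchange/translocation distance, and the jackknife is assumed to delete a random fraction of genes in each replicate. The parameters `keep_fraction`, `replicates` and `seed`, and all function names, are hypothetical; the record does not state the server's resampling scheme.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[Tuple[str, str], int]


def bi_distance(source: Sequence[int], target: Sequence[int]) -> int:
    """Block-interchange distance of two circular gene orders over the same genes,
    assuming d = (n - f(beta * alpha^-1)) / 2 with f counting all cycles."""
    n = len(source)
    alpha_inv = {source[(i + 1) % n]: source[i] for i in range(n)}
    beta = {target[i]: target[(i + 1) % n] for i in range(n)}
    seen, cycles = set(), 0
    for start in beta:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = beta[alpha_inv[x]]  # follow x -> beta(alpha^-1(x))
    return (n - cycles) // 2


def distance_matrix(genomes: Dict[str, List[int]]) -> Matrix:
    """Pairwise distances for every unordered pair of genome names."""
    names = sorted(genomes)
    return {(a, b): bi_distance(genomes[a], genomes[b])
            for a, b in combinations(names, 2)}


def jackknife_matrices(genomes: Dict[str, List[int]], keep_fraction: float = 0.6,
                       replicates: int = 100, seed: int = 0) -> List[Matrix]:
    """Gene-deletion jackknife (assumed scheme): each replicate keeps the same
    random subset of genes in every genome, preserving relative order, and
    recomputes all pairwise distances."""
    rng = random.Random(seed)
    genes = sorted(next(iter(genomes.values())))
    k = max(2, round(keep_fraction * len(genes)))
    matrices = []
    for _ in range(replicates):
        kept = set(rng.sample(genes, k))
        reduced = {name: [g for g in order if g in kept]
                   for name, order in genomes.items()}
        matrices.append(distance_matrix(reduced))
    return matrices
```

Each replicate matrix would then be passed to a distance-based tree method (the record does not say which one the server uses), and a clade's jackknife support is the fraction of replicate trees that contain it; that step is omitted to keep the sketch short. Deleting genes never increases the block-interchange distance, because restricting a sorting sequence to the kept genes still sorts them.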