Graduate student | 李奇諺 Lee, Chi Yen |
---|---|
Thesis title | 基因演化樹之兩種修正問題的有效率演算法 Efficient Algorithms for Two Gene Tree Correction Problems |
Advisor | 王炳豐 Wang, Biing Feng |
Oral defense committee | 蔡錫鈞 Tsai, Shi Chun; 蔣宗哲 Chiang, Tsung Che |
Degree | Master |
Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication | 2015 |
Academic year of graduation | 103 (ROC calendar) |
Language | English |
Pages | 48 |
Keywords | Phylogenies, Gene trees, Species trees, Reconciliations, Gene tree correction problems |
Grouping genes into families of homologs (i.e., copies originating from a single ancestral gene) and reconstructing the phylogeny of each gene family is essential for a variety of annotation, evolutionary, and functional studies. Orthology (divergence by speciation) and paralogy (divergence by duplication) relationships between genes have important implications for the functional relationships between gene copies. A popular approach to inferring these relationships is to reconcile the reconstructed gene tree with a species tree. The accuracy of reconciliation depends strongly on the correctness of the gene tree's topology. Topological errors in a gene tree can arise from the reconstruction process (e.g., noise in the underlying sequence data) or from the inference method itself (e.g., a heuristic algorithm). Studies have shown that even a few misplaced leaves can lead to a very different inferred evolutionary history. Consequently, considerable effort has been devoted to detecting and correcting errors in gene trees. This thesis focuses on gene tree correction problems.
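Reconciliation is commonly done with the LCA (lowest common ancestor) mapping, as described in [18] and [44]. Below is a minimal Python sketch of that standard rule. It assumes binary trees encoded as nested tuples. It also uses a hypothetical `species_of` dictionary that maps gene names to species. The sketch labels each internal gene node as a speciation (`S`) or a duplication (`D`). It is only an illustration of LCA mapping and is not the correction algorithm of this thesis.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Assumed encoding: a leaf is a name (str); an internal node is a pair (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]
Cluster = FrozenSet[str]


def collect_clusters(tree: Tree, out: List[Cluster]) -> Cluster:
    """Append the leaf set (cluster) of every node of `tree` to `out`; return the root's."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        left, right = tree
        leaves = collect_clusters(left, out) | collect_clusters(right, out)
    out.append(leaves)
    return leaves


def lca(species: Cluster, s_clusters: List[Cluster]) -> Cluster:
    """LCA in the species tree: the smallest species cluster containing `species`."""
    return min((c for c in s_clusters if species <= c), key=len)


def reconcile(gene: Tree, species_of: Dict[str, str], s_clusters: List[Cluster],
              events: List[Tuple[Cluster, str]]) -> Tuple[Cluster, Cluster]:
    """Return (gene leaves, LCA image) of `gene`; append (gene leaves, event) per internal node."""
    if isinstance(gene, str):
        return frozenset([gene]), lca(frozenset([species_of[gene]]), s_clusters)
    left, right = gene
    l_leaves, l_img = reconcile(left, species_of, s_clusters, events)
    r_leaves, r_img = reconcile(right, species_of, s_clusters, events)
    img = lca(l_img | r_img, s_clusters)
    # LCA rule: a node is a duplication iff it maps to the same species node as a child.
    event = "D" if img in (l_img, r_img) else "S"
    leaves = l_leaves | r_leaves
    events.append((leaves, event))
    return leaves, img


S = (("human", "chimp"), "mouse")
G = ((("h1", "c1"), "m1"), ("h2", "c2"))
species_of = {"h1": "human", "c1": "chimp", "m1": "mouse", "h2": "human", "c2": "chimp"}
s_clusters: List[Cluster] = []
collect_clusters(S, s_clusters)
events: List[Tuple[Cluster, str]] = []
reconcile(G, species_of, s_clusters, events)
print([(sorted(leaves), ev) for leaves, ev in events])
# [(['c1', 'h1'], 'S'), (['c1', 'h1', 'm1'], 'S'), (['c2', 'h2'], 'S'), (['c1', 'c2', 'h1', 'h2', 'm1'], 'D')]
```

In this toy example, h1 and c1 are inferred orthologs because their LCA in the gene tree is a speciation node. By contrast, h1 and h2 are paralogs because their LCA is the duplication at the root. The subset scan in `lca` keeps the sketch short but takes quadratic time. Practical implementations typically replace it with a precomputed constant-time LCA structure.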
This thesis discusses two gene tree correction problems: the gene orthology correction (GOC) problem and the clade orthology correction (COC) problem. Let G be the given gene tree and P the given set of errors. For the GOC problem, we give an O(|P|×log |G| + |G|)-time algorithm, improving on the previous O(|P|×|G|) upper bound of Lafond et al. For the COC problem, Lafond et al. gave two efficient algorithms, both running in O(|P|×|G|) time. The first finds an optimal solution that induces a reconciliation (with the given species tree) minimizing the mutation cost. The second finds an optimal solution that also maximizes the number of triplets shared with the original gene tree, i.e., the solution topologically closest to the given gene tree. In this thesis, we present a new algorithm for the COC problem that runs in O(|G|) time and finds an optimal solution inducing a reconciliation that maximizes the number of orthologous gene pairs.
Keywords: Phylogenies, Gene trees, Species trees, Reconciliations, Gene tree correction problems
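The COC objective described above is the number of orthologous gene pairs induced by a reconciliation. For one event-labeled gene tree, this number can be computed in a single bottom-up pass. Two genes are orthologous when their lowest common ancestor is a speciation node. Each speciation node therefore contributes the product of its two children's leaf counts. The Python sketch below assumes binary trees and a hypothetical `(event, left, right)` tuple encoding. It only scores a given tree and is not the thesis's COC algorithm.

```python
from typing import Tuple, Union

# Hypothetical encoding: a leaf is a gene name; an internal node is (event, left, right)
# with event "S" (speciation) or "D" (duplication), e.g. as labeled by an LCA reconciliation.
LabeledTree = Union[str, Tuple[str, "LabeledTree", "LabeledTree"]]


def count_orthologous_pairs(tree: LabeledTree) -> Tuple[int, int]:
    """Return (number of leaves, number of orthologous leaf pairs) of `tree`.

    A pair of genes is orthologous iff their lowest common ancestor is a
    speciation node, so each "S" node adds |left leaves| * |right leaves| pairs.
    """
    if isinstance(tree, str):
        return 1, 0
    event, left, right = tree
    n_left, pairs_left = count_orthologous_pairs(left)
    n_right, pairs_right = count_orthologous_pairs(right)
    pairs = pairs_left + pairs_right
    if event == "S":
        pairs += n_left * n_right
    return n_left + n_right, pairs


# Five genes with a duplication at the root: orthologous pairs are
# h1-c1, h1-m1, c1-m1 and h2-c2.
labeled = ("D", ("S", ("S", "h1", "c1"), "m1"), ("S", "h2", "c2"))
print(count_orthologous_pairs(labeled))  # (5, 4)
```

The function visits each node once, so scoring a single tree takes time linear in the number of nodes.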
[1] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic Local Alignment Search Tool," Journal of Molecular Biology, vol. 215, pp. 403-410, 1990.
[2] L. Arvestad, A.-C. Berglund, J. Lagergren, and B. Sennblad, "Bayesian gene/species tree reconciliation and orthology analysis using MCMC," Bioinformatics, vol. 19, pp. i7-i15, 2003.
[3] B. Behzadi and M. Vingron, "Reconstructing domain compositions of ancestral multi-domain proteins," in Proc. Research in Computational Molecular Biology, 2006.
[4] P. Bonizzoni, G. Della Vedova, and R. Dondi, "Reconciling a gene tree to a species tree under the duplication cost model," Theoretical Computer Science, vol. 347, pp. 36-53, 2005.
[5] W.-C. Chang and O. Eulenstein, "Reconciling gene trees with apparent polytomies," in Proc. Computing and Combinatorics Conference, 2006.
[6] R. Chaudhary, J. G. Burleigh, and O. Eulenstein, "Efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence," BMC Bioinformatics, vol. 13: S11, 2012.
[7] C. Chauve and N. El-Mabrouk, "New perspectives on gene family evolution: losses in reconciliation and a link with supertrees," in Proc. Research in Computational Molecular Biology, 2009.
[8] C. Chauve, N. El-Mabrouk, and E. Tannier, "Models and algorithms for genome evolution," Springer, 2013.
[9] K. Chen, D. Durand and M. Farach-Colton, "Notung: A program for dating gene duplications and optimizing gene family trees," Journal of Computational Biology, vol. 7, pp. 429-447, 2000.
[10] F. Chen, A. Mackey, J. Vermunt and D. Roos, "Assessing performance of orthology detection strategies applied to eukaryotic genomes," PLoS ONE, vol. 2: e383, 2007.
[11] R. Dondi, N. El-Mabrouk and K. M. Swenson, "Gene tree correction for reconciliation and species tree inference: Complexity and algorithms," Journal of Discrete Algorithms, vol. 25, pp. 51-65, 2014.
[12] A. Doroftei and N. El-Mabrouk, "Removing noise from gene trees," in Proc. Workshop on Algorithms in Bioinformatics, 2011.
[13] J.-P. Doyon, V. Ranwez, V. Daubin, and V. Berry, "Models, algorithms and programs for phylogeny reconciliation," Briefings in Bioinformatics, vol. 12, pp. 392-400, 2011.
[14] D. Durand, B. V. Halldorsson, and B. Vernot, "A hybrid micro-macroevolutionary approach to gene tree reconstruction," Journal of Computational Biology, vol. 13, pp. 320-335, 2006.
[15] J. Felsenstein, "Evolutionary trees from DNA sequences: a maximum likelihood approach," Journal of Molecular Evolution, vol. 17, pp. 368-376, 1981.
[16] W. M. Fitch, "Toward defining the course of evolution: minimum change for a specified tree topology," Systematic Zoology, vol. 20, pp. 406-416, 1971.
[17] H. N. Gabow and R. E. Tarjan, "A linear-time algorithm for a special case of disjoint set union," Journal of Computer and System Sciences, vol. 30, pp. 209-221, 1985.
[18] M. Goodman, J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda, "Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences," Systematic Zoology, vol. 28, pp. 132-163, 1979.
[19] P. Górecki and O. Eulenstein, "A linear time algorithm for error-corrected reconciliation of unrooted gene trees," in Proc. International Symposium on Bioinformatics Research and Applications, 2011.
[20] P. Górecki and O. Eulenstein, "Algorithms: simultaneous error-correction and rooting for gene tree reconciliation and the gene duplication problem," BMC Bioinformatics, vol. 13: S14, 2012.
[21] P. Górecki and J. Tiuryn, "DLS-trees: a model of evolutionary scenarios," Theoretical Computer Science, vol. 359, pp. 378-399, 2006.
[22] P. Górecki and J. Tiuryn, "Inferring phylogeny from whole genomes," Bioinformatics, vol. 23, pp. e116-e122, 2007.
[23] R. Guigó, I. Muchnik, and T. F. Smith, "Reconstruction of ancient molecular phylogeny," Molecular Phylogenetics and Evolution, vol. 6, pp. 189-213, 1996.
[24] M. W. Hahn, "Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution," Genome Biology, vol. 8: R141, 2007.
[25] J. Kim and T. Warnow, "Tutorial on phylogenetic tree estimation," in Proc. Intelligent Systems for Molecular Biology, 1999.
[26] M. Lafond, M. Semeria, K. Swenson, E. Tannier, and N. El-Mabrouk, "Gene tree correction guided by orthology," BMC Bioinformatics, vol. 14: S5, 2013.
[27] M. Lafond and N. El-Mabrouk, "Orthology and paralogy constraints: satisfiability and consistency," BMC Genomics, vol. 15: S12, 2014.
[28] M. Lafond, K. M. Swenson, and N. El-Mabrouk, "An optimal reconciliation algorithm for gene trees with polytomies," in Proc. Workshop on Algorithms in Bioinformatics, 2012.
[29] B. Ma, M. Li, and L. Zhang, "From gene trees to species trees," SIAM Journal on Computing, vol. 30, pp. 729-752, 2000.
[30] T. H. Nguyen, V. Ranwez, S. Pointet, A.-M. A. Chifolleau, J.-P. Doyon, and V. Berry, "Reconciliation and local gene tree rearrangement can be of mutual profit," Algorithms for Molecular Biology, vol. 8: 12, 2013.
[31] R. D. M. Page and M. A. Charleston, "Reconciled trees and incongruent gene and species trees," in Proc. Mathematical Hierarchies and Biology, vol. 37, pp. 57-70, 1996.
[32] R. D. M. Page and M. A. Charleston, "Trees within trees: phylogeny and historical associations," Trends in Ecology and Evolution, vol. 13, pp. 356-359, 1998.
[33] D. R. Robinson and L. R. Foulds, "Comparison of phylogenetic trees," Mathematical Biosciences, vol. 53, pp. 131-147, 1981.
[34] M. J. Sanderson and M. M. McMahon, "Inferring angiosperm phylogeny from EST data with widespread gene duplication," BMC Evolutionary Biology, vol. 7: S3, 2007.
[35] J. B. Slowinski, "Molecular polytomies," Molecular Phylogenetics and Evolution, vol. 19, pp. 114-120, 2001.
[36] L. Li, C. J. Stoeckert Jr., and D. S. Roos, "OrthoMCL: identification of ortholog groups for eukaryotic genomes," Genome Research, vol. 13, pp. 2178-2189, 2003.
[37] K. M. Swenson, A. Doroftei, and N. El-Mabrouk, "Gene tree correction for reconciliation and species tree inference," Algorithms for Molecular Biology, vol. 7: 31, 2012.
[38] P. D. Thomas, "GIGA: a simple, efficient algorithm for gene tree inference in the genomic age," BMC Bioinformatics, vol. 11: 312, 2010.
[39] I. Wapinski, A. Pfeffer, N. Friedman, and A. Regev, "Automatic genome-wide reconstruction of phylogenetic gene trees," Bioinformatics, vol. 23, pp. i549-i558, 2007.
[40] A. Wehe, M. S. Bansal, J. G. Burleigh, and O. Eulenstein, "DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony," Bioinformatics, vol. 24, pp. 1540-1541, 2008.
[41] J. Zhang, "Evolution by gene duplication: an update," Trends in Ecology and Evolution, vol. 18, pp. 292-298, 2003.
[42] Y. Zheng and L. Zhang, "Reconciliation with Non-binary Gene Trees Revisited," in Proc. Research in Computational Molecular Biology, 2014.
[43] C. M. Zmasek and S. R. Eddy, "ATV: display and manipulation of annotated phylogenetic trees," Bioinformatics, vol. 17, pp. 383-384, 2001.
[44] C. M. Zmasek and S. R. Eddy, "A simple algorithm to infer gene duplication and speciation events on a gene tree," Bioinformatics, vol. 17, pp. 821-828, 2001.