Graduate Student: 張光瑜
Chang, Kuang-Yu
Thesis Title: 類樹狀演化網路研究:重建與編碼
A Study on Tree-Like Phylogenetic Networks: Reconstruction and Encodings
Advisor: 韓永楷
Hon, Wing-Kai
Oral Defense Committee: 彭勝龍
Peng, Sheng-Lung
謝孫源
Hsieh, Sun-Yuan
李哲榮
Lee, Che-Rung
廖崇碩
Liao, Chung-Shou
Degree: Doctor
Department:
Year of Publication: 2019
Academic Year of Graduation: 107 (ROC calendar)
Language: English
Number of Pages: 103
Chinese Keywords: 演化網路 (phylogenetic networks), 演算法 (algorithms), 編碼 (encoding)
English Keywords: phylogenetic networks, level-k networks, k-articulated networks, galled trees
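    A phylogenetic network is a directed acyclic graph that represents the evolutionary history among species or taxa when that history includes reticulation events (events in which species mix). This dissertation studies two classes of such networks: 1-articulated networks and galled trees (level-1 networks). Compared with their spanning trees, these networks contain only a small number of additional edges, so we call them tree-like networks.

    This dissertation addresses two main problems. The first is to reconstruct a 1-articulated network from a distance matrix that records the evolutionary distances between species. Let n be the number of species; our main result is an O(n^2)-time algorithm. In the network it constructs, the length of the shortest evolutionary path between any two species equals the corresponding entry of the distance matrix. The second problem is to encode level-1 networks. We propose a compact encoding and prove that the tree containment problem (TCP) can be solved directly on this encoding in optimal time.

    The dissertation also covers the following related topics: (1) the tree containment problem for 1-articulated networks; (2) reconstructing 1-articulated networks from set-distance matrices; and (3) reconstructing 2-articulated networks from distance matrices.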


    A phylogenetic network is a directed acyclic graph that represents the evolutionary history of species or taxa when that history involves reticulation events. Here, we focus on two classes of phylogenetic networks: 1-articulated networks and galled trees (level-1 networks). Since these networks contain only a small number of edges beyond those of their embedded spanning trees, we refer to them as tree-like networks.

    Two main problems are discussed in the dissertation. The first is to reconstruct 1-articulated networks from distance matrices, which represent the evolutionary distances between species. Our main result is an O(n^2)-time algorithm, where n is the number of species, for constructing a network in which the length of the shortest evolutionary path between every pair of species equals the corresponding input distance. The second is to encode level-1 networks. For this problem, we propose a compact encoding and show that the tree containment problem (TCP) for galled trees can be solved in optimal time with our encoding.
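    The requirement that a constructed network satisfy the input distances can be illustrated with a small checker. This is only a sketch under stated assumptions, not the dissertation's algorithm: it assumes that the evolutionary path between two species climbs to a common ancestor and descends again, so that the distance is the minimum, over all common ancestors, of the sum of the two downward shortest-path lengths. The dictionary layout and the names `leaf_distances`, `satisfies`, `network`, and `matrix` are hypothetical.

```python
from itertools import combinations

def leaf_distances(edges):
    """edges maps each internal node to a list of (child, weight) pairs.
    Returns {frozenset({u, v}): distance} for every pair of leaves
    (assumed definition: min over common ancestors of the two descents)."""
    nodes = set(edges) | {c for kids in edges.values() for c, _ in kids}
    memo = {}

    def down(node):
        # Shortest directed path length from `node` to each reachable leaf.
        if node not in memo:
            kids = edges.get(node, [])
            if not kids:
                memo[node] = {node: 0.0}
            else:
                best = {}
                for child, w in kids:
                    for leaf, d in down(child).items():
                        if d + w < best.get(leaf, float("inf")):
                            best[leaf] = d + w
                memo[node] = best
        return memo[node]

    dist = {}
    for a in nodes:
        reach = down(a)
        for u, v in combinations(sorted(reach), 2):
            key = frozenset((u, v))
            dist[key] = min(dist.get(key, float("inf")), reach[u] + reach[v])
    return dist

def satisfies(edges, matrix):
    """True if every listed pair has exactly the required distance."""
    dist = leaf_distances(edges)
    return all(
        abs(dist.get(frozenset(pair), float("inf")) - d) < 1e-9
        for pair, d in matrix.items()
    )

# Hypothetical ultrametric network: h is a reticulation with parents a and b.
network = {
    "r": [("a", 1), ("b", 1)],
    "a": [("x", 2), ("h", 1)],
    "b": [("h", 1), ("z", 2)],
    "h": [("y", 1)],
}
matrix = {("x", "y"): 4, ("y", "z"): 4, ("x", "z"): 6}
print(satisfies(network, matrix))
```

    In this toy input no ultrametric tree fits the matrix, since x and z would have to be at distance at most 4, but the network with one reticulation does. The checker visits every node and every pair of leaves below it, so it is meant for small examples only.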

    Other related problems are also discussed in this dissertation: the tree containment problem (TCP) for 1-articulated networks, reconstructing 1-articulated networks from set-distance matrices, and reconstructing 2-articulated networks from distance matrices.
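    To make the tree containment problem concrete, the following sketch decides whether a small network displays a given tree by brute force. It is an exponential-time illustration, not the optimal method the dissertation describes: for every reticulation node it keeps exactly one incoming edge, then compares the clusters (leaf sets below each node) of the resulting tree with those of the target tree. The child-list representation and the names `clusters`, `leaves_of`, and `displays` are hypothetical.

```python
from itertools import product

def leaves_of(children):
    """Nodes without children in a child-list dictionary."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    return {n for n in nodes if not children.get(n)}

def clusters(children, root, leaves):
    """Set of non-empty leaf sets found below the nodes of a rooted tree."""
    found = set()

    def visit(node):
        if node in leaves:
            below = frozenset([node])
        else:
            below = frozenset().union(*(visit(k) for k in children.get(node, [])))
        if below:  # nodes left without any leaf are ignored
            found.add(below)
        return below

    visit(root)
    return found

def displays(net, net_root, tree, tree_root):
    """Brute force: try every way of keeping one parent per reticulation."""
    target = clusters(tree, tree_root, leaves_of(tree))
    net_leaves = leaves_of(net)
    parents = {}
    for p, kids in net.items():
        for c in kids:
            parents.setdefault(c, []).append(p)
    retics = [n for n, ps in parents.items() if len(ps) > 1]
    for choice in product(*(parents[h] for h in retics)):
        kept = dict(zip(retics, choice))
        pruned = {
            p: [c for c in kids if kept.get(c, p) == p]
            for p, kids in net.items()
        }
        if clusters(pruned, net_root, net_leaves) == target:
            return True
    return False

# Hypothetical galled tree: h is the only reticulation (parents a and b).
net = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"]}
t1 = {"t": ["u", "z"], "u": ["x", "y"]}   # ((x, y), z)
t2 = {"t": ["x", "v"], "v": ["y", "z"]}   # (x, (y, z))
t3 = {"t": ["u", "y"], "u": ["x", "z"]}   # ((x, z), y)
print(displays(net, "r", t1, "t"), displays(net, "r", t2, "t"), displays(net, "r", t3, "t"))
```

    Comparing cluster sets avoids explicitly contracting the degree-2 nodes left behind by a deleted edge, because such a node has the same cluster as its single child. The number of choices doubles with each reticulation that has two parents, which is why this approach only suits toy inputs.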

    1 Introduction
        1.1 Models of Networks
        1.2 The Reconstruction Problem and the TCP Problem
        1.3 1-Articulated Network Compared with Other Network Classes
        1.4 Encoding Data Structures
    2 Preliminaries
        2.1 Phylogenetic Networks
        2.2 Ultrametric Networks and Evolutionary Distance
    3 Reconstructing 1-Articulated Phylogenetic Networks with Distance Matrices
        3.1 Properties of an MSN
        3.2 The Algorithm
            3.2.1 Data Structures
            3.2.2 The Procedure BuildNet
            3.2.3 Assigning Edge Weights
            3.2.4 Verifying the Network
            3.2.5 Time Complexity
        3.3 Experiments
        3.4 Reconstructing Networks with More General Matrix
            3.4.1 Properties of the Satisfying Networks
            3.4.2 The Algorithm
        3.5 Matrix With Errors
            3.5.1 Quasi-Biclique and Maximal Quasi-Biclique
            3.5.2 The Algorithm and Time Complexity
    4 Encoding Galled Trees
        4.1 Proposed Encoding Method for the Structure of a Galled Tree
        4.2 Functions
            4.2.1 Getting the Node Type
            4.2.2 Traversal: Getting the Parent Nodes
            4.2.3 Traversal: Getting the Child Nodes
            4.2.4 Getting the Lowest Common Ancestors
            4.2.5 Counting the Reachable Leaf Nodes
        4.3 Reconstructing a Galled Tree from the Encoding
        4.4 Application: Tree Containment Problem
            4.4.1 Preparing Data for N and T
            4.4.2 Comparing N and T
        4.5 Open Problem: Number of unlabeled Galled Trees
        4.6 Extension: Encoding Level-2 Networks
            4.6.1 Simple Level-k Generators and Simple Networks
            4.6.2 Encoding Method
            4.6.3 Reconstructing a Level-2 Network from the Encoding
    5 Solving TCP problem for 1-Articulated Phylogenetic Networks
        5.1 Algorithm for General 1-Articulated Networks
        5.2 Indexing 1-Articulated Networks without Skewed Loops
    6 Reconstructing 1-Articulated Phylogenetic Networks with Set-Distance Matrices
        6.1 Properties of Satisfying Networks
        6.2 The Algorithm
        6.3 Time Complexity
    7 Reconstructing 2-Articulated Phylogenetic Networks with Distances Matrices
        7.1 Properties of Satisfying 2-articulated Networks
        7.2 The Algorithm
            7.2.1 Partition the Species Set S
            7.2.2 Fix the Partition
            7.2.3 Construct the Subnetworks
            7.2.4 Reconnect the Subnetworks
        7.3 Time Complexity
    8 Conclusion and Open Problems
        8.1 Conclusion
        8.2 Open Problems

