
Graduate student: Sheu, Wen-Horng (許文弘)
Thesis title: Algorithms for the Minimum Split-Row Problem (最小列分解問題之演算法研究)
Advisor: Wang, Biing-Feng (王炳豐)
Committee members: Lu, Chin-Lung (盧錦隆); Wang, Hung-Lung (王弘倫)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 71
Keywords: algorithms, fixed-parameter tractability, minimum split-row problem, minimum distinct conflict-free row split problem, perfect phylogenies


    Motivated by an application in cancer genomics, Hajirasouliha and Raphael [WABI 2014] proposed the minimum split-row problem (MSRP). The input is an m × n binary matrix M. A split-row operation on M replaces a row r by k > 1 rows r^{(1)}, r^{(2)}, ..., r^{(k)} whose bitwise OR equals r. The cost of the operation is the number of additional rows it introduces, that is, k − 1. The goal is to find a sequence of split-row operations of minimum total cost that transforms M into a matrix corresponding to a perfect phylogeny. Recently, Hujdurović et al. [TALG 2018] proved that MSRP is APX-hard and presented efficient exact and approximation algorithms; they left the parameterized study of MSRP as a direction for future work. Let ε(M) denote the minimum cost of transforming a binary matrix M into a matrix corresponding to a perfect phylogeny. This thesis gives an O*(2^{min(n, 2ε(M))})-time exact algorithm for MSRP, which implies that MSRP is fixed-parameter tractable when parameterized by ε(M). In the worst case, the new algorithm requires O*(2^n) time, significantly improving the previous upper bound of O*(n^n). In the cancer genomics application, split-row operations decompose m mixed tumor samples into m + ε(M) tumor cell subpopulations, where the mixing results from the technical limitations of current sequencing technologies. From this perspective, the fixed-parameter tractability of MSRP indicates that MSRP can be solved efficiently when the amount of mixing in the samples is small. The new algorithm can be extended to enumerate all optimal solutions with O*(3^{min(n, 2ε(M))}) preprocessing time and a delay between two consecutive outputs that is linear in the size of the later output.
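    As a small illustration of the definitions above, the following Python sketch checks whether a proposed split of a row is valid (k > 1 rows whose bitwise OR equals the original row) and whether a matrix is conflict-free. It assumes the standard characterization that a binary matrix corresponds to a perfect phylogeny exactly when the supports of every two columns are either disjoint or nested. The function names are hypothetical, and this is only a checker for the definitions, not the thesis's algorithm.

```python
from itertools import combinations


def is_valid_split(row, parts):
    """Return True if `parts` (k > 1 rows) may replace `row` in a split-row operation."""
    if len(parts) < 2 or any(len(p) != len(row) for p in parts):
        return False
    combined = [0] * len(row)
    for part in parts:
        # Accumulate the bitwise OR of all replacement rows.
        combined = [a | b for a, b in zip(combined, part)]
    return combined == list(row)


def split_cost(parts):
    """Cost of one split-row operation: the number of additional rows, k - 1."""
    return len(parts) - 1


def is_conflict_free(matrix):
    """Assumed test: every two column supports are disjoint or nested."""
    if not matrix:
        return True
    n = len(matrix[0])
    supports = [{i for i, r in enumerate(matrix) if r[j]} for j in range(n)]
    for a, b in combinations(supports, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Hypothetical 3 x 2 example: the two columns overlap in row 0 only.
M = [[1, 1], [1, 0], [0, 1]]
parts = [[1, 0], [0, 1]]              # proposed split of row 0
print(is_conflict_free(M))            # False
print(is_valid_split(M[0], parts))    # True
print(is_conflict_free(parts + M[1:]))  # True, reached at cost split_cost(parts) == 1
```

    In this toy instance a single split of cost 1 removes the only conflict, so ε(M) = 1 for this particular M.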
    Hujdurović et al.'s exact algorithm can be modified to solve a variant of MSRP called the minimum distinct conflict-free row split problem (MDCRSP). The new exact and enumeration algorithms for MSRP can be modified to solve this variant as well. In addition, the new MSRP and MDCRSP algorithms can be extended to solve the constrained versions of both problems in which only the rows in a given subset are allowed to be split. All previous algorithms for MSRP, as well as the algorithms presented in this thesis, precompute a directed graph, called the containment digraph, to represent the input matrix. A naive construction of this graph requires O(mn^2) time. This thesis gives a more efficient construction, which requires O(max{m^{0.373} n^2, m n^{1.373}}) time.
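    For orientation, the naive O(mn^2) construction mentioned above can be sketched as follows. The sketch assumes that the vertices of the containment digraph are the columns of M and that there is an arc from column i to column j when the support of column i is properly contained in the support of column j; the arc direction and the function name are assumptions made for illustration. The faster construction from the thesis is not reproduced here.

```python
def containment_arcs(matrix):
    """Naive O(m * n^2) construction: compare every ordered pair of columns row by row."""
    m = len(matrix)
    n = len(matrix[0]) if matrix else 0
    arcs = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            contained = True   # every 1 of column i is also a 1 of column j
            proper = False     # column j has a 1 where column i has a 0
            for r in range(m):
                if matrix[r][i] and not matrix[r][j]:
                    contained = False
                    break
                if matrix[r][j] and not matrix[r][i]:
                    proper = True
            if contained and proper:
                arcs.append((i, j))
    return arcs


# Hypothetical 3 x 3 example: columns 0 and 2 are both properly contained in column 1.
M = [[1, 1, 0],
     [0, 1, 1],
     [0, 1, 0]]
print(containment_arcs(M))  # [(0, 1), (2, 1)]
```

    Identical columns produce no arcs in this sketch because containment is required to be proper; how duplicate columns are handled in the actual construction is not stated in this record.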

    Abstract
    Abstract (in Chinese)
    Contents
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Review of the branching formulation
        2.1 Connection between conflict-free row splits and perfect phylogenies
        2.2 The branching formulation
        2.3 Bijection between compact conflict-free row splits and branchings
    Chapter 3 An upper bound on t(M)
    Chapter 4 An O*(2^{min(n, 2ε(M))})-time algorithm
        4.1 Phase 1
        4.2 Phase 2
        4.3 Phase 3
    Chapter 5 An enumeration algorithm for MSRP
        5.1 Layer 1
        5.2 Layer 2
        5.3 Layer 3
        5.4 The algorithm
    Chapter 6 Algorithms for CMSRP, MDCRSP, and CMDCRSP
        6.1 CMSRP
        6.2 MDCRSP
        6.3 CMDCRSP
    Chapter 7 Enumeration algorithms for CMSRP, MDCRSP, and CMDCRSP
        7.1 CMSRP
        7.2 MDCRSP
        7.3 CMDCRSP
    Chapter 8 An improved algorithm for constructing the containment digraph
    Chapter 9 Conclusion and future work
    References

    [1] A. V. Aho, M. R. Garey, and J. D. Ullman, "The Transitive Reduction of a Directed Graph," SIAM Journal on Computing, vol. 1, no. 2, pp. 131−137, 1972.
    [2] A. Bashashati, G. Ha, A. Tone, J. Ding, L. M. Prentice, A. Roth, J. Rosner, K. Shumansky, S. Kalloger, J. Senz, W. Yang, M. McConechy, N. Melnyk, M. Anglesio, M. T. Y. Luk, K. Tse, T. Zeng, R. Moore, Y. Zhao, M. A. Marra, B. Gilks, S. Yip, D. G. Huntsman, J. N. McAlpine, and S. P. Shah, "Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling," The Journal of Pathology, vol. 231, no. 1, pp. 21−34, 2013.
    [3] A. Björklund, T. Husfeldt, P. Kaski, and M. Koivisto, "Fourier Meets Möbius: Fast Subset Convolution," in Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC '07), pp. 67–74, 2007.
    [4] P. J. Campbell, E. D. Pleasance, P. J. Stephens, E. Dicks, R. Rance, I. Goodhead, G. A. Follows, A. R. Green, P. A. Futreal, and M. R. Stratton, "Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing," Proceedings of the National Academy of Sciences, vol. 105, no. 35, pp. 13081−13086, 2008.
    [5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (3rd ed.), The MIT Press, 2009.
    [6] M. Cygan, F. V. Fomin, L. Kowalik, D. Lokshtanov, D. Marx, M. Pilipczuk, M. Pilipczuk, and S. Saurabh, Parameterized Algorithms (1st ed.), Springer Publishing Company, Incorporated, 2015.
    [7] S. Deep, X. Hu, and P. Koutris, "Fast Join Project Query Evaluation using Matrix Multiplication," in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20), pp. 1213–1223, 2020.
    [8] R. G. Downey and M. R. Fellows, Parameterized Complexity, Springer Publishing Company, Incorporated, 2012.
    [9] P. Eirew, A. Steif, J. Khattra, G. Ha, D. Yap, H. Farahani, K. Gelmon, S. Chia, C. Mar, A. Wan, E. Laks, J. Biele, K. Shumansky, J. Rosner, A. McPherson, C. Nielsen, A. J. L. Roth, C. Lefebvre, A. Bashashati, C. Souza, C. Siu, R. Aniba, J. Brimhall, A. Oloumi, T. Osako, A. Bruna, J. L. Sandoval, T. Algara, W. Greenwood, K. Leung, H. Cheng, H. Xue, Y. Wang, D. Lin, A. J. Mungall, R. Moore, Y. Zhao, J. Lorette, L. Nguyen, D. Huntsman, C. J. Eaves, C. Hansen, M. A. Marra, C. Caldas, S. P. Shah, and S. Aparicio, "Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution," Nature, vol. 518, no. 7539, pp. 422−426, 2015.
    [10] M. El-Kebir, L. Oesper, H. Acheson-Field, and B. J. Raphael, "Reconstruction of clonal trees and tumor composition from multi-sample sequencing data," Bioinformatics, vol. 31, no. 12, pp. i62−i70, 2015.
    [11] D. Fernández-Baca, "The Perfect Phylogeny Problem," in Steiner Trees in Industry, Springer US, pp. 203−234, 2001.
    [12] D. Fernández-Baca and J. Lagergren, "A Polynomial-Time Algorithm for Near-Perfect Phylogeny," SIAM Journal on Computing, vol. 32, pp. 1115−1127, 2003.
    [13] F. L. Gall, "Powers of tensors and fast matrix multiplication," in Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC '14), pp. 296−303, 2014.
    [14] M. Gerlinger, S. Horswell, J. Larkin, A. J. Rowan, M. P. Salm, I. Varela, R. Fisher, N. McGranahan, N. Matthews, C. R. Santos, P. Martinez, B. Phillimore, S. Begum, A. Rabinowitz, B. Spencer-Dene, S. Gulati, P. A. Bates, G. Stamp, L. Pickering, M. Gore, D. L. Nicol, S. Hazell, P. A. Futreal, A. Stewart, and C. Swanton, "Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing," Nature Genetics, vol. 46, no. 3, pp. 225−233, 2014.
    [15] D. Gusfield, "Efficient algorithms for inferring evolutionary trees," Networks, vol. 21, no. 1, pp. 19−28, 1991.
    [16] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
    [17] I. Hajirasouliha, A. Mahmoody, and B. J. Raphael, "A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data," Bioinformatics, vol. 30, no. 12, pp. i78−i86, 2014.
    [18] I. Hajirasouliha and B. J. Raphael, "Reconstructing Mutational History in Multiply Sampled Tumors Using Perfect Phylogeny Mixtures," in Proceedings of the 14th International Workshop on Algorithms in Bioinformatics, pp. 354−367, 2014.
    [19] A. Hujdurović, E. Husić, M. Milanič, R. Rizzi, and A. I. Tomescu, "Perfect Phylogenies via Branchings in Acyclic Digraphs and a Generalization of Dilworth’s Theorem," ACM Transactions on Algorithms, vol. 14, no. 2, Article 20, 26 pages, 2018.
    [20] A. Hujdurović, U. Kacar, M. Milanič, B. Ries, and A. I. Tomescu, "Complexity and Algorithms for Finding a Perfect Phylogeny from Mixed Tumor Samples," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 1, pp. 96−108, 2018.
    [21] E. Husić, X. Li, A. Hujdurović, M. Mehine, R. Rizzi, V. Mäkinen, M. Milanič, and A. I. Tomescu, "MIPUP: minimum perfect unmixed phylogenies for multi-sampled tumors via branchings and ILP," Bioinformatics, vol. 35, no. 5, pp. 769−777, 2019.
    [22] W. Jiao, S. Vembu, A. G. Deshwar, L. Stein, and Q. Morris, "Inferring clonal evolution of tumors from single nucleotide somatic mutations," BMC Bioinformatics, vol. 15, suppl. 1, pp. 35, 2014.
    [23] S. Kannan and T. Warnow, "A Fast Algorithm for the Computation and Enumeration of Perfect Phylogenies," SIAM Journal on Computing, vol. 26, no. 6, pp. 1749−1763, 1997.
    [24] S. Malikic, A. W. McPherson, N. Donmez, and C. S. Sahinalp, "Clonality inference in multiple tumor samples using phylogeny," Bioinformatics, vol. 31, no. 9, pp. 1349−1356, 2015.
    [25] M. Mehine, H. Heinonen, N. Sarvilinna, E. Pitkänen, N. Mäkinen, R. Katainen, S. Tuupanen, R. Bützow, J. Sjöberg, and L. A. Aaltonen, "Clonally related uterine leiomyomas are common and display branched tumor evolution," Human Molecular Genetics, vol. 24, no. 15, pp. 4407−4416, 2015.
    [26] W. Mohammed Ismail, E. Nzabarushimana, and H. Tang, "Algorithmic approaches to clonal reconstruction in heterogeneous cell populations," Quantitative Biology, vol. 7, no. 4, pp. 255−265, 2019.
    [27] D. E. Newburger, D. Kashef-Haghighi, Z. Weng, R. Salari, R. T. Sweeney, A. L. Brunner, S. X. Zhu, X. Guo, S. Varma, M. L. Troxell, R. B. West, S. Batzoglou, and A. Sidow, "Genome evolution during progression to breast cancer," Genome Research, vol. 23, no. 7, pp. 1097–1108, 2013.
    [28] S. Nik-Zainal, P. Van Loo, D. C. Wedge, L. B. Alexandrov, C. D. Greenman, K. W. Lau, K. Raine, D. Jones, J. Marshall, M. Ramakrishna, A. Shlien, S. L. Cooke, J. Hinton, A. Menzies, L. A. Stebbings, C. Leroy, M. Jia, R. Rance, L. J. Mudie, S. J. Gamble, P. J. Stephens, S. McLaren, P. S. Tarpey, E. Papaemmanuil, H. R. Davies, I. Varela, D. J. McBride, G. R. Bignell, K. Leung, A. P. Butler, J. W. Teague, S. Martin, G. Jönsson, O. Mariani, S. Boyault, P. Miron, A. Fatima, A. Langerod, S. A. J. R. Aparicio, A. Tutt, A. M. Sieuwerts, Å. Borg, G. Thomas, A. V. Salomon, A. L. Richardson, A. L. Borresen-Dale, P. A. Futreal, M. R. Stratton, and P. J. Campbell, "The life history of 21 breast cancers," Cell, vol. 149, no. 5, pp. 994−1007, 2012.
    [29] I. Pe’er, R. Shamir, and R. Sharan, "Incomplete Directed Perfect Phylogeny," SIAM Journal on Computing, vol. 33, no. 3, pp. 590−607, 2000.
    [30] V. Popic, R. Salari, I. Hajirasouliha, D. Kashef-Haghighi, R. B. West, and S. Batzoglou, "Fast and scalable inference of multi-sample cancer lineages," Genome Biology, vol. 16, no. 1, pp. 91, 2015.
    [31] F. Strino, F. Parisi, M. Micsinai, and Y. Kluger, "TrAp: a tree approach for fingerprinting subclonal tumor composition," Nucleic Acids Research, vol. 41, no. 17, pp. e165, 2013.
