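Student: | 莊勝翔 JHUANG, SHENG-SIANG |
---|---|
Thesis title: | The Study of Solving Scaffolding Problem Based on Intermediate-matching Model (根據居中配對模式解決scaffolding之研究) |
Advisor: | 盧錦隆 Lu, Chin-Lung |
Oral examination committee: | 林苕吟 Lin, Tiao-Yin; 邱顯泰 Chiu, Hsien-Tai |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2019 |
Academic year of graduation: | 107 (ROC calendar, i.e., 2018–2019) |
Language: | Chinese |
Number of pages: | 76 |
Keywords (Chinese): | algorithm, genome assembly, intermediate-matching model, integer linear programming, bioinformatics, next-generation sequencing |
Keywords (English): | algorithm, scaffolding problem, intermediate-matching model, integer linear programming, bioinformatics, next-generation sequencing |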
Reference-based scaffolding aims to determine the order and orientation of the contigs of a target genome using a reference genome. It is an important step toward obtaining a more complete genome sequence of a species. In this thesis, we use the intermediate-matching breakpoint distance (IBD) to define an IBD-based scaffolding problem: determine scaffolds of a target genome and a reference genome such that the IBD between the resulting scaffolds is minimized. In this problem, the target and reference genomes are represented in terms of sequence markers, which may be duplicated. We design an integer linear programming (ILP) approach to solve the IBD-based scaffolding problem. Experimental results on simulated and real datasets show that our IBD-based scaffolding algorithm is more accurate when it takes duplicate markers into account than when it ignores them. In addition, our IBD-based scaffolding algorithm achieves higher accuracy than the other scaffolding algorithms tested in this study, but it requires more running time to complete its scaffolding.
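The IBD-based problem builds on the classical breakpoint distance, extended to markers that may be duplicated. As a minimal illustration of the duplicate-free special case only, the Python sketch below computes the breakpoint distance between two signed linear marker sequences and then scaffolds a few target contigs against a single reference sequence by exhaustive search over contig orders and orientations. All of this is an assumption made for illustration: the function names are hypothetical, each marker is assumed to occur exactly once, sequences are linear with no telomere caps, and the exhaustive search stands in for, and does not reproduce, the thesis's ILP approach.

```python
from itertools import permutations, product


def adjacencies(seq):
    """Consecutive marker pairs of a signed linear sequence."""
    return list(zip(seq, seq[1:]))


def breakpoint_distance(a, b):
    """Count adjacencies of `a` that are not conserved in `b`.

    Assumption: both sequences are linear and every marker occurs exactly
    once (no duplicates). An adjacency (x, y) counts as conserved if `b`
    contains (x, y) or its reversed, sign-flipped form (-y, -x).
    """
    conserved = set()
    for x, y in adjacencies(b):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return sum(1 for adj in adjacencies(a) if adj not in conserved)


def reverse_contig(contig):
    """Reverse a contig: flip the marker order and negate each orientation."""
    return [-m for m in reversed(contig)]


def brute_force_scaffold(contigs, reference):
    """Try every order and orientation of the target contigs and return the
    concatenation with the smallest breakpoint distance to `reference`.

    Hypothetical helper for tiny inputs only: it enumerates n! * 2^n
    candidates. It is NOT the thesis's ILP formulation.
    """
    best, best_dist = None, None
    for order in permutations(range(len(contigs))):
        for flips in product((False, True), repeat=len(contigs)):
            scaffold = []
            for idx, flip in zip(order, flips):
                contig = contigs[idx]
                scaffold.extend(reverse_contig(contig) if flip else contig)
            dist = breakpoint_distance(scaffold, reference)
            # Keep the first candidate that reaches a new minimum.
            if best_dist is None or dist < best_dist:
                best, best_dist = scaffold, dist
    return best, best_dist


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5, 6]
    contigs = [[4, 5, 6], [-2, -1], [3]]
    print(brute_force_scaffold(contigs, reference))
```

With reference `[1, 2, 3, 4, 5, 6]` and contigs `[[4, 5, 6], [-2, -1], [3]]`, the search finds a scaffold at distance 0, namely the reference itself or its reversed, sign-flipped form, which conserves the same adjacencies. Because the number of candidates grows as n!·2^n for n contigs, this enumeration is only usable on toy inputs.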