Author: | Weng, Yu-Hung (翁鈺閎) |
---|---|
Thesis title: | A Study On the Partial Digest Problem (部分酶切問題之研究) |
Advisor: | Wang, Biing-Feng (王炳豐) |
Committee members: | Wang, Jia-Shung (王家祥); Huang, Yao-Ting (黃耀廷) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2019 |
Academic year of graduation: | 107 (ROC calendar) |
Language: | English |
Number of pages: | 49 |
Chinese keywords: | 部分酶切問題, 演算法, 去氧核醣核酸, 限制位點分析 |
English keywords: | Partial Digest Problem, Algorithms, DNA, Restriction site analysis |
The focus of this thesis is the partial digest problem (PDP), which is the mathematical model of a well-known method for restriction site analysis. So far, neither a proof of NP-completeness nor a polynomial-time algorithm is known for PDP. A well-known solution for PDP is the backtracking algorithm of Skiena et al., which requires O(t n lg n) time, where n is the number of restriction sites and t >= n is the number of visited nodes of the solution tree. Skiena et al.'s algorithm requires O(n^2 lg n) time in practice but may require O(n lg n 2^n) time in the worst case. Zhang gave an example for which this algorithm takes exponential time. Abbas and Bahig sped up Skiena et al.'s algorithm by avoiding the exploration of identical subtrees, which results in an O(t' n^2)-time algorithm, where t' <= t.
This thesis proposes four algorithms for PDP, denoted Algorithms 1, 2, 3, and 4. Algorithm 1 improves Skiena et al.'s worst-case upper bound from O(n lg n 2^n) to O(n 2^n). Experimental results show that it is considerably more efficient on Zhang's example. Algorithm 2 has the same expected time as Skiena et al.'s algorithm, O(n^2 lg n), but a smaller constant factor: empirically, Algorithm 2 is at least two times faster. Algorithm 3 is an improved version of Abbas and Bahig's algorithm. The time complexity is reduced from O(t' n^2) to O(t' n lg n); in addition, the worst-case space is significantly reduced from O(n^2 2^(n/2)) to O(n^2). Algorithm 4 is a combination of Algorithms 1 and 3. For Zhang's example, Algorithm 4 improves on Algorithms 1 and 3 by factors of 2^(2n/5) and lg n, respectively. This thesis also presents a simple technique, called longest-first checking, which can slightly speed up all algorithms based on Skiena et al.'s idea, including all the algorithms mentioned above. A theoretical analysis that supports this technique is given, and its effectiveness is demonstrated by experiments.
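As an illustration of the backtracking idea that the abstract attributes to Skiena et al., the following is a minimal Python sketch, not the thesis's implementation. It relies on the observation that the largest distance still unexplained must lie between a new point and one of the two endpoints, so the new point can only sit at position `largest` or `width - largest`. The function names `pairwise_distances` and `partial_digest` are hypothetical. The sketch finds the largest remaining distance by a linear scan over a `Counter` instead of the search structure needed for the O(t n lg n) bound, so it shows the search strategy only and none of Algorithms 1 to 4.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Return the sorted multiset of pairwise distances of `points`."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))


def partial_digest(distances):
    """Return one sorted point set realizing `distances`, or None.

    Illustrative backtracking sketch; assumes positive integer distances.
    """
    remaining = Counter(distances)
    if not remaining:
        return [0]  # a single point has no pairwise distances
    width = max(remaining)  # the largest distance spans the two endpoints
    remaining[width] -= 1
    placed = [0, width]

    def try_place(candidate):
        # Distances from the candidate to every point placed so far.
        needed = Counter(abs(candidate - p) for p in placed)
        if any(remaining[d] < c for d, c in needed.items()):
            return False  # some required distance is not available
        remaining.subtract(needed)
        placed.append(candidate)
        if solve():
            return True
        # Backtrack: undo the placement and restore the distances.
        placed.pop()
        remaining.update(needed)
        return False

    def solve():
        largest = max((d for d, c in remaining.items() if c > 0), default=None)
        if largest is None:
            return True  # every distance has been explained
        # The new point is `largest` away from one of the two endpoints.
        return try_place(largest) or try_place(width - largest)

    return sorted(placed) if solve() else None


if __name__ == "__main__":
    example = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
    solution = partial_digest(example)
    print(solution, pairwise_distances(solution) == sorted(example))
```

Each recursive call corresponds to one node of the solution tree, so the number of calls plays the role of t in the bounds quoted above. A mirror image of any returned point set is an equally valid solution.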
1. M. M. Abbas, H. M. Bahig, "A fast exact algorithm for the partial digest problem," BMC Bioinformatics, vol. 17, suppl. 19, pp. 139–148, 2016.
2. Z. Abrams, H. L. Chen, "The simplified partial digest problem: hardness and a probabilistic analysis," in Proceedings of RECOMB Satellite Meeting on DNA Sequencing Technologies and Computation, 2004.
3. H. Ahrabian, M. Ganjtabesh, A. Nowzari-Dalini, Z. Razaghi-Moghadam-Kashani, "Genetic algorithm solution for partial digest problem," International Journal of Bioinformatics Research and Applications, vol. 9, no. 6, pp. 584–594, 2013.
4. H. M. Bahig, M. M. Abbas, M. M. Mohie-Eldin, "Parallelizing partial digest problem on multicore system," Lecture Notes in Computer Science, vol. 10209, pp. 95–104, 2017.
5. H. M. Bahig, M. M. Abbas, "Speeding up the partial digest algorithm," Journal of Informatics and Mathematical Sciences, vol. 10, no. 1–2, pp. 217–225, 2018.
6. J. Blazewicz, E. K. Burke, M. Kasprzak, A. Kovalev, M. Y. Kovalyov, "The simplified partial digest problem: enumerative and dynamic programming algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 668–680, 2007.
7. J. Blazewicz, E. K. Burke, M. Kasprzak, A. Kovalev, M. Y. Kovalyov, "On the approximability of the simplified partial digest problem," Discrete Applied Mathematics, vol. 157, no. 17, pp. 3586–3592, 2009.
8. J. Blazewicz, E. K. Burke, M. Kasprzak, A. Kovalev, M. Y. Kovalyov, "The simplified partial digest problem approximation and a graph theoretic model," European Journal of Operational Research, vol. 208, no. 2, pp. 142–152, 2011.
9. J. Blazewicz, E. Burke, M. Jaroszewski, M. Kasprzak, B. Paliswiat, P. Pryputniewicz, "On the complexity of the double digest problem," Control and Cybernetics, vol. 33, no. 1, pp. 133–140, 2004.
10. J. Blazewicz, P. Formanowicz, M. Kasprzak, M. Jaroszewski, W. T. Markiewicz, "Construction of DNA restriction maps based on a simplified experiment," Bioinformatics, vol. 17, no. 5, pp. 398–404, 2001.
11. J. Blazewicz, M. Jaroszewski, "New algorithm for the simplified partial digest problem," in Proceedings of International Workshop on Algorithms in Bioinformatics (WABI), 2003, pp. 95–110.
12. J. Blazewicz, M. Kasprzak, "Combinatorial optimization in DNA mapping - a computational thread of the simplified partial digest problem," RAIRO Operations Research, vol. 39, no. 4, pp. 227–241, 2005.
13. T. Chen, M. Y. Kao, M. Tepel, J. Rush, G. M. Church, "A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry," Journal of Computational Biology, vol. 8, no. 3, pp. 325–337, 2001.
14. M. Cieliebak, S. Eidenbenz, P. Penna, "Noisy data make the partial digest problem NP-hard," in Proceedings of International Workshop on Algorithms in Bioinformatics (WABI), 2003, pp. 111–123.
15. M. Cieliebak, S. Eidenbenz, "Measurement errors make the partial digest problem NP-hard," in Proceedings of Latin American Symposium on Theoretical Informatics (LATIN), 2004, pp. 379–390.
16. T. Dakic, "On the turnpike problem," PhD dissertation, Department of Computer Science, Simon Fraser University, 2000.
17. W. M. Fitch, T. F. Smith, W. W. Ralph, "Mapping the order of DNA restriction fragments," Gene, vol. 22, no. 1, pp. 19–29, 1983.
18. E. Fomin, "A simple approach to the reconstruction of a set of points from the multiset of n^2 pairwise distances in n^2 steps for the sequencing problem: I. theory," Journal of Computational Biology, vol. 23, no. 9, pp. 769–775, 2016.
19. E. Fomin, "A simple approach to the reconstruction of a set of points from the multiset of n^2 pairwise distances in n^2 steps for the sequencing problem: II. algorithm," Journal of Computational Biology, vol. 23, no. 12, pp. 934–942, 2016.
20. E. Fomin, "A simple approach to the reconstruction of a set of points from the multiset of n^2 pairwise distances in n^2 steps for the sequencing problem: III. noise inputs for the beltway case," Journal of Computational Biology, vol. 26, no. 1, pp. 68–75, 2018.
21. H. L. Fu, Y. T. Hsiao, "Mid-labeled partial digest problem," Master's thesis, Department of Applied Mathematics, National Chiao Tung University, Hsinchu, Taiwan, 2013.
22. M. Ganjtabesh, H. Ahrabian, A. Nowzari-Dalini, Z. R. K. Moghadam, "Genetic algorithm solution for double digest problem," Bioinformation, vol. 8, no. 10, pp. 453–456, 2012.
23. L. Goldstein, M. S. Waterman, "Mapping DNA by stochastic relaxation," Advances in Applied Mathematics, vol. 8, no. 2, pp. 194–207, 1987.
24. S. T. Ho, L. Allison, C. N. Yee, "Restriction site mapping for three or more enzymes," Computer Applications in the Biosciences, vol. 6, no. 3, pp. 195–204, 1990.
25. R. M. Karp, L. A. Newberg, "An algorithm for analyzing probed partial digestion," Computer Applications in the Biosciences, vol. 11, no. 3, pp. 229–235, 1995.
26. M. Krawczak, "Algorithms for the restriction-site mapping of DNA molecules," Proceedings of the National Academy of Sciences of the United States of America, vol. 85, no. 19, pp. 7298–7301, 1988.
27. P. Lemke, S. S. Skiena, W. D. Smith, "Reconstructing sets from interpoint distances," in Discrete and Computational Geometry, pp. 597–631, 2003.
28. L. Lovasz, A. Schrijver, "Cones of matrices and set-functions and 0-1 optimization," SIAM Journal on Optimization, vol. 1, no. 2, pp. 166–190, 1990.
29. R. Nadimi, H. S. Fathabadi, M. Ganjtabesh, "A fast algorithm for the partial digest problem," Japan Journal of Industrial and Applied Mathematics, vol. 28, no. 2, pp. 315–325, 2011.
30. D. Naor, "A note on the number of distinct solutions to the probed partial digestion reconstruction problem," Technical Report CSE-90-40, Division of Computer Science, University of California, Davis, 1990.
31. L. A. Newberg, D. Naor, "A lower bound on the number of solutions to the probed partial digest problem," Advances in Applied Mathematics, vol. 14, no. 2, pp. 172–183, 1993.
32. G. Pandurangan, H. Ramesh, "Restriction mapping problem revisited," Journal of Computer and System Sciences, vol. 65, no. 3, pp. 526–544, 2002.
33. P. A. Pevzner, M. S. Waterman, "Open combinatorial problems in computational molecular biology," in Proceedings of the Third Israel Symposium on Theory of Computing and Systems, 1995, pp. 158–173.
34. J. Rosenblatt, P. D. Seymour, "The structure of homometric sets," SIAM Journal on Algebraic and Discrete Methods, vol. 3, no. 3, pp. 343–350, 1981.
35. M. I. Shamos, "Problems in computational geometry," unpublished manuscript, Carnegie Mellon University, 1977.
36. S. S. Skiena, W. D. Smith, P. Lemke, "Reconstructing sets from interpoint distances," in Proceedings of Sixth ACM Symposium on Computational Geometry, 1990, pp. 332–339.
37. S. S. Skiena, G. Sundaram, "A partial digest approach to restriction site mapping," Bulletin of Mathematical Biology, vol. 56, no. 2, pp. 275–294, 1994. (Also appears in Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, 1993, pp. 362–370.)
38. C. L. Smith, J. R. Econome, A. Schutt, S. Klco, C. R. Cantor, "A physical map of the Escherichia coli K12 genome," Science, vol. 236, no. 4807, pp. 1448–1453, 1987.
39. J. Wang, "Average-case intractable NP problems," in Advances in Algorithms, Languages, and Complexity, pp. 317–318, 1997.
40. M. A. Weiss, "Data structures and algorithm analysis," Benjamin/Cummings Publishing Company, Inc., Redwood City, California, 1992.
41. L. W. Wright, J. B. Lichter, J. Reinitz, M. A. Shifman, K. K. Kidd, P. L. Miller, "Computer-assisted restriction mapping: an integrated approach to handling experimental uncertainty," Bioinformatics, vol. 10, no. 4, pp. 435–442, 1994.
42. Z. Zhang, "An exponential example for a partial digest mapping algorithm," Journal of Computational Biology, vol. 1, no. 3, pp. 235–239, 1994.