Graduate student: | 謝詠翔 Yong-Hsiang Hsieh |
---|---|
Thesis title: | 限定長度與平均範圍之區間找尋問題的最佳演算法 (Optimal Algorithms for the Interval Location Problem with Range Constraints on Length and Average) |
Advisor: | 王炳豐 Biing-Feng Wang |
Degree: | Doctoral (Ph.D.) |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar, i.e., 2006–2007) |
Language: | Chinese |
Number of pages: | 60 |
Keywords: | algorithms, data structures, analysis of algorithms, on-line algorithms |
Let A be a sequence of n real numbers, let L1 and L2 be two integers such that L1 ≤ L2, and let R1 and R2 be two real numbers such that R1 ≤ R2. An interval of A is feasible if its length is between L1 and L2 and its average is between R1 and R2. In this dissertation, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum-cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. These problems are motivated by the problem of locating CpG islands in a DNA sequence, which is important for gene finding as well as for cancer research. We first show that all the problems have an Ω(n log n)-time lower bound in the comparison model. We then use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run on-line and use O(n) space.
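To make the problem definitions concrete, the following Python sketch solves all five problems by brute force. It is not the dissertation's algorithm. It relies on a standard reformulation that is assumed here to be close in spirit to the geometric view mentioned above: with prefix sums P[k] = A[1] + ... + A[k], the interval A[i+1..j] has length j - i and average (P[j] - P[i]) / (j - i). Its average therefore lies in [R1, R2] exactly when u_i ≤ u_j and v_i ≥ v_j, where u_k = P[k] - R1·k and v_k = P[k] - R2·k. Each feasible interval thus corresponds to a pair of points (u_i, v_i), (u_j, v_j) in a dominance relation, with i restricted to the window [j - L2, j - L1]. The scan below checks every pair in that window, so it runs in O(n(L2 - L1 + 1)) time rather than within the optimal bounds claimed in the dissertation. The function names, the 0-based inclusive index convention, and the assumption 1 ≤ L1 are all illustrative choices, not taken from the dissertation. The non-overlapping problem uses the classic earliest-end-first greedy, which is optimal for maximum-cardinality interval scheduling.

```python
from typing import Iterator, List, Tuple

Interval = Tuple[int, int]  # (start, end): 0-based, inclusive indices into a


def shifted_prefix_sums(a: List[float], r1: float, r2: float) -> Tuple[List[float], List[float]]:
    """Return u[k] = P[k] - r1*k and v[k] = P[k] - r2*k for k = 0..n,
    where P[k] = a[0] + ... + a[k-1] is the k-th prefix sum."""
    u, v, p = [0.0], [0.0], 0.0
    for k, x in enumerate(a, start=1):
        p += x
        u.append(p - r1 * k)
        v.append(p - r2 * k)
    return u, v


def feasible_intervals(a: List[float], l1: int, l2: int,
                       r1: float, r2: float) -> Iterator[Interval]:
    """Yield all feasible intervals, ordered by end and then by start.
    Brute force over the length window; assumes 1 <= l1 <= l2."""
    u, v = shifted_prefix_sums(a, r1, r2)
    for j in range(1, len(a) + 1):                   # intervals ending at a[j-1]
        for i in range(max(0, j - l2), j - l1 + 1):  # a[i..j-1] has length j - i
            # average >= r1  <=>  u[i] <= u[j];  average <= r2  <=>  v[i] >= v[j]
            if u[i] <= u[j] and v[i] >= v[j]:
                yield (i, j - 1)


def length(iv: Interval) -> int:
    """Number of elements in an inclusive interval."""
    return iv[1] - iv[0] + 1


def max_disjoint(sorted_by_end: List[Interval]) -> List[Interval]:
    """Earliest-end-first greedy: returns a maximum-size set of pairwise
    non-overlapping intervals, provided the input is sorted by end."""
    chosen: List[Interval] = []
    for s, e in sorted_by_end:
        if not chosen or s > chosen[-1][1]:
            chosen.append((s, e))
    return chosen


if __name__ == "__main__":
    ivs = list(feasible_intervals([1, 3, 0, 2, 4], l1=2, l2=3, r1=1.5, r2=2.5))
    print("all:", ivs)                         # [(0, 1), (1, 2), (1, 3), (2, 4)]
    print("count:", len(ivs))                  # 4
    print("max disjoint:", max_disjoint(ivs))  # [(0, 1), (2, 4)]
    print("longest:", max(ivs, key=length))    # (1, 3)
    print("shortest:", min(ivs, key=length))   # (0, 1)
```

Because it enumerates every candidate explicitly, a sketch like this is mainly useful as a correctness oracle when testing a faster, dominance-based implementation. Note that `max` and `min` raise `ValueError` when no feasible interval exists.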