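Graduate student: 謝詠翔 (Yong-Hsiang Hsieh)
Thesis title: 限定長度與平均範圍之區間找尋問題的最佳演算法
Optimal Algorithms for the Interval Location Problem with Range Constraints on Length and Average
Advisor: 王炳豐 (Biing-Feng Wang)
Oral defense committee:
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2007
Academic year of graduation: 95
Language: Chinese
Number of pages: 60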
Keywords: algorithms, data structures, analysis of algorithms, on-line algorithms
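  • Let A be a sequence of n real numbers, let L1 and L2 be two integers with L1 <= L2, and let R1 and R2 be two real numbers with R1 <= R2. An interval of A is a "feasible interval" if its length is between L1 and L2 and its average is between R1 and R2. In this dissertation, we study the following problems: finding all feasible intervals of A, counting the feasible intervals of A, finding a maximum cardinality set of disjoint feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. These problems are motivated by the search for CpG islands in DNA sequences. We prove that all of these problems have an Ω(n log n) time lower bound, and we use geometric methods to design optimal algorithms for all of them. All of the presented algorithms are on-line algorithms and use O(n) space.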


    Let A be a sequence of n real numbers, L1 and L2 be two integers such that L1 <= L2, and R1 and R2 be two real numbers such that R1 <= R2. An interval of A is feasible if its length is between L1 and L2 and its average is between R1 and R2. In this dissertation, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. These problems are motivated by the problem of locating CpG islands in a DNA sequence, which is important for gene finding as well as for cancer research. We first show that all of the problems have an Ω(n log n)-time lower bound in the comparison model. We then use geometric approaches to design optimal algorithms for the problems. All of the presented algorithms run in an on-line manner and use O(n) space.
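    The definition of a feasible interval can be made concrete with a small brute-force sketch. This is not one of the dissertation's algorithms: it is a naive baseline that tries every start position and every admissible length, so it takes O(n (L2 - L1 + 1)) time rather than the optimal bounds claimed in the abstract. The function names `feasible_intervals` and `max_disjoint` are hypothetical, intervals are reported as 0-based inclusive index pairs, and "non-overlapping" is assumed to mean that two intervals share no index. The greedy selection by right endpoint is the standard interval-scheduling technique, used here only as an illustration.

```python
from itertools import accumulate


def feasible_intervals(a, l1, l2, r1, r2):
    """Return every (start, end) pair, 0-based and inclusive, whose length
    lies in [l1, l2] and whose average lies in [r1, r2].

    Naive baseline for illustration, not the dissertation's method.
    """
    n = len(a)
    prefix = [0] + list(accumulate(a))  # prefix[i] = a[0] + ... + a[i-1]
    result = []
    for start in range(n):
        # Only lengths that fit inside the sequence are considered.
        for length in range(max(l1, 1), min(l2, n - start) + 1):
            total = prefix[start + length] - prefix[start]
            # Compare sums rather than averages to avoid a division.
            if r1 * length <= total <= r2 * length:
                result.append((start, start + length - 1))
    return result


def max_disjoint(intervals):
    """Pick a maximum-size set of pairwise disjoint intervals by scanning
    them in order of right endpoint (standard interval scheduling)."""
    chosen = []
    last_end = -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


if __name__ == "__main__":
    ivs = feasible_intervals([1, 3, 2, 6, 1], 2, 3, 2, 3)
    print("all feasible:", ivs)
    print("count:", len(ivs))
    print("longest:", max(ivs, key=lambda iv: iv[1] - iv[0]))
    print("shortest:", min(ivs, key=lambda iv: iv[1] - iv[0]))
    print("disjoint set:", max_disjoint(ivs))
```

    The five problems listed in the abstract all reduce to queries over the same set of feasible intervals, which is why one enumeration routine is enough for this sketch. The dissertation's contribution is answering those queries without enumerating every candidate.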

    Content
    Abstract
    Acknowledgement
    Content
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Related Work
        1.2 Summary of Results
        1.3 Dissertation Organization
    Chapter 2 Notation, Definitions, and Lower Bounds
        2.1 Notation and Definitions
        2.2 Lower Bounds
    Chapter 3 Algorithms for Problems 1, 2, and 3
        3.1 Finding All Feasible Intervals
        3.2 Counting All Feasible Intervals
        3.3 Finding a Maximum Cardinality Set of Non-overlapping Feasible Intervals
    Chapter 4 Locating a Longest Feasible Interval
    Chapter 5 Locating a Shortest Feasible Interval
    Chapter 6 Conclusion and Future Work
    References

    References

    [1] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974.

    [2] L. Allison, "Longest biased interval and longest non-negative sum interval," Bioinformatics, vol. 19, no. 10, pp. 1294-1295, 2003.

    [3] F. Antequera, "Structure, function and evolution of CpG island promoters," Cellular and Molecular Life Sciences, vol. 60, no. 8, pp. 1647-1658, 2003.

    [4] F. Antequera and A. Bird, "Number of CpG islands and genes in human and mouse," Proceedings of the National Academy of Sciences of the United States of America, vol. 90, no. 24, pp. 11995-11999, 1993.

    [5] R. Bayer, "Symmetric binary B-trees: data structure and maintenance algorithms," Acta Informatica, vol. 1, pp. 290-306, 1972.

    [6] R. Bayer and E. M. McCreight, "Organization and maintenance of large ordered indexes," Acta Informatica, vol. 1, pp. 173-189, 1972.

    [7] B. Chazelle, "A functional approach to data structures and its use in multidimensional searching," SIAM Journal on Computing, vol. 17, no. 3, pp. 427-462, 1988.

    [8] K.-Y. Chen and K.-M. Chao, "Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint," Information Processing Letters, vol. 96, no. 6, pp. 197-201, 2005.

    [9] K.-M. Chung and H.-I. Lu, "An optimal algorithm for the maximum-density segment problem," SIAM Journal on Computing, vol. 34, no. 2, pp. 373-387, 2005.

    [10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, McGraw-Hill, 2nd ed., 2001.

    [11] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.

    [12] M. Esteller, "CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future," Oncogene, vol. 21, no. 35, pp. 5427-5440, 2002.

    [13] G. N. Frederickson and S. Rodger, "A new approach to the dynamic maintenance of maximal points in a plane," Discrete and Computational Geometry, vol. 5, no. 4, pp. 365-374, 1990.

    [14] M. H. Goldwasser, M.-Y. Kao, and H.-I. Lu, "Linear-time algorithms for computing maximum-density sequence segments with bioinformatics applications," Journal of Computer and System Sciences, vol. 70, no. 2, pp. 128-144, 2005.

    [15] L. J. Guibas and R. Sedgewick, "A dichromatic framework for balanced trees," in Proceedings of the 19th Annual Symposium on Foundations of Computer Science, pp. 8-21, 1978.

    [16] S.-Y. Hsieh and T.-Y. Chou, "Finding a weight-constrained maximum-density subtree in a tree," in Proceedings of the 16th Annual International Symposium on Algorithms and Computation, pp. 944-953, 2005.

    [17] X. Huang, "An algorithm for identifying regions of a DNA sequence that satisfy a content requirement," Computer Applications in the Biosciences, vol. 10, no. 3, pp. 219-225, 1994.

    [18] I. P. Ioshikhes and M. Q. Zhang, "Large-scale human promoter mapping using CpG islands," Nature Genetics, vol. 26, pp. 61-63, 2000.

    [19] R. Janardan, "On the dynamic maintenance of maximal points in the plane," Information Processing Letters, vol. 40, no. 2, pp. 59-64, 1991.

    [20] S. Kapoor, "Dynamic maintenance of maxima of 2-d point sets," SIAM Journal on Computing, vol. 29, no. 6, pp. 1858-1877, 2000.

    [21] S. K. Kim, "Finding a longest nonnegative path in a constant degree tree," Information Processing Letters, vol. 93, no. 6, pp. 275-279, 2005.

    [22] S. K. Kim, "Linear-time algorithm for finding a maximum-density segment of a sequence," Information Processing Letters, vol. 86, no. 6, pp. 339-342, 2003.

    [23] Y.-L. Lin, X. Huang, T. Jiang, and K.-M. Chao, "MAVG: Locating non-overlapping maximum average segments in a given sequence," Bioinformatics, vol. 19, no. 1, pp. 151-152, 2003.

    [24] Y.-L. Lin, T. Jiang, and K.-M. Chao, "Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequences analysis," Journal of Computer and System Sciences, vol. 65, no. 3, pp. 570-586, 2002.

    [25] R.-R. Lin, W.-H. Kuo, and K.-M. Chao, "Finding a length-constrained maximum-density path in a tree," Journal of Combinatorial Optimization, vol. 9, no. 2, pp. 147-156, 2005.

    [26] E. M. McCreight, "Priority search trees," SIAM Journal on Computing, vol. 14, no. 2, pp. 257-276, 1985.

    [27] A. Nekrutenko and W.-H. Li, "Assessment of compositional heterogeneity within and between eukaryotic genomes," Genome Research, vol. 10, no. 12, pp. 1986-1995, 2000.

    [28] M. H. Overmars and J. van Leeuwen, "Maintenance of configurations in the plane," Journal of Computer and System Sciences, vol. 23, no. 2, pp. 166-204, 1981.

    [29] F. P. Preparata and M. I. Shamos, Computational Geometry, Springer-Verlag, New York, 1985.

    [30] L. Scotto and R. K. Assoian, "A GC-rich domain with bifunctional effects on mRNA and protein levels: implications for control of transforming growth factor beta 1 expression," Molecular and Cellular Biology, vol. 13, no. 6, pp. 3588-3597, 1993.

    [31] L. Wang and Y. Xu, "SEGID: Identifying interesting segments in (multiple) sequence alignments," Bioinformatics, vol. 19, no. 2, pp. 297-298, 2003.

    [32] B.-Y. Wu, K.-M. Chao, and C.-Y. Tang, "An efficient algorithm for the length-constrained heaviest path problem on a tree," Information Processing Letters, vol. 69, no. 2, pp. 63-67, 1999.

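    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)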
