Graduate student: 林建欣
Lin, Chien-Hsin
Thesis title: 基因組樹問題之改進演算法
Improved Algorithms for the Gene Team Tree Problem
Advisor: 王炳豐
Wang, Biing-Feng
Oral examination committee: 黃興燦
Huang, Shing-Tsaan
許聞廉
Hsu, Wen-Lian
蔡錫鈞
Tsai, Shi-Chun
譚建民
Tan, Jimmy J.M.
楊昌彪
Yang, Chang-Biau
謝孫源
Hsieh, Sun-Yuan
王家祥
Wang, Jia-Shung
韓永楷
Hon, Wing-Kai
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
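Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 88
Chinese keywords: algorithms, data structures, computational biology, gene teams, gene team trees, gene sequence comparison
English keywords: algorithms, data structures, bioinformatics, gene teams, gene team trees, comparative genomics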
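    Comparison of gene sequences is currently a very important problem in computational biology. Many studies have observed that if a group of genes lie close to one another on multiple gene sequences, the genes are often functionally or historically related. A group of genes is called a gene team if, on the chromosome of each species, the distance between every two adjacent genes of the group is no more than a given threshold δ. A gene team tree is a succinct way to represent all gene teams that arise under every possible threshold δ. This dissertation presents new algorithms that construct the gene team tree of two chromosomes more efficiently. For this problem, Zhang and Leong proposed an O(n lg^2 n)-time algorithm, where n is the total number of genes. This dissertation presents two improved algorithms, which run in O(n lg n lglg n) and O(n lg n α(n)) time, respectively, where α(n) is the inverse of Ackermann's function. As with Zhang and Leong's algorithm, both can be extended to construct the gene team tree of k > 2 chromosomes, with the running time increased only by a factor of k. In practical applications, the positions of all genes on a chromosome are integers. In addition to the two improved algorithms, under the assumption that all gene positions are integers, this dissertation presents an O(n lg n + z)-time algorithm for constructing the gene team tree, where z is the longest distance between two adjacent genes. In practice, z is usually smaller than n, in which case the third algorithm is more efficient than the first two. Likewise, this algorithm can be extended to construct the gene team tree of k chromosomes in O(kn lg n + z) time.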
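To make the gene team definition concrete, here is a brute-force sketch for a single fixed threshold δ. It is not one of the algorithms in this dissertation. It assumes each gene occurs at most once on each chromosome and represents a chromosome as a dict from gene name to position; the names `split_by_gap` and `gene_teams` and this representation are hypothetical. The procedure repeatedly cuts a candidate set wherever two consecutive members (by position) lie more than δ apart on some chromosome, and stops when no chromosome cuts it. A set that survives every chromosome is a gene team, and because a team can never straddle such a cut, no larger team contains it. The sketch accepts any number of chromosomes, not only two.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: one dict per chromosome, mapping gene name -> position.
# Assumes every gene occurs at most once on each chromosome.
Chromosome = Dict[str, float]
Team = FrozenSet[str]


def split_by_gap(genes: Team, chrom: Chromosome, delta: float) -> List[Team]:
    """Cut `genes` wherever two consecutive members on `chrom` are more than delta apart."""
    ordered = sorted(genes, key=lambda g: chrom[g])
    runs: List[List[str]] = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if chrom[cur] - chrom[prev] > delta:
            runs.append([cur])  # gap exceeds delta: start a new run
        else:
            runs[-1].append(cur)
    return [frozenset(run) for run in runs]


def gene_teams(chromosomes: List[Chromosome], delta: float) -> List[Team]:
    """Brute-force gene teams for one threshold delta by repeated splitting."""
    # Only genes present on every chromosome can belong to a team.
    common: Team = frozenset(set.intersection(*(set(c) for c in chromosomes)))
    teams: List[Team] = []
    stack: List[Team] = [common] if common else []
    while stack:
        s = stack.pop()
        for chrom in chromosomes:
            parts = split_by_gap(s, chrom, delta)
            if len(parts) > 1:
                stack.extend(parts)  # s is not a team: refine each part separately
                break
        else:
            teams.append(s)  # no chromosome splits s, so s is a gene team
    return teams


if __name__ == "__main__":
    A = {"a": 1, "b": 2, "c": 3, "d": 10}
    B = {"a": 1, "c": 2, "b": 8, "d": 9}
    print(sorted(sorted(t) for t in gene_teams([A, B], 2)))  # [['a', 'c'], ['b'], ['d']]
```

In the example, b is separated from a and c on chromosome B, so with δ = 2 only {a, c} groups together; raising δ to 7 merges all four genes into one team.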


    Comparing multiple genome sequences is an important way to gain new biological insights. A group of genes that remains physically close together in multiple genomes is often called a conserved gene cluster, and such genes may be historically or functionally related. A gene team is a set of genes that appear in two or more species, possibly in different orders, such that on each chromosome the distance between any two adjacent genes of the team is at most a given threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. In this dissertation, new efficient algorithms are presented for constructing the gene team tree of two chromosomes. For this problem, Zhang and Leong gave an O(n lg^2 n)-time algorithm, where n is the number of genes. Two improved algorithms are presented, requiring O(n lg n lglg n) and O(n lg n α(n)) time, respectively, where α(n) is the inverse of Ackermann's function. Like Zhang and Leong's gene-team-tree algorithm, both can be extended to k > 2 chromosomes, with their time complexities increased only by a factor of k. In practice, gene positions are integers, so the distance between two genes is an integer. Assuming integer gene positions, this dissertation also presents an O(n lg n + z)-time algorithm, where z is the maximum distance between any two adjacent genes. In real-world applications, z is usually smaller than n, in which case this third algorithm is more efficient than the other two. It can likewise be extended to k chromosomes, in O(kn lg n + z) time.
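The gene team tree idea can be illustrated by brute force, under stated assumptions. Teams can change only when δ reaches the distance between two genes on some chromosome, so evaluating δ = 0 and every such distance captures every team. A team for a smaller δ still satisfies the condition for any larger δ, so it lies inside a team for that larger δ; the teams over all δ are therefore nested or disjoint, and linking each distinct team to its smallest strict superset yields a tree. This sketch is far slower than the bounds quoted above and is not the construction of Zhang and Leong or of this dissertation. The names `gene_teams` and `gene_team_tree`, the dict representation (gene name to position, each gene at most once per chromosome), and labelling each node with the smallest δ at which it appears are all hypothetical choices.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Chromosome = Dict[str, float]  # hypothetical: gene name -> position
Team = FrozenSet[str]


def gene_teams(chromosomes: List[Chromosome], delta: float) -> List[Team]:
    """Brute-force gene teams for one threshold delta (repeated splitting)."""
    common: Team = frozenset(set.intersection(*(set(c) for c in chromosomes)))
    teams: List[Team] = []
    stack: List[Team] = [common] if common else []
    while stack:
        s = stack.pop()
        for chrom in chromosomes:
            ordered = sorted(s, key=lambda g: chrom[g])
            cuts = [i for i in range(1, len(ordered))
                    if chrom[ordered[i]] - chrom[ordered[i - 1]] > delta]
            if cuts:  # some gap exceeds delta: split s there and refine the parts
                bounds = [0] + cuts + [len(ordered)]
                stack.extend(frozenset(ordered[a:b]) for a, b in zip(bounds, bounds[1:]))
                break
        else:
            teams.append(s)  # no chromosome splits s
    return teams


def gene_team_tree(chromosomes: List[Chromosome]) -> Tuple[Dict[Team, float], Dict[Team, Optional[Team]]]:
    """Return (smallest delta at which each team appears, parent of each team)."""
    common = set.intersection(*(set(c) for c in chromosomes))
    # Teams can only change when delta reaches some pairwise distance on some chromosome.
    candidates = {0.0}
    for chrom in chromosomes:
        for g, h in combinations(common, 2):
            candidates.add(abs(chrom[g] - chrom[h]))
    first_seen: Dict[Team, float] = {}
    for d in sorted(candidates):
        for team in gene_teams(chromosomes, d):
            first_seen.setdefault(team, d)
    # Teams are nested or disjoint, so the smallest strict superset is the unique parent.
    parent: Dict[Team, Optional[Team]] = {}
    for team in first_seen:
        supersets = [t for t in first_seen if team < t]
        parent[team] = min(supersets, key=len) if supersets else None
    return first_seen, parent
```

For chromosomes A = {a: 1, b: 2, c: 3, d: 10} and B = {a: 1, c: 2, b: 8, d: 9}, the sketch finds the four single genes at δ = 0, {a, c} at δ = 2, {a, b, c} at δ = 6, and the root {a, b, c, d} at δ = 7.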

    Abstract
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    List of Notations
    Chapter 1. Introduction
    Chapter 2. Notation and Preliminaries
    Chapter 3. An O(n lg n lglg n) Time Algorithm for the Gene Team Tree Problem
    Chapter 4. A Two-Level Approach for the Gene Team Tree Problem
    Chapter 5. A Multi-Level Approach for the Gene Team Tree Problem
    Chapter 6. An Efficient Algorithm for the Gene Team Tree Problem with Integer Positions
    Chapter 7. Conclusion and Future Work
    References

    [1] W. Ackermann, "Zum Hilbertschen Aufbau der reellen Zahlen," Mathematische Annalen, vol. 99, pp. 118-133, 1928.
    [2] L. Allison, "Longest biased interval and longest non-negative sum interval," Bioinformatics, vol. 19, no. 10, pp. 1294-1295, 2003.
    [3] A. Amir, L. Gasieniec, and R. Shalom, "Improved approximate common interval," Information Processing Letters, vol. 103, no. 4, pp. 142-149, 2007.
    [4] M.-P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, "An algorithmic view of gene teams," Theoretical Computer Science, vol. 320, no. 2-3, pp. 395-418, 2004.
    [5] A. Bergeron, Y. Gingras, and C. Chauve, "Formal models of gene clusters," in I. Mandoiu and A. Zelikovsky, editors, Bioinformatics Algorithms: Techniques and Applications, chapter 8, pp. 177-202, 2008, Wiley, New York.
    [6] A. Bergeron and J. Stoye, "On the similarity of sets of permutations and its applications to genome comparison," Journal of Computational Biology, vol. 13, no. 7, pp. 1340-1354, 2006.
    [7] G. Blin, D. Faye, and J. Stoye, "Finding nested common intervals efficiently," Journal of Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.
    [8] S. Böcker, K. Jahn, J. Mixtacki, and J. Stoye, "Computation of median gene clusters," Journal of Computational Biology, vol. 16, no. 8, pp. 1085-1099, 2009.
    [9] K.-Y. Chen and K.-M. Chao, "Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint," Information Processing Letters, vol. 96, no. 6, pp. 197-201, 2005.
    [10] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press, 1st ed., 1990.
    [11] F. Coulon and M. Raffinot, "Fast algorithms for identifying maximal common connected sets of interval graphs," Discrete Applied Mathematics, vol. 154, no. 12, pp. 1709-1721, 2006.
    [12] T. Dandekar, B. Snel, M. Huynen, and P. Bork, "Conservation of gene order: a fingerprint of proteins that physically interact," Trends in Biochemical Sciences, vol. 23, no. 9, pp. 324-328, 1998.
    [13] G. Didier, "Common intervals of two sequences," in Proceedings of the 3rd International Workshop on Algorithms in Bioinformatics (WABI 2003), LNCS 2812, pp. 17-24.
    [14] R. Eres, G. Landau, and L. Parida, "A combinatorial approach to automatic discovery of cluster-patterns," in Proceedings of the 3rd International Workshop on Algorithms in Bioinformatics (WABI 2003), LNCS 2812, pp. 139-150.
    [15] M. D. Ermolaeva, O. White, and S. L. Salzberg, "Prediction of operons in microbial genomes," Nucleic Acids Research, vol. 29, no. 5, pp. 1216-1221, 2001.
    [16] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 2, John Wiley & Sons, 2008.
    [17] A.-T. Gai, M. Habib, C. Paul, and M. Raffinot, "Identifying common connected components of graphs," Technical Report, RR-LIRMM-03016, France: LIRMM, Université de Montpellier 2, 2003.
    [18] M. H. Goldwasser, M.-Y. Kao, and H.-I. Lu, "Linear-time algorithms for computing maximum-density sequence segments with bioinformatics applications," Journal of Computer and System Sciences, vol. 70, no. 2, pp. 128-144, 2005.
    [19] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley, 2nd ed., 1994.
    [20] J. Gramm, R. Niedermeier, and P. Rossmanith, "Fixed-parameter algorithms for closest string and related problems," Algorithmica, vol. 37, no. 1, pp. 25-42, 2003.
    [21] M. Habib, C. Paul, and M. Raffinot, "Common connected components of interval graphs," Technical Report, RR-LIRMM-03014, France: LIRMM, Université de Montpellier 2, 2003.
    [22] X. He and M. H. Goldwasser, "Identifying conserved gene clusters in the presence of homology families," Journal of Computational Biology, vol. 12, no. 6, pp. 638-656, 2005.
    [23] S. Heber and J. Stoye, "Finding all common intervals of k permutations," in Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), LNCS 2089, pp. 207-218.
    [24] R. Hoberman and D. Durand, "The incompatible desiderata of gene cluster properties," in Proceedings of the 3rd Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG 2005), LNCS 3678, pp. 73-87.
    [25] Y.-H. Hsieh, C.-C. Yu, and B.-F. Wang, "Optimal algorithms for the interval location problem with range constraints on length and average," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 281-290, 2008.
    [26] X. Huang, "An algorithm for identifying regions of a DNA sequence that satisfy a content requirement," Computer Applications in the Biosciences, vol. 10, no. 3, pp. 219-225, 1994.
    [27] J. JáJá, An Introduction to Parallel Algorithms, Addison-Wesley, 1992.
    [28] S. Kim, J.-H. Choi, A. Saple, and J. Yang, "A hybrid gene team model and its application to genome analysis," Journal of Bioinformatics and Computational Biology, vol. 4, no. 2, pp. 171-196, 2006.
    [29] R. Kolpakov and M. Raffinot, "New algorithms for text fingerprinting," Journal of Discrete Algorithms, vol. 6, no. 2, pp. 243-255, 2008.
    [30] W. C. Lathe III, B. Snel, and P. Bork, "Gene context conservation of a higher order than operons," Trends in Biochemical Sciences, vol. 25, no. 10, pp. 474-479, 2000.
    [31] J. Lawrence, "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes," Current Opinion in Genetics & Development, vol. 9, no. 6, pp. 642-648, 1999.
    [32] Y.-L. Lin, X. Huang, T. Jiang, and K.-M. Chao, "MAVG: Locating non-overlapping maximum average segments in a given sequence," Bioinformatics, vol. 19, no. 1, pp. 151-152, 2003.
    [33] Y.-L. Lin, T. Jiang, and K.-M. Chao, "Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequences analysis," Journal of Computer and System Sciences, vol. 65, no. 3, pp. 570-586, 2002.
    [34] X. Ling, X. He, and D. Xin, "Detecting gene clusters under evolutionary constraint in a large number of genomes," Bioinformatics, vol. 25, no. 5, pp. 571-577, 2009.
    [35] X. Ling, X. He, D. Xin, and J. Han, "Efficiently identifying max-gap clusters in pairwise genome comparison," Journal of Computational Biology, vol. 15, no. 6, pp. 593-609, 2008.
    [36] J. S. Liu and C. E. Lawrence, "Unified Gibbs method for biological sequence analysis," in Proceedings of the American Statistical Association, Statistical Computing Section, ASA 1996, pp. 194-199.
    [37] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, "Gene teams: a new formalization of gene clusters for comparative genomics," Computational Biology and Chemistry, vol. 27, no. 1, pp. 59-67, 2003.
    [38] R. Overbeek, M. Fonstein, M. D’Souza, G. D. Pusch, and N. Maltsev, "The use of gene clusters to infer functional coupling," Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 6, pp. 2896-2901, 1999.
    [39] L. Parida, "Gapped permutation pattern discovery for gene order comparisons," Journal of Computational Biology, vol. 14, no. 1, pp. 45-55, 2007.
    [40] S. Rahmann and G. W. Klau, "Integer linear programs for discovering approximate gene clusters," in Proceedings of the 6th Workshop on Algorithms in Bioinformatics (WABI 2006), LNCS 4175, pp. 298-309.
    [41] T. Schmidt and J. Stoye, "Quadratic time algorithms for finding common intervals in two and more sequences," in Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), LNCS 3109, pp. 347-359.
    [42] B. Snel, P. Bork, and M. A. Huynen, "The identification of functional modules from the genomic association of genes," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 9, pp. 5890-5895, 2002.
    [43] R. E. Tarjan, "Efficiency of a good but not linear set union algorithm," Journal of the ACM, vol. 22, no. 2, pp. 215-225, 1975.
    [44] T. Uno and M. Yagiura, "Fast algorithms to enumerate all common intervals of two permutations," Algorithmica, vol. 26, no. 2, pp. 290-309, 2000.
    [45] B.-F. Wang, C.-C. Kuo, S.-J. Liu, and C.-H. Lin, "A new efficient algorithm for the gene-team problem on general sequences," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 330-344, 2012.
    [46] B.-F. Wang and C.-H. Lin, "Improved algorithms for finding gene teams and constructing gene team trees," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1258-1272, 2011.
    [47] B.-F. Wang, S.-J. Liu, and C.-H. Lin, "Improved algorithms for the gene team problem," in Proceedings of the 3rd International Conference on Combinatorial Optimization and Applications (COCOA 2009), LNCS 5573, pp. 61-72.
    [48] L. Wang and Y. Xu, "SEGID: Identifying interesting segments in (multiple) sequence alignments," Bioinformatics, vol. 19, no. 2, pp. 297-298, 2003.
    [49] M. Zhang and H. W. Leong, "Gene team tree: a hierarchical representation of gene teams for all gap lengths," Journal of Computational Biology, vol. 16, no. 10, pp. 1383-1398, 2009.

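    Full-text availability: not authorized for public release (on-campus network)
    Full-text availability: not authorized for public release (off-campus network)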
