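| Graduate student | 劉尚儒 (Liu, Shang-Ju) |
|---|---|
| Thesis title | Improved Algorithms for Identifying Gene Teams of Genomes and Common Connected Components of Interval Graphs (辨識基因組中基因隊以及區間圖上共有連通元件的改進演算法) |
| Advisor | 王炳豐 (Wang, Biing-Feng) |
| Oral defense committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of publication | 2009 |
| Academic year of graduation | 97 |
| Language | English |
| Pages | 46 |
| Keywords (translated from Chinese) | bioinformatics, gene comparison, gene teams, interval graphs, common connected components, algorithms |
| Keywords (English) | bioinformatics, comparative genomics, gene teams, interval graphs, common connected components, algorithms, conserved gene clusters |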
Comparing multiple genome sequences is an important way to gain new biological insights. In this thesis, we consider two problems arising in comparative genomics: the gene team problem and the common connected problem.
A gene team is a set of genes that appears in two or more species, possibly in a different order in each, such that on every chromosome the distance between adjacent genes of the team is at most a given threshold. The gene team problem is to find all gene teams of multiple genomes. Béal et al. [2] gave an $O(kn(\log n)^2)$-time algorithm for this problem, where $k$ is the number of genomes and $n$ is the number of distinct genes. In this thesis, we propose an improved $O(kn \log d)$-time algorithm, where $d \le n$ is the number of gene teams. The proposed algorithm is simple and efficient in practice. We also give an extension to circular chromosomes that achieves the same time bound.
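To make the definition concrete, the following is a minimal sketch of a naive splitting procedure. It is not the algorithm proposed in the thesis, and it makes no attempt to reach the time bounds quoted above. It assumes that every gene occurs exactly once in every genome and that each genome is given as a mapping from gene name to coordinate; the names `split_by_gap`, `gene_teams`, and `delta` are illustrative only. A candidate set is split wherever two neighbouring genes lie more than `delta` apart in some genome, and a set that no genome can split is reported as a team (singletons included).

```python
from typing import Dict, List, Set


def split_by_gap(genes: Set[str], pos: Dict[str, float], delta: float) -> List[Set[str]]:
    """Split `genes` into maximal runs whose neighbouring positions differ by at most delta."""
    ordered = sorted(genes, key=pos.__getitem__)
    runs = [[ordered[0]]]
    for gene in ordered[1:]:
        if pos[gene] - pos[runs[-1][-1]] > delta:
            runs.append([gene])      # gap too large: start a new run
        else:
            runs[-1].append(gene)    # still within delta of the previous gene
    return [set(run) for run in runs]


def gene_teams(genomes: List[Dict[str, float]], delta: float) -> List[Set[str]]:
    """Return the sets that cannot be split by any genome (naive refinement)."""
    if not genomes or not genomes[0]:
        return []
    pending = [set(genomes[0])]
    teams = []
    while pending:
        candidate = pending.pop()
        for pos in genomes:
            runs = split_by_gap(candidate, pos, delta)
            if len(runs) > 1:
                pending.extend(runs)  # refine each piece separately
                break
        else:
            teams.append(candidate)   # no genome splits it: a gene team
    return teams


# Hypothetical toy input: two genomes over the genes a..e, threshold 2.
genome_1 = {"a": 1, "b": 2, "c": 4, "d": 8, "e": 9}
genome_2 = {"a": 3, "b": 1, "c": 9, "d": 5, "e": 6}
print(sorted(sorted(team) for team in gene_teams([genome_1, genome_2], delta=2)))
# [['a', 'b'], ['c'], ['d', 'e']]
```

In this toy input, gene c stays within the threshold of a and b in the first genome but not in the second, so it ends up in a team of its own.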
Let $F = \{G_1 = (V, E_1), \ldots, G_k = (V, E_k)\}$ be a set of $k$ graphs defined on the same vertex set $V$. A common connected component of $F$ is a maximal subset $S \subseteq V$ such that the subgraph induced by $S$ in each $G_i \in F$ is connected. The common connected problem, which generalizes the gene team problem, is to find all common connected components of $F$. In this thesis, we consider the case in which every $G_i$ is an interval graph. For this case, Coulon and Raffinot [8] gave an $O(m + kn \log n)$-time algorithm, where $m = \sum_{1 \le i \le k} |E_i|$ and $n = |V|$. In addition, when the $k$ interval graphs are given as $k$ sets of $n$ intervals, Coulon and Raffinot [8] solved the problem in $O(kn(\log n)^2)$ time. For this input representation, we propose an improved $O(kn \log n)$-time algorithm. We also show how to extend our algorithm to solve the common connected problem on $k$ circular-arc graphs in $O(\min\{kn(k + \log n),\ kn \log n \log d\})$ time, where $d \le n$ is the number of common connected components.
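The definition of a common connected component can be illustrated with a naive refinement sketch for the interval representation. This is neither the algorithm of Coulon and Raffinot [8] nor the one proposed in the thesis, and its worst-case running time is far from the bounds stated above. It assumes closed intervals (two intervals that share an endpoint are adjacent) and that each graph is given as a mapping from vertex to a `(low, high)` pair; the function names are hypothetical. A vertex set is split into the connected components of its induced subgraph in some graph until no graph splits it further.

```python
from typing import Dict, Hashable, List, Set, Tuple

Interval = Tuple[float, float]


def interval_components(vertices: Set[Hashable], iv: Dict[Hashable, Interval]) -> List[Set[Hashable]]:
    """Connected components of the interval graph induced by `vertices` (left-to-right sweep)."""
    ordered = sorted(vertices, key=lambda v: iv[v][0])
    components: List[Set[Hashable]] = []
    reach = None  # rightmost endpoint covered by the current component
    for v in ordered:
        low, high = iv[v]
        if reach is None or low > reach:
            components.append({v})    # nothing seen so far reaches this interval
            reach = high
        else:
            components[-1].add(v)     # overlaps the current component
            reach = max(reach, high)
    return components


def common_connected_components(graphs: List[Dict[Hashable, Interval]]) -> List[Set[Hashable]]:
    """Naive refinement: split a set while some graph disconnects it."""
    if not graphs or not graphs[0]:
        return []
    pending = [set(graphs[0])]
    result = []
    while pending:
        candidate = pending.pop()
        for iv in graphs:
            parts = interval_components(candidate, iv)
            if len(parts) > 1:
                pending.extend(parts)
                break
        else:
            result.append(candidate)  # connected in every graph
    return result


# Hypothetical toy input: two interval graphs on the vertices 1..4.
graph_1 = {1: (0, 3), 2: (2, 5), 3: (4, 7), 4: (10, 12)}
graph_2 = {1: (0, 1), 2: (5, 6), 3: (5, 8), 4: (1, 5)}
print(sorted(sorted(c) for c in common_connected_components([graph_1, graph_2])))
# [[1], [2, 3], [4]]
```

The toy input shows why connectivity must be checked on induced subgraphs: the second graph is connected as a whole, but only through vertex 4, which the first graph isolates, so vertex 1 is separated from vertices 2 and 3.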
[1] A. Amir, A. Apostolico, G. M. Landau, and G. Satta, "Efficient text fingerprinting via Parikh mapping," Journal of Discrete Algorithms, vol. 1, no. 5–6, pp. 409–421, 2003.
[2] M.-P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, "An algorithmic view of gene teams," Theoretical Computer Science, vol. 320, no. 2–3, pp. 395–418, 2004.
[3] J. L. Bentley, "Solutions to Klee’s rectangle problems," Department of Computer Science, Carnegie Mellon University, Manuscript, 1977.
[4] A. Bergeron and J. Stoye, "On the similarity of sets of permutations and its applications to genome comparison," Journal of Computational Biology, vol. 13, no. 7, pp. 1340–1354, 2006.
[5] S. Böcker, K. Jahn, J. Mixtacki, and J. Stoye, "Computation of Median Gene Clusters," Lecture Notes in Bioinformatics, vol. 4955, pp. 331–345, 2008.
[6] B.-M. Bui-Xuan, M. Habib, and C. Paul, "Competitive graph searches," Theoretical Computer Science, vol. 393, no. 1–3, pp. 72–80, 2008.
[7] D. Corneil, S. Olariu, and L. Stewart, "The ultimate interval graph recognition algorithm?," Symposium on Discrete Algorithms, pp. 175–180, 1998.
[8] F. Coulon and M. Raffinot, "Fast algorithms for identifying maximal common connected sets of interval graphs," Discrete Applied Mathematics, vol. 154, pp. 1709–1721, 2006.
[9] F. Coulon and M. Raffinot, "Identification of maximal common connected sets of interval graphs and tree forests," 1st International Conference on Algorithms and Computational Methods for Biochemical and Evolutionary Networks, 2004.
[10] T. Dandekar, B. Snel, M. Huynen, and P. Bork, "Conservation of gene order: a fingerprint for proteins that physically interact," Trends in Biochemical Sciences, vol. 23, pp. 324–328, 1998.
[11] G. Didier, "Common intervals of two sequences," Lecture Notes in Computer Science, vol. 2812, pp. 17–24, 2003.
[12] G. Didier, T. Schmidt, J. Stoye, and D. Tsur, "Character sets of strings," Journal of Discrete Algorithms, vol. 5, pp. 330–340, 2007.
[13] M. D. Ermolaeva, O. White, and S. L. Salzberg, "Prediction of operons in microbial genomes," Nucleic Acids Research, vol. 29, no. 5, pp. 1216–1221, 2001.
[14] A.-T. Gai, M. Habib, C. Paul, and M. Raffinot, "Identifying common connected components of graphs," Technical Report LIRMM-03016.
[15] M. Habib, C. Paul, and M. Raffinot, "Maximal common connected sets of interval graphs," Lecture Notes in Computer Science, vol. 3109, pp. 359–372, 2004.
[16] X. He and M. H. Goldwasser, "Identifying conserved gene clusters in the presence of homology families," Journal of Computational Biology, vol. 12, no. 6, pp. 638–656, 2005.
[17] S. Heber and J. Stoye, "Algorithms for finding gene clusters," Lecture Notes in Computer Science, vol. 2149, pp. 252–263, 2001.
[18] S. Heber and J. Stoye, "Finding all common intervals of k permutations," Lecture Notes in Computer Science, vol. 2089, pp. 207–218, 2001.
[19] R. Kolpakov and M. Raffinot, "New algorithms for text fingerprinting," Journal of Discrete Algorithms, vol. 6, pp. 243–255, 2008.
[20] W. C. Lathe III, B. Snel, and P. Bork, "Gene context conservation of a higher order than operons," Trends in Biochemical Sciences, vol. 25, pp. 474–479, 2000.
[21] J. Lawrence, "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes," Current Opinion in Genetics & Development, vol. 9, no. 6, pp. 642–648, 1999.
[22] C. Lekkerkerker and J. Boland, "Representation of a finite graph by a set of intervals on the real line," Fundamenta Mathematicae, vol. 51, pp. 45–64, 1962.
[23] X. Ling, X. He, D. Xin, and J. Han, "Efficiently Identifying Max-Gap Clusters in Pairwise Genome Comparison," Journal of Computational Biology, vol. 15, no. 6, pp. 593–609, 2008.
[24] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, "Gene teams: a new formalization of gene clusters for comparative genomics," Computational Biology and Chemistry, vol. 27, no. 1, pp. 59–67, 2003.
[25] B. Ma, J. Tromp, and M. Li, "Patternhunter: faster and more sensitive homology search," Bioinformatics, vol. 18, pp. 440–445, 2002.
[26] R. Overbeek, M. Fonstein, M. D’Souza, G. D. Pusch, and N. Maltsev, "The use of gene clusters to infer functional coupling," Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 6, pp. 2896–2901, 1999.
[27] L. Parida, "Gapped Permutation Pattern Discovery for Gene Order Comparisons," Journal of Computational Biology, vol. 14, no. 1, pp. 45–55, 2007.
[28] P. Pevzner and G. Tesler, "Genome rearrangements in mammalian evolution: lessons from human and mouse genomic sequences," Genome Research, vol. 13, pp. 37–45, 2003.
[29] T. Schmidt and J. Stoye, "Quadratic time algorithms for finding common intervals in two and more sequences," Lecture Notes in Computer Science, vol. 3109, pp. 347–359, 2004.
[30] B. Snel, P. Bork, and M. A. Huynen, "The identification of functional modules from the genomic association of genes," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 9, pp. 5890–5895, 2002.
[31] T. Uno and M. Yagiura, "Fast algorithms to enumerate all common intervals of two permutations," Algorithmica, vol. 26, no. 2, pp. 290–309, 2000.
[32] M. Zhang and H. W. Leong, "Gene Team Tree: A Compact Representation of All Gene Teams," Lecture Notes in Bioinformatics, vol. 5267, pp. 100–112, 2008.