A Study on the Gene-Team Problem on General Sequences | National Tsing Hua University Theses and Dissertations Repository

Author:	郭忠欽 Kuo, Chung-Chin
Title:	一般序列上基因組問題之研究 A Study on the Gene-Team Problem on General Sequences
Advisor:	王炳豐 Wang, Biing-Feng
Oral examination committee:	王有禮 Wang, Yue-Li; 何錦文 Ho, Chin-Wen; 謝孫源 Hsieh, Sun-Yuan; 蔡錫鈞 Tsai, Shi-Chun; 王家祥 Wang, Jia-Shung
Degree:	Doctorate (Ph.D.)
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar, i.e., 2014-2015)
Language:	English
Number of pages:	51
Keywords:	algorithms, data structures, gene teams, comparative genomics, conserved gene clusters

Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A well-known model that captures the essential biological features of a conserved gene cluster is the gene-team model. This dissertation focuses on the problem of finding the gene teams of two general sequences. For this problem, He and Goldwasser [9] gave an algorithm that runs in O(mn) time using O(m + n) working space, where m and n are the numbers of genes in the two given sequences. This dissertation presents a new efficient algorithm. Assume m ≤ n, and let C = ∑_{α∈Σ} o_1(α)·o_2(α), where Σ is the set of distinct genes and o_1(α) and o_2(α) are the numbers of copies of α in the first and second sequences, respectively. The new algorithm runs in O(min{C lg n, mn}) time using O(m + n) working space. Compared with He and Goldwasser's algorithm, it is more practical, because C is likely to be much smaller than mn in practice. In addition, the new algorithm is output sensitive: its running time is O(lg n) times the size of the output. Moreover, it can be efficiently extended to find the gene teams of k general sequences in O(k C lg(n_1 n_2 ⋯ n_k)) time, where n_i is the number of genes in the i-th input sequence.
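To make the quantity C concrete, here is a minimal Python sketch that computes C = ∑_{α∈Σ} o_1(α)·o_2(α) from two gene sequences given as lists of hashable gene labels. The function name `pair_count` is hypothetical and not taken from the dissertation. Since every term is nonnegative, ∑ o_1(α) = m and ∑ o_2(α) = n, C can never exceed mn; when no gene repeats in either sequence, C is just the number of shared genes, so C ≤ m.

```python
from collections import Counter
from typing import Hashable, Sequence


def pair_count(seq1: Sequence[Hashable], seq2: Sequence[Hashable]) -> int:
    """Hypothetical helper: C = sum over distinct genes a of o_1(a) * o_2(a)."""
    o1, o2 = Counter(seq1), Counter(seq2)
    # Genes absent from either sequence contribute 0, so scanning the smaller
    # table is enough; indexing a Counter with a missing key returns 0.
    if len(o1) > len(o2):
        o1, o2 = o2, o1
    return sum(count * o2[gene] for gene, count in o1.items())


if __name__ == "__main__":
    s1, s2 = list("aabd"), list("abcab")  # m = 4 <= n = 5
    C, mn = pair_count(s1, s2), len(s1) * len(s2)
    assert C <= mn  # holds for every pair of inputs
    print(C, mn)  # 6 20
```

In this toy example the shared genes contribute 2·2 (gene a) + 1·2 (gene b) = 6 pairs, compared with mn = 20.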
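The abstract names the gene-team model without defining it. As a hedged illustration only, the sketch below finds gene teams in the simplest setting studied by Luc et al. [16] and Béal et al. [2], where no gene repeats within a sequence. The assumptions are ours, not the dissertation's: genes are string labels, a gene's position is its 0-based index, a set S of shared genes is a δ-chain in a sequence if consecutive positions of S differ by at most δ, and a gene team is a maximal set that is a δ-chain in both sequences. The function names are hypothetical. The sketch uses plain partition refinement: a candidate set that breaks at a gap wider than δ in either sequence is split there, because no team can straddle such a gap. This is not the dissertation's O(min{C lg n, mn})-time algorithm for general sequences, and it can take O(n² lg n) time in the worst case.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence


def positions(seq: Sequence[str]) -> Dict[str, int]:
    """Map each gene to its index; this sketch only handles non-repeating genes."""
    pos: Dict[str, int] = {}
    for i, gene in enumerate(seq):
        if gene in pos:
            raise ValueError(f"gene {gene!r} repeats; this sketch assumes permutations")
        pos[gene] = i
    return pos


def split_by_gaps(genes: Iterable[str], pos: Dict[str, int], delta: int) -> List[FrozenSet[str]]:
    """Split a non-empty gene set into maximal runs whose consecutive positions differ by <= delta."""
    ordered = sorted(genes, key=pos.__getitem__)
    runs: List[List[str]] = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if pos[cur] - pos[prev] > delta:
            runs.append([cur])  # gap too wide: start a new run
        else:
            runs[-1].append(cur)
    return [frozenset(run) for run in runs]


def gene_teams(seq1: Sequence[str], seq2: Sequence[str], delta: int) -> List[FrozenSet[str]]:
    """Hypothetical partition-refinement sketch of delta-teams for two permutations."""
    p1, p2 = positions(seq1), positions(seq2)
    shared = frozenset(p1) & frozenset(p2)  # only genes present in both can form teams
    if not shared:
        return []
    teams: List[FrozenSet[str]] = []
    pending = [shared]
    while pending:
        candidate = pending.pop()
        for pos in (p1, p2):
            runs = split_by_gaps(candidate, pos, delta)
            if len(runs) > 1:  # candidate breaks in this sequence: refine it
                pending.extend(runs)
                break
        else:  # a delta-chain in both sequences: it is a team
            teams.append(candidate)
    return teams


if __name__ == "__main__":
    print(sorted(sorted(t) for t in gene_teams(list("abxycd"), list("badc"), 1)))
    # [['a', 'b'], ['c', 'd']]
```

With δ = 1, the shared genes a, b, c, d split at the gap between b (position 1) and c (position 4) in the first sequence, giving the teams {a, b} and {c, d}; with δ = 3 that gap is tolerated and all four genes form one team.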

Contents
Abstract
Acknowledgement
Introduction
  1 Related work
  2 Summary of results
  3 Organization of the dissertation
Notation and Definitions
An O(C lg^2 n)-Time Algorithm for the Gene Team Problem
  1 The algorithm
  2 Time complexity
An O(C lg n)-Time Algorithm for the Gene Team Problem
  1 The algorithm
  2 An O(mn) time bound
  3 An O(N_out lg n) time bound
Reducing the Working Space and Extension to k Sequences
  1 Reducing the working space
  2 Extension to k sequences
Experimental Results
Conclusion and Future Work
References

[1] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," in Proceedings of the 20th International Conference on Very Large Data Bases, 1994, pp. 487-499.
[2] M. P. Béal, A. Bergeron, S. Corteel, and M. Raffinot, "An algorithmic view of gene teams," Theoretical Computer Science, vol. 320, no. 2-3, pp. 395-418, 2004.
[3] A. Bergeron, Y. Gingras, and C. Chauve, "Formal models of gene clusters," in I. Mandoiu and A. Zelikovsky, Eds., Bioinformatics Algorithms: Techniques and Applications, Wiley, New York, 2008, ch. 8, pp. 177-202.
[4] A. Bergeron and J. Stoye, "On the similarity of sets of permutations and its applications to genome comparison," Journal of Computational Biology, vol. 13, pp. 1340-1354, 2006.
[5] G. Blin and J. Stoye, "Finding nested common intervals efficiently," Journal of Computational Biology, vol. 17, no. 9, pp. 1183-1194, 2010.
[6] T. Dandekar, B. Snel, M. Huynen, and P. Bork, "Conservation of gene order: a fingerprint for proteins that physically interact," Trends in Biochemical Sciences, vol. 23, pp. 324-328, 1998.
[7] G. Didier, "Common intervals of two sequences," Lecture Notes in Computer Science, vol. 2812, pp. 17-24, 2003.
[8] M. D. Ermolaeva, O. White, and S. L. Salzberg, "Prediction of operons in microbial genomes," Nucleic Acids Research, vol. 29, no. 5, pp. 1216-1221, 2001.
[9] X. He and M. H. Goldwasser, "Identifying conserved gene clusters in the presence of homology families," Journal of Computational Biology, vol. 12, no. 6, pp. 638-656, 2005.
[10] S. Heber and J. Stoye, "Finding all common intervals of k permutations," Lecture Notes in Computer Science, vol. 2089, pp. 207-218, 2001.
[11] C.-C. Kuo, GeneralGTF, http://venus.cs.nthu.edu.tw/~superior/GeneralGTF.html
[12] S. Kim, J.-H. Choi, A. Saple, and J. Yang, "A hybrid gene team model and its application to genome analysis," Journal of Bioinformatics and Computational Biology, vol. 4, no. 2, pp. 171-196, 2006.
[13] W. C. Lathe III, B. Snel, and P. Bork, "Gene context conservation of a higher order than operons," Trends in Biochemical Sciences, vol. 25, pp. 474-479, 2000.
[14] J. Lawrence, "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes," Current Opinion in Genetics & Development, vol. 9, no. 6, pp. 642-648, 1999.
[15] X. Lin, X. He, and D. Xin, "Detecting gene clusters under evolutionary constraint in a large number of genomes," Bioinformatics, vol. 25, no. 5, pp. 571-577, 2009.
[16] N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot, "Gene teams: a new formalization of gene clusters for comparative genomics," Computational Biology and Chemistry, vol. 27, no. 1, pp. 59-67, 2003.
[17] R. Overbeek, M. Fonstein, M. D’Souza, G. D. Pusch, and N. Maltsev, "The use of gene clusters to infer functional coupling," Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 6, pp. 2896-2901, 1999.
[18] S. Rahmann and G. W. Klau, "Integer linear programs for discovering approximate gene clusters," Lecture Notes in Bioinformatics, vol. 4175, pp. 298-309, 2006.
[19] T. Schmidt and J. Stoye, "Quadratic time algorithms for finding common intervals in two and more sequences," Lecture Notes in Computer Science, vol. 3109, pp. 347-359, 2004.
[20] B. Snel, P. Bork, and M. A. Huynen, "The identification of functional modules from the genomic association of genes," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 9, pp. 5890-5895, 2002.
[21] T. Uno and M. Yagiura, "Fast algorithms to enumerate all common intervals of two permutations," Algorithmica, vol. 26, no. 2, pp. 290-309, 2000.
[22] B.-F. Wang and C.-H. Lin, "Improved algorithms for finding gene teams and constructing gene team trees," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1258-1272, 2011.
[23] B.-F. Wang, C.-H. Lin, and I-T. Yang, "Constructing a gene team tree in almost O(n lg n) time," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 1, pp. 142-153, 2014.
[24] M. Zhang and H. W. Leong, "Gene team tree: a hierarchical representation of gene teams for all gap lengths," Journal of Computational Biology, vol. 16, no. 10, pp. 1383-1398, 2009.

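Full-text availability: not authorized for public release (on-campus network)
Full-text availability: not authorized for public release (off-campus network)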
