Graduate student: | 彭千華 Chien-Hua Peng |
---|---|
Thesis title: | A Systematical Approach for Discovering Gene Regulatory Binding Motifs in Silico 以系統化的方法預測基因調控序列 |
Advisor: | 唐傳義 Chuan-Yi Tang |
Oral defense committee: | |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2008 |
Graduation academic year: | 96 (ROC calendar) |
Language: | English |
Number of pages: | 76 |
Chinese keywords: | transcription element, backup gene, pattern discovery, systems biology, gene regulation |
The identification of regulatory elements recognized by transcription factors and chromatin remodeling factors is essential to studying the regulation of gene expression. When no auxiliary data, such as orthologous sequences or expression profiles, are used, the accuracy of most motif discovery tools is strongly influenced by motif degeneracy and the length of the input sequences. Since suitable auxiliary data may not always be available, more work is needed to improve the performance of tools that identify transcription elements in metazoans. A non-alignment-based algorithm, MotifSeeker, is proposed to improve the accuracy of discovering degenerate motifs. MotifSeeker exploits the property that the variable sites of transcription elements are usually position-specific, which reduces its exposure to noise and thereby improves the efficiency and accuracy of motif identification. Using data fusion, the ranking process integrates two measures of motif significance into a single, more robust significance measure. Test results on synthetic data show that the accuracy of MotifSeeker is less sensitive to motif degeneracy and to the length of the input sequences. Furthermore, MotifSeeker has been tested on a well-known benchmark, yielding a correlation coefficient of 0.262, which compares favorably with those of other tools. The applicability of MotifSeeker to biological data is further demonstrated on regulons of S. cerevisiae and on liver-specific genes with experimentally verified regulatory elements.
To investigate transcriptional reprogramming between backup paralogs, we use a systematic approach to find clusters of co-regulated genes. We also apply high-throughput genome-wide ChIP-chip data and MotifSeeker to identify transcription regulators shared by both members of a backup gene pair. The results show that transcriptional reprogramming is one of the duplicate-associated genetic buffering mechanisms, but other mechanisms beyond the transcriptional level appear to exist.
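The abstract reports a benchmark correlation coefficient of 0.262 without stating its definition. As a minimal sketch, assuming it is a Matthews-style coefficient computed over per-nucleotide predictions (a common choice when assessing motif finders), the quantity can be computed as follows. The function name and the counts passed to it are hypothetical and are not taken from the thesis.

```python
import math


def nucleotide_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews-style correlation coefficient over nucleotide positions.

    tp: positions in both the known and the predicted sites
    tn: positions in neither
    fp: positions predicted but not in a known site
    fn: positions in a known site but not predicted
    """
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a marginal is empty; report 0.0 by convention here.
        return 0.0
    return (tp * tn - fn * fp) / denom


# Hypothetical counts, for illustration only.
perfect = nucleotide_cc(tp=10, tn=10, fp=0, fn=0)
chance = nucleotide_cc(tp=5, tn=5, fp=5, fn=5)
inverted = nucleotide_cc(tp=0, tn=0, fp=10, fn=10)
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), which gives a scale for reading a score such as 0.262.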
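Finding and identifying gene regulatory sequences remains an important topic in the study of gene expression. Many regulatory sequences, such as transcription factor binding sites and methylation sites, have been found to play important roles in gene expression. The accuracy of most existing methods is strongly affected by the total length of the input sequences and by the degree of degeneracy of the regulatory sequences. This study exploits the property that specific positions of a regulatory sequence must be conserved (positional specificity) to design an efficient algorithm, combined with post-processing based on orthologous co-regulation (multiple species, multiple genes) and tissue-specific genes. Using the principle of data fusion, it also develops a ranking method that fuses two characteristics of gene regulatory sequences, so that the most statistically significant candidates can be selected from the many possible regulatory sequences. In tests on simulated promoter data, the accuracy of this method is less affected by promoter sequence length and by the degree of degeneracy of the regulatory sequences. The method also performs well in tests on groups of co-regulated yeast genes. With post-processing that uses auxiliary data, it also finds several experimentally verified gene regulatory sequences in human liver-specific genes.
From a systems biology perspective, this study also combines protein-protein interaction data with gene expression data to propose a method for finding groups of co-regulated genes, and applies the regulatory sequence identification method to explore transcriptional regulation between backup genes. The results show that transcriptional reprogramming is indeed one possible backup mechanism, but regulation at other levels very likely exists as well.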
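The abstract says the ranking step fuses two measures of motif significance but does not name the measures or the combination rule. The sketch below illustrates one standard form of data fusion, average-rank combination, purely as an assumption. The motif strings, the two score dictionaries, and the function names are all hypothetical and are not MotifSeeker's actual measures or API.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Assign rank 1 to the highest score; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda motif: (-scores[motif], motif))
    return {motif: position + 1 for position, motif in enumerate(ordered)}


def fuse_by_average_rank(score_a: Dict[str, float],
                         score_b: Dict[str, float]) -> List[str]:
    """Order motifs by the mean of their ranks under the two measures."""
    if score_a.keys() != score_b.keys():
        raise ValueError("both measures must score the same set of motifs")
    rank_a = ranks_from_scores(score_a)
    rank_b = ranks_from_scores(score_b)
    fused = {motif: (rank_a[motif] + rank_b[motif]) / 2 for motif in score_a}
    # A lower fused rank means the motif is more significant overall.
    return sorted(fused, key=lambda motif: (fused[motif], motif))


# Hypothetical candidate motifs scored by two made-up significance measures.
measure_one = {"TGACTC": 9.1, "CACGTG": 7.4, "AAAAAA": 8.0}
measure_two = {"TGACTC": 0.80, "CACGTG": 0.90, "AAAAAA": 0.20}
ranking = fuse_by_average_rank(measure_one, measure_two)
```

Combining ranks rather than raw scores avoids having to put the two measures on a common scale, which is one reason rank combination is often preferred when the measures have very different distributions.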