| Graduate student: | 許承偉 Chen-Wei Hsu |
|---|---|
| Thesis title: | Modeling Transcription Start Site and Promoter Elements with Dependency Graphs and Their Expanded Bayesian Networks |
| Advisor: | 呂忠津 Chung-Chin Lu |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 |
| Language: | English |
| Number of pages: | 39 |
| Keywords (Chinese): | promoter, TATA box |
| Keywords (English): | TATA, promoter, prediction, transcription |
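Thanks to the early completion of the sequencing portion of the Human Genome Project (HGP), a large amount of raw human DNA sequence is now available online, and many programs have been developed to analyze these sequences. At the 5' end of a gene lies a region that contains the transcription start site, called the promoter. Its main function is to regulate gene expression, and analyzing it can improve the accuracy of gene finding. Many programs already predict promoters and transcription start sites, but so far their results are poor, mainly because too many incorrect predictions produce too many false signals. More accurate and more efficient prediction programs are needed.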
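In this thesis, we first use the chi-square distribution to build a dependency graph for the transcription start site as the basic model, and then process the graph further with an expanded Bayesian network. During expansion we do not limit the number of times each node of the graph may appear, so that the dependencies among nodes are fully captured. Because the promoter contains several different promoter elements, such as the TATA box, the GC box, and the CAAT box, we build a similar model for each element with the same method and integrate these elements to make the final prediction of the transcription start site. To verify the effectiveness of our model, we selected four of the best-known programs available online for comparison. In testing, our program achieved higher specificity while also having higher sensitivity than the other programs, which shows that our method performed best among those compared.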
With the completion of the Human Genome Project (HGP), a large amount of raw genomic DNA sequence
data is now available, and hundreds of programs have been developed to analyze these sequences. A
promoter is a region usually located at the 5' flanking end of a gene that encompasses the
transcription start site. The promoter plays an important role in gene regulation, and detecting
the promoter region could help improve the accuracy of gene finding. Several in silico approaches
also predict the promoter region or the transcription start site, but the performance of these
programs is usually unsatisfactory because the number of false positives is too high.
In this thesis, we first develop a dependency graph as the basic model for the transcription start
site using the chi-square test, and then expand this graph into a Bayesian network, allowing the
nucleotide at each position to appear more than once so as to capture the inter-dependencies while
avoiding overfitting. Because the promoter region contains more than one signal, we also construct
a dependency graph and its expanded Bayesian network to model the TATA box. The prediction of the
TATA box is integrated into the prediction of the transcription start site. The results show that
our method performs best in a comparison with four of the best-known programs available on the
Internet.
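
The dependency-graph step can be illustrated with a minimal sketch. It assumes aligned, equal-length sequences centered on the transcription start site and tests every pair of positions for independence with Pearson's chi-square statistic, adding an edge when the statistic exceeds the 0.05 critical value. The function names `chi_square` and `dependency_graph`, the 0.05 significance level, and the toy data are assumptions made for illustration; the abstract does not state the thesis's actual thresholds or implementation.

```python
from collections import Counter
from itertools import combinations

# Chi-square critical values at significance level 0.05 (an assumed level),
# keyed by degrees of freedom; only the values reachable with 4 nucleotides.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 6: 12.592, 9: 16.919}


def chi_square(seqs, i, j):
    """Return (statistic, degrees of freedom) for independence of positions i and j."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    row = Counter(s[i] for s in seqs)  # nucleotide counts at position i
    col = Counter(s[j] for s in seqs)  # nucleotide counts at position j
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (pair_counts[(a, b)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def dependency_graph(seqs):
    """Return the set of position pairs (i, j), i < j, judged dependent."""
    edges = set()
    for i, j in combinations(range(len(seqs[0])), 2):
        stat, dof = chi_square(seqs, i, j)
        # dof == 0 means one position is constant, so no test is possible.
        if dof > 0 and stat > CRITICAL_05[dof]:
            edges.add((i, j))
    return edges


# Toy data: positions 0 and 1 always agree, position 2 varies independently.
toy = ["AAC", "AAG", "CCC", "CCG"] * 10
print(dependency_graph(toy))  # {(0, 1)}
```

The resulting edge set is only the undirected dependency graph; expanding it into a Bayesian network, as the thesis describes, is a separate step not shown here.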