
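Graduate student: 許承偉 (Chen-Wei Hsu)
Thesis title: 利用關聯圖及其展開貝氏網路建立轉錄啟動點和啟動子元素之模型
Modeling Transcription Start Site and Promoter Elements with Dependency Graphs and Their Expanded Bayesian Networks
Advisor: 呂忠津 (Chung-Chin Lu)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 39
Keywords (Chinese): 啟動子, 塔塔盒子 (promoter, TATA box)
Keywords (English): TATA, promoter, prediction, transcription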
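  • Thanks to the early completion of the sequencing portion of the Human Genome Project (HGP), large amounts of raw human DNA sequence are now available online, and many programs have been developed to analyze these sequences. At the 5' end of a gene lies a region called the promoter, which contains the transcription start site. The promoter's main function is to regulate gene expression, and analyzing it can also improve the accuracy of gene finding. Many programs already predict promoters and transcription start sites, but their results so far are poor, mainly because they make too many wrong predictions and therefore produce too many false signals. What is needed is a more accurate and more efficient prediction program.

    In this thesis, we first use the chi-square distribution to build a dependency graph for the transcription start site as the basic model, and then process the graph further by expanding it into a Bayesian network. During expansion we do not limit the number of times each node may appear in the graph, so that the dependencies among nodes are captured fully. Because the promoter contains several different elements, such as the TATA box, the GC box, and the CAAT box, we build a similar model for each promoter element with the same method and integrate these elements to produce the final transcription start site prediction. To verify the model, we selected four of the best-known programs available online for comparison. In our tests, our program kept a higher specificity even while operating at a higher sensitivity than the other programs, which indicates that our method performed best among the programs compared.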


    With the completion of the Human Genome Project (HGP), a large amount of raw genomic DNA sequence
    data is now available, and hundreds of programs have been developed to analyze these sequences. The
    promoter is a region usually located at the 5' flanking end of a gene, and it encompasses the
    transcription start site. The promoter plays an important role in gene regulation, and detecting
    the promoter region could help improve the accuracy of gene finding. Several in silico approaches
    already predict the promoter region or the transcription start site, but their performance is
    usually unsatisfactory because the number of false positives is too high.

    In this thesis, we first develop a dependency graph as the basic model for the transcription start
    site using the chi-square test. We then expand this graph into a Bayesian network, allowing the
    nucleotide at each position to appear more than once so that inter-dependencies are captured while
    overfitting is avoided. Because the promoter region contains more than one signal, we also
    construct a dependency graph and its expanded Bayesian network to model the TATA box, and we
    integrate the TATA box prediction into the prediction of the transcription start site. The results
    show that our method performs best in a comparison with four of the best-known programs available
    on the Internet.
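
    The dependency-graph step described in the abstract can be illustrated with a minimal sketch. It
    assumes aligned, equal-length sequences and draws an edge between two positions when a chi-square
    test of independence on their nucleotide contingency table exceeds the critical value. The
    function names, the toy data, and the 0.05 significance level are illustrative assumptions; the
    record does not state the thesis's exact procedure or thresholds.

```python
from collections import Counter
from itertools import combinations

# Chi-square critical values at significance level 0.05 (an assumed level),
# indexed by degrees of freedom.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}


def chi_square_statistic(seqs, i, j):
    """Return (statistic, degrees of freedom) for alignment positions i and j."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    row_counts = Counter(s[i] for s in seqs)
    col_counts = Counter(s[j] for s in seqs)
    stat = 0.0
    for a, row_total in row_counts.items():
        for b, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = pair_counts.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    # Only nucleotides actually observed at each position count toward the table size.
    dof = (len(row_counts) - 1) * (len(col_counts) - 1)
    return stat, dof


def dependency_graph(seqs):
    """Return the set of position pairs judged dependent at the 0.05 level."""
    length = len(seqs[0])
    edges = set()
    for i, j in combinations(range(length), 2):
        stat, dof = chi_square_statistic(seqs, i, j)
        # dof == 0 means one position is constant, so dependence cannot be tested.
        if dof > 0 and stat > CRITICAL_005[dof]:
            edges.add((i, j))
    return edges


# Hypothetical toy alignment: positions 0 and 1 always agree, position 2 varies freely.
toy = ["AAC", "AAG", "AAT", "AAA", "CCA", "CCC", "CCG", "CCT",
       "GGA", "GGC", "GGG", "GGT", "TTA", "TTC", "TTG", "TTT"]
edges = dependency_graph(toy)
```

    On the toy alignment only positions 0 and 1 are linked. On real data the expected cell counts
    need to be large enough for the chi-square approximation to hold, which this toy example ignores.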

    1 Introduction
      1.1 Gene Prediction
      1.2 Promoter
      1.3 TSS Prediction
    2 The Biology of Eukaryotic Promoter
      2.1 DNA and Genes
      2.2 The RNA Polymerase II Machinery
      2.3 The RNA Polymerase II Core Promoter
    3 Method
      3.1 Chi-Square Test
      3.2 Bayesian Networks
      3.3 Model and Algorithms
        3.3.1 Dependency Graphs
        3.3.2 Expanded Bayesian Networks
      3.4 G+C Content
      3.5 Detection of Promoter Elements
    4 Datasets
      4.1 TATA Datasets
      4.2 Transcription Start Site Datasets
    5 Results
      5.1 Measures for Predictive Accuracy
      5.2 Cross-validation
      5.3 Comparison
    6 Conclusion
    Reference


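    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)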