
Author: 何瓊雯 (Chiung-Wen Ho)
Thesis title: 使用關聯圖及其貝氏網路展開建立人類基因3'端之隨機文法
A Stochastic Grammar of 3' Terminal of Homo Sapiens Genes with Dependency Graphs and Their Expanded Bayesian Networks
Advisor: 呂忠津 (Chung-Chin Lu)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar; 2004-2005)
Language: Chinese
Pages: 57
Keywords (Chinese): 關聯圖, 貝氏網路展開, 多聚腺苷酸化, 基因
Keywords (English): dependency graph, expanded Bayesian network, polyadenylation, genes
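    One of the challenging problems in bioinformatics today is to annotate the specific structure of every gene among the three billion base pairs of the human genome. The polyadenylation site is a characteristic structure located at the end of a gene. At the polyadenylation site, the pre-mRNA undergoes endonucleolytic cleavage, and a poly(A) tail is then added at the cleaved position; poly(A) tails are found at the ends of almost all mature eukaryotic mRNAs.

    Cleavage/polyadenylation of the pre-mRNA depends on at least two regulatory signals: (1) the polyadenylation signal (PAS), located about 10 to 30 nucleotides upstream of the polyadenylation site and containing the highly conserved hexamer AAUAAA (or the common variant AUUAAA); and (2) the downstream element (DE), located about 20 to 40 nucleotides downstream of the polyadenylation site and consisting of a less well-defined sequence known to be rich in uracil (U) or in guanine and uracil (GU).

    In this thesis, we take the polyadenylation site as the reference point and divide the 3' end of human genes into several states according to relative position. We use dependency graphs and their expanded Bayesian networks to build a mathematical model of each state, capturing the feature dependencies within it, and we chain the states together to form a stochastic grammar of the 3' end of human genes. We then test how well this stochastic grammar predicts the 3' ends of human genes, using gene data obtained by integrating the GenBank and Ensembl databases.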
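    The positional constraints on the two signals described above can be illustrated with a small scan around a known cleavage site. This is only a sketch, not the method of the thesis: the function names `find_pas` and `dse_gu_fraction` are hypothetical, distances are assumed to be measured from the first base of the hexamer to the site index, and the GU fraction is a crude stand-in for "U- or GU-rich".

```python
# Hexamers named in the abstract: the canonical signal and its common variant.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def find_pas(rna: str, site: int, lo: int = 10, hi: int = 30):
    """Return (start, hexamer) pairs whose start lies lo..hi nt upstream of `site`.

    Assumption: distance is counted from the hexamer's first base to the
    0-based index `site` of the cleavage/polyadenylation position.
    """
    hits = []
    for start in range(max(0, site - hi), site - lo + 1):
        hexamer = rna[start:start + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((start, hexamer))
    return hits


def dse_gu_fraction(rna: str, site: int, lo: int = 20, hi: int = 40) -> float:
    """Fraction of G or U bases in the window lo..hi nt downstream of `site`."""
    window = rna[site + lo:site + hi + 1]
    if not window:
        return 0.0
    return sum(base in "GU" for base in window) / len(window)


if __name__ == "__main__":
    # Toy pre-mRNA: a PAS 20 nt upstream of the site and a GU-rich stretch
    # starting 20 nt downstream of it.
    toy = "C" * 20 + "AAUAAA" + "C" * 14 + "C" * 20 + "GU" * 10 + "U"
    print(find_pas(toy, site=40))
    print(dse_gu_fraction(toy, site=40))
```

    A real detector would score these windows with a trained model rather than exact matching; the sketch only makes the stated distance ranges concrete.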


    In bioinformatics, one of the challenging issues is to determine the specific structure of each
    gene among the 3 billion base pairs of human DNA sequence. The polyadenylation site is a
    specific feature at the terminus of a gene; it involves the endonucleolytic cleavage of the
    pre-mRNA followed by the addition of a poly(A) tail, which is found at the 3' terminus of
    the majority of mRNAs.
    Factors involved in cleavage and polyadenylation have to recognize the associated signals, i.e., the
    polyadenylation signal (PAS) and the downstream element (DSE). The PAS appears
    10 to 30 nucleotides upstream of the cleavage and polyadenylation site and contains the highly
    conserved hexamer AAUAAA or its common variant AUUAAA in pre-mRNAs. The DSE lies
    20 to 40 nucleotides downstream of the cleavage and polyadenylation site and consists of a
    much less conserved U- or GU-rich sequence.
    In this thesis, we construct a stochastic grammar of the 3' terminus of human genes by
    establishing the dependency graphs and their expanded Bayesian networks of the features
    in this region. Furthermore, we compare the performance of this stochastic grammar
    with that of the PAS detector provided by earlier researchers.
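    The idea of chaining position-defined states into a grammar and then decoding a sequence can be sketched with a plain hidden Markov model and the Viterbi algorithm, which the thesis lists among its methods. The sketch below is a simplification under stated assumptions: the three states, all probabilities, and the function name `viterbi` are invented for illustration, and each state emits bases independently, whereas the thesis models each state with dependency graphs and expanded Bayesian networks.

```python
import math

# Hypothetical left-to-right states and toy probabilities (not from the thesis).
STATES = ["upstream", "pas_like", "dse_like"]
START = {"upstream": 1.0, "pas_like": 0.0, "dse_like": 0.0}
TRANS = {
    "upstream": {"upstream": 0.9, "pas_like": 0.1, "dse_like": 0.0},
    "pas_like": {"upstream": 0.0, "pas_like": 0.8, "dse_like": 0.2},
    "dse_like": {"upstream": 0.0, "pas_like": 0.0, "dse_like": 1.0},
}
EMIT = {
    "upstream": {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},
    "pas_like": {"A": 0.70, "C": 0.05, "G": 0.05, "U": 0.20},
    "dse_like": {"A": 0.05, "C": 0.05, "G": 0.30, "U": 0.60},
}


def _log(p: float) -> float:
    """Log-probability, mapping zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq: str):
    """Return the most probable state path for `seq` under the toy model."""
    if not seq:
        return []
    # Initialise with start and first-emission log-probabilities.
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(TRANS[p][s]))
            new_score[s] = score[prev] + _log(TRANS[prev][s]) + _log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("CGCGAAAAAAUUUUUU"))
```

    Working in log space avoids numerical underflow on long sequences; replacing the per-base emission tables with a richer per-state model changes only how the emission term is computed, not the decoding recursion.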

    Contents
    Acknowledgment
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Background
      2.1 Mechanism of pre-mRNA polyadenylation
        2.1.1 Polyadenylation Signal and Downstream Element
        2.1.2 Cleavage of a pre-mRNA
        2.1.3 Initiation and Elongation of Polyadenylation
      2.2 Famous Programs
    3 Methods
      3.1 Definition of States
      3.2 Dependency Graph
      3.3 Expanded Bayesian Network
      3.4 Viterbi Decoding Algorithm
    4 Results
      4.1 Data
        4.1.1 Data for PAS Detection
      4.2 Result
        4.2.1 Accuracy Measurement
        4.2.2 PAS Detection
        4.2.3 Viterbi Decoding
    5 Conclusion
    A Appendix
      A.1 Chi-square Test
      A.2 Bayesian Networks
    Bibliography


    Full text: not authorized for public release (on-campus or off-campus network)
