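Graduate student: | 張兆中 Chao-Chung Chang |
---|---|
Thesis title: | 使用關聯圖及其貝氏網路展開實現多聚腺苷酸化點之模型 Modeling Polyadenylation Signal with Dependency Graphs and Their Expanded Bayesian Networks |
Advisor: | 呂忠津 Chung-Chin Lu |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2004 |
Academic year of graduation: | 92 (ROC calendar) |
Language: | Chinese |
Number of pages: | 44 |
Chinese keywords: | 多聚腺苷酸化 (polyadenylation), 關聯圖 (dependency graph), 貝氏網路展開 (expanded Bayesian network) |
Foreign-language keywords: | polyadenylation, dependency graph, expanded Bayesian network |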
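Predicting novel human genes is an important topic in bioinformatics today; its ultimate goal is to annotate the specific structure of every gene within the three billion base pairs of the human genome. The polyadenylation site is a structure at the end of a gene. At this site, the pre-messenger RNA (pre-mRNA) undergoes a precise endonucleolytic cleavage, and a poly(A) tail is formed at the cleaved position. Poly(A) tails are found at the 3' end of nearly all mature eukaryotic mRNAs; because of this universality, we chose the polyadenylation site as the first step in predicting human gene structure.
Recognition of the polyadenylation site is governed by at least two signals: (1) the polyadenylation signal (PAS), located about 10 to 30 nucleotides upstream of the cleavage/polyadenylation site, which contains the highly conserved hexamer AAUAAA (or its common variant AUUAAA); and (2) the downstream element (DE), located about 20 to 40 nucleotides downstream of the cleavage/polyadenylation site, a less well-characterized sequence known to be rich in U or G-U.
In this thesis, we wrote a program for predicting human polyadenylation sites. The method divides the polyadenylation site into two parts, the PAS and the DE signal, and builds a mathematical model with dependency graphs and their expanded Bayesian networks. We then used human gene data from GenBank to compare the prediction accuracy of our program with that of two well-known programs, POLYAH and ERPIN; the results show that our program performs best.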
One of the important current problems in bioinformatics is the prediction of novel genes in
the human genome, where the goal is to annotate the specific structure of every gene among
the three billion base pairs of the genome. The polyadenylation site, a structure at the end
of a gene, is where a precise endonucleolytic cleavage of the pre-mRNA occurs, followed by
synthesis of the poly(A) tail, which is found at the 3' end of nearly every mature eukaryotic mRNA.
Recognition of the polyadenylation site is governed by at least two signals. The first is the
polyA signal (PAS), located 10-30 nucleotides upstream of the cleavage/polyadenylation site,
which contains the highly conserved hexamer AAUAAA (or the common variant AUUAAA). The second
is the downstream element (DE), located 20-40 nucleotides downstream of the
cleavage/polyadenylation site, which consists of a much less well-characterized U-rich or
G-U-rich sequence.
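The positional constraints on the two signals can be illustrated with a naive scan. The following Python sketch is not the thesis's program; it only locates exact AAUAAA/AUUAAA hexamers, derives the window in which a cleavage site could lie, and measures the G/U fraction of the region where a downstream element would be expected. The function names are hypothetical, and measuring the 10-30 nucleotide gap from the end of the hexamer is an assumption made here, since the abstract does not say which end the distance refers to.

```python
# Illustrative sketch only: exact-match scan for the two PAS hexamers and
# the distance windows quoted in the abstract. All names are hypothetical.

PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def normalize(seq):
    """Upper-case the sequence and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def find_pas(seq):
    """Return (start_index, hexamer) for each PAS hexamer, 0-based."""
    rna = normalize(seq)
    hits = []
    for i in range(len(rna) - 5):
        hexamer = rna[i:i + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((i, hexamer))
    return hits


def cleavage_window(pas_start, seq_len, min_gap=10, max_gap=30):
    """Inclusive index range where a cleavage site could lie.

    Assumption: the gap is counted from the first base after the hexamer.
    Returns None when the window falls outside the sequence.
    """
    pas_end = pas_start + 6
    first = pas_end + min_gap
    last = min(pas_end + max_gap, seq_len - 1)
    return (first, last) if first <= last else None


def gu_fraction(seq, cleavage_pos, min_gap=20, max_gap=40):
    """Fraction of G or U bases 20-40 nt downstream of a cleavage position."""
    rna = normalize(seq)
    window = rna[cleavage_pos + min_gap:cleavage_pos + max_gap + 1]
    if not window:
        return 0.0
    return sum(base in "GU" for base in window) / len(window)


# Toy sequence: 5 C, one AAUAAA, 20 C, then a G/U-rich stretch.
toy = "CCCCC" + "AAUAAA" + "C" * 20 + "GUGUUUGUUU" * 5
pas_hits = find_pas(toy)                    # [(5, 'AAUAAA')]
window = cleavage_window(5, len(toy))       # (21, 41)
downstream_gu = gu_fraction(toy, 21)        # 1.0
```

An exact-match scan like this finds many hexamers that are not functional signals, which is why a statistical model of the surrounding sequence is needed for actual prediction.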
In this thesis, we present a program that predicts human polyadenylation sites by detecting
the PAS and DE signals with dependency graphs and their expanded Bayesian networks. We then
compare its prediction accuracy with that of the well-known programs POLYAH and ERPIN and
show that our program achieves the best results on the polyadenylation dataset from GenBank.
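The abstract does not describe how the dependency graphs are constructed. Purely as an illustration of the general idea of dependence between positions in aligned signal sequences, the Python sketch below computes the mutual information between two alignment columns and lists position pairs whose value exceeds a threshold. Mutual information, the threshold, and the function names are assumptions chosen for this example and should not be read as the thesis's actual procedure.

```python
# Illustrative sketch only: pairwise position dependence in aligned sequences,
# measured with mutual information. This is an assumed stand-in technique.
import math
from collections import Counter


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between columns i and j of equal-length seqs."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


def dependency_edges(seqs, threshold=0.5):
    """Return (i, j, mi) for every column pair with mi >= threshold."""
    length = len(seqs[0])
    edges = []
    for i in range(length):
        for j in range(i + 1, length):
            mi = mutual_information(seqs, i, j)
            if mi >= threshold:
                edges.append((i, j, mi))
    return edges


# Toy alignment: columns 0 and 1 always agree, column 2 varies independently.
toy_alignment = ["AAG", "AAC", "UUG", "UUC"]
toy_edges = dependency_edges(toy_alignment)  # one edge, between columns 0 and 1
```

On real data, estimates like these need enough aligned sequences to be reliable, since mutual information computed from small samples is biased upward.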