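Graduate student: |
袁偉豪 Wei-Hao Yuan |
---|---|
Thesis title: |
Extracting Transcription Factor Binding Sites from Unaligned Gene Sequences with Statistical Models |
Advisor: |
呂忠津
Chung-Chin Lu |
Oral defense committee: | |
Degree: |
Master |
Department: |
College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2006 |
Academic year of graduation: | 94 (ROC calendar) |
Language: | English |
Number of pages: | 35 |
Chinese keywords: | 轉錄因子連結點 (transcription factor binding site), 統計模型 (statistical model), 貝氏網路 (Bayesian networks) |
English keywords: | transcription factor binding site, statistical model, Bayesian networks |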
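Protein synthesis is among the most important steps in an organism's physiological processes. Research has shown that DNA sequences, through transcription and translation, yield the protein products that these processes require. Transcription and translation are in turn regulated by specific transcription factor binding sites, which determine whether a DNA sequence is expressed as its corresponding protein product. Transcription factor binding sites therefore play a key role in regulating an organism's physiological processes.
Identifying the transcription factor binding sites of different species is an important research direction in bioinformatics. Recent technical advances make it possible to find gene sequences regulated by a transcription factor through genome-wide location analysis based on DNA microarray hybridization. Unfortunately, this experimental method identifies only an approximate regulatory region and cannot pinpoint the actual binding site positions. We therefore aim to locate the true binding sites precisely with statistical methods.
In this thesis, we wrote a program that finds the binding sites of a specific transcription factor in gene sequences obtained from genome-wide location analysis. The method uses the statistics of a binomial probability model to find the most significant sequence patterns, which establish the initial search positions, and then combines dependency graphs, their expanded Bayesian networks, and Gibbs sampling to search iteratively for the most probable transcription factor binding sites. We then collected known transcription factor binding site data and compared our results with those of other methods. In this comparison, our program performed better than the other methods.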
Transcription factor binding sites (motifs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA interaction loci at a resolution of 1-2 kilobases. To find the exact binding motifs, a computational method is needed that examines the ChIP-array binding sequences and searches for possible motifs representing the transcription factor binding sites. In this thesis, we design a program that finds accurate motif sites in the yeast genome using dependency graphs and their expanded Bayesian networks. The program incorporates a binomial probability model to build significant initial motif sets. Finally, we compare our results with those obtained from well-known programs and show that our program outperforms these programs in consistency with known specificities.
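The abstract does not spell out how the binomial probability model scores candidate patterns, so the following is only a minimal sketch of one common way to do it, not the thesis's actual implementation. It counts every k-mer in a set of sequences and ranks them by the binomial tail probability of seeing at least that many occurrences by chance. The function names `binomial_tail` and `rank_kmers`, the uniform background probability of 0.25 per base, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_kmers(sequences, k):
    """Rank k-mers by how unlikely their observed count is under the background.

    Assumption (not from the thesis): bases are independent and uniform,
    so any given k-mer occurs at a position with probability 0.25 ** k.
    """
    counts = Counter()
    positions = 0  # total number of k-mer start positions examined
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            positions += 1
    p = 0.25 ** k
    scored = [(binomial_tail(positions, c, p), kmer, c) for kmer, c in counts.items()]
    return sorted(scored)  # smallest tail probability (most significant) first


if __name__ == "__main__":
    # Hypothetical toy input; real input would be the ChIP-array bound regions.
    toy = ["ACGTGCACGTG", "TTCACGTGAA", "GGCACGTGCC"]
    for p_value, kmer, count in rank_kmers(toy, 6)[:3]:
        print(kmer, count, p_value)
```

The most significant k-mers from such a ranking could serve as starting positions for an iterative search such as Gibbs sampling. The exact sum over the tail is only practical for small inputs; at genome scale a background estimated from the data and a numerically stable tail computation would be needed.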