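Field | Value |
---|---|
Graduate student | 廖一憲 Liao, Yi-Sian |
Thesis title | 從未對齊之基因序列中預測出轉錄子結合點位置 Prediction of Transcription Factor Binding Sites from Unaligned Gene Sequences |
Advisor | 呂忠津 Lu, Chung-Chin |
Oral examination committee | |
Degree | Master |
Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
Year of publication | 2009 |
Academic year of graduation | 97 (ROC calendar) |
Language | English |
Number of pages | 30 |
Chinese keywords | 轉錄因子 (transcription factor) |
English keywords | Transcription factor |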
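Protein synthesis plays an important role in the physiological regulation of living organisms. Earlier research has established that proteins are synthesized by first transcribing DNA into RNA, which is then translated into protein. Both transcription and translation are regulated by many factors. A transcription factor is a protein that binds to a specific nucleotide sequence upstream of a gene and plays a decisive role in activating transcription. Transcription factor binding sites therefore carry important information about gene regulation.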
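Transcription factor binding sites are key information for understanding how gene transcription is regulated. Microarray hybridization of chromatin-immunoprecipitated DNA (ChIP-chip) is a common tool for identifying these sites in gene sequences, but its resolution is only about 1–2 kilobases. Using a computer program to locate the actual binding sites within ChIP-chip results is therefore a practical approach.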
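Our goal is to locate the actual transcription factor binding sites in DNA sequences when the true length of the binding site is unknown. To this end, we designed a computer program. It first uses the ChIP-chip results to select high-likelihood sequences, then applies a binomial distribution model to extract many candidate segments from them. Two ranking scores are then used to select the most likely segments. Finally, we compare our results with those of other programs to verify the program's reliability.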
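The abstract does not spell out the binomial model, so the following is only a minimal sketch of one common way such a model can flag candidate segments, under stated assumptions: bases are i.i.d. and uniform, each k-mer is counted at most once per sequence, and a k-mer becomes a candidate when the number of sequences containing it is improbably high under Binomial(n, p). Because the motif length is unknown, several lengths k are scanned. The function names (`binomial_tail`, `presence_prob`, `candidate_kmers`), the significance level `alpha`, and the planted toy motif are hypothetical and not taken from the thesis.

```python
import random
from math import comb


def binomial_tail(n: int, x: int, p: float) -> float:
    """Return P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(x, n + 1))


def presence_prob(k: int, seq_len: int) -> float:
    """Approximate probability that one fixed k-mer occurs at least once in a
    random sequence of length seq_len with i.i.d. uniform bases.
    Assumption: the seq_len - k + 1 start positions are treated as independent."""
    positions = max(seq_len - k + 1, 0)
    return 1.0 - (1.0 - 0.25**k) ** positions


def candidate_kmers(seqs, k_values, alpha=1e-3):
    """Return (kmer, n_seqs_containing, p_value) for every k-mer, over all
    lengths in k_values, whose presence count is significant at level alpha."""
    n = len(seqs)
    avg_len = round(sum(len(s) for s in seqs) / n)
    results = []
    for k in k_values:  # motif length is unknown, so try several lengths
        counts = {}
        for s in seqs:
            # a set, so each k-mer is counted at most once per sequence
            for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
                counts[kmer] = counts.get(kmer, 0) + 1
        p = presence_prob(k, avg_len)
        for kmer, x in counts.items():
            pval = binomial_tail(n, x, p)
            if pval < alpha:
                results.append((kmer, x, pval))
    return sorted(results, key=lambda r: r[2])  # most significant first


if __name__ == "__main__":
    rng = random.Random(0)
    motif = "TGACTCA"  # hypothetical toy motif planted once in each sequence
    seqs = []
    for _ in range(10):
        bg = "".join(rng.choice("ACGT") for _ in range(100))
        pos = rng.randrange(0, 100 - len(motif))
        seqs.append(bg[:pos] + motif + bg[pos + len(motif):])
    for kmer, x, pval in candidate_kmers(seqs, range(5, 9))[:5]:
        print(kmer, x, f"{pval:.2e}")
```

On the toy data, scanning lengths 5 to 8 also reports the 5-mers and 6-mers contained in the planted motif, because they too occur in every sequence; this is one reason a separate ranking step is needed to choose the most likely segment.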
Transcription factor binding sites (motifs) are valuable information for understanding the regulation of gene transcription. ChIP-chip (chromatin immunoprecipitation followed by microarray hybridization) has become a popular tool for locating motifs in gene sequences. However, ChIP-chip can map the probable binding region only at a resolution of 1–2 kilobases.
Our goal is to find motif binding sites without prior knowledge of the motif length. To this end, we designed a computational program, based on a discriminator and a binomial model, that finds the most probable patterns.
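The abstract names a discriminator and two ranking scores without defining them. As a hypothetical illustration only, the sketch below scores a candidate by how much more often it occurs in bound (ChIP-enriched) sequences than in unbound background sequences, then merges two scores by summing each candidate's rank under each. The log-ratio discriminator, the Laplace pseudocount, the rank-sum rule, and all function names are assumptions for illustration, not the thesis's definitions.

```python
from math import log2


def presence_count(seqs, kmer):
    """Number of sequences that contain kmer at least once."""
    return sum(1 for s in seqs if kmer in s)


def discriminator_score(kmer, pos_seqs, neg_seqs):
    """Hypothetical discriminator: log2 ratio of the Laplace-smoothed fraction
    of bound vs. unbound sequences containing kmer (> 0 means enriched)."""
    f_pos = (presence_count(pos_seqs, kmer) + 1) / (len(pos_seqs) + 2)
    f_neg = (presence_count(neg_seqs, kmer) + 1) / (len(neg_seqs) + 2)
    return log2(f_pos / f_neg)


def rank_by_two_scores(candidates, score_a, score_b):
    """Order candidates by the sum of their ranks under two scores
    (higher score = better rank); equal rank sums are broken alphabetically."""
    def ranks(score):
        ordered = sorted(candidates, key=score, reverse=True)
        return {c: r for r, c in enumerate(ordered)}

    ra, rb = ranks(score_a), ranks(score_b)
    return sorted(candidates, key=lambda c: (ra[c] + rb[c], c))


if __name__ == "__main__":
    bound = ["AATGACTCAAA", "CCTGACTCAGG", "TGACTCATTTT"]  # toy data
    unbound = ["AAAAAAAAAAA", "CCCCCCCCCCC", "GGGGTTTTAAA"]
    ranked = rank_by_two_scores(
        ["AAAA", "CCCC", "TGACTCA"],
        lambda k: discriminator_score(k, bound, unbound),
        lambda k: presence_count(bound, k),
    )
    print(ranked)
```

Rank aggregation is used here because it combines scores on different scales (a log ratio and a raw count) without choosing weights; other combination rules would be equally plausible.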
We compare its performance with that of the constraint-less version of Cosmo [1]. The simulation results show that our program outperforms Cosmo.
[1] O. Bembom, S. Keles, and M. van der Laan, "Supervised detection of conserved motifs in DNA sequences with Cosmo," Stat Appl Genet Mol Biol, 2007.
[2] C. C. Lu, W. H. Yuan, and T. M. Chen, "Extracting transcription factor binding sites from unaligned gene sequences with statistical models," BMC Bioinformatics, vol. 9, 2008.
[3] R. F. Weaver, Molecular Biology. McGraw-Hill, 2002.
[4] X. Liu, D. L. Brutlag, and J. S. Liu, "An algorithm for finding protein-DNA binding sites with application to chromatin-immunoprecipitation microarray experiments," Nature Biotechnology, vol. 20, pp. 835–839, 2002.
[5] T. Bailey and C. Elkan, "Unsupervised learning of multiple motifs in biopolymers using expectation maximization," Machine Learning, vol. 21, pp. 51–80, 1995.
[6] T. M. Chen, C. C. Lu, and W. S. Li, "Prediction of splice sites with dependency graphs and their expanded Bayesian networks," Bioinformatics, vol. 21, pp. 471–482, 2005.
[7] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. MacIsaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, P. A. Rolfe, K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, and R. A. Young, "Transcriptional regulatory code of a eukaryotic genome," Nature, vol. 431, pp. 99–104, 2004.
[8] C. Narasimhan, P. LoCascio, and E. Uberbacher, "Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection," Bioinformatics, vol. 19, pp. 1952–1963, 2003.
[9] M. Friberg, P. von Rohr, and G. Gonnet, "Scoring functions for transcription factor binding site prediction," BMC Bioinformatics, 2005.
[10] G. E. Crooks, G. Hon, J. M. Chandonia, and S. E. Brenner, "WebLogo: A sequence logo generator," Genome Research, vol. 14, pp. 1188–1190, 2004.