| Graduate student: | 鄭世梧 (Shih-Wu Cheng) |
|---|---|
| Thesis title: | 生物晶片推導基因調節網路基於轉錄因子分析與條件獨立 (Inferring Gene Regulatory Networks from Microarray Data Based on Transcription Factor Analysis and Conditional Independency) |
| Advisor: | 蘇豐文 (Von-Wun Soo) |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | English |
| Number of pages: | 63 |
| Keywords (Chinese): | 基因調節網路, 貝氏網路, 條件獨立, 轉錄因子 |
| Keywords (English): | Gene regulatory networks, Bayesian network, Conditional independence, Transcription factor |
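Understanding the regulatory interactions between genes helps biologists in many important areas of drug research, including drug target validation and drug development. However, the regulatory relationships among genes are not yet fully understood. Although microarray technology can measure the expression of a large number of genes, the data also contain many expression values that are missing because of noise. Inferring correct gene regulatory networks from microarray data is therefore a major challenge for biologists.
In this thesis, we provide a workflow for inferring gene regulatory networks from microarray data based on transcription factor analysis and conditional independence. The system that builds the gene regulatory network contains three main components: (1) a microarray data preprocessing step, (2) a biological knowledge processing step, and (3) a revising and inferring step. We reconstruct the gene regulatory network using the d-separation criterion and the concept of conditional independence from Bayesian networks. We then use a set of rules to infer the type of interaction (activation or inhibition) on each link of the network. The system also integrates online biological databases and tools to automatically retrieve the promoter regions of DNA sequences and predict the transcription factors that can regulate gene expression. We present the reconstructed gene regulatory network visually so that the results of the microarray analysis are easy to understand.
We analyzed two different microarray datasets from the public microarray database of Stanford University. We evaluated the results with two strategies: (1) a network topology approach and (2) a literature verification approach. The final evaluation shows that the gene regulatory networks we reconstructed can detect possible cancer genes.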
Understanding gene regulation supports many important areas of drug research, including drug target identification and drug development. However, gene regulatory relationships are not yet well understood. Although microarray technology provides large-scale measurement of gene expression, the resulting data also contain many missing expression values because of noise. Inferring correct gene regulatory networks from microarray data therefore remains a major challenge for biologists.
In this thesis, we present a workflow for inferring gene regulatory networks from microarray data based on transcription factor analysis and conditional independence. The system that constructs the gene regulatory networks consists of three components: (I) microarray data preprocessing, (II) biological knowledge processing, and (III) a revising and inferring process. We reconstruct the gene regulatory networks using the d-separation criterion and conditional independence in Bayesian networks. We then use rules to infer the type of interaction (activation or inhibition) on each link in the gene regulatory network. The system also integrates bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. We visualize the reconstructed gene regulatory network so that the results of the microarray analysis are easy to understand.
We analyze two microarray datasets from the Stanford Microarray Database. We use two approaches to evaluate our results: (1) the topology of the network and (2) verification against literature reports. The evaluation shows that the gene regulatory networks we reconstructed could support the process of detecting possible cancer genes.
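To make the conditional-independence idea concrete, here is a minimal sketch of how an indirect link can be pruned from a candidate network. It is not the thesis's actual procedure: the abstract does not say which independence test is used, so this sketch assumes a first-order partial correlation with a fixed cutoff, and the gene names `A`, `B`, `C`, the helper names, and the threshold value are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def prune_edges(expr, threshold=0.3):
    """Keep an undirected edge (a, b) only if a and b are correlated and
    no single third gene makes them (approximately) conditionally independent.
    The threshold is an illustrative assumption, not a value from the thesis."""
    genes = sorted(expr)
    edges = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(expr[a], expr[b])) < threshold:
                continue  # marginally independent: no edge
            separated = any(
                abs(partial_corr(expr[a], expr[b], expr[c])) < threshold
                for c in genes
                if c not in (a, b)
            )
            if not separated:
                edges.add((a, b))
    return edges


def make_chain_data(n=500, seed=0):
    """Synthetic expression profiles for a regulatory chain A -> B -> C."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [v + rng.gauss(0, 0.3) for v in a]
    c = [v + rng.gauss(0, 0.3) for v in b]
    return {"A": a, "B": b, "C": c}


if __name__ == "__main__":
    print(sorted(prune_edges(make_chain_data())))
```

In the synthetic chain, `A` and `C` are strongly correlated, but conditioning on `B` removes that dependence, so only the direct links `A`-`B` and `B`-`C` should survive. A full d-separation based learner would also search over larger conditioning sets and orient the remaining edges, which this sketch leaves out.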
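The topology-based evaluation can be illustrated with two common network measures, node degree and the local clustering coefficient. This is only a sketch under assumptions: the abstract does not state which topological measures the thesis uses, and the edge list, gene names (`TF1`, `G1`, ...), and helper functions below are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (gene, gene) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def hubs(adj, top=1):
    """Genes with the highest degree, ties broken alphabetically."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top]


# Hypothetical reconstructed network: one regulator linked to three targets.
example_edges = [("TF1", "G1"), ("TF1", "G2"), ("TF1", "G3"), ("G1", "G2")]
example_adj = build_adjacency(example_edges)
```

Highly connected genes returned by `hubs` are natural candidates to check against the literature, which is one plausible way the two evaluation approaches could be combined.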