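Graduate student: | 劉怡瑜 Liu, Yi-Yu |
---|---|
Thesis title: | Identifying Prostate Cancer-related Networks from Microarray Data Based on Genotype-Phenotype Networks Using Markov Blanket Search (基於基因與表現型態和馬可夫搜尋法從生物晶片中辨識前列腺癌相關網路) |
Advisor: | 蘇豐文 Soo, Von-Wun |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | English |
Number of pages: | 43 |
Keywords (Chinese): | 前列腺癌 (prostate cancer), 生物晶片資料 (microarray data), 蛋白質交互作用網路 (protein-protein interaction networks), 馬可夫搜尋法 (Markov blanket search), 表現型態網路 (phenotype networks) |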
Cancer is a complex disease caused by genetic mutations, and so far only a small number of human genes have been proposed as possible cancer-causing genes. Many studies in molecular genetics have identified important genes for various types of cancer, and identifying significant biological networks that correspond to gene expression data is an important issue in understanding the underlying biological mechanisms of cells. We integrate phenotype networks and protein networks and apply a greedy Markov blanket search method that efficiently uses both gene expression data and protein-protein interaction networks to identify significant networks as well as genes for a human disease. We use prostate cancer data as our test domain. Compared with statistical methods such as the t-test and the Wilcoxon test, our method identifies more prostate cancer-related genes than those reported in published databases and literature. We identify disease-related genes with higher precision and an F-measure at least 1.5-fold higher. The functional modules involved in prostate cancer, including over-expressed interleukin-type and insulin-like growth factors and well-known RAS-related oncogenes, are identified by our method. Canonical pathways for cell signaling, immune response, cell cycle, and cytokine interactions are also found to be significantly related to prostate cancer. Our proposed method efficiently uses gene expression data, phenotype networks, and protein networks to identify the sub-networks and genes that might be related to the disease of interest. These significant genes and their associated networks may serve as subjects for understanding the mechanism of prostate cancer. Our method could integrate microarray data and phenotype networks more powerfully and accurately to identify disease-related genes and networks.
Keywords: Prostate cancer, microarray data, protein-protein interaction networks, Markov blanket search, phenotype networks