| Graduate student: | 陳孜圩 Chen, Tzu-Yu |
|---|---|
| Thesis title: | 性狀相關之單核苷酸多型性的功能性研究:應用於肺癌研究 Looking for the functional content of SNPs associated with a trait: with application to lung cancer |
| Advisors: | 張憶壽 Chang, I-Shou; 熊昭 Hsiung, Chao A. |
| Oral examination committee: | 徐南蓉 Hsu, Nan-Jung; 謝文萍 Hsieh, Wen-Ping; 張憶壽 Chang, I-Shou; 黃冠華 Huang, Guan-Hua; 熊昭 Hsiung, Chao A.; 盧鴻興 Lu, Henry Horng-Shing; 程毅豪 Chen, Yi-Hau; 鍾仁華 Chung, Ren-Hua; 杜憶萍 Tu, I-Ping |
| Degree: | Doctor |
| Department: | College of Science, Institute of Statistics |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | English |
| Number of pages: | 90 |
| Keywords (Chinese): | 貝氏、表達數量性狀位點、基因表現、遺傳度、生物途徑 |
| Keywords (English): | Bayesian, eQTL, gene expression, heritability, pathway |
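Although genome-wide association studies have successfully identified thousands of single-nucleotide polymorphisms (SNPs) associated with various traits or diseases, obtaining biological meaning from these associations remains a problem, because more than 90% of the identified SNPs do not fall in coding regions. A major follow-up to a genome-wide association study is to examine whether the SNPs it identified are associated with the expression levels of certain genes, which leads to so-called expression quantitative trait loci studies. We aim to consider as many expression probes and SNPs as possible simultaneously, to account for the correlations among expression levels at different probes, and to make use of additional biological information in the form of pathway data. We therefore propose a Bayesian method whose prior distribution draws on the concept of heritability from genetics and on pathway data obtained from pathway databases such as GO and KEGG. The former helps avoid the often overly conservative practice in genetic studies, and the latter helps provide biological interpretation. For a given set of SNPs, we try to find not only the genes associated with them but also the associated gene sets. For the proposed Bayesian model, we develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution and carry out inference. Our software can currently analyze about 20,000 probes and several hundred SNPs simultaneously within a reasonable time. We assess and confirm the feasibility of the proposed method through simulation studies and then apply it to the analysis of Taiwan lung cancer data. The results show that our method provides new understanding of the associations between SNPs and genes or gene sets that cannot be obtained from ordinary separate analyses.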
While genome-wide association studies (GWAS) have successfully discovered and replicated thousands of SNPs associated with various traits and diseases, gaining biological insight into these associations remains a challenge, because over 90% of the SNPs are not in protein-coding regions. One of the main approaches in the follow-up of a GWAS is to examine whether the SNPs it identifies are associated with the expression levels of certain genes. This leads to the so-called expression quantitative trait loci (eQTL) studies. It is desirable to consider as many expression probes and SNPs as possible simultaneously, to take into account the correlations between the expression levels at different gene probes, and to make use of biological pathway information. We propose a Bayesian approach in which the prior distribution makes use of the classical heritability concept in genetics and the pathway information from biological knowledge databases such as GO and KEGG. The former helps to avoid the often overly conservative practices in genomic studies, and the latter helps to provide biological interpretation. The proposed approach can be used to look for not only genes but also gene sets associated with a given set of SNPs. A carefully designed MCMC algorithm is proposed to sample from the posterior distribution for inference. Our software can handle about 20K expression probes and several hundred SNPs within a reasonable time. Simulation studies are conducted to evaluate the performance of this method, and we also illustrate the method by analyzing Taiwan lung cancer data. The results show that our approach provides new insights into associations between SNPs and genes or gene sets that could not be revealed in separate analyses.
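For orientation, the "separate analysis" that the abstract contrasts with the joint Bayesian model can be sketched as a marginal scan that correlates each SNP with each expression probe, one pair at a time. The sketch below is not the thesis's method or software: the function name `marginal_eqtl_scan`, the simulated data, the sample and array sizes, and the effect size are all illustrative assumptions.

```python
import numpy as np

def marginal_eqtl_scan(genotypes, expression):
    """Pearson correlation between every SNP and every expression probe.

    genotypes:  (n_samples, n_snps) array of minor-allele counts (0/1/2)
    expression: (n_samples, n_probes) array of expression levels
    Returns an (n_snps, n_probes) array of correlations.
    Assumes no SNP or probe is constant across samples.
    """
    # Centre each column, then scale it to unit length, so that the
    # matrix product below yields Pearson correlations directly.
    g = genotypes - genotypes.mean(axis=0)
    e = expression - expression.mean(axis=0)
    g /= np.linalg.norm(g, axis=0)
    e /= np.linalg.norm(e, axis=0)
    return g.T @ e

# Hypothetical simulated data (all sizes and effects are assumptions).
rng = np.random.default_rng(0)
n_samples, n_snps, n_probes = 200, 5, 50
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps))
expression = rng.normal(size=(n_samples, n_probes))
# Let SNP 2 shift the expression of the first five probes.
expression[:, :5] += 1.5 * genotypes[:, [2]]

corr = marginal_eqtl_scan(genotypes, expression)
# Squared correlation: share of a probe's variance explained by one SNP.
r_squared = corr ** 2
print(np.round(r_squared[2, :5], 2))
```

Each SNP-probe pair is scored in isolation here, so correlations among probes and pathway membership play no role; those are the ingredients the abstract says the proposed joint model adds.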