Graduate student: 簡立欣
Chien, Li-Hsin
Thesis title: 利用貝氏變數選取進行全基因組與存活時間之關聯研究
A Bayesian Variable Selection Approach to Genome-Wide Association Studies with Survival Outcome
Advisors: 張憶壽
Chang, I-Shou
熊昭
Hsiung, Chao A.
Oral defense committee: 張憶壽
熊昭
程毅豪
黃冠華
鄭又仁
謝文萍
盧鴻興
鍾仁華
Degree: Doctor
Department: College of Science - Institute of Statistics
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar; 2013-2014)
Language: English
Number of pages: 68
Keywords (Chinese): 貝氏變數選取, 遺傳度, 全基因組關聯研究, 存活時間
Keywords (English): Bayesian Variable Selection, Heritability, GWAS, Survival
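    Genome-wide association studies that take survival time as the trait address an important problem in epidemiology, pharmacogenomics, and related fields. For example, genes associated with time to disease recurrence under a specific drug or treatment can help identify the populations better suited to that drug or treatment. Most genome-wide association studies analyze single nucleotide polymorphisms (SNPs) one at a time, but this does not use the information in the data in a statistically efficient way. Such analyses also tend to suffer from the missing heritability problem. This dissertation proposes a Bayesian variable selection model that performs a multivariate analysis of the association between the whole genome and survival time, and it designs a quantity that represents heritability, which partially resolves the missing heritability problem of single-SNP analysis. The dissertation extends the Bayesian variable selection regression model of Guan and Stephens (2011) to cover genome-wide association studies with survival time as the trait. Using the Weibull distribution together with the proportional hazards regression model, we construct an estimate of the proportion of variance explained (PVE). We adopt Bayesian inference to estimate the posterior distribution of PVE and the posterior probability that each SNP is associated with survival time. The results on PVE can help in designing and setting the direction of subsequent genetic studies, and the posterior probability that a SNP is associated with survival time expresses that association better than a p-value does. Methodologically, we design an MCMC algorithm for this class of problems, use simulated data to show how well the model estimates these quantities, and demonstrate higher statistical power than single-SNP analysis. We also apply the method to a genome-wide association study of age at menarche in women. In particular, we obtain a credible interval for PVE of (0.339, 0.401), somewhat lower than the heritability of 0.50 reported in the literature.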


    Genome-wide association studies (GWAS) that use survival time as the phenotype deserve attention. Important examples include time to progression or recurrence-free survival of a cancer patient who underwent a specific treatment, and the onset time of a disease or biological event. Most existing GWAS rely on single-SNP analysis, which analyzes each SNP separately rather than modeling the SNPs jointly and hence is not statistically efficient. Moreover, while GWAS results are often reproducible, the discoveries explain only a small amount of heritability. This dissertation proposes a Bayesian variable selection approach to GWAS with survival outcome based on the Weibull regression model, in which the parameter describing the proportion of the variance of the survival phenotype explained by the covariates (PVE) admits an analytic form. Treating GWAS as a Bayesian variable selection problem, we extend the Bayesian variable selection regression (BVSR) of [1], which was developed for GWAS using multiple linear regression. In particular, we compute the posterior distribution of PVE and the posterior inclusion probability (PIP) of each SNP for inference. The former is useful in planning future genetic studies; the latter describes the confidence in each association result. A carefully designed MCMC algorithm is used to sample from the posterior distribution. Simulation studies show that both PVE and PIP can be estimated well and that the method outperforms single-SNP analysis methods in terms of the plot of the number of true positive findings versus the number of false positive findings. We illustrate the method by studying the association between SNPs and age at menarche in healthy females in Taiwan. In particular, the 90% credible interval of PVE is (0.339, 0.401), smaller than the reported heritability of 0.50.
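To make the idea of an analytic PVE concrete, here is a minimal Python sketch under stated assumptions; it is not the dissertation's model, priors, or MCMC sampler. It assumes a Weibull proportional hazards model with hazard h(t | x) = rate * shape * t^(shape - 1) * exp(x'beta). Under that assumption, log T = (log E - log rate - x'beta) / shape with E distributed as Exp(1), and Var(log E) = pi^2 / 6, so the share of Var(log T) explained by the covariates is Var(x'beta) / (Var(x'beta) + pi^2 / 6). This log-time-scale definition is one natural choice and may differ from the definition used in the dissertation. The function names, effect sizes, and minor allele frequency are all hypothetical.

```python
import math
import random
import statistics


def simulate_weibull_ph(n, betas, shape, rate, maf=0.3, seed=0):
    """Simulate linear predictors and log survival times under a Weibull PH model.

    Assumed hazard: h(t | x) = rate * shape * t**(shape - 1) * exp(x . beta).
    All parameter values passed in are illustrative, not taken from the thesis.
    """
    rng = random.Random(seed)
    etas, log_times = [], []
    for _ in range(n):
        # Additive genotype coding: 0, 1, or 2 copies of the minor allele per SNP.
        genotype = [sum(rng.random() < maf for _ in range(2)) for _ in betas]
        eta = sum(b * g for b, g in zip(betas, genotype))
        # The cumulative hazard rate * exp(eta) * T**shape follows Exp(1).
        e = 0.0
        while e == 0.0:  # guard against log(0) below
            e = rng.expovariate(1.0)
        log_t = (math.log(e) - math.log(rate) - eta) / shape
        etas.append(eta)
        log_times.append(log_t)
    return etas, log_times


def analytic_pve(etas):
    """Assumed PVE on the log-time scale: var(eta) / (var(eta) + pi^2 / 6)."""
    v = statistics.pvariance(etas)
    return v / (v + math.pi ** 2 / 6)


def empirical_pve(etas, log_times, shape):
    """Share of var(log T) attributable to the genetic term -eta / shape."""
    return statistics.pvariance(etas) / shape ** 2 / statistics.pvariance(log_times)


if __name__ == "__main__":
    etas, log_times = simulate_weibull_ph(20000, [0.5, -0.4, 0.3], shape=2.0, rate=0.1)
    print(round(analytic_pve(etas), 3), round(empirical_pve(etas, log_times, 2.0), 3))
```

In this sketch the Weibull shape parameter cancels out of the ratio, which is why the closed form depends only on the variance of the linear predictor. The two printed values should agree closely for a large simulated sample.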

    1. Introduction
      1.1 Bayesian variable selection regression for GWAS
      1.2 GWAS with survival outcome
      1.3 Organization of the dissertation
    2. The Bayesian model
      2.1 Likelihood
      2.2 A hierarchical structure
      2.3 The prior on
      2.4 Proportion of variance explained (PVE): the random case
      2.5 Proportion of variance explained (PVE): conditional on genotype data
      2.6 Digression to linear regression
    3. Computation strategies and algorithm
      3.1 Computation strategies
      3.2 Outline of the algorithm
    4. Simulation studies
    5. Real data analysis: Age at menarche based on GELAC
    6. Discussion and future work
    References
    Appendix
      A.1. Algorithm
      A.2. Single-SNP Bayes factor
      A.3. P-values of single-SNP Cox’s fit of 10 verified SNPs
      A.4. The top 100 bins based on our method

    1. Guan YT, Stephens M (2011) Bayesian Variable Selection Regression for Genome-Wide Association Studies and Other Large-Scale Problems. Annals of Applied Statistics 5: 1780-1815.
    2. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4: e1000130.
    3. Wu TT, Chen YF, Hastie T, Sobel E, Lange K (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25: 714-721.
    4. He Q, Lin DY (2011) A variable selection method for genome-wide association studies. Bioinformatics 27: 1-8.
    5. Zuber V, Duarte Silva AP, Strimmer K (2012) A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies. BMC Bioinformatics 13: 284.
    6. Hoffman GE, Logsdon BA, Mezey JG (2013) PUMA: a unified framework for penalized multiple regression analysis of GWAS data. PLoS Comput Biol 9: e1003101.
    7. Maher B (2008) Personal genomes: The case of the missing heritability. Nature 456: 18-21.
    8. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747-753.
    9. Raftery AE, Madigan D, Hoeting JA (1997) Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92: 179-191.
    10. Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, et al. (2013) Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen 54: 542-557.
    11. Wheeler HE, Maitland ML, Dolan ME, Cox NJ, Ratain MJ (2013) Cancer pharmacogenomics: strategies and challenges. Nat Rev Genet 14: 23-34.
    12. Rockova V, Lesaffre E, Luime J, Lowenberg B (2012) Hierarchical Bayesian formulations for selecting variables in regression models. Stat Med 31: 1221-1237.
    13. Lee EK, Mallick BK (2004) Bayesian Methods for Variable Selection in Survival Models with Application to DNA Microarray Data. Sankhya: The Indian Journal of Statistics 66: 756-778.
    14. Rosthoj S, Keiding N (2004) Explained variation and predictive accuracy in general parametric statistical models: the role of model misspecification. Lifetime Data Anal 10: 461-472.
    15. Choodari-Oskooei B, Royston P, Parmar MK (2012) A simulation study of predictive ability measures in a survival model I: explained variation measures. Stat Med 31: 2627-2643.
    16. George EI, McCulloch RE (1997) Approaches for Bayesian Variable Selection. Statistica Sinica 7: 339-373.
    17. Gibson G (2011) Rare and common variants: twenty arguments. Nat Rev Genet 13: 135-145.
    18. Robert C, Casella G (2005) Monte Carlo Statistical Methods: Springer.
    19. Gelman A, Carlin J, Stern H, Rubin D (2003) Bayesian Data Analysis, Second Edition: Chapman and Hall.
    20. Hsiung CA, Lan Q, Hong YC, Chen CJ, Hosgood HD, et al. (2010) The 5p15.33 locus is associated with risk of lung adenocarcinoma in never-smoking females in Asia. PLoS Genet 6.
    21. Lan Q, Hsiung CA, Matsuo K, Hong YC, Seow A, et al. (2012) Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nat Genet 44: 1330-1335.
    22. Parent AS, Teilmann G, Juul A, Skakkebaek NE, Toppari J, et al. (2003) The timing of normal puberty and the age limits of sexual precocity: variations around the world, secular trends, and changes after migration. Endocr Rev 24: 668-693.
    23. Towne B, Czerwinski SA, Demerath EW, Blangero J, Roche AF, et al. (2005) Heritability of age at menarche in girls from the Fels Longitudinal Study. Am J Phys Anthropol 128: 210-219.
    24. He C, Kraft P, Chen C, Buring JE, Pare G, et al. (2009) Genome-wide association studies identify loci associated with age at menarche and age at natural menopause. Nat Genet 41: 724-728.
    25. Ong KK, Elks CE, Li S, Zhao JH, Luan J, et al. (2009) Genetic variation in LIN28B is associated with the timing of puberty. Nat Genet 41: 729-733.
    26. Perry JR, Stolk L, Franceschini N, Lunetta KL, Zhai G, et al. (2009) Meta-analysis of genome-wide association data identifies two loci influencing age at menarche. Nat Genet 41: 648-650.
    27. Sulem P, Gudbjartsson DF, Rafnar T, Holm H, Olafsdottir EJ, et al. (2009) Genome-wide association study identifies sequence variants on 6q21 associated with age at menarche. Nat Genet 41: 734-738.
    28. Elks CE, Perry JR, Sulem P, Chasman DI, Franceschini N, et al. (2010) Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet 42: 1077-1085.
    29. Delahanty RJ, Beeghly-Fadiel A, Long JR, Gao YT, Lu W, et al. (2013) Evaluation of GWAS-identified genetic variants for age at menarche among Chinese women. Hum Reprod 28: 1135-1143.
    30. Guan Y, Krone SM (2007) Small world MCMC and convergence to multi-modal distributions: from slow mixing to fast mixing. Annals of Applied Probability 17: 284-304.
    31. Gelman A, Meng XL, Stern H (1996) Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6: 733-807.

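    Full-text release: full text not authorized for public release (on-campus network)
    Full-text release: full text not authorized for public release (off-campus network)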
