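Graduate student: | 辜冠銘 Ku, Kuan-Ming |
---|---|
Thesis title: | 應用Elastic Net於多基因風險評分分析 Polygenic Risk Score Analysis with Elastic Net |
Advisor: | 謝文萍 Hsieh, Wen-Ping |
Oral defense committee: | 張升懋 Chang, Sheng-Mao; 鍾仁華 Chung, Ren-Hua |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2020 |
Academic year of graduation: | 108 (ROC calendar) |
Language: | English |
Number of pages: | 23 |
Keywords (Chinese): | 多基因風險評分 (polygenic risk score), 連鎖不平衡 (linkage disequilibrium), 彈性網 (elastic net) |
Keywords (English): | Polygenic Risk Score, Linkage Disequilibrium, Elastic Net |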
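Polygenic risk scores (PRS) have been applied to predict the risk of some complex diseases. In the conventional PRS approach, a simple linear regression is fitted for each genetic variant, and the PRS is then obtained by summing the genotypes of these variants, weighted by the regression coefficients. A typical problem is linkage disequilibrium (LD): variants that lie close together on a chromosome become collinear. Existing methods such as LDpred, lassosum, and C+T (Clumping + Thresholding) handle this problem according to the LD structure. However, such methods cause variants that are correlated with each other but lie far apart to be treated as mutually independent. We therefore propose using the elastic net to address this problem. The elastic net is a regularized multiple linear regression method that combines the least absolute shrinkage and selection operator (lasso) with ridge regression. It can estimate all effects and select the important variables at the same time, and joint modeling can incorporate the information shared among variants.
We used data from the Taiwan Biobank and demonstrate the advantage of this strategy by comparing the correlation between body mass index (BMI) and the PRS. According to the results of the analysis, in some cases using the elastic net rather than simple linear regression to estimate the effects of a large number of variants makes the predicted PRS more accurate.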
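The two weighting schemes described above can be written out as follows. This is only a sketch in notation that the abstract itself does not define: $y_i$ is the phenotype of individual $i$, $x_{ij}$ the genotype (allele count) at variant $j$, $\mu_j$ and $\beta_0$ intercepts, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the lasso/ridge mixing weight. The penalty is shown in one common parameterization; the thesis may use a different but equivalent form.

```latex
% Conventional approach: one simple linear regression per variant j
y_i = \mu_j + \beta_j x_{ij} + \varepsilon_{ij}, \qquad j = 1, \dots, p

% PRS of individual i: genotypes summed with the estimated effects as weights
\mathrm{PRS}_i = \sum_{j=1}^{p} \hat{\beta}_j \, x_{ij}

% Elastic net: all effects estimated jointly in one penalized regression
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \Bigl( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Bigr)
```

Setting $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression; intermediate values keep the variable selection of the former and the shrinkage of correlated effects of the latter.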
Polygenic risk scores (PRS) have been applied to predicting the risk of some complex diseases. In the standard approach, a simple linear regression is fitted on each genetic variant, and the effect alleles are then aggregated into a score, weighted by the effect sizes estimated in those regressions. A typical issue in constructing PRS is linkage disequilibrium among the variants. A number of methods, such as LDpred, lassosum and C+T (Clumping + Thresholding), treat this problem according to the linkage disequilibrium structure of the chromosomes. However, some variants that carry independent information are probably not retained because they lie close to one another. On the other hand, highly correlated SNPs will both be included if they are far away from each other. Here, we propose to construct a PRS model using the elastic net, a classical penalized regression that combines the advantages of lasso and ridge. The elastic net is a multiple linear regression framework, and it can estimate the effect sizes and select the causal variants from all SNPs simultaneously. Instead of modeling just one SNP at a time, joint modeling of the effects can accommodate the shared information without over-emphasizing certain groups of SNPs.
We demonstrate the benefit of the proposed strategy with data from the Taiwan Biobank by comparing the correlation between BMI and the resulting PRS with that obtained from other methods. According to our experimental results, the prediction of BMI is more accurate with elastic net estimates than with simple linear regression estimates.
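The contrast between one-SNP-at-a-time weights and jointly estimated elastic net weights can be illustrated with a small sketch. It is not the thesis's implementation: the function names (`marginal_effects`, `elastic_net`, `prs`), the coordinate-descent solver, the penalty parameterization (`lam`, `alpha`), and the six-person toy genotype matrix are all assumptions made for illustration, and no real data are involved.

```python
def marginal_effects(X, y):
    """Slope of a simple linear regression of y on each SNP, one at a time."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    betas = []
    for j in range(p):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        sxx = sum((x - x_mean) ** 2 for x in col)
        sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(col, y))
        betas.append(sxy / sxx if sxx > 0 else 0.0)
    return betas


def soft_threshold(z, g):
    """Shrink z toward zero by g, returning 0 when |z| <= g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the (assumed) objective
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Returns the slope vector b; the intercept is absorbed by centering."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals for beta = 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0:
                continue  # monomorphic SNP: nothing to estimate
            # Covariance of SNP j with the residual that excludes SNP j itself
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta


def prs(X, weights):
    """Score each individual as the weighted sum of their genotypes."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


if __name__ == "__main__":
    # Toy allele counts (0/1/2): SNP 0 and SNP 1 are in perfect LD.
    X = [[0, 0], [1, 1], [2, 2], [1, 1], [0, 0], [2, 2]]
    y = [2.0 * row[0] for row in X]  # a single underlying signal
    print("marginal weights:   ", marginal_effects(X, y))
    print("elastic net weights:", elastic_net(X, y, lam=0.1, alpha=0.5))
```

In this toy case the marginal regressions assign the full effect to each of the two perfectly correlated SNPs, so a PRS built from them counts the same signal twice. The joint fit instead splits a slightly shrunken effect evenly between the two SNPs. In practice `lam` and `alpha` would be tuned, for example by cross-validation, which this sketch leaves out.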
1. Vilhjalmsson, B.J., et al., Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet, 2015. 97(4): p. 576-92.
2. Mak, T.S.H., et al., Polygenic scores via penalized regression on summary statistics. Genet Epidemiol, 2017. 41(6): p. 469-480.
3. Choi, S.W. and P.F. O'Reilly, PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience, 2019. 8(7).
4. Hoerl, A.E. and R.W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 1970. 12(1): p. 55-67.
5. Tibshirani, R., Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B-Methodological, 1996. 58(1): p. 267-288.
6. Zou, H. and T. Hastie, Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 2005. 67(2): p. 301-320.
7. Money, D., et al., LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms. G3 (Bethesda), 2015. 5(11): p. 2383-90.
8. Browning, S.R. and B.L. Browning, Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. American Journal of Human Genetics, 2007. 81(5): p. 1084-1097.
9. Scheet, P. and M. Stephens, A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet, 2006. 78(4): p. 629-44.
10. Prive, F., et al., Making the Most of Clumping and Thresholding for Polygenic Scores. American Journal of Human Genetics, 2019. 105(6): p. 1213-1221.