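Graduate student: WEN, BANG CHUN (溫邦淳)
Thesis title: Analysis of survival data with Fused LASSO (利用Fused LASSO對倖存資料進行分析)
Advisor: Cheng, Yu Jen (鄭又仁)
Committee members: Chiu, Yen Feng (邱燕楓); Chao, Anne (趙蓮菊)
Degree: Master
Department: College of Science - Institute of Statistics
Year of publication: 2015
Academic year of graduation: 103
Language: Chinese
Number of pages: 38
Chinese keywords: penalty function, variable selection, variable grouping, survival analysis
Foreign-language keywords: penalty function, Fused LASSO, variable grouping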
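  • In this study, our aim is to carry out estimation, variable selection, and variable grouping simultaneously in the Cox proportional hazards model. Tibshirani (1996) added an L1-norm penalty to the objective function so that the estimated parameters are sparse, which achieves estimation and variable selection at the same time. In traditional variable grouping methods, variables are usually grouped according to prior knowledge, and such grouping is often regarded as too subjective. In this study, we apply the approach of Tibshirani et al. (2005) to the partial likelihood of the Cox proportional hazards model. The Fused LASSO penalty acts on the L1 norm of the parameters and of the differences between parameters: the L1 penalty on the parameters shrinks the estimates and yields sparsity, while the L1 penalty on the parameter differences shrinks the differences between neighboring parameters, so that neighboring parameters receive the same estimate and the variables are thereby grouped. This data-driven approach is more objective, and it lets us estimate, select variables, and group variables simultaneously. In the simulations, we compare four models, LASSO, Generalized LASSO, Fused LASSO, and the ordinary Cox model, to assess the effect of each penalty, and we apply the methods to gene locus data from lung cancer patients treated with adjuvant chemotherapy.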


    In this work, our aim is to perform model selection, coefficient estimation, and variable grouping simultaneously in Cox’s proportional hazards model. Tibshirani (1996) added an L1-norm penalty to the objective function to obtain sparse coefficient estimates, which is an efficient way to perform model selection and coefficient estimation at the same time. In traditional variable grouping methods, variables are grouped based on prior knowledge, which is often judged too subjective. In this work, we apply the method of Tibshirani et al. (2005) to the partial likelihood of the Cox model. The Fused LASSO penalty combines the L1 norm of the coefficients with the L1 norm of the differences between neighboring coefficients: the L1 penalty on the coefficients shrinks them to ensure sparse estimates, while the L1 penalty on the differences shrinks the differences between neighboring coefficients, so that variables are grouped in the sense of sharing the same coefficient estimate. This data-adaptive approach is more objective, and it allows us to estimate, select, and group variables simultaneously. In our simulation, we consider three methods, LASSO, generalized LASSO, and Fused LASSO, to compare the effects of the L1 penalty and the L1 penalty on differences, and we apply them to the analysis of the data from the gene signature study for adjuvant chemotherapy in resected non–small-cell lung cancer.
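The penalized objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm (which is not reproduced in this record): it assumes no tied event times, assumes the covariates have a meaningful ordering so that "neighboring" coefficients are adjacent in `beta`, and uses hypothetical names (`lam1`, `lam2`, and all function names) chosen only for this example.

```python
import math


def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood, assuming no tied event times."""
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set at t_i: subjects whose observed time is at least t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        total += eta[i] - math.log(risk_sum)
    return -total


def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum |beta_j| + lam2 * sum |beta_j - beta_{j-1}| (hypothetical names)."""
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(b - a) for a, b in zip(beta, beta[1:]))
    return lam1 * sparsity + lam2 * fusion


def fused_lasso_cox_objective(beta, X, time, event, lam1, lam2):
    """Quantity to be minimized over beta in this sketch."""
    return neg_log_partial_likelihood(beta, X, time, event) + fused_lasso_penalty(
        beta, lam1, lam2
    )


# Toy data: three subjects, two covariates, the second subject is censored.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
time = [1.0, 2.0, 3.0]
event = [1, 0, 1]
value = fused_lasso_cox_objective([0.5, 0.5], X, time, event, lam1=0.1, lam2=0.2)
```

Setting `lam2 = 0` leaves only the sparsity term, which corresponds to the LASSO penalty on the Cox partial likelihood; the fusion term is what pulls adjacent coefficients toward a common value and so produces the grouping.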

    Contents
    1 Introduction
    2 Review of Methods
      2.1 Notation
      2.2 LASSO
      2.3 Generalized LASSO
      2.4 Cox proportional hazards model
    3 Fused LASSO
      3.1 Method
      3.2 Algorithm
    4 Simulation
      4.1 Simulation settings
      4.2 Model 1
      4.3 Model 2
      4.4 Simulation conclusions
    5 Real data analysis
    6 Conclusion
    Appendix
      1 Corr. and Incorr. for each method in Model 1 (selection by C.V.)
      2 Corr. and Incorr. for each method in Model 1 (selection by BIC)
      3 Number of fused coefficients for each method in Model 1 (selection by C.V.)
      4 Number of fused coefficients for each method in Model 1 (selection by BIC)
      5 Bias for each method in Model 1 (selection by C.V.)
      6 Bias for each method in Model 1 (selection by BIC)
      7 SE and ASE for each method in Model 1 (selection by C.V.)
      8 SE and ASE for each method in Model 1 (selection by BIC)
      9 Confidence interval coverage for each method in Model 1 (selection by C.V.)
      10 Confidence interval coverage for each method in Model 1 (selection by BIC)
      11 Corr. and Incorr. for each method in Model 2 (selection by C.V.)
      12 Corr. and Incorr. for each method in Model 2 (selection by BIC)
      13 Number of fused coefficients for each method in Model 2 (selection by C.V.)
      14 Number of fused coefficients for each method in Model 2 (selection by BIC)
      15 Bias for each method in Model 2 (selection by C.V.)
      16 Bias for each method in Model 2 (selection by BIC)
      17 SE and ASE for each method in Model 2 (selection by C.V.)
      18 SE and ASE for each method in Model 2 (selection by BIC)
      19 Confidence interval coverage for each method in Model 2 (selection by C.V.)
      20 Confidence interval coverage for each method in Model 2 (selection by BIC)
      21 Names of the 80 gene loci selected by the screening process
      22 Names of the gene locus groups selected by LASSO (selection by C.V.)
      23 Names of the gene locus groups selected by Generalized LASSO (selection by C.V.)
      24 Names of the gene locus groups selected by Fused LASSO (selection by C.V.)
      25 Names of the gene locus groups selected by Generalized LASSO (selection by BIC)
      26 Names of the gene locus groups selected by LASSO (selection by BIC)
      27 Names of the gene locus groups selected by Fused LASSO (selection by BIC)

    References
    Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics 24, 2350–2383.
    Chaturvedi, N., de Menezes, R. X., and Goeman, J. J. (2014). Fused lasso algorithm for Cox proportional hazards and binomial logit models with application to copy number profiles. Biometrical Journal 56, 477–492.
    Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–276.
    Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004). Least angle regression. The Annals of Statistics 32, 407–499.
    Fan, J. and Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. The Annals of Statistics 30, 74–99.
    Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 849–911.
    Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
    Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
    Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011). Regularization paths for Cox proportional hazards model via coordinate descent. Journal of Statistical Software 39, 1–13.
    Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58, 267–288.
    Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 385–395.
    Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 91–108.
    Tibshirani, R. J. (2011). The solution path of the generalized lasso. Technical report, Stanford University.
    van Houwelingen, H. C., Bruinsma, T., Hart, A. A., van’t Veer, L. J., and Wessels, L. F. (2006). Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine 25, 3201–3216.
    Yamaoka, K., Nakagawa, T., and Uno, T. (1978). Application of Akaike’s information criterion (AIC) in the evaluation of linear pharmacokinetic equations. Journal of Pharmacokinetics and Biopharmaceutics 6, 165–175.
    Zhang, H. H. and Lu, W. (2007). Adaptive lasso for Cox proportional hazards model. Biometrika 94, 691–703.
    Zhu, C.-Q., Ding, K., Strumpf, D., Weir, B. A., Meyerson, M., Pennell, N., Thomas, R. K., Naoki, K., Ladd-Acosta, C., Liu, N., et al. (2010). Prognostic and predictive gene signature for adjuvant chemotherapy in resected non–small-cell lung cancer. Journal of Clinical Oncology 28, 4417–4424.
    Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.

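    Full-text release date (on-campus network): full text not authorized for public release
    Full-text release date (off-campus network): full text not authorized for public release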
