Analysis of Survival Data with Fused LASSO | National Tsing Hua University Electronic Theses and Dissertations


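Author (graduate student):	溫邦淳 WEN, BANG CHUN
Thesis title:	利用Fused LASSO對倖存資料進行分析 (Analysis of survival data with Fused LASSO)
Advisor:	鄭又仁 Cheng, Yu Jen
Oral defense committee:	邱燕楓 Chiu, Yen Feng; 趙蓮菊 Chao, Anne
Degree:	Master
Department:	College of Science - Institute of Statistics
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar)
Language:	Chinese
Number of pages:	38
Keywords (translated from Chinese):	penalty function, variable selection, variable grouping, survival analysis
Keywords (English):	penalty function, Fused LASSO, variable grouping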

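In this study, our goal is to carry out estimation, variable selection, and variable grouping simultaneously in the Cox proportional hazards model. Tibshirani (1996) added an L1-norm penalty function to the objective function so that the estimated parameters are sparse, which effectively achieves estimation and variable selection at the same time. In traditional variable grouping methods, variables are usually grouped according to prior knowledge, and this kind of grouping is often regarded as too subjective. In this study, we apply the approach of Tibshirani et al. (2005) to the partial likelihood function of the Cox proportional hazards model. The Fused LASSO penalty function acts on the L1-norm of the parameters and of the differences between parameters: the L1 penalty on the parameters shrinks the parameter estimates and thereby yields sparsity, while the L1 penalty on the parameter differences shrinks the differences between neighboring parameters, so that neighboring parameters receive the same estimate and the variables are thereby grouped. This data-driven approach is more objective, and it lets us estimate, select variables, and group variables simultaneously. In the simulations, we consider four models for comparison: LASSO, Generalized LASSO, Fused LASSO, and the ordinary Cox model, and use them to compare the effects of the respective penalty functions. We also apply the methods to a real data set of gene loci from lung cancer patients after adjuvant chemotherapy.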
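To make the description above concrete, the penalized criterion can be written out as follows. This is a sketch in assumed notation, not the thesis's own: $t_i$ is the observed time, $\delta_i$ the event indicator, $x_i$ the covariate vector of subject $i$, and $\lambda_1, \lambda_2 \ge 0$ are the two tuning parameters. Tied event times are assumed absent, and the thesis may scale the likelihood differently (for example by $1/n$).

```latex
% Cox log partial likelihood (no tied event times assumed)
\ell(\beta) = \sum_{i=1}^{n} \delta_i
  \left[ x_i^{\top}\beta
         - \log \sum_{j:\, t_j \ge t_i} \exp\!\left( x_j^{\top}\beta \right) \right]

% Fused LASSO estimator: sparsity penalty plus fusion penalty
\hat{\beta} = \arg\min_{\beta}
  \left\{ -\ell(\beta)
          + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
          + \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert \right\}
```

Setting $\lambda_2 = 0$ recovers the LASSO-penalized Cox model, and setting $\lambda_1 = \lambda_2 = 0$ recovers the ordinary Cox partial likelihood estimator.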

In this work, our aim is to perform model selection, coefficient estimation, and variable grouping simultaneously in Cox's proportional hazards model. Tibshirani (1996) added an L1-norm penalty function to the objective function to obtain sparse coefficient estimates, which is an efficient way to do model selection and coefficient estimation at the same time. In traditional variable grouping methods, variables are grouped based on prior knowledge, which is often judged to be too subjective. In this work, we apply the method of Tibshirani et al. (2005) to the partial likelihood of the Cox model. The Fused LASSO penalty combines the L1 norm of the coefficients with the L1 norm of the differences between neighboring coefficients: the L1 penalty on the coefficients shrinks them to ensure sparse coefficient estimates, while the L1 penalty on the differences shrinks the differences between neighboring coefficients, so that variables are grouped in the sense of sharing the same coefficient estimate. This data-adaptive approach is more objective, and we can estimate, select, and group variables simultaneously. In our simulation, we consider three different cases, LASSO, generalized LASSO, and Fused LASSO, to compare the effects of the two penalties, and we apply the methods to the analysis of the data from the gene signature study for adjuvant chemotherapy in resected non–small-cell lung cancer.
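As an illustration of the objective the abstract describes, the following is a minimal Python sketch that evaluates a Fused-LASSO-penalized negative log partial likelihood for a given coefficient vector. It is not the thesis's implementation: the function names, the tuning-parameter names `lam1` and `lam2`, and the toy data are all hypothetical, and the code assumes there are no tied event times. It only evaluates the objective; it does not implement the thesis's fitting algorithm.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood, assuming no tied event times."""
    eta = X @ beta                      # linear predictors x_i' beta
    order = np.argsort(-time)           # sort subjects by decreasing time
    eta_sorted = eta[order]
    event_sorted = event[order]
    # Running log-sum-exp: entry i is log of sum of exp(eta_j) over the
    # risk set {j : t_j >= t_i}, computed stably.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return -np.sum(event_sorted * (eta_sorted - log_risk))

def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum|beta_j| + lam2 * sum|beta_j - beta_{j-1}| (assumed notation)."""
    return lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(np.abs(np.diff(beta)))

def fused_lasso_cox_objective(beta, X, time, event, lam1, lam2):
    """Penalized objective to be minimized over beta."""
    return (neg_log_partial_likelihood(beta, X, time, event)
            + fused_lasso_penalty(beta, lam1, lam2))

# Hypothetical toy data: 3 subjects, 2 covariates, last subject censored.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
time_toy = np.array([3.0, 2.0, 1.0])
event_toy = np.array([1.0, 1.0, 0.0])
value_at_zero = fused_lasso_cox_objective(
    np.zeros(2), X_toy, time_toy, event_toy, lam1=0.5, lam2=2.0)
```

Because the fusion term penalizes differences between adjacent coefficients, the ordering of the covariates matters: a vector such as (1, 1, 0, -2) pays no fusion penalty for its first two entries, which is what encourages neighboring coefficients to share a value.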

Contents
Introduction
Review of Methods
1 Notation
2 LASSO
3 Generalized LASSO
4 Cox proportional hazards model
Fused LASSO
1 Method
2 Algorithm
Simulation
1 Simulation settings
2 Model 1
3 Model 2
4 Simulation conclusions
Real Data Analysis
Conclusion

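Appendix
Corr. and Incorr. for each method in Model 1; tuning parameter selected by C.V.
Corr. and Incorr. for each method in Model 1; tuning parameter selected by BIC
Number of fused coefficients (Fuse) for each method in Model 1; tuning parameter selected by C.V.
Number of fused coefficients (Fuse) for each method in Model 1; tuning parameter selected by BIC
Bias for each method in Model 1; tuning parameter selected by C.V.
Bias for each method in Model 1; tuning parameter selected by BIC
SE and ASE for each method in Model 1; tuning parameter selected by C.V.
SE and ASE for each method in Model 1; tuning parameter selected by BIC
Confidence interval coverage rate for each method in Model 1; tuning parameter selected by C.V.
Confidence interval coverage rate for each method in Model 1; tuning parameter selected by BIC
Corr. and Incorr. for each method in Model 2; tuning parameter selected by C.V.
Corr. and Incorr. for each method in Model 2; tuning parameter selected by BIC
Number of fused coefficients (Fuse) for each method in Model 2; tuning parameter selected by C.V.
Number of fused coefficients (Fuse) for each method in Model 2; tuning parameter selected by BIC
Bias for each method in Model 2; tuning parameter selected by C.V.
Bias for each method in Model 2; tuning parameter selected by BIC
SE and ASE for each method in Model 2; tuning parameter selected by C.V.
SE and ASE for each method in Model 2; tuning parameter selected by BIC
Confidence interval coverage rate for each method in Model 2; tuning parameter selected by C.V.
Confidence interval coverage rate for each method in Model 2; tuning parameter selected by BIC
Names of the 80 gene loci selected by the screening process
Names of the gene locus groups selected by LASSO; tuning parameter selected by C.V.
Names of the gene locus groups selected by Generalized LASSO; tuning parameter selected by C.V.
Names of the gene locus groups selected by Fused LASSO; tuning parameter selected by C.V.
Names of the gene locus groups selected by Generalized LASSO; tuning parameter selected by BIC
Names of the gene locus groups selected by LASSO; tuning parameter selected by BIC
Names of the gene locus groups selected by Fused LASSO; tuning parameter selected by BIC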

References
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics 24, 2350–2383.
Chaturvedi, N., de Menezes, R. X., and Goeman, J. J. (2014). Fused lasso algorithm for Cox proportional hazards and binomial logit models with application to copy number profiles. Biometrical Journal 56, 477–492.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269–276.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics 32, 407–499.
Fan, J. and Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics 30, 74–99.
Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 849–911.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011). Regularization paths for Cox's proportional hazards model via coordinate descent. Journal of Statistical Software 39, 1–13.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 385–395.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 91–108.
Tibshirani, R. J. (2011). The solution path of the generalized lasso. Technical report, Stanford University.
van Houwelingen, H. C., Bruinsma, T., Hart, A. A., van't Veer, L. J., and Wessels, L. F. (2006). Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine 25, 3201–3216.
Yamaoka, K., Nakagawa, T., and Uno, T. (1978). Application of Akaike's information criterion (AIC) in the evaluation of linear pharmacokinetic equations. Journal of Pharmacokinetics and Biopharmaceutics 6, 165–175.
Zhang, H. H. and Lu, W. (2007). Adaptive lasso for Cox's proportional hazards model. Biometrika 94, 691–703.
Zhu, C.-Q., Ding, K., Strumpf, D., Weir, B. A., Meyerson, M., Pennell, N., Thomas, R. K., Naoki, K., Ladd-Acosta, C., Liu, N., et al. (2010). Prognostic and predictive gene signature for adjuvant chemotherapy in resected non–small-cell lung cancer. Journal of Clinical Oncology 28, 4417–4424.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.

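Full-text release date: the full text is not authorized for public release (on-campus network)
Full-text release date: the full text is not authorized for public release (off-campus network)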
