| Field | Value |
|---|---|
| Graduate Student | 楊承翰 Yang, Cheng-Han |
| Thesis Title | 利用梯度提升決策樹分析倖存資料 (Gradient Boosting Tree with Survival Data) |
| Advisor | 鄭又仁 Cheng, Yu-Jen |
| Committee Members | 黃冠華 Huang, Guan-Hua; 邱燕楓 Chiu, Yen-Feng |
| Degree | Master |
| Department | College of Science, Institute of Statistics |
| Year of Publication | 2017 |
| Graduation Academic Year | 105 |
| Language | Chinese |
| Number of Pages | 54 |
| Keywords (Chinese) | 梯度決策樹 (gradient boosting tree), 比例風險模型 (proportional hazards model), 個人化醫療 (personalized medicine), 因果推論 (causal inference) |
| Keywords (English) | boosting tree, survival analysis, personalized medicine, causal inference |
This thesis has two main goals: first, to identify a suitable treatment for each patient; second, to estimate each patient's survival function under the optimal treatment decision. To achieve both goals, we propose the RAINBOW algorithm, built on the gradient boosting machine framework. Simulation results show that, by imposing an appropriate penalty term, the survival function and the treatment decision can be estimated accurately even when the number of covariates exceeds the sample size.
We consider the problem of identifying patients who may benefit from medication. To address this problem, we use gradient boosting to train the Cox model, which involves estimating, for each patient, the survival function under treatment and under control. The difference between these survival functions is then used to decide which patients should be treated and to estimate the survival function under the optimal treatment regime, which maps observed patient characteristics to a recommended treatment. Our simulations show good performance even when the number of covariates is larger than the sample size. As an illustration, we apply the proposed method to survival data from non-small cell lung cancer patients.
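The decision rule the abstract describes can be illustrated with a deliberately simplified sketch: estimate the survival curve of each arm, then recommend the arm with the higher estimated survival at a chosen horizon. This toy example is not the thesis's RAINBOW algorithm (it uses a plain Kaplan-Meier estimator per arm and ignores patient covariates, gradient boosting, and penalization); it only shows the "compare survival functions, then decide" step on simulated data.

```python
import numpy as np

def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from (time, event-indicator) pairs."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    s = 1.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if ti > t:
            break
        if di:  # an observed event at time ti with (n - i) subjects at risk
            s *= 1.0 - 1.0 / (n - i)
    return s

rng = np.random.default_rng(0)
n = 200
treat = rng.integers(0, 2, n)
# Simulated truth: treated patients have a lower hazard (longer survival).
times = rng.exponential(scale=np.where(treat == 1, 2.0, 1.0))
events = np.ones(n, dtype=int)  # no censoring in this toy example

# Estimate S(t = 1) for each arm and recommend the better one.
s_treat = km_survival(times[treat == 1], events[treat == 1], t=1.0)
s_ctrl = km_survival(times[treat == 0], events[treat == 0], t=1.0)
recommend_treatment = s_treat > s_ctrl
```

In the thesis's setting the two survival functions are patient-specific, obtained from a boosted Cox model of the covariates, so the recommendation can differ across patients rather than being a single arm-level choice as in this sketch.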