Graduate student: | 連郡儀 Chun-Yi Lian |
---|---|
Thesis title: | Some Advancement in Model Selection Methods And Its Application to a Genetic Epidemiology Study 選模方法之最近發展及其在遺傳流病研究之應用 |
Advisors: | 熊昭 Chao Hsiung, 謝文萍 Wen-Ping Hsieh |
Oral defense committee: | |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2008 |
Academic year of graduation: | 96 (ROC calendar) |
Language: | English |
Number of pages: | 33 |
Keywords (Chinese): | 選模 (model selection) |
Keywords (English): | model selection, GEE, QIC, penalized likelihood |
Model selection is an important topic in data analysis: an appropriately selected model can predict well. This thesis introduces three tools for model selection: QIC (Quasi-likelihood under the Independence model Criterion), the L1-regularization path algorithm for generalized linear models, and L2-penalized logistic regression with stepwise variable selection. QIC can be used for correlated data such as family data. The L1-regularization path algorithm and L2-penalized logistic regression can be used for high-dimensional data such as microarray data, and the L2-penalized method may be especially useful when the focus is on gene interactions. Our data come from the SAPPHIRe (Stanford Asian Pacific Program for Hypertension and Insulin Resistance) project; they are family data and hence correlated. We apply the three methods to this data set, compare the models they select, and evaluate the predictive performance of those models.
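For orientation, the three criteria named in the abstract can be sketched as follows. The notation is an assumption made here, following the forms in which these methods are commonly stated, and is not taken from the thesis itself: $Q$ is the quasi-likelihood evaluated under the independence working model, $\hat\Omega_I$ the model-based information under independence, $\hat V_R$ the robust (sandwich) covariance estimate under working correlation $R$, $\ell$ the log-likelihood, and $\lambda \ge 0$ the penalty parameter.

```latex
% QIC for a GEE fit with working correlation R (notation assumed):
% a quasi-likelihood analogue of AIC, with a trace term as the penalty.
\mathrm{QIC}(R) = -2\, Q\bigl(\hat\beta(R); I\bigr)
                  + 2\, \operatorname{tr}\bigl(\hat\Omega_I \hat V_R\bigr)

% L1-regularized GLM: the path algorithm traces \hat\beta(\lambda) as \lambda varies.
\hat\beta(\lambda) = \arg\min_{\beta}
    \bigl\{ -\ell(\beta) + \lambda \lVert \beta \rVert_1 \bigr\}

% L2-penalized logistic regression: a ridge-type penalty on the coefficients,
% with the intercept \beta_0 left unpenalized.
(\hat\beta_0, \hat\beta)(\lambda) = \arg\min_{\beta_0,\,\beta}
    \bigl\{ -\ell(\beta_0, \beta) + \tfrac{\lambda}{2} \lVert \beta \rVert_2^2 \bigr\}
```

In each case a smaller value of the criterion, or a suitably chosen $\lambda$, indicates the preferred model. How the thesis tunes $\lambda$ and compares the selected models is not specified in this record.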