研究生 Graduate student: | 蘇尼爾 Chatla, Suneel Babu |
---|---|
論文名稱 Thesis title: | 計數型及二元型資料的非傳統迴歸模型 Unconventional Regression Models for Count and Binary Data |
指導教授 Advisor: | 徐茉莉 Shmueli, Galit |
口試委員 Committee members: | 雷松亞 Ray, Soumya; 盧鴻興 Lu, Horng-Shing; 陳君厚 Chen, Chun-Houh; 黃禮珊 Huang, Li-Shan |
學位類別 Degree: | 博士 Doctor |
系所名稱 Department: | 科技管理學院 - 服務科學研究所 Institute of Service Science |
論文出版年 Year of publication: | 2018 |
畢業學年度 Academic year of graduation: | 106 |
語文別 Language: | 英文 English |
論文頁數 Number of pages: | 128 |
中文關鍵詞 Keywords (Chinese): | Probit 模型、疊代重新加權最小平方法、懲罰樣條、過度分散與分散不足、時間序列、轉折點、廣義概似比、選擇性偏差、Logit 模型 |
外文關鍵詞 Keywords (English): | Probit, IRLS, Penalized Splines, Over and under dispersion, Time series, Change point, Generalized Likelihood Ratio, Selection bias, Logit |
由於近期資料的多元化,計數以及二元反應變數已成為各領域中熱門的資料型態,特別是在人類社會行為學中。傳統在處理計數型反應變數的統計模型包含了卜瓦松迴歸以及負二項迴歸,而羅吉斯迴歸以及Probit迴歸則是使用於二元反應變數。如同大多數的迴歸模型,上述的傳統方法皆存在限制條件而無法應付於各類情況。為了因應這些傳統模型不能解決的問題,許多替代的模型孕育而生。我們討論了其中兩個的非傳統模型:針對計數資料我們研究了Conway-Maxwell Poisson (CMP) 迴歸而二元變數則採用線性機率模型 (LPM)。對於CMP迴歸,我們發展新的模型推廣,包含generalized additive model以及tree-based 變化係數模型,也同時提出包含IRLS在內的估計方法。對於LPM,我們評估其估計以及預測的性質。
(a) CMP迴歸模型能有效的掌握資料分散不足或者過度分散的現象,因此實務上CMP迴歸模型被廣泛使用於處理計數資料。然而,我們已知CMP迴歸模型使用於複雜非線性關係時是有其限制的。隨著資料多元性的增加,對於處理複雜非線性關係的計數模型的需求也隨之上升。在此篇論文中,我們首先提出在CMP分布下的變化係數模型,其迴歸係數被建構為調節變數的低維度函數,而後將此模型推廣至可容許高維度係數函數。後者我們採用新的tree-based方法。整體而言,我們在CMP迴歸研究上的貢獻如下:
(i) 我們提出一個在CMP迴歸上基於iterative reweighted least square (IRLS) 的較彈性估計過程,接著使用penalized spline估計法將其模型延伸至允許additive components (GAMs) 和 varying coefficient models (VCMs)。在一般條件下,CMP分布屬於指數族的特性可確保IRLS對於此模型的收斂性。我們透過大量的模擬研究與華盛頓特區單車分享系統的實際資料來展現此方法的可用性。
(ii) 我們提出的tree-based 變化係數模型是較有彈性的。在CMP分布的條件下,我們考慮基於模型的遞歸分割算法來有效率的改良tree-based 變化係數模型。除此之外,此篇文章也提供了另外的方法去估計切割點,進而取代預設的窮舉法,使得CMP分佈的模型配適更容易。同樣使用大量的模擬研究與華盛頓特區單車分享系統的實際資料來展現該方法的可用性。
(b) 線性迴歸在社會科學研究中是最為常見的方法。而針對二元型態的反應變數的線性機率模型(LPM)同樣也被廣泛的使用在不同學科中。在此篇論文中,我們首先使用在二元反應變數模型中三種常見的指標來衡量LPM,包含了模型推論與估計、預測與分類以及選擇性偏差。我們在不同的樣本數、誤差分布等的條件下,比較該模型與羅吉斯迴歸以及Probit迴歸的優缺點。其次,我們在LPM中放寬了有母數的參數假設,同時也使用類似的比較方法來測試無母數LPM、羅吉斯迴歸與Probit迴歸的特點。我們研究發現在係數的正負號、統計顯著性以及邊際效應中,LPM的結果與羅吉斯和Probit模型相似。除此之外,LPM的估計量也收斂到真實參數的倍數。在分類以及選擇性偏差上,LPM同樣不亞於羅吉斯和Probit模型分析的結果。然而當有興趣的分析目的是在預測機率上時,LPM的實用性可能就相對不足,主因是LPM的機率預測值有機會高於一。最後,我們將呈現該模型在線上拍賣網(eBay)資料的價格分析結果。
針對每個研究主題,我們除了數學推導也都提供了充分的模擬數據分析以及真實數據來驗證對於所有提出模型的可用性。
Count and binary data have become popular dependent variables in studies across many areas, largely owing to the growing availability of data on human and social behavior. The standard models for count data are Poisson regression and Negative Binomial regression; the standard models for binary data are Logistic regression and Probit regression. Like any regression model, these standard models have limitations, and alternative methods may perform better in the situations where those limitations apply. Among these alternatives we choose two unconventional models: Conway-Maxwell-Poisson (CMP) regression for count data and the Linear Probability Model (LPM) for binary data. For CMP, we develop new model extensions and estimation frameworks, including an IRLS algorithm, a generalized additive model, and a tree-based varying coefficient model. For LPM, we critically evaluate its properties in terms of estimation and prediction.
(a) Conway-Maxwell-Poisson (CMP) regression is a popular model for count data because it can capture both under-dispersion and over-dispersion. However, CMP regression is limited when dealing with complex nonlinear relationships. With today's wide availability of count data, there is a need for count data models that can capture such relationships. In this dissertation, we first present a varying coefficient model for the CMP distribution in which each regression coefficient is modeled as a low-dimensional function of moderator variables. We then extend the formulation to allow high-dimensional coefficient functions, which we fit using a new tree-based method. Our contributions to CMP regression include:
1. We propose a flexible estimation framework for CMP regression based on iteratively reweighted least squares (IRLS), and then extend the model to allow for additive components (GAMs) as well as varying coefficient models (VCMs), using penalized spline estimation. Because the CMP distribution belongs to the exponential family, convergence of IRLS is guaranteed under some regularity conditions. We illustrate the usefulness of this method through extensive simulation studies and with real data from a bike sharing system in Washington, DC.
2. Our proposed tree-based varying coefficient model offers further flexibility. We adopt a model-based (MOB) recursive partitioning framework to implement a tree-based varying coefficient model for the CMP distribution, as it is computationally less intensive. We also provide an alternative to the default exhaustive search for estimating the split point, which eases model fitting in general and for the CMP distribution in particular. We illustrate the usefulness of our method through extensive simulation and a real application to data from a bike sharing system in Washington, DC.
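To make the dispersion property described in (a) concrete, the following minimal Python sketch evaluates the CMP probability mass function, which is proportional to λ^y / (y!)^ν, by truncating the normalizing sum. The function names `cmp_pmf` and `mean_var`, the truncation point `max_y`, and the parameter values are illustrative assumptions rather than anything taken from the dissertation, and the sketch does not implement the IRLS estimator proposed there.

```python
import math


def cmp_pmf(lam, nu, max_y=200):
    """Approximate CMP probabilities for y = 0..max_y.

    The infinite normalizing sum is truncated at max_y (an assumption that
    is adequate for the moderate lam and nu used below, but not in general).
    Weights are computed on the log scale to avoid overflow.
    """
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def mean_var(p):
    """Mean and variance of a distribution given as probabilities for 0, 1, 2, ..."""
    mean = sum(y * py for y, py in enumerate(p))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
    return mean, var


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_var(cmp_pmf(lam=3.0, nu=nu))
        print(f"nu={nu}: mean={mean:.3f}, var={var:.3f}, var/mean={var / mean:.3f}")
```

With ν = 1 the distribution reduces to the Poisson, so the variance equals the mean; ν < 1 yields over-dispersion and ν > 1 yields under-dispersion, which is the flexibility the abstract refers to.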
(b) Linear regression is among the most popular statistical models in social science research. Linear probability models (LPMs), that is, linear regression models applied to a binary outcome, are used in various disciplines. In this dissertation, we first evaluate the LPM for three common uses of binary outcome models: inference and estimation, prediction and classification, and selection bias. We compare its performance to logit and probit regression models under different sample sizes, error distributions, and other conditions. Second, we relax the parametric assumption and perform a similar comparison study for the nonparametric extensions of the LPM, logit, and probit models. We find that coefficient directions, statistical significance, and marginal effects from the LPM are similar to those from logit and probit. In addition, LPM estimators are consistent for the true parameters up to a multiplicative scalar. For classification and selection bias, the LPM is on par with logit and probit in terms of class separation and ranking, and is a viable alternative in selection models. The LPM falls short when the predicted probabilities themselves are of interest, because they can fall outside the unit interval. We illustrate some of these results by modeling price in online auctions, using data from eBay.
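The comparison in (b) can be illustrated with a small simulation. The sketch below is a hypothetical setup, not the dissertation's design: it draws a binary outcome from a logit model with one covariate, fits an LPM by ordinary least squares and a logit model by Newton-Raphson, and then checks coefficient signs, out-of-range LPM predictions, and agreement of the 0.5-threshold classifications. The sample size, true coefficients, seed, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate(n=2000, beta0=0.0, beta1=1.5, seed=0):
    """Draw x ~ Uniform(-3, 3) and y ~ Bernoulli(logistic(beta0 + beta1 * x))."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x))) else 0
          for x in xs]
    return xs, ys


def fit_lpm(xs, ys):
    """OLS of y on (1, x); returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_logit(xs, ys, iters=25):
    """Newton-Raphson for a two-parameter logit; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p           # score for the intercept
            g1 += (y - p) * x     # score for the slope
            h00 += w              # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs, ys = simulate()
a_lpm, b_lpm = fit_lpm(xs, ys)
a_log, b_log = fit_logit(xs, ys)

lpm_pred = [a_lpm + b_lpm * x for x in xs]
logit_pred = [1.0 / (1.0 + math.exp(-(a_log + b_log * x))) for x in xs]

# Count LPM fitted values that are not valid probabilities.
out_of_range = sum(1 for p in lpm_pred if p < 0.0 or p > 1.0)
# Share of observations classified identically at the 0.5 threshold.
agree = sum((p > 0.5) == (q > 0.5) for p, q in zip(lpm_pred, logit_pred)) / len(xs)

print(f"LPM slope={b_lpm:.3f}, logit slope={b_log:.3f}")
print(f"LPM fitted values outside [0, 1]: {out_of_range} of {len(xs)}")
print(f"classification agreement at 0.5: {agree:.3f}")
```

In this toy setting the two slopes share a sign but differ in scale, the classifications largely coincide, and some LPM fitted values leave the unit interval, which mirrors the qualitative pattern summarized above without reproducing the dissertation's results.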
For each of the studies, in addition to the methodological derivations, we use both extensive simulation studies and real data applications to illustrate the usefulness of the proposed methodologies.