
Student: Zeng, Li-Hao (曾立豪)
Thesis title: High-Dimensional Variable Selection for Zero-Inflated Regression Models using Chebyshev Greedy Algorithms
Chinese title: 基於柴比雪夫貪婪演算法的零膨脹迴歸高維度模型選擇
Advisor: Ing, Ching-Kang (銀慶剛)
Oral examination committee: Hwang, Wen-Han (黃文瀚); Huang, Hsin-Cheng (黃信誠); Yu, Shu-Hui (俞淑惠)
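Degree: Master's
Department: Institute of Statistics, College of Science
Year of publication: 2019
Academic year of graduation: 108 (Republic of China calendar; academic year 2019/20)
Language: English
Number of pages: 27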
Keywords: Zero-inflated Poisson, zero-inflated binomial, Chebyshev greedy algorithm, high-dimensional information criterion, variable selection, model selection


    Zero-inflated Poisson (ZIP) regression is frequently used for count data with excess zeros. In ZIP regression, the count response is assumed to follow a mixture of a Poisson(λ) distribution and a degenerate distribution with all of its mass at zero, with mixing probability π. Both π and λ may depend on covariates through generalized linear models with canonical links (logit for π, log for λ). Many studies use the expectation-maximization (EM) algorithm for estimation and variable selection in this setting. However, when the number of covariates is large, the EM algorithm is slow and performs poorly in variable selection. In this thesis, we use the Chebyshev greedy algorithm (CGA) to screen variables quickly and a high-dimensional information criterion (HDIC) to achieve consistent model selection in ZIP regression and zero-inflated binomial (ZIB) regression. Simulation studies and an application to wafer data illustrate the performance and usefulness of the proposed approach.
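    The abstract describes the ZIP model, CGA screening and HDIC selection only in words, so the Python sketch below illustrates how the pieces could fit together. It is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the usual Lambert (1992) parametrization, in which an observation is a structural zero with probability π and otherwise Poisson(λ), with logit(π) = Xγ and log(λ) = Xβ sharing one design matrix whose column 0 is an intercept and whose other columns are standardized. CGA is taken in its generic form: each step adds the coefficient (a column in either the count part or the zero part) whose partial derivative of the log-likelihood is largest in absolute value, then refits all selected coefficients jointly. The refit uses plain gradient descent with backtracking, and the criterion HDIC(k) = -2 log L_k + k w_n log(2(p - 1)) with w_n = log n is an assumed likelihood analogue of the linear-model HDIC of Ing and Lai (2011), applied without any trimming step. The function names `zip_terms`, `refit` and `cga_hdic_zip` are hypothetical.

```python
import math

import numpy as np


def zip_terms(beta, gamma, X, y, log_fact):
    """Per-observation ZIP log-likelihood and its derivatives with respect to the
    linear predictors eta_c = X @ beta (log link) and eta_z = X @ gamma (logit link)."""
    eta_c, eta_z = X @ beta, X @ gamma
    with np.errstate(over="ignore", invalid="ignore"):  # trial steps may overflow
        lam = np.exp(eta_c)
        log1p_ez = np.logaddexp(0.0, eta_z)            # log(1 + e^eta_z) = -log(1 - pi)
        pi = np.exp(eta_z - log1p_ez)                  # numerically stable sigmoid
        zero = y == 0
        log_p0 = np.logaddexp(eta_z, -lam) - log1p_ez  # log(pi + (1 - pi) e^{-lam})
        log_py = -log1p_ez - lam + y * eta_c - log_fact  # log((1 - pi) Poisson(y; lam))
        ll = np.where(zero, log_p0, log_py)
        # r: posterior probability that an observed zero is structural (0 when y > 0)
        r = np.where(zero, np.exp(eta_z - np.logaddexp(eta_z, -lam)), 0.0)
        d_eta_c = (1.0 - r) * (y - lam)                # d ll / d eta_c
        d_eta_z = r - pi                               # d ll / d eta_z
    return ll, d_eta_c, d_eta_z


def refit(X, y, log_fact, sel_c, sel_z, beta, gamma, iters=500):
    """Fully corrective step: minimise the mean negative log-likelihood over the
    selected coefficients only (gradient descent with Armijo backtracking)."""
    n, p = X.shape
    mask_c = np.isin(np.arange(p), sel_c)
    mask_z = np.isin(np.arange(p), sel_z)

    def nll(b, g):
        return -zip_terms(b, g, X, y, log_fact)[0].mean()

    for _ in range(iters):
        ll, d_c, d_z = zip_terms(beta, gamma, X, y, log_fact)
        g_b = -(X.T @ d_c) / n * mask_c  # gradient, zeroed outside the active set
        g_g = -(X.T @ d_z) / n * mask_z
        sq, f0 = g_b @ g_b + g_g @ g_g, -ll.mean()
        if sq < 1e-14:
            break
        step = 1.0
        while step > 1e-10:
            new_b, new_g = beta - step * g_b, gamma - step * g_g
            f1 = nll(new_b, new_g)
            if np.isfinite(f1) and f1 <= f0 - 0.5 * step * sq:
                break
            step *= 0.5
        else:
            break  # no step decreases the objective: treat as converged
        beta, gamma = new_b, new_g
    return beta, gamma


def cga_hdic_zip(X, y, k_max):
    """Greedy path over 2 * (p - 1) candidates (each non-intercept column, in either
    the count or the zero part), then an HDIC-style criterion picks the path length."""
    n, p = X.shape
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!)
    sel_c, sel_z = [0], [0]                                  # column 0: intercept, always kept
    beta, gamma = refit(X, y, log_fact, sel_c, sel_z, np.zeros(p), np.zeros(p))
    path = []
    logliks = [zip_terms(beta, gamma, X, y, log_fact)[0].sum()]
    for _ in range(k_max):
        _, d_c, d_z = zip_terms(beta, gamma, X, y, log_fact)
        # |partial derivative| of the log-likelihood for every candidate coefficient
        score = np.abs(np.concatenate([X.T @ d_c, X.T @ d_z]))
        score[sel_c] = -np.inf
        score[[p + j for j in sel_z]] = -np.inf
        k = int(np.argmax(score))
        if k < p:
            sel_c.append(k)
            path.append(("count", k))
        else:
            sel_z.append(k - p)
            path.append(("zero", k - p))
        beta, gamma = refit(X, y, log_fact, sel_c, sel_z, beta, gamma)
        logliks.append(zip_terms(beta, gamma, X, y, log_fact)[0].sum())
    # Assumed criterion: HDIC(k) = -2 loglik_k + k * w_n * log(#candidates), w_n = log n
    w_n, n_cand = math.log(n), 2 * (p - 1)
    hdic = [-2.0 * ll + k * w_n * math.log(n_cand) for k, ll in enumerate(logliks)]
    k_hat = int(np.argmin(hdic))
    return path, path[:k_hat]
```

    For example, with simulated data in which only columns 1 and 2 enter the count part and only column 3 enters the zero part, `cga_hdic_zip(X, y, k_max=8)` returns the greedy path and the HDIC-selected prefix of it; with a strong enough signal that prefix should contain ("count", 1), ("count", 2) and ("zero", 3). Ranking raw gradient components is only sensible when the non-intercept columns are on a common scale, which is why the sketch assumes standardized covariates.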

    Contents:
    Abstract
    1 Introduction
    2 Zero-Inflated Regression Models
    3 Variable Selection Procedure
    4 Theoretical Properties of CGA
    5 Simulation Studies
      5.1 Zero-Inflated Poisson Models
      5.2 Zero-Inflated Bernoulli Models
    6 Real Data Analysis
    References

