
Graduate student: Pan, Yan-Shuo (潘彥碩)
Thesis title: Greedy Algorithm: Theory, Methods, and Applications in High-Dimensional Models (貪婪演算法在高維度模型上的理論、方法及應用)
Advisor: Ing, Ching-Kang (銀慶剛)
Committee members: Hsu, Nan-Jung (徐南蓉); Sin, Chor-Yiu (冼芻蕘); Cheng, Yu-Jen (鄭又仁); Huang, Hsin-Cheng (黃信誠); Yu, Shu-Hui (俞淑惠)
Degree: Doctor
Department: College of Science - Institute of Statistics and Data Science
Year of publication: 2024
Academic year of graduation: 113
Language: Chinese
Number of pages: 165
Keywords: Machine learning (機器學習), Feature selection (特徵選擇), Chebyshev's greedy algorithm (柴比雪夫貪婪演算法), Average treatment effect (平均治療效應), Double robustness (雙重穩健性), Multiple robustness (多重穩健性), Sparsity (稀疏性), Square summability (平方可加)
    High-dimensional models have become a prominent topic in statistics in recent years,
    attracting extensive research in both prediction and model selection. This dissertation
    studies problems related to high-dimensional models, applying greedy algorithms to several
    of them. It covers the following three themes:
    1. Extension of greedy algorithms in high-dimensional problems: This part focuses on
    keeping statistical and machine learning methods effective when the model is misspecified.
    We investigate high-dimensional settings in which, despite model misspecification, the
    greedy algorithm can still select the important features.
    2. Application of greedy algorithms in high-dimensional causal inference: We apply the
    greedy algorithm to high-dimensional causal inference, propose an estimator of the average
    causal effect, and prove that this estimator has desirable statistical properties.
    3. Extension of instrumental variable assumptions: Many studies involving instrumental
    variables assume that the model is square summable, whereas most high-dimensional research
    works under absolute summability. This dissertation extends the theoretical assumptions of
    the greedy algorithm from absolute summability to square summability, for application to
    high-dimensional linear models.
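
    The distinction drawn in the third theme can be written out for a generic coefficient
    sequence. The notation \beta_j below is assumed for illustration only and is not taken from
    the dissertation.

```latex
% Absolute summability (the stronger condition):
\[ \sum_{j=1}^{\infty} |\beta_j| < \infty \]
% Square summability (the weaker condition):
\[ \sum_{j=1}^{\infty} \beta_j^{2} < \infty \]
% Absolute summability implies square summability, but not conversely.
% Example: \beta_j = 1/j is square summable but not absolutely summable.
\[ \sum_{j=1}^{\infty} \frac{1}{j} = \infty,
   \qquad
   \sum_{j=1}^{\infty} \frac{1}{j^{2}} = \frac{\pi^{2}}{6} < \infty \]
```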

    1 Overview
    2 MPCGA: A Tree-Based Chebyshev's Greedy Algorithm
      2.1 Introduction
      2.2 Proposed Method
      2.3 MPCGA on Discrete Outcomes
      2.4 Simulation Studies
      2.5 An Application to VOCs Data
      2.6 Discussion
    3 Multiple Robust Estimation for Average Treatment Effects with High-Dimensional Covariates
      3.1 Introduction
      3.2 The Proposed Methodology
      3.3 Theoretical Results of the Proposed Estimator
      3.4 Simulation Studies
    4 Selecting Variables from Many (Weak) Instruments with Square-Summable Coefficients
      4.1 Introduction
      4.2 Preliminaries
      4.3 Variable Selection in the Reduced-Form Equation via Extended OGA + Extended HDIC (EOGA+EHDIC)
    5 Summary
    A Appendix A
    B Appendix B
    C References

