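Author: 李少芃 (LI, SHAO-PENG)
Thesis title: 計數型數據的類別變數選取與水準合併分析 (Categorical Variable Selection and Level Clustering in Count Data)
Advisor: 徐南蓉 (Hsu, Nan-Jung)
Committee members: 汪上曉 (Wong, Shang-Hsiao); 曾勝滄 (Tseng, Sheng-Tsiang)
Degree: Master
Department: College of Science - Institute of Statistics
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Number of pages: 55
Keywords (translated from Chinese): count model, categorical variable selection, level merging
Keywords (English): Count regression, Group Lasso, CAS-ANOVA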
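  • This thesis addresses the problem of selecting the optimal tool combination (golden path) in a manufacturing process. The traditional approach first identifies the important factors that affect process yield and then, based on their estimated effects, finds the tool combination that maximizes yield. Although this approach produces a unique optimal tool combination, in production practice the more pressing question is often whether tools or production paths differ substantively. If tools with similar performance can be grouped while the differences in tool effects are being inferred, a more flexible golden-path strategy becomes available. With this goal, the thesis proposes a two-stage parameter estimation method for count data in which all covariates are categorical. The first stage focuses on screening the important factors; the second stage merges the categories (levels) that have similar effects within each important factor. Both stages use a regularized likelihood approach.

    The count models studied in this thesis cover Poisson regression and negative binomial regression, and account for over-dispersion and zero inflation. The proposed inference method also applies more broadly to other generalized linear models.

    Numerical simulations and a real-data analysis of process tool combinations show that the proposed estimation method performs well both in identifying important factors and in grouping similar levels.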


    Finding the golden path in a production process is an important issue in intelligent manufacturing. This thesis addresses the problem for the specific case in which production quality is measured by failure counts and all factors relevant to production quality are categorical variables. Traditional approaches first identify the important factors (tools) affecting the yield of the process and then determine the production path that maximizes the mean yield, called the golden path. This thesis further takes into account the clustering patterns of tool effects to provide a more flexible solution for the golden path in practice. To achieve this goal, a two-stage inference procedure for count data with categorical covariates is developed in a generalized linear model framework. A penalized likelihood approach is adopted for estimation and variable selection: in the first stage, the important factors are identified by incorporating group lasso regularization, and in the second stage, tool clustering is implemented by incorporating fused lasso regularization. The effectiveness of the proposed method is demonstrated through a simulation study for Poisson models and an application to real manufacturing data collected from a 13-stage production process. In both the simulations and the application, the proposed methodology successfully identifies the important factors and recovers the cluster patterns of effects within factors reasonably well.
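
    As a rough sketch of the two-stage procedure described above, the objectives might be written as follows for a Poisson model with log link. All notation here ($\beta_j$, $p_j$, $\lambda_1$, $\lambda_2$, $w_{j,kl}$, $\hat{S}$) is assumed for illustration and is not taken from the thesis. The first penalty is the standard group lasso form; the second is a pairwise-difference fusion penalty in the style of CAS-ANOVA, which may differ in detail from the formulation the thesis actually uses.

```latex
% Poisson log-likelihood with log link (assumed notation):
% y_i is the failure count of unit i, x_{ij} the dummy-coded
% levels of factor j, and \beta_j the effect vector of factor j.
\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \eta_i - \exp(\eta_i) - \log(y_i!) \right],
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{J} x_{ij}^{\top} \beta_j .

% Stage 1: factor selection via a group lasso penalty,
% where p_j is the number of free parameters in \beta_j.
\hat{\beta}^{(1)} = \arg\min_{\beta}
\left\{ -\ell(\beta) + \lambda_1 \sum_{j=1}^{J} \sqrt{p_j}\, \lVert \beta_j \rVert_2 \right\},
\qquad
\hat{S} = \left\{ j : \hat{\beta}^{(1)}_j \neq 0 \right\}.

% Stage 2: level clustering within the selected factors via a
% fusion penalty on pairwise differences of level effects,
% with the model restricted to the factors in \hat{S}.
\hat{\beta}^{(2)} = \arg\min_{\beta}
\left\{ -\ell(\beta) + \lambda_2 \sum_{j \in \hat{S}} \sum_{k < l} w_{j,kl}\,
\lvert \beta_{jk} - \beta_{jl} \rvert \right\}.
```

    Under this sketch, a factor is dropped when its whole effect vector is shrunk to zero in the first stage, and two levels of a retained factor are merged when their estimated effects are fused to the same value in the second stage.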

    Chapter 1 Introduction
    Chapter 2 Data Description
    Chapter 3 Methodology
    Chapter 4 Numerical Simulation
    Chapter 5 Case Study
    Chapter 6 Conclusions and Future Work
    Appendix
    References

