
Student: Avilez, Emmy Esther (愛維萊姿)
Thesis title: Exploring Predictive Deviance-oriented Segmentation from a Regression Perspective
Advisor: Ray, Soumya (雷松亞)
Oral defense committee: Yoo, Jaewon (兪在元); Danks, Nicholas Patrick
Degree: Master's
Department: College of Technology Management - International Master of Business Administration (IMBA)
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 54
Keywords: predictive deviants, predictive deviance, deviance trees, segmentation, hierarchical clustering, prediction metrics


    Danks et al. (2022) investigate predictive deviants and predictive deviance for structural equation models, but to date no study has investigated them from a regression standpoint. Our study aims to complement Danks et al.'s research by exploring predictive deviance from a regression perspective. It contributes to practice by examining deviance trees as a segmentation technique and comparing them with traditional unsupervised and supervised learning segmentation techniques.
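
    As an illustration only, here is a minimal sketch of per-case predictive deviance for an ordinary least squares regression. It assumes (our reading of Danks et al., not a definition stated in this abstract) that a case's predictive deviance is the gap between its leave-one-out prediction error and its in-sample error, which equals its in-sample prediction minus its leave-one-out prediction. For OLS the leave-one-out residual has the closed form e_i / (1 - h_ii), where h_ii is the case's leverage, so no refitting is needed. The helper name `predictive_deviance` and the top-10% deviant cutoff are hypothetical choices, and this is not the thesis's R implementation.

```python
import numpy as np


def predictive_deviance(X, y):
    """Per-case predictive deviance for OLS (hypothetical helper).

    Assumes PD_i = (leave-one-out residual_i) - (in-sample residual_i),
    which equals the in-sample prediction minus the leave-one-out prediction.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    # Hat matrix H = X (X'X)^-1 X'; its diagonal holds each case's leverage.
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T
    h = np.diag(H)
    e_in = y - H @ y            # in-sample residuals
    e_loo = e_in / (1.0 - h)    # leave-one-out residuals (exact for OLS)
    return e_loo - e_in         # equivalently e_in * h / (1 - h)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)

pd_values = predictive_deviance(X, y)
# Illustrative cutoff: flag the 10% of cases with the largest |PD| as deviants.
deviants = np.argsort(-np.abs(pd_values))[: len(y) // 10]
print(deviants)
```

    The closed form is used instead of refitting the model n times; both give the same deviance for OLS, but the hat-matrix version costs a single fit.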

    We implemented our algorithm in the R statistical environment, reusing functions from the SEMCOA package and modifying them where necessary. The algorithm identifies the optimal number of segments with respect to familiar prediction metrics (R-squared, out-of-sample MSE, and predictive deviance).
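
    The following is a minimal Python sketch, not the thesis's R code, of how candidate numbers of segments might be compared on these three metrics. It assumes one OLS model is fitted per segment, that the pooled metrics are computed over all cases, and that the number of segments with the lowest out-of-sample (leave-one-out) MSE is "optimal"; that selection rule, the minimum segment size, the simulated data, and the helper names `ols_residuals` and `segmentation_metrics` are all hypothetical. Candidate segmentations come from SciPy's Ward hierarchical clustering on the predictors.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ols_residuals(X, y):
    """In-sample and leave-one-out residuals of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T  # hat matrix
    e_in = y - H @ y
    e_loo = e_in / (1.0 - np.diag(H))  # closed-form LOO residual for OLS
    return e_in, e_loo


def segmentation_metrics(X, y, labels):
    """Fit one OLS per segment and pool the results (hypothetical helper).

    Returns (R-squared, out-of-sample MSE, mean |predictive deviance|), or
    NaNs if any segment is too small for a leave-one-out fit.
    """
    min_size = X.shape[1] + 3  # assumption: parameters (incl. intercept) + 2
    e_in = np.empty_like(y)
    e_loo = np.empty_like(y)
    for s in np.unique(labels):
        idx = labels == s
        if idx.sum() < min_size:
            return np.nan, np.nan, np.nan
        e_in[idx], e_loo[idx] = ols_residuals(X[idx], y[idx])
    r2 = 1.0 - np.sum(e_in ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, np.mean(e_loo ** 2), np.mean(np.abs(e_loo - e_in))


rng = np.random.default_rng(1)
X = rng.normal(size=(120, 2))
# Simulated data: the slope on X[:, 1] flips sign with X[:, 0].
slope = np.where(X[:, 0] > 0, 3.0, -3.0)
y = slope * X[:, 1] + rng.normal(scale=0.3, size=120)

Z = linkage(X, method="ward")  # hierarchical clustering on the predictors
results = {}
for k in range(1, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    results[k] = segmentation_metrics(X, y, labels)

valid = {k: m for k, m in results.items() if not np.isnan(m[1])}
best_k = min(valid, key=lambda k: valid[k][1])  # lowest out-of-sample MSE
print(best_k, valid[best_k])
```

    In-sample R-squared tends to rise as segments are added, so a rule based on an out-of-sample metric is used here to avoid simply rewarding more segments.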

    Our results show that hierarchical clustering segmentation limits the number of segments to which prediction metrics can be applied. Deviance trees, in contrast, accommodate prediction metrics more readily and thus help justify the predictive power of the resulting segments.
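
    To make the deviance-tree side of this contrast concrete, here is a minimal sketch that uses scikit-learn's `DecisionTreeRegressor` as a stand-in for whatever tree implementation the thesis used in R. It assumes, as we understand the approach, that a deviance tree regresses predictive deviance (here its absolute value, our choice) on the predictors, treats each leaf as a group, and removes the leaf with the highest mean deviance before segmenting. The simulated deviant group, the tree settings, and the helper name are hypothetical. One plausible reason trees accommodate prediction metrics more readily is that `min_samples_leaf` puts a floor under every group's size, whereas hierarchical clusters can shrink below the size a regression needs.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def predictive_deviance(X, y):
    """LOO residual minus in-sample residual for OLS (assumed PD definition)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    h = np.diag(X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T)  # leverages
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    e_in = y - X1 @ beta
    return e_in * h / (1.0 - h)


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.5, size=200)
# Simulated deviant group: cases with large X[:, 0] follow a different rule.
odd = X[:, 0] > 1.2
y[odd] += 4.0 * X[odd, 1]

pd_abs = np.abs(predictive_deviance(X, y))

# Deviance tree sketch: regress |PD| on the predictors; each leaf is a group.
# min_samples_leaf guarantees every group has at least 20 cases.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X, pd_abs)
leaf = tree.apply(X)  # leaf (group) id for every case

leaf_means = {g: pd_abs[leaf == g].mean() for g in np.unique(leaf)}
deviant_leaf = max(leaf_means, key=leaf_means.get)
keep = leaf != deviant_leaf  # cases retained for the segmentation step
print(len(leaf_means), "groups; removed", (~keep).sum(), "cases")
```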

    Contents
    Abstract
    1. Acknowledgment
    2. Introduction
    3. Literature Review
        3.1 Segmentation
        3.2 Hierarchical Clustering for Segmentation
        3.3 Decision Trees for Segmentation
        3.4 Predictive Segmentation
        3.5 Predictive Deviance-Oriented Segmentation
        3.6 Prediction Error
        3.7 Predictive Deviance and Predictive Deviants
        3.8 Deviance Groups and Deviance Trees
    4. Applying Predictive Deviance and Segmentation to Regression
        4.1 A Snapshot of Algorithm Implementation
    5. About our Dataset
    6. Hierarchical Clustering Segmentation
        6.1 Identifying segments
        6.2 Apply metrics to describe the segments: R-squared
        6.3 Apply metrics to describe the segments: Out-of-Sample MSE / Prediction Error (PE)
    7. Demonstration of Predictive Deviance and Predictive Deviance-oriented Segmentation
        7.1 Compute predictive deviance for our regression model using our dataset
        7.2 Run a deviance tree to identify deviant groups
        7.3 Remove the deviant groups
        7.4 Identify levels of segmentation
        7.5 Apply metrics to describe the levels: R-squared
        7.6 Apply metrics to describe the levels: Out-of-sample MSE
        7.7 Apply metrics to describe the levels: Predictive Deviance
    8. Inspecting our Segments
        8.1 Predictive Deviance-oriented Segmentation (PDS)
        8.2 Hierarchical Clustering Segmentation (HCS)
    9. Discussion
        9.1 Hierarchical versus Predictive Deviance-oriented Segmentation
        9.2 Metrics
    10. Future Work and Limitations
        10.1 Decision Trees and Other Clustering Methods
        10.2 Prediction Metrics
        10.3 Future Predictions
        10.4 Datasets
    11. Conclusion
    12. References
    13. Appendices
        13.1 Hierarchical Clustering Implementation
        13.2 Predictive Deviance Implementation

    Aluja-Banet, T., & Nafria, E. (2015). Stability and scalability in decision trees. Computational Statistics, 505–520.

    Bock, T. (n.d.). What is hierarchical clustering? Retrieved from Displayr Blog Web site: https://www.displayr.com/what-is-hierarchical-clustering/

    Chauhan, N. S. (2022, February 9). Machine Learning: KDnuggets. Retrieved from KDnuggets Web site: https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html

    Coussement, K., Van den Bossche, F. A., & De Bock, K. W. (2014). Data accuracy's impact on segmentation performance: Benchmarking RFM analysis, logistic regression, and decision trees. Journal of Business Research, 2751-2758.

    Danks, N., Ray, S., & Shmueli, G. (2022). The Composite Overfit Analysis Framework: Assessing the Out-of-sample Generalizability of Construct-based Models Using Predictive Deviance, Deviance Trees, and Unstable Paths. Working Paper, Hsinchu.

    Delua, J. (2021, March 12). Cloud: IBM. Retrieved from IBM Website: https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning#:~:text=Supervised%20learning%20is%20a%20machine,accuracy%20and%20learn%20over%20time.

    Dolnicar, S. (2002). Faculty of Commerce - Papers (Archive). Retrieved from University of Wollongong Website: http://ro.uow.edu.au/commpapers/273

    Dolnicar, S., Grün, B., & Leisch, F. (2018). Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful. Springer Open.

    Fonseca, J. R. S. (2011, January). Publication: ResearchGate. Retrieved from ResearchGate Website: https://www.researchgate.net/publication/234013895_Why_Does_Segmentation_Matter_Identifying_Market_Segments_Through_a_Mixed_Methodology

    Heavy.AI. (n.d.). Technical Glossary: Heavy.AI. Retrieved from Heavy.AI Web site: https://www.heavy.ai/technical-glossary/decision-tree-analysis#:~:text=Decision%20tree%20analysis%20is%20the,most%20effective%20courses%20of%20action.

    Hom, B., & Huang, W. (n.d.). Whitepapers: Decision Analyst. Retrieved from Decision Analyst Web site: https://www.decisionanalyst.com/whitepapers/comparesegmentation/#:~:text=Segmentation%20approaches%20can%20range%20from,and%20latent%20class%20cluster%20analysis.&text=Factor%20segmentation%20is%20based%20on%20factor%20analysis.

    IBM. (n.d.). Analytics: IBM. Retrieved from IBM Web site: https://www.ibm.com/topics/linear-regression

    IBM Cloud Education. (2021, March 3). IBM Cloud Learn Hub: IBM. Retrieved from IBM Web site: https://www.ibm.com/cloud/learn/overfitting

    Java T Point. (n.d.). Machine Learning: Java T Point. Retrieved from Java T Point Website: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm

    Khalili-Damghan, K., Abdi, F., & Abolmakarem, S. (2018). Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-centric industries. Applied Soft Computing Journal, 51.

    Malhotra, R. M., & Chug, A. (2016). Software Maintainability: Systematic Literature Review and Current Trends. International Journal of Software Engineering and Knowledge Engineering, 1221-1253.

    McBurnie, T., & Clutterbuck, D. (1988). Give Your Company the Marketing Edge. Penguin Books.

    Penn State University. (n.d.). Applied Multivariate Statistical Analysis: PennState Eberly College of Science. Retrieved from Penn State, Eberly College of Science Web site: https://online.stat.psu.edu/stat505/lesson/14/14.4

    Sharma, P. (2020, February 25). Blog: Analytics Vidhya. Retrieved from Analytics Vidhya Web site: https://www.analyticsvidhya.com/blog/2020/02/4-types-of-distance-metrics-in-machine-learning/

    statistics.com. (n.d.). Predictor P-Values in Predictive Modeling: statistics.com. Retrieved from statistics.com Web site: https://www.statistics.com/word-of-the-week-predictor-p-values-in-predictive-modeling/#:~:text=Predictor%20p%2Dvalues%20in%20linear,great%20as%20the%20fitted%20value.

    The Investopedia Team. (2022, February 11). Advanced Technical Analysis Concepts: Investopedia. Retrieved from Investopedia Web site: https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp#:~:text=R%2DSquared%20vs.-,Predicted%20R%2DSquared,predicts%20responses%20for%20new%20observations.

    Tsiptsis, K., & Chorianopoulos, A. (2009). Data Mining Techniques in CRM. West Sussex: John Wiley & Sons, Ltd.

    Weinstein, A. (1997). Strategic Segmentation. Journal of Segmentation in Marketing, 7-16.
