Author: Mach, Patrizia (麥慧芬)
Thesis title: A Data Mining Approach to Surveying Academic Literature on Behavioral Big Data in Operations Management Research (應用資料探勘方法於辨別行為大數據之運籌管理文獻)
Advisor: Shmueli, Galit (徐茉莉)
Committee members: Lee, Hsiao-Hui (李曉惠); Lin, Furen (林福仁)
Degree: Master
Department: College of Technology Management - International Master of Business Administration (IMBA)
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 60
Keywords (Chinese): applied data, big data, literature review, operations management (應用資料、大數據、文獻回顧、運籌管理)
Keywords (English): Academic Literature
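    Abstract (translated from the Chinese version): Behavioral big data has become the latest focus of operations management research because traditional mathematical modeling can capture human behavior and effectively support decision making. To understand this topic in depth, a comprehensive literature review of published papers is a common research method. However, identifying, one by one, the papers that contain behavioral big data within a large and growing body of operations management literature is both time-consuming and labor-intensive, and it does not clearly define the factors that identify a paper as related to behavioral big data. This study provides an efficient data mining method for surveying a large number of research papers across different operations management journals and objectively classifying them as relevant or not. We find that the model can detect a large number of papers and classify them correctly if the journal's content and structure are similar to those of the training set. Although the entire process cannot be automated, classifying documents without personally reading each paper takes less time to identify a narrower subgroup for close examination, and a set of conditions for identifying behavioral big data makes the literature review process more efficient and structured.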


    Behavioral big data has become a recent focus in operations management research, which attempts to aid decision making using traditional mathematical modeling that captures human behavior. To explore the depth of research on this topic, it is common to conduct a comprehensive literature review of published papers. However, identifying individual papers that contain behavioral big data in a large and growing pool of published operations management research is both time- and labor-intensive, and it fails to define the specific factors that would identify a paper as pertaining to behavioral big data. This research provides an efficient data mining method for surveying a vast number of research articles across different operations management journals and objectively classifying papers as relevant or not. We find that the model was able to detect a larger number of papers and classify them correctly if the journal content and structure were similar to those of the training set. Although it was not possible to automate the entire procedure, classifying documents without manually reading each paper requires less time to identify a narrower subgroup for close examination and, by requiring a set of conditions for identifying behavioral big data, provides a more efficient and structured literature review process.
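
As a minimal illustration of the screening idea described in the abstract, the sketch below flags papers whose predicted relevance score reaches a cut-off value and compares the flags with manual labels to obtain recall (sensitivity), specificity, and precision. The names `ScreeningMetrics` and `screen_papers`, the scores, the labels, and the cut-off of 0.5 are hypothetical and are not taken from the thesis; in practice the scores would come from a trained classifier.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    """Confusion-matrix counts for a relevant / not-relevant screening."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int

    @property
    def recall(self) -> float:
        # Recall (sensitivity): share of truly relevant papers that were flagged.
        denom = self.true_pos + self.false_neg
        return self.true_pos / denom if denom else 0.0

    @property
    def specificity(self) -> float:
        # Specificity: share of irrelevant papers that were correctly left out.
        denom = self.true_neg + self.false_pos
        return self.true_neg / denom if denom else 0.0

    @property
    def precision(self) -> float:
        # Precision: share of flagged papers that are truly relevant.
        denom = self.true_pos + self.false_pos
        return self.true_pos / denom if denom else 0.0


def screen_papers(scores, labels, cutoff):
    """Flag papers with score >= cutoff and tally the result against labels.

    scores: predicted probabilities of relevance (hypothetical values).
    labels: manual labels, True if the paper is relevant.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, is_relevant in zip(scores, labels):
        flagged = score >= cutoff
        if flagged and is_relevant:
            tp += 1
        elif flagged:
            fp += 1
        elif is_relevant:
            fn += 1
        else:
            tn += 1
    return ScreeningMetrics(tp, fp, tn, fn)


if __name__ == "__main__":
    # Hypothetical scores and labels for six papers, for illustration only.
    demo_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    demo_labels = [True, True, False, True, False, False]
    metrics = screen_papers(demo_scores, demo_labels, cutoff=0.5)
    print(metrics, metrics.recall, metrics.specificity, metrics.precision)
```

Lowering the cut-off flags more papers for close reading, which tends to raise recall at the cost of precision; the appropriate trade-off depends on how costly a missed relevant paper is compared with reading an irrelevant one.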

    Table of Contents
    Acknowledgement
    List of Figures
    List of Tables
    Abstract
    Abstract (Chinese)
    1. Introduction
        1.1. Motivation
    2. Data
        2.1. Extraction
        2.2. Labeling
        2.3. Exploration
        2.4. Challenges
    3. Data Mining Approach
        3.1. General Approach
        3.2. Classification Algorithms
            3.2.1. Classification Tree
            3.2.2. Random Forest
            3.2.3. Boosted Tree
            3.2.4. Logistic Regression
            3.2.5. LASSO Logistic Regression
            3.2.6. Ensemble
        3.3. Performance Evaluation
            3.3.1. Sensitivity and Specificity
            3.3.2. ROC Curves
            3.3.3. Confusion Matrices and Cut-off Value
            3.3.4. Precision and Recall
    4. Data Analysis and Results
        4.1. Data Pre-Screening
        4.2. Data Partitioning
        4.3. Benchmark Model
        4.4. Model Training and Evaluation
            4.4.1. Classification Tree
            4.4.2. Random Forest
            4.4.3. Gradient Boosted Tree
            4.4.4. Logistic Regression
            4.4.5. LASSO Regression (L1 regularization)
            4.4.6. Ensemble
    5. Classifying Test Sets
        5.1. Classifying Test Set: 2017 MS Journal
        5.2. Classifying Test Set: 2017 MSOM Journal
        5.3. Classifying Test Set: 2017 POM Journal
    6. Conclusion
        6.1. Limitations
        6.2. Recommendations
        6.3. Future Work
    References

