
Graduate Student: Yang, Ting Liang (楊庭量)
Thesis Title (Chinese): 使用機器學習技術預測大學生是否續讀同校研究所之研究-以國立清華大學資工系為例
Thesis Title (English): Churn Prediction in Undergraduate Students Continuing Their Graduate Study at the Same University: A Case Study of the Department of Computer Science, National Tsing Hua University
Advisor: Huang, Ting Ting (黃婷婷)
Committee Members: Wang, Ting Chi (王廷基); Lai, Shang Hong (賴尚宏)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2016
Academic Year of Graduation: 104 (2015-2016)
Language: Chinese
Number of Pages: 47
Keywords (Chinese): 機器學習, 流失預測, 續讀, 學生續讀, 客戶流失
Keywords (English): machine learning, churn prediction, retention, retention of students, churn
Abstract (Chinese):
    Competition among Taiwan's leading universities has existed for many years, and as education has become increasingly internationalized, it is no longer limited to these well-known domestic schools. The most fundamental way to strengthen a university's competitiveness is to actively recruit talented students and cultivate them over the long term to build research capability; retaining outstanding undergraduates to continue into the university's own graduate programs therefore helps raise the university's research standards and reputation.
    This study applies machine learning methods to predict whether students will continue into a graduate program at the same university after completing their bachelor's degree, using graduates of the Department of Computer Science at National Tsing Hua University as the data source. With machine learning classification techniques such as the J48 decision tree, random forest, and support vector machine, we build a churn prediction model for graduating students and identify the important features that influence churn, providing the university with a reference for formulating future development strategies.


Abstract (English):
    Competition among the top universities in Taiwan has existed for years, and as education becomes increasingly international, it has grown more intense and now extends beyond the universities in Taiwan. The most fundamental way to improve a university's competitiveness is to attract talented, qualified students, since good students with good training directly strengthen research capability. For the university, retaining talented undergraduate students so that they continue their graduate study at the same institution therefore also helps raise its reputation and research level.

    This research applies machine learning techniques to churn prediction for undergraduate students deciding whether to continue their graduate study at the same university, using data on computer science students at National Tsing Hua University as the data source. With machine learning classification methods such as the J48 decision tree, random forest, and support vector machine, we develop prediction models to detect likely churners and analyze the most important factors that drive student churn.
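    The abstract names the classifiers and the goal of identifying important features, but not a concrete pipeline. The code below is a minimal sketch of that kind of workflow in Python with scikit-learn, not the thesis's actual implementation; the file name students.csv, the label column churned, the use of 10-fold cross-validation, and the random-forest importance ranking are all illustrative assumptions, and J48 (the C4.5 algorithm as implemented in Weka) is approximated here by scikit-learn's CART-based DecisionTreeClassifier.

# Illustrative sketch only: student churn prediction with three classifiers
# (decision tree, random forest, SVM) evaluated by 10-fold cross-validation.
# The data file and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier   # stand-in for J48 (C4.5)
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical data: one row per graduate; label 1 = "churned"
# (did not continue to the same university's graduate program).
data = pd.read_csv("students.csv")
X = data.drop(columns=["churned"])
y = data["churned"]

models = {
    "Decision tree (C4.5-like)": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Compare the classifiers with 10-fold cross-validated accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")

# Random-forest feature importances as a rough proxy for the
# "important features" analysis described in the abstract.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for feat, imp in sorted(zip(X.columns, rf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{feat}: {imp:.3f}")

    In practice, categorical fields would need encoding before being fed to these models, and accuracy alone can be misleading if the churn label is imbalanced.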

Table of Contents:
1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Process
  1.4 Thesis Organization
2 Literature Review
  2.1 Customer Churn Management
  2.2 Machine Learning
  2.3 Applications of Machine Learning Methods in Other Industries
3 Analysis Techniques for Studying Student Churn
  3.1 Data Collection and Preliminary Analysis
    3.1.1 Data Sources
    3.1.2 Data Consolidation and Processing under the Personal Data Protection Act
    3.1.3 Obtaining Labels for the Training Set
    3.1.4 Complete List of Data Items
    3.1.5 Preliminary Data Processing
  3.2 Feature Selection
  3.3 Application of Machine Learning Classification Techniques
4 Experimental Results
  4.1 Analysis of Important Features
  4.2 Prediction Model Results
5 Conclusions and Future Work


Full-Text Availability: The full text is not authorized for public release on either the campus network or the off-campus network.
