
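Graduate student: Jevon Mckenzie (澤 凡)
Thesis title: Classification for Early Detection of High School Students Vulnerable to Poor Academic Performance (早期發現高中生不良學業表現之分類模式)
Advisor: Soumya Ray (雷松亞)
Oral defense committee: Xu Pei Fang (許裴舫); Bao-Taa Chang (張寶塔)
Degree: Master
Department: College of Technology Management, International Master of Business Administration (IMBA)
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Number of pages: 52
Keywords (Chinese): 学生表现 (student performance), 数据挖掘 (data mining), 分类 (classification)
Keywords (English): Student Performance, Data Mining, Classification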
  • This paper analyzes student demographic information in order to give educational policy makers and institutions meaningful, high-level information about their students. This information enables proactive intervention: identifying high school students who are highly vulnerable to failing and providing them with additional support.
    This research provides a tool to predict and compare students' academic performance using factors that are independent of the school they attended and of their previous grades. The models identified two groups of factors as significant determinants of a student's academic performance: (1) factors that relate to the parents (parents' education, resources available at home, paid extra classes, address, student willingness to pursue higher education), and (2) factors that relate to the student as an individual (study time, free time, weekly alcohol consumption). Four models were applied: Logistic Regression using PCA Scores, Stepwise Logistic Regression, Decision Tree, and Random Forest. Analysis of the dataset showed high correlation among many of the independent variables, which was the most common drawback across the models applied. PCA allowed us to identify orthogonal components, and the scores on those components were used as the independent variables in the logistic regression model. This method produced the best results. This model yielded a sensitivity
    Although it is not feasible to change the significant factors that relate to the parents, policy makers or the school can fill the gap by offering alternative forms of teaching and assessment to the students identified as vulnerable to failing. Remedial classes can also be offered. These interventions can train and encourage students to succeed despite the factors identified as having a significant effect on their grades.
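
    The abstract's best-performing approach (logistic regression on PCA scores) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the variable names in the comments (parent_education, home_resources, paid_classes, study_time, free_time) are hypothetical stand-ins for the kinds of attributes the abstract lists, and the helper functions are written from scratch in plain Python. According to the table of contents the thesis itself used R, Rapid Miner, and Dataiku, so this is not the author's implementation and reproduces none of the thesis results.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def principal_axes(cov, k, rng, iters=500):
    """Top-k eigenvectors of cov via power iteration with deflation."""
    p = len(cov)
    work = [row[:] for row in cov]
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(p))
                  for i in range(p))
        # Deflate: subtract the axis just found before searching for the next.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(p)]
                for i in range(p)]
        axes.append(v)
    return axes

def project(rows, axes):
    """PCA scores: coordinates of each row along each principal axis."""
    return [[sum(r[j] * a[j] for j in range(len(a))) for a in axes]
            for r in rows]

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by full-batch gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(k):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Synthetic, correlated predictors driven by two hypothetical latent traits.
rng = random.Random(0)
rows, fails = [], []
for _ in range(400):
    home = rng.gauss(0, 1)    # hypothetical latent "home support"
    study = rng.gauss(0, 1)   # hypothetical latent "study habits"
    rows.append([
        home + 0.5 * rng.gauss(0, 1),    # parent_education (hypothetical)
        home + 0.5 * rng.gauss(0, 1),    # home_resources (hypothetical)
        home + 0.5 * rng.gauss(0, 1),    # paid_classes (hypothetical)
        study + 0.5 * rng.gauss(0, 1),   # study_time (hypothetical)
        -study + 0.5 * rng.gauss(0, 1),  # free_time (hypothetical)
    ])
    fails.append(1 if home + study + 0.5 * rng.gauss(0, 1) < -0.5 else 0)

Z = standardize(rows)
axes = principal_axes(covariance(Z), k=2, rng=rng)
scores = project(Z, axes)          # orthogonal inputs for the classifier
w, b = fit_logistic(scores, fails)

pred = [1 if b + sum(wj * sj for wj, sj in zip(w, s)) > 0 else 0
        for s in scores]
tp = sum(1 for ph, y in zip(pred, fails) if ph == 1 and y == 1)
fn = sum(1 for ph, y in zip(pred, fails) if ph == 0 and y == 1)
sensitivity = tp / (tp + fn)       # share of failing students that are flagged
accuracy = sum(1 for ph, y in zip(pred, fails) if ph == y) / len(fails)
score_cov = sum(s[0] * s[1] for s in scores) / (len(scores) - 1)
print(f"sensitivity={sensitivity:.2f} accuracy={accuracy:.2f}")
```

    Because the component scores are uncorrelated by construction (score_cov is numerically zero), the logistic regression no longer has to untangle correlated predictors. Sensitivity is reported here because the stated goal is to catch students at risk of failing; the figures are in-sample and apply only to this synthetic data.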



    Table of Contents
    Abstract 1
    Acknowledgements 2
    List of Figures 4
    List of Tables 5
    Chapter 1 Introduction 6
    Chapter 2 Related Works 8
    Chapter 3 The Proposed Approach and Methodology 10
    Chapter 4 Data Collection and Processing Described 14
        4.1 Attribute Information 14
    Chapter 5 Data Visualizations and Analysis 18
        5.1 Data Description 18
        5.2 Exploration through PCA 22
    Chapter 6 Datamining Models 29
        6.1 Drawbacks of the Models to be used 29
    Chapter 7 Analysis of Results 31
        7.1 Misclassification Costs 31
        7.2 Results from Logistic Regression with PCA Scores Using R 32
        7.3 Results from Stepwise Logistic Regression Using R 36
        7.4 Results from Decision Tree Using Rapid Miner 38
        7.5 Results from Random Forest Using Dataiku 41
    Chapter 8 Chosen Model 46
    Chapter 9 Implementation (How to Deploy and Intervene) 47
    Chapter 10 Challenges 48
    Chapter 11 Future Work 48
    Chapter 12 Conclusion 49
    Reference 50


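    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)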
