Predictive modelling of student academic performance using machine learning approaches : a case study in universiti islam pahang sultan ahmad shah

Nurul Habibah, Abdul Rahman (2024) Predictive modelling of student academic performance using machine learning approaches : a case study in universiti islam pahang sultan ahmad shah. Masters thesis, Universiti Malaysia Pahang Al-Sultan Abdullah (Contributors, Thesis advisor: Sahimel Azwal, Sulaiman).

Predictive modelling of student academic performance using machine learning approaches a case study in universiti islam pahang sultan ahmad shah.pdf - Accepted Version


Abstract

Predictive analytics research has recently grown in popularity in higher education because it gives educators information that can help them improve student achievement. According to the literature review, however, studies applying machine learning and predictive analytics to student performance remain scarce in Malaysian higher education. In addition, rising student dropout rates are a crucial issue for higher education institutions: when large numbers of students drop out, an institution’s reputation may suffer, and the country may lose significant human capital. The main goal of this study was to develop the most accurate model for predicting students’ performance levels using machine learning techniques such as multinomial logistic regression, decision tree, Random Forest, k-nearest neighbor, Naïve Bayes, and support vector machine. Cramér’s V and Spearman’s rank correlation coefficient were used to determine the factor most strongly correlated with students’ performance level. The models were evaluated on precision, recall, accuracy, F1-score, and area under the receiver operating characteristic curve (AUC). Using a dataset of students enrolled in the Business Statistics course at Universiti Islam Pahang Sultan Ahmad Shah from 2013 to 2022, the study identifies students’ carry marks as the factor most strongly correlated with performance level. The decision tree is identified as the most accurate predictive model, with an accuracy of 0.60; it also has the highest recall and F1-score among the models compared. Finally, four models, namely multinomial logistic regression, decision tree, Random Forest, and Naïve Bayes, achieve a perfect AUC of 1.00 in distinguishing students with a fail grade.
Future research should reassess the model with additional variables or techniques that may improve predictive accuracy. The predictive algorithm could also be integrated into the Learning Management System together with a dashboard to make future analyses easier.
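
As an illustration of the association measure named in the abstract, the following is a minimal sketch of Cramér's V for two categorical variables, computed from the chi-square statistic of their contingency table. It is not the thesis's implementation: the function name `cramers_v`, the variable names, and the toy data are hypothetical, and the sketch uses the plain (not bias-corrected) form of the statistic.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    row_totals = Counter(xs)          # marginal counts of the first variable
    col_totals = Counter(ys)          # marginal counts of the second variable
    observed = Counter(zip(xs, ys))   # joint counts (contingency table cells)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    # V = sqrt(chi2 / (n * (min(rows, cols) - 1))), which lies in [0, 1]
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data, not taken from the thesis dataset.
carry_band = ["low", "low", "mid", "mid", "high", "high"]
performance = ["fail", "fail", "pass", "pass", "excellent", "excellent"]
gender = ["f", "m", "f", "m", "f", "m"]

print(cramers_v(carry_band, performance))  # perfectly associated in this toy data
print(cramers_v(gender, performance))      # no association in this toy data
```

In this toy example the banded carry marks determine the performance level exactly, so the statistic is at its maximum, while gender is evenly spread across levels and the statistic is zero. Real data would fall between these extremes.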

Item Type: Thesis (Masters)
Additional Information: Thesis (Master of Science (Mathematics)) -- Universiti Malaysia Pahang – 2024, SV: Sahimel Azwal bin Sulaiman, No CD: 13661
Uncontrolled Keywords: multinomial logistic regression
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics
Faculty/Division: Institute of Postgraduate Studies
Center for Mathematical Science
Depositing User: Mr. Mohd Fakhrurrazi Adnan
Date Deposited: 07 May 2025 07:09
Last Modified: 07 May 2025 07:09
URI: http://umpir.ump.edu.my/id/eprint/44026