Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection

Tusher, Ekramul Haque and Mohd Arfian, Ismail and Akib, Abdullah and Gabralla, Lubna A. and Ashraf Osman, Ibrahim and Hafizan, Mat Som and Muhammad Akmal, Remli (2025) Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection. PLoS ONE, 20 (e0326488). pp. 1-44. ISSN 1932-6203. (Published)

Comparative investigation of bagging enhanced machine learning.pdf
Available under License Creative Commons Attribution.

Abstract

Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the predictive accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study uses a dataset comprising demographic information on 615 individuals suspected of having HCV infection. The research applies oversampling and undersampling techniques to address class imbalance in the dataset and performs feature reduction using the F-test from one-way analysis of variance. Bagging ensembles built on six base classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensembles is evaluated using accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV) score, area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling, achieving 98.37% accuracy, a 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity (specificity), a 97.79% F1 score, 98.06% balanced accuracy, a 98.05% G-mean, a 1.63% error rate, an AUC of 0.98, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches for improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
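The evaluation metrics named in the abstract are related by standard formulas. The sketch below shows those relationships for a two-class confusion matrix. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, the binary setting is an assumption (the abstract does not state the number of classes or the averaging scheme), and "selectivity" is taken to mean specificity (true negative rate).

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # assumed to match "selectivity" in the abstract
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }


# Made-up counts, used only to exercise the formulas (not data from the study).
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)

# The same formulas applied to the recall and selectivity reported in the abstract.
reported_recall, reported_specificity = 0.9793, 0.9818
balanced_from_reported = (reported_recall + reported_specificity) / 2
g_mean_from_reported = math.sqrt(reported_recall * reported_specificity)
```

Under these assumptions, the reported recall (97.93%) and selectivity (98.18%) reproduce the reported balanced accuracy (98.06%) and G-mean (98.05%) to within rounding, and the 1.63% error rate is the complement of the 98.37% accuracy.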

Item Type: Article
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Adult; Area Under Curve; Bayes Theorem; Decision Trees; Early Diagnosis; Female; Hepacivirus; Hepatitis C; Humans; Machine Learning; Male; Middle Aged; Support Vector Machine
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
R Medicine > RA Public aspects of medicine
Faculty/Division: Institute of Postgraduate Studies
Centre of Excellence for Artificial Intelligence & Data Science
Faculty of Computing
Depositing User: Mrs Norsaini Abdul Samat
Date Deposited: 15 Jul 2025 03:20
Last Modified: 15 Jul 2025 03:20
URI: http://umpir.ump.edu.my/id/eprint/45086