The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification

Nur Syafiqah, Mohd Nafis (2022) The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification. PhD thesis, Universiti Malaysia Pahang (Thesis advisor: Suryanti, Awang).

Abstract

Sentiment classification is increasingly used to automatically identify positive or negative sentiment in opinionated text documents, such as customer feedback or reviews. Feature selection has always been a critical and challenging problem in machine learning-based sentiment classification, and hybrid feature selection is an efficient technique for addressing it. However, several shortcomings of hybrid feature selection remain to be solved. The first concerns the ability to identify feature importance and to reduce the number of features taken from opinionated text documents; failing to address this issue results in poor classification performance. This research therefore aims to improve classification performance by proposing term frequency-inverse document frequency (TF-IDF) and support vector machine-recursive feature elimination (SVM-RFE) as a hybrid feature selection technique. TF-IDF evaluates feature importance, and a standard deviation-based threshold is used for feature reduction, with the objective of improving on the conventional approach of reducing features from the feature matrix. SVM-RFE then re-evaluates and ranks the features remaining in the TF-IDF-based feature matrix, and only the top-k group of SVM-RFE-ranked features is used for sentiment classification. Finally, a support vector machine (SVM) classifier is employed to classify English customer review datasets, namely the opinion-labelled and large IMDb datasets. Performance was measured using accuracy, precision, recall, F-measure, and feature size reduction. The experimental results show promising performance of up to 95.06% on these measures, especially on the large IMDb dataset and on an additional hotel review dataset. In addition, the proposed technique could reduce the number of features used during classification by 31.80% to 64.00%. This reduction is significant for making optimal use of computational resources while preserving classification performance.
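The TF-IDF stage of the technique (scoring feature importance, then removing features below a standard deviation-based threshold) can be sketched in plain Python. This is a minimal sketch under stated assumptions and not the thesis implementation. The abstract does not specify these details, so the following are hypothetical: the whitespace tokeniser, the unsmoothed idf = log(N/df), the corpus-level score (the sum of a term's TF-IDF weights over all documents), and the threshold (the population standard deviation of those scores). The function names `tfidf_matrix`, `std_threshold_select` and `reduction_rate` are also illustrative.

```python
import math
from collections import Counter
from statistics import pstdev


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][term] is the TF-IDF weight of term in docs[i]."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokeniser (assumption)
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vocab = sorted(df)
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        length = len(tokens) or 1
        # tf = relative term frequency; idf = log(N / df), unsmoothed (assumption).
        rows.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vocab, rows


def std_threshold_select(vocab, rows):
    """Keep terms whose corpus-level TF-IDF score reaches a standard-deviation threshold.

    Hypothetical rule: score(term) = sum of the term's TF-IDF weights over all
    documents; threshold = population standard deviation of all scores.
    """
    scores = {t: sum(row.get(t, 0.0) for row in rows) for t in vocab}
    threshold = pstdev(list(scores.values()))
    kept = [t for t in vocab if scores[t] >= threshold]
    return kept, threshold


def reduction_rate(n_before, n_after):
    """Feature size reduction as a percentage of the original feature count."""
    return 100.0 * (n_before - n_after) / n_before


docs = [
    "the film was great and moving",
    "the plot was dull and slow",
    "the acting was great",
    "the ending was dull",
]
vocab, rows = tfidf_matrix(docs)
kept, threshold = std_threshold_select(vocab, rows)
# prints: 11 -> 9 (18.18% removed)
print(len(vocab), "->", len(kept), f"({reduction_rate(len(vocab), len(kept)):.2f}% removed)")
```

In this toy corpus, "the" and "was" occur in every document. Their idf, and therefore their score, is zero, so the threshold removes them first. The reduction percentage is simply the share of the original vocabulary that is discarded.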
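The second stage ranks the remaining features with SVM-RFE, keeps the top-k group, classifies with an SVM, and reports accuracy, precision, recall and F-measure. It can be sketched in plain Python as follows, again under assumptions and not as the thesis code. A bias-free, Pegasos-style subgradient solver stands in for whatever SVM implementation the thesis used. The sketch eliminates one feature per iteration using the squared weight w_j² as the ranking criterion, uses labels in {-1, +1}, and treats +1 as the positive class for precision and recall. The value of k and all function names are hypothetical.

```python
def select_columns(X, cols):
    """Keep only the given feature columns of every row."""
    return [[row[j] for j in cols] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Bias-free linear SVM trained with Pegasos-style subgradient steps.

    X: list of feature vectors; y: labels in {-1, +1}. Samples are visited in a
    fixed order instead of at random, so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # L2 shrinkage, plus the hinge-loss subgradient when the margin is violated.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]


def svm_rfe_ranking(X, y, **svm_kwargs):
    """SVM-RFE: return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        w = train_linear_svm(select_columns(X, remaining), y, **svm_kwargs)
        # Ranking criterion w_j**2: remove the feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # the last feature eliminated is ranked first


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure (F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy data: feature 0 separates the classes, feature 1 is always zero, feature 2 is noise.
X = [[1.0, 0.0, 0.2], [0.9, 0.0, -0.1], [-1.0, 0.0, 0.1], [-0.8, 0.0, -0.2]]
y = [1, 1, -1, -1]
ranking = svm_rfe_ranking(X, y)
top_k = ranking[:1]  # k = 1 for this toy example; k is a tuning parameter
w = train_linear_svm(select_columns(X, top_k), y)
# prints: [0, 2, 1] (1.0, 1.0, 1.0, 1.0)
print(ranking, classification_metrics(y, predict(w, select_columns(X, top_k))))
```

The hand-written loops only make each step explicit. In practice, scikit-learn's `TfidfVectorizer`, `sklearn.feature_selection.RFE` wrapped around `LinearSVC`, and the functions in `sklearn.metrics` cover the same pipeline with tested implementations.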

Item Type: Thesis (PhD)
Additional Information: Thesis (Doctor of Philosophy), Universiti Malaysia Pahang, 2022. Supervisor: Ts. Dr. Suryanti Awang. CD No.: 13283
Uncontrolled Keywords: term frequency-inverse document frequency, support vector machine-recursive feature elimination
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty/Division: Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Mr. Nik Ahmad Nasyrun Nik Abd Malik
Date Deposited: 23 May 2023 08:30
Last Modified: 19 Sep 2023 01:09
URI: http://umpir.ump.edu.my/id/eprint/37676