An observation of different clustering algorithms and clustering evaluation criteria for a feature selection based on linear discriminant analysis

Tie, K. H. and A., Senawi and Chuan, Z. L. (2022) An observation of different clustering algorithms and clustering evaluation criteria for a feature selection based on linear discriminant analysis. In: Enabling Industry 4.0 through Advances in Mechatronics. Lecture Notes in Electrical Engineering book series (LNEE), 800. Springer Nature Singapore Pte. Ltd., Singapore, pp. 497-505. ISBN 978-981-19-2094-3 (Printed); 978-981-19-2095-0 (Online)

Abstract

Linear discriminant analysis (LDA) is a very popular method for dimensionality reduction in machine learning. Yet LDA cannot be applied directly to unlabeled data, because it requires class labels to train the algorithm. Thus, a clustering algorithm is needed to predict the class labels before LDA can be used. However, different clustering algorithms have different parameters that need to be specified. The objective of this paper is to investigate how these parameters behave with a measurement criterion for feature selection, namely the total error reduction ratio (TERR). The k-means and the Gaussian mixture distribution were adopted as the clustering algorithms, and each algorithm was tested on four datasets with four distinct clustering evaluation criteria: Calinski-Harabasz, Davies-Bouldin, Gap and Silhouette. Overall, the k-means outperforms the Gaussian mixture distribution in selecting smaller feature subsets. It was found that if a certain threshold value of the TERR is set and the k-means algorithm is applied, the Calinski-Harabasz, Davies-Bouldin, and Silhouette criteria yield the same number of selected features, which is smaller than the feature subset size given by the Gap criterion. When the Gaussian mixture distribution algorithm is adopted, none of the criteria consistently selects the smallest number of features. The higher the TERR threshold value is set, the larger the feature subset becomes, regardless of which clustering algorithm and clustering evaluation criterion are used. These results are essential for directing future work on designing a robust unsupervised feature selection method based on LDA.
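
The workflow described in the abstract (cluster the unlabeled data, pick the number of clusters with an evaluation criterion, then train LDA on the predicted labels) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic dataset, the candidate range of cluster counts, and the helper names `make_clusterer` and `pseudo_labels` are all assumptions made for illustration. The Gap criterion is omitted because scikit-learn does not provide it, and the TERR-based feature selection step is not shown because the abstract does not define how TERR is computed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture

# Criterion name -> (score function, True if a larger score is better).
# Davies-Bouldin is the only one of the three where smaller is better.
CRITERIA = {
    "calinski_harabasz": (calinski_harabasz_score, True),
    "davies_bouldin": (davies_bouldin_score, False),
    "silhouette": (silhouette_score, True),
}


def make_clusterer(algorithm, k, seed=0):
    """Hypothetical helper: build a k-means or Gaussian mixture model."""
    if algorithm == "kmeans":
        return KMeans(n_clusters=k, n_init=10, random_state=seed)
    if algorithm == "gmm":
        return GaussianMixture(n_components=k, random_state=seed)
    raise ValueError(f"unknown algorithm: {algorithm}")


def pseudo_labels(X, algorithm, criterion, k_values=range(2, 7)):
    """Hypothetical helper: return (k, labels) for the best-scoring k."""
    score_fn, larger_is_better = CRITERIA[criterion]
    best = None
    for k in k_values:
        labels = make_clusterer(algorithm, k).fit_predict(X)
        if len(np.unique(labels)) < 2:
            continue  # the criteria need at least two non-empty clusters
        score = score_fn(X, labels)
        signed = score if larger_is_better else -score
        if best is None or signed > best[0]:
            best = (signed, k, labels)
    return best[1], best[2]


# Synthetic stand-in for an unlabeled dataset (assumption, not from the paper).
X, _ = make_blobs(n_samples=300, centers=3, n_features=5,
                  cluster_std=0.5, random_state=0)

results = {}
for algorithm in ("kmeans", "gmm"):
    for criterion in CRITERIA:
        k, labels = pseudo_labels(X, algorithm, criterion)
        # Train LDA on the predicted cluster labels instead of true classes.
        lda = LinearDiscriminantAnalysis().fit(X, labels)
        results[(algorithm, criterion)] = (k, lda.transform(X).shape[1])

for (algorithm, criterion), (k, dims) in results.items():
    print(f"{algorithm:7s} {criterion:18s} k={k} LDA dimensions={dims}")
```

Each combination of algorithm and criterion may pick a different number of clusters, which changes the labels LDA is trained on. That dependence is the effect the chapter studies through the TERR-based feature subset size.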

Item Type: Book Chapter
Additional Information: Enabling Industry 4.0 through Advances in Mechatronics: Selected Articles from iM3F 2021, Malaysia. Presents selected articles from the mechatronics track of the iM3F 2021 conference, Malaysia. Highlights recent findings in mechatronics pertinent toward the realization and embodiment of Industry 4.0.
Uncontrolled Keywords: Unsupervised feature selection; Linear discriminant analysis; Clustering algorithm; Clustering evaluation criterion
Subjects: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
T Technology > TJ Mechanical engineering and machinery
Faculty/Division: Institute of Postgraduate Studies
Center for Mathematical Science
Depositing User: Miss. Ratna Wilis Haryati Mustapa
Date Deposited: 31 Oct 2022 03:12
Last Modified: 31 Oct 2022 03:12
URI: http://umpir.ump.edu.my/id/eprint/35517