Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters

Wan Maseri, Wan Mohd and Beg, Abul Hashem and Herawan, Tutut and Noraziah, Ahmad (2011) Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters. International Journal of Information Retrieval Research (IJIRR), 1 (3). pp. 1-14. ISSN 2155-6377 (print); 2155-6385 (online). (Published)

Abstract

K-means is an unsupervised, partitioning clustering algorithm. It is popular and widely used because of its simplicity and speed. K-means clustering produces a number of separate, flat (non-hierarchical) clusters and is suitable for generating globular clusters. The main drawback of the k-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that automatically generates the initial number of clusters (k) and uses a new approach to defining the initial centroids, for an effective and efficient clustering process. The underlying mechanism has been analyzed and tested experimentally. The experimental results show that the number of iterations is reduced to 50% and that the run time is lower and depends consistently on the maximum distance between data points, regardless of how many data points there are.
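For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal sketch of standard k-means (Lloyd's algorithm), not the improved method from the paper, whose details are not given in this record. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration. The sketch shows the drawback the abstract describes: the caller has to supply the initial centroids, and therefore k, up front.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd's k-means (illustrative baseline, not the paper's method).

    k is fixed by the number of initial centroids the caller passes in.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return centroids, labels, iterations


# Hypothetical sample data: two well-separated globular groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, n_iter = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, n_iter)
```

In this baseline, both the number of clusters and the starting centroids are inputs chosen by the user, and the number of iterations until convergence depends on that choice. According to the abstract, those two inputs are what the improved algorithm generates automatically.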

Item Type: Article
Uncontrolled Keywords: Clustering; Clustering Process; Data Mining; K-Means Algorithm; Partitioning Clustering Algorithm
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: PM Dr. Noraziah Ahmad
Date Deposited: 18 Jun 2015 01:51
Last Modified: 05 Feb 2018 00:25
URI: http://umpir.ump.edu.my/id/eprint/9328