Datasets Size: Effect on Clustering Results

Raheem, Ajiboye Adeleke and Ruzaini, Abdullah Arshah and Hongwu, Qin (2013) Datasets Size: Effect on Clustering Results. In: 3rd International Conference on Software Engineering & Computer Systems (ICSECS - 2013), 20-22 August 2013, Universiti Malaysia Pahang. pp. 1-9. (Unpublished)


Recent advances in the way we capture and store data pose a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most people who keep records, however, are yet to reap the benefits of this tool, owing to the general notion that a large dataset is required to guarantee reliable results. This may not hold in all cases. In this paper, we propose a research technique that applies descriptive algorithms to numeric datasets of varied sizes. We modeled each subset of our data using the EM (expectation-maximization) clustering algorithm; two different numbers of partitions (k) were estimated and used for each experiment. The clustering results were validated using an external evaluation measure in order to determine their level of correctness. The approach unveils the implication of dataset size for the clusters formed and the impact of the estimated number of partitions.
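The kind of experiment the abstract describes can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the synthetic two-class data, the subset sizes, the values of k (2 and 3), the use of purity as the external evaluation measure, and every function name here are assumptions made for the example. The abstract does not state which dataset, sizes, k values, or measure the paper used.

```python
import math
import random
from collections import Counter


def make_data(n, seed=0):
    """Hypothetical labelled data: two 1-D Gaussian classes (means 0 and 6)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        xs.append(rng.gauss(0.0 if label == 0 else 6.0, 1.0))
        ys.append(label)
    return xs, ys


def em_gmm_1d(xs, k, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with EM; return hard labels."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)  # initialise means from k data points
    overall = sum(xs) / len(xs)
    var0 = sum((x - overall) ** 2 for x in xs) / len(xs)
    variances = [max(var0, 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    # Assign each point to its most responsible component
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def purity(true_labels, cluster_labels):
    """External measure: fraction of points in their cluster's majority class."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, Counter())[t] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in clusters.values())
    return majority / len(true_labels)


def run_experiment(sizes=(50, 200, 1000), ks=(2, 3)):
    """Cluster nested subsets of one dataset with each k; return purity scores."""
    xs, ys = make_data(max(sizes))
    results = {}
    for n in sizes:
        for k in ks:
            labels = em_gmm_1d(xs[:n], k)
            results[(n, k)] = purity(ys[:n], labels)
    return results


if __name__ == "__main__":
    for (n, k), score in sorted(run_experiment().items()):
        print(f"size={n:5d}  k={k}  purity={score:.3f}")
```

Comparing the scores across rows with the same k shows how subset size affects the clusters, while comparing rows with the same size shows the effect of the chosen k. Purity tends to rise as k grows, so a real study would likely pair it with a measure that penalises over-partitioning.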

Item Type: Conference or Workshop Item (Speech)
Uncontrolled Keywords: Data mining; Algorithms; Datasets; Partitions and Clustering
Subjects: Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: Ms. Ratna Wilis Haryati Mustapa
Date Deposited: 11 Feb 2014 01:41
Last Modified: 18 May 2018 02:50