Density based subspace clustering: a case study on perception of the required skill

Rahmat Widia, Sembiring (2014) Density based subspace clustering: a case study on perception of the required skill. PhD thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Jasni, Mohamad Zain).

[img]
Preview
Pdf
Density based subspace clustering - a case study on perception of the required skill.pdf

Download (5MB) | Preview

Abstract

This research aims to develop an improved model for subspace clustering based on density connection. The researches started with the problem were there are hidden data in a different space. Meanwhile the dimensionality increases, the farthest neighbour of data point expected to be almost as close as nearest neighbour for a wide range of data distributions and distance functions. In this case avoid the curse of dimensionality in multidimensional data and identify cluster in different subspace in multidimensional data are identified problem. However develop an improved model for subspace clustering based on density connection is important, also how to elaborate and testing subspace clustering based on density connection in educational data, especially how to ensure subspace clustering based on density connection can be used to justify higher learning institution required skill. Subspace clustering is projected as a search technique for grouping data or attributes in different clusters. Grouping done to identify the level of data density and to identify outliers or irrelevant data that will create each to cluster exist in a separate subset. This thesis proposed subspace clustering based on density connection, named DAta MIning subspace clusteRing Approach (DAMIRA), an improve of subspace clustering algorithm based on density connection. The main idea based on the density in each cluster is that any data has the minimum number of neighbouring data, where data density must be more than a certain threshold. In the early stage, the present research estimates density dimensions and the results are used as input data to determine the initial cluster based on density connection, using DBSCAN algorithm. Each dimension will be tested to investigate whether having a relationship with the data on another cluster, using proposed subspace clustering algorithms. If the data have a relationship, it will be classified as a subspace. Any data on the subspace clusters will then be tested again with DBSCAN algorithms, to look back on its density until a pure subspace cluster is finally found. The study used multidimensional data, such as benchmark datasets and real datasets. Real datasets are from education, particularly regarding the perception of students’ industrial training and from industries due to required skill. To verify the quality of the clustering obtained through proposed technique, we do DBSCAN, FIRES, INSCY, and SUBCLU. DAMIRA has successfully established very large number of clusters for each dataset while FIRES and INSCY have a high failure tendency to produce clusters in each subspace. SUBCLU and DAMIRA have no un-clustered real datasets; thus the perception of the results from the cluster will produce more accurate information. The clustering time for glass dataset and liver dataset using DAMIRA method is more than 20 times longer than the FIRES, INSCY and SUBCLU, meanwhile for job satisfaction dataset, DAMIRA has the shortest time compare to SUBCLU and INSCY methods. For larger and more complex data, the DAMIRA performance is more efficient than SUBCLU, but, still lower than the FIRES, INSCY, and DBSCAN. DAMIRA successfully clustered all of the data, while INSCY method has a lower coverage than FIRES method. For F1 Measure, SUBCLU method is better than FIRES, INSCY, and DAMIRA. This study present improved model for subspace clustering based on density connection, to cope with the challenges clustering in educational data mining, named as DAMIRA. This method can be used to justify perception of the required skill for higher learning institution.

Item Type: Thesis (PhD)
Additional Information: Thesis (Doctor of Philosophy (Computer Science)) -- Universiti Malaysia Pahang - 2014, SV: PROFESSOR DR. JASNI MOHAMAD ZAIN, NO. CD: 8255
Uncontrolled Keywords: Data mining; Database management
Subjects: Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: En. Mohd Ariffin Abdul Aziz
Date Deposited: 20 Feb 2023 07:44
Last Modified: 20 Feb 2023 07:44
URI: http://umpir.ump.edu.my/id/eprint/37064
Download Statistic: View Download Statistics

Actions (login required)

View Item View Item