JOURNAL OF COMPUTING, VOLUME 2, ISSUE 11, NOVEMBER 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG 14
Cluster Evaluation of Density Based Subspace Clustering
Rahmat Widia Sembiring, Jasni Mohamad Zain
Abstract
Clustering real-world data often runs into the curse of dimensionality, because real-world data often consist of many dimensions. Clustering of multidimensional data can be evaluated through a density-based approach. Density-based approaches follow the paradigm introduced by the DBSCAN clustering algorithm. In this approach, the density of each object's neighbourhood is calculated using MinPoints. Clusters change in accordance with changes in the density of each object's neighbourhood. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper the SUBCLU, FIRES and INSCY methods are applied to cluster 6x1595 dimension synthetic datasets. IO Entropy, F1 Measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that the SUBCLU method requires considerable time to process subspace clustering; however, its coverage value is better. Meanwhile, the INSCY method achieves better accuracy than the two other methods, although its computation time is longer.
Index Terms
clustering, density, subspace clustering, SUBCLU, FIRES, INSCY.
1 DATA MINING AND CLUSTERING
Data mining is the process of extracting data from large databases, used as a technology to generate the required information. Data mining methods can be used to predict future data trends, estimate their scope, and serve as a reliable basis in the decision-making process. Functions of data mining include association, correlation, prediction, clustering, classification, trend analysis, outlier and deviation analysis, and similarity and dissimilarity analysis.
One frequently used data mining method for finding patterns or groupings in data is clustering. Clustering is the division of data into groups of objects that have similarities. Representing the data as a smaller number of clusters makes the data much simpler; however, important pieces of data can also be lost, so the clusters need to be analyzed and evaluated.
This paper is organized into a few sections. Section 2 presents cluster analysis. Section 3 presents density-based clustering, followed by density-based subspace clustering in Section 4. Our proposed experiment, based on performance evaluation, is discussed in Section 5, followed by concluding remarks in Section 6.
2 CLUSTER ANALYSIS
Cluster analysis is a quite popular method of discretizing data [1]. Cluster analysis, performed with multivariate statistics, identifies objects that have similarities and separates them from other objects, so that the variation between objects within a group is smaller than the variation with objects in other groups. Cluster analysis consists of several stages, beginning with the separation of objects into clusters or groups, followed by interpreting each characteristic value contained within the objects and labelling each group. The next stage is to validate the clustering results using a discriminant function.
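The grouping criterion described above (variation within a group smaller than variation with other groups) can be illustrated with a minimal Python sketch. It assumes that "variation" is measured as the mean pairwise Euclidean distance, which is only one possible choice; the function names and the toy data are hypothetical, and the sketch does not implement the discriminant-function validation step.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_within_distance(clusters):
    """Mean pairwise distance between objects that share a cluster."""
    dists = [euclidean(a, b)
             for cluster in clusters
             for a, b in combinations(cluster, 2)]
    return sum(dists) / len(dists)

def mean_between_distance(clusters):
    """Mean pairwise distance between objects in different clusters."""
    dists = [euclidean(a, b)
             for c1, c2 in combinations(clusters, 2)
             for a in c1
             for b in c2]
    return sum(dists) / len(dists)

# Hypothetical toy data: two well-separated groups in two dimensions.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"within = {within:.3f}, between = {between:.3f}")
```

For a grouping that satisfies the criterion, `within` should come out smaller than `between`, as it does for this toy data.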
3 DENSITY BASED CLUSTERING
Density-based clustering methods calculate the distance to the nearest neighbour object; each object is measured against the objects of its local neighbourhood. If an object is relatively close to its neighbours it is said to be a normal object, and vice versa. In density-based clustering, there are two concepts to be concerned with. The first is density-reachability: a point p is density-reachable from a point q with respect to Eps and MinPoints if there is a chain of points p1, ..., pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi, as shown in Figure 1.
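The definition of density-reachability can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the implementation evaluated in this paper: it assumes the usual DBSCAN convention that a point is directly density-reachable from a core point when it lies within Eps of it, that a core point has at least MinPoints objects in its Eps-neighbourhood (counting itself), and that distance is Euclidean. All function names and the sample points are hypothetical.

```python
import math
from collections import deque

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def eps_neighbourhood(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, other in enumerate(points)
            if euclidean(points[i], other) <= eps]

def is_core(points, i, eps, min_points):
    """Assumed core condition: at least min_points objects in the Eps-neighbourhood."""
    return len(eps_neighbourhood(points, i, eps)) >= min_points

def density_reachable(points, q, p, eps, min_points):
    """True if points[p] is density-reachable from points[q].

    Searches breadth-first for a chain q = p1, ..., pn = p in which each
    step leaves a core point and stays within its Eps-neighbourhood.
    """
    visited = {q}
    queue = deque([q])
    while queue:
        current = queue.popleft()
        if not is_core(points, current, eps, min_points):
            continue  # only core points can extend the chain
        for j in eps_neighbourhood(points, current, eps):
            if j == p:
                return True
            if j not in visited:
                visited.add(j)
                queue.append(j)
    return False

# Hypothetical one-dimensional layout embedded in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
eps, min_points = 1.0, 3

print(density_reachable(points, 1, 3, eps, min_points))  # chain 1 -> 2 -> 3
print(density_reachable(points, 3, 1, eps, min_points))  # point 3 is not a core point
```

With these sample points, point 3 is density-reachable from point 1, but not the other way round, because point 3 has too few neighbours to be a core point. This shows that density-reachability is not symmetric in general.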
————————————————
Rahmat Widia Sembiring is with the Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300, Kuantan, Pahang Darul Makmur, Malaysia
Jasni Mohamad Zain is Associate Professor at the Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300, Kuantan, Pahang Darul Makmur, Malaysia
© 2010 Journal of Computing Press, NY, USA, ISSN 2151-9617
http://sites.google.com/site/journalofcomputing/