A SINGLE-STEP INFORMATION-THEORETIC CLUSTERING ALGORITHM AND ITS APPLICATION TO RECOGNITION OF ARM MOVEMENTS
Turgay TEMEL
Department of Computer Engineering, Haliç University,
Bomonti, Sisli, Istanbul, Turkey
Email: turgaytemel@hotmail.com
A new information-theoretic, one-step clustering algorithm is proposed, based on the relationship between the scatter properties of
samples and their associated information content. The algorithm estimates the number of clusters and the respective cluster
centers from regions of samples that satisfy locally distributive characteristics. The proposed algorithm is justified with
simple normal-distribution cases, and it is compared with some other well-known clustering algorithms on synthetic data as well
as on real data representing various arm movements in the form of AR parameters. Simulation results indicate considerably
improved clustering performance compared to previously proposed algorithms. The main advantage of the algorithm is that its
complexity is 1, i.e. it completes in a single step, and that, unlike well-known algorithms described in the literature, it
does not require the number of clusters to be presumed.
Keywords: Clustering, Information Theory, Similarity, Prosthesis, Biomedical, Electromyography
1. INTRODUCTION
Most clustering methods assume certain observable
statistics that represent the regularity in the data.
Prominent clustering algorithms such as K-means [1] and
fuzzy C-means [2], both of which usually fail to identify
elongated multi-modal clusters, as well as likelihood methods
based on expectation-maximization (EM) [3], rely on the
assumption that samples are compact within individual
clusters. Linkage algorithms, e.g. [4], split and merge
samples hierarchically, bi-partitioning intermediate clusters
until a desired regularity is reached. However, these
algorithms are sensitive to outliers, to the iteration rule and
to the selection of threshold(s).
Connectivity-kernel spectral clustering algorithms [5]
express cluster granularity in terms of graph cuts: clusters
are identified by spectral decomposition of the sample
similarity (affinity) matrix, followed by partitioning with
K-means or another algorithm. All the approaches cited
above share another major pitfall: they must be supplied
with a pre-specified, possibly overestimated, number of
clusters, and they must be run iteratively to reach the best
cluster compactness. In addition, the kernel-based spectral
algorithm exemplified in [5] involves
specifying a global kernel-width parameter. A different
group of non-parametric, information-theoretic clustering
methods [6]-[7] exploits sample similarities in the form of
potential and entropy functions, respectively. Generally,
these methods form a cluster from the samples that satisfy
a threshold constraint associated with a particular sample
designated as a center with minimum potential/entropy. A
cluster so formed is then removed from the overall data set.
The threshold constraint restricts clusters to a fixed shape,
and the removal of clusters distorts the data characteristics.
January-June 2010, Volume 1, No. 1, pp. 15-18
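The threshold-and-remove behavior attributed to [6]-[7] can be illustrated with a short sketch. It is a generic, hypothetical construction, not the procedure of either reference: the Gaussian-similarity potential, the width `sigma`, the fixed radius `threshold` and the name `threshold_and_remove` are all assumptions chosen for the example. On evenly spaced points along a segment, i.e. a single elongated cluster, the fixed radius cuts the data into three ball-shaped clusters, and each removal changes the potentials of the samples that remain.

```python
import numpy as np


def threshold_and_remove(X, sigma=1.0, threshold=1.5):
    """Hypothetical threshold-and-remove clustering, for illustration only.

    Repeatedly takes the remaining sample with minimum potential as a center,
    assigns every remaining sample within `threshold` of it to that cluster,
    removes the cluster, and recomputes the potentials of the samples left.
    """
    X = np.asarray(X, dtype=float)
    remaining = np.arange(len(X))      # indices of samples not yet clustered
    labels = np.full(len(X), -1)
    centers = []
    while remaining.size > 0:
        R = X[remaining]
        # Pairwise squared Euclidean distances among the remaining samples.
        d2 = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=-1)
        # Assumed potential: low when a sample is close to many others.
        potential = (1.0 - np.exp(-d2 / (2.0 * sigma ** 2))).sum(axis=1)
        c = int(np.argmin(potential))  # minimum-potential sample -> center
        # Fixed-radius membership: every cluster is a ball of the same size.
        member = np.sqrt(d2[c]) <= threshold
        labels[remaining[member]] = len(centers)
        centers.append(R[c])
        # Removing the cluster changes the potentials computed next round.
        remaining = remaining[~member]
    return labels, np.array(centers)


if __name__ == "__main__":
    # One elongated cluster: 21 evenly spaced points on a segment of length 10.
    line = np.column_stack([0.5 * np.arange(21), np.zeros(21)])
    labels, centers = threshold_and_remove(line, sigma=1.0, threshold=1.5)
    print(len(centers))  # 3: the segment is cut into fixed-radius clusters
```

Because the radius is global, well-separated compact groups are recovered, but any cluster longer than about twice the radius is necessarily split.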
In this study, we propose a new one-step clustering
algorithm that exploits the variation of a similarity-based
entropy description. It yields a set of fictitious cluster
centers, each based on a sample region that simultaneously
satisfies a data-dependent optimality condition concerning
both global and local scatter properties. The algorithm is
compared with some of the algorithms cited above in
identifying six different arm movements. Each
electromyography (EMG) waveform recorded for a particular
arm movement is represented by four autoregressive (AR)
parameters and the signal power.
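This section does not specify how the AR parameters or the signal power are obtained. One common estimator is the Yule-Walker method; the sketch below assumes it, together with a mean-square definition of signal power and the AR sign convention stated in the docstring. The names `yule_walker_ar` and `emg_feature_vector` are hypothetical, and the AR(2) signal in the demo is synthetic, not EMG data.

```python
import numpy as np


def yule_walker_ar(x, order=4):
    """Estimate AR coefficients a_1..a_p by solving the Yule-Walker equations.

    Assumed model: x[n] = a_1 x[n-1] + ... + a_p x[n-p] + e[n].
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove any DC offset first
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Symmetric Toeplitz matrix with entries R[i, j] = r[|i - j|].
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    return np.linalg.solve(R, r[1:])


def emg_feature_vector(x, order=4):
    """Hypothetical 5-D feature: four AR coefficients followed by signal power."""
    x = np.asarray(x, dtype=float)
    power = np.mean(x ** 2)               # assumed definition: mean square value
    return np.append(yule_walker_ar(x, order), power)


if __name__ == "__main__":
    # Synthetic stand-in for an EMG segment: a stable AR(2) process.
    rng = np.random.default_rng(0)
    e = rng.standard_normal(20000)
    x = np.zeros_like(e)
    for n in range(2, len(e)):
        x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]
    # The first two coefficients should come out near 0.75 and -0.5,
    # the last two near 0.
    print(emg_feature_vector(x))
```

Solving the small Toeplitz system directly keeps the sketch dependency-free; `scipy.linalg.solve_toeplitz` or a Levinson-Durbin recursion would do the same job more efficiently.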
Simulation results indicate considerably improved
clustering performance compared to previously proposed
algorithms. The main advantage of the algorithm is that its
complexity is 1, i.e. it completes in a single step, and that,
unlike well-known algorithms described in the literature, it
does not require the number of clusters to be presumed.
2. SIMILARITY-BASED ENTROPY AND PROPOSED ALGORITHM
The scattering properties of a data set can be expressed in
terms of a Shannon-like entropy description based on a