Using Natural Cluster Information to
Build Fuzzy Indexing Structure

Department of Computer Science and Engineering

The Chinese University of Hong Kong

We propose a novel fuzzy clustering algorithm, Sequential Fuzzy Competitive Clustering (SFCC), to obtain the natural cluster information from the data. We then use this information to build an efficient indexing structure, the SFCC-binary tree (SFCC-b-tree). Our experimental results show that the SFCC-b-tree performs better than the VP-tree in most cases.

Key Problems


Multimedia data are often high volume and high
dimensional.


Most indexing methods for content-based retrieval divide the objects in the same natural cluster into
several different partitions.


Performance degrades when the queries lie near the
partition boundaries.


Contributions


Propose Sequential Fuzzy Competitive Clustering
(SFCC).


Build an efficient indexing structure, the SFCC binary tree
(SFCC-b-tree), by using SFCC.


Demonstrate how to use the Minimum Boundary
Rectangle (MBR), the smallest rectangle
containing all the data objects of a node, to perform
nearest-neighbor search efficiently.

The Advantages of SFCC-b-tree


The ability to group the data in the same natural
cluster in the same node.


Fast building time.


Fast retrieval time.

The Searching Algorithm in SFCC-b-tree

An Illustration of SFCC-b-tree

Figure 1: Procedure of the SFCC-b-tree


(1) Find the natural cluster information for the data set.

(2) Split the data set into two child nodes according to the natural cluster information.

(3) Repeat from step 1 on each child node until the child node is small enough to fit into a leaf node.
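The three steps above can be sketched as a short recursive routine. This is only an illustration under stated assumptions: the poster does not spell out SFCC itself, so a plain 2-means split stands in for it here, and the names `Node`, `two_cluster_split`, `build`, and `leaf_size` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    points: List[Point]              # filled only in leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def two_cluster_split(points, iters=10):
    """Stand-in for SFCC (assumption): split the points with plain 2-means."""
    dims = range(len(points[0]))
    # Seed the two centres with the extreme points along the widest dimension.
    widest = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    c0 = min(points, key=lambda p: p[widest])
    c1 = max(points, key=lambda p: p[widest])
    a, b = [], []
    for _ in range(iters):
        a, b = [], []
        for p in points:
            d0 = sum((x - y) ** 2 for x, y in zip(p, c0))
            d1 = sum((x - y) ** 2 for x, y in zip(p, c1))
            (a if d0 <= d1 else b).append(p)
        if not a or not b:
            break
        c0 = tuple(sum(xs) / len(a) for xs in zip(*a))
        c1 = tuple(sum(xs) / len(b) for xs in zip(*b))
    if not a or not b:
        # Degenerate data (e.g. all points identical): fall back to halving.
        mid = len(points) // 2
        a, b = list(points[:mid]), list(points[mid:])
    return a, b


def build(points, leaf_size=4):
    """Steps (1)-(3): cluster, split into two children, recurse until a leaf fits."""
    if len(points) <= max(leaf_size, 1):
        return Node(points=list(points))
    a, b = two_cluster_split(points)
    return Node(points=[], left=build(a, leaf_size), right=build(b, leaf_size))
```

For example, `build(data, leaf_size=3)` on two well-separated groups of three points each should place each group in its own leaf, which is the behaviour the natural-cluster split is meant to achieve.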

Experiments


Compare the building and searching times of the SFCC-b-tree with those of the VP-tree.


The experimental platform is an Ultra Sparc5, and the
implementation is in C++.

Building time (in seconds) at different dimensionalities.





Searching time (in seconds) at different dimensionalities.

Selected Publications

1. H.Y. Yue, I. King, and K.S. Leung, "Fuzzy clustering method for content-based indexing," in 2001 WSES FSFS International Conference, Fuzzy Sets and Fuzzy Systems, volume 1, pages 5411-5419, 2001.

2. H.Y. Yue, "Fuzzy Clustering for Content-Based Indexing in Multimedia Databases," M.Phil. thesis, Department of Computer Science and Engineering, CUHK, 2001.

The Pruning Algorithm


Each node in the SFCC-b-tree contains a Minimum
Boundary Rectangle (MBR), which is the smallest
rectangle containing all the data objects for the node.


Two n-dimensional hyper-cubes P and Q overlap if and only if P and Q overlap in all n dimensions.
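The overlap rule above translates directly into a per-dimension interval check. The sketch below is a minimal illustration, not the authors' implementation: the names `mbr`, `overlaps`, `query_cube`, and `nearest` are hypothetical, boxes are represented as (lower corner, upper corner) pairs, and the way the test is applied to nearest-neighbor search (skipping any group whose MBR misses a cube around the query sized by the best distance so far) is an assumption about how the pruning is used.

```python
import math


def mbr(points):
    """Smallest axis-aligned box containing all points: (lower corner, upper corner)."""
    return tuple(map(min, zip(*points))), tuple(map(max, zip(*points)))


def overlaps(p, q):
    """P and Q overlap iff their intervals overlap in every one of the n dimensions."""
    return all(pl <= qh and ql <= ph
               for pl, ph, ql, qh in zip(p[0], p[1], q[0], q[1]))


def query_cube(center, radius):
    """Axis-aligned cube that contains every point within `radius` of `center`."""
    return tuple(c - radius for c in center), tuple(c + radius for c in center)


def nearest(query, groups):
    """Scan groups of points, skipping any group whose MBR cannot hold a closer point.

    A real tree would store each node's MBR once; it is recomputed here for brevity.
    """
    best, best_d = None, math.inf
    for pts in groups:
        if best is not None and not overlaps(mbr(pts), query_cube(query, best_d)):
            continue  # pruned: no point in this group can beat best_d
        for p in pts:
            d = math.dist(query, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d
```

The pruning is safe because any point closer than the current best distance must lie inside the query cube, so a box that fails the overlap test in even one dimension cannot contain such a point.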

Building time (seconds):

Dimension   SFCC-b-tree   VP-tree
10          261.3         422.05
20          468.29        659.43

Searching time (seconds):

Dimension   SFCC-b-tree   VP-tree
10          0.10          1.75
20          0.14          3.23
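
To put the reported timings in relative terms, the short sketch below computes the ratio of VP-tree time to SFCC-b-tree time for each dimensionality. It uses only the numbers reported on the poster and assumes the first set of figures is the building time and the second the searching time; the names `build_s`, `search_s`, and `speedup` are hypothetical.

```python
# Reported times in seconds, as (SFCC-b-tree, VP-tree) per dimensionality.
build_s = {10: (261.3, 422.05), 20: (468.29, 659.43)}
search_s = {10: (0.10, 1.75), 20: (0.14, 3.23)}


def speedup(times):
    """Ratio of VP-tree time to SFCC-b-tree time for each dimensionality."""
    return {dim: vp / sfcc for dim, (sfcc, vp) in times.items()}


for name, table in (("build", build_s), ("search", search_s)):
    for dim, ratio in speedup(table).items():
        print(f"{name:6s} dim={dim}: VP-tree / SFCC-b-tree = {ratio:.2f}")
```

On these figures the SFCC-b-tree builds roughly 1.6 and 1.4 times faster at 10 and 20 dimensions, and searches roughly 17.5 and 23 times faster.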
H.Y. Yue, I. King and K.S. Leung