Ant Inspired Data Mining

Nov 20, 2013

Brandon Emerson





April 22, 2013


What is data mining?


Data mining is any process that analyzes and organizes data into clear and concise formats. It is particularly powerful for finding relationships between data points.


It is mainly used by companies with a consumer focus, specifically their marketing divisions. Data mining allows them to find meaningful relationships between products and consumers.




Applications in Physics


Efficient data mining techniques can improve
data storage and retrieval in experiments that
require a great deal of data collection.


Effective mining can help analysts find relationships between specific data points, and thus between the physical phenomena those points describe.




Our Goals


Use basic ideas about ant behaviors to
develop an effective means of data mining.


Discuss recent improvements to ant clustering algorithms, and compare data mining techniques using results from simple tests.


A Simple Model of Ants - 1

[Figure: Ant, Object]

A Simple Model of Ants - 2

[Figure: Ant, Object]
Probability of picking up:

$$P_{\text{pick}} = \left( \frac{a}{a + f} \right)^2$$

a is a constant; f is the perceived fraction of objects nearby.

Probability of placing:

$$P_{\text{place}} = \left( \frac{f}{b + f} \right)^2$$

b is a constant.

Assuming the ant moves randomly and has enough time to explore the entire area, you would expect all of the objects to end up clustered together.
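A minimal Python sketch of the two probabilities, assuming they take the forms (a / (a + f))^2 for picking up and (f / (b + f))^2 for placing. The function names and the default values a = 0.1 and b = 0.3 are illustrative assumptions, not values given on the slide.

```python
def pick_up_probability(f: float, a: float = 0.1) -> float:
    """Probability that an unladen ant picks up an object: (a / (a + f))^2."""
    return (a / (a + f)) ** 2


def place_probability(f: float, b: float = 0.3) -> float:
    """Probability that a laden ant places its object: (f / (b + f))^2."""
    return (f / (b + f)) ** 2


# f is the perceived fraction of objects nearby (0 = isolated, 1 = crowded).
for f in (0.0, 0.1, 0.5, 1.0):
    print(f"f={f:.1f}  pick={pick_up_probability(f):.3f}  place={place_probability(f):.3f}")
```

With these forms an isolated object (f = 0) is always picked up and never placed, while a crowded neighborhood makes picking up unlikely and placing likely, which is what drives the clustering.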


A Note on Perception

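f is the perceived fraction of objects nearby:

$$f(x) = \begin{cases} \dfrac{1}{s^2} \displaystyle\sum_{y \in (s \times s)} \left[ 1 - \dfrac{d(x, y)}{\alpha} \right] & \text{when } f > 0 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: object x and a nearby object y]

f(x) is now a measure of the similarity of object x to the objects y in the area around object x.

Here d(x, y) is a dissimilarity function.

When the objects are the same: d(x, y) = 0

When the objects are different: d(x, y) = 1

α is a scale factor for dissimilarity.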
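The perception function can be sketched in a few lines of Python, assuming f(x) is the sum of 1 - d(x, y)/α over the objects y in an s × s area around x, divided by s^2 and floored at zero. The function names, the string labels, and the example values (α = 0.5, s = 3) are assumptions made for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def perceived_similarity(
    x: T,
    neighbors: Iterable[T],
    dissimilarity: Callable[[T, T], float],
    alpha: float,
    s: int,
) -> float:
    """Return max(0, (1 / s^2) * sum over y of (1 - d(x, y) / alpha))."""
    total = sum(1.0 - dissimilarity(x, y) / alpha for y in neighbors)
    return max(0.0, total / (s * s))


def label_dissimilarity(x: str, y: str) -> float:
    """0 when the objects are the same, 1 when they are different."""
    return 0.0 if x == y else 1.0


# A 3 x 3 area around a "red" object that contains four other objects.
nearby = ["red", "red", "red", "blue"]
print(perceived_similarity("red", nearby, label_dissimilarity, alpha=0.5, s=3))
```

Each matching neighbor contributes 1 and each different neighbor contributes 1 - 1/α, so a small α lets a single dissimilar neighbor cancel out several similar ones.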



The Basic Algorithm

/* Initialization */
for every object x do
    place x randomly on grid
end for
for all ants do
    place ant at randomly selected site
end for

/* Main loop */
for all ants do
    for t = 1 to t_max do
        if (ant carries no object) and (site occupied by object x) then
            compute f(x) and probability of picking up
            draw random real number R
            if (R ≤ Prob) then
                pick up object
            end if
        else
            if (ant carries object) and (site empty) then
                compute f(x) and probability of dropping
                draw random real number R
                if (R ≤ Prob) then
                    drop object
                end if
            end if
        end if
        move to randomly selected ant-free adjacent site
    end for
end for
print location of objects
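The basic algorithm can be turned into a small runnable Python sketch. It follows the loop order of the pseudocode (each ant runs its t_max steps in turn) and makes several assumptions that the slides do not state: a wrap-around square grid, a 3 × 3 perception area, labels compared with a 0/1 dissimilarity, and the constants a = 0.1, b = 0.3, α = 0.5. All function and parameter names are hypothetical.

```python
import random


def neighborhood_similarity(grid, pos, label, size, alpha=0.5, s=3):
    """f(x): sum of (1 - d / alpha) over the s x s area around pos, over s^2, floored at 0."""
    half = s // 2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            if (dr, dc) == (0, 0):
                continue  # skip the site the ant is standing on
            other = grid.get(((pos[0] + dr) % size, (pos[1] + dc) % size))
            if other is not None:
                d = 0.0 if other == label else 1.0
                total += 1.0 - d / alpha
    return max(0.0, total / (s * s))


def ant_cluster(objects, size=12, n_ants=5, t_max=2000, a=0.1, b=0.3, seed=0):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Initialization: scatter the objects, then the ants, on random sites.
    grid = dict(zip(rng.sample(cells, len(objects)), objects))
    ants = [{"pos": p, "load": None} for p in rng.sample(cells, n_ants)]
    # Main loop, in the order given by the pseudocode.
    for ant in ants:
        for _ in range(t_max):
            pos = ant["pos"]
            if ant["load"] is None and pos in grid:
                f = neighborhood_similarity(grid, pos, grid[pos], size)
                if rng.random() <= (a / (a + f)) ** 2:
                    ant["load"] = grid.pop(pos)  # pick up object
            elif ant["load"] is not None and pos not in grid:
                f = neighborhood_similarity(grid, pos, ant["load"], size)
                if rng.random() <= (f / (b + f)) ** 2:
                    grid[pos] = ant["load"]  # drop object
                    ant["load"] = None
            # Move to a randomly selected adjacent site that holds no other ant.
            occupied = {other["pos"] for other in ants if other is not ant}
            options = [((pos[0] + dr) % size, (pos[1] + dc) % size)
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            free = [p for p in options if p not in occupied]
            if free:
                ant["pos"] = rng.choice(free)
    return grid, ants


if __name__ == "__main__":
    final_grid, final_ants = ant_cluster(["A"] * 15 + ["B"] * 15)
    for position, label in sorted(final_grid.items()):
        print(position, label)
```

Ants that are still carrying an object when their steps run out keep it in `load`, so the returned grid can hold slightly fewer objects than were scattered.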


Improvements - 1


Granted ants "short-term memory." The ants store their last x locations. After picking up data, they proceed to their last remembered locations sequentially.


Normalized the grid to enable efficient mining
of a variety of data set sizes.

Grid size: $\sqrt{10N} \times \sqrt{10N}$

Step size: $\sqrt{20N}$

Number of iterations: $2000N$

Where N is the maximum number of data items to be mined.
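The short-term memory improvement described above could be modeled with a bounded queue. This is a sketch under assumptions: the class name, the default capacity of 5, and the choice to revisit the most recent location first are all illustrative, since the slide only says that the remembered locations are visited sequentially.

```python
from collections import deque
from typing import List, Tuple

Location = Tuple[int, int]


class AntMemory:
    """Remembers the most recent `capacity` grid locations of an ant."""

    def __init__(self, capacity: int = 5) -> None:
        self.locations = deque(maxlen=capacity)

    def remember(self, location: Location) -> None:
        # When the memory is full, the oldest location is discarded automatically.
        self.locations.append(location)

    def route(self) -> List[Location]:
        """Locations to visit after a pick-up, most recent first (an assumption)."""
        return list(reversed(self.locations))


memory = AntMemory(capacity=3)
for site in [(0, 0), (0, 1), (1, 1), (2, 1)]:
    memory.remember(site)
print(memory.route())
```

A `deque` with `maxlen` keeps the memory at a fixed size without any explicit bookkeeping, which matches the idea of remembering only the last x locations.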

Improvements - 2

$$f^*(x) = \frac{1}{\sigma^2} \sum_{y} \left[ 1 - \frac{d(x, y)}{\alpha} \right]$$

α determines the percentage of items that are considered similar. If α is too small, clusters won't form. If α is too large, the clusters will merge into one super cluster.

Each ant is assigned its own value of α and is allowed to change it in the following way: the ant makes a set number of moves (100), during which it keeps track of the number of times, F, that it has failed to drop a data item. The rate of failure is F/100, and α is adapted according to this rate:


$$\alpha = \begin{cases} \alpha + 0.01, & \text{if rate} > 0.99 \\ \alpha - 0.01, & \text{if rate} \le 0.99 \end{cases}$$
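A short Python sketch of this adaptation rule, assuming the failure rate is F divided by the number of moves in the window and that α goes up by 0.01 when the rate exceeds 0.99 and down by 0.01 otherwise. The function and parameter names are hypothetical, and no bounds are applied to α because the slide does not state any.

```python
def adapt_alpha(alpha: float, failed_drops: int, moves: int = 100, step: float = 0.01) -> float:
    """Return an ant's new alpha after a window of `moves` moves."""
    failure_rate = failed_drops / moves
    if failure_rate > 0.99:
        return alpha + step  # almost every drop failed: be more tolerant
    return alpha - step      # drops succeed often enough: be stricter


alpha = 0.5
for failures in (100, 100, 60):
    alpha = adapt_alpha(alpha, failures)
    print(f"F={failures:3d}  alpha={alpha:.2f}")
```

Raising α makes objects look more similar to the ant, so an ant that can never drop its load gradually relaxes until a drop becomes possible.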

The Updated Algorithm

/* Initialization phase */
randomly scatter the objects on the grid
for each ant a do
    random_select_object
    pick_up_object
    place_agent a at randomly selected empty grid location
end for

/* Main loop */
for t = 1 to t_max do
    random_select_agent
    move_agent to new location
    x = carried_object
    compute f*(x) and prob of drop
    if drop = true then
        while pick = false do
            x = random_select_object
            compute f*(x) and prob of pick
            pick_up_object
        end while
    end if
end for
end
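The updated loop can also be sketched in Python. Several details here are assumptions rather than part of the slides: the move is a jump to a uniformly random site, α is held fixed, the pick and drop probabilities reuse the (a / (a + f))^2 and (f / (b + f))^2 forms of the simple model, and the grid wraps around. All names are hypothetical, and the number of ants must be smaller than the number of objects.

```python
import random


def similarity(grid, pos, label, size, alpha, sigma=3):
    """f*(x): sum of (1 - d / alpha) around pos, divided by sigma^2, floored at 0."""
    half = sigma // 2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            other = grid.get(((pos[0] + dr) % size, (pos[1] + dc) % size))
            if other is not None and (dr, dc) != (0, 0):
                total += 1.0 - (0.0 if other == label else 1.0) / alpha
    return max(0.0, total / sigma ** 2)


def updated_ant_cluster(objects, size=12, n_ants=4, t_max=5000,
                        alpha=0.5, a=0.1, b=0.3, seed=0):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    grid = dict(zip(rng.sample(cells, len(objects)), objects))
    # Initialization: each ant picks up a random object and lands on an empty site.
    ants = []
    for _ in range(n_ants):
        load = grid.pop(rng.choice(list(grid)))
        empty_sites = [c for c in cells if c not in grid]
        ants.append({"pos": rng.choice(empty_sites), "load": load})
    # Main loop: one randomly selected ant acts per time step.
    for _ in range(t_max):
        ant = rng.choice(ants)
        ant["pos"] = rng.choice(cells)  # assumed move: jump to any site
        if ant["pos"] in grid:
            continue  # an object can only be dropped on an empty site
        f = similarity(grid, ant["pos"], ant["load"], size, alpha)
        if rng.random() <= (f / (b + f)) ** 2:
            grid[ant["pos"]] = ant["load"]
            ant["load"] = None
            # Keep selecting random objects until one is picked up.
            while ant["load"] is None:
                source = rng.choice(list(grid))
                f = similarity(grid, source, grid[source], size, alpha)
                if rng.random() <= (a / (a + f)) ** 2:
                    ant["load"] = grid.pop(source)
                    ant["pos"] = source
    return grid, ants
```

Because every ant is always carrying an object between steps, no time is spent wandering unladen, which is the main practical difference from the basic loop.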


Comparing Techniques

Iris 150

              K-means    ACA
Clusters      3.000      2.960
Rand Index    0.824      0.785
F-measure     0.821      0.773
Dunn Index    2.866      2.120
Variance      0.861      4.213
Class. Err.   0.176      0.230

Best results

              K-means    ACA
Clusters      3.000      3.000
Rand Index    0.829      0.814
F-measure     0.830      0.811
Dunn Index    2.939      2.306
Variance      0.899      1.486
Class. Err.   0.167      0.187


Iris 150 is the 150-item Iris data set from the UCI Machine Learning Repository. K-means is a standard technique for data mining, and is used here as a benchmark for the performance of the Ant Clustering Algorithm (ACA).

Maximize these values

Minimize this value

Important note: the ACA does not need to be given the correct number of clusters to proceed, whereas K-means does.
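To make one of the table's measures concrete, here is a small Python sketch of the standard (unadjusted) Rand index, which is the fraction of item pairs that two labelings treat the same way. Whether the source paper uses exactly this variant is not stated on the slide, and the function name and example labels are illustrative.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(truth: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Fraction of item pairs on which the two labelings agree."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if len(truth) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(truth)), 2))
    # A pair agrees when both labelings put it together, or both keep it apart.
    agreements = sum(
        (truth[i] == truth[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agreements / len(pairs)


true_classes = [0, 0, 0, 1, 1, 1]
found_clusters = [0, 0, 1, 1, 1, 1]
print(round(rand_index(true_classes, found_clusters), 3))
```

The measure only compares pairs, so it does not matter which numbers a clustering uses for its cluster labels, and a value of 1 means the two groupings are identical.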

Summary


Ant simulation offers a unique technique for
data mining. This technique was developed
using simple ideas about ant behavior.


Ant clustering algorithms could still be improved, but as they stand they are fairly effective.


As our understanding of ant behavior
improves, perhaps ACA could be refined into
an even more efficient tool.


Just to be Clear…

None of the information presented, including data tables and code, is my personal work. All of the information was found in the paper below.


Boryczka, Urszula. "Ant Colony Metaphor in a New Clustering Algorithm." Control and Cybernetics 39.2 (2010): 343-57. Print.