Ant-Inspired Data Mining
Brandon Emerson
April 22, 2013
What is data mining?
• Data mining is any process that analyzes data and organizes it into clear, concise formats. It is particularly powerful for uncovering relationships between data points.
• It is used mainly by consumer-focused companies, especially their marketing divisions, where it helps them find meaningful relationships between products and consumers.
Applications in Physics
• Efficient data mining techniques can improve data storage and retrieval in experiments that collect large amounts of data.
• Effective mining can help analysts find relationships between specific data points, and thus between the physical phenomena they reflect.
Our Goals
• Use basic ideas about ant behavior to develop an effective means of data mining.
• Discuss recent improvements to ant clustering algorithms, and compare data mining techniques using the results of simple tests.
A Simple Model of Ants (Part 1)
[Figure: an ant and an object]
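A Simple Model of Ants (Part 2)
[Figure: an ant and an object]
Probability of picking up:
$$P_{\text{pick}} = \left( \frac{a}{a + f} \right)^2$$
where a is a constant and f is the perceived fraction of objects nearby.
Probability of placing:
$$P_{\text{drop}} = \left( \frac{f}{b + f} \right)^2$$
where b is a constant.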
Assuming the ant moves randomly and has enough time to explore the entire area, all of the objects can be expected to end up clustered together.
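The two probabilities above can be sketched in Python as follows. This is a minimal illustration: the constant values a = 0.1 and b = 0.15 are assumptions chosen for the example, not values given on the slides.

```python
def pick_probability(f: float, a: float = 0.1) -> float:
    """P_pick = (a / (a + f))**2: close to 1 when few objects are perceived nearby."""
    return (a / (a + f)) ** 2


def drop_probability(f: float, b: float = 0.15) -> float:
    """P_drop = (f / (b + f))**2: close to 1 when many objects are perceived nearby."""
    return (f / (b + f)) ** 2


for f in (0.0, 0.1, 0.5, 1.0):
    print(f"f={f:.1f}  P_pick={pick_probability(f):.3f}  P_drop={drop_probability(f):.3f}")
```

As f grows, P_pick falls and P_drop rises, so isolated objects tend to be picked up and carried objects tend to be set down next to existing piles.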
A Note on Perception
f is the perceived fraction of objects nearby. It is now computed as
$$f(x) = \begin{cases} \dfrac{1}{s^2} \displaystyle\sum_{y \in \mathrm{Neigh}_{s \times s}(x)} \left[ 1 - \dfrac{d(x,y)}{\alpha} \right] & \text{when } f > 0 \\ 0 & \text{otherwise} \end{cases}$$
f(x) is now a measure of how similar object x is to the objects y in the s × s area around object x, where d(x, y) is a dissimilarity function:
When the objects are the same: d(x, y) = 0
When the objects are different: d(x, y) = 1
α is a scale factor for dissimilarity.
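A minimal sketch of the perceived-similarity measure f(x) described above, assuming a square toroidal grid stored as a list of lists (None marks an empty cell), the binary dissimilarity from this slide, and illustrative defaults s = 3 and α = 0.5. The names local_density and dissimilarity and the grid layout are hypothetical.

```python
from collections.abc import Hashable
from typing import Optional

Grid = list[list[Optional[Hashable]]]  # None marks an empty cell


def dissimilarity(x: Hashable, y: Hashable) -> float:
    """Binary dissimilarity: 0 if the objects are the same, 1 if they are different."""
    return 0.0 if x == y else 1.0


def local_density(grid: Grid, row: int, col: int, x: Hashable,
                  s: int = 3, alpha: float = 0.5) -> float:
    """f(x) for object x at (row, col): (1/s^2) * sum of [1 - d(x, y)/alpha]
    over the objects y in the s x s neighborhood, or 0 if that value is not positive."""
    n_rows, n_cols = len(grid), len(grid[0])
    half = s // 2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell x occupies (or is about to be dropped on)
            y = grid[(row + dr) % n_rows][(col + dc) % n_cols]  # toroidal wrap-around
            if y is not None:
                total += 1.0 - dissimilarity(x, y) / alpha
    f = total / (s * s)
    return f if f > 0 else 0.0


grid: Grid = [[None] * 5 for _ in range(5)]
grid[1][1] = grid[1][2] = grid[2][1] = "A"
grid[3][3] = "B"
print(local_density(grid, 2, 2, "A"))  # A neighbors: +1 each, B neighbor: -1, total 2, so 2/9
print(local_density(grid, 2, 2, "B"))  # A neighbors: -1 each, B neighbor: +1, total -2, so 0.0
```

With α = 0.5, each different neighbor contributes 1 − 1/0.5 = −1, so a single unlike neighbor cancels a like one; a larger α would penalize unlike neighbors less.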
The Basic Algorithm
/* Initialization */
for every object x do
    place x randomly on the grid
end for
for all ants do
    place ant at a randomly selected site
end for
/* Main loop */
for all ants do
    for t = 1 to t_max do
        if (ant carries no object) and (site occupied by object x) then
            compute f(x) and the probability of picking up, P_pick
            draw a random real number R in [0, 1]
            if (R ≤ P_pick) then
                pick up object x
            end if
        else
            if (ant carries object x) and (site empty) then
                compute f(x) and the probability of dropping, P_drop
                draw a random real number R in [0, 1]
                if (R ≤ P_drop) then
                    drop object x
                end if
            end if
        end if
        move to a randomly selected, ant-free adjacent site
    end for
end for
print the locations of the objects
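The basic algorithm above can be sketched in Python as follows, with the loops nested as in the pseudocode (each ant in turn makes t_max moves). This is a minimal sketch under several assumptions: a toroidal grid, objects described only by a class label with the binary dissimilarity, and illustrative constants (s = 3, α = 0.5, a = 0.1, b = 0.15). The function name run_basic_aca is hypothetical.

```python
import random
from typing import Optional


def run_basic_aca(labels: list[str], size: int = 20, n_ants: int = 10, t_max: int = 2_000,
                  s: int = 3, alpha: float = 0.5, a: float = 0.1, b: float = 0.15,
                  seed: int = 0) -> dict[int, tuple[int, int]]:
    """Basic ant clustering; returns a map from object index to grid cell."""
    rng = random.Random(seed)
    grid: list[list[Optional[int]]] = [[None] * size for _ in range(size)]  # cell -> object index

    def f(i: int, r: int, c: int) -> float:
        """Perceived similarity of object i to the objects around cell (r, c)."""
        half, total = s // 2, 0.0
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                j = grid[(r + dr) % size][(c + dc) % size]  # toroidal wrap-around (assumption)
                if (dr or dc) and j is not None:
                    d = 0.0 if labels[i] == labels[j] else 1.0  # binary dissimilarity
                    total += 1.0 - d / alpha
        return max(0.0, total / (s * s))

    # Initialization: objects and ants start on distinct random cells.
    cells = [(r, c) for r in range(size) for c in range(size)]
    for i, (r, c) in enumerate(rng.sample(cells, len(labels))):
        grid[r][c] = i
    ants = [{"pos": p, "load": None} for p in rng.sample(cells, n_ants)]
    occupied = {ant["pos"] for ant in ants}

    # Main loop, nested as in the pseudocode: each ant in turn makes t_max moves.
    for ant in ants:
        for _ in range(t_max):
            r, c = ant["pos"]
            here = grid[r][c]
            if ant["load"] is None and here is not None:
                if rng.random() <= (a / (a + f(here, r, c))) ** 2:  # R <= P_pick
                    ant["load"], grid[r][c] = here, None
            elif ant["load"] is not None and here is None:
                fx = f(ant["load"], r, c)
                if rng.random() <= (fx / (b + fx)) ** 2:           # R <= P_drop
                    grid[r][c], ant["load"] = ant["load"], None
            # Move to a random adjacent cell free of other ants (stay put if none is free).
            free = [((r + dr) % size, (c + dc) % size)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and ((r + dr) % size, (c + dc) % size) not in occupied]
            if free:
                occupied.remove(ant["pos"])
                ant["pos"] = rng.choice(free)
                occupied.add(ant["pos"])

    # "Print location of objects": objects on the grid, plus any still being carried.
    locations = {grid[r][c]: (r, c) for r, c in cells if grid[r][c] is not None}
    for ant in ants:
        if ant["load"] is not None:
            locations[ant["load"]] = ant["pos"]  # reported at the carrying ant's cell
    return locations


labels = ["A"] * 15 + ["B"] * 15
for i, cell in sorted(run_basic_aca(labels).items()):
    print(labels[i], cell)
```

An object that is still being carried when the loop ends has no grid cell of its own, so this sketch reports it at the carrying ant's position.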
Improvements (Part 1)
• Ants were given a “short-term memory”: each ant stores a fixed number of its most recent locations. After picking up a data item, it revisits these remembered locations in sequence.
• The grid was normalized so that data sets of many different sizes can be mined efficiently.
Grid size: √(10N) × √(10N)
Step size: √(20N)
Number of iterations: 2000 · N
where N is the maximum number of data items to be mined.
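As a worked example of the normalization, the sketch below computes the parameters for a given N. It assumes the scaling rules grid side √(10N), step size √(20N), and 2000 · N iterations, and it rounds fractional values up, which is an assumption of this sketch. The function name aca_parameters is hypothetical.

```python
import math


def aca_parameters(n_items: int) -> dict[str, int]:
    """Scale the grid, step size, and run length with the number of data items N."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return {
        "grid_side": math.ceil(math.sqrt(10 * n_items)),  # grid is grid_side x grid_side
        "step_size": math.ceil(math.sqrt(20 * n_items)),
        "iterations": 2000 * n_items,
    }


print(aca_parameters(150))  # {'grid_side': 39, 'step_size': 55, 'iterations': 300000}
```

For the 150-item Iris data set this gives a 39 × 39 grid, so the grid stays only about ten times larger than the number of items instead of growing with N².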
Improvements (Part 2)
$$f^*(x) = \frac{1}{\sigma^2} \sum_{y \in \mathrm{Neigh}_{\sigma \times \sigma}(x)} \left[ 1 - \frac{d(x,y)}{\alpha} \right]$$
α determines what proportion of items are treated as similar. If α is too small, clusters won't form; if α is too large, the clusters will merge into one super cluster.
Each ant is assigned its own value of α and may adapt it as follows: the ant makes a set number of moves (100), during which it counts the number of times F that it has failed to drop a data item. The failure rate is F/100, and α is updated according to
$$\alpha \leftarrow \begin{cases} \alpha + 0.01, & \text{if } F/100 > 0.99 \\ \alpha - 0.01, & \text{if } F/100 \le 0.99 \end{cases}$$
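The per-ant adaptation of α can be sketched as a small Python class. This is a minimal sketch, assuming one record_move call per move of a loaded ant and a starting value α = 0.5; the class and method names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveAlpha:
    """Per-ant alpha, adjusted after every `window` moves (100 on the slide)."""
    alpha: float = 0.5   # starting value: an assumption, not from the slides
    window: int = 100
    moves: int = 0
    failures: int = 0    # F: failed drop attempts in the current window

    def record_move(self, dropped: bool) -> None:
        """Record one move of a loaded ant; dropped=False counts as a failed drop."""
        self.moves += 1
        if not dropped:
            self.failures += 1
        if self.moves == self.window:
            rate = self.failures / self.window          # F / 100
            self.alpha += 0.01 if rate > 0.99 else -0.01
            self.moves = self.failures = 0              # start a new window


ant = AdaptiveAlpha()
for _ in range(100):
    ant.record_move(dropped=False)  # 100 failures: rate 1.0 > 0.99, so alpha rises
print(round(ant.alpha, 2))          # 0.51
```

Because a larger α makes unlike neighbors count less against an item, raising α after a window of almost constant failures makes the next drop more likely, while any window with more successful drops nudges α back down.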
The Updated Algorithm
/* Initialization phase */
randomly scatter the objects on the grid
for each ant a do
    random_select_object
    pick_up_object
    place_agent a at a randomly selected empty grid location
end for
/* Main loop */
for t = 1 to t_max do
    random_select_agent
    move_agent to a new location
    I = carried_object
    compute f*(I) and the probability of dropping I; set drop = true with that probability
    if drop = true then
        drop I at the current location
        pick = false
        while pick = false do
            I = random_select_object
            compute f*(I) and the probability of picking up I; set pick = true with that probability
        end while
        pick_up_object I
    end if
end for
end
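The updated loop above can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation, and it rests on several assumptions: a toroidal grid of side about √(10N); move_agent implemented as a random jump of up to about √(20N) cells along each axis; drops allowed only onto empty cells; the ant jumping to the cell of the object it picks up; binary dissimilarity; and a fixed α (short-term memory and α adaptation are left out). The function name run_updated_aca is hypothetical.

```python
import random
from typing import Optional


def run_updated_aca(labels: list[str], n_ants: int = 5, s: int = 3, alpha: float = 0.5,
                    a: float = 0.1, b: float = 0.15, t_max: Optional[int] = None,
                    seed: int = 0) -> dict[int, tuple[int, int]]:
    """Updated loop: ants always carry an object; each drop is followed by a new pick-up."""
    rng = random.Random(seed)
    n = len(labels)
    assert 0 < n_ants < n, "need at least one ant and fewer ants than objects"
    side = max(s, round((10 * n) ** 0.5))   # grid side ~ sqrt(10N) (assumed scaling)
    step = max(1, round((20 * n) ** 0.5))   # max jump ~ sqrt(20N) (assumed scaling)
    t_max = 2000 * n if t_max is None else t_max
    grid: list[list[Optional[int]]] = [[None] * side for _ in range(side)]
    where: dict[int, tuple[int, int]] = {}  # object index -> cell, for objects on the grid

    def f_star(i: int, r: int, c: int) -> float:
        """(1/s^2) * sum over neighbors y of [1 - d(i, y)/alpha], clipped at 0."""
        half, total = s // 2, 0.0
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                j = grid[(r + dr) % side][(c + dc) % side]
                if (dr or dc) and j is not None and j != i:
                    d = 0.0 if labels[i] == labels[j] else 1.0  # binary dissimilarity
                    total += 1.0 - d / alpha
        return max(0.0, total / (s * s))

    def lift(i: int) -> tuple[int, int]:
        """Take object i off the grid and return the cell it occupied."""
        r, c = where.pop(i)
        grid[r][c] = None
        return r, c

    # Initialization: scatter the objects, then each ant picks up a random object.
    cells = [(r, c) for r in range(side) for c in range(side)]
    for i, (r, c) in enumerate(rng.sample(cells, n)):
        grid[r][c], where[i] = i, (r, c)
    ants = []
    for _ in range(n_ants):
        i = rng.choice(list(where))
        ants.append({"load": i, "pos": lift(i)})

    for _ in range(t_max):
        ant = rng.choice(ants)                                  # random_select_agent
        r = (ant["pos"][0] + rng.randint(-step, step)) % side   # move_agent: random jump
        c = (ant["pos"][1] + rng.randint(-step, step)) % side
        ant["pos"] = (r, c)
        i = ant["load"]                                         # I = carried_object
        if grid[r][c] is not None:
            continue                                            # only drop onto an empty cell
        fx = f_star(i, r, c)
        if rng.random() <= (fx / (b + fx)) ** 2:                 # drop = true with P_drop
            grid[r][c], where[i] = i, (r, c)
            while True:                                         # until pick = true
                j = rng.choice(list(where))                     # random_select_object
                fj = f_star(j, *where[j])
                if rng.random() <= (a / (a + fj)) ** 2:          # pick = true with P_pick
                    ant["load"], ant["pos"] = j, lift(j)        # ant moves to j's cell
                    break

    locations = dict(where)
    for ant in ants:
        locations[ant["load"]] = ant["pos"]                     # carried objects: the ant's cell
    return locations


labels = ["A"] * 10 + ["B"] * 10
print(run_updated_aca(labels, t_max=5_000))
```

Since every ant always holds exactly one object, no ant ever wanders unloaded looking for something to pick up; the pick-up loop terminates with probability 1 because P_pick stays positive for every object.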
Comparing Techniques
| Iris (150) | K-means | ACA |
|---|---|---|
| Clusters | 3.000 | 2.960 |
| Rand Index | 0.824 | 0.785 |
| F-measure | 0.821 | 0.773 |
| Dunn Index | 2.866 | 2.120 |
| Variance | 0.861 | 4.213 |
| Class. Err. | 0.176 | 0.230 |
| Best results | | |
| Clusters | 3.000 | 3.000 |
| Rand Index | 0.829 | 0.814 |
| F-measure | 0.830 | 0.811 |
| Dunn Index | 2.939 | 2.306 |
| Variance | 0.899 | 1.486 |
| Class. Err. | 0.167 | 0.187 |
Iris (150 items) is a data set taken from the UCI Machine Learning Repository. K-means is a standard data mining technique and is used here to benchmark the performance of the Ant Clustering Algorithm (ACA).
Higher values are better for the Rand index, F-measure, and Dunn index; a lower value is better for the classification error.
Important note: the ACA does not need to be told the correct number of clusters in advance, whereas K-means does.
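The slides report the Rand index without defining it. Under the standard definition, assumed here to be the one behind the table, it is the fraction of object pairs on which the clustering and the true classes agree. A minimal sketch (the function name rand_index is our own):

```python
from collections.abc import Hashable, Sequence
from itertools import combinations


def rand_index(true_labels: Sequence[Hashable], cluster_labels: Sequence[Hashable]) -> float:
    """Fraction of object pairs on which two partitions agree: both place the pair
    in the same group, or both place it in different groups."""
    n = len(true_labels)
    if n != len(cluster_labels) or n < 2:
        raise ValueError("need two equally long label lists with at least 2 items")
    pairs = list(combinations(range(n), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


print(rand_index(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0: same grouping, different names
print(rand_index(["a", "a", "b", "b"], [0, 1, 0, 1]))  # agrees on 2 of 6 pairs
```

Because it compares pairs rather than label names, the Rand index can score a clustering such as the ACA's even when the algorithm chooses its own number of clusters.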
Summary
• Ant simulation offers a distinctive technique for data mining, developed from simple ideas about ant behavior.
• Ant clustering algorithms still have room for improvement, but in their current form they are fairly effective: on the Iris data the ACA scored close to K-means without being told the number of clusters.
• As our understanding of ant behavior improves, the ACA may be refined into an even more efficient tool.
Just to be Clear…
None of the information presented, including the data tables and code, is my personal work. All of it was taken from the paper below.
Boryczka, Urszula. "Ant Colony Metaphor in a New Clustering Algorithm." Control and Cybernetics 39.2 (2010): 343-57. Print.