
Towards a Mapping of Modern AIS and Learning Classifier Systems

Larry Bull


Department of Computer Science & Creative Technologies

University of the West of England, U.K.


Background


For 25 years correlations between aspects of
AIS and Learning Classifier Systems (LCS) have
been highlighted.


Neither field appears to have benefitted.


More recently, an LCS has been presented for
unsupervised learning which, with hindsight, may
be viewed as a form of AIS.


The purpose is to bring this LCS to the attention of the AIS community, with the aim of serving as a catalyst for sharing ideas and mechanisms.

LCS in a Nutshell


Invented by John Holland circa 1976.


Consist of an “ecology” of rules.


IF <states> AND <action> THEN Reward


Traditionally use reinforcement learning
techniques to approximate rule utility.


Use evolutionary computing techniques
to discover new rules.


Often incorporate other heuristics.


[Slide: schematic of the LCS cycle. The Environment supplies a state that is matched against the rule population [P] (e.g. a rule such as 10#0:11) to form the match set [M]; a prediction array (e.g. 0, 10, 2, 9) drives action selection, yielding the action set [A]; reward from the Environment is distributed via Q-learning-style updates (including to the previous action set [A]-1), and an EA discovers new rules.]

[Slide: LCS family tree, 1978-2008, spanning reinforcement, regression (& reinforcement), supervised, unsupervised, and model learning: CS-1 (Holland & Reitman '78), LCS (Holland '80), Gofer (Booker '82), Animat (Wilson '85), Boole (Wilson '87), New Boole (Bonelli et al. '90), CFCS2 (Riolo '90), ZCS (Wilson '94), XCS (Wilson '95), ACS (Stolzmann '98), XCSF (Wilson '00), ACS2 (Butz et al. '02), UCS (Bernado-Mansilla & Garrell '03), XCSC (Tammee et al. '08).]

From LCS to AIS


Recently presented a novel variant of XCS for
data clustering.


Approach exploits the mechanisms inherent to
XCS but for unsupervised learning.


Aim is to develop an approach to learning rules which accurately describe clusters, without prior assumptions as to their number within a given dataset.


With hindsight, the approach is a form of clonal selection AIS.

YCSC Schematic

[Slide: YCSC schematic. Data is matched against the rule population [P] to form the match set [M]; error updates adjust the matching rules, an EA searches for new rules, and each rule's condition serves as a cluster descriptor.]

Rule Representation: Bounded Affinity


A condition consists of intervals:

{ {c1, s1}, ... , {cd, sd} }


c is the interval's range centre from [0.0, 1.0].


s is the "spread" from that centre (truncated).


d is the number of dimensions.


Each interval predicate's upper and lower bounds are calculated as [ci - si, ci + si].
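As an illustrative sketch of this bounded representation (not the original implementation; the `Rule` class and method names are assumptions), a condition can be stored and matched as follows:

```python
class Rule:
    """A rule condition: one (centre, spread) interval per input dimension."""
    def __init__(self, centres, spreads):
        self.centres = centres   # ci in [0.0, 1.0]
        self.spreads = spreads   # si, the (truncated) spread around ci

    def matches(self, x):
        # The rule matches if every input value lies within [ci - si, ci + si].
        return all(c - s <= xi <= c + s
                   for xi, c, s in zip(x, self.centres, self.spreads))

# A 2-dimensional condition {{0.5, 0.1}, {0.3, 0.05}}:
r = Rule([0.5, 0.3], [0.1, 0.05])
print(r.matches([0.45, 0.32]))   # True: both values fall inside their intervals
print(r.matches([0.45, 0.40]))   # False: 0.40 lies outside [0.25, 0.35]
```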

Fitness


Each rule maintains a running estimate of
matching error and niche size.


Error ε is derived from the Euclidean distance between the input x and the centres c in the condition of each member of [M].

Niches


Niche size estimates (σ) are based on match sets, i.e., the number of concurrently active rules:

σj ← σj + β( |[M]| - σj )


A time-triggered Genetic Algorithm is run in the match sets.
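A minimal sketch of these running estimates, assuming the error estimate uses the same Widrow-Hoff form as the niche size update above (the slides do not reproduce the exact error formula); β and the attribute names are illustrative:

```python
import math

def update_estimates(rule, x, match_set_size, beta=0.2):
    """Widrow-Hoff style updates for one rule in the current match set [M]."""
    # Matching error: move the estimate towards the Euclidean distance between
    # the input x and the condition centres c (assumed update form).
    dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, rule.centres)))
    rule.error += beta * (dist - rule.error)
    # Niche size: move the estimate towards the current match-set size |[M]|.
    rule.niche_size += beta * (match_set_size - rule.niche_size)
```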

Selection


All rules maintain a time-stamp of the cycle when they were last in an [M] where the GA was used.


If θGA cycles or more have passed on average for all rules in a current [M], the GA is triggered.


The GA uses roulette-wheel selection with a scalable function:

Fitness = 1 / (ε^ν + 1)


Time-stamps are reset for all members of [M].
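A hedged sketch of the GA trigger and parent selection; the time-keeping details and attribute names are assumptions:

```python
import random

def maybe_select_parent(match_set, current_time, theta_ga=12, nu=5):
    """Trigger the GA in [M] when, on average, theta_ga or more cycles have
    passed since its rules last took part in a GA; pick a parent by fitness."""
    avg_elapsed = sum(current_time - r.ga_timestamp for r in match_set) / len(match_set)
    if avg_elapsed < theta_ga:
        return None
    # Scalable accuracy-based fitness: Fitness = 1 / (error^nu + 1).
    weights = [1.0 / (r.error ** nu + 1.0) for r in match_set]
    parent = random.choices(match_set, weights=weights, k=1)[0]   # roulette wheel
    for r in match_set:                # time-stamps are reset for all of [M]
        r.ga_timestamp = current_time
    return parent
```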

Search


Offspring are produced via mutation (probability μ), where we mutate an allele by adding an amount + or - rand(m0).


Crossover (probability χ, two-point) can occur between any two alleles, i.e., within an interval predicate as well as between predicates.


If no rules match on a given time step, then a covering operator is used which creates a rule with its condition centred on the input value and a spread of rand(s0), which then replaces an existing member of the rulebase.
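An illustrative sketch of the mutation and covering operators under the interval representation above; the functions reuse the hypothetical `Rule` class from the earlier sketch:

```python
import random

def mutate(rule, mu=0.04, m0=0.006):
    """Mutate each allele (centre or spread) with probability mu by +/- rand(m0)."""
    for genome in (rule.centres, rule.spreads):
        for i in range(len(genome)):
            if random.random() < mu:
                genome[i] += random.choice((-1.0, 1.0)) * random.uniform(0.0, m0)

def cover(x, s0=0.03):
    """If no rule matches, create one centred on the input with spread rand(s0);
    it then replaces an existing member of the rulebase."""
    return Rule(centres=list(x), spreads=[random.uniform(0.0, s0) for _ in x])
```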

Replacement


Rule replacement is population-wide and proportional to niche occupancy.


Each rule maintains an estimate of the size of [M] in which it occurs.


Roulette-wheel selection.


Encourages all niches to contain the same number of rules; rule resource is balanced.
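A possible sketch of this replacement scheme, selecting the rule to delete with probability proportional to its niche size estimate (helper name hypothetical):

```python
import random

def select_for_deletion(population):
    """Roulette-wheel deletion proportional to niche occupancy: rules from
    crowded niches are more likely to be replaced, balancing rule resource."""
    weights = [r.niche_size for r in population]
    return random.choices(population, weights=weights, k=1)[0]
```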



Learning Process

[Slide: learning process diagram. Rule fitness (1/error) plotted against generalization, from 0 up to the maximum generalization, with a niche indicated.]

Experiments


Clustering is an important unsupervised
classification technique where a set of
data are grouped into clusters.


Done in such a way that data in the same
cluster are similar in some sense and data
in different clusters are dissimilar in the
same sense.

Some Data


Used randomly generated synthetic datasets.


The first dataset is well-separated and has k = 25 true clusters arranged in a 5x5 grid in d = 2 dimensions.


Each cluster is generated from 400 data points using a Gaussian distribution with a standard deviation of 0.02, for a total of n = 10,000 data points.


The second dataset is not well-separated; it was generated in the same way as the first except that the clusters are not centred on the centres of their grid cells.
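A sketch of how a dataset like the first one could be generated; the grid spacing over the unit square is an assumption, since the slides do not specify it:

```python
import numpy as np

def make_grid_dataset(k_side=5, points_per_cluster=400, std=0.02, seed=0):
    """5x5 = 25 Gaussian clusters, 400 points each, n = 10,000 points in d = 2."""
    rng = np.random.default_rng(seed)
    # Assumed: centres at the middles of a regular k_side x k_side grid over [0, 1]^2.
    ticks = [(i + 0.5) / k_side for i in range(k_side)]
    blobs = [rng.normal(loc=[cx, cy], scale=std, size=(points_per_cluster, 2))
             for cx in ticks for cy in ticks]
    return np.vstack(blobs)

X = make_grid_dataset()   # shape (10000, 2)
```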

Examples

Experimental Detail


The parameters used were: N = 800, β = 0.2, ν = 5, χ = 0.8, μ = 0.04, θGA = 12, s0 = 0.03, m0 = 0.006.


All results presented are the average of
ten runs.


Learning trials consisted of 200,000
presentations of a randomly sampled data
point.

Example Initial Results

Compaction


Many overlapping rules are seen around
each true cluster.


Developed a four-step rule compaction algorithm to remove overlaps:


Delete useless rules (very low coverage)


Sort on numerosity


Sort on error


Extract largest [M] rules
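A rough sketch of the four steps as listed; the coverage threshold and the tie-breaking between numerosity and error are assumptions:

```python
def compact(rules, data, min_coverage=1):
    """Four-step compaction: drop low-coverage rules, order the rest,
    then greedily keep rules heading the largest remaining match sets."""
    # 1. Delete useless rules (very low coverage).
    kept = [r for r in rules if sum(r.matches(x) for x in data) >= min_coverage]
    # 2-3. Sort on numerosity, then on error.
    kept.sort(key=lambda r: (-r.numerosity, r.error))
    # 4. Extract largest-[M] rules: once a rule is selected, remove the data it
    #    covers so heavily overlapping rules are not selected again.
    selected, remaining = [], list(data)
    for r in kept:
        covered = [x for x in remaining if r.matches(x)]
        if covered:
            selected.append(r)
            remaining = [x for x in remaining if not r.matches(x)]
    return selected
```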

Example Result after Compaction

Comparative Performance


As a measure of the quality of each clustering solution we use the total k-means objective function.


On the well-separated dataset the quality of the LCS was 8.12 +/- 0.54 and the number of clusters 25.0 +/- 0.


The average quality on the not well-separated dataset was 24.50 +/- 0.56 and the number of clusters 14.0 +/- 0.


The k-means algorithm (k = 25) averaged over 10 runs gives a quality of 32.42 +/- 9.49 and 21.07 +/- 5.25 on the well-separated and less-separated datasets respectively.


Comparative Performance II


For estimating the number of clusters we ran k-means 10 times for each k from 2 to 30, with different random initializations.


To select the best clustering across the different numbers of clusters, the Davies-Bouldin validity index was used.


The index has its lowest value (negative peak) at 23 clusters on the well-separated dataset and at 14 clusters on the less-separated dataset.


Thus the LCS does better on the well-separated data (finding 25 clusters).
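For reference, the k-means/Davies-Bouldin model-selection procedure can be approximated with scikit-learn (a sketch, not the original experimental code):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def best_k_by_davies_bouldin(X, k_range=range(2, 31), runs=10):
    """Run k-means several times per k and return the k whose best run
    minimises the Davies-Bouldin index (lower is better)."""
    scores = {}
    for k in k_range:
        labels_per_run = (KMeans(n_clusters=k, n_init=1, random_state=r).fit_predict(X)
                          for r in range(runs))
        scores[k] = min(davies_bouldin_score(X, labels) for labels in labels_per_run)
    return min(scores, key=scores.get), scores
```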

A Network-like Extension


One of the missing parts of XCS is a niche fitness sharing mechanism.


Here rules adjust their fitnesses based on the fitnesses of the other co-active rules.


Termed relative accuracy (f'):

f' = f / Σ f   (summed over the co-active rules in [M])
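A one-function sketch of this fitness-sharing step (attribute names assumed):

```python
def share_fitness(match_set):
    """Relative accuracy: each co-active rule's fitness is normalised by the
    total fitness of the current match set, f' = f / sum(f)."""
    total = sum(r.fitness for r in match_set)
    for r in match_set:
        r.relative_fitness = r.fitness / total
```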


Gives Improved Performance

[Slide: example result plotted over the unit square, axes from 0 to 1.]
Conclusions


Similarities (and differences) between AIS and
LCS have long been noted.


Views taken from many different perspectives:
dynamical systems, networks, complex adaptive
systems, etc.


An LCS recently presented as a clustering technique is essentially a clonal selection AIS.


Can mechanisms from both fields now be
consolidated to mutual benefit?



Some Possibilities


Theory and mechanisms for
generalization.


Adaptive rates of search.


Theory from ensembles/mixture-of-experts.


Representation schemes.


Memory.


N.B. A new theory of neuronal
replicators implies innate and adaptive
components in learning.