Graduate Course
An Introduction to Genetic Algorithms
Chia-Hsuan Yeh
These lecture notes are based on Mitchell (1996) and Goldberg (1989).
Chapter 1: Genetic Algorithms: An Overview
1.Motivation:
• The goal of creating artiﬁcial intelligence and artiﬁcial life can be traced back to the
very beginnings of the computer age.The earliest computer scientists —Alan Turing,
John von Neumann,Norbert Wiener,and others — were motivated in large part by
visions of imbuing computer programs with intelligence, with the lifelike ability to self-replicate, and with the adaptive capability to learn and to control their environments.
• These early pioneers of computer science were as much interested in biology and
psychology as in electronics,and they looked to natural systems as guiding metaphors
for how to achieve their visions.
• It should be no surprise,then,that from the earliest days computers were applied
not only to calculating missile trajectories and deciphering military codes but also to
modeling the brain,mimicking human learning,and simulating biological evolution.
• These biologically motivated computing activities have waxed and waned over the years, but since the early 1980s they have all undergone a resurgence in the computation research community. The first has grown into the field of neural networks, the second into machine learning, and the third into what is now called "evolutionary computation", of which genetic algorithms are the most prominent example.
2.A Brief History of Evolutionary Computation:
• In the 1950s and the 1960s several computer scientists independently studied evolu
tionary systems with the idea that evolution could be used as optimization tools for
engineering problems.The idea in all these systems was to evolve a population of
candidate solutions to a given problem,using operators inspired by natural genetic
variation and natural selection.
• Evolutionary Strategies:
In the 1960s, Rechenberg (1965, 1973) introduced "evolution strategies", a method he used to optimize real-valued parameters for devices such as airfoils.
• Evolutionary Programming:
Fogel, Owens, and Walsh (1966) developed "evolutionary programming", a technique in which candidate solutions to given tasks were represented as finite-state machines, which were evolved by randomly mutating their state-transition diagrams and selecting the fittest.
• Genetic Algorithms:
Genetic algorithms (GAs) were developed by John Holland and his students and colleagues at the University of Michigan in the 1960s and 1970s. In contrast with evolution strategies and evolutionary programming, Holland's original goal was not to design algorithms to solve specific problems, but rather to formally study the phenomenon of adaptation as it occurs in nature and to develop ways in which the mechanisms of natural adaptation might be imported into computer systems.
3.What Are Genetic Algorithms?
• Holland's 1975 book, Adaptation in Natural and Artificial Systems, presented the genetic algorithm as an abstraction of biological evolution and gave a theoretical framework for adaptation under the GA.
• Holland's GA is a method for moving from one population of "chromosomes" (strings of ones and zeros, or "bits") to a new population by using a kind of "natural selection" together with the genetics-inspired operators of crossover, mutation, and inversion.
• Genetic algorithms are search algorithms based on the mechanics of natural selection
and natural genetics.
• They combine survival of the ﬁttest among string structures with a structured
yet randomized information exchange to form a search algorithm with some of the
innovative ﬂair of human search.
• In every generation,a new set of artiﬁcial creatures (strings) is created using bits and
pieces of the ﬁttest of the old;an occasional new part is tried for good measure.
• Each chromosome (string) consists of "genes" (e.g., bits), each gene being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones.
• Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome (haploid) organisms.
• Mutation randomly changes the allele values of some locations in the chromosome.
• Inversion reverses the order of a contiguous section of the chromosome,thus rear
ranging the order in which genes are arrayed.
• GAs eﬃciently exploit historical information to speculate on new search points with
expected improved performance.
• Purpose:
The goals of this research have been twofold: (1) to abstract and rigorously explain the adaptive processes of natural systems, and (2) to design artificial systems software that retains the important mechanisms of natural systems.
• Genetic algorithms are theoretically and empirically proven to provide robust search
in complex spaces.
• These algorithms are computationally simple yet powerful in their search for improve
ment.
• They are not fundamentally limited by restrictive assumptions about the search space
(assumptions concerning continuity,existence of derivatives,unimodality,and other
matters)
4.The Appeal of Evolution
• Many computational problems require searching through a huge number of possibilities for solutions.
• Such search problems can often benefit from an effective use of parallelism, in which many different possibilities are explored simultaneously in an efficient way.
• What is needed is both computational parallelism (i.e.,many processors evaluating
sequences at the same time) and an intelligent strategy for choosing the next set of
sequences to evaluate.
• Many computational problems require computer programs to be adaptive — to continue to perform well in a changing environment.
• Many problems require computer programs to be innovative — to construct something truly new and original, such as a new algorithm for accomplishing a computational task or even a new scientific discovery.
• Many computational problems require complex solutions that are diﬃcult to program
by hand.
• Many AI researchers believe that the "rules" underlying intelligence are too complex for scientists to encode by hand in a "top-down" fashion. Instead they believe that the best route to artificial intelligence is through a "bottom-up" paradigm in which humans write only very simple rules, and complex behaviors such as intelligence emerge from the massively parallel application and interaction of these simple rules.
• Biological evolution is an appealing source of inspiration for addressing these problems. Evolution is, in effect, a method of searching among an enormous number of possibilities for "solutions".
5.Robustness of Traditional Optimization and Search Methods
• The current literature identiﬁes three main types of search methods:
– Calculus-based
Calculus-based methods have been studied heavily. These subdivide into two main classes: indirect and direct. Indirect methods seek local extrema by solving the (usually nonlinear) set of equations resulting from setting the gradient of the objective function equal to zero. This is the multidimensional generalization of the elementary calculus notion of extremal points. Given a smooth, unconstrained function, finding a possible peak starts by restricting the search to those points with slopes of zero in all directions. On the other hand, direct (search) methods seek local optima by hopping on the function and moving in a direction related to the local gradient. This is simply the notion of hill climbing: to find the local best, climb the function in the steepest permissible direction. While both of these calculus-based methods have been improved, extended, hashed, and rehashed, some simple reasoning shows their lack of robustness.
∗ Both methods are local in scope; the optima they seek are the best in a neighborhood of the current point.
∗ Once a local (possibly lower) peak is reached, further improvement must be sought through random restarts or other trickery. Calculus-based methods also depend upon the existence of derivatives (well-defined slope values). Even if we allow numerical approximation of derivatives, this is a severe shortcoming.
– Enumerative
Within a finite search space, or a discretized infinite search space, the search algorithm starts looking at objective function values at every point in the space, one at a time.
∗ Although the simplicity of this type of algorithm is attractive, and enumeration is a very human kind of search, such schemes must ultimately be discounted in the robustness race for one simple reason: lack of efficiency.
– Random
Random search algorithms have achieved increasing popularity as researchers
have recognized the shortcomings of calculusbased and enumerative schemes.
∗ Random walks and random schemes that search and save the best must also
be discounted because of the eﬃciency requirement.
∗ Random searches,in the long run,can be expected to do no better than
enumerative schemes.
• How Are Genetic Algorithms Diﬀerent from Traditional Methods?
– GAs work with a coding of the parameter set,not the parameters themselves.
– GAs search from a population of points,not a single point.
– GAs use payoﬀ (objective function) information,not derivatives or other auxiliary
knowledge.
– GAs use probabilistic transition rules,not deterministic rules.
• GA search combines "exploitation" (deterministic search) and "exploration" (random search).
6.Elements of Genetic Algorithms
• There is no rigorous deﬁnition of “genetic algorithm” accepted by all in the evo
lutionary computation community that diﬀerentiates GAs from other evolutionary
computation methods.
• However,it can be said that most methods called “GAs” have at least the following
elements in common:
– population of chromosomes,
– selection according to ﬁtness (performance),
– crossover to produce new oﬀspring,
– random mutation of new oﬀspring.
• A Simple Genetic Algorithm (a code sketch follows these steps):
(a) Start with a randomly generated population of n l-bit chromosomes (candidate solutions to a problem).
(b) Calculate the fitness f(x) of each chromosome x in the population.
(c) Repeat the following steps until n offspring have been created.
i. Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done "with replacement", meaning that the same chromosome can be selected more than once to become a parent.
ii. With probability p_c (the "crossover probability" or "crossover rate"), cross over the pair at a randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place, form two offspring that are exact copies of their respective parents. (Note that here the crossover rate is defined to be the probability that two parents will cross over at a single point. There are also "multi-point crossover" versions of the GA in which the crossover rate for a pair of parents is the number of points at which a crossover takes place.)
iii. Mutate the two offspring at each locus with probability p_m (the mutation probability or mutation rate), and place the resulting chromosomes in the new population. If n is odd, one new population member can be discarded at random.
(d) Replace the current population with the new population.
(e) Go to step (b).
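The sketch below is a minimal Python rendering of steps (a)-(e). The fitness function (one-max, i.e., counting 1-bits) and the parameter values are illustrative assumptions, not prescriptions from the text.

import random

def simple_ga(fitness, l=20, n=50, p_c=0.7, p_m=0.001, generations=100):
    # simple GA: fitness-proportionate selection, one-point crossover, bit mutation
    # (a) random initial population of n l-bit chromosomes
    pop = [[random.randint(0, 1) for _ in range(l)] for _ in range(n)]
    for _ in range(generations):
        # (b) evaluate the fitness of each chromosome
        fits = [fitness(c) for c in pop]
        new_pop = []
        # (c) create n offspring
        while len(new_pop) < n:
            # i. fitness-proportionate selection with replacement
            p1, p2 = random.choices(pop, weights=fits, k=2)
            # ii. one-point crossover with probability p_c
            if random.random() < p_c:
                point = random.randint(1, l - 1)
                c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            # iii. bitwise mutation with probability p_m per locus
            for child in (c1, c2):
                for i in range(l):
                    if random.random() < p_m:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        # (d) replace the current population (drop the extra child if n is odd)
        pop = new_pop[:n]
    # (e) is the loop itself; return the best chromosome in the final population
    return max(pop, key=fitness)

# usage: maximize the number of 1s in a 20-bit string (one-max)
best = simple_ga(fitness=lambda c: sum(c))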
7.The Basic Characteristics of Genetic Algorithms:
• Randomness plays a large role in the running of GAs; runs with different random-number seeds will generally produce different detailed behaviors.
• GA researchers often report statistics (such as the best ﬁtness found in a run and the
generation at which the individual with that best ﬁtness was discovered) averaged
over many diﬀerent runs of the GA on the same problem.
• There are a number of details to ﬁll in,such as the size of the population and the prob
abilities of crossover and mutation,and the success of the algorithm often depends
greatly on these details.
8.Some Applications of Genetic Algorithms:
• Optimization
• Automatic programming
• Machine learning
• Economics
• Immune systems
• Ecology
• Population genetics
• Evolution and learning
• Social systems
•........
9.Example:Optimization of a Simple Function
• Consider the function f(x) = x·sin(10πx) + 1.0, which is drawn in Figure 1.
• The problem is to find the x in [−1, 2] which maximizes the function f, i.e., to find x_0 such that f(x_0) ≥ f(x) for all x ∈ [−1, 2].
• Representation:
– We use a binary vector as a chromosome to represent the real-valued variable x.
Figure 1: Graph of the function f(x) = x·sin(10πx) + 1.0
– The length of the vector depends on the required precision; for example, six places after the decimal point.
– The domain of the variable x has length 3; this implies the range should be divided into at least 3 × 1000000 equal-size intervals.
– This means that 22 bits are required for the binary vector, because 2097152 = 2^21 < 3000000 ≤ 2^22 = 4194304.
– The mapping from a binary string <b_21, b_20, ..., b_0> into a real number x from the range [−1, 2] is done in two steps:
∗ convert the binary string <b_21, b_20, ..., b_0> from base 2 to base 10:
(<b_21, b_20, ..., b_0>)_2 = (Σ_{i=0}^{21} b_i·2^i)_10 = x′,
∗ find the corresponding real number x:
x = −1.0 + x′·3/(2^22 − 1),
where −1.0 is the left boundary of the domain and 3 is the length of the domain.
• The evaluation function (ﬁtness function) for the binary vectors is equivalent to the
function f:
eval(v) = f(x),
where the chromosome v represents the real value x.
• Experiment result: v_max = (1111001101000100000101), which corresponds to the value x_max = 1.850773.
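As a concrete check, the sketch below decodes a 22-bit chromosome into x ∈ [−1, 2] and evaluates f(x); the most-significant-bit-first ordering is an assumption made for the example.

import math

def decode(bits):
    # map a 22-bit string (MSB first) to a real number x in [-1, 2]
    x_prime = int("".join(map(str, bits)), 2)      # base 2 to base 10
    return -1.0 + x_prime * 3 / (2**22 - 1)        # scale into the domain

def fitness(bits):
    # eval(v) = f(x) = x*sin(10*pi*x) + 1.0
    x = decode(bits)
    return x * math.sin(10 * math.pi * x) + 1.0

# the reported best chromosome decodes to roughly x = 1.850773, with f(x) near 2.85
v_max = [int(b) for b in "1111001101000100000101"]
print(decode(v_max), fitness(v_max))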
10.Example:Using GAs to Evolve Strategies for the Prisoner’s Dilemma
• A simple two-person game invented by Merrill Flood and Melvin Dresher in the 1950s.
• Two individuals (call them Alice and Bob) are arrested for committing a crime to
gether and are held in separate cells,with no communication possible between them.
• Alice is oﬀered the following deal:
– If she confesses and agrees to testify against Bob,she will receive a suspended
sentence with probation,and Bob will be put away for 5 years.
– However,if at the same time Bob confesses and agrees to testify against Alice,
her testimony will be discredited,and each will receive 4 years for pleading guilty.
– Alice is told that Bob is being oﬀered precisely the same deal.
• Payoff matrix (Alice, Bob), where the entries are years in prison:

                       Bob: Cooperate    Bob: Defect
    Alice: Cooperate       2, 2              5, 0
    Alice: Defect          0, 5              4, 4
• Each player independently decides which move to make —i.e.,whether to cooperate
or defect.
• What is the best strategy to use in order to maximize one’s own payoﬀ?
– If you suspect that your opponent is going to cooperate,then you should surely
defect.
– If you suspect that your opponent is going to defect,then you should defect too.
• The dilemma is that if both players defect each gets a worse score than if they
cooperate.
• If the game is iterated (that is, if the two players play several games in a row), both players' always defecting will lead to a much worse total outcome than the players would get if they both cooperated.
• How can reciprocal cooperation be induced?
• Robert Axelrod's studies at the University of Michigan:
– Human Designed Strategies:
∗ He solicited strategies from researchers in a number of disciplines.
∗ Each participant submitted a computer program that implemented a partic
ular strategy.
∗ These various programs played iterated games with each other.
∗ During each game,each program remembered what move (i.e.,cooperate
or defect) both it and its opponent had made in each of the three previous
games that they had played with each other,and its strategy was based on
this memory.
∗ The programs were paired in a round-robin tournament in which each played with all the other programs over a number of games.
∗ Some of the strategies submitted were rather complicated,using techniques
such as Markov processes and Bayesian inference to model the other players
in order to determine the best move.
∗ However, the winner (the strategy with the highest average score) was the simplest of the submitted strategies: TIT FOR TAT.
∗ TIT FOR TAT cooperates in the first game and thereafter does whatever its opponent did in the previous game; it thus punishes a defection with a defection of its own, and continues the punishment until the other player begins cooperating again.
– Genetic Algorithm Implementation (1):
∗ Axelrod (1987) decided to see whether a GA could evolve strategies to play this game successfully.
7
∗ Representation of strategies: each player's memory is the three previous games played against the current opponent, and a strategy is encoded as a bit string specifying a move (cooperate or defect) for every possible three-game history.
∗ The ﬁtness of a strategy:
· Axelrod had found that eight of the human-generated strategies from the tournament were representative of the entire set of strategies.
· The set of eight strategies (which did not include TIT FOR TAT) served
as the “environment” for the evolving strategies in the population.
· Each individual in the population played iterated games with each of the
eight ﬁxed strategies,and the individual’s ﬁtness was taken to be its average
score over all the games it played.
∗ Simulation result:most of the strategies that evolved were similar to TIT
FOR TAT.
∗ It would be wrong to conclude that the GA discovered strategies that are
“better” than any humandesigned strategy.
∗ The performance of a strategy depends very much on its environment —that
is,on the strategies with which it is playing.
∗ Here the environment was fixed — it consisted of eight human-designed strategies that did not change over the course of a run.
∗ The resulting fitness function is an example of a static fitness landscape.
∗ It is not necessarily true that these high-scoring strategies would also score well in a different environment.
– Genetic Algorithm Implementation (2):
∗ Axelrod carried out a second experiment in which the fitness of an individual was determined by allowing the individuals in the population to play with one another rather than with the fixed set of eight strategies.
∗ Then the environment changed from generation to generation because the
opponents themselves were evolving.
∗ Thus the fitness landscape was not static.
∗ In the ﬁrst few generations,strategies that tended to cooperate did not ﬁnd
reciprocation among their fellow population members and thus tended to die
out.
∗ After about 10–20 generations, the trend started to reverse: the GA discovered strategies that reciprocated cooperation and that punished defection (i.e., variants of TIT FOR TAT).
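The sketch below shows, in the spirit of implementation (1) above, how a strategy's fitness can be computed as its average score in iterated games against a fixed set of opponents. The strategy interface (a function from the opponent's past moves to 'C' or 'D') and the conventional 3/5/0/1 point scores, rather than the numbers in the matrix above, are assumptions made for the example.

# conventional iterated-PD point scores (higher is better); an assumption for this sketch
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_history):
    # cooperate first, then copy the opponent's last move
    return 'C' if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=100):
    # return the total scores of the two strategies over an iterated game
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)   # each player sees only the opponent's past moves
        move_b = strategy_b(hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def fitness(strategy, environment, rounds=100):
    # average score of the strategy against a fixed set of opponent strategies
    return sum(play(strategy, opp, rounds)[0] for opp in environment) / len(environment)

print(fitness(tit_for_tat, [tit_for_tat, always_defect]))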
11.Example:Traveling Salesman Problem (TSP)
• What is TSP?
The traveling salesman must visit every city in his territory exactly once and then return to the starting point. Given the cost of travel between all pairs of cities, how should he plan his itinerary so as to minimize the total cost of the entire tour?
• The TSP is a problem in combinatorial optimization and arises in numerous applica
tions.
• Representation:integer vector or binary string?
– In a binary representation of an n-city TSP, each city should be encoded as a string of ⌈log2 n⌉ bits; a chromosome is then a string of n⌈log2 n⌉ bits.
– A mutation or crossover can result in a sequence of cities,which is not a tour:
we can get the same city twice in a sequence.
– For a TSP with 20 cities (where we need 5 bits to represent a city), some 5-bit sequences (for example, 10101) do not correspond to any city.
– If we use mutation and crossover operators as deﬁned earlier,we would need some
sort of a “repair algorithm”;such an algorithm would “repair” a chromosome,
moving it back into the search space.
• Integer representation:
– A vector v = <i_1, i_2, ..., i_n> represents a tour: from i_1 to i_2, etc., from i_{n−1} to i_n, and back to i_1 (v is a permutation of <1 2 ... n>).
– Then,given the cost of travel between all cities,we can easily calculate the total
cost of the entire tour.
– However, ordinary crossover will produce invalid strings (the same problem as in the binary case).
– Modiﬁed crossover:
∗ Given two parents, it builds an offspring by choosing a subsequence of the tour from one parent and preserving the relative order of cities from the other parent.
∗ For example,if the parents are
< 1,2,3,4,5,6,7,8,9,10,11,12 >
< 7,3,1,11,4,12,5,2,10,9,6,8 >
and the chosen part is < 4,5,6,7 >,then the resulting oﬀspring is
< 1,11,12,4,5,6,7,2,10,9,8,3 >
∗ The oﬀspring bears a structural relationship to both parents.The roles of
the parents can then be reversed in constructing a second oﬀspring.
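One common reading of this operator is Davis's order crossover (OX); the sketch below implements that convention and, with the cut points chosen around positions 4-7, reproduces the offspring in the example. Treating it as OX is an assumption, since other order-preserving variants exist.

def order_crossover(p1, p2, cut1, cut2):
    # copy p1[cut1:cut2]; fill the rest with p2's cities in their relative order,
    # starting after the second cut point and wrapping around
    n = len(p1)
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    used = set(p1[cut1:cut2])
    fill = [p2[(cut2 + i) % n] for i in range(n) if p2[(cut2 + i) % n] not in used]
    for i, city in enumerate(fill):
        child[(cut2 + i) % n] = city
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
p2 = [7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8]
# choosing the part <4, 5, 6, 7> (positions 3..6, zero-based) gives the offspring in the text
print(order_crossover(p1, p2, 3, 7))   # [1, 11, 12, 4, 5, 6, 7, 2, 10, 9, 8, 3]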
Chapter 2:The Mathematical Foundations of Genetic Algorithms
1.Concept:
• The traditional theory of GAs (ﬁrst formulated in Holland 1975) assumes that,at a
very general level of description,GAs work by discovering,emphasizing,and recom
bining good “building blocks” of solutions in a highly parallel fashion.
• The idea here is that good solutions tend to be made up of good building blocks — combinations of bit values that confer higher fitness on the strings in which they are present.
2.Schemata:
• Holland (1975) introduced the notion of schemas (or schemata) to formalize the in
formal notion of “building blocks”.
• A schema is a set of bit strings that can be described by a template made up of ones,
zeros,and asterisks,the asterisks representing wild cards (or “don’t care”).
• Consider strings constructed over the binary alphabet V = {0, 1}. An l-bit string may be represented symbolically as A = a_1 a_2 a_3 ... a_l, where a_i ∈ V.
• Consider V′ = {0, 1, ∗}, where ∗ means "don't care".
• If H = ∗11∗0∗∗, then A = 0111000 is an example (an instance) of the schema H, because A matches the schema at its fixed positions 2, 3, and 5.
• In general, for alphabets of cardinality k, there are (k + 1)^l schemata.
• Order:
The order of a schema H, denoted by o(H), is simply the number of fixed positions (in a binary alphabet, the number of 1's and 0's) present in the template. For example, the order of the schema 011*1** is 4 (symbolically, o(011∗1∗∗) = 4).
• Defining length:
The defining length of a schema H, denoted by δ(H), is simply the distance between the first and last specific string positions. For example, the schema 011*1** has defining length δ = 4, and the schema 0****** has defining length δ = 0.
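As a small illustration, the helper below computes these two quantities from a schema written as a string over {0, 1, *} (the string form is merely a convenience for the example):

def order(schema):
    # o(H): number of fixed (non-*) positions
    return sum(ch != '*' for ch in schema)

def defining_length(schema):
    # delta(H): distance between the first and last fixed positions (0 if at most one)
    fixed = [i for i, ch in enumerate(schema) if ch != '*']
    return fixed[-1] - fixed[0] if len(fixed) > 1 else 0

print(order("011*1**"), defining_length("011*1**"))   # 4 4
print(order("0******"), defining_length("0******"))   # 1 0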
3.The eﬀect of reproduction on the expected number of schemata:
• Suppose at a given time step t there are m examples of a particular schema H contained within the population A(t); we write m = m(H, t).
• During reproduction, a string is copied according to its fitness (string i is selected with probability p_i = f_i / Σ f_j).
• After picking a nonoverlapping population of size n with replacement from the population A(t), we expect to have m(H, t+1) representatives of the schema H in the population at time t+1, where m(H, t+1) = m(H, t)·n·f(H)/Σ f_j and f(H) is the average fitness of the strings representing schema H at time t.
• Let f̄ = Σ f_j / n denote the population average fitness; then m(H, t+1) = m(H, t)·f(H)/f̄.
• A particular schema grows as the ratio of the average fitness of the schema to the average fitness of the population.
• Suppose a particular schema H remains above average by an amount c·f̄, with c a constant; then m(H, t+1) = m(H, t)·(f̄ + c·f̄)/f̄ = (1 + c)·m(H, t).
• Starting at t = 0 and assuming a stationary value of c, we have m(H, t) = m(H, 0)·(1 + c)^t.
• Reproduction allocates exponentially increasing (decreasing) numbers of trials to
above (below) average schemata.
4.The inﬂuence of crossover on the schemata:
• Consider two schemata H_1 and H_2, with H_1 = ∗1∗∗∗∗0 and H_2 = ∗∗∗10∗∗. Then it is clear that schema H_1 is less likely to survive crossover than schema H_2.
• The survival probability under simple (single-point) crossover is p_s = 1 − δ(H)/(l − 1).
• If crossover is itself performed by random choice, say with probability p_c at a particular mating, the survival probability may be given by p_s ≥ 1 − p_c·δ(H)/(l − 1).
• The combined effect of reproduction and crossover:
Assuming independence of the reproduction and crossover operators, we have
m(H, t+1) ≥ m(H, t)·(f(H)/f̄)·[1 − p_c·δ(H)/(l − 1)].
• Therefore,the schema H grows or decays depending on two things:(1) whether the
schema is above or below the population average,and (2) whether the schema has
relatively short or long deﬁning length.
5.The inﬂuence of mutation on the schemata:
• In order for schema H to survive,all the speciﬁed positions must themselves survive.
• A single allele survives with probability (1 − p_m).
• A particular schema survives when each of the o(H) fixed positions within the schema survives. Therefore, the probability of surviving mutation is (1 − p_m)^o(H).
• When p_m ≪ 1, this probability may be approximated by 1 − o(H)·p_m.
• Therefore, a particular schema H receives an expected number of copies in the next generation under reproduction, crossover, and mutation as given by the following equation (ignoring small cross-product terms):
m(H, t+1) ≥ m(H, t)·(f(H)/f̄)·[1 − p_c·δ(H)/(l − 1) − o(H)·p_m].
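To make the bound concrete, the sketch below evaluates the right-hand side for a given schema; the numbers in the usage line are illustrative assumptions only.

def expected_copies(m, f_H, f_avg, delta, o, l, p_c, p_m):
    # lower bound on m(H, t+1): m * (f(H)/f_avg) * (1 - p_c*delta/(l-1) - o*p_m)
    return m * (f_H / f_avg) * (1 - p_c * delta / (l - 1) - o * p_m)

# a short, low-order schema that is 25% above the population average
print(expected_copies(m=10, f_H=1.25, f_avg=1.0, delta=2, o=3, l=20, p_c=0.7, p_m=0.001))
# about 11.5 expected copies, i.e. the schema is allocated an increasing number of trials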
6. Schema Theorem (The Fundamental Theorem of Genetic Algorithms):
Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations.
7. An immediate result of this theorem is that GAs explore the search space by means of short, low-order schemata which, subsequently, are used for information exchange during crossover.
8. Building Block Hypothesis:
A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.
• During the last twenty years, many GA applications were developed which supported the building block hypothesis in many different problem domains.
• Although some research has been done to prove this hypothesis, for most nontrivial applications we rely mostly on empirical results.
• Bethke (1981) generated a number of test cases that provably mislead the simple three-operator genetic algorithm (we call these coding-function combinations GA-deceptive).
• This suggests that functions and codings that are GA-deceptive tend to contain isolated optima: the best points tend to be surrounded by the worst.
• Practically speaking, many of the functions encountered in the real world do not have this needle-in-a-haystack quality; there is usually some regularity in the function-coding combination — much like the regularity of one or more schemata — that may be exploited by the recombination of building blocks.
• If the building blocks are misleading due to the coding used or the function itself, the problem may require long waiting times to arrive at near-optimal solutions.
• Example:Violation of Building Block Hypothesis
We would like to have short, low-order building blocks lead to incorrect (suboptimal) longer, higher-order building blocks.
– Suppose we have a set of four order-2 schemata over two defining positions, each schema associated with a fitness value as follows:

    ***0******0   f_00
    ***0******1   f_01
    ***1******0   f_10
    ***1******1   f_11

where the two defining positions are separated by a defining length δ(H).
The fitness values are schema averages, assumed to be constant with no variance (this last restriction may be lifted without changing our conclusions, as we only consider expected performance).
– Assume that f_11 is the global optimum: f_11 > f_00, f_11 > f_01, f_11 > f_10.
– We want a problem where one or both of the suboptimal order-1 schemata are better than the optimal order-1 schemata. Mathematically, we want one or both of the following conditions to hold:
f(0∗) > f(1∗),
f(∗0) > f(∗1).
In these expressions we have dropped consideration of all alleles other than the two defining positions, and each fitness expression implies an average over all strings contained within the corresponding similarity subset.
– Thus we would like the following two expressions to hold:
[f(00) + f(01)]/2 > [f(10) + f(11)]/2,
[f(00) + f(10)]/2 > [f(01) + f(11)]/2.
Figure 2: Sketch of the Type I minimal deceptive problem (MDP): f_01 > f_00.
– Unfortunately, both expressions cannot hold simultaneously in the two-bit problem; without loss of generality we assume that the first expression is true.
– Thus, the deceptive two-bit problem is specified by the globality condition (f_11 is the best) and one deception condition (we choose f(0∗) > f(1∗)).
– To put the problem into closer perspective, we normalize all fitness values with respect to the fitness of the complement of the global optimum as follows:
r = f_11/f_00,   c = f_01/f_00,   c′ = f_10/f_00.
– We may rewrite the globality condition in normalized form:
r > c,   r > 1,   r > c′.
– We may also rewrite the deception condition in normalized form:
r < 1 + c − c′.
This follows because the deception condition f(0∗) > f(1∗) means (f_00 + f_01)/2 > (f_10 + f_11)/2, i.e., f_00 + f_01 > f_10 + f_11. Dividing through by f_00 gives 1 + c > c′ + r, i.e., r < 1 + c − c′.
– From these conditions, we may conclude a number of interesting facts: c′ < 1 and c′ < c.
– From these, we recognize that there are two types of deceptive two-bit problem:
Type I: f_01 > f_00 (c > 1).
Type II: f_00 ≥ f_01 (c ≤ 1).
– Figures 2 and 3 are representative sketches of these problems, where the fitness is graphed as a function of the two boolean variables. Both cases are deceptive, and it may be shown that neither case can be expressed as a linear combination of the individual allele values, i.e., in the form
f(x_1 x_2) = b + Σ_{i=1}^{2} a_i·x_i.
Figure 3: Sketch of the Type II minimal deceptive problem (MDP): f_00 > f_01.
– It may be proved that no one-bit problem can be deceptive; the deceptive two-bit problem is therefore the smallest possible deceptive problem: it is the minimal deceptive problem (MDP).
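As a quick check of these conditions, the sketch below classifies a set of four schema fitnesses; the numeric values in the usage line are invented purely for illustration.

def classify_two_bit(f00, f01, f10, f11):
    # return 'Type I', 'Type II', or 'not deceptive' for a two-bit problem with
    # global optimum f11 and deception condition f(0*) > f(1*)
    r, c, c_prime = f11 / f00, f01 / f00, f10 / f00
    global_opt = r > 1 and r > c and r > c_prime     # f11 is the best
    deceptive = r < 1 + c - c_prime                  # f(0*) > f(1*)
    if not (global_opt and deceptive):
        return "not deceptive"
    return "Type I" if c > 1 else "Type II"

# invented values: f11 is globally best, yet the 0* schema looks better on average
print(classify_two_bit(f00=1.0, f01=1.05, f10=0.1, f11=1.1))   # Type I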
• The approaches proposed to deal with this problem:
– Assume prior knowledge of the objective function and code it in an appropriate way (to get "tight" building blocks). For example, prior knowledge about the objective function, and consequently about the deception, might result in a different coding in which the bits required to optimize the function are adjacent.
– Employ a new operator, inversion:
It selects two points within a string and inverts the order of the bits between the selected points, while remembering each bit's "meaning". This means that we have to identify bits in the strings; we do so by keeping each bit together with a record of its original position.
∗ In Goldberg (1989a):
"Put another way, inversion is to orderings what mutation is to alleles: both fight the good fight against search-stopping lack of diversity, but neither is sufficiently powerful to search for good structure, allelic or permutational, on its own when good structures require epistatic interaction of the individual parts."
– Messy genetic algorithms (mGA): Goldberg (1989a), Goldberg, Korb, and Deb (1989), and Goldberg, Deb, and Korb (1991):
The term "messy GA" is meant to be contrasted with the standard "neat" fixed-length, fixed-population-size GAs. The goal of messy GAs is to improve the GA's function-optimization performance by explicitly building up increasingly longer, highly fit strings from well-tested shorter building blocks. The general idea was biologically motivated: "After all, nature did not start with strings of length 5.9 × 10^9 or even of length two million and try to make man. Instead, simple life forms gave way to more complex life forms, with the building blocks learned at earlier times used and reused to good effect along the way." (Goldberg, Korb, and Deb, 1989, p. 500)
Chapter 3:Computer Implementation of a Genetic Algorithm
1.Basic Elements for Running Genetic Algorithms:
Population size                        N
Chromosome length                      l
Maximum number of generations          G
Probability of reproduction            p_r
Probability of crossover               p_c
Probability of mutation                p_m
Selection mechanism                    e.g., proportional selection
Fitness function                       e.g., sum of squared errors
2.Implementation of The Genetic Operators:
• Sequential:
reproduction =⇒ crossover (with probability p_c) =⇒ mutation (with probability p_m).
• Probabilistic:
reproduction, crossover, and mutation are performed according to probabilities:
p_r: the probability that the selected chromosome is reproduced,
p_c: the probability that the selected chromosome is crossed over,
p_m′: the probability that the selected chromosome is mutated,
p_m: the probability that each bit is mutated in a chromosome that is going to be mutated.
3.How to determine the appropriate values of these parameters?
• There is no objective way to determine the value of these parameters.
• In De Jong's (1975) study of genetic algorithms in function optimization, a series of parametric studies across a five-function suite of problems suggested that good GA performance requires the choice of a high crossover probability, a low mutation probability (inversely proportional to the population size), and a moderate population size. For example, p_c = 0.6, p_m = 0.0333, N = 30.
• However,it is not always true when we apply GAs in other research areas,for example,
social adaptive systems.
4.Mapping Objective Functions to Fitness Form
• In many problems,the objective is more naturally stated as the minimization of some
cost function g(x) rather than the maximization of some utility or proﬁt function u(x).
• Even if the problem is naturally stated in maximization form, this alone does not guarantee that the utility function will be nonnegative for all x, as we require in a fitness function.
• As a result, it is often necessary to map the underlying natural objective function to a fitness-function form through one or more mappings.
• The duality of cost minimization and profit maximization is well known. In normal operations research work, to transform a minimization problem into a maximization problem we simply multiply the cost function by −1. However, we usually need the following transformation to guarantee positive fitness values:
f(x) = C_max − g(x)   when g(x) < C_max,
f(x) = 0              otherwise.
• Of course, there are a variety of ways to choose the coefficient C_max. C_max may be taken as an input coefficient, as the largest g value observed thus far, as the largest value in the current population, or as the largest of the last k generations. Perhaps more appropriately, C_max should vary depending on the population variance.
• When the natural objective function formulation is a profit or utility function, we may still have a problem with negative utility values u(x). In that case we simply transform fitness according to the equation:
f(x) = u(x) + C_min   when u(x) + C_min > 0,
f(x) = 0              otherwise.
• We may choose C_min as an input coefficient, as the absolute value of the worst u value in the current or last k generations, or as a function of the population variance.
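A small sketch of these two mappings; choosing C_max as the largest cost seen so far is just one of the options listed above.

def cost_to_fitness(g_value, c_max):
    # minimization -> maximization: f(x) = C_max - g(x) when g(x) < C_max, else 0
    return c_max - g_value if g_value < c_max else 0.0

def utility_to_fitness(u_value, c_min):
    # shift a possibly negative utility: f(x) = u(x) + C_min when positive, else 0
    return u_value + c_min if u_value + c_min > 0 else 0.0

costs = [3.2, 7.5, 1.1]
c_max = max(costs)                                   # the largest g value observed so far
print([cost_to_fitness(g, c_max) for g in costs])    # roughly [4.3, 0.0, 6.4]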
5.Fitness Scaling
• Problem 1:
– Regulation of the number of copies is especially important in small population
genetic algorithms.
– At the start of GA runs it is common to have a few extraordinary individuals in
a population of mediocre colleagues.
– If left to the normal selection rule (p_i = f_i / Σ f_j), the extraordinary individuals would take over a significant proportion of the finite population in a single generation; this is undesirable and is a leading cause of premature convergence.
• Problem 2:
– Late in a run,there may still be signiﬁcant diversity within the population;
however,the population average ﬁtness may be close to the population best
ﬁtness.If this situation is left alone,average members and best members get
nearly the same number of copies in future generations,and the survival of the
ﬁttest necessary for improvement becomes a random walk among the mediocre.
• In both cases,at the beginning of the run and as the run matures,ﬁtness scaling can
help.
• One useful scaling procedure is linear scaling: f′ = a·f + b, where
– f is the raw fitness,
– f′ is the scaled fitness,
– a, b are constant coefficients.
• In this way,simple scaling helps prevent the early domination of extraordinary indi
viduals,while it later on encourages a healthy competition among near equals.
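One common way to pick the coefficients, following a convention described in Goldberg (1989) of preserving the average fitness while giving the best individual about c_mult expected copies, is sketched below; this simple version ignores the negative-scaled-fitness corner case that a full implementation would have to handle.

def linear_scale(fitnesses, c_mult=2.0):
    # linear scaling f' = a*f + b chosen so that f'_avg = f_avg and f'_max = c_mult*f_avg
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    if f_max == f_avg:                      # all fitnesses equal: nothing to scale
        return list(fitnesses)
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [a * f + b for f in fitnesses]

print(linear_scale([1.0, 1.1, 1.2, 4.0]))   # the outlier 4.0 is pulled down toward 2*avg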
6.Codings
• The basic principles for choosing a GA coding:
– Principle of meaningful building blocks:
The user should select a coding so that short, low-order schemata are relevant to the underlying problem and relatively unrelated to schemata over other fixed positions.
– Principle of minimal alphabets:
The user should select the smallest alphabet that permits a natural expression
of the problem.
• Illustration:

Comparison of a Binary and a Nonbinary String Population
Binary String   Value X   Nonbinary String   Fitness (X^2)
01101           13        N                  169
11000           24        Y                  576
01000            8        I                   64
10011           19        T                  361

Binary and Nonbinary Coding Correspondence
Binary   Nonbinary
00000    A
00001    B
...      ...
11001    Z
11010    1
11011    2
...      ...
11111    6
• What is the influence of the different codings?
– The different alphabet cardinalities require different string lengths. For equality of the number of points in each space, we require 2^l = k^l′, where l is the binary code string length and l′ is the nonbinary code string length.
– The number of schemata for each coding may then be calculated using the respective string lengths: 3^l in the binary case and (k + 1)^l′ in the nonbinary case.
– It is easy to show that the binary alphabet offers the maximum number of schemata per bit of information of any coding.
– Since these similarities are the essence of our search, when we design a code we should maximize the number of them available for the GA to exploit.
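A quick numeric check of this claim for the coding above, where one 32-symbol character (A-Z plus 1-6) carries the same information as five bits:

l = 5              # binary string length
k = 32             # nonbinary alphabet size, chosen so that 2**l == k**l_prime
l_prime = 1        # nonbinary string length giving the same number of points (32)

binary_schemata = 3 ** l                    # (2 + 1)^l  = 243
nonbinary_schemata = (k + 1) ** l_prime     # (k + 1)^l' = 33
print(binary_schemata, nonbinary_schemata)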
7.Encoding a Problem for a Genetic Algorithm:
Most GA applications use fixed-length, fixed-order bit strings to encode candidate solutions. However, in recent years there have been many experiments with other kinds of encodings.
• Binary Encodings
– The most common encodings for a number of reasons:
∗ In the early work,Holland and his students concentrated on such encodings
and GA practice has tended to follow this lead.
∗ Much of the existing GA theory is based on the assumption of fixed-length, fixed-order binary encodings.
∗ Much of that theory can be extended to apply to nonbinary encodings,but
such extensions are not as well developed as the original theory.
∗ Heuristics about appropriate parameter settings (e.g.,for crossover and muta
tion rates) have generally been developed in the context of binary encodings.
– However,binary encodings are unnatural and unwieldy for many problems (e.g.,
evolving weights for neural networks or evolving condition sets in the manner of
Meyer and Packard),and they are prone to rather arbitrary orderings.
• Many-Character and Real-Valued Encodings
– For many applications, it is most natural to use an alphabet of many characters or real numbers to form chromosomes.
– Examples include Kitano's many-character representation for graph-generation grammars, Meyer and Packard's real-valued representation for condition sets, Montana and Davis's real-valued representation for neural-network weights, and Schulze-Kremer's real-valued representation for torsion angles in proteins.
– Holland's schema-counting argument seems to imply that GAs should exhibit worse performance on multiple-character encodings than on binary encodings. However, this has been questioned by some (e.g., Antonisse, 1989).
– Several empirical comparisons between binary encodings and multiple-character or real-valued encodings have shown better performance for the latter (e.g., Janikow and Michalewicz, 1991; Wright, 1991).
– But the performance depends very much on the problem and the details of the
GA being used,and at present there are no rigorous guidelines for predicting
which encoding will work best.
• Adapting the Encoding
– Choosing a fixed encoding ahead of time presents a paradox to the potential GA user: for any problem on which one would want to use a GA, one doesn't know enough about the problem ahead of time to come up with the best encoding for the GA.
– In fact, coming up with the best encoding is almost tantamount to solving the problem itself!
– In particular, one generally has no idea how best to order the bits ahead of time for a given problem. This is known in the GA literature as the "linkage problem" — one wants to have functionally related loci be more likely to stay together on the string under crossover, but it is not clear how this is to be done without knowing ahead of time which loci are important in useful schemata.
– A second reason for adapting the encoding is that a fixed-length representation limits the complexity of the candidate solutions.
– Messy GAs
8.Selection Methods:
The purpose of selection is to emphasize the fitter individuals in the population, in hopes that their offspring will in turn have even higher fitness. Selection has to be balanced with the variation from crossover and mutation (the "exploitation/exploration balance"): too-strong selection means that suboptimal, highly fit individuals will take over the population, reducing the diversity needed for further change and progress; too-weak selection will result in too-slow evolution.
• Fitness-Proportionate Selection with "Roulette Wheel"
– Under fitness-proportionate selection, an individual's expected number of offspring is proportional to its fitness; if a few individuals are far fitter than the rest early on, they and their descendants will multiply quickly in the population, in effect preventing the GA from doing any further exploration.
– This is known as “premature convergence”.
– In other words,ﬁtnessproportionate selection often puts too much emphasis on
“exploitation” of highly ﬁt strings at the expense of exploration of other regions
of the search space.
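A minimal sketch of roulette-wheel (fitness-proportionate) sampling; real implementations often use stochastic universal sampling instead, which is not shown here.

import random

def roulette_wheel(population, fitnesses, k=1):
    # sample k individuals with probability proportional to fitness (with replacement)
    total = sum(fitnesses)
    picks = []
    for _ in range(k):
        r = random.uniform(0, total)            # spin the wheel
        cumulative = 0.0
        for individual, f in zip(population, fitnesses):
            cumulative += f
            if r <= cumulative:
                picks.append(individual)
                break
    return picks

print(roulette_wheel(["a", "b", "c"], [1.0, 3.0, 6.0], k=5))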
• Sigma Scaling
– Mapping “raw” ﬁtness values to expected values so as to make the GA less
susceptible to premature convergence.
– Forrest (1985) proposed sigma scaling, which keeps the selection pressure relatively constant over the course of the run rather than letting it depend on the fitness variance in the population.
– Under sigma scaling, an individual's expected value is a function of its fitness, the population mean, and the population standard deviation. For example,
ExpVal(i, t) = 1 + (f(i) − f̄(t)) / (2σ(t))   if σ(t) ≠ 0,
ExpVal(i, t) = 1.0                            if σ(t) = 0,
where ExpVal(i, t) is the expected value of individual i at time t, f(i) is the fitness of i, f̄(t) is the mean fitness of the population at time t, and σ(t) is the standard deviation of the population fitnesses at time t.
– At the beginning of a run, when the standard deviation of fitnesses is typically high, the fitter individuals will not be many standard deviations above the mean, and so they will not be allocated the lion's share of offspring.
– Likewise,later in the run,when the population is typically more converged and
the standard deviation is typically lower,the ﬁtter individuals will stand out
more,allowing evolution to continue.
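A direct transcription of the sigma-scaling formula; clipping negative expected values to a small positive floor such as 0.1 is a common convention assumed here rather than part of the formula above.

from statistics import mean, pstdev

def sigma_scaled_expected_values(fitnesses, floor=0.1):
    # ExpVal(i,t) = 1 + (f(i) - f_mean)/(2*sigma) if sigma != 0, else 1.0
    f_mean = mean(fitnesses)
    sigma = pstdev(fitnesses)
    if sigma == 0:
        return [1.0] * len(fitnesses)
    return [max(floor, 1 + (f - f_mean) / (2 * sigma)) for f in fitnesses]

print(sigma_scaled_expected_values([1.0, 2.0, 3.0, 10.0]))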
• Elitism
– “Elitism”,ﬁrst introduced by Kenneth De Jong (1975),is an addition to many se
lection methods that forces the GA to retain some number of the best individuals
at each generation.
– Such individuals can be lost if they are not selected to reproduce or if they are
destroyed by crossover or mutation.
– Many researchers have found that elitism signiﬁcantly improves the GA’s perfor
mance.
• Boltzmann Selection
– Sigma scaling keeps the selection pressure more constant over a run.But often
diﬀerent amounts of selection pressure are needed at diﬀerent times in a run –
for example,early on it might be good to be liberal,allowing less ﬁt individuals
to reproduce at close to the rate of ﬁtter individuals,and having selection occur
slowly while maintaining a lot of variation in the population.
– Later it might be good to have selection be stronger in order to strongly emphasize
highly ﬁt individuals,assuming that the early diversity with slow selection has
allowed the population to ﬁnd the right part of the search space.
– A typical implementation is to assign to each individual i an expected value
ExpVal(i, t) = e^{f(i)/T} / [e^{f(i)/T}]_t,
where T is the temperature and [ ]_t denotes the average over the population at time t.
– Experimenting with this formula will show that,as T decreases,the diﬀerence in
ExpV al(i,t) between high and low ﬁtnesses increases.
– The desire is to have this happen gradually over the course of the search,so
temperature is gradually decreased according to a predeﬁned schedule.
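A direct sketch of the Boltzmann expected values; the particular temperatures tried in the usage loop are illustrative assumptions.

import math

def boltzmann_expected_values(fitnesses, temperature):
    # ExpVal(i,t) = exp(f(i)/T) / <exp(f(i)/T)>_t
    weights = [math.exp(f / temperature) for f in fitnesses]
    avg = sum(weights) / len(weights)
    return [w / avg for w in weights]

fits = [1.0, 2.0, 3.0]
for t in (10.0, 1.0, 0.1):      # lowering T sharpens the differences in expected value
    print(t, boltzmann_expected_values(fits, t))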
• Rank Selection
– Rank selection is an alternate method whose purpose is also to prevent tooquick
convergence.
– In the version proposed by Baker (1985),the individuals in the population are
ranked according to ﬁtness,and the expected value of each individual depends
on its rank rather than on its absolute ﬁtness.
– There is no need to scale ﬁtnesses in this case,since absolute diﬀerences in ﬁtness
are obscured.
– This discarding of absolute ﬁtness information can have advantages (using abso
lute ﬁtness can lead to convergence problems) and disadvantages (in some cases
it might be important to know that one individual is far ﬁtter than its nearest
competitor).
– Ranking avoids giving the far largest share of oﬀspring to a small group of highly
ﬁt individuals,and thus reduces the selection pressure when the ﬁtness variance
is high.
– It also keeps up selection pressure when the ﬁtness variance is low.
– The linear ranking method proposed by Baker is as follows:
∗ Each individual in the population is ranked in increasing order of fitness, from 1 to N.
∗ The user chooses the expected value Max of the individual with rank N, with Max ≥ 0.
∗ The expected value of each individual i in the population at time t is given by
ExpVal(i, t) = Min + (Max − Min)·(rank(i, t) − 1)/(N − 1),
where Min is the expected value of the individual with rank 1.
∗ Given the constraints Max ≥ 0 and Σ_i ExpVal(i, t) = N (since population size stays constant from generation to generation), it is required that 1 ≤ Max ≤ 2 and Min = 2 − Max.
– Rank selection has a possible disadvantage:slowing down selection pressure
means that GA will in some cases be slower in ﬁnding highly ﬁt individuals.
– However,in many cases the increased preservation of diversity that results from
ranking leads to more successful search than the quick convergence that can result
from ﬁtness proportionate selection.
– A variation of rank selection with elitism was used by Meyer and Packard for evolving condition sets, and Mitchell and her colleagues used a similar scheme for evolving cellular automata. In those examples the population was ranked by fitness and the top E strings were selected to be parents. The N − E offspring were merged with the E parents to create the next population. This is a form of the so-called (µ + λ) strategy used in the evolution strategies community. This method can be useful in cases where the fitness function is noisy (i.e., is a random variable); the best individuals are retained so that they can be tested again and thus, over time, gain increasingly reliable fitness estimates.
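A sketch of Baker's linear ranking formula applied to a small population (ties are broken arbitrarily by the sort; Max = 1.5 is an illustrative choice within the allowed range):

def linear_rank_expected_values(fitnesses, max_val=1.5):
    # ExpVal = Min + (Max - Min)*(rank - 1)/(N - 1), with Min = 2 - Max and ranks 1..N
    n = len(fitnesses)
    min_val = 2.0 - max_val
    order = sorted(range(n), key=lambda i: fitnesses[i])   # rank 1 = worst, rank N = best
    expected = [0.0] * n
    for rank_minus_1, i in enumerate(order):
        expected[i] = min_val + (max_val - min_val) * rank_minus_1 / (n - 1)
    return expected

print(linear_rank_expected_values([0.1, 5.0, 2.0, 2.0]))   # the values sum to N = 4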
• Tournament Selection
– Tournament selection is similar to rank selection in terms of selection pressure,
but it is computationally more eﬃcient and more amenable to parallel implemen
tation.
– Two individuals are chosen at random from the population.
– A random number r is then chosen between 0 and 1.
– If r < k (where k is a parameter,for example 0.75),the ﬁtter of the two individ
uals is selected to be a parent;otherwise the less ﬁt individual is selected.
– The two are then returned to the original population and can be selected again.
– For a more detailed description, please refer to Goldberg and Deb (1991).
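The two-individual tournament described above, as a minimal sketch (k = 0.75 follows the example value in the text):

import random

def tournament_select(population, fitnesses, k=0.75):
    # pick two individuals at random; with probability k return the fitter one,
    # otherwise return the less fit one (both remain in the population)
    i, j = random.randrange(len(population)), random.randrange(len(population))
    fitter, less_fit = (i, j) if fitnesses[i] >= fitnesses[j] else (j, i)
    return population[fitter] if random.random() < k else population[less_fit]

pop, fits = ["a", "b", "c", "d"], [1.0, 4.0, 2.0, 3.0]
print([tournament_select(pop, fits) for _ in range(6)])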
• SteadyState Selection
– Most GAs described in the literature have been “generational” — at each gen
eration the new population consists entirely of oﬀspring formed by parents in
the previous generation (though some of these oﬀspring may be identical to their
parents).
– In some schemes,such as elitist schemes,successive generations overlap to some
degree —some portion of the previous generation is retained in the new genera
tion.
– The fraction of new individuals at each generation has been called the "generation gap" (De Jong, 1975).
– In steadystate selection,only a few individuals are replaced in each generation:
usually a small number of the least ﬁt individuals are replaced by oﬀspring re
sulting from crossover and mutation of the ﬁttest individuals.
– Steady-state GAs are often used in evolving rule-based systems (e.g., classifier systems; see Holland 1986) in which incremental learning (and remembering what has already been learned) is important and in which members of the population collectively (rather than individually) solve the problem at hand.
• For more technical comparisons of different selection methods, see Goldberg and Deb (1991), Bäck and Hoffmeister (1991), de la Maza and Tidor (1993), and Hancock (1994).
9.Advanced Topic in Crossover:
• The usefulness of crossover is to recombine building blocks (schemata) on diﬀerent
strings.
• Single-point crossover has some shortcomings: it cannot combine all possible schemata. For example, it cannot in general combine instances of 11*****1 and ****11** to form an instance of 11**11*1.
• Likewise, schemata with long defining lengths are likely to be destroyed under single-point crossover. The schemata that can be created or destroyed by a crossover depend strongly on the location of the bits in the chromosome.
• Single-point crossover assumes that short, low-order schemata are the functional building blocks of strings, but one generally does not know in advance what ordering of bits will group functionally related bits together.
• Eshelman,Caruana,and Schaﬀer (1989) pointed out that there may not be any way
to put all functionally related bits close together on a string,since particular bits
might be crucial in more than one schema.
• They pointed out further that the tendency of single-point crossover to keep short schemata intact can lead to the preservation of hitchhikers — bits that are not part of a desired schema but which, by being close on the string, hitchhike along with the beneficial schema as it reproduces.
• Many people have also noted that single-point crossover treats some loci preferentially: the segments exchanged between the two parents always contain the endpoints of the strings.
• Two-point crossover.
• Parameterized uniform crossover:
An exchange happens at each bit position with probability p (typically 0.5 ≤ p ≤0.8).
However,this lack of positional bias can prevent coadapted alleles from ever forming
in the population,since parameterized uniform crossover can be highly disruptive of
any schema.
• Given these arguments,which one should we use?There is no simple answer.The
success or failure of a particular crossover operator depends in complicated ways on
the particular ﬁtness function,encoding,and other details of the GA.It is still a very
important open problem to fully understand these interactions.
• It is common in recent GA applications to use either two-point crossover or parameterized uniform crossover with p ≈ 0.7–0.8.
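Minimal sketches of the two operators just mentioned; equal-length bit-list chromosomes are assumed.

import random

def two_point_crossover(p1, p2):
    # exchange the segment between two randomly chosen cut points
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform_crossover(p1, p2, p=0.7):
    # parameterized uniform crossover: swap each position with probability p
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if random.random() < p:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

parent1, parent2 = [0] * 8, [1] * 8
print(two_point_crossover(parent1, parent2))
print(uniform_crossover(parent1, parent2))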
10.Advanced Topic in Mutation:
• A common view in GA community,dating back to Holland (1975),is that crossover
is the major instrument of variation and innovation in GAs,with mutation insuring
the population against permanent ﬁxation at any particular locus and thus playing
more of a background role.
• However,the appreciation of the role of mutation is growing as the GA community
attempts to understand how GAs solve complex problems.
• Spears (1993) formally veriﬁed the intuitive idea that,while mutation and crossover
have the same ability for “disruption” of existing schemata,crossover is a more robust
“constructor” of new schemata.
• Mühlenbein (1992, p. 15), on the other hand, argues that in many cases a hill-climbing strategy will work better than a GA with crossover and that the "power of mutation has been underestimated in traditional genetic algorithms".
Chapter 4: Introduction to Genetics-Based Machine Learning
1.This topic is based on Goldberg (1989).
2.Introduction
• A classifier system is a learning system that learns syntactically simple string rules (called classifiers) to guide its performance in an arbitrary environment.
• A classiﬁer system consists of three main components:
– Rule and message system
– Apportionment of credit system
– Genetic algorithms
• The rule and message system of a classiﬁer system is a special kind of production
system.A production system is a computational scheme that uses rules as its only
algorithmic device.The rules are generally of the following form:
if < condition > then < action >.
• At ﬁrst glance,the restriction to such a simple device for the representation of knowl
edge might seem too constraining.Yet it has been shown that production systems
are computationally complete (Minsky,1967).A single rule or small set of rules can
represent a complex set of thoughts compactly.
• Traditional rulebased systems have been less frequently suggested in situations in
need of learning.One of the main obstacles to learning has been complex rule syntax.
• Classifier systems depart from the mainstream by restricting a rule to a fixed-length representation. This restriction has two benefits. First, all strings under the permissible alphabet are syntactically meaningful. Second, a fixed string representation permits string operators of the genetic kind. This leaves the door propped open, ready for a genetic algorithm search of the space of permissible rules.
3.Rule and Message System:
• A schematic depicting the rule and message system,the apportionment of credit
system,and genetic algorithm is shown in Figure 1.
• The rule and message system forms the computational backbone. Information flows from the environment through the detectors — the classifier system's eyes and ears — where it is decoded into one or more finite-length messages. These environmental messages are posted to a finite-length message list, where the messages may then activate string rules called classifiers.
• When activated,a classiﬁer posts a message to the message list.These messages may
then invoke other classiﬁers or they may cause an action to be taken through the
system’s action triggers called eﬀectors.
• In this way classiﬁers combine environmental cues and internal thoughts to determine
what the system should do and think next.
• A message within a classifier system is simply a finite-length string over some finite alphabet. If we limit ourselves to a binary alphabet, we obtain the following definition:
<message> ::= {0, 1}^l,
where l is the length of the string.
Figure 1:A Learning Classiﬁer System Interacts with Its Environment
• The condition is a simple pattern-recognition device, where ∗ is added to the underlying alphabet:
<condition> ::= {0, 1, ∗}^l
• Therefore, a classifier is a production rule with an excruciatingly simple syntax:
<classifier> ::= <condition> : <message>
• Once a classiﬁer’s condition is matched,that classiﬁer becomes a candidate to post its
message to the message list on the next time step.Whether the candidate classiﬁer
posts its message is determined by the outcome of an activation auction,which in
turn depends on the evaluation of a classiﬁer’s value or weighting.
• An example:
Suppose we have a classifier store consisting of the classifiers shown in the following table:

Four Classifiers
Index   Classifier
1       01**:0000
2       00*0:1100
3       11**:1000
4       **00:0001
– At the ﬁrst time step,an environment message 0111 appears on the message list.
– This message matches classiﬁer 1,which then posts its message,0000.
– This message matches rules 2 and 4,which in turn post their messages (1100 and
0001).
– Message 1100 then matches classifiers 3 and 4. Thereafter the message sent by classifier 3, 1000, matches classifier 4, and the process terminates.
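A minimal sketch of the matching step in this example; bidding, payments, and effectors are omitted.

CLASSIFIERS = {1: ("01**", "0000"), 2: ("00*0", "1100"),
               3: ("11**", "1000"), 4: ("**00", "0001")}

def matches(condition, message):
    # a condition matches a message if every non-* position agrees
    return all(c in ('*', m) for c, m in zip(condition, message))

def step(message_list):
    # one rule-and-message cycle: every matched classifier posts its message
    return [msg for cond, msg in CLASSIFIERS.values()
            if any(matches(cond, m) for m in message_list)]

msgs = ["0111"]                  # the environmental message
for _ in range(4):
    print(msgs)                  # ['0111'], ['0000'], ['1100', '0001'], ['1000', '0001']
    msgs = step(msgs)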
4.Apportionment of Credit Algorithm:The Bucket Brigade
• Many classiﬁer systems attempt to rank or rate individual classiﬁers according to a
classiﬁer’s role in achieving reward from the environment.
• The most prevalent method incorporates what Holland has called a bucket brigade
algorithm.The bucket brigade may most easily be viewed as an information economy
where the right to trade information is bought and sold by classiﬁers.
• This service economy contains two main components: an auction and a clearinghouse. When classifiers are matched, they do not directly post their messages. Instead, having its condition matched qualifies a classifier to participate in an activation auction.
• To participate in the auction, each classifier maintains a record of its net worth, called its strength S. Each matched classifier makes a bid B proportional to its strength. In this way, rules that are highly fit are given preference over other rules.
• The auction permits appropriate classiﬁers to be selected to post their messages.
Once a classiﬁer is selected for activation,it must clear its payment through the
clearinghouse,paying its bid to other classiﬁers for matching messages rendered.
• A matched and activated classiﬁer sends its bid B to those classiﬁers responsible for
sending the messages that matched the bidding classiﬁer’s condition.
• The bid payment is divided in some manner among the matching classiﬁers.The
division of payoﬀ among contributing classiﬁers helps ensure the formation of an
appropriately sized subpopulation of rules.Thus diﬀerent types of rules can cover
diﬀerent types of behavioral requirements without undue interspecies competition.
• In a rulelearning system of any consequence,we cannot search for one master rule.
We must instead search for a coadapted set of rules that together cover a range of
behavior that provides ample payoﬀ to the learning system.
5.A detailed auction and payment scheme:
• Classifiers make bids (B_i) during the auction. Winning classifiers turn over their bids to the clearinghouse as payments (P_i). A classifier may also have receipts R_i from its previous message-sending activity or from environmental reward. In addition to bids and receipts, a classifier may be subject to one or more taxes T_i. Taken together, we may write an equation governing the depletion or accretion of the ith classifier's strength as follows:
S_i(t+1) = S_i(t) − P_i(t) − T_i(t) + R_i(t)
• A classifier bids in proportion to its strength:

B_i = C_bid S_i

where C_bid is the bid coefficient, S is strength, and i is the classifier index.
• We hold the auction in the presence of random noise. We calculate an effective bid (EB):

EB_i = B_i + N(σ_bid)

where the noise N is a function of the specified bidding-noise standard deviation σ_bid.
• Each classifier is taxed to prevent freeloading; we simply collect a tax proportional
to the classifier's strength:

T_i = C_tax S_i
• Therefore, the apportionment of credit algorithm for an active classifier can be rewritten
as:

S_i(t+1) = S_i(t) - C_bid S_i(t) - C_tax S_i(t) + R_i(t)
• Let K = C_bid + C_tax, with 0 ≤ K ≤ 1. We have

S(n) = (1 - K)^n S(0) + Σ_{j=0}^{n-1} R(j) (1 - K)^(n-j-1)
• To investigate the effect of this mechanism further, we examine the steady-state
response. If the process continues indefinitely with a constant receipt R(t) = R_ss, we
obtain the steady-state strength by setting S(t+1) = S(t) = S_ss. Then we have

S_ss = R_ss / K
• The steady bid may be derived as follows:

B_ss = (C_bid / K) R_ss = (C_bid / (C_bid + C_tax)) R_ss
• Since C_tax is usually small with respect to the bid coefficient, the steady bid value
usually approaches the steady receipt value, B_ss ≈ R_ss. In other words, for steady
receipts, the bid value approaches the receipt.
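
A few lines of Python confirm this steady-state behavior numerically; the coefficient and receipt values below are illustrative assumptions.

    # Iterating S(t+1) = S(t) - C_bid*S(t) - C_tax*S(t) + R(t) with a constant
    # receipt R_ss.  The coefficient and receipt values are chosen only to
    # illustrate the steady-state result.
    C_bid, C_tax = 0.1, 0.01
    K = C_bid + C_tax
    R_ss = 5.0

    S = 0.0                                  # initial strength S(0)
    for _ in range(500):
        S = S - C_bid * S - C_tax * S + R_ss

    print(S, R_ss / K)                       # both ~45.45: S_ss = R_ss / K
    print(C_bid * S, (C_bid / K) * R_ss)     # steady bid ~4.55, about 91% of R_ss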
6.An example:See Figure 2.
7.Genetic Algorithm
• The bucket brigade provides a clean procedure for evaluating rules and deciding
among competing alternatives. Yet we still must devise a way of injecting new,
possibly better rules into the system. This is precisely where the genetic algorithm
steps in.
• However,we must be a little less cavalier about wanton replacement of the entire
population,and we must pay more attention to who replaces whom.
• Here, we define a quantity called the selection proportion: the proportion of the
population that is replaced at a given genetic algorithm invocation. We also define
a quantity called the GA period, T_GA, which specifies the number of time steps (rule-
and-message cycles) between GA calls. This period may be treated deterministically
or stochastically. Additionally, the invocation of genetic algorithm learning may be
conditioned on particular events such as lack of a match or poor performance.
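
A minimal sketch of this scheduling idea appears below. The parameter names, the roulette-wheel parent selection over strengths, and the one-point crossover on the flat condition:message strings are all illustrative assumptions standing in for a full classifier-system GA.

    import random

    # Invoke a GA every T_GA rule-and-message cycles and replace only the weakest
    # `selection_proportion` of the classifier store with offspring of
    # strength-proportionate parents.  Names and values are assumptions.

    T_GA = 10                   # GA period, in rule-and-message cycles
    selection_proportion = 0.2  # fraction of the store replaced per GA call
    COND_ALPHABET, MSG_ALPHABET, LENGTH = "01*", "01", 4

    def random_classifier():
        cond = "".join(random.choice(COND_ALPHABET) for _ in range(LENGTH))
        msg = "".join(random.choice(MSG_ALPHABET) for _ in range(LENGTH))
        return {"rule": f"{cond}:{msg}", "strength": 10.0}

    def crossover(a, b):
        """One-point crossover on the flat <condition>:<message> strings."""
        cut = random.randrange(1, len(a["rule"]))
        return {"rule": a["rule"][:cut] + b["rule"][cut:],
                "strength": (a["strength"] + b["strength"]) / 2}

    def ga_invocation(store):
        n_replace = int(selection_proportion * len(store))
        store.sort(key=lambda c: c["strength"])        # weakest first
        parents = store[n_replace:]                    # survivors act as parents
        weights = [p["strength"] for p in parents]
        for i in range(n_replace):
            a, b = random.choices(parents, weights=weights, k=2)
            store[i] = crossover(a, b)                 # overwrite a weak classifier

    store = [random_classifier() for _ in range(20)]
    for step in range(1, 101):
        # ... rule-and-message cycle plus bucket-brigade strength updates go here ...
        if step % T_GA == 0:                           # deterministic GA period
            ga_invocation(store)

Replacing only the weakest fraction of the store, rather than the whole population, reflects the caution about wanton replacement noted above: the co-adapted rules that are earning receipts are left in place.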
Figure 2:A Simple Classiﬁer System by Hand–Matching and Payments
References
[1] Antonisse, J. (1989), “A New Interpretation of Schema Notation that Overturns the Binary Encoding Constraints,” in J. D. Schaffer (ed.), Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.

[2] Baker, J. E. (1985), “Adaptive Selection Methods for Genetic Algorithms,” in J. J. Grefenstette (ed.), Proceedings of the First International Conference on Genetic Algorithms and Their Applications. Erlbaum.

[3] Bäck, T. and F. Hoffmeister (1991), “Extended Selection Mechanisms in Genetic Algorithms,” in R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.

[4] Bethke, A. D. (1981), “Genetic Algorithms as Function Optimizers,” (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 41 (9), 3503B. (University Microfilms No. 8106101).

[5] De Jong, K. A. (1975), “An Analysis of the Behavior of a Class of Genetic Adaptive Systems,” (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36 (10), 5140B. (University Microfilms No. 769381).

[6] de la Maza, M. and B. Tidor (1993), “An Analysis of Selection Procedures with Particular Attention Paid to Proportional and Boltzmann Selection,” in S. Forrest (ed.), Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.

[7] Eshelman, L. J., R. A. Caruana, and J. D. Schaffer (1989), “Biases in the Crossover Landscape,” in J. D. Schaffer (ed.), Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.

[8] Fogel, L. J., A. J. Owens, and M. J. Walsh (1966), Artificial Intelligence through Simulated Evolution. Wiley.

[9] Forrest, S. (1985), “Scaling Fitness in the Genetic Algorithm,” in Documentation for PRISONERS DILEMMA and NORMS That Use the Genetic Algorithm. Unpublished manuscript.

[10] Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.

[11] Goldberg, D. E. (1989a), “Messy Genetic Algorithms: Motivation, Analysis, and First Results,” Complex Systems, Vol. 3, pp. 493-530.

[12] Goldberg, D. E. and K. Deb (1991), “A Comparative Analysis of Selection Schemes Used in Genetic Algorithms,” in G. Rawlins (ed.), Foundations of Genetic Algorithms. Morgan Kaufmann.

[13] Goldberg, D. E., B. Korb, and K. Deb (1989), “Messy Genetic Algorithms: Motivation, Analysis, and First Results,” Complex Systems, Vol. 3, pp. 493-530.

[14] Goldberg, D. E., K. Deb, and B. Korb (1991), “Don't Worry, Be Messy,” in R. Belew and L. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann, Los Altos, CA, pp. 24-30.

[15] Hancock, P. J. B. (1994), “An Empirical Comparison of Selection Methods in Evolutionary Algorithms,” in T. C. Fogarty (ed.), Evolutionary Computing: AISB Workshop, Leeds, U.K., April 1994, Selected Papers. Springer-Verlag.

[16] Holland, J. H. (1975), Adaptation in Natural and Artificial Systems. University of Michigan Press. (Second edition: MIT Press, 1992.)

[17] Holland, J. H. (1986), “Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems,” in R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning II. Morgan Kaufmann.

[18] Janikow, C. Z. and Z. Michalewicz (1991), “An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms,” in R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.

[19] Michalewicz, Z. (1996), Genetic Algorithms + Data Structures = Evolution Programs. Third edition. New York: Springer-Verlag.

[20] Mitchell, M. (1996), An Introduction to Genetic Algorithms. MIT Press.

[21] Mühlenbein, H. (1992), “How Genetic Algorithms Really Work: 1. Mutation and Hillclimbing,” in R. Männer and B. Manderick (eds.), Parallel Problem Solving from Nature 2. North-Holland.

[22] Rechenberg, I. (1965), “Cybernetic Solution Path of an Experimental Problem,” Ministry of Aviation, Royal Aircraft Establishment (U.K.).

[23] Rechenberg, I. (1973), Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog (Stuttgart).

[24] Spears, W. M. (1993), “Crossover or Mutation?,” in L. D. Whitley (ed.), Foundations of Genetic Algorithms 2. Morgan Kaufmann.

[25] Wright, A. H. (1991), “Genetic Algorithms for Real Parameter Optimization,” in G. Rawlins (ed.), Foundations of Genetic Algorithms. Morgan Kaufmann.