Graduate Course
An Introduction to Genetic Algorithms
Chia-Hsuan Yeh
These lecture notes are based on Mitchell (1996) and Goldberg (1989).
Chapter 1: Genetic Algorithms: An Overview
• The goal of creating artificial intelligence and artificial life can be traced back to the
very beginnings of the computer age. The earliest computer scientists (Alan Turing,
John von Neumann, Norbert Wiener, and others) were motivated in large part by
visions of imbuing computer programs with intelligence, with the life-like ability to
self-replicate, and with the adaptive capability to learn and to control their environments.
• These early pioneers of computer science were as much interested in biology and
psychology as in electronics, and they looked to natural systems as guiding metaphors
for how to achieve their visions.
• It should be no surprise, then, that from the earliest days computers were applied
not only to calculating missile trajectories and deciphering military codes but also to
modeling the brain, mimicking human learning, and simulating biological evolution.
• These biologically motivated computing activities have waxed and waned over the
years, but since the early 1980s they have all undergone a resurgence in the
computation research community. The first has grown into the field of neural networks, the
second into machine learning, and the third into what is now called “evolutionary
computation”, of which genetic algorithms are the most prominent example.
2. A Brief History of Evolutionary Computation:
• In the 1950s and the 1960s several computer scientists independently studied
evolutionary systems with the idea that evolution could be used as an optimization tool for
engineering problems. The idea in all these systems was to evolve a population of
candidate solutions to a given problem, using operators inspired by natural genetic
variation and natural selection.
• Evolution Strategies:
In the 1960s, Rechenberg (1965, 1973) introduced “evolution strategies”, a method he
used to optimize real-valued parameters for devices such as airfoils.
• Evolutionary Programming:
Fogel, Owens, and Walsh (1966) developed “evolutionary programming”, a technique
in which candidate solutions to given tasks were represented as finite-state machines,
which were evolved by randomly mutating their state-transition diagrams and
selecting the fittest.
• Genetic Algorithms:
Genetic algorithms (GAs) were developed by John Holland and his students and
colleagues at the University of Michigan in the 1960s and 1970s. In contrast with
evolution strategies and evolutionary programming, Holland’s original goal was
not to design algorithms to solve specific problems, but rather to formally study the
phenomenon of adaptation as it occurs in nature and to develop ways in which the
mechanisms of natural adaptation might be imported into computer systems.
3. What Are Genetic Algorithms?
• Holland’s 1975 book: Adaptation in Natural and Artificial Systems
He presented the genetic algorithm as an abstraction of biological evolution and gave a
theoretical framework for adaptation under the GA.
• Holland’s GA is a method for moving from one population of “chromosomes” (strings
of ones and zeros, or “bits”) to a new population by using a kind of “natural selection”
together with the genetics-inspired operators of crossover, mutation, and inversion.
• Genetic algorithms are search algorithms based on the mechanics of natural selection
and natural genetics.
• They combine survival of the fittest among string structures with a structured
yet randomized information exchange to form a search algorithm with some of the
innovative flair of human search.
• In every generation, a new set of artificial creatures (strings) is created using bits and
pieces of the fittest of the old; an occasional new part is tried for good measure.
• Each chromosome (string) consists of “genes” (e.g., bits), each gene being an instance
of a particular “allele” (e.g., 0 or 1). The selection operator chooses those
chromosomes in the population that will be allowed to reproduce, and on average the fitter
chromosomes produce more offspring than the less fit ones.
• Crossover exchanges subparts of two chromosomes, roughly mimicking biological
recombination between two single-chromosome (haploid) organisms.
• Mutation randomly changes the allele values of some locations in the chromosome.
• Inversion reverses the order of a contiguous section of the chromosome, thus
rearranging the order in which genes are arrayed.
• GAs efficiently exploit historical information to speculate on new search points with
expected improved performance.
• Purpose:
The goals of GA research have been twofold: (1) to abstract and rigorously explain
the adaptive processes of natural systems, and (2) to design artificial systems software
that retains the important mechanisms of natural systems.
• Genetic algorithms are theoretically and empirically proven to provide robust search
in complex spaces.
• These algorithms are computationally simple yet powerful in their search for
improvement.
• They are not fundamentally limited by restrictive assumptions about the search space
(assumptions concerning continuity, existence of derivatives, unimodality, and other
matters).
4. The Appeal of Evolution
• Many computational problems require searching through a huge number of possibilities
for solutions.
• Such search problems can often benefit from an effective use of parallelism, in which
many different possibilities are explored simultaneously in an efficient way.
• What is needed is both computational parallelism (i.e., many processors evaluating
sequences at the same time) and an intelligent strategy for choosing the next set of
sequences to evaluate.
• Many computational problems require a computer program to be adaptive, that is, to
continue to perform well in a changing environment.
• Many problems require computer programs to be innovative: to construct something
truly new and original, such as a new algorithm for accomplishing a computational
task or even a new scientific discovery.
• Many computational problems require complex solutions that are difficult to program
by hand.
• Many AI researchers believe that the “rules” underlying intelligence are too complex for
scientists to encode by hand in a “top-down” fashion. Instead they believe that
the best route to artificial intelligence is through a “bottom-up” paradigm in which
humans write only very simple rules, and complex behaviors such as intelligence
emerge from the massively parallel application and interaction of these simple rules.
• Biological evolution is an appealing source of inspiration for addressing these
problems. Evolution is, in effect, a method of searching among an enormous number of
possibilities for “solutions”.
5. Robustness of Traditional Optimization and Search Methods
• The current literature identifies three main types of search methods:
– Calculus-based
Calculus-based methods have been studied heavily. These subdivide into two
main classes: indirect and direct. Indirect methods seek local extrema by
solving the usually nonlinear set of equations resulting from setting the gradient of
the objective function equal to zero. This is the multidimensional generalization
of the elementary calculus notion of extremal points. Given a smooth,
unconstrained function, finding a possible peak starts by restricting search to those
points with slopes of zero in all directions. On the other hand, direct (search)
methods seek local optima by hopping on the function and moving in a direction
related to the local gradient. This is simply the notion of hill-climbing: to find
the local best, climb the function in the steepest permissible direction. While
both of these calculus-based methods have been improved, extended, hashed,
and rehashed, some simple reasoning shows their lack of robustness.
∗ Both methods are local in scope: the optima they seek are the best in a
neighborhood of the current point.
∗ Once a lower (local) peak is reached, further improvement must be sought through
random restart or other trickery.
∗ Calculus-based methods depend upon the
existence of derivatives (well-defined slope values). Even if we allow numerical
approximation of derivatives, this is a severe shortcoming.
– Enumerative
Within a finite search space or a discretized infinite search space, the search
algorithm looks at objective function values at every point in the space,
one at a time.
∗ Although the simplicity of this type of algorithm is attractive, and
enumeration is a very human kind of search, such schemes must ultimately be
discounted in the robustness race for one simple reason: lack of efficiency.
– Random
Random search algorithms have achieved increasing popularity as researchers
have recognized the shortcomings of calculus-based and enumerative schemes.
∗ Random walks and random schemes that search and save the best must also
be discounted because of the efficiency requirement.
∗ Random searches, in the long run, can be expected to do no better than
enumerative schemes.
• How Are Genetic Algorithms Different from Traditional Methods?
– GAs work with a coding of the parameter set, not the parameters themselves.
– GAs search from a population of points, not a single point.
– GAs use payoff (objective function) information, not derivatives or other auxiliary
knowledge.
– GAs use probabilistic transition rules, not deterministic rules.
• GA search combines “exploitation” (deterministic search) and “exploration” (random
search).
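The locality of hill-climbing discussed in this section can be seen on a small example. The following is only an illustrative sketch: the two-peak objective `f`, the step size, and the function name `hill_climb` are assumptions made for this example and do not come from the original notes.

```python
import math

def f(x):
    # Hypothetical objective with a low peak near x = 1 and a higher peak near x = 4.
    return math.exp(-(x - 1.0) ** 2) + 2.0 * math.exp(-(x - 4.0) ** 2)

def hill_climb(x, step=0.01, max_iters=100_000):
    """Move to the better neighbouring point until neither neighbour improves f."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            break  # no neighbour is better: a local optimum at this resolution
        x = best
    return x

low = hill_climb(0.0)   # starts in the basin of the lower peak
high = hill_climb(3.0)  # starts in the basin of the higher peak
```

Started at 0.0 the climber stops on the lower peak and never sees the higher one. Getting further requires a restart from another point, which is the "random restart or other trickery" mentioned above.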
6. Elements of Genetic Algorithms
• There is no rigorous definition of “genetic algorithm” accepted by all in the
evolutionary computation community that differentiates GAs from other evolutionary
computation methods.
• However, it can be said that most methods called “GAs” have at least the following
elements in common:
– population of chromosomes,
– selection according to fitness (performance),
– crossover to produce new offspring,
– random mutation of new offspring.
• A Simple Genetic Algorithm:
(a) Start with a randomly generated population of n l-bit chromosomes (candidate
solutions to a problem).
(b) Calculate the fitness f(x) of each chromosome x in the population.
(c) Repeat the following steps until n offspring have been created.
i. Select a pair of parent chromosomes from the current population, the
probability of selection being an increasing function of fitness. Selection is done
“with replacement”, meaning that the same chromosome can be selected more
than once to become a parent.
ii. With probability $p_c$ (the “crossover probability” or “crossover rate”), cross
over the pair at a randomly chosen point (chosen with uniform probability)
to form two offspring. If no crossover takes place, form two offspring that
are exact copies of their respective parents. (Note that here the crossover rate
is defined to be the probability that two parents will cross over in a single
point. There are also “multi-point crossover” versions of the GA in which
the crossover rate for a pair of parents is the number of points at which a
crossover takes place.)
iii. Mutate the two offspring at each locus with probability $p_m$ (the mutation
probability or mutation rate), and place the resulting chromosomes in the
new population. If n is odd, one new population member can be discarded
at random.
(d) Replace the current population with the new population.
(e) Go to step (b).
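Steps (a) to (e) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names, the default parameter values, the fixed number of generations used as a stopping rule, and the "count the ones" fitness are all chosen for illustration. Selection is made proportional to fitness, which is one possible "increasing function of fitness" and requires fitness values to be positive.

```python
import random

def one_max(chromosome):
    # Hypothetical fitness: number of 1 bits, plus 1 so that weights are never all zero.
    return sum(chromosome) + 1

def simple_ga(fitness, n=20, l=16, p_c=0.7, p_m=0.001, generations=50, seed=0):
    rng = random.Random(seed)
    # (a) random initial population of n l-bit chromosomes
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]
    for _ in range(generations):
        # (b) fitness of every chromosome
        fits = [fitness(x) for x in pop]
        new_pop = []
        # (c) repeat until at least n offspring exist
        while len(new_pop) < n:
            # i. fitness-proportionate selection, with replacement
            parent_a, parent_b = rng.choices(pop, weights=fits, k=2)
            # ii. single-point crossover with probability p_c, else copy the parents
            if rng.random() < p_c:
                point = rng.randint(1, l - 1)
                kids = [parent_a[:point] + parent_b[point:],
                        parent_b[:point] + parent_a[point:]]
            else:
                kids = [parent_a[:], parent_b[:]]
            # iii. flip each bit of each offspring with probability p_m
            for kid in kids:
                for i in range(l):
                    if rng.random() < p_m:
                        kid[i] ^= 1
                new_pop.append(kid)
        # if n is odd, one surplus member is discarded at random
        if len(new_pop) > n:
            new_pop.pop(rng.randrange(len(new_pop)))
        # (d) replace the population, (e) loop back to (b)
        pop = new_pop
    return max(pop, key=fitness)

best = simple_ga(one_max)
```

Because the run depends on the random seed, different seeds will generally give different detailed behavior.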
7. The Basic Characteristics of Genetic Algorithms:
• Randomness plays a large role in each run of a GA: two runs with different
random-number seeds will generally produce different detailed behaviors.
• GA researchers often report statistics (such as the best fitness found in a run and the
generation at which the individual with that best fitness was discovered) averaged
over many different runs of the GA on the same problem.
• There are a number of details to fill in, such as the size of the population and the
probabilities of crossover and mutation, and the success of the algorithm often depends
greatly on these details.
8. Some Applications of Genetic Algorithms:
• Optimization
• Automatic programming
• Machine learning
• Economics
• Immune systems
• Ecology
• Population genetics
• Evolution and learning
• Social systems
9. Example: Optimization of a Simple Function
• Consider the function $f(x) = x \sin(10\pi x) + 1.0$, drawn in Figure 1.
• The problem is to find x in [-1, 2] that maximizes the function f, i.e., to find $x_0$
such that $f(x_0) \ge f(x)$ for all $x \in [-1, 2]$.
• Representation:
– We use a binary vector as a chromosome to represent the real-valued variable x.
Figure 1: Graph of the function f(x) = x sin(10πx) + 1.0
– The length of the vector depends on the required precision, for example six
places after the decimal point.
– The domain of the variable x has length 3, so the range must be
divided into at least 3 × 1000000 equal-size ranges.
– This means that 22 bits are required for the binary vector, because
$2097152 = 2^{21} < 3000000 \le 2^{22} = 4194304$.
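– The mapping from a binary string $\langle b_{21} b_{20} \ldots b_0 \rangle$ into a real number x from the
range [-1, 2] is as follows:
∗ convert the binary string $\langle b_{21} b_{20} \ldots b_0 \rangle$ from base 2 to base 10:
$(\langle b_{21} b_{20} \ldots b_0 \rangle)_2 = \left(\sum_{i=0}^{21} b_i \cdot 2^i\right)_{10} = x'$
∗ find the corresponding real number x:
$x = -1.0 + x' \cdot \frac{3}{2^{22} - 1}$
where -1.0 is the left boundary of the domain and 3 is the length of the
domain.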
• The evaluation function (fitness function) for the binary vectors is equivalent to the
function f:
eval(v) = f(x),
where the chromosome v represents the real value x.
• Experimental result: $v_{max} = (1111001101000100000101)$, which corresponds to the value
$x_{max} = 1.850773$.
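The representation used in this example can be checked with a short script. It is a sketch only: the helper names `bits_needed`, `decode`, and `f` are chosen here for illustration, while the domain [-1, 2], the six-decimal precision, and the reported best chromosome come from the text.

```python
import math

A, B = -1.0, 2.0   # domain boundaries from the example
PLACES = 6         # required precision: six places after the decimal point

def bits_needed(a=A, b=B, places=PLACES):
    # Smallest number of bits whose range covers (b - a) * 10**places sub-intervals.
    return math.ceil(math.log2((b - a) * 10 ** places))

def decode(bits, a=A, b=B):
    # Base 2 -> base 10, then scale the integer linearly onto [a, b].
    x_prime = int(bits, 2)
    return a + x_prime * (b - a) / (2 ** len(bits) - 1)

def f(x):
    return x * math.sin(10 * math.pi * x) + 1.0

x_max = decode("1111001101000100000101")
```

The all-zeros string maps to the left boundary -1.0 and the all-ones string to the right boundary 2.0, so every 22-bit chromosome decodes to a legal point of the domain and no repair step is needed.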
10. Example: Using GAs to Evolve Strategies for the Prisoner’s Dilemma
• A simple two-person game invented by Merrill Flood and Melvin Dresher in the 1950s.
• Two individuals (call them Alice and Bob) are arrested for committing a crime
together and are held in separate cells, with no communication possible between them.
• Alice is offered the following deal:
– If she confesses and agrees to testify against Bob, she will receive a suspended
sentence with probation, and Bob will be put away for 5 years.
– However, if at the same time Bob confesses and agrees to testify against Alice,
her testimony will be discredited, and each will receive 4 years for pleading guilty.
– Alice is told that Bob is being offered precisely the same deal.
• Payoff matrix: (Alice, Bob)
• Each player independently decides which move to make, i.e., whether to cooperate
or defect.
• What is the best strategy to use in order to maximize one’s own payoff?
– If you suspect that your opponent is going to cooperate, then you should surely
defect.
– If you suspect that your opponent is going to defect, then you should defect too.
• The dilemma is that if both players defect each gets a worse score than if they
cooperate.
• If the game is iterated (that is, if the two players play several games in a row), both
players’ always defecting will lead to a much lower total payoff than the players would
get if they cooperated.
• How can reciprocal cooperation be induced?
• Robert Axelrod’s studies at the University of Michigan:
– Human Designed Strategies:
∗ He solicited strategies from researchers in a number of disciplines.
∗ Each participant submitted a computer program that implemented a
particular strategy.
∗ These various programs played iterated games with each other.
∗ During each game, each program remembered what move (i.e., cooperate
or defect) both it and its opponent had made in each of the three previous
games that they had played with each other, and its strategy was based on
this memory.
∗ The programs were paired in a round-robin tournament in which each played
with all the other programs over a number of games.
∗ Some of the strategies submitted were rather complicated, using techniques
such as Markov processes and Bayesian inference to model the other players
in order to determine the best move.
∗ However, the winner (the strategy with the highest average score) was the
simplest of the submitted strategies: TIT FOR TAT.
∗ TIT FOR TAT cooperates in the first game and then does whatever the other
player did in the previous game: it punishes a defection with a defection of its own, and
continues the punishment until the other player begins cooperating again.
– Genetic Algorithm Implementation (1):
∗ Axelrod (1987) decided to see if a GA could evolve strategies to play this
game successfully.
∗ Representation of strategies: the memory of each player is one previous game.
∗ Representation of strategies: the memory of each player is three previous
games.
∗ The fitness of a strategy:
· Axelrod had found that eight of the human-generated strategies from the
tournament were representative of the entire set of strategies.
· The set of eight strategies (which did not include TIT FOR TAT) served
as the “environment” for the evolving strategies in the population.
· Each individual in the population played iterated games with each of the
eight fixed strategies, and the individual’s fitness was taken to be its average
score over all the games it played.
∗ Simulation result: most of the strategies that evolved were similar to TIT
FOR TAT.
∗ It would be wrong to conclude that the GA discovered strategies that are
“better” than any human-designed strategy.
∗ The performance of a strategy depends very much on its environment, that
is, on the strategies with which it is playing.
∗ Here the environment was fixed: it consisted of eight human-designed
strategies that did not change over the course of a run.
∗ The resulting fitness function is an example of a static fitness landscape.
∗ It is not necessarily true that these high-scoring strategies would also score
well in a different environment.
– Genetic Algorithm Implementation (2):
∗ Axelrod carried out another experiment in which the fitness of an individual was
determined by allowing the individuals in the population to play with one another
rather than with the fixed set of eight strategies.
∗ The environment then changed from generation to generation because the
opponents themselves were evolving.
∗ Thus the fitness landscape was not static.
∗ In the first few generations, strategies that tended to cooperate did not find
reciprocation among their fellow population members and thus tended to die
out.
∗ After about 10 to 20 generations, the trend started to reverse: the GA
discovered strategies that reciprocated cooperation and that punished defection
(i.e., variants of TIT FOR TAT).
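The behavior of TIT FOR TAT in an iterated game can be illustrated with a small simulation. This is a sketch under assumptions: the payoff matrix is not reproduced in these notes, so the point values below (3 for mutual cooperation, 1 for mutual defection, 5 and 0 when one player defects on a cooperator, higher being better) are the values commonly quoted for Axelrod's tournaments and are assumed here. The function names and the 10-round game length are also illustrative.

```python
# ASSUMED payoffs in points (higher is better): (my move, other's move) -> my score
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then repeat the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

mutual_tft = play(tit_for_tat, tit_for_tat)
mutual_defect = play(always_defect, always_defect)
mixed = play(tit_for_tat, always_defect)
```

Under these assumed payoffs two TIT FOR TAT players earn far more than two constant defectors, while against a defector TIT FOR TAT loses only the first round. In a GA version, a fitness function would average such scores over the games an individual plays.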
11. Example: Traveling Salesman Problem (TSP)
• What is TSP?
The traveling salesman must visit every city in his territory exactly once and then
return to the starting point. Given the cost of travel between all cities, how should
he plan his itinerary for minimum total cost of the entire tour?
• The TSP is a problem in combinatorial optimization and arises in numerous
applications.
• Representation: integer vector or binary string?
– In a binary representation of an n-city TSP, each city should be encoded as a
string of $\lceil \log_2 n \rceil$ bits; a chromosome is a string of $n \lceil \log_2 n \rceil$ bits.
– A mutation or crossover can result in a sequence of cities that is not a tour:
we can get the same city twice in a sequence.
– For a TSP with 20 cities (where we need 5 bits to represent a city), some 5-bit
sequences (for example, 10101) do not correspond to any city.
– If we use mutation and crossover operators as defined earlier, we would need some
sort of a “repair algorithm”; such an algorithm would “repair” a chromosome,
moving it back into the search space.
• Integer representation:
– A vector $v = \langle i_1 i_2 \ldots i_n \rangle$ represents a tour: from $i_1$ to $i_2$, etc., from $i_{n-1}$
to $i_n$ and back to $i_1$ (v is a permutation of $\langle 1\ 2\ \ldots\ n \rangle$).
– Then, given the cost of travel between all cities, we can easily calculate the total
cost of the entire tour.
– However, ordinary crossover will still produce invalid strings (the same problem as in
the previous representation).
– Modified crossover:
∗ Given two parents, it builds offspring by choosing a subsequence of a tour from
one parent and preserving the relative order of cities from the other parent.
∗ For example, if the parents are
< 1,2,3,4,5,6,7,8,9,10,11,12 >
< 7,3,1,11,4,12,5,2,10,9,6,8 >
and the chosen part is < 4,5,6,7 >, then the resulting offspring is
< 1,11,12,4,5,6,7,2,10,9,8,3 >
∗ The offspring bears a structural relationship to both parents. The roles of
the parents can then be reversed in constructing a second offspring.
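The modified crossover can be written down compactly. The sketch below is one reading of the example, not necessarily the exact operator the notes have in mind: the chosen segment is copied from the first parent, and the remaining positions are filled, starting just after the segment and wrapping around, with the other cities in the order they appear in the second parent (read from the same starting position). This order-crossover-style rule reproduces the offspring shown above. The function names and the small cost matrix in the usage lines are assumptions.

```python
def order_crossover(p1, p2, start, end):
    """Keep p1[start:end]; fill the rest in p2's relative order, wrapping around."""
    n = len(p1)
    child = [None] * n
    child[start:end] = p1[start:end]
    kept = set(p1[start:end])
    # Cities of p2, read from just after the segment, skipping those already kept.
    fill = [p2[(end + k) % n] for k in range(n) if p2[(end + k) % n] not in kept]
    for k, city in enumerate(fill):
        child[(end + k) % n] = city
    return child

def tour_cost(tour, cost):
    """Total cost of visiting the cities in order and returning to the start.
    Cities are numbered from 1; cost[i][j] is the cost from city i+1 to city j+1."""
    n = len(tour)
    return sum(cost[tour[i] - 1][tour[(i + 1) % n] - 1] for i in range(n))

parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
parent2 = [7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8]
child = order_crossover(parent1, parent2, 3, 7)   # segment <4,5,6,7>

# Hypothetical symmetric 3-city cost matrix, for the cost function only.
small_cost = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
small_total = tour_cost([1, 2, 3], small_cost)
```

The offspring is always a permutation, so no repair algorithm is needed. Swapping the two parents in the call gives the second offspring.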
Chapter 2: The Mathematical Foundations of Genetic Algorithms
• The traditional theory of GAs (first formulated in Holland 1975) assumes that, at a
very general level of description, GAs work by discovering, emphasizing, and
recombining good “building blocks” of solutions in a highly parallel fashion.
• The idea here is that good solutions tend to be made up of good building blocks:
combinations of bit values that confer higher fitness on the strings in which they are
present.
• Holland (1975) introduced the notion of schemas (or schemata) to formalize the
informal notion of “building blocks”.
• A schema is a set of bit strings that can be described by a template made up of ones,
zeros, and asterisks, the asterisks representing wild cards (or “don’t cares”).
• Consider strings to be constructed over the binary alphabet V = {0, 1}. An l-bit
string may be represented symbolically as $A = a_1 a_2 \ldots a_l$, where $a_i \in V$.
• Consider the extended alphabet $V^{+} = \{0, 1, *\}$, where * means “don’t care”.
• If H = ∗11∗0∗∗, then A = 0111000 is an example of the schema H because A
matches the schema at the fixed positions 2, 3, and 5.
• In general, for alphabets of cardinality k, there are $(k + 1)^l$ schemata.
• Order:
The order of a schema H, denoted by o(H), is simply the number of fixed positions (in
a binary alphabet, the number of 1’s and 0’s) present in the template. For example,
the order of the schema 011*1** is 4 (symbolically, o(011∗1∗∗) = 4).
• Defining length:
The defining length of a schema H, denoted by δ(H), is simply the distance between
the first and last specific string positions. For example, the schema 011*1** has
defining length δ = 4. The schema 0****** has defining length δ = 0.
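The three schema notions defined so far (membership, order, defining length) translate directly into code. The helper names below are chosen for illustration; schemata and strings are represented as Python strings over the characters 0, 1 and *.

```python
def matches(schema, string):
    """True if the string is an instance of the schema ('*' matches either bit)."""
    return len(schema) == len(string) and all(
        s in ("*", c) for s, c in zip(schema, string)
    )

def order(schema):
    """o(H): the number of fixed (non-'*') positions."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema):
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

# The examples used in the text
is_member = matches("*11*0**", "0111000")
o_example = order("011*1**")
d_example = defining_length("011*1**")
d_single = defining_length("0******")
n_schemata = (2 + 1) ** 7   # (k + 1)^l for k = 2, l = 7
```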
3. The effect of reproduction on the expected number of schemata:
• Suppose at a given time step t there are m examples of a particular schema H
contained within the population A(t), where we write m = m(H, t).
• During reproduction, a string $A_i$ is copied according to its fitness (with probability
$p_i = f_i / \sum_j f_j$).
• After picking a nonoverlapping population of size n with replacement from the
population A(t), we expect to have m(H, t+1) representatives of the schema H in
the population at time t+1, where $m(H, t+1) = m(H, t) \cdot n \cdot f(H) / \sum_j f_j$ and f(H) is the
average fitness of the strings representing schema H at time t.
• Let $\bar{f} = \sum_j f_j / n$; then $m(H, t+1) = m(H, t) \, f(H) / \bar{f}$.
• A particular schema grows as the ratio of the average fitness of the schema to the
average fitness of the population.
• Suppose a particular schema H remains above average by an amount $c\bar{f}$, with c a
constant; then $m(H, t+1) = m(H, t) \, (\bar{f} + c\bar{f}) / \bar{f} = (1 + c) \, m(H, t)$.
• Starting at t = 0 and assuming a stationary value of c, we have
$m(H, t) = m(H, 0) \, (1 + c)^t$.
• Reproduction allocates exponentially increasing (decreasing) numbers of trials to
above- (below-) average schemata.
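The reproduction formula can be checked numerically on a tiny population. The sketch below is illustrative: the four 5-bit strings, the fitness function (the square of the string read as an integer), and the helper names are assumptions made for this example. It computes the expected number of schema members after proportional selection in two ways: directly from the selection probabilities, and from $m(H, t) \, f(H) / \bar{f}$.

```python
def matches(schema, string):
    return all(s in ("*", c) for s, c in zip(schema, string))

def expected_members(population, fitness, schema):
    """Return (direct, formula) estimates of m(H, t+1) under proportional selection."""
    n = len(population)
    total = sum(fitness(s) for s in population)
    members = [s for s in population if matches(schema, s)]
    if not members:
        return 0.0, 0.0
    # Direct: n independent draws, each picking string i with probability f_i / sum_j f_j.
    direct = n * sum(fitness(s) for s in members) / total
    # Formula from the text: m(H, t) * f(H) / f_bar.
    m = len(members)
    f_H = sum(fitness(s) for s in members) / m
    f_bar = total / n
    return direct, m * f_H / f_bar

def square_fitness(s):
    # Hypothetical fitness: the string read as a binary integer, squared.
    return int(s, 2) ** 2

population = ["01101", "11000", "01000", "10011"]
direct, formula = expected_members(population, square_fitness, "1****")
growth = 2 * (1 + 0.5) ** 3   # m(H, 3) for m(H, 0) = 2 and an assumed constant c = 0.5
```

Here the schema 1**** has two members with above-average fitness, so its expected count after one round of selection rises from 2 to about 3.2.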
4. The influence of crossover on the schemata:
• Consider two schemata $H_1$ and $H_2$ of length 7: $H_1 = *1****0$, $H_2 = ***10**$. Then it is
clear that schema $H_1$ is less likely to survive crossover than schema $H_2$.
• The survival probability under simple crossover is $p_s = 1 - \delta(H)/(l - 1)$.
• If crossover is itself performed by random choice, say with probability $p_c$ at a
particular mating, the survival probability may be given by
$p_s \ge 1 - p_c \, \delta(H)/(l - 1)$.
• The combined effect of reproduction and crossover:
Assuming independence of the reproduction and crossover operators, we have
$m(H, t+1) \ge m(H, t) \, \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} \right]$
• Therefore, the schema H grows or decays depending on two things: (1) whether the
schema is above or below the population average, and (2) whether the schema has
relatively short or long defining length.
5. The influence of mutation on the schemata:
• In order for schema H to survive, all the specified positions must themselves survive.
• A single allele survives with probability $(1 - p_m)$.
• A particular schema survives when each of the o(H) fixed positions within the schema
survives. Therefore, the probability of surviving mutation is $(1 - p_m)^{o(H)}$.
• When $p_m \ll 1$, the probability may be approximated by $1 - o(H) \, p_m$.
• Therefore, a particular schema H receives an expected number of copies in the next
generation under reproduction, crossover, and mutation as given by the following
equation (ignoring small cross-product terms):
$m(H, t+1) \ge m(H, t) \, \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} - o(H) \, p_m \right]$
6. Schema Theorem (The Fundamental Theorem of Genetic Algorithms):
Short, low-order, above-average schemata receive exponentially increasing trials in
subsequent generations.
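The lower bound behind the theorem, $m(H, t+1) \ge m(H, t) \, \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{l - 1} - o(H) \, p_m \right]$, is easy to evaluate for concrete schemata. The sketch below compares the two length-7 schemata from the crossover discussion. The function name and the parameter values in the usage lines are illustrative assumptions.

```python
def schema_bound(m, f_H, f_bar, delta, o, l, p_c, p_m):
    """Lower bound on the expected number of instances of H in the next generation."""
    return m * (f_H / f_bar) * (1.0 - p_c * delta / (l - 1) - o * p_m)

# H1 = *1****0 has delta = 5, o = 2; H2 = ***10** has delta = 1, o = 2; l = 7.
# With p_c = 1, p_m = 0 and average fitness, the bound reduces to the
# crossover survival probability 1 - delta / (l - 1).
h1_survival = schema_bound(m=1, f_H=1.0, f_bar=1.0, delta=5, o=2, l=7, p_c=1.0, p_m=0.0)
h2_survival = schema_bound(m=1, f_H=1.0, f_bar=1.0, delta=1, o=2, l=7, p_c=1.0, p_m=0.0)

# A short schema that is 20% above average, with assumed rates p_c = 0.6, p_m = 0.001.
h2_growth = schema_bound(m=10, f_H=1.2, f_bar=1.0, delta=1, o=2, l=7, p_c=0.6, p_m=0.001)
```

The long schema survives crossover with probability 1/6 and the short one with probability 5/6. With the assumed rates the short, above-average schema is still expected to gain instances (the bound exceeds its current count of 10).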
7. An immediate result of this theorem is that GAs explore the search space by short,
low-order schemata which, subsequently, are used for information exchange during crossover.
8. Building Block Hypothesis:
A genetic algorithm seeks near-optimal performance through the juxtaposition of short,
low-order, high-performance schemata, called the building blocks.
• During the last twenty years, many GA applications were developed which supported
the building block hypothesis in many different problem domains.
• Although some research has been done to prove this hypothesis, for most nontrivial
applications we rely mostly on empirical results.
• Bethke (1981) has generated a number of test cases that provably mislead the
simple three-operator genetic algorithm (we call these coding-function combinations
GA-deceptive).
• It suggests that functions and codings that are GA-deceptive tend to contain isolated
optima: the best points tend to be surrounded by the worst.
• Practically speaking, many of the functions encountered in the real world do not have
this needle-in-the-haystack quality; there is usually some regularity in the
function-coding combination (much like the regularity of one or more schemata) that may
be exploited by the recombination of building blocks.
• If the building blocks are misleading due to the coding used or the function itself, the
problem may require long waiting times to arrive at near-optimal solutions.
• Example: Violation of Building Block Hypothesis
To construct such a case, we would like short, low-order building blocks to lead to
incorrect (suboptimal) longer, higher-order building blocks.
– Suppose we have a set of four order-2 schemata over two defining positions, each
schema associated with a fitness value as follows:
* * * 0 * * * * * * 0    $f_{00}$
* * * 0 * * * * * * 1    $f_{01}$
* * * 1 * * * * * * 0    $f_{10}$
* * * 1 * * * * * * 1    $f_{11}$
(the two defining positions are separated by the defining length δ(H))
The fitness values are schema averages, assumed to be constant with no variance
(this last restriction may be lifted without changing our conclusions as we only
consider expected performance).
– Assume that $f_{11}$ is the global optimum: $f_{11} > f_{00}$, $f_{11} > f_{01}$, $f_{11} > f_{10}$.
– We want a problem where one or both of the suboptimal order-1 schemata are
better than the optimal order-1 schemata. Mathematically, we want one or both
of the following conditions to hold:
f(0∗) > f(1∗)
f(∗0) > f(∗1)
In these expressions we have dropped consideration of all alleles other than the two
defining positions, and the fitness expression implies an average over all strings
contained within the specified similarity subset.
– Thus we would like the following two expressions to hold:
$\frac{f(00) + f(01)}{2} > \frac{f(10) + f(11)}{2}$
$\frac{f(00) + f(10)}{2} > \frac{f(01) + f(11)}{2}$
Figure 2: Sketch of Type I, minimal deceptive problem (MDP), $f_{01} > f_{00}$
– Unfortunately, both expressions cannot hold simultaneously in the two-bit problem
(adding them would give f(00) > f(11), contradicting the assumption that $f_{11}$ is the
global optimum), and without loss of generality we assume that the first expression is true.
– Thus, the deceptive two-bit problem is specified by the globality condition ($f_{11}$
is the best) and one deception condition (we choose f(0∗) > f(1∗)).
– To put the problem into closer perspective, we normalize all fitness values with
respect to the fitness of the complement of the global optimum as follows:
$r = \frac{f_{11}}{f_{00}}, \quad c = \frac{f_{01}}{f_{00}}, \quad c' = \frac{f_{10}}{f_{00}}$
– We may rewrite the globality condition in normalized form:
$r > c, \quad r > 1, \quad r > c'$
– We may also rewrite the deception condition in normalized form:
$r < 1 + c - c'$
This follows because the deception condition f(0∗) > f(1∗) gives
$f_{00} + f_{01} > f_{10} + f_{11}$; dividing by $f_{00}$ yields $1 + c > c' + r$, that is, $1 + c - c' > r$.
– From these conditions, we may conclude a number of interesting facts:
$c' < 1, \quad c' < c$.
– From these, we recognize that there are two types of deceptive two-bit problem:
Type I: $f_{01} > f_{00}$ (c > 1).
Type II: $f_{00} \ge f_{01}$ (c ≤ 1).
Figures 2 and 3 are representative sketches of these problems, where the fitness is
graphed as a function of two boolean variables.
– Both cases are deceptive, and it
may be shown that neither case can be expressed as a linear combination of the
individual allele values:
$f(x_1 x_2) = b + \sum_{i=1}^{2} a_i x_i$
Figure 3: Sketch of Type II, minimal deceptive problem (MDP), $f_{00} > f_{01}$
– It may be proved that no one-bit problem can be deceptive. The deceptive two-bit
problem is therefore the smallest possible deceptive problem: it is the minimal deceptive
problem (MDP).
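The conditions defining the minimal deceptive problem can be tested mechanically. The sketch below takes the four schema fitness values, checks the globality condition and the chosen deception condition f(0∗) > f(1∗), and reports the type. The function name and the numeric fitness values in the usage lines are hypothetical, and positive fitness values are assumed.

```python
def classify_mdp(f00, f01, f10, f11):
    """Return 'Type I', 'Type II', or None if the two-bit problem is not deceptive
    in the sense used in the text (deception condition f(0*) > f(1*))."""
    globality = f11 > f00 and f11 > f01 and f11 > f10
    deception = f00 + f01 > f10 + f11      # f(0*) > f(1*), averages multiplied by 2
    if not (globality and deception):
        return None
    return "Type I" if f01 > f00 else "Type II"

def normalize(f00, f01, f10, f11):
    """Normalized values r, c, c' relative to f00 (assumes f00 > 0)."""
    return f11 / f00, f01 / f00, f10 / f00

type_one = classify_mdp(1.0, 1.05, 0.5, 1.1)
type_two = classify_mdp(1.0, 0.9, 0.5, 1.1)
not_deceptive = classify_mdp(1.0, 0.5, 0.5, 2.0)
r, c, c_prime = normalize(1.0, 1.05, 0.5, 1.1)
```

For the first set of values, r = 1.1, c = 1.05 and c' = 0.5, so r < 1 + c - c', c' < 1 and c' < c all hold, as the normalized conditions require.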
• The approach proposed to deal with this problem:
– Assumes prior knowledge of the object function to code it in an appropriate way
(to get “tight” building blocks).For example,prior knowledge about the objec-
tive function,and consequently about the deception,might result in a different
coding,where the bit required to optimize the function are adjacent.
– Employing a new operator:inversion:
It selects two points within a string and invert the order of bits between selected
points,but remembering the bit’s “meaning’.This means that we have to identify
bits in the strings:we do so by keeping bits together with a record of their original
∗ In Goldberg (1989a),
“Put another way,inversion is to orderings what mutation is to alleles:both
fight the good fight against search-stopping lack of diversity,but neither is
sufficiently powerful to search for good structure,allelic or permutational,on
its own when good structures require epistatic interaction of the individual
– Messy genetic algorithms (mGA):Goldberg (1989a),Goldberg,Korb,and Deb
(1989),and Goldberg,Deb,and Korb (1991):
The term “messy GA” is meant to be contrasted with the standard “neat”
fixed-length, fixed-population-size GAs. The goal of messy GAs is to improve
the GA’s function-optimization performance by explicitly building up increasingly
longer, highly fit strings from well-tested shorter building blocks. The general
idea was biologically motivated: “After all, nature did not start with strings of
length $5.9 \times 10^{9}$ or even of length two million and try to make man. Instead,
simple life forms gave way to more complex life forms, with the building blocks
learned at earlier times used and reused to good effect along the way.” (Goldberg,
Korb, and Deb, 1989, p. 500)
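The inversion operator described above can be sketched as follows. This is only an illustration under stated assumptions: the chromosome is stored as a list of (position, bit) pairs, and the names `invert` and `decode` are hypothetical, not taken from Goldberg (1989a).

```python
import random

def invert(chromosome, rng=random):
    """Reverse the segment between two randomly chosen cut points.

    Each gene is a (position, bit) pair, so a bit keeps its meaning
    even though its place on the string changes.
    """
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]

def decode(chromosome):
    """Read the bits back in their original position order."""
    return "".join(str(bit) for _, bit in sorted(chromosome))

bits = [0, 1, 1, 0, 1, 0, 0, 1]
tagged = list(enumerate(bits))              # [(0, 0), (1, 1), (2, 1), ...]
inverted = invert(tagged, random.Random(0))
print(inverted)          # same genes, different ordering on the string
print(decode(inverted))  # the decoded value is unchanged by inversion
```

Because the position tags travel with the bits, inversion changes only which bits sit next to each other (and hence which schemata survive crossover), not the solution that the string encodes.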
Chapter 3:Computer Implementation of a Genetic Algorithm
1.Basic Elements for Running Genetic Algorithms:
• Population size
• Chromosome length
• Maximum number of generations
• Probability of reproduction
• Probability of crossover
• Probability of mutation
• Selection mechanism
– Proportional selection
• Fitness function
– Sum of squared errors
2.Implementation of The Genetic Operators:
• Sequential:
reproduction ⇒ crossover ⇒ mutation
• Probabilistic:
reproduction, crossover, and mutation are each performed with a given probability.
– $p_r$: the probability that the selected chromosome is reproduced
– $p_c$: the probability that the selected chromosome is crossed over
– $p_m'$: the probability that the selected chromosome is mutated
– $p_m$: the probability that each bit is mutated, for a chromosome that is going to be mutated
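A minimal sketch of the sequential scheme (reproduction, then crossover, then mutation) for one generation is given below. The function names, the one-point crossover, and the parameter values are illustrative assumptions; the population size is assumed to be even and the fitness strictly positive.

```python
import random

def mutate(bits, p_bit, rng):
    """Flip each bit independently with probability p_bit."""
    return [b ^ 1 if rng.random() < p_bit else b for b in bits]

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents at a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def next_generation(pop, fitness, p_cross, p_bit, rng):
    """Reproduction -> crossover -> mutation, applied in sequence."""
    weights = [fitness(c) for c in pop]
    # Reproduction: fitness-proportional sampling with replacement.
    mating_pool = rng.choices(pop, weights=weights, k=len(pop))
    children = []
    for a, b in zip(mating_pool[0::2], mating_pool[1::2]):
        if rng.random() < p_cross:          # crossover with probability p_cross
            a, b = one_point_crossover(a, b, rng)
        children += [mutate(a, p_bit, rng), mutate(b, p_bit, rng)]
    return children

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
# Hypothetical fitness: number of ones, shifted by 1 to stay positive.
new_pop = next_generation(pop, lambda c: sum(c) + 1, 0.6, 0.01, rng)
```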
3.How to determine the appropriate values of these parameters?
• There is no objective way to determine the value of these parameters.
• In De Jong’s (1975) study of genetic algorithms in function optimization,a series
of parametric studies across a five-function suite of problems suggested that good
GA performance requires the choice of a high crossover probability,a low mutation
probability (inversely proportional to the population size),and a moderate population
size.For example,p
• However,it is not always true when we apply GAs in other research areas,for example,
social adaptive systems.
4.Mapping Objective Functions to Fitness Form
• In many problems,the objective is more naturally stated as the minimization of some
cost function g(x) rather than the maximization of some utility or profit function u(x).
• Even if the problem is naturally stated in maximization form,this alone does not
guarantee that the utility function will be nonnegative for all x as we require in
fitness function.
• As a result, it is often necessary to map the underlying natural objective function to a
fitness function form through one or more mappings.
• The duality of cost minimization and profit maximization is well known. In normal
operations research work, to transform a minimization problem into a maximization
problem we simply multiply the cost function by −1. However, we usually need the
following transformation to guarantee nonnegative fitness values:
$$f(x) = \begin{cases} C_{\max} - g(x), & \text{when } g(x) < C_{\max}, \\ 0, & \text{otherwise.} \end{cases}$$
• Of course, there are a variety of ways to choose the coefficient $C_{\max}$. $C_{\max}$ may be
taken as an input coefficient, as the largest g value observed thus far, as the largest g
value in the current population, or as the largest of the last k generations. Perhaps more
appropriately, $C_{\max}$ should vary depending on the population variance.
• When the natural objective function formulation is a profit or utility function, we
may still have a problem with negative utility function u(x) values. Therefore, we
simply transform fitness according to the equation:
$$f(x) = \begin{cases} u(x) + C_{\min}, & \text{when } u(x) + C_{\min} > 0, \\ 0, & \text{otherwise.} \end{cases}$$
• We may choose $C_{\min}$ as an input coefficient, as the absolute value of the worst u
value in the current or last k generations, or as a function of the population variance.
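The two mappings above can be sketched directly. The function names are hypothetical, and the choices of `c_max` (largest cost in the current population) and `c_min` (absolute value of the worst utility in the current population) are just two of the options listed above.

```python
def fitness_from_cost(g, c_max):
    """Map a cost g(x) to a nonnegative fitness: C_max - g(x), else 0."""
    return c_max - g if g < c_max else 0.0

def fitness_from_utility(u, c_min):
    """Map a utility u(x) to a nonnegative fitness: u(x) + C_min, else 0."""
    return u + c_min if u + c_min > 0 else 0.0

costs = [3.0, 7.5, 12.0]
c_max = max(costs)                      # largest g in the current population
cost_fitness = [fitness_from_cost(g, c_max) for g in costs]

utilities = [-2.0, 1.0, 4.0]
c_min = abs(min(utilities))             # |worst u| in the current population
utility_fitness = [fitness_from_utility(u, c_min) for u in utilities]

print(cost_fitness)     # [9.0, 4.5, 0.0]
print(utility_fitness)  # [0.0, 3.0, 6.0]
```

Note that with these population-based choices the worst individual always receives fitness 0, so it is never chosen under proportional selection.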
5.Fitness Scaling
• Problem 1:
– Regulation of the number of copies is especially important in small population
genetic algorithms.
– At the start of GA runs it is common to have a few extraordinary individuals in
a population of mediocre colleagues.
– If left to the normal selection rule ($p_i = f_i / \sum_j f_j$), the extraordinary
individuals would take over a significant proportion of the finite population in a single
generation. This is undesirable and is a leading cause of premature convergence.
• Problem 2:
– Late in a run,there may still be significant diversity within the population;
however,the population average fitness may be close to the population best
fitness.If this situation is left alone,average members and best members get
nearly the same number of copies in future generations,and the survival of the
fittest necessary for improvement becomes a random walk among the mediocre.
• In both cases, at the beginning of the run and as the run matures, fitness scaling can help.
• One useful scaling procedure is linear scaling: $f' = af + b$.
– $f$: raw fitness.
– $f'$: scaled fitness.
– $a$, $b$: constant coefficients.
• In this way,simple scaling helps prevent the early domination of extraordinary indi-
viduals,while it later on encourages a healthy competition among near equals.
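The notes do not say how a and b are chosen. One common choice, assumed here for illustration, keeps the average fitness unchanged ($f'_{avg} = f_{avg}$) and gives the best individual a fixed multiple `c_mult` of the average ($f'_{max} = c_{mult} f_{avg}$), falling back to $f'_{min} = 0$ when that would make some scaled fitness negative. The function name and the default `c_mult = 2.0` are assumptions.

```python
def linear_scaling_coefficients(f_min, f_avg, f_max, c_mult=2.0):
    """Return (a, b) for f' = a*f + b with f'_avg = f_avg."""
    if f_max == f_avg:                      # flat population: leave it alone
        return 1.0, 0.0
    if f_min > (c_mult * f_avg - f_max) / (c_mult - 1.0):
        # Normal case: stretch so that f'_max = c_mult * f_avg.
        delta = f_max - f_avg
        a = (c_mult - 1.0) * f_avg / delta
        b = f_avg * (f_max - c_mult * f_avg) / delta
    else:
        # Full stretch would go negative: pin f'_min at 0 instead.
        delta = f_avg - f_min
        a = f_avg / delta
        b = -f_min * f_avg / delta
    return a, b

raw = [1.0, 2.0, 3.0, 10.0]
a, b = linear_scaling_coefficients(min(raw), sum(raw) / len(raw), max(raw))
scaled = [a * f + b for f in raw]
```

In this example the extraordinary individual (raw fitness 10, average 4) is scaled down to twice the average, which limits the number of copies it receives early in the run.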
• The basic principles for choosing a GA coding:
– Principle of meaningful building blocks:
The user should select a coding so that short, low-order schemata are relevant
to the underlying problem and relatively unrelated to schemata over other fixed
positions.
– Principle of minimal alphabets:
The user should select the smallest alphabet that permits a natural expression
of the problem.
• Illustration:
Comparison of Binary and Nonbinary String Populations
Binary String   Value X   Nonbinary String   Fitness
01101           13        N                  169
11000           24        Y                  576
01000            8        I                   64
10011           19        T                  361
Binary and Nonbinary Coding Correspondence
Binary   Nonbinary
00000    A
00001    B
...      ...
11001    Z
11010    1
11011    2
...      ...
11111    6
• What is the influence for the different codings?
– Different alphabet cardinalities require different string lengths. For equality
of the number of points in each space, we require $2^{l} = k^{l'}$, where $l$ is the binary
code string length, $l'$ is the nonbinary code string length, and $k$ is the cardinality
of the nonbinary alphabet.
– The number of schemata for each coding may then be calculated using the respective
string length: $3^{l}$ in the binary case and $(k+1)^{l'}$ in the nonbinary case.
– It is easy to show that the binary alphabet offers the maximum number of
schemata per bit of information of any coding.
– Since these similarities are the essence of our search,when we design a code we
should maximize the number of them available for the GA to exploit.
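As a worked instance of these counts, take the illustration above: a 5-bit binary string versus a single character from a 32-symbol alphabet (A to Z followed by 1 to 6). Both codings cover the same 32 points, but the binary coding exposes far more schemata:

```latex
\begin{align*}
  2^{l} &= 2^{5} = 32 = 32^{1} = k^{l'}, \\
  \text{binary schemata: } 3^{l} &= 3^{5} = 243, \\
  \text{nonbinary schemata: } (k+1)^{l'} &= (32+1)^{1} = 33.
\end{align*}
```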
7.Encoding a Problem for a Genetic Algorithm:
Most GA applications use fixed-length, fixed-order bit strings to encode candidate solutions.
However, in recent years there have been many experiments with other kinds of encodings.
• Binary Encodings
– The most common encodings for a number of reasons:
∗ In the early work,Holland and his students concentrated on such encodings
and GA practice has tended to follow this lead.
∗ Much of the existing GA theory is based on the assumption of fixed-length,
fixed-order binary encodings.
∗ Much of that theory can be extended to apply to nonbinary encodings,but
such extensions are not as well developed as the original theory.
∗ Heuristics about appropriate parameter settings (e.g.,for crossover and muta-
tion rates) have generally been developed in the context of binary encodings.
– However,binary encodings are unnatural and unwieldy for many problems (e.g.,
evolving weights for neural networks or evolving condition sets in the manner of
Meyer and Packard),and they are prone to rather arbitrary orderings.
• Many-Character and Real-Valued Encodings
– For many applications,it is most natural to use an alphabet of many characters
or real numbers to form chromosomes.
– Examples include Kitano’s many-character representation for graph-generation
grammars,Meyer and Packard’s real-valued representation for condition sets,
Montana and Davis’s real-valued representation for neural-network weights,and
Schultz-Kremer’s real-valued representation for torsion angles in proteins.
– Holland’s schemata-counting argument seems to imply that GAs should exhibit
worse performance on multiple-character encodings than on binary encodings.
However,this has been questioned by some.(e.g.,Antonisse,1989)
– Several empirical comparisons between binary encodings and multiple-character
or real-valued encodings have shown better performance for the latter.(e.g.,
Janikow and Michalewicz,1991;Wright,1991)
– But the performance depends very much on the problem and the details of the
GA being used,and at present there are no rigorous guidelines for predicting
which encoding will work best.
• Adapting the Encoding
– Choosing a fixed encoding ahead of time presents a paradox to the potential GA
user: for any problem on which one would want to use a GA, one doesn’t know enough
about the problem ahead of time to come up with the best encoding for the GA.
– In fact, coming up with the best encoding is almost tantamount to solving the
problem itself!
– One reason for adapting the encoding is that we have no idea how best to order the
bits ahead of time for a given problem. This is known in the GA literature as the
“linkage problem”: one wants to have functionally related loci be more likely to stay
together on the string under crossover, but it is not clear how this is to be done
without knowing ahead of time which loci are important in useful schemata.
– A second reason for adapting the encoding is that a fix-length representation
limits the complexity of the candidate solutions.
– Messy GAs
8.Selection Methods:
The purpose of selection is to emphasize the fitter individuals in the population in hopes
that their offspring will in turn have even higher fitness. Selection has to be balanced
with variation from crossover and mutation (the “exploitation/exploration balance”).
Selection that is too strong means that suboptimal but highly fit individuals will take
over the population, reducing the diversity needed for further change and progress.
Selection that is too weak will result in evolution that is too slow.
• Fitness-Proportionate Selection with “Roulette Wheel”.
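– Early in a run, a few individuals are often much fitter than the rest of the population.
Under fitness-proportionate selection, they and their descendants will multiply
quickly in the population, in effect preventing the GA from doing any further
exploration.
– This is known as “premature convergence”.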
– In other words,fitness-proportionate selection often puts too much emphasis on
“exploitation” of highly fit strings at the expense of exploration of other regions
of the search space.
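A minimal sketch of roulette-wheel sampling is shown below: each individual owns a slice of the wheel proportional to its fitness, and one spin returns the index of the selected individual. The function name is hypothetical, and fitness values are assumed to be nonnegative with a positive sum. The example fitnesses are those of the four-string illustration earlier in these notes.

```python
import random

def roulette_select(fitnesses, rng):
    """Return an index i with probability fitnesses[i] / sum(fitnesses)."""
    spin = rng.random() * sum(fitnesses)
    running = 0.0
    for index, f in enumerate(fitnesses):
        running += f
        if spin < running:
            return index
    return len(fitnesses) - 1   # guard against floating-point round-off

rng = random.Random(42)
fitnesses = [169, 576, 64, 361]
counts = [0] * len(fitnesses)
for _ in range(10000):
    counts[roulette_select(fitnesses, rng)] += 1
print(counts)   # roughly proportional to 169 : 576 : 64 : 361
```

The second string holds about half of the wheel, which shows how quickly a single fit individual can dominate the mating pool.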
• Sigma Scaling
– Mapping “raw” fitness values to expected values so as to make the GA less
susceptible to premature convergence.
– Forrest (1985) proposed sigma scaling, which keeps the selection pressure relatively
constant over the course of the run rather than depending on the fitness variance
in the population.
– Under sigma scaling,an individual’s expected value is a function of its fitness,
the population mean,and the population standard deviation.For example,
$$\mathrm{ExpVal}(i,t) = \begin{cases} 1 + \dfrac{f(i) - \bar{f}(t)}{2\sigma(t)}, & \text{if } \sigma(t) \neq 0, \\ 1.0, & \text{if } \sigma(t) = 0, \end{cases}$$
where ExpVal(i,t) is the expected value of individual i at time t, f(i) is the
fitness of i, $\bar{f}(t)$ is the mean fitness of the population at time t, and σ(t) is the
standard deviation of the population fitnesses at time t.
– At the beginning of a run, when the standard deviation of fitnesses is typically
high, the fitter individuals will not be many standard deviations above the mean,
and so they will not be allocated the lion’s share of offspring.
– Likewise,later in the run,when the population is typically more converged and
the standard deviation is typically lower,the fitter individuals will stand out
more,allowing evolution to continue.
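The sigma-scaling rule can be sketched as below. Using the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, as is the function name.

```python
from statistics import mean, pstdev

def sigma_scaled_expected_values(fitnesses):
    """ExpVal = 1 + (f - mean) / (2 * sigma), or 1.0 for everyone if sigma is 0."""
    f_bar = mean(fitnesses)
    sigma = pstdev(fitnesses)
    if sigma == 0:
        return [1.0] * len(fitnesses)
    # Values can come out negative for individuals far below the mean;
    # a real implementation would need to clamp them to a small positive number.
    return [1.0 + (f - f_bar) / (2.0 * sigma) for f in fitnesses]

expected = sigma_scaled_expected_values([1, 2, 3, 10])
print(expected)
```

For the raw fitnesses 1, 2, 3, 10, proportional selection would give the best individual an expected value of 10/4 = 2.5 offspring, whereas sigma scaling gives it about 1.85.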
• Elitism
– “Elitism”,first introduced by Kenneth De Jong (1975),is an addition to many se-
lection methods that forces the GA to retain some number of the best individuals
at each generation.
– Such individuals can be lost if they are not selected to reproduce or if they are
destroyed by crossover or mutation.
– Many researchers have found that elitism significantly improves the GA’s performance.
• Boltzmann Selection
– Sigma scaling keeps the selection pressure more constant over a run.But often
different amounts of selection pressure are needed at different times in a run –
for example,early on it might be good to be liberal,allowing less fit individuals
to reproduce at close to the rate of fitter individuals,and having selection occur
slowly while maintaining a lot of variation in the population.
– Later it might be good to have selection be stronger in order to strongly emphasize
highly fit individuals,assuming that the early diversity with slow selection has
allowed the population to find the right part of the search space.
– A typical implementation is to assign to each individual i an expected value,
$$\mathrm{ExpVal}(i,t) = \frac{e^{f(i)/T}}{\langle e^{f(i)/T} \rangle_t},$$
where T is temperature and $\langle \cdot \rangle_t$ denotes the average over the population at
time t.
– Experimenting with this formula will show that,as T decreases,the difference in
ExpV al(i,t) between high and low fitnesses increases.
– The desire is to have this happen gradually over the course of the search,so
temperature is gradually decreased according to a predefined schedule.
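The effect of the temperature can be checked numerically with a short sketch. The function name and the two temperatures are illustrative assumptions. Subtracting the maximum fitness before exponentiating does not change the ratios and only avoids overflow.

```python
import math

def boltzmann_expected_values(fitnesses, temperature):
    """ExpVal(i) = exp(f(i)/T) divided by the population average of exp(f/T)."""
    f_max = max(fitnesses)
    # Shifting by f_max rescales numerator and denominator by the same factor.
    weights = [math.exp((f - f_max) / temperature) for f in fitnesses]
    average = sum(weights) / len(weights)
    return [w / average for w in weights]

fitnesses = [1.0, 2.0, 3.0]
hot = boltzmann_expected_values(fitnesses, 10.0)    # weak selection pressure
cold = boltzmann_expected_values(fitnesses, 0.5)    # strong selection pressure
print(hot)
print(cold)
```

At the high temperature all three expected values are close to 1. At the low temperature almost all offspring are allocated to the fittest individual.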
• Rank Selection
– Rank selection is an alternative method whose purpose is also to prevent too-quick
convergence.
– In the version proposed by Baker (1985),the individuals in the population are
ranked according to fitness,and the expected value of each individual depends
on its rank rather than on its absolute fitness.
– There is no need to scale fitnesses in this case,since absolute differences in fitness
are obscured.
– This discarding of absolute fitness information can have advantages (using absolute
fitness can lead to convergence problems) and disadvantages (in some cases
it might be important to know that one individual is far fitter than its nearest
competitor).
– Ranking avoids giving by far the largest share of offspring to a small group of highly
fit individuals, and thus reduces the selection pressure when the fitness variance
is high.
– It also keeps up selection pressure when the fitness variance is low.
– The linear ranking method proposed by Baker is as follows:
∗ Each individual in the population is ranked in increasing order of fitness, from
1 to N.
∗ The user chooses the expected value Max of the individual with rank N,
with Max ≥ 0.
∗ The expected value of each individual i in the population at time t is given by
$$\mathrm{ExpVal}(i,t) = \mathit{Min} + (\mathit{Max} - \mathit{Min}) \frac{\mathrm{rank}(i,t) - 1}{N - 1},$$
where Min is the expected value of the individual with rank 1.
∗ Given the constraints Max ≥ 0 and $\sum_i \mathrm{ExpVal}(i,t) = N$ (since the population
size stays constant from generation to generation), it is required that 1 ≤
Max ≤ 2 and Min = 2 − Max.
– Rank selection has a possible disadvantage: reducing the selection pressure
means that the GA will in some cases be slower in finding highly fit individuals.
– However, in many cases the increased preservation of diversity that results from
ranking leads to a more successful search than the quick convergence that can result
from fitness-proportionate selection.
– A variation of rank selection with elitism was used by Meyer and Packard for
evolving condition sets, and Mitchell and her colleagues used a similar scheme for
evolving cellular automata. In those examples the population was ranked by fitness
and the top E strings were selected to be parents. The N − E offspring were
merged with the E parents to create the next population. This is a form of
the so-called (µ + λ) strategy used in the evolution strategies community. This
method can be useful in cases where the fitness function is noisy (i.e., is a random
variable); the best individuals are retained so that they can be tested again and
thus, over time, gain increasingly reliable fitness estimates.
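The linear ranking formula can be sketched as follows. The function name and the value Max = 1.5 in the example are illustrative assumptions; N ≥ 2 and 1 ≤ Max ≤ 2 are required, and ties are broken by position in the list.

```python
def linear_rank_expected_values(fitnesses, max_value):
    """ExpVal = Min + (Max - Min) * (rank - 1) / (N - 1), with Min = 2 - Max."""
    n = len(fitnesses)
    min_value = 2.0 - max_value
    # Indices sorted from least fit (rank 1) to most fit (rank N).
    order = sorted(range(n), key=lambda i: fitnesses[i])
    expected = [0.0] * n
    for rank_minus_one, i in enumerate(order):
        expected[i] = min_value + (max_value - min_value) * rank_minus_one / (n - 1)
    return expected

fitnesses = [169, 576, 64, 361]
expected = linear_rank_expected_values(fitnesses, max_value=1.5)
print(expected)   # depends only on the ordering, not on the fitness gaps
```

Replacing 576 by 5760 in this example would leave the expected values unchanged, which is exactly the discarding of absolute fitness information discussed above.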
• Tournament Selection
– Tournament selection is similar to rank selection in terms of selection pressure,
but it is computationally more efficient and more amenable to parallel implementation.
– Two individuals are chosen at random from the population.
– A random number r is then chosen between 0 and 1.
– If r < k (where k is a parameter,for example 0.75),the fitter of the two individ-
uals is selected to be a parent;otherwise the less fit individual is selected.
– The two are then returned to the original population and can be selected again.
– For a more detailed description, see Goldberg and Deb (1991).
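The binary tournament described above can be sketched in a few lines. The function name is hypothetical, and the population is assumed to be a list with at least two members.

```python
import random

def tournament_select(population, fitness, k=0.75, rng=random):
    """Pick two individuals at random; return the fitter one with probability k."""
    a, b = rng.sample(population, 2)
    fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return fitter if rng.random() < k else weaker

rng = random.Random(0)
population = [1, 2, 3, 4]          # hypothetical individuals; fitness is the value itself
parents = [tournament_select(population, lambda x: x, 0.75, rng) for _ in range(10)]
print(parents)
```

No sorting or population-wide statistics are needed, only pairwise comparisons, which is why the method is cheap and easy to run in parallel.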
• Steady-State Selection
– Most GAs described in the literature have been “generational”: at each generation
the new population consists entirely of offspring formed by parents in
the previous generation (though some of these offspring may be identical to their
parents).
– In some schemes, such as elitist schemes, successive generations overlap to some
degree: some portion of the previous generation is retained in the new generation.
– The fraction of new individuals at each generation has been called the “generation
gap” (De Jong, 1975).
– In steady-state selection,only a few individuals are replaced in each generation:
usually a small number of the least fit individuals are replaced by offspring re-
sulting from crossover and mutation of the fittest individuals.
– Steady-state GAs are often used in evolving rule-based systems (e.g.,classifier
systems;see Holland 1986) in which incremental learning (and remembering what
has already been learned) is important and in which members of the population
collectively (rather than individually) solve the problem at hand.
• For more technical comparisons of different selection methods, see Goldberg and Deb
(1991), Bäck and Hoffmeister (1991), de la Maza and Tidor (1993), and Hancock (1994).
9.Advanced Topic in Crossover:
• The usefulness of crossover is to recombine building blocks (schemata) on different
strings.
• Single-point crossover has some shortcomings.It cannot combine all possible schemata.
For example,it cannot in general combine instances 11*****1 and ****11** to form
an instance of 11**11*1.
• Likewise,schemata with long defining lengths are likely to be destroyed under single-
point crossover.The schemata that can be created or destroyed by a crossover depend
strongly on the location of the bits in the chromosome.
• Single-point crossover assumes that short,low-order schemata are the functional
building blocks of strings,but one generally does not know in advance what ordering
of bits will group functionally related bits together.
• Eshelman,Caruana,and Schaffer (1989) pointed out that there may not be any way
to put all functionally related bits close together on a string,since particular bits
might be crucial in more than one schema.
• They pointed out further that the tendency of single-point crossover to keep short
schemata intact can lead to the preservation of hitchhikers — bits that are not part
of a desired schema but which,by being close on the string,hitchhike along with the
beneficial schema as it reproduces.
• Many people have also noted that single-point crossover treats some loci preferen-
tially:the segments exchanged between the two parents always contain the endpoints
of the strings.
• Two-point crossover.
• Parameterized uniform crossover:
An exchange happens at each bit position with probability p (typically 0.5 ≤ p ≤ 0.8).
Parameterized uniform crossover has no positional bias: schemata at any positions in
the parents can potentially be recombined in the offspring.
However, this lack of positional bias can prevent coadapted alleles from ever forming
in the population, since parameterized uniform crossover can be highly disruptive of
any schema.
• Given these arguments,which one should we use?There is no simple answer.The
success or failure of a particular crossover operator depends in complicated ways on
the particular fitness function,encoding,and other details of the GA.It is still a very
important open problem to fully understand these interactions.
• It is common in recent GA applications to use either two-point crossover or parame-
terized uniform crossover with p ≈ 0.7 −0.8.
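The three operators discussed in this section can be compared on the schema example given earlier (11*****1 and ****11**). The sketch below uses hypothetical helper names and two parent strings chosen for illustration. The cut points are passed explicitly so that every possible cut can be enumerated.

```python
import random

def matches(schema, string):
    """True if the string is an instance of the schema ('*' is a wildcard)."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

def single_point(a, b, cut):
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b, i, j):
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def parameterized_uniform(a, b, p, rng):
    """Exchange the bits at each position independently with probability p."""
    pairs = [(y, x) if rng.random() < p else (x, y) for x, y in zip(a, b)]
    return "".join(x for x, _ in pairs), "".join(y for _, y in pairs)

a = "11000001"        # an instance of 11*****1
b = "00001100"        # an instance of ****11**
target = "11**11*1"

# Try every possible cut (or pair of cuts) and keep the children that match.
single_hits = [c for cut in range(1, len(a))
               for c in single_point(a, b, cut) if matches(target, c)]
two_hits = [c for i in range(1, len(a)) for j in range(i + 1, len(a))
            for c in two_point(a, b, i, j) if matches(target, c)]

print(single_hits)     # no single cut combines the two schemata
print(set(two_hits))   # two-point crossover can produce an instance of the target
```

For these two parents, no single-point cut yields an instance of 11**11*1, because every child inherits one end of the string from each parent, while several two-point cuts do.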
10.Advanced Topic in Mutation:
• A common view in the GA community, dating back to Holland (1975), is that crossover
is the major instrument of variation and innovation in GAs, with mutation insuring
the population against permanent fixation at any particular locus and thus playing
more of a background role.
• However,the appreciation of the role of mutation is growing as the GA community
attempts to understand how GAs solve complex problems.
• Spears (1993) formally verified the intuitive idea that,while mutation and crossover
have the same ability for “disruption” of existing schemata,crossover is a more robust
“constructor” of new schemata.
• Mühlenbein (1992, p. 15), on the other hand, argues that in many cases a hill-climbing
strategy will work better than a GA with crossover and that the “power of mutation
has been underestimated in traditional genetic algorithms”.
Chapter 4:Introduction to Genetic-Based Machine Learning
1.This topic is based on Goldberg (1989).
• A classifier system is a learning system that learns syntactically simple string rules
(called classifiers) to guide its performance in an arbitrary environment.
• A classifier system consists of three main components:
– Rule and message system
– Apportionment of credit system
– Genetic algorithms
• The rule and message system of a classifier system is a special kind of production
system.A production system is a computational scheme that uses rules as its only
algorithmic device.The rules are generally of the following form:
if < condition > then < action >.
• At first glance,the restriction to such a simple device for the representation of knowl-
edge might seem too constraining.Yet it has been shown that production systems
are computationally complete (Minsky,1967).A single rule or small set of rules can
represent a complex set of thoughts compactly.
• Traditional rule-based systems have been less frequently suggested in situations in
need of learning.One of the main obstacles to learning has been complex rule syntax.
• Classifier systems depart from the mainstream by restricting a rule to a fixed-length
representation. This restriction has two benefits. First, all strings under the per-
missible alphabet are syntactically meaningful.Second,a fixed string representation
permits string operators of the genetic kind.This leaves the door propped open,
ready for a genetic algorithm search of the space of permissible rules.
3.Rule and Message System:
• A schematic depicting the rule and message system,the apportionment of credit
system,and genetic algorithm is shown in Figure 1.
• The rule and message system form the computational backbone.Information flows
from the environment through the detectors — the classifier system’s eyes and ears
— where it is decoded to one or more finite length messages.These environmental
messages are posted to a finite-length message list where the messages may then
activate string rules called classifiers.
• When activated,a classifier posts a message to the message list.These messages may
then invoke other classifiers or they may cause an action to be taken through the
system’s action triggers called effectors.
• In this way classifiers combine environmental cues and internal thoughts to determine
what the system should do and think next.
• A message within a classifier system is simply a finite-length string over some finite
alphabet. If we limit ourselves to a binary alphabet we obtain the following definition:
<message> ::= {0,1}^l
where l is the length of the string.

Figure 1:A Learning Classifier System Interacts with Its Environment
• The condition is a simple pattern recognition device where * is added to the underlying
alphabet as a wildcard:
<condition> ::= {0,1,*}^l
• Therefore, a classifier is a production rule with excruciatingly simple syntax:
<classifier> ::= <condition>:<message>
• Once a classifier’s condition is matched,that classifier becomes a candidate to post its
message to the message list on the next time step.Whether the candidate classifier
posts its message is determined by the outcome of an activation auction,which in
turn depends on the evaluation of a classifier’s value or weighting.
• An example:
Suppose we have a classifier store consisting of the classifiers shown in the following
table:
Four Classifiers
Index   Classifier
1       01**:0000
2       00*0:1100
3       11**:1000
4       **00:0001
– At the first time step,an environment message 0111 appears on the message list.
– This message matches classifier 1,which then posts its message,0000.
– This message matches rules 2 and 4, which in turn post their messages (1100 and
0001).
– Message 1100 then matches classifiers 3 and 4. Thereafter the message sent by
classifier 3, 1000, matches classifier 4 and the process terminates.
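The matching cycle in this example can be traced with a short sketch. It makes a simplifying assumption: every matched classifier posts its message, so the activation auction is ignored. The names `matches`, `step`, and `run` are hypothetical.

```python
# The four classifiers from the table: index -> (condition, message).
CLASSIFIERS = {
    1: ("01**", "0000"),
    2: ("00*0", "1100"),
    3: ("11**", "1000"),
    4: ("**00", "0001"),
}

def matches(condition, message):
    """A condition matches if every non-* position equals the message bit."""
    return all(c == "*" or c == m for c, m in zip(condition, message))

def step(messages):
    """Return {index: posted message} for classifiers matched by any message."""
    return {i: out for i, (cond, out) in CLASSIFIERS.items()
            if any(matches(cond, m) for m in messages)}

def run(env_message, max_steps=10):
    """Record which classifiers fire at each time step until none match."""
    history, messages = [], [env_message]
    for _ in range(max_steps):
        fired = step(messages)
        if not fired:
            break
        history.append(sorted(fired))
        messages = list(fired.values())
    return history

print(run("0111"))   # [[1], [2, 4], [3, 4], [4]]
```

The final message, 0001 from classifier 4, matches no condition, so the message list empties and the process stops, as described above.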
4.Apportionment of Credit Algorithm:The Bucket Brigade
• Many classifier systems attempt to rank or rate individual classifiers according to a
classifier’s role in achieving reward from the environment.
• The most prevalent method incorporates what Holland has called a bucket brigade
algorithm.The bucket brigade may most easily be viewed as an information economy
where the right to trade information is bought and sold by classifiers.
• This service economy contains two main components: an auction and a clearinghouse.
When classifiers are matched they do not directly post their messages. Instead, having
its condition matched qualifies a classifier to participate in an activation auction.
• To participate in the auction, each classifier maintains a record of its net worth, called
its strength S. Each matched classifier makes a bid B proportional to its strength. In
this way, rules that are highly fit are given preference over other rules.
• The auction permits appropriate classifiers to be selected to post their messages.
Once a classifier is selected for activation,it must clear its payment through the
clearinghouse,paying its bid to other classifiers for matching messages rendered.
• A matched and activated classifier sends its bid B to those classifiers responsible for
sending the messages that matched the bidding classifier’s condition.
• The bid payment is divided in some manner among the matching classifiers.The
division of payoff among contributing classifiers helps ensure the formation of an
appropriately sized subpopulation of rules.Thus different types of rules can cover
different types of behavioral requirements without undue interspecies competition.
• In a rule-learning system of any consequence,we cannot search for one master rule.
We must instead search for a coadapted set of rules that together cover a range of
behavior that provides ample payoff to the learning system.
5.A detailed auction and payment scheme:
• Classifiers make bids ($B_i$) during the auction. Winning classifiers turn over their bids
to the clearinghouse as payments ($P_i$). A classifier may also have receipts $R_i$ from its
previous message-sending activity or from environmental reward. In addition to bids
and receipts, a classifier may be subject to one or more taxes $T_i$. Taken together,
we may write an equation governing the depletion or accretion of the ith classifier’s
strength as follows:
$$S_i(t+1) = S_i(t) - P_i(t) - T_i(t) + R_i(t).$$
• A classifier bids in proportion to its strength:
$$B_i = C_{bid} S_i,$$
where $C_{bid}$ is the bid coefficient, S is strength, and i is the classifier index.
• We hold the auction in the presence of random noise. We calculate an effective bid (EB):
$$EB_i = B_i + N(\sigma_{bid}),$$
where the noise N is a function of the specified bidding noise standard deviation $\sigma_{bid}$.
• Each classifier is taxed to prevent freeloading. We simply collect a tax proportional
to the classifier’s strength:
$$T_i = C_{tax} S_i.$$
• Therefore, the apportionment of credit algorithm for an active classifier can be rewritten as
$$S_i(t+1) = S_i(t) - C_{bid} S_i(t) - C_{tax} S_i(t) + R_i(t).$$
• Let $K = C_{bid} + C_{tax}$, with 0 ≤ K ≤ 1, and drop the index i. We have
$$S(n) = (1-K)^{n} S(0) + \sum_{j=0}^{n-1} R(j)(1-K)^{n-j-1}.$$
• To investigate the effect of this mechanism further, we examine the steady-state
response. If the process continues indefinitely with a constant receipt $R(t) = R_{ss}$, we
obtain the steady-state strength by setting $S(t+1) = S(t) = S_{ss}$. Then we have
$$S_{ss} = \frac{R_{ss}}{K}.$$
• The steady bid may be derived as follows:
$$B_{ss} = C_{bid} S_{ss} = \frac{C_{bid}}{C_{bid} + C_{tax}} R_{ss}.$$
• Since $C_{tax}$ is usually small with respect to the bid coefficient, the steady bid value
usually approaches the steady receipt value, $B_{ss} \approx R_{ss}$. In other words, for steady
receipts, the bid value approaches the receipt.
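The strength recurrence and its steady state can be checked numerically. The sketch below assumes a classifier that is active at every step and receives a constant receipt. The coefficient values and function names are illustrative assumptions only.

```python
def simulate_strength(s0, receipt, c_bid, c_tax, steps):
    """Iterate S(t+1) = S(t) - C_bid*S(t) - C_tax*S(t) + R."""
    s = s0
    for _ in range(steps):
        s = s - c_bid * s - c_tax * s + receipt
    return s

def closed_form_strength(s0, receipt, c_bid, c_tax, n):
    """S(n) = (1-K)^n S(0) + sum_j R (1-K)^(n-j-1), with K = C_bid + C_tax."""
    k = c_bid + c_tax
    return (1 - k) ** n * s0 + sum(receipt * (1 - k) ** (n - j - 1) for j in range(n))

c_bid, c_tax = 0.1, 0.01      # hypothetical coefficients
receipt, s0 = 10.0, 100.0     # hypothetical constant receipt and initial strength

s_ss = receipt / (c_bid + c_tax)     # steady-state strength R_ss / K
b_ss = c_bid * s_ss                  # steady-state bid

print(simulate_strength(s0, receipt, c_bid, c_tax, 500), s_ss)
print(b_ss, receipt)   # the steady bid is close to the receipt when C_tax is small
```

With these values the strength settles near 90.9 and the steady bid near 9.09, slightly below the receipt of 10 because of the tax.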
6.An example:See Figure 2.
7.Genetic Algorithm
• The bucket brigade provides a clean procedure for evaluating rules and deciding
among competing alternatives. Yet we still must devise a way of injecting new,
possibly better rules into the system. This is precisely where the genetic algorithm
steps in.
• However, we must be a little less cavalier about wanton replacement of the entire
population, and we must pay more attention to who replaces whom.
• Here, we define a quantity called the selection proportion: we replace that
proportion of the population at a given genetic algorithm invocation. We also define
a quantity called the GA period, $T_{ga}$, that specifies the number of time steps (rule
and message cycles) between GA calls. This period may be treated deterministically
or stochastically. Additionally, the invocation of genetic algorithm learning may be
conditioned on particular events such as lack of a match or poor performance.

Figure 2:A Simple Classifier System by Hand–Matching and Payments
[1] Antonisse,J.(1989),“A New Interpretation of Schema Notation that Overturns the Bi-
nary Encoding Constraints,” In J.D.Schaffer (ed.),Proceedings of the Third International
Conference on Genetic Algorithms.Morgan Kaufmann.
[2] Baker, J. E. (1985), “Adaptive Selection Methods for Genetic Algorithms,” in J. J. Grefenstette
(ed.), Proceedings of the First International Conference on Genetic Algorithms and
Their Applications. Erlbaum.
[3] Bäck, T. and F. Hoffmeister (1991), “Extended Selection Mechanisms in Genetic Algorithms,”
in R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International
Conference on Genetic Algorithms. Morgan Kaufmann.
[4] Bethke,A.D.(1981),“Genetic Algorithms as Function Optimizers,” (Doctoral Dissertation,
University of Michigan).Dissertation Abstracts International,41 (9),3503B.(University
Microfilms No.8106101).
[5] De Jong, K. A. (1975), “An Analysis of the Behavior of a Class of Genetic Adaptive Systems,”
(Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36
(10), 5140B. (University Microfilms No. 76-9381)
[6] de la Maza,M.and B.Tidor (1993),“An Analysis of Selection Procedures with Particular
Attention Paid to proportional and Boltzmann Selection,” in S.Forrest (ed.),Proceedings
of the Fifth International Conference on Genetic Algorithms.Morgan Kaufmann.
[7] Eshelman,L.J.,R.A.Caruana,and J.D.Schaffer (1989),“Biases in the Crossover Land-
scape,” in J.D.Schaffer (ed.),Proceedings of the Third International Conference on Genetic
Algorithms,Morgan Kaufmann.
[8] Fogel, L. J., A. J. Owens, and M. J. Walsh (1966), Artificial Intelligence through Simulated
Evolution.
[9] Forrest, S. (1985), “Scaling Fitness in the Genetic Algorithm,” in Documentation for
PRISONERS DILEMMA and NORMS That Use the Genetic Algorithm. Unpublished
manuscript.
[10] Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning.
[11] Goldberg, D. E. (1989a), “Messy Genetic Algorithms: Motivation, Analysis, and First
Results,” Complex Systems, Vol. 3, pp. 493-530.
[12] Goldberg, D. E. and K. Deb (1991), “A Comparative Analysis of Selection Schemes Used in
Genetic Algorithms,” in G. Rawlins (ed.), Foundations of Genetic Algorithms. Morgan
Kaufmann.
[13] Goldberg, D. E., B. Korb, and K. Deb (1989), “Messy Genetic Algorithms: Motivation,
Analysis, and First Results,” Complex Systems, Vol. 3, pp. 493-530.
[14] Goldberg,D.E.,K.Deb,and B.Korb (1991),“Do not Worry,Be Messy,” Proceedings
of the Fourth International Conference on Genetic Algorithms,Belew,R.,and L.Booker
(eds.),Morgan Kaufmann Publishers,Los Altos,CA.pp.24-30.
[15] Hancock,P.J.B.(1994),“An Empirical Comparison of Selection Methods in Evolutionary
Algorithms,” in T.C.Fogarty (ed.),Evolutionary Computing:AISB Workshop,Leeds,
U.K.,April 1994,Selected Papers.Springer-Verlag.
[16] Holland,J.H.(1975),Adaptation in Natural and Artificial Systems.University of Michigan
Press.(Second edition:MIT Press,1992.)
[17] Holland,J.H.(1986),“Escaping Brittleness:The Possibilities of General-Purpose Learning
Algorithms Applied to Parallel Rule-Based Systems,” in R.S.Michalski,J.G.Carbonell,
and T.M.Mitchell (eds.),Machine learning II.Morgan Kaufmann.
[18] Janikow, C. Z. and Z. Michalewicz (1991), “An Experimental Comparison of Binary and
Floating Point Representations in Genetic Algorithms,” in R. K. Belew and L. B. Booker
(eds.), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan
Kaufmann.
[19] Michalewicz,Z.(1996),Genetic Algorithms + Data Structures = Evolution Programs.Third
edition.New York:Springer-Verlag.
[20] Mitchell,M.(1996),An Introduction to Genetic Algorithms.MIT Press.
[21] Mühlenbein, H. (1992), “How Genetic Algorithms Really Work: 1. Mutation and Hill-
climbing,” in R. Männer and B. Manderick (eds.), Parallel Problem Solving from Nature 2.
[22] Rechenberg,I.(1965),“Cybernetic Solution Path of an Experimental Problem,” Ministry
of Aviation,Royal Aircraft Establishment (U.K.)
[23] Rechenberg,I.(1973),Evolutionsstrategie:Optimierung Technischer Systeme nach Prinzip-
ien der Biologischen Evolution.Frommann-Holzboog (Stuttgart).
[24] Spears, W. M. (1993), “Crossover or Mutation?,” in L. D. Whitley (ed.), Foundations of
Genetic Algorithms 2. Morgan Kaufmann.
[25] Wright, A. H. (1991), “Genetic Algorithms for Real Parameter Optimization,” in G. Rawlins
(ed.), Foundations of Genetic Algorithms. Morgan Kaufmann.