Graduate Course

An Introduction to Genetic Algorithms

Chia-Hsuan Yeh

These lecture notes are based on Mitchell (1996) and Goldberg (1989).

Chapter 1: Genetic Algorithms: An Overview

1. Motivation:

• The goal of creating artificial intelligence and artificial life can be traced back to the very beginnings of the computer age. The earliest computer scientists — Alan Turing, John von Neumann, Norbert Wiener, and others — were motivated in large part by visions of imbuing computer programs with intelligence, with the life-like ability to self-replicate, and with the adaptive capability to learn and to control their environments.

• These early pioneers of computer science were as much interested in biology and psychology as in electronics, and they looked to natural systems as guiding metaphors for how to achieve their visions.

• It should be no surprise, then, that from the earliest days computers were applied not only to calculating missile trajectories and deciphering military codes but also to modeling the brain, mimicking human learning, and simulating biological evolution.

• These biologically motivated computing activities have waxed and waned over the years, but since the early 1980s they have all undergone a resurgence in the computation research community. The first has grown into the field of neural networks, the second into machine learning, and the third into what is now called "evolutionary computation", of which genetic algorithms are the most prominent example.

2. A Brief History of Evolutionary Computation:

• In the 1950s and the 1960s several computer scientists independently studied evolutionary systems with the idea that evolution could be used as an optimization tool for engineering problems. The idea in all these systems was to evolve a population of candidate solutions to a given problem, using operators inspired by natural genetic variation and natural selection.

• Evolution Strategies:
In the 1960s, Rechenberg (1965, 1973) introduced "evolution strategies", a method he used to optimize real-valued parameters for devices such as airfoils.

• Evolutionary Programming:
Fogel, Owens, and Walsh (1966) developed "evolutionary programming", a technique in which candidate solutions to given tasks were represented as finite-state machines, which were evolved by randomly mutating their state-transition diagrams and selecting the fittest.


• Genetic Algorithms:
Genetic algorithms (GAs) were developed by John Holland and his students and colleagues at the University of Michigan in the 1960s and 1970s. In contrast with evolution strategies and evolutionary programming, Holland's original goal was not to design algorithms to solve specific problems, but rather to formally study the phenomenon of adaptation as it occurs in nature and to develop ways in which the mechanisms of natural adaptation might be imported into computer systems.

3. What Are Genetic Algorithms?

• Holland's 1975 book: Adaptation in Natural and Artificial Systems
He presented the genetic algorithm as an abstraction of biological evolution and gave a theoretical framework for adaptation under the GA.

• Holland's GA is a method for moving from one population of "chromosomes" (strings of ones and zeros, or "bits") to a new population by using a kind of "natural selection" together with the genetics-inspired operators of crossover, mutation, and inversion.

• Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics.

• They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search.

• In every generation, a new set of artificial creatures (strings) is created using bits and pieces of the fittest of the old; an occasional new part is tried for good measure.

• Each chromosome (string) consists of "genes" (e.g., bits), each gene being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones.

• Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome (haploid) organisms.

• Mutation randomly changes the allele values of some locations in the chromosome.

• Inversion reverses the order of a contiguous section of the chromosome, thus rearranging the order in which genes are arrayed.

• GAs efficiently exploit historical information to speculate on new search points with expected improved performance.

• Purpose:
The goals of their research have been twofold: (1) to abstract and rigorously explain the adaptive processes of natural systems, and (2) to design artificial systems software that retains the important mechanisms of natural systems.

• Genetic algorithms are theoretically and empirically proven to provide robust search in complex spaces.

• These algorithms are computationally simple yet powerful in their search for improvement.

• They are not fundamentally limited by restrictive assumptions about the search space (assumptions concerning continuity, existence of derivatives, unimodality, and other matters).


4. The Appeal of Evolution

• Many computational problems require searching through a huge number of possibilities for solutions.

• Such search problems can often benefit from an effective use of parallelism, in which many different possibilities are explored simultaneously in an efficient way.

• What is needed is both computational parallelism (i.e., many processors evaluating sequences at the same time) and an intelligent strategy for choosing the next set of sequences to evaluate.

• Many computational problems require a computer program to be adaptive — to continue to perform well in a changing environment.

• Many problems require computer programs to be innovative — to construct something truly new and original, such as a new algorithm for accomplishing a computational task or even a new scientific discovery.

• Many computational problems require complex solutions that are difficult to program by hand.

• Many AI researchers believe that the "rules" underlying intelligence are too complex for scientists to encode by hand in a "top-down" fashion. Instead they believe that the best route to artificial intelligence is through a "bottom-up" paradigm in which humans write only very simple rules, and complex behaviors such as intelligence emerge from the massively parallel application and interaction of these simple rules.

• Biological evolution is an appealing source of inspiration for addressing these problems. Evolution is, in effect, a method of searching among an enormous number of possibilities for "solutions".

5. Robustness of Traditional Optimization and Search Methods

• The current literature identifies three main types of search methods:

– Calculus-based
Calculus-based methods have been studied heavily. These subdivide into two main classes: indirect and direct. Indirect methods seek local extrema by solving the usually nonlinear set of equations resulting from setting the gradient of the objective function equal to zero. This is the multidimensional generalization of the elementary calculus notion of extremal points. Given a smooth, unconstrained function, finding a possible peak starts by restricting search to those points with slopes of zero in all directions. On the other hand, direct (search) methods seek local optima by hopping on the function and moving in a direction related to the local gradient. This is simply the notion of hill-climbing: to find the local best, climb the function in the steepest permissible direction. While both of these calculus-based methods have been improved, extended, hashed, and rehashed, some simple reasoning shows their lack of robustness.

∗ Both methods are local in scope; the optima they seek are the best in a neighborhood of the current point.

∗ Once a lower peak is reached, further improvement must be sought through random restart or other trickery. Calculus-based methods also depend upon the existence of derivatives (well-defined slope values). Even if we allow numerical approximation of derivatives, this is a severe shortcoming.


– Enumerative
Within a finite search space, or a discretized infinite search space, the search algorithm looks at objective function values at every point in the space, one at a time.

∗ Although the simplicity of this type of algorithm is attractive, and enumeration is a very human kind of search, such schemes must ultimately be discounted in the robustness race for one simple reason: lack of efficiency.

– Random
Random search algorithms have achieved increasing popularity as researchers have recognized the shortcomings of calculus-based and enumerative schemes.

∗ Random walks and random schemes that search and save the best must also be discounted because of the efficiency requirement.

∗ Random searches, in the long run, can be expected to do no better than enumerative schemes.

• How Are Genetic Algorithms Different from Traditional Methods?

– GAs work with a coding of the parameter set, not the parameters themselves.

– GAs search from a population of points, not a single point.

– GAs use payoff (objective function) information, not derivatives or other auxiliary knowledge.

– GAs use probabilistic transition rules, not deterministic rules.

• GA search combines "exploitation" (deterministic search) and "exploration" (random search).

6. Elements of Genetic Algorithms

• There is no rigorous definition of "genetic algorithm" accepted by all in the evolutionary computation community that differentiates GAs from other evolutionary computation methods.

• However, it can be said that most methods called "GAs" have at least the following elements in common:

– a population of chromosomes,
– selection according to fitness (performance),
– crossover to produce new offspring,
– random mutation of new offspring.

• A Simple Genetic Algorithm:

(a) Start with a randomly generated population of n l-bit chromosomes (candidate solutions to a problem).

(b) Calculate the fitness f(x) of each chromosome x in the population.

(c) Repeat the following steps until n offspring have been created.

i. Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done "with replacement", meaning that the same chromosome can be selected more than once to become a parent.

ii. With probability p_c (the "crossover probability" or "crossover rate"), cross over the pair at a randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place, form two offspring that are exact copies of their respective parents. (Note that here the crossover rate is defined to be the probability that two parents will cross over at a single point. There are also "multi-point crossover" versions of the GA in which the crossover rate for a pair of parents is the number of points at which a crossover takes place.)

iii. Mutate the two offspring at each locus with probability p_m (the mutation probability or mutation rate), and place the resulting chromosomes in the new population. If n is odd, one new population member can be discarded at random.

(d) Replace the current population with the new population.

(e) Go to step (b).
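The loop in steps (a)-(e) can be sketched in Python. This is a minimal illustration, not code from the lecture notes: it assumes a one-max fitness function (the number of ones in the string), fitness-proportional selection, single-point crossover, and bitwise mutation, and all parameter values are arbitrary examples.

```python
import random

def simple_ga(fitness, n=20, l=10, pc=0.7, pm=0.01, generations=50, seed=0):
    """Minimal sketch of the simple GA in steps (a)-(e). `fitness`
    must return nonnegative values (required by fitness-proportional
    selection)."""
    rng = random.Random(seed)
    # (a) random initial population of n l-bit chromosomes
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]
    for _ in range(generations):
        # (b) evaluate fitness; a tiny epsilon keeps selection defined
        # even in the unlikely case that every fitness is zero
        weights = [fitness(c) + 1e-12 for c in pop]
        new_pop = []
        # (c) create n offspring
        while len(new_pop) < n:
            # i. fitness-proportional selection, with replacement
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            if rng.random() < pc:
                # ii. single-point crossover
                point = rng.randrange(1, l)
                c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # iii. bitwise mutation
                for i in range(l):
                    if rng.random() < pm:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        # (d) replace the population (drop the extra child if n is odd)
        pop = new_pop[:n]
    return max(pop, key=fitness)

best = simple_ga(sum)  # one-max: fitness = number of ones in the string
```

Different `seed` values will generally produce different detailed behavior, a point taken up again below.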

7. The Basic Characteristics of Genetic Algorithms:

• Randomness plays a large role in a run of a GA; runs with different random-number seeds will generally produce different detailed behaviors.

• GA researchers often report statistics (such as the best fitness found in a run and the generation at which the individual with that best fitness was discovered) averaged over many different runs of the GA on the same problem.

• There are a number of details to fill in, such as the size of the population and the probabilities of crossover and mutation, and the success of the algorithm often depends greatly on these details.

8. Some Applications of Genetic Algorithms:

• Optimization

• Automatic programming

• Machine learning

• Economics

• Immune systems

• Ecology

• Population genetics

• Evolution and learning

• Social systems

• ...

9. Example: Optimization of a Simple Function

• Given a function f(x) = x·sin(10πx) + 1.0, drawn in Figure 1.

• The problem is to find x from [-1, 2] which maximizes the function f, i.e., to find x0 such that f(x0) ≥ f(x) for all x ∈ [-1, 2].

• Representation:

– We use a binary vector as a chromosome to represent real values of the variable x.

Figure 1: Graph of the function f(x) = x·sin(10πx) + 1.0

– The length of the vector depends on the required precision; for example, six places after the decimal point.

– The domain of the variable x has length 3; this implies that the range should be divided into at least 3 × 1000000 equal-size ranges.

– This means that 22 bits are required for the binary vector, because 2097152 = 2^21 < 3000000 ≤ 2^22 = 4194304.

– The mapping from a binary string <b21, b20, ..., b0> into a real number x from the range [-1, 2] proceeds in two steps:

∗ convert the binary string <b21, b20, ..., b0> from base 2 to base 10:

(<b21, b20, ..., b0>)_2 = (Σ_{i=0}^{21} b_i · 2^i)_10 = x′,

∗ find the corresponding real number x:

x = −1.0 + x′ · 3 / (2^22 − 1),

where −1.0 is the left boundary of the domain and 3 is the length of the domain.

• The evaluation function (fitness function) for the binary vectors is equivalent to the function f: eval(v) = f(x), where the chromosome v represents the real value x.

• Experiment result: v_max = (1111001101000100000101), which corresponds to the value x_max = 1.850773.
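The two-step decoding can be checked directly in Python (a sketch; the helper name `decode` is mine, and the string is read with b21 as the leftmost bit):

```python
def decode(bits, left=-1.0, length=3.0, nbits=22):
    """Map a binary string <b21 ... b0> to a real x in [left, left + length]."""
    x_prime = int(bits, 2)                     # base 2 -> base 10
    return left + x_prime * length / (2**nbits - 1)

x = decode("1111001101000100000101")  # the reported v_max
```

Decoding the reported v_max reproduces x_max = 1.850773 to six decimal places, and the all-zeros and all-ones strings map exactly to the domain boundaries -1.0 and 2.0.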

10. Example: Using GAs to Evolve Strategies for the Prisoner's Dilemma

• A simple two-person game invented by Merrill Flood and Melvin Dresher in the 1950s.

• Two individuals (call them Alice and Bob) are arrested for committing a crime together and are held in separate cells, with no communication possible between them.

• Alice is offered the following deal:

– If she confesses and agrees to testify against Bob, she will receive a suspended sentence with probation, and Bob will be put away for 5 years.

– However, if at the same time Bob confesses and agrees to testify against Alice, her testimony will be discredited, and each will receive 4 years for pleading guilty.

– Alice is told that Bob is being offered precisely the same deal.

• Payoff matrix (Alice, Bob), where each entry lists the number of years of prison for Alice and then for Bob:

                           Bob
                    Cooperate   Defect
   Alice Cooperate    2, 2       5, 0
         Defect       0, 5       4, 4

• Each player independently decides which move to make — i.e., whether to cooperate or defect.

• What is the best strategy to use in order to maximize one's own payoff?

– If you suspect that your opponent is going to cooperate, then you should surely defect.

– If you suspect that your opponent is going to defect, then you should defect too.

• The dilemma is that if both players defect, each gets a worse score than if they cooperate.

• If the game is iterated (that is, if the two players play several games in a row), both players' always defecting will lead to a much lower total payoff than the players would get if they cooperated.

• How can reciprocal cooperation be induced?

• Robert Axelrod's studies at the University of Michigan:

– Human-Designed Strategies:

∗ He solicited strategies from researchers in a number of disciplines.

∗ Each participant submitted a computer program that implemented a particular strategy.

∗ These various programs played iterated games with each other.

∗ During each game, each program remembered what move (i.e., cooperate or defect) both it and its opponent had made in each of the three previous games that they had played with each other, and its strategy was based on this memory.

∗ The programs were paired in a round-robin tournament in which each played with all the other programs over a number of games.

∗ Some of the strategies submitted were rather complicated, using techniques such as Markov processes and Bayesian inference to model the other players in order to determine the best move.

∗ However, the winner (the strategy with the highest average score) was the simplest of the submitted strategies: TIT FOR TAT.

∗ TIT FOR TAT cooperates in the first game and thereafter mirrors its opponent's previous move: it punishes a defection with a defection of its own, and continues the punishment until the other player begins cooperating again.
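The iterated dynamics can be illustrated with a short Python sketch (not from the notes). The payoff table below uses the years of prison from the deal described above, so lower totals are better, and TIT FOR TAT is implemented as just described:

```python
# Years of sentence for (my move, opponent's move); C = cooperate, D = defect.
YEARS = {("C", "C"): 2, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 4}

def tit_for_tat(opp_history):
    # cooperate first, then mirror the opponent's previous move
    return "C" if not opp_history else opp_history[-1]

def always_defect(opp_history):
    return "D"

def play(s1, s2, rounds):
    """Iterate the game; return the total years served by each player."""
    h1, h2, y1, y2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)        # each strategy sees the opponent's history
        y1 += YEARS[(m1, m2)]
        y2 += YEARS[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return y1, y2
```

Over five rounds, two TIT FOR TAT players serve 2 years per round (10 each), two unconditional defectors serve 4 per round (20 each), and TIT FOR TAT facing an always-defector loses only the first round before retaliating.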

– Genetic Algorithm Implementation (1):

∗ Axelrod (1987) decided to see whether a GA could evolve strategies to play this game successfully.

∗ Representation of strategies: for illustration, first suppose the memory of each player is one previous game; in Axelrod's actual experiments the memory of each player was the three previous games.

∗ The fitness of a strategy:

· Axelrod had found that eight of the human-generated strategies from the tournament were representative of the entire set of strategies.

· The set of eight strategies (which did not include TIT FOR TAT) served as the "environment" for the evolving strategies in the population.

· Each individual in the population played iterated games with each of the eight fixed strategies, and the individual's fitness was taken to be its average score over all the games it played.

∗ Simulation result: most of the strategies that evolved were similar to TIT FOR TAT.

∗ It would be wrong to conclude that the GA discovered strategies that are "better" than any human-designed strategy.

∗ The performance of a strategy depends very much on its environment — that is, on the strategies with which it is playing.

∗ Here the environment was fixed — it consisted of eight human-designed strategies that did not change over the course of a run.

∗ The resulting fitness function is an example of a static fitness landscape.

∗ It is not necessarily true that these high-scoring strategies would also score well in a different environment.

– Genetic Algorithm Implementation (2):

∗ Axelrod carried out another experiment in which the fitness of an individual was determined by allowing the individuals in the population to play with one another rather than with the fixed set of eight strategies.

∗ Now the environment changed from generation to generation, because the opponents themselves were evolving.

∗ Thus the fitness landscape was not static.

∗ In the first few generations, strategies that tended to cooperate did not find reciprocation among their fellow population members and thus tended to die out.

∗ After about 10-20 generations, the trend started to reverse: the GA discovered strategies that reciprocated cooperation and that punished defection (i.e., variants of TIT FOR TAT).

11. Example: Traveling Salesman Problem (TSP)

• What is the TSP?
The traveling salesman must visit every city in his territory exactly once and then return to the starting point. Given the cost of travel between all cities, how should he plan his itinerary for minimum total cost of the entire tour?

• The TSP is a problem in combinatorial optimization and arises in numerous applications.

• Representation: integer vector or binary string?

– In a binary representation of an n-city TSP, each city should be encoded as a string of ⌈log2 n⌉ bits; a chromosome is a string of n⌈log2 n⌉ bits.

– A mutation or crossover can result in a sequence of cities which is not a tour: we can get the same city twice in a sequence.

– For a TSP with 20 cities (where we need 5 bits to represent a city), some 5-bit sequences (for example, 10101) do not correspond to any city.

– If we use the mutation and crossover operators as defined earlier, we would need some sort of "repair algorithm"; such an algorithm would "repair" a chromosome, moving it back into the search space.

• Integer representation:

– A vector v = <i1, i2, ..., in> represents a tour: from i1 to i2, etc., from i_{n−1} to i_n and back to i1 (v is a permutation of <1 2 ... n>).

– Then, given the cost of travel between all cities, we can easily calculate the total cost of the entire tour.

– However, crossover can still produce invalid strings (the same problem as in the binary case).

– Modified crossover:

∗ Given two parents, build offspring by choosing a subsequence of a tour from one parent and preserving the relative order of cities from the other parent.

∗ For example, if the parents are

< 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 >
< 7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8 >

and the chosen part is < 4, 5, 6, 7 >, then the resulting offspring is

< 1, 11, 12, 4, 5, 6, 7, 2, 10, 9, 8, 3 >

∗ The offspring bears a structural relationship to both parents. The roles of the parents can then be reversed in constructing a second offspring.
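One way to realize this modified crossover is Davis-style order crossover (OX); a Python sketch follows (the cut points 3 and 7 are chosen here so that the chosen part is <4, 5, 6, 7> and the example offspring above is reproduced):

```python
def order_crossover(p1, p2, cut1, cut2):
    """Copy p1[cut1:cut2] into the child, then fill the remaining
    positions with the other cities in the order they appear in p2,
    starting just after the second cut point and wrapping around."""
    n = len(p1)
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    copied = set(p1[cut1:cut2])
    fill = [c for i in range(n) if (c := p2[(cut2 + i) % n]) not in copied]
    for i, c in enumerate(fill):
        child[(cut2 + i) % n] = c
    return child

parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
parent2 = [7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8]
child = order_crossover(parent1, parent2, 3, 7)
```

The result is always a valid permutation, so no repair algorithm is needed.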

Chapter 2: The Mathematical Foundations of Genetic Algorithms

1. Concept:

• The traditional theory of GAs (first formulated in Holland 1975) assumes that, at a very general level of description, GAs work by discovering, emphasizing, and recombining good "building blocks" of solutions in a highly parallel fashion.

• The idea here is that good solutions tend to be made up of good building blocks — combinations of bit values that confer higher fitness on the strings in which they are present.

2. Schemata:

• Holland (1975) introduced the notion of schemas (or schemata) to formalize the informal notion of "building blocks".

• A schema is a set of bit strings that can be described by a template made up of ones, zeros, and asterisks, the asterisks representing wild cards (or "don't cares").

• Consider strings constructed over the binary alphabet V = {0, 1}. An l-bit string may be represented symbolically as A = a1 a2 a3 ... al, where ai ∈ V.

• Consider V′ = {0, 1, ∗}, where ∗ means "don't care".

• If H = ∗11∗0∗∗, then A = 0111000 is an example of the schema H because A matches the schema at the fixed positions 2, 3, and 5.

• In general, for alphabets of cardinality k, there are (k + 1)^l schemata.

• Order:
The order of a schema H, denoted by o(H), is simply the number of fixed positions (in a binary alphabet, the number of 1's and 0's) present in the template. For example, the order of the schema 011∗1∗∗ is 4 (symbolically, o(011∗1∗∗) = 4).

• Defining length:
The defining length of a schema H, denoted by δ(H), is simply the distance between the first and last specific string positions. For example, the schema 011∗1∗∗ has defining length δ = 4. The schema 0∗∗∗∗∗∗ has defining length δ = 0.
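These definitions are straightforward to express in code; a minimal Python sketch (the function names are mine):

```python
def matches(schema, string):
    """True if the string is an instance of the schema."""
    return all(s == "*" or s == b for s, b in zip(schema, string))

def order(schema):
    """o(H): the number of fixed (non-*) positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema):
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0
```

Checking the examples from the text: `matches("*11*0**", "0111000")` is true, `order("011*1**")` is 4, `defining_length("011*1**")` is 4, and `defining_length("0******")` is 0.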

3. The effect of reproduction on the expected number of schemata:

• Suppose at a given time step t there are m examples of a particular schema H contained within the population A(t), where we write m = m(H, t).

• During reproduction, a string is copied according to its fitness (with probability p_i = f_i / Σ f_j).

• After picking a nonoverlapping population of size n with replacement from the population A(t), we expect to have m(H, t+1) representatives of the schema H in the population at time t+1, where m(H, t+1) = m(H, t) · n · f(H) / Σ f_j, and f(H) is the average fitness of the strings representing schema H at time t.

• Let f̄ = Σ f_j / n; then m(H, t+1) = m(H, t) · f(H) / f̄.

• A particular schema grows as the ratio of the average fitness of the schema to the average fitness of the population.

• Suppose a particular schema H remains above average by an amount c·f̄, with c a constant; then m(H, t+1) = m(H, t)(f̄ + c·f̄)/f̄ = (1 + c)·m(H, t).

• Starting at t = 0 and assuming a stationary value of c, we have m(H, t) = m(H, 0)(1 + c)^t.

• Reproduction allocates exponentially increasing (decreasing) numbers of trials to above- (below-) average schemata.

4. The influence of crossover on the schemata:

• Consider two schemata H1 and H2, with H1 = ∗1∗∗∗∗0 and H2 = ∗∗∗10∗∗. Then it is clear that schema H1 is less likely to survive crossover than schema H2.

• The survival probability under simple crossover is p_s = 1 − δ(H)/(l − 1).

• If crossover is itself performed by random choice, say with probability p_c at a particular mating, the survival probability may be given by p_s ≥ 1 − p_c · δ(H)/(l − 1).

• The combined effect of reproduction and crossover:
Assuming independence of the reproduction and crossover operators, we have

m(H, t+1) ≥ m(H, t) · [f(H)/f̄] · [1 − p_c · δ(H)/(l − 1)]

• Therefore, the schema H grows or decays depending on two things: (1) whether the schema is above or below the population average, and (2) whether the schema has a relatively short or long defining length.

5. The influence of mutation on the schemata:

• In order for schema H to survive, all the specified positions must themselves survive.

• A single allele survives with probability (1 − p_m).

• A particular schema survives when each of the o(H) fixed positions within the schema survives. Therefore, the probability of surviving mutation is (1 − p_m)^{o(H)}.

• When p_m << 1, this probability may be approximated by 1 − o(H)·p_m.

• Therefore, a particular schema H receives an expected number of copies in the next generation under reproduction, crossover, and mutation as given by the following equation (ignoring small cross-product terms):

m(H, t+1) ≥ m(H, t) · [f(H)/f̄] · [1 − p_c · δ(H)/(l − 1) − o(H)·p_m]
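The bound can be packaged as a small function (a sketch; argument names follow the symbols in the equation above, with `f_bar` for the population average fitness, and the numerical values below are arbitrary examples):

```python
def schema_bound(m, f_H, f_bar, p_c, delta, l, o, p_m):
    """Lower bound on m(H, t+1) under reproduction, single-point
    crossover, and mutation (small cross-product terms ignored)."""
    return m * (f_H / f_bar) * (1 - p_c * delta / (l - 1) - o * p_m)

# A short, low-order schema 20% above average grows in expectation:
growing = schema_bound(m=10, f_H=1.2, f_bar=1.0,
                       p_c=0.7, delta=1, l=11, o=2, p_m=0.01)
# A long, high-order schema with the same fitness advantage decays:
decaying = schema_bound(m=10, f_H=1.2, f_bar=1.0,
                        p_c=0.7, delta=8, l=11, o=4, p_m=0.01)
```

This makes the trade-off concrete: the same fitness advantage produces growth only when δ(H) and o(H) are small.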

6. Schema Theorem (The Fundamental Theorem of Genetic Algorithms):
Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations.

7. An immediate result of this theorem is that GAs explore the search space by short, low-order schemata which, subsequently, are used for information exchange during crossover.

8. Building Block Hypothesis:
A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.

• During the last twenty years, many GA applications were developed which supported the building block hypothesis in many different problem domains.

• Although some research has been done to prove this hypothesis, for most nontrivial applications we rely mostly on empirical results.

• Bethke (1981) generated a number of test cases that provably mislead the simple three-operator genetic algorithm (we call these coding-function combinations GA-deceptive).

• This suggests that functions and codings that are GA-deceptive tend to contain isolated optima: the best points tend to be surrounded by the worst.

• Practically speaking, many of the functions encountered in the real world do not have this needle-in-the-haystack quality; there is usually some regularity in the function-coding combination — much like the regularity of one or more schemata — that may be exploited by the recombination of building blocks.

• If the building blocks are misleading due to the coding used or the function itself, the problem may require long waiting times to arrive at near-optimal solutions.

• Example: Violation of the Building Block Hypothesis
We would like to have short, low-order building blocks lead to incorrect (suboptimal) longer, higher-order building blocks.

– Suppose we have a set of four order-2 schemata over two defining positions, each schema associated with a fitness value, as follows (the two defining positions are separated by the defining length δ(H), with don't-care positions elsewhere):

∗∗∗0∗ ··· ∗0∗∗∗    f00
∗∗∗0∗ ··· ∗1∗∗∗    f01
∗∗∗1∗ ··· ∗0∗∗∗    f10
∗∗∗1∗ ··· ∗1∗∗∗    f11

The fitness values are schema averages, assumed to be constant with no variance (this last restriction may be lifted without changing our conclusions, as we only consider expected performance).

– Assume that f11 is the global optimum: f11 > f00, f11 > f01, f11 > f10.

– We want a problem where one or both of the suboptimal, order-1 schemata are better than the optimal, order-1 schemata. Mathematically, we want one or both of the following conditions to hold:

f(0∗) > f(1∗)
f(∗0) > f(∗1)

In these expressions we have dropped consideration of all alleles other than the two defining positions, and the fitness expression implies an average over all strings contained within the specified similarity subset.

– Thus we would like the following two expressions to hold:

[f(00) + f(01)] / 2 > [f(10) + f(11)] / 2
[f(00) + f(10)] / 2 > [f(01) + f(11)] / 2

Figure 2: Sketch of the Type I minimal deceptive problem (MDP), f01 > f00.

– Unfortunately, both expressions cannot hold simultaneously in the two-bit problem; without loss of generality we assume that the first expression is true.

– Thus, the deceptive two-bit problem is specified by the globality condition (f11 is the best) and one deception condition (we choose f(0∗) > f(1∗)).

– To put the problem into closer perspective, we normalize all fitness values with respect to the fitness of the complement of the global optimum as follows:

r = f11/f00,  c = f01/f00,  c′ = f10/f00.

– We may rewrite the globality condition in normalized form:

r > c,  r > 1,  r > c′.

– We may also rewrite the deception condition in normalized form:

r < 1 + c − c′

This follows because the deception condition f(0∗) > f(1∗) means (f00 + f01)/2 > (f10 + f11)/2, so f00 + f01 > f10 + f11. Dividing both sides by f00 gives 1 + c > c′ + r, i.e., r < 1 + c − c′.

– From these conditions, we may conclude a number of interesting facts: c′ < 1 and c′ < c.
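These conditions can be verified numerically. The fitness values below are hypothetical, chosen only to satisfy the globality condition and the deception condition f(0∗) > f(1∗):

```python
# Hypothetical deceptive two-bit instance: f11 is the global optimum,
# yet the order-1 schema 0* looks better on average than 1*.
f00, f01, f10, f11 = 1.0, 1.05, 0.0, 1.1

r, c, c_prime = f11 / f00, f01 / f00, f10 / f00

# Globality: f11 is the best point.
assert r > c and r > 1 and r > c_prime
# Deception: the suboptimal order-1 schema wins on average ...
assert (f00 + f01) / 2 > (f10 + f11) / 2          # f(0*) > f(1*)
# ... which matches the normalized condition r < 1 + c - c'.
assert r < 1 + c - c_prime
# The derived facts hold as well:
assert c_prime < 1 and c_prime < c
# But the second deception condition cannot hold at the same time:
assert not (f00 + f10) / 2 > (f01 + f11) / 2      # f(*0) < f(*1)
```

Since c = 1.05 > 1 here, this particular instance is of Type I in the classification below.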

– From these, we recognize that there are two types of deceptive two-bit problem:

Type I: f01 > f00 (c > 1).
Type II: f00 ≥ f01 (c ≤ 1).

Figures 2 and 3 are representative sketches of these problems, where the fitness is graphed as a function of two boolean variables. Both cases are deceptive, and it may be shown that neither case can be expressed as a linear combination of the individual allele values:

f(x1 x2) = b + Σ_{i=1}^{2} a_i x_i

Figure 3: Sketch of the Type II minimal deceptive problem (MDP), f00 > f01.

– It may be proved that no one-bit problem can be deceptive; the deceptive two-bit problem is therefore the smallest possible deceptive problem: it is the minimal deceptive problem (MDP).

• Approaches proposed to deal with this problem:

– Assume prior knowledge of the objective function and code it in an appropriate way (to get "tight" building blocks). For example, prior knowledge about the objective function, and consequently about the deception, might result in a different coding, where the bits required to optimize the function are adjacent.

– Employ a new operator, inversion:
It selects two points within a string and inverts the order of bits between the selected points, while remembering each bit's "meaning". This means that we have to identify the bits in the strings; we do so by keeping each bit together with a record of its original position.

∗ In Goldberg (1989a):
"Put another way, inversion is to orderings what mutation is to alleles: both fight the good fight against search-stopping lack of diversity, but neither is sufficiently powerful to search for good structure, allelic or permutational, on its own when good structures require epistatic interaction of the individual parts."

– Messy genetic algorithms (mGAs): Goldberg (1989a), Goldberg, Korb, and Deb (1989), and Goldberg, Deb, and Korb (1991):
The term "messy GA" is meant to be contrasted with the standard "neat" fixed-length, fixed-population-size GAs. The goal of messy GAs is to improve the GA's function-optimization performance by explicitly building up increasingly longer, highly fit strings from well-tested shorter building blocks. The general idea was biologically motivated: "After all, nature did not start with strings of length 5.9 × 10^9 or even of length two million and try to make man. Instead, simple life forms gave way to more complex life forms, with the building blocks learned at earlier times used and reused to good effect along the way." (Goldberg, Korb, and Deb, 1989, p. 500)

Chapter 3: Computer Implementation of a Genetic Algorithm

1. Basic Elements for Running Genetic Algorithms:

Population size                      N
Chromosome length                    l
Maximum number of generations        G
Probability of reproduction          p_r
Probability of crossover             p_c
Probability of mutation              p_m
Selection mechanism                  e.g., proportional selection
Fitness function                     e.g., sum of squared errors

2. Implementation of the Genetic Operators:

• Sequential:
reproduction =⇒ (p_c) crossover =⇒ (p_m) mutation.

• Probabilistic:
reproduction, crossover, and mutation are performed based on their probabilities:

p_r: the probability of the selected chromosome being reproduced
p_c: the probability of the selected chromosome being crossed over
p_m': the probability of the selected chromosome being mutated
p_m: the probability of each bit being mutated, for a chromosome that is going to be mutated
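As a sketch, one probabilistic generation with these operators might look like the Python fragment below. The fitness function, the bit-list chromosome representation, and the helper name are illustrative assumptions, not from the text; population size is assumed even.

```python
import random

def one_generation(pop, fitness, p_c=0.6, p_m=0.0333):
    """One GA generation: proportional selection, single-point crossover
    with probability p_c, then per-bit mutation with probability p_m.
    Chromosomes are lists of 0/1 bits; fitness must be nonnegative."""
    fits = [fitness(ch) for ch in pop]
    # proportional (roulette-wheel) selection of N parents
    parents = random.choices(pop, weights=fits, k=len(pop))
    nxt = []
    for i in range(0, len(parents) - 1, 2):
        a, b = parents[i][:], parents[i + 1][:]   # copy both parents
        if random.random() < p_c:
            pt = random.randrange(1, len(a))      # single-point crossover
            a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
        nxt.extend([a, b])
    for ch in nxt:                                # bitwise mutation
        for j in range(len(ch)):
            if random.random() < p_m:
                ch[j] = 1 - ch[j]
    return nxt
```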

3. How should the appropriate values of these parameters be determined?

• There is no objective way to determine the values of these parameters.

• In De Jong's (1975) study of genetic algorithms in function optimization, a series of parametric studies across a five-function suite of problems suggested that good GA performance requires a high crossover probability, a low mutation probability (inversely proportional to the population size), and a moderate population size. For example, p_c = 0.6, p_m = 0.0333, N = 30.

• However, these settings do not always carry over when GAs are applied in other research areas, for example, social adaptive systems.

4.Mapping Objective Functions to Fitness Form

• In many problems,the objective is more naturally stated as the minimization of some

cost function g(x) rather than the maximization of some utility or proﬁt function u(x).

• Even if the problem is naturally stated in maximization form, this alone does not guarantee that the utility function will be nonnegative for all x, as we require in a fitness function.

• As a result, it is often necessary to map the underlying natural objective function to a fitness-function form through one or more mappings.

• The duality of cost minimization and profit maximization is well known. In normal operations research work, to transform a minimization into a maximization problem we simply multiply the cost function by -1. However, we usually need the following transformation to guarantee positive fitness values:

f(x) = C_max − g(x),  when g(x) < C_max,
f(x) = 0,             otherwise.

• Of course, there are a variety of ways to choose the coefficient C_max. C_max may be taken as an input coefficient, as the largest g value observed thus far, as the largest value in the current population, or as the largest of the last k generations. Perhaps more appropriately, C_max should vary depending on the population variance.

• When the natural objective function formulation is a profit or utility function, we may still have a problem with negative utility values u(x). Therefore, we simply transform fitness according to the equation:

f(x) = u(x) + C_min,  when u(x) + C_min > 0,
f(x) = 0,             otherwise.

• We may choose C_min as an input coefficient, as the absolute value of the worst u value in the current or last k generations, or as a function of the population variance.
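The two transforms above can be written directly. This Python sketch mirrors the C_max and C_min mappings; the function names are mine:

```python
def cost_to_fitness(g_x, c_max):
    """Map a cost value g(x) to a nonnegative fitness (C_max transform)."""
    return c_max - g_x if g_x < c_max else 0.0

def utility_to_fitness(u_x, c_min):
    """Shift a possibly negative utility u(x) into a nonnegative fitness."""
    return u_x + c_min if u_x + c_min > 0 else 0.0
```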

5.Fitness Scaling

• Problem 1:

– Regulation of the number of copies is especially important in small population

genetic algorithms.

– At the start of GA runs it is common to have a few extraordinary individuals in

a population of mediocre colleagues.

– If left to the normal selection rule (p_i = f_i / Σ_j f_j), the extraordinary individuals would take over a significant proportion of the finite population in a single generation; this is undesirable and a leading cause of premature convergence.

• Problem 2:

– Late in a run,there may still be signiﬁcant diversity within the population;

however,the population average ﬁtness may be close to the population best

ﬁtness.If this situation is left alone,average members and best members get

nearly the same number of copies in future generations,and the survival of the

ﬁttest necessary for improvement becomes a random walk among the mediocre.

• In both cases,at the beginning of the run and as the run matures,ﬁtness scaling can

help.

• One useful scaling procedure is linear scaling: f' = a·f + b, where

– f: raw fitness,
– f': scaled fitness,
– a, b: constant coefficients.

• In this way,simple scaling helps prevent the early domination of extraordinary indi-

viduals,while it later on encourages a healthy competition among near equals.
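One common recipe (following Goldberg 1989) chooses a and b so that the scaled average equals the raw average and the scaled maximum is a small multiple C_mult of the average, falling back to pinning the worst member at zero when that would force negative scaled fitnesses. The Python sketch below is written under those assumptions:

```python
def linear_scale(fits, c_mult=2.0):
    """Return coefficients (a, b) for f' = a*f + b such that the scaled
    average equals the raw average and the scaled maximum equals
    c_mult * average; if the raw minimum is too low for that, scale so
    the worst member maps to zero instead (a sketch, not the only way)."""
    f_avg = sum(fits) / len(fits)
    f_max, f_min = max(fits), min(fits)
    if f_max == f_avg:                      # degenerate: all fitnesses equal
        return 1.0, 0.0
    if f_min > (c_mult * f_avg - f_max) / (c_mult - 1.0):
        a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    else:                                   # pin the minimum at zero
        a = f_avg / (f_avg - f_min)
    b = f_avg * (1.0 - a)
    return a, b
```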

6.Codings


• The basic principles for choosing a GA coding:

– Principle of meaningful building blocks:

The user should select a coding so that short,low-order schemata are relevant

to the underlying problem and relatively unrelated to schemata over other ﬁxed

positions.

– Principle of minimal alphabets:

The user should select the smallest alphabet that permits a natural expression

of the problem.

• Illustration:

Comparison of Binary and Nonbinary String Populations

Binary String   Value x   Nonbinary String   Fitness
01101           13        N                  169
11000           24        Y                  576
01000            8        I                   64
10011           19        T                  361

Binary and Nonbinary Coding Correspondence

Binary   Nonbinary
00000    A
00001    B
...      ...
11001    Z
11010    1
11011    2
...      ...
11111    6

• What is the influence of the different codings?

– The different alphabet cardinalities require different string lengths. For equality of the number of points in each space, we require 2^l = k^(l'), where l is the binary string length and l' is the nonbinary string length over an alphabet of cardinality k.

– The number of schemata for each coding may then be calculated using the respective string length: 3^l in the binary case and (k+1)^(l') in the nonbinary case.

– It is easy to show that the binary alphabet offers the maximum number of schemata per bit of information of any coding.

– Since these similarities are the essence of our search, when we design a code we should maximize the number of them available for the GA to exploit.
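The counts above can be checked numerically for the illustration's parameters (binary length l = 5 versus the 32-symbol alphabet A–Z, 1–6). The helper below is only illustrative:

```python
import math

def schemata_counts(l, k):
    """For a binary string of length l and a k-ary alphabet covering the
    same search space (so 2**l = k**l'), return the schemata counts
    3**l (binary) and (k+1)**l' (nonbinary)."""
    l_prime = l * math.log(2) / math.log(k)   # solve 2**l == k**l' for l'
    return 3 ** l, (k + 1) ** l_prime
```

For the example, the binary coding admits 243 schemata against 33 for the 32-symbol coding over the same 32-point space.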


7.Encoding a Problem for a Genetic Algorithm:

Most GA applications use fixed-length, fixed-order bit strings to encode candidate solutions.

However,in recent years,there have been many experiments with other kinds of encodings.

• Binary Encodings

– The most common encodings for a number of reasons:

∗ In the early work,Holland and his students concentrated on such encodings

and GA practice has tended to follow this lead.

∗ Much of the existing GA theory is based on the assumption of fixed-length, fixed-order binary encodings.

∗ Much of that theory can be extended to apply to nonbinary encodings,but

such extensions are not as well developed as the original theory.

∗ Heuristics about appropriate parameter settings (e.g.,for crossover and muta-

tion rates) have generally been developed in the context of binary encodings.

– However,binary encodings are unnatural and unwieldy for many problems (e.g.,

evolving weights for neural networks or evolving condition sets in the manner of

Meyer and Packard),and they are prone to rather arbitrary orderings.

• Many-Character and Real-Valued Encodings

– For many applications,it is most natural to use an alphabet of many characters

or real numbers to form chromosomes.

– Examples include Kitano’s many-character representation for graph-generation

grammars,Meyer and Packard’s real-valued representation for condition sets,

Montana and Davis’s real-valued representation for neural-network weights,and

Schulze-Kremer's real-valued representation for torsion angles in proteins.

– Holland’s schemata-counting argument seems to imply that GAs should exhibit

worse performance on multiple-character encodings than on binary encodings.

However,this has been questioned by some.(e.g.,Antonisse,1989)

– Several empirical comparisons between binary encodings and multiple-character

or real-valued encodings have shown better performance for the latter.(e.g.,

Janikow and Michalewicz,1991;Wright,1991)

– But the performance depends very much on the problem and the details of the

GA being used,and at present there are no rigorous guidelines for predicting

which encoding will work best.

• Adapting the Encoding

– Choosing a fixed encoding ahead of time presents a paradox to the potential GA user: for any problem to which one would want to apply a GA, one doesn't know enough about the problem ahead of time to come up with the best encoding for it.

– In fact,coming up with the best encoding is almost tantamount to solving the

problem itself!

– One typically has no idea how best to order the bits ahead of time for a given problem. This is known in the GA literature as the "linkage problem": one wants functionally related loci to be more likely to stay together on the string under crossover, but it is not clear how this is to be done without knowing ahead of time which loci are important in useful schemata.

– A second reason for adapting the encoding is that a fixed-length representation limits the complexity of the candidate solutions.

18

– Messy GAs

8.Selection Methods:

The purpose of selection is to emphasize the fitter individuals in the population in the hope that their offspring will in turn have even higher fitness. Selection has to be balanced with variation from crossover and mutation (the "exploitation/exploration balance"). Too-strong selection means that suboptimal highly fit individuals will take over the population, reducing the diversity needed for further change and progress. Too-weak selection will result in too-slow evolution.

• Fitness-Proportionate Selection with “Roulette Wheel”.

– Under fitness-proportionate selection, a few extraordinary individuals and their descendants will multiply quickly in the population, in effect preventing the GA from doing any further exploration.

– This is known as “premature convergence”.

– In other words,ﬁtness-proportionate selection often puts too much emphasis on

“exploitation” of highly ﬁt strings at the expense of exploration of other regions

of the search space.
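A minimal roulette-wheel implementation, assuming nonnegative fitnesses; the function name is mine:

```python
import random

def roulette_select(pop, fits):
    """Fitness-proportionate ('roulette wheel') selection: individual i
    is chosen with probability f_i / sum_j(f_j)."""
    total = sum(fits)
    r = random.uniform(0.0, total)   # spin the wheel
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]  # guard against floating-point round-off
```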

• Sigma Scaling

– Mapping “raw” ﬁtness values to expected values so as to make the GA less

susceptible to premature convergence.

– Forrest (1985) proposed sigma scaling, which keeps the selection pressure relatively constant over the course of the run rather than depending on the fitness variance in the population.

– Under sigma scaling, an individual's expected value is a function of its fitness, the population mean, and the population standard deviation. For example,

ExpVal(i,t) = 1 + (f(i) − f̄(t)) / (2σ(t)),  if σ(t) ≠ 0,
ExpVal(i,t) = 1.0,                           if σ(t) = 0,

where ExpVal(i,t) is the expected value of individual i at time t, f(i) is the fitness of i, f̄(t) is the mean fitness of the population at time t, and σ(t) is the standard deviation of the population fitnesses at time t.

– At the beginning of a run, when the standard deviation of fitnesses is typically high, the fitter individuals will not be many standard deviations above the mean, and so they will not be allocated the lion's share of offspring.

– Likewise,later in the run,when the population is typically more converged and

the standard deviation is typically lower,the ﬁtter individuals will stand out

more,allowing evolution to continue.
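A sketch of the sigma-scaling formula in Python. Clipping negative expected values to a small floor such as 0.1 is a common convention and an assumption here, not stated in the text:

```python
import statistics

def sigma_scaled_expvals(fits):
    """Expected values under sigma scaling:
    1 + (f(i) - mean) / (2 * sigma), or 1.0 for all when sigma is zero.
    Negative results are clipped to 0.1 (assumed convention) so every
    individual keeps some reproduction chance."""
    mean = sum(fits) / len(fits)
    sigma = statistics.pstdev(fits)   # population standard deviation
    if sigma == 0:
        return [1.0] * len(fits)
    return [max(0.1, 1 + (f - mean) / (2 * sigma)) for f in fits]
```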

• Elitism

– “Elitism”,ﬁrst introduced by Kenneth De Jong (1975),is an addition to many se-

lection methods that forces the GA to retain some number of the best individuals

at each generation.

– Such individuals can be lost if they are not selected to reproduce or if they are

destroyed by crossover or mutation.

– Many researchers have found that elitism signiﬁcantly improves the GA’s perfor-

mance.


• Boltzmann Selection

– Sigma scaling keeps the selection pressure more constant over a run.But often

diﬀerent amounts of selection pressure are needed at diﬀerent times in a run –

for example,early on it might be good to be liberal,allowing less ﬁt individuals

to reproduce at close to the rate of ﬁtter individuals,and having selection occur

slowly while maintaining a lot of variation in the population.

– Later it might be good to have selection be stronger in order to strongly emphasize

highly ﬁt individuals,assuming that the early diversity with slow selection has

allowed the population to ﬁnd the right part of the search space.

– A typical implementation is to assign to each individual i an expected value

ExpVal(i,t) = e^(f(i)/T) / ⟨e^(f(i)/T)⟩_t,

where T is temperature and ⟨ ⟩_t denotes the average over the population at time t.

– Experimenting with this formula will show that,as T decreases,the diﬀerence in

ExpV al(i,t) between high and low ﬁtnesses increases.

– The desire is to have this happen gradually over the course of the search,so

temperature is gradually decreased according to a predeﬁned schedule.
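The Boltzmann formula can be sketched as follows; lowering T is what widens the gap between high- and low-fitness expected values:

```python
import math

def boltzmann_expvals(fits, T):
    """Boltzmann expected values: e^(f(i)/T) divided by the population
    average of e^(f(j)/T). The average expected value is 1 by
    construction; decreasing T sharpens the selection pressure."""
    exps = [math.exp(f / T) for f in fits]
    avg = sum(exps) / len(exps)
    return [e / avg for e in exps]
```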

• Rank Selection

– Rank selection is an alternate method whose purpose is also to prevent too-quick

convergence.

– In the version proposed by Baker (1985),the individuals in the population are

ranked according to ﬁtness,and the expected value of each individual depends

on its rank rather than on its absolute ﬁtness.

– There is no need to scale ﬁtnesses in this case,since absolute diﬀerences in ﬁtness

are obscured.

– This discarding of absolute ﬁtness information can have advantages (using abso-

lute ﬁtness can lead to convergence problems) and disadvantages (in some cases

it might be important to know that one individual is far ﬁtter than its nearest

competitor).

– Ranking avoids giving by far the largest share of offspring to a small group of highly fit individuals, and thus reduces the selection pressure when the fitness variance is high.

– It also keeps up selection pressure when the ﬁtness variance is low.

– The linear ranking method proposed by Baker is as follows:

∗ Each individual in the population is ranked in increasing order of fitness, from 1 to N.

∗ The user chooses the expected value Max of the individual with rank N, with Max ≥ 0.

∗ The expected value of each individual i in the population at time t is given by

ExpVal(i,t) = Min + (Max − Min) · (rank(i,t) − 1) / (N − 1),

where Min is the expected value of the individual with rank 1.

∗ Given the constraints Max ≥ 0 and Σ_i ExpVal(i,t) = N (since population size stays constant from generation to generation), it is required that 1 ≤ Max ≤ 2 and Min = 2 − Max.

– Rank selection has a possible disadvantage: slowing down selection pressure means that the GA will in some cases be slower in finding highly fit individuals.

– However,in many cases the increased preservation of diversity that results from

ranking leads to more successful search than the quick convergence that can result

from ﬁtness proportionate selection.

– A variation of rank selection with elitism was used by Meyer and Packard for evolving condition sets, and Mitchell and his colleagues used a similar scheme for evolving cellular automata. In those examples the population was ranked by fitness and the top E strings were selected to be parents. The N − E offspring were merged with the E parents to create the next population. This is a form of the so-called (µ + λ) strategy used in the evolution strategies community. This method can be useful in cases where the fitness function is noisy (i.e., is a random variable); the best individuals are retained so that they can be tested again and thus, over time, gain increasingly reliable fitness estimates.
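Baker's linear ranking formula can be sketched as follows; Max = 1.5 is an arbitrary illustrative choice within the 1 ≤ Max ≤ 2 constraint:

```python
def rank_expvals(fits, max_ev=1.5):
    """Baker's linear ranking: rank individuals 1..N in increasing
    fitness, then ExpVal = Min + (Max - Min)*(rank - 1)/(N - 1),
    with Min = 2 - Max so that the expected values sum to N."""
    assert 1.0 <= max_ev <= 2.0
    n = len(fits)
    min_ev = 2.0 - max_ev
    order = sorted(range(n), key=lambda i: fits[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [min_ev + (max_ev - min_ev) * (ranks[i] - 1) / (n - 1)
            for i in range(n)]
```

Note that the fittest individual receives Max regardless of how far ahead it is; absolute fitness differences are deliberately obscured.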

• Tournament Selection

– Tournament selection is similar to rank selection in terms of selection pressure,

but it is computationally more eﬃcient and more amenable to parallel implemen-

tation.

– Two individuals are chosen at random from the population.

– A random number r is then chosen between 0 and 1.

– If r < k (where k is a parameter,for example 0.75),the ﬁtter of the two individ-

uals is selected to be a parent;otherwise the less ﬁt individual is selected.

– The two are then returned to the original population and can be selected again.

– For a more detailed description, please refer to Goldberg and Deb (1991).
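The steps above amount to only a few lines. This sketch assumes a binary tournament with the parameter k as described; the function name is mine:

```python
import random

def tournament_select(pop, fits, k=0.75):
    """Binary tournament: draw two individuals at random (with
    replacement); with probability k the fitter one is selected,
    otherwise the less fit one."""
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    fitter, weaker = (i, j) if fits[i] >= fits[j] else (j, i)
    return pop[fitter] if random.random() < k else pop[weaker]
```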

• Steady-State Selection

– Most GAs described in the literature have been “generational” — at each gen-

eration the new population consists entirely of oﬀspring formed by parents in

the previous generation (though some of these oﬀspring may be identical to their

parents).

– In some schemes,such as elitist schemes,successive generations overlap to some

degree —some portion of the previous generation is retained in the new genera-

tion.

– The fraction of new individuals at each generation has been called the "generation gap" (De Jong, 1975).

– In steady-state selection,only a few individuals are replaced in each generation:

usually a small number of the least ﬁt individuals are replaced by oﬀspring re-

sulting from crossover and mutation of the ﬁttest individuals.

– Steady-state GAs are often used in evolving rule-based systems (e.g.,classiﬁer

systems;see Holland 1986) in which incremental learning (and remembering what

has already been learned) is important and in which members of the population

collectively (rather than individually) solve the problem at hand.


• For more technical comparisons of diﬀerent selection methods,see Goldberg and Deb

(1991),B¨ack and Hoﬀmeister (1991),de la Maza and Tidor (1993),and Hancock

(1994).

9.Advanced Topic in Crossover:

• The usefulness of crossover is to recombine building blocks (schemata) on diﬀerent

strings.

• Single-point crossover has some shortcomings.It cannot combine all possible schemata.

For example,it cannot in general combine instances 11*****1 and ****11** to form

an instance of 11**11*1.

• Likewise,schemata with long deﬁning lengths are likely to be destroyed under single-

point crossover.The schemata that can be created or destroyed by a crossover depend

strongly on the location of the bits in the chromosome.

• Single-point crossover assumes that short,low-order schemata are the functional

building blocks of strings,but one generally does not know in advance what ordering

of bits will group functionally related bits together.

• Eshelman,Caruana,and Schaﬀer (1989) pointed out that there may not be any way

to put all functionally related bits close together on a string,since particular bits

might be crucial in more than one schema.

• They pointed out further that the tendency of single-point crossover to keep short

schemata intact can lead to the preservation of hitchhikers — bits that are not part

of a desired schema but which,by being close on the string,hitchhike along with the

beneﬁcial schema as it reproduces.

• Many people have also noted that single-point crossover treats some loci preferen-

tially:the segments exchanged between the two parents always contain the endpoints

of the strings.

• Two-point crossover.

• Parameterized uniform crossover:

An exchange happens at each bit position with probability p (typically 0.5 ≤ p ≤0.8).

However,this lack of positional bias can prevent coadapted alleles from ever forming

in the population,since parameterized uniform crossover can be highly disruptive of

any schema.

• Given these arguments,which one should we use?There is no simple answer.The

success or failure of a particular crossover operator depends in complicated ways on

the particular ﬁtness function,encoding,and other details of the GA.It is still a very

important open problem to fully understand these interactions.

• It is common in recent GA applications to use either two-point crossover or parame-

terized uniform crossover with p ≈ 0.7 −0.8.
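The three operators discussed above can be sketched side by side; the bit-list chromosome representation is an assumed choice:

```python
import random

def one_point(a, b):
    """Single-point crossover: the exchanged segments always contain
    one endpoint of the string (the positional bias noted above)."""
    pt = random.randrange(1, len(a))
    return a[:pt] + b[pt:], b[:pt] + a[pt:]

def two_point(a, b):
    """Two-point crossover: exchanges an interior segment, so endpoint
    loci are no longer treated preferentially."""
    p1, p2 = sorted(random.sample(range(1, len(a)), 2))
    return (a[:p1] + b[p1:p2] + a[p2:],
            b[:p1] + a[p1:p2] + b[p2:])

def uniform(a, b, p=0.7):
    """Parameterized uniform crossover: swap each position
    independently with probability p (no positional bias, but
    highly disruptive of schemata)."""
    c, d = list(a), list(b)
    for i in range(len(a)):
        if random.random() < p:
            c[i], d[i] = d[i], c[i]
    return c, d
```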

10.Advanced Topic in Mutation:

• A common view in GA community,dating back to Holland (1975),is that crossover

is the major instrument of variation and innovation in GAs,with mutation insuring

the population against permanent ﬁxation at any particular locus and thus playing

more of a background role.


• However,the appreciation of the role of mutation is growing as the GA community

attempts to understand how GAs solve complex problems.

• Spears (1993) formally veriﬁed the intuitive idea that,while mutation and crossover

have the same ability for “disruption” of existing schemata,crossover is a more robust

“constructor” of new schemata.

• M¨uhlenbein (1992,p.15),on the other hand,argues that in many cases a hill-climbing

strategy will work better than a GA with crossover and that the “power of mutation

has been underestimated in traditional genetic algorithms”.


Chapter 4:Introduction to Genetic-Based Machine Learning

1.This topic is based on Goldberg (1989).

2.Introduction

• A classifier system is a learning system that learns syntactically simple string rules (called classifiers) to guide its performance in an arbitrary environment.

• A classiﬁer system consists of three main components:

– Rule and message system

– Apportionment of credit system

– Genetic algorithms

• The rule and message system of a classiﬁer system is a special kind of production

system.A production system is a computational scheme that uses rules as its only

algorithmic device.The rules are generally of the following form:

if < condition > then < action >.

• At ﬁrst glance,the restriction to such a simple device for the representation of knowl-

edge might seem too constraining.Yet it has been shown that production systems

are computationally complete (Minsky,1967).A single rule or small set of rules can

represent a complex set of thoughts compactly.

• Traditional rule-based systems have been less frequently suggested in situations in

need of learning.One of the main obstacles to learning has been complex rule syntax.

• Classifier systems depart from the mainstream by restricting a rule to a fixed-length representation. This restriction has two benefits. First, all strings over the permissible alphabet are syntactically meaningful. Second, a fixed string representation permits string operators of the genetic kind. This leaves the door propped open, ready for a genetic algorithm search of the space of permissible rules.

3.Rule and Message System:

• A schematic depicting the rule and message system,the apportionment of credit

system,and genetic algorithm is shown in Figure 1.

• The rule and message system form the computational backbone.Information ﬂows

from the environment through the detectors — the classiﬁer system’s eyes and ears

— where it is decoded to one or more ﬁnite length messages.These environmental

messages are posted to a ﬁnite-length message list where the messages may then

activate string rules called classiﬁers.

• When activated,a classiﬁer posts a message to the message list.These messages may

then invoke other classiﬁers or they may cause an action to be taken through the

system’s action triggers called eﬀectors.

• In this way classiﬁers combine environmental cues and internal thoughts to determine

what the system should do and think next.

• A message within a classifier system is simply a finite-length string over some finite alphabet. If we limit ourselves to a binary alphabet, we obtain the following definition:

<message> ::= {0,1}^l

where l is the length of the string.


Figure 1:A Learning Classiﬁer System Interacts with Its Environment

• The condition is a simple pattern-recognition device, where * is added to the underlying alphabet:

<condition> ::= {0,1,*}^l

• Therefore, a classifier is a production rule with an excruciatingly simple syntax:

<classifier> ::= <condition> : <message>

• Once a classiﬁer’s condition is matched,that classiﬁer becomes a candidate to post its

message to the message list on the next time step.Whether the candidate classiﬁer

posts its message is determined by the outcome of an activation auction,which in

turn depends on the evaluation of a classiﬁer’s value or weighting.

• An example:

Suppose we have a classifier store consisting of the classifiers shown in the following table:

Four Classiﬁers

Index Classiﬁer

1 01**:0000

2 00*0:1100

3 11**:1000

4 **00:0001

– At the ﬁrst time step,an environment message 0111 appears on the message list.

– This message matches classiﬁer 1,which then posts its message,0000.

– This message matches rules 2 and 4,which in turn post their messages (1100 and

0001).


– Message 1100 then matches classifiers 3 and 4. Thereafter the message sent by classifier 3, 1000, matches classifier 4 and the process terminates.
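The matching process in this example can be reproduced with a few lines. Wildcard semantics follow the {0,1,*} condition definition above; the function names are mine:

```python
def matches(condition, message):
    """A condition over {0,1,*} matches a message when every
    non-* position agrees with the corresponding message bit."""
    return all(c == '*' or c == m for c, m in zip(condition, message))

def matched_classifiers(classifiers, message):
    """Return the indices (1-based, as in the table) of the classifiers
    whose condition matches the given message."""
    return [i for i, (cond, _) in enumerate(classifiers, start=1)
            if matches(cond, message)]
```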

4.Apportionment of Credit Algorithm:The Bucket Brigade

• Many classiﬁer systems attempt to rank or rate individual classiﬁers according to a

classiﬁer’s role in achieving reward from the environment.

• The most prevalent method incorporates what Holland has called a bucket brigade

algorithm.The bucket brigade may most easily be viewed as an information economy

where the right to trade information is bought and sold by classiﬁers.

• This service economy contains two main components:an auction and a clearinghouse.

When classifiers are matched they do not directly post their messages. Instead, having its condition matched qualifies a classifier to participate in an activation auction.

• To participate in the auction, each classifier maintains a record of its net worth, called its strength S. Each matched classifier makes a bid B proportional to its strength. In this way, rules that are highly fit are given preference over other rules.

• The auction permits appropriate classiﬁers to be selected to post their messages.

Once a classiﬁer is selected for activation,it must clear its payment through the

clearinghouse,paying its bid to other classiﬁers for matching messages rendered.

• A matched and activated classiﬁer sends its bid B to those classiﬁers responsible for

sending the messages that matched the bidding classiﬁer’s condition.

• The bid payment is divided in some manner among the matching classiﬁers.The

division of payoﬀ among contributing classiﬁers helps ensure the formation of an

appropriately sized subpopulation of rules.Thus diﬀerent types of rules can cover

diﬀerent types of behavioral requirements without undue interspecies competition.

• In a rule-learning system of any consequence,we cannot search for one master rule.

We must instead search for a coadapted set of rules that together cover a range of

behavior that provides ample payoﬀ to the learning system.

5.A detailed auction and payment scheme:

• Classifiers make bids (B_i) during the auction. Winning classifiers turn over their bids to the clearinghouse as payments (P_i). A classifier may also have receipts R_i from its previous message-sending activity or from environmental reward. In addition to bids and receipts, a classifier may be subject to one or more taxes T_i. Taken together, we may write an equation governing the depletion or accretion of the ith classifier's strength as follows:

S_i(t+1) = S_i(t) − P_i(t) − T_i(t) + R_i(t)

• A classifier bids in proportion to its strength:

B_i = C_bid · S_i

where C_bid is the bid coefficient, S_i is the strength, and i is the classifier index.

• We hold the auction in the presence of random noise. We calculate an effective bid (EB):

EB_i = B_i + N(σ_bid)

where the noise N is a function of the specified bidding-noise standard deviation σ_bid.


• Each classifier is taxed to prevent freeloading; we simply collect a tax proportional to the classifier's strength:

T_i = C_tax · S_i

• Therefore, the apportionment-of-credit algorithm for an active classifier can be rewritten as:

S_i(t+1) = S_i(t) − C_bid·S_i(t) − C_tax·S_i(t) + R_i(t)

• Let K = C_bid + C_tax, with 0 ≤ K ≤ 1. Dropping the classifier index, we have

S(n) = (1 − K)^n · S(0) + Σ_{j=0}^{n−1} R(j)·(1 − K)^{n−j−1}

• To investigate the effect of this mechanism further, we examine the steady-state response. If the process continues indefinitely with a constant receipt R(t) = R_ss, we obtain the steady-state strength by setting S(t+1) = S(t) = S_ss. Then we have

S_ss = R_ss / K

• The steady-state bid may be derived as follows:

B_ss = (C_bid / K) · R_ss = C_bid / (C_bid + C_tax) · R_ss

• Since C_tax is usually small with respect to the bid coefficient, the steady-state bid value usually approaches the steady-state receipt value, B_ss ≈ R_ss. In other words, for steady receipts, the bid value approaches the receipt.
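The steady-state result S_ss = R_ss / K can be checked by iterating the strength recursion with constant receipts. This is a sketch; the parameter values below are arbitrary:

```python
def strength_after(s0, k, r_ss, n):
    """Iterate S(t+1) = S(t) - K*S(t) + R_ss for n steps starting from
    S(0) = s0; the strength should converge to R_ss / K."""
    s = s0
    for _ in range(n):
        s = s - k * s + r_ss
    return s
```

For example, with K = 0.12 and R_ss = 6, the strength converges to 6 / 0.12 = 50 regardless of the starting value.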

6.An example:See Figure 2.

7.Genetic Algorithm

• The bucket brigade provides a clean procedure for evaluating rules and deciding among competing alternatives. Yet we still must devise a way of injecting new, possibly better rules into the system. This is precisely where the genetic algorithm steps in.

• However,we must be a little less cavalier about wanton replacement of the entire

population,and we must pay more attention to who replaces whom.

• Here, we define a quantity called the selection proportion, the proportion of the population we replace at a given genetic algorithm invocation. We also define a quantity called the GA period, T_GA, that specifies the number of time steps (rule and message cycles) between GA calls. This period may be treated deterministically or stochastically. Additionally, the invocation of genetic algorithm learning may be conditioned on particular events such as the lack of a match or poor performance.


Figure 2:A Simple Classiﬁer System by Hand–Matching and Payments


References

[1] Antonisse,J.(1989),“A New Interpretation of Schema Notation that Overturns the Bi-

nary Encoding Constraints,” In J.D.Schaﬀer (ed.),Proceedings of the Third International

Conference on Genetic Algorithms.Morgan Kaufmann.

[2] Baker, J.E. (1985), "Adaptive Selection Methods for Genetic Algorithms," in J.J. Grefenstette (ed.), Proceedings of the First International Conference on Genetic Algorithms and Their Applications. Erlbaum.

[3] Bäck, T. and F. Hoffmeister (1991), "Extended Selection Mechanisms in Genetic Algorithms," in R.K. Belew and L.B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.

[4] Bethke,A.D.(1981),“Genetic Algorithms as Function Optimizers,” (Doctoral Dissertation,

University of Michigan).Dissertation Abstracts International,41 (9),3503B.(University

Microﬁlms No.8106101).

[5] De Jong, K.A. (1975), An Analysis of the Behavior of a Class of Genetic Adaptive Systems (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36 (10), 5140B. (University Microfilms No. 76-9381)

[6] de la Maza, M. and B. Tidor (1993), "An Analysis of Selection Procedures with Particular Attention Paid to Proportional and Boltzmann Selection," in S. Forrest (ed.), Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.

[7] Eshelman,L.J.,R.A.Caruana,and J.D.Schaﬀer (1989),“Biases in the Crossover Land-

scape,” in J.D.Schaﬀer (ed.),Proceedings of the Third International Conference on Genetic

Algorithms,Morgan Kaufmann.

[8] Fogel, L.J., A.J. Owens, and M.J. Walsh (1966), Artificial Intelligence through Simulated Evolution. Wiley.

[9] Forrest, S. (1985), "Scaling Fitness in the Genetic Algorithm," in Documentation for PRISONERS DILEMMA and NORMS That Use the Genetic Algorithm. Unpublished manuscript.

[10] Goldberg.D.E.(1989),Genetic Algorithms in Search,Optimization,and Machine Learn-

ing,Reading,MA:Addison-Wesley.

[11] Goldberg.D.E.(1989a),“Messy Genetic Algorithms:Motivation,Analysis,and First

Results,” Complex Systems,Vol.3,pp.493-530.

[12] Goldberg, D.E. and K. Deb (1991), "A Comparative Analysis of Selection Schemes Used in Genetic Algorithms," in G. Rawlins (ed.), Foundations of Genetic Algorithms. Morgan Kaufmann.

[13] Goldberg, D.E., B. Korb, and K. Deb (1989), "Messy Genetic Algorithms: Motivation, Analysis, and First Results," Complex Systems, Vol. 3, pp. 493-530.

[14] Goldberg,D.E.,K.Deb,and B.Korb (1991),“Do not Worry,Be Messy,” Proceedings

of the Fourth International Conference on Genetic Algorithms,Belew,R.,and L.Booker

(eds.),Morgan Kaufmann Publishers,Los Altos,CA.pp.24-30.


[15] Hancock,P.J.B.(1994),“An Empirical Comparison of Selection Methods in Evolutionary

Algorithms,” in T.C.Fogarty (ed.),Evolutionary Computing:AISB Workshop,Leeds,

U.K.,April 1994,Selected Papers.Springer-Verlag.

[16] Holland,J.H.(1975),Adaptation in Natural and Artiﬁcial Systems.University of Michigan

Press.(Second edition:MIT Press,1992.)

[17] Holland,J.H.(1986),“Escaping Brittleness:The Possibilities of General-Purpose Learning

Algorithms Applied to Parallel Rule-Based Systems,” in R.S.Michalski,J.G.Carbonell,

and T.M.Mitchell (eds.),Machine learning II.Morgan Kaufmann.

[18] Janikow,C.Z.and Z.Michalewicz (1991),“An Experimental Comparison of Binary and

Floating Point Representations in Genetic Algorithms,” in R.K.Belew and L.B.Booker

(eds.),Proceedings of the Fourth International Conference on Genetic Algorithms.Morgan

Kaufmann.

[19] Michalewicz,Z.(1996),Genetic Algorithms + Data Structures = Evolution Programs.Third

edition.New York:Springer-Verlag.

[20] Mitchell,M.(1996),An Introduction to Genetic Algorithms.MIT Press.

[21] M¨uhlenbein,H.(1992),“How Genetic Algorithms Really Work:1.Mutation and Hill-

climbing,” in R.M¨anner and B.Manderick (eds.),Parallel Problem Solving from Nature 2.

North-Holland.

[22] Rechenberg,I.(1965),“Cybernetic Solution Path of an Experimental Problem,” Ministry

of Aviation,Royal Aircraft Establishment (U.K.)

[23] Rechenberg,I.(1973),Evolutionsstrategie:Optimierung Technischer Systeme nach Prinzip-

ien der Biologischen Evolution.Frommann-Holzboog (Stuttgart).

[24] Spears, W.M. (1993), "Crossover or Mutation?," in L.D. Whitley (ed.), Foundations of Genetic Algorithms 2. Morgan Kaufmann.

[25] Wright,A.H.(1991),“Genetic Algorithms for Real Parameter Optimization,” In G.Rawl-

ins (ed.),Foundations of Genetic Algorithms.Morgan Kaufmann.

