Theory of Genetic Algorithms

how do GAs work?
• biological motivation
– general idea but no precise explanation
• suitable for scientific study
– simplified genetics
– fully controllable
– rapid experimentation
• theory could improve applications
application issues
• robustness
• flexibility
• parameter tuning
• large number of design choices
• known performance problems
– information loss
– premature convergence
– deception
the linkage problem
• not all problems are like the Ones Problem
• variables can interact
– Royal Road problems
– chemotherapy application
• problem structure
– unknown interactions between variables
– the linkage graph
• graphical model
linkage and encoding
• string encoding has a natural sequence
• neighbouring alleles may not be linked
• genetic operators should preserve linkage
• crossover can be very disruptive
• linked alleles should be close together
• can encoding be changed to respect linkage?
– Mitchell Section 5.3
inversion
• Holland 1975, Goldberg 1989
• re-order alleles by inversion
– sections of the chromosome reversed
• searches alternative encodings
• motivated by a similar phenomenon in genetics
• not found to be widely useful after investigation
inversion
chromosome: 0 0 0 1 0 1 0 1
encode positions and values, e.g.
{(1,0), (2,0), (3,0), (4,1), (5,0), (6,1), (7,0), (8,1)}
same as
{(1,0), (2,0), (6,1), (5,0), (4,1), (3,0), (7,0), (8,1)}
crossover needs to be specialised
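The slide's example can be reproduced with a short sketch. The pair-list encoding and the helper names (`invert`, `decode`) are mine, assuming the (position, value) representation shown above:

```python
def invert(chromosome, i, j):
    """Reverse the (position, value) pairs between indices i and j inclusive.

    Because each gene carries its position with it, reversing a section
    re-orders the genes without changing the solution they encode.
    """
    return chromosome[:i] + chromosome[i:j + 1][::-1] + chromosome[j + 1:]


def decode(chromosome):
    """Recover the bit string by sorting the pairs back into position order."""
    return [value for position, value in sorted(chromosome)]


# the slide's example: 0 0 0 1 0 1 0 1 encoded as (position, value) pairs
original = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0), (8, 1)]
inverted = invert(original, 2, 5)  # reverse the section holding positions 3..6
assert inverted == [(1, 0), (2, 0), (6, 1), (5, 0), (4, 1), (3, 0), (7, 0), (8, 1)]
assert decode(inverted) == decode(original)  # same solution, new ordering
```

The last assertion is the point of the encoding: inversion searches over orderings (and hence over which alleles sit close together) without altering the solution itself, which is why crossover must then be specialised to stay position-aware.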
crossover hotspots
• Schaffer and Morishima (1987)
• limit where crossover can occur
• may avoid disrupting linkage
• does not change ordering of original encoding
• showed some improvement on test problems
• not widely applied
crossover hotspots
Parents (! marks a hotspot):
1 0 0 1 ! 1 1 1 ! 1
0 0 0 0 0 0 ! 0 0
Children:
1 0 0 1 ! 0 0 ! 1 ! 0
0 0 0 0 1 1 0 1
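The example above can be sketched in a few lines. This is my reading of the mechanism (cut at every hotspot carried by either parent, alternating which parent contributes each segment), not Schaffer and Morishima's exact operator:

```python
def hotspot_crossover(p1, p2, hot1, hot2):
    """Crossover that may only cut at marked hotspots.

    p1, p2     : equal-length lists of bits
    hot1, hot2 : sets of hotspot positions (a cut falls AFTER that many bits)
    Cutting occurs at every hotspot carried by either parent, so the
    ordering of the encoding is never changed, only where cuts happen.
    """
    cuts = sorted(hot1 | hot2)
    child1, child2 = [], []
    swap = False
    prev = 0
    for cut in cuts + [len(p1)]:
        seg1, seg2 = p1[prev:cut], p2[prev:cut]
        child1 += seg2 if swap else seg1
        child2 += seg1 if swap else seg2
        swap = not swap  # alternate the contributing parent at each hotspot
        prev = cut
    return child1, child2


# the slide's example: hotspots after positions 4 and 7 (parent 1) and 6 (parent 2)
p1 = [1, 0, 0, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0, 0, 0]
c1, c2 = hotspot_crossover(p1, p2, {4, 7}, {6})
assert c1 == [1, 0, 0, 1, 0, 0, 1, 0]
assert c2 == [0, 0, 0, 0, 1, 1, 0, 1]
```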
messy GAs
• build up a good solution from good pieces
• variable length strings
– strings of bits and positions
– “ideas” as to what bits can go in what position
– multiple positions for the same bit possible
– not all bits have to be included
• evaluation
– average over several chromosomes
messy GAs
• primordial phase
– build up promising short length patterns
– repeated selection to enrich the population
– population size halves each time
• juxtapositional phase
– population size fixed
– selection continues
– cut and splice operators introduced
messy GAs
• suffers from combinatorial explosion
– too many short patterns as chromosome length increases
• assumes fitness function can be built up from contributions on short patterns
– separability
– not necessarily true for all problems
• applied with some success on small problems
Schema Processing
Holland, Goldberg
Mitchell section 1.10
Schema Processing
• schema
– pattern of bits in a chromosome
– defined values at certain positions
– e.g. ****1*** or 000***1****101
• there are 3^L schemata of length L
• each length-L chromosome contains 2^L schemata
• Do GAs work by implicitly processing many schemata in parallel?
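Both counts are easy to check directly. A small sketch (the function name `matches` is mine), enumerating every schema over the alphabet {0, 1, *}:

```python
from itertools import product


def matches(chromosome, schema):
    """A chromosome contains a schema if every defined (non-*) position agrees."""
    return all(s == '*' or s == c for c, s in zip(chromosome, schema))


L = 4
schemata = [''.join(s) for s in product('01*', repeat=L)]
assert len(schemata) == 3 ** L            # 3 choices (0, 1, *) per position

chromosome = '1010'
contained = [s for s in schemata if matches(chromosome, s)]
assert len(contained) == 2 ** L           # each position: keep the bit, or *
```

Enumeration makes the "implicit parallelism" claim concrete: a single length-L chromosome is simultaneously a sample of 2^L different schemata.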
The Schema Theorem
• propagation of a schema depends on
– relative fitness of the schema
– defining length of the schema
– number of defining bits in the schema
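The three dependencies can be made explicit by writing out the standard statement of Holland's result (this equation is not reproduced from the slides): with m(H, t) the number of instances of schema H at generation t, f(H) the schema's mean fitness, f̄ the population mean fitness, δ(H) the defining length, o(H) the order (number of defining bits), L the string length, and p_c, p_m the crossover and mutation rates:

```latex
E[m(H,\,t+1)] \;\ge\; m(H,\,t)\,\frac{f(H)}{\bar{f}}
\left[1 - p_c\,\frac{\delta(H)}{L-1}\right](1 - p_m)^{o(H)}
```

The three factors match the three bullets: above-average relative fitness grows the schema, while long defining length (crossover disruption) and many defining bits (mutation disruption) shrink it.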
Building Block Hypothesis
• Building Blocks are schemata that:
– are of short defining length
– have few defined bits
– have a high relative fitness
• GAs select chromosomes containing Building Blocks
• crossover builds highly fit chromosomes from Building Blocks
successes of schema processing
• useful conceptual tool
– assists in GA design
• theory extends to different types of representation and crossovers
• definition and analysis of GA-deceptive problems
• development of test problems
– Royal Road functions
problems with schema processing
• how does crossover work to recombine Building Blocks?
– hitchhiking limits effectiveness
• Schema Theorem only gives an expectation over one time step
• results are relative to starting population
• when can a GA be expected to converge to a global optimum?
Example Problem: 6 Peaks
• simple test problem with interactions between variables
• for a chromosome of length N it is defined as:
Fitness = max(tail, head) + R
where head(b) and tail(b) are the lengths of the leading and trailing runs of bit b, head and tail are the run lengths at each end of the string, and R = N if (tail(0) > T and head(1) > T) or (tail(1) > T and head(0) > T), R = 0 otherwise
(T is an integer < N)
6 Peaks Problem continued
• N=10, T=2:
1100101110   f = 2
1110010000   f = 14
0000000101   f = 7
• optimal solution has a string of 1s or 0s at one end, with the rest of the string being the opposite value
• deceptive; false optimum with a string of all 1s or all 0s
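A minimal sketch of the fitness function (the helper names `head`, `tail`, `run_length` are mine). Note that the three example evaluations come out as shown when T = 2:

```python
def six_peaks(bits, T):
    """Six Peaks fitness: the longer end run, plus a reward of N when one
    end has a long run of 0s and the other a long run of 1s."""
    N = len(bits)

    def run_length(seq, b):
        n = 0
        for x in seq:
            if x != b:
                break
            n += 1
        return n

    def head(b):                       # leading run of bit b
        return run_length(bits, b)

    def tail(b):                       # trailing run of bit b
        return run_length(bits[::-1], b)

    R = N if ((tail(0) > T and head(1) > T) or
              (tail(1) > T and head(0) > T)) else 0
    # max(tail, head): the longer of the two end runs
    return max(head(0), head(1), tail(0), tail(1)) + R


# the three example strings, with T = 2
assert six_peaks([1, 1, 0, 0, 1, 0, 1, 1, 1, 0], T=2) == 2
assert six_peaks([1, 1, 1, 0, 0, 1, 0, 0, 0, 0], T=2) == 14
assert six_peaks([0, 0, 0, 0, 0, 0, 0, 1, 0, 1], T=2) == 7
```

The deception is visible in the code: lengthening a single run always raises the base term, pulling search towards all-1s or all-0s strings that can never collect the reward R.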
deception
• Deceptive problems
– test functions used in GA theory
– fitness function leads evolution away from the global optimum
– “hidden optima”
• relation to schemata and building blocks
– deceptive patterns get selected
– Building Block Hypothesis is false
Statistical Mechanics
Prugel-Bennett, Shapiro
Mitchell 4.4
Statistical Mechanics
• analyse “bulk properties” of GA behaviour
– mean fitness in the population
– mean degree of symmetry in the chromosomes
• relate predicted behaviour of bulk properties to design parameter choices
1-dimensional spin glass problem
• minimise spin glass energy
• Boltzmann selection fitness function

successes of statistical mechanics
• analysis of a spin-glass GA
• very accurate prediction of mean and variance of the fitness distribution for each generation
• mean and variance are functions of population size and parameters
• possible to select optimal parameters to control performance
problems with statistical mechanics
• highly problem specific
– extensions of the same work exist
• relies on the form of selection
– Boltzmann selection
– probability model amenable to mathematical analysis
Estimation of Distribution
Algorithms
Muhlenbein, Goldberg, Pelikan
what is an EDA?
• Estimation of Distribution Algorithm
• derived from genetic algorithms
– population-based
– fitness-driven evolution
• learn the distribution of good solutions
• memory resides in a probabilistic model
• model sampled to produce new solutions
GA vs EDA
GA loop:  initialisation → select parents → breed children → update population → (repeat) → completion
EDA loop: initialisation → select parents → model parents → generate population → (repeat) → completion
probabilistic models
• solution x a collection of random variables
• model distribution of x as a j.p.d.
• model factorises if all variables are independent
[figure: six unconnected nodes x1 … x6]
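For the six variables of the figure, full independence means the joint probability distribution factorises into a product of marginals (standard notation, not taken from the slides):

```latex
p(x_1, x_2, \dots, x_6) \;=\; \prod_{i=1}^{6} p(x_i)
```

This fully factorised form is exactly the model a univariate EDA such as PBIL maintains: one probability per bit position, and nothing about their interactions.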
PBIL – a simple EDA
1. initialise model
2. generate M solutions from the model
3. select N best solutions
4. calculate probabilities
5. increment the model
6. stop or return to 2.
[figure: initial model (0.5, 0.5, 0.5, 0.5, 0.5, 0.5); six sampled solutions
 0 1 1 1 0 0
 1 0 1 1 0 0
 0 0 1 1 0 1
 0 1 1 0 1 0
 0 0 0 1 1 1
 0 0 0 0 0 0
 probabilities calculated from the best four solutions: (0.25, 0.5, 1, 0.75, 0.25, 0.25)]
PBIL – a simple EDA
1. initialise model
2. generate M solutions from the model
3. select N best solutions
4. calculate probabilities
5. increment the model
6. stop or return to 2.
[figure: incremented model (0.3, 0.5, 0.7, 0.6, 0.3, 0.3); six solutions sampled from it
 1 1 1 1 0 1
 1 0 1 1 1 0
 0 1 1 1 0 0
 0 0 1 0 1 0
 0 1 1 1 0 1
 0 0 0 0 0 0
 calculated probabilities: (0.25, 0.5, 1, 0.75, 0.25, 0.25)]
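The six steps can be sketched directly. This is a minimal PBIL, not a tuned implementation; the function name and the parameter defaults (`M`, `N`, `rate`, `generations`) are my choices:

```python
import random


def pbil(fitness, length, M=50, N=10, rate=0.1, generations=100):
    """Minimal PBIL following the slide's six steps.

    fitness : function scoring a list of bits (higher is better)
    M, N    : solutions sampled per generation / best kept for the update
    rate    : learning rate used to increment the model
    """
    model = [0.5] * length                                 # 1. initialise model
    for _ in range(generations):
        population = [[int(random.random() < p) for p in model]
                      for _ in range(M)]                   # 2. generate M solutions
        population.sort(key=fitness, reverse=True)
        best = population[:N]                              # 3. select N best
        probs = [sum(col) / N for col in zip(*best)]       # 4. calculate probabilities
        model = [(1 - rate) * m + rate * p
                 for m, p in zip(model, probs)]            # 5. increment the model
    return model                                           # 6. stop


# on the Ones Problem the model converges towards all 1s
random.seed(0)
final = pbil(sum, length=6)
assert all(p > 0.8 for p in final)
```

Note where the memory lives: the population is thrown away every generation, and everything learned is carried in the six-number probability vector `model`.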
classification of EDAs
• classified by the level of dependency in the probabilistic model, i.e. linkage
• univariate
– fully factorised j.p.d.
• bivariate
– j.p.d. factorises up to pairwise conditional dependencies
• multivariate
– higher-order dependencies in the j.p.d.
Bayes Theorem
• the joint probability of A and B is the product of the probability of A with the conditional probability of B given A:
P(A, B) = P(A) P(B|A)
• this is a factorisation of the joint probability
• if A and B are independent:
P(A, B) = P(A) P(B)
independent variables

Variable values   X2 = 0   X2 = 1
X1 = 0             25%      25%
X1 = 1             25%      25%
dependent variables

Variable values   X2 = 0   X2 = 1
X1 = 0             45%       5%
X1 = 1              5%      45%

The distribution of X2 depends on the value of X1
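The two tables can be checked mechanically against the factorisation test: a joint distribution is independent exactly when every cell equals the product of its marginals. A small sketch (the function name `factorises` is mine):

```python
# joint distributions from the two tables, keyed by (x1, x2)
independent = {(0, 0): .25, (0, 1): .25, (1, 0): .25, (1, 1): .25}
dependent   = {(0, 0): .45, (0, 1): .05, (1, 0): .05, (1, 1): .45}


def factorises(joint):
    """True if p(x1, x2) == p(x1) * p(x2) for every cell."""
    px1 = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}  # marginal of X1
    px2 = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}  # marginal of X2
    return all(abs(joint[(a, b)] - px1[a] * px2[b]) < 1e-9
               for a in (0, 1) for b in (0, 1))


assert factorises(independent)
assert not factorises(dependent)
```

Both tables have the same marginals (50% for each value), so marginals alone cannot distinguish them; only the joint structure reveals the linkage.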
probabilistic graphical models
• probabilistic graphical models explicitly represent linkage
• structural representation of a j.p.d.
• random variables become nodes on a graph
• edges represent dependencies between variables
• Bayesian Networks most often used
Bayesian Networks
• widely used in EDAs to represent the j.p.d.
• Directed Acyclic Graph (DAG)
– causal relationships
– factorises the j.p.d. for sampling
1. detect dependencies
2. estimate conditional dependencies
– from current population
3. sample j.p.d. to get new chromosomes
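Step 3, sampling the factorised j.p.d., proceeds parent-first through the DAG (ancestral sampling). A toy sketch for a two-node network x1 → x2; the numbers are illustrative choices of mine, picked so the joint distribution matches the "dependent variables" table (45% / 5% cells):

```python
import random


def sample_chain():
    """Ancestral sampling from a toy two-node network x1 -> x2.

    p(x1 = 1) = 0.5, and x2 copies x1 with probability 0.9.
    """
    x1 = int(random.random() < 0.5)     # sample the parent first
    p_x2 = 0.9 if x1 == 1 else 0.1      # conditional p(x2 = 1 | x1)
    x2 = int(random.random() < p_x2)    # then the child, given the parent
    return x1, x2


random.seed(1)
samples = [sample_chain() for _ in range(10000)]
agree = sum(x1 == x2 for x1, x2 in samples) / len(samples)
assert 0.85 < agree < 0.95              # x1 and x2 agree about 90% of the time
```

A multivariate EDA does exactly this at chromosome scale: topologically order the learned DAG, then sample each variable conditioned on its already-sampled parents.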
univariate model
[figure: nodes x1 … x6 with no edges]

bivariate model
[figure: nodes x1 … x6 with edges showing pairwise dependencies]

multivariate model
[figure: nodes x1 … x6 with edges showing higher-order dependencies]
Bayesian Networks in EDAs
• univariate
– PBIL, UMDA
• bivariate
– MIMIC, COMIT, BMDA
• multivariate
– EBNA, BOA, LFDA
successes of EDAs
• improved performance on test problems where linkage deceives a GA
• hierarchy of algorithms appropriate to problems of varying difficulty
• linkage learning may represent human-understandable knowledge
• beginning to be used in real-world applications
problems with EDAs
• computational cost
– increases exponentially with linkage level
• appropriate selection of probabilistic model
– greedy algorithms currently used for multivariate models
• sampling bias