Theory of Genetic Algorithms

Theory of Genetic Algorithms

how do GAs work?


biological motivation


general idea but no precise explanation


suitable for scientific study


simplified genetics


fully controllable


rapid experimentation


theory could improve applications

application issues


robustness


flexibility


parameter tuning


large number of design choices


known performance problems


information loss


premature convergence


deception

the linkage problem


not all problems are like the Ones Problem


variables can interact


Royal Road problems


chemotherapy application


problem structure


unknown interactions between variables


the linkage graph


graphical model

linkage and encoding


string encoding has a natural sequence


neighbouring alleles may not be linked


genetic operators should preserve linkage


crossover can be very disruptive


linked alleles should be close together


can encoding be changed to respect
linkage?


Mitchell Section 5.3

inversion


Holland 1975, Goldberg 1989


re-order alleles by inversion


sections of the chromosome reversed


searches alternative encodings


motivated by a similar phenomenon in
genetics


not found to be widely useful after
investigation

inversion


0 0 0 1 0 1 0 1

encode positions and values, e.g.

{(1,0), (2,0), (3,0), (4,1), (5,0), (6,1), (7,0), (8,1)}

same as

{(1,0), (2,0), (6,1), (5,0), (4,1), (3,0), (7,0), (8,1)}

crossover needs to be specialised
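
A minimal Python sketch of the inversion operator on the position-value encoding above; the function names and the random choice of segment endpoints are illustrative, not taken from Holland or Goldberg.

import random

def invert(chromosome):
    # chromosome is a list of (position, allele) pairs, e.g.
    # [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0), (8, 1)]
    # reverse a randomly chosen segment; the decoded solution is unchanged,
    # only the order in which the alleles are stored (the encoding) changes
    i, j = sorted(random.sample(range(len(chromosome)), 2))
    return chromosome[:i] + chromosome[i:j + 1][::-1] + chromosome[j + 1:]

def decode(chromosome):
    # recover the plain bit string by sorting on the position field
    return [allele for _, allele in sorted(chromosome)]

original = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0), (8, 1)]
reordered = invert(original)
assert decode(reordered) == decode(original)   # same solution, different encoding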

crossover hotspots


Schaffer and Morishima (1987)


limit where crossover can occur


may avoid disrupting linkage


does not change ordering of original
encoding


showed some improvement on test
problems


not widely applied

crossover hotspots

Parents ("!" marks a crossover hotspot)

1 0 0 1 ! 1 1 1 ! 1
0 0 0 0 0 0 ! 0 0

Children

1 0 0 1 ! 0 0 ! 1 ! 0
0 0 0 0 1 1 0 1
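
A rough Python sketch of crossover restricted to hotspots. It assumes the children are built by crossing over at the union of the two parents' hotspot positions, which reproduces the example above; the detailed representation in Schaffer and Morishima's scheme differs.

def hotspot_crossover(p1, p2, hotspots):
    # p1, p2   : equal-length lists of bits
    # hotspots : set of 0-based indices after which a crossover occurs
    c1, c2 = [], []
    swapped = False
    for i, (a, b) in enumerate(zip(p1, p2)):
        c1.append(b if swapped else a)
        c2.append(a if swapped else b)
        if i in hotspots:              # crossover happens at every hotspot
            swapped = not swapped
    return c1, c2

p1 = [1, 0, 0, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0, 0, 0]
children = hotspot_crossover(p1, p2, hotspots={3, 5, 6})
# yields [1,0,0,1,0,0,1,0] and [0,0,0,0,1,1,0,1], as in the example above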

messy GAs


build up a good solution from good pieces


variable length strings


strings of bits and positions


“ideas” as to what bits can go in what position


multiple positions for the same bit possible


not all bits have to be included


evaluation


average over several chromosomes
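
A sketch of how an under- or over-specified messy chromosome can be evaluated. Filling missing positions from a fixed template follows the competitive-template idea; the names and details here are invented, and the slide's "average over several chromosomes" would correspond to decoding against several templates and averaging the fitness.

def decode_messy(genes, template):
    # genes    : list of (position, bit) pairs; positions may repeat
    #            (first occurrence wins) and some positions may be missing
    # template : full-length bit list used to fill unspecified positions
    solution = list(template)
    seen = set()
    for pos, bit in genes:
        if pos not in seen:            # earlier genes override later duplicates
            solution[pos] = bit
            seen.add(pos)
    return solution

template = [0] * 8
messy = [(2, 1), (5, 1), (2, 0)]       # position 2 specified twice, most positions missing
print(decode_messy(messy, template))   # [0, 0, 1, 0, 0, 1, 0, 0]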

messy GAs


primordial phase


build up promising short length patterns


repeated selection to enrich the population


population size halves each time


juxtapositional phase


population size fixed


selection continues


cut and splice operators introduced

messy GAs


suffers from combinatorial explosion


too many short patterns as chromosome
length increases


assumes fitness function can be built up
from contributions on short patterns


separability


not necessarily true for all problems


applied with some success on small
problems

Schema Processing

Holland, Goldberg

Mitchell section 1.10

Schema Processing


schema


pattern of bits in a chromosome


defined values at certain positions

e.g. ****1*** (a schema of length 8 with a single defined bit)

There are 3^L schemata of length L

each length L chromosome contains 2^L schemata


Do GAs work by implicitly processing
many schemata in parallel?
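
A small Python sketch of what "containing" a schema means and of the two counts above; the helper names are mine, not from the slides.

from itertools import product

def matches(chromosome, schema):
    # True if the chromosome agrees with the schema at every defined position
    return all(s == '*' or s == c for c, s in zip(chromosome, schema))

print(matches('00010101', '***1****'))   # True
print(matches('00010101', '****1***'))   # False

# for length L there are 3**L schemata (each position is 0, 1 or *),
# and any one chromosome matches 2**L of them (each position is either
# its own value or *)
L = 4
chrom = '0110'
schemata = [''.join(s) for s in product('01*', repeat=L)]
print(len(schemata))                               # 81 == 3**4
print(sum(matches(chrom, s) for s in schemata))    # 16 == 2**4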

The Schema Theorem


propagation of a schema depends on


relative fitness of the schema


defining length of the schema


number of defining bits in the schema
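
The standard (approximate) statement of the Schema Theorem makes these three dependencies explicit. Notation assumed here: m(H,t) is the number of instances of schema H at generation t, f(H) its observed average fitness, \bar{f} the population average fitness, \delta(H) the defining length, o(H) the order (number of defined bits), L the string length, and p_c, p_m the crossover and mutation probabilities. In LaTeX notation:

E[m(H, t+1)] \ge m(H, t) \, \frac{f(H)}{\bar{f}} \left[ 1 - p_c \frac{\delta(H)}{L - 1} - o(H)\, p_m \right]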

Building Block Hypothesis


Building Blocks are schemata that:


are of short defining length


have few defined bits


have a high relative fitness


GAs select chromosomes containing
Building Blocks


crossover builds highly fit chromosomes
from Building Blocks

successes of schema
processing


useful conceptual tool


assists in GA design


theory extends to different types of
representation and crossovers


definition and analysis of GA-deceptive problems


development of test problems


Royal Road functions

problems with schema
processing


how does crossover work to recombine
Building Blocks?


hitchhiking limits effectiveness


Schema Theorem only gives an
expectation over one time step


results are relative to starting
population


when can a GA be expected to
converge to a global optimum?

Example Problem: 6 Peaks


Simple test problem with interactions
between variables


For a chromosome of length N, it is
defined as:


Fitness = max(tail, head) + R

where head is the length of the run of identical bits at the start of the string and tail is the length of the run at the end

R = N if tail(0) > T and head(1) > T, or tail(1) > T and head(0) > T; otherwise R = 0

(T is an integer < N)


6 Peaks Problem continued


N=10, T=3:

1100101110   f = 2
1110010000   f = 14
0000000101   f = 7


Optimal solution has string of 1s or 0s at
one end, with the rest of the string being
the opposite value


Deceptive; false optimum with a string of
all 1s or all 0s
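
A small Python sketch of the fitness above. The head/tail helpers (length of the leading/trailing run of a given bit) are my reading of the slide's tail and head, chosen to reproduce the worked values.

def run_length(bits, value, from_start=True):
    # length of the run of `value` at the start (or end) of the string
    seq = bits if from_start else bits[::-1]
    count = 0
    for b in seq:
        if b != value:
            break
        count += 1
    return count

def six_peaks(bits, T):
    # fitness = max(tail, head) + R, following the definition above
    N = len(bits)
    head1 = run_length(bits, '1')                     # head(1): leading 1s
    head0 = run_length(bits, '0')                     # head(0): leading 0s
    tail1 = run_length(bits, '1', from_start=False)   # tail(1): trailing 1s
    tail0 = run_length(bits, '0', from_start=False)   # tail(0): trailing 0s
    head = max(head0, head1)                          # run of identical bits at the start
    tail = max(tail0, tail1)                          # run of identical bits at the end
    R = N if (tail0 > T and head1 > T) or (tail1 > T and head0 > T) else 0
    return max(tail, head) + R

print(six_peaks('1100101110', T=3))   # 2
print(six_peaks('0000000101', T=3))   # 7
print(six_peaks('1110010000', T=3))   # the slide gives f = 14; with the strict > above this
                                      # returns 4, suggesting >= may have been intended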

deception


Deceptive problems


test functions used in GA theory


fitness function leads evolution away from the
global optimum


“hidden optima”


Relation to schemas and building blocks


Deceptive patterns get selected


Building Block Hypothesis is false

Statistical Mechanics

Prugel-Bennett, Shapiro

Mitchell 4.4

Statistical Mechanics


analyse “bulk properties” of GA behaviour


mean fitness in the population


mean degree of symmetry in the
chromosomes


relate predicted behaviour of bulk
properties to design parameter choices

1-dimensional spin glass problem

minimise spin glass energy

Boltzmann selection

fitness function
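
For reference, a standard 1-dimensional spin-glass energy and a Boltzmann selection rule take the forms below (assumed standard forms, with couplings J_i, spins s_i = ±1 and selection strength β); in LaTeX notation:

E(s) = -\sum_{i=1}^{N-1} J_i \, s_i \, s_{i+1}, \qquad P(\text{select } x) \propto e^{\beta f(x)}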


analysis of a spin-glass GA


very accurate prediction of mean and
variance of fitness distribution for each
generation


mean and variance are functions of population size and parameter settings


possible to select optimal parameters to
control performance

successes of statistical
mechanics

problems with statistical
mechanics


highly problem specific


extensions of the same work exist


relies on the form of selection


Boltzmann selection


probability model amenable to mathematical
analysis

Estimation of Distribution
Algorithms

Muhlenbein, Goldberg, Pelikan

what is an EDA?


Estimation of Distribution Algorithm


derived from genetic algorithms


population-based


fitness-driven evolution


learn the distribution of good solutions


memory resides in a probabilistic model


model sampled to produce new solutions

GA vs EDA

GA:  initialisation -> [select parents -> breed children -> update population] (repeated) -> completion

EDA: initialisation -> [select parents -> model parents -> generate population] (repeated) -> completion

probabilistic models


solution x is a collection of random variables


model the distribution of x as a joint probability distribution (jpd)



model factorises if all
variables are
independent

(diagram: random variables x1 ... x6 drawn as separate, unconnected nodes)
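
When all the variables are independent, as in the diagram above, the jpd factorises into a product of per-variable distributions; in LaTeX notation:

p(x_1, x_2, \ldots, x_6) = \prod_{i=1}^{6} p(x_i)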

PBIL

a simple EDA

1. initialise model
2. generate M solutions from the model
3. select N best solutions
4. calculate probabilities
5. increment the model
6. stop or return to 2.

worked example with 6-bit chromosomes:

initial model:               0.5   0.5   0.5   0.5   0.5   0.5
generated solutions:         011100  101100  001101  011010  000011  100000
4 best solutions selected:   011100  101100  001101  011010
bit-wise probabilities:      0.25  0.5   1     0.75  0.25  0.25

PBIL

a simple EDA (continued)

incremented model:             0.3   0.5   0.7   0.6   0.3   0.3
solutions generated from it:   111101  101110  011100  001010  011101  000000

selection, probability calculation and model increment then repeat (steps 2 to 6)
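
A minimal Python sketch of the PBIL loop above, assuming a ones-counting fitness purely for illustration; the population sizes, learning rate and stopping rule are placeholder choices, not values from the slides.

import random

def pbil(length=6, M=20, N=10, rate=0.1, generations=50, fitness=sum):
    # model[i] is the probability that bit i is 1; each generation the model
    # is nudged towards the bit-wise frequencies of the N best sampled solutions
    model = [0.5] * length                                   # 1. initialise model
    for _ in range(generations):
        # 2. generate M solutions from the model
        population = [[int(random.random() < p) for p in model] for _ in range(M)]
        # 3. select the N best solutions
        best = sorted(population, key=fitness, reverse=True)[:N]
        # 4. calculate bit-wise probabilities of the selected solutions
        freqs = [sum(bits) / N for bits in zip(*best)]
        # 5. increment the model towards those frequencies
        model = [p + rate * (f - p) for p, f in zip(model, freqs)]
    return model                                             # 6. stop

print(pbil())   # probabilities drift towards 1.0 under the ones-counting fitness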

classification of EDAs


classified by the level of dependency in
the probabilistic model, i.e.
linkage


univariate


fully factorised jpd


bivariate


jpd factorises up to pairwise conditional
dependencies


multivariate


higher-order dependencies in the jpd

Bayes Theorem


The joint probability of A and B is the product of the probability of A and the conditional probability of B given A:

P(A, B) = P(A) P(B|A)


This is a factorisation of the joint probability


If A and B are independent:

P(A, B) = P(A) P(B)

independent variables

             X2 = 0    X2 = 1
X1 = 0       25%       25%
X1 = 1       25%       25%

dependent variables

             X2 = 0    X2 = 1
X1 = 0       45%       5%
X1 = 1       5%        45%

The distribution of X2 depends on the value of X1
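
Applying the factorisation above to the dependent table: P(X2 = 1) = 0.05 + 0.45 = 0.5, but P(X2 = 1 | X1 = 1) = 0.45 / 0.50 = 0.9, so observing X1 changes the distribution of X2; in the independent table both values are 0.5.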

probabilistic graphical models


probabilistic graphical models explicitly
represent linkage


structural representation of a jpd


random variables become nodes on a
graph


edges represent dependencies between
variables


Bayesian Networks most often used

Bayesian Networks


widely used in EDAs to represent the jpd


Directed Acyclic Graph (DAG)


causal relationships


factorises the jpd for sampling

1.
detect dependencies

2.
estimate conditional probabilities from the current population

3.
sample jpd to get new chromosomes
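
As an illustration of step 3, a rough Python sketch of ancestral sampling from a chain-structured (bivariate) model; the chain structure and probability tables are invented for the example.

import random

def sample_chain(p_first, p_cond, length):
    # ancestral sampling from a chain-structured model
    # p_first        : P(x1 = 1)
    # p_cond[parent] : P(x_i = 1 | x_{i-1} = parent), same table reused for every edge
    bits = [int(random.random() < p_first)]
    for _ in range(length - 1):
        parent = bits[-1]
        bits.append(int(random.random() < p_cond[parent]))
    return bits

# invented example tables: a strong tendency for neighbouring bits to agree
p_cond = {0: 0.1, 1: 0.9}          # P(x_i = 1 | x_{i-1})
new_chromosome = sample_chain(p_first=0.5, p_cond=p_cond, length=6)
print(new_chromosome)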

univariate model

(diagram: variables x1 ... x6 as unconnected nodes, i.e. a fully factorised model)

bivariate model

(diagram: variables x1 ... x6 connected by pairwise dependency edges)

multivariate model

(diagram: variables x1 ... x6 with higher-order dependencies among them)

Bayesian Networks in EDAs


univariate


PBIL, UMDA


bivariate


MIMIC, COMIT, BMDA


multivariate


EBNA, BOA, LFDA

successes of EDAs


improved performance on test problems
where linkage deceives a GA


hierarchy of algorithms appropriate to
problems of varying difficulty


linkage learning may represent human-understandable knowledge


beginning to be used in real-world applications

problems with EDAs


computational cost


increases exponentially with linkage level


appropriate selection of probabilistic model


greedy algorithms currently used for
multivariate models


sampling bias