Genetic Algorithm

cathamAI and Robotics

Oct 23, 2013 (3 years and 9 months ago)

Genetic algorithms

Genetic Algorithms in a slide


Premise


Evolution worked once (it produced us!), so it might work
again


Basics


Pool of solutions



Mate existing solutions to produce new solutions



Mutate current solutions for long-term diversity



Cull population



Originator


John Holland



Seminal work


Adaptation in Natural and Artificial Systems introduced
main GA concepts, 1975


Introduction


Computing pioneers (especially in AI) looked to natural
systems as guiding metaphors



Evolutionary computation


Any biologically-motivated computing activity simulating
natural evolution



Genetic Algorithms are one form of this activity



Original goals


Formal study of the phenomenon of adaptation


John Holland



An optimization tool for engineering problems

Main idea


Take a population of candidate solutions to a given problem



Use operators inspired by the mechanisms of natural genetic
variation



Apply selective pressure toward certain properties



Evolve a more fit solution

Why evolution as a metaphor



Ability to efficiently guide a search through a large solution
space



Ability to adapt solutions to changing environments



“Emergent” behavior is the goal



“The hoped-for emergent behavior is the design of
high-quality solutions to difficult problems and the
ability to adapt these solutions in the face of a
changing environment”


Melanie Mitchell, An Introduction to Genetic Algorithms


Evolutionary terminology



Abstractions imported from biology


Chromosomes, Genes, Alleles


Fitness, Selection


Crossover, Mutation

GA terminology


In the spirit, but not the letter, of biology


GA chromosomes are strings of genes


Each gene has a number of alleles; i.e., settings



Each chromosome is an encoding of a solution to a
problem



A population of such chromosomes is operated on by a GA

Encoding


A data structure for representing candidate solutions


Often takes the form of a bit string



Usually has internal structure (i.e., different parts of the
string represent different aspects of the solution)
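
To make the idea of internal structure concrete, here is a minimal sketch. It assumes a hypothetical 8-bit chromosome whose first four bits encode one integer parameter and whose last four bits encode another; the layout and the name `decode` are illustrative and do not come from the slides.

```python
def decode(chromosome: str) -> tuple[int, int]:
    """Split an 8-bit chromosome into two 4-bit unsigned integer fields."""
    if len(chromosome) != 8 or set(chromosome) - {"0", "1"}:
        raise ValueError("expected a string of exactly 8 bits")
    x = int(chromosome[:4], 2)  # first gene: bits 0-3 (assumed layout)
    y = int(chromosome[4:], 2)  # second gene: bits 4-7 (assumed layout)
    return x, y


print(decode("11010010"))  # (13, 2)
```

Each 4-bit field plays the role of a gene, and each of its 16 possible values is an allele.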

Crossover


Mimics biological recombination


Some portion of genetic material is swapped between
chromosomes


Typically the swapping produces an offspring



Mechanism for the dissemination of “building blocks”
(schemas)
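
One common way to realize this swap is one-point crossover on equal-length bit strings. The sketch below is an illustration under that assumption; the function name and the choice to return both offspring are not taken from the slides.

```python
import random


def one_point_crossover(parent_a: str, parent_b: str, rng: random.Random) -> tuple[str, str]:
    """Swap the tails of two equal-length strings at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    cut = rng.randint(1, len(parent_a) - 1)  # cut falls strictly inside the string
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


rng = random.Random(0)
print(one_point_crossover("11111111", "00000000", rng))
```

Any run of bits that lies entirely on one side of the cut is passed to a child intact, which is how short building blocks get disseminated.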

Mutation


Selects a random locus (gene location) with some
probability and alters the allele at that locus



The intuitive mechanism for the preservation of variety in the
population
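
A minimal sketch of mutation on a bit string follows. It assumes the common per-locus variant, in which every bit is flipped independently with a small probability `rate`; the slides do not say which variant is intended, so treat this as one possible reading.

```python
import random


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit for bit in chromosome)


rng = random.Random(0)
print(mutate("1010101010", 0.1, rng))
```

With `rate` set to 0 the string is returned unchanged, and with `rate` set to 1 every bit is flipped.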

Fitness


A measure of the goodness of the organism



Expressed as the probability that the organism will live
another cycle (generation)



Basis for the natural selection simulation


Organisms are selected to mate with probabilities
proportional to their fitness



Probabilistically better solutions have a better chance of
conferring their building blocks to the next generation (cycle)
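
Selection with probability proportional to fitness is often pictured as a roulette wheel. The sketch below is an illustrative implementation under the assumption that fitness values are non-negative and not all zero; the name `select_parent` is hypothetical.

```python
import random


def select_parent(population, fitnesses, rng: random.Random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitnesses must be non-negative and not all zero")
    spin = rng.random() * total  # a point in [0, total) on the wheel
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(1)
picks = [select_parent(["x", "y"], [1.0, 3.0], rng) for _ in range(10000)]
print(picks.count("y") / len(picks))  # close to 0.75
```

An individual with three times the fitness of another is chosen about three times as often, but the weaker one still has a chance to contribute.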


A Simple GA


Generate initial population
do
    Calculate the fitness of each member
    // simulate another generation
    do
        Select parents from current population
        Perform crossover; add offspring to the new population
    while new population is not full
    Merge new population into the current population
    Mutate current population
while not converged
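
The loop above can be sketched in Python as follows. Several details are assumptions made only for illustration: the toy OneMax objective (count the 1 bits), the parameter defaults, the rule that "merge" keeps the fittest `pop_size` members of old plus new, and "converged" meaning the all-ones string was found.

```python
import random


def fitness(bits):
    """OneMax toy objective (an assumption for this sketch): count the 1 bits."""
    return sum(bits)


def select(population, scores, rng):
    """Fitness-proportionate pick; the +1 keeps all-zero strings selectable."""
    return rng.choices(population, weights=[s + 1 for s in scores], k=1)[0]


def simple_ga(n_bits=20, pop_size=30, mutation_rate=0.01, max_generations=200, seed=0):
    rng = random.Random(seed)
    # Generate initial population
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) == n_bits:  # "converged": the optimum was found
            break
        # Calculate the fitness of each member
        scores = [fitness(member) for member in population]
        # Fill a new population with offspring of selected parents
        offspring = []
        while len(offspring) < pop_size:
            mother = select(population, scores, rng)
            father = select(population, scores, rng)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            offspring.append(mother[:cut] + father[cut:])
        # Merge, then cull back to pop_size (the culling rule is an assumption)
        merged = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        # Mutate current population by flipping bits with a small probability
        population = [[1 - b if rng.random() < mutation_rate else b for b in member]
                      for member in merged]
        best = max(population + [best], key=fitness)
    return best


result = simple_ga()
print(fitness(result), result)
```

Tracking `best` separately means a good solution is not lost if mutation damages it in the population.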

How do GAs work


The structure of a GA is relatively simple to comprehend, but
the dynamic behavior is complex



Holland has done significant work on the theoretical
foundations of GAs



“GAs work by discovering, emphasizing, and recombining
good ‘building blocks’ of solutions in a highly parallel fashion.”


Melanie Mitchell, paraphrasing John Holland



Using formalism


Notion of a building block is formalized as a schema


Schemas are propagated or destroyed according to the
laws of probability

Schema


A template, much like a regular expression, describing a set
of strings



The set of strings represented by a given schema
characterizes a set of candidate solutions sharing a property



This property is the encoded equivalent of a building block

Example


0 or 1 represents a fixed bit



Asterisk represents a “don’t care”



11****00 is the set of all solutions encoded in 8 bits,
beginning with two ones and ending with two zeros



Solutions in this set all share the same variants of the
properties encoded at these loci
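
A schema can be checked against a bit string position by position. This is a small illustrative sketch; the helper name `matches` is not from the slides.

```python
from itertools import product


def matches(schema: str, bits: str) -> bool:
    """True if `bits` is an instance of `schema` ('*' matches either bit)."""
    return len(schema) == len(bits) and all(s in ("*", b) for s, b in zip(schema, bits))


print(matches("11****00", "11010100"))  # True
print(matches("11****00", "10010100"))  # False

# Four "don't care" positions give 2**4 instances among all 8-bit strings.
all_strings = ("".join(p) for p in product("01", repeat=8))
print(sum(matches("11****00", s) for s in all_strings))  # 16
```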

Schema qualifiers


Length


The inclusive distance between the two fixed bits in a schema
which are furthest apart (the defining length of the
previous example is 8; under the more common convention of
last fixed position minus first, it is 7)



Order


The number of fixed bits in a schema (the order of the
previous example is 4)
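
Both qualifiers are easy to compute from the schema string. The sketch below follows the inclusive convention used on this slide; many texts instead define defining length as the last fixed position minus the first, which is one less. The function names are illustrative.

```python
def order(schema: str) -> int:
    """Number of fixed (non-'*') positions in the schema."""
    return sum(symbol != "*" for symbol in schema)


def inclusive_length(schema: str) -> int:
    """Inclusive distance between the outermost fixed positions (0 if none)."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != "*"]
    return fixed[-1] - fixed[0] + 1 if fixed else 0


print(order("11****00"))             # 4
print(inclusive_length("11****00"))  # 8
```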


Not just sum of the parts


GAs explicitly evaluate and operate on whole solutions



GAs implicitly evaluate and operate on building blocks


Existing schemas may be destroyed or weakened by
crossover


New schemas may be spliced together from existing
schemas



Crossover includes no notion of a schema, only of the
chromosomes



Why do they work


Schemas can be destroyed or conserved



So how are good schemas propagated through generations?


Conserved (good) schemas confer higher fitness on the
offspring inheriting them



Fitter offspring are probabilistically more likely to be
chosen to reproduce

Approximating schema dynamics


Let H be a schema with at least one instance present in the
population at time t



Let m(H, t) be the number of instances of H at time t



Let x be an instance of H and f(x) be its fitness



The expected number of offspring of x is f(x)/f(pop), where
f(pop) is the average fitness of the population (by
fitness proportionate selection)



To know E(m(H, t +1)) (the expected number of instances of
schema H at the next time unit), sum f(x)/f(pop) for all x in H


GA never explicitly calculates the average fitness of a
schema, but schema proliferation depends on its value
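
The sum described above can be sketched directly. The code reads f(pop) as the average fitness of the current population and uses a toy population and fitness function chosen only for illustration; crossover and mutation are ignored at this stage.

```python
def matches(schema, bits):
    """True if `bits` is an instance of `schema` ('*' matches either bit)."""
    return len(schema) == len(bits) and all(s in ("*", b) for s, b in zip(schema, bits))


def expected_instances(population, fitness, schema):
    """E(m(H, t+1)) under fitness-proportionate selection alone."""
    mean_fitness = sum(fitness(x) for x in population) / len(population)
    if mean_fitness == 0:
        raise ValueError("average fitness must be positive")
    return sum(fitness(x) / mean_fitness for x in population if matches(schema, x))


population = ["1100", "1010", "0011", "0000"]
count_ones = lambda bits: bits.count("1")  # toy fitness, assumed for the example
print(expected_instances(population, count_ones, "1***"))  # about 2.67
```

Here the schema `1***` has two instances, each with fitness 2 against a population average of 1.5, so it is expected to grow from 2 instances to about 2.67.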


Approximating schema dynamics


Approximation can be refined by taking into account the
operators



Schemas of long defining length are less likely to survive
crossover


Offspring are less likely to be instances of such
schemas



Schemas of higher order are less likely to survive
mutation



Effects can be used to bound the approximate rates at
which schemas proliferate
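
A common textbook form of this bound multiplies the selection term by the chance of surviving one-point crossover and per-bit mutation. The sketch below assumes that form. Its crossover term uses the span between the outermost fixed positions (last minus first), which is one less than the inclusive length defined earlier, and the parameter names are illustrative.

```python
def schema_lower_bound(m, schema_mean_fitness, pop_mean_fitness,
                       schema, p_crossover, p_mutation):
    """Lower bound on the expected number of instances of `schema` next generation."""
    fixed = [i for i, symbol in enumerate(schema) if symbol != "*"]
    span = fixed[-1] - fixed[0] if fixed else 0  # last fixed position minus first
    schema_order = len(fixed)
    # One-point crossover can only disrupt the schema if the cut falls inside the span.
    survive_crossover = 1.0 - p_crossover * span / (len(schema) - 1)
    # Every fixed bit must escape mutation.
    survive_mutation = (1.0 - p_mutation) ** schema_order
    growth = schema_mean_fitness / pop_mean_fitness
    return m * growth * survive_crossover * survive_mutation


long_schema = schema_lower_bound(10, 1.2, 1.0, "11****00", 0.7, 0.001)
short_schema = schema_lower_bound(10, 1.2, 1.0, "11******", 0.7, 0.001)
print(long_schema, short_schema)
```

With the same fitness advantage, the short, low-order schema keeps most of its expected growth while the long one loses most of it to crossover.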

Implications


Instances of short, low-order schemas whose average fitness
tends to stay above the mean will increase exponentially



Changing the semantics of the operators can change the
selective pressures toward different types of schemas

Theoretical Foundations


Empirical observation


GAs can work



Goal


Learn how to best use the tool



Strategy


Understand the dynamics of the model


Develop performance metrics in order to quantify success

Theoretical Foundations


Issues surrounding the dynamics of the model


What laws characterize the macroscopic behavior of GAs?



How do microscopic events give rise to this macroscopic
behavior?


Theoretical Foundation


Holland’s motivation


Construct a theoretical framework for adaptive systems as
seen in nature


Apply this framework to the design of artificial adaptive
systems



Issues in performance evaluation


According to what criteria should GAs be evaluated?


What does it mean for a GA to do well or poorly?


Under what conditions is a GA an appropriate solution
strategy for a problem?



Theoretical Foundation


Holland’s observations


An adaptive system must persistently identify, test, and
incorporate structural properties hypothesized to give
better performance in some environment



Adaptation is impossible in a sufficiently random
environment

Theoretical Foundation


Holland’s intuition


A GA is capable of modeling the necessary tasks in an
adaptive system



It does so through a combination of explicit computation
and implicit estimation of state combined with incremental
change of state in directions motivated by these
calculations

Theoretical Foundation


Holland’s assertion


The ‘identify and test’ requirement is satisfied by the
calculation of the fitnesses of various schemas



The ‘incorporate’ requirement is satisfied by implication of
the Schema Theorem


Theoretical Foundation


How does a GA identify and test properties?


A schema is the formalization of a property


A GA explicitly calculates fitnesses of individuals and
thereby schemas in the population


It implicitly estimates fitnesses of hypothetical individuals
sharing known schemas


In this way it efficiently manages information regarding
the entire search space


Theoretical Foundation


How does a GA incorporate observed good properties into the
population?


Implication of the Schema Theorem


Short, low-order, higher than average fitness schemas
will receive exponentially increasing numbers of
samples over time

Theoretical Foundation


Lemmas to the Schema Theorem


Selection focuses the search


Crossover combines good schemas


Mutation is the insurance policy

Theoretical Foundation


Holland’s characterization


Adaptation in natural systems is framed by a tension
between exploration and exploitation


Any move toward the testing of previously unseen
schemas or of those with instances of low fitness takes
away from the wholesale incorporation of known high
fitness schemas


But without exploration, schemas of even higher fitness
cannot be discovered

Theoretical Foundation


Goal of Holland’s first offering


The original GA was proposed as an “adaptive plan” for
accomplishing a proper balance between exploration and
exploitation

Theoretical Foundation


GA does in fact model this


Given certain assumptions, the balance is achieved


A key assumption is that the observed and actual
fitnesses of schemas are correlated


This assumption creates a stumbling block to which we
will return

Traveling Salesperson Problem

Initial Population for TSP

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(4,3,6,2,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

Select Parents

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(4,3,6,2,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

Try to pick the better ones.

Create Offspring: 1-point crossover

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(4,3,6,2,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

(3,4,5,6,2)

Create More Offspring

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(4,3,6,2,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

(3,4,5,6,2)

(5,4,2,6,3)

Mutate

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(4,3,6,2,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

Mutate

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(2,3,6,4,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

(3,4,5,6,2)

(5,4,2,6,3)

Eliminate

(5,3,4,6,2)

(2,4,6,3,5)

(4,3,6,5,2)

(2,3,4,6,5)

(2,3,6,4,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

Tend to kill off the worst ones.

(3,4,5,6,2)

(5,4,2,6,3)

Integrate

(5,3,4,6,2)

(2,4,6,3,5)

(2,3,6,4,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

(3,4,5,6,2)

(5,4,2,6,3)

Restart

(5,3,4,6,2)

(2,4,6,3,5)

(2,3,6,4,5)

(3,4,5,2,6)

(3,5,4,6,2)

(4,5,3,6,2)

(5,4,2,3,6)

(4,6,3,2,5)

(3,4,2,6,5)

(3,6,5,1,4)

(3,4,5,6,2)

(5,4,2,6,3)
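
Two pieces of the walkthrough above lend themselves to a short sketch: scoring a tour, and the mutation step, which in the example turns (4,3,6,2,5) into (2,3,6,4,5) by exchanging two cities. The code assumes Euclidean distances and uses hypothetical city coordinates, since the slides give none. Note that plain one-point crossover on tours can duplicate or drop cities in general, so a real implementation would need a repair step or a permutation-aware operator.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour that visits `tour` in order and returns to the start."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:] + tour[:1]))


def swap_mutation(tour, rng):
    """Exchange the cities at two distinct random positions."""
    i, j = rng.sample(range(len(tour)), 2)
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return tuple(mutated)


# Hypothetical coordinates for cities 2..6, chosen only for the example.
coords = {2: (0.0, 0.0), 3: (2.0, 0.0), 4: (2.0, 1.0), 5: (1.0, 2.0), 6: (0.0, 1.0)}
tour = (4, 3, 6, 2, 5)
rng = random.Random(0)
print(tour_length(tour, coords))
print(swap_mutation(tour, rng))
```

Because shorter tours are better, a fitness function for selection would typically be a decreasing function of `tour_length`.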

Genetic Algorithms


Facts


Very robust but slow


Can make simulated annealing seem fast


In the limit, optimal

Other GA-TSP Possibilities


Ordinal Representation


Partially-Mapped Crossover


Edge Recombination Crossover



Problem


Operators are not sufficiently exploiting the proper
“building blocks” used to create new solutions.
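
For reference, here is a sketch of partially-mapped crossover in its usual textbook form. The slides only name the operator, so the details below (explicit cut points, a single child per call) are assumptions for illustration.

```python
def pmx(parent_a, parent_b, start, end):
    """Partially-mapped crossover: keep parent_a[start:end], fill the rest from parent_b."""
    child = [None] * len(parent_a)
    child[start:end] = parent_a[start:end]
    segment = set(parent_a[start:end])
    for i in list(range(start)) + list(range(end, len(parent_a))):
        city = parent_b[i]
        # If the city is already in the copied segment, follow the mapping
        # defined by the two parents until a free city is reached.
        while city in segment:
            city = parent_b[parent_a.index(city)]
        child[i] = city
    return child


a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx(a, b, 3, 7))  # [9, 3, 2, 4, 5, 6, 7, 1, 8]
```

The child is always a valid permutation, and it inherits a contiguous stretch of one parent's tour together with as many positions of the other parent as possible.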

Genetic Algorithms


Some ideas


Parallelism


Punctuated equilibria


Jump starting


Problem-specific information


Synthesize with simulated annealing


Perturbation operator

Heuristic H

Length(MST) < Length(T)

Let T be the optimal tour.

Heuristic H

Tour T’

Tour T’’

Perturbation of points

Perturbation of a Point

Mutation Operator

Points are perturbed according to a normal distribution centered
on the original location, with a standard deviation that is a
function of the original interpoint distances.
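
A minimal sketch of such a perturbation follows. The slides do not give the exact function of the interpoint distances, so the code assumes, purely for illustration, that each point's standard deviation is a fixed fraction `scale` of the distance to its nearest neighbor. At least two points are required.

```python
import math
import random


def perturb_points(points, scale, rng):
    """Jitter each point with Gaussian noise around its original location."""
    perturbed = []
    for i, (x, y) in enumerate(points):
        # Assumed rule: sigma is `scale` times the nearest-neighbor distance.
        nearest = min(math.dist((x, y), q) for j, q in enumerate(points) if j != i)
        sigma = scale * nearest
        perturbed.append((rng.gauss(x, sigma), rng.gauss(y, sigma)))
    return perturbed


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rng = random.Random(0)
print(perturb_points(points, 0.05, rng))
```

The heuristic H would then be run on the perturbed points, and the resulting visiting order applied to the original points to obtain a tour.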

Crossover Operator


Perturbed points tend to stay close to original locations, hence
distances remain reasonable.





Small shifts in point position can have an effect on the MST,
hence we see many different solutions.

Characteristics of Operators

% Improvement Over EC

Average Improvement: 32.1%

% Improvement Over H

Average Improvement: 15.1%