CSM6120 Introduction to Intelligent Systems


Oct 24, 2013

rkj@aber.ac.uk


Evolutionary and Genetic Algorithms

Informal biological terminology


Genes


Encoding rules that describe how an organism is built up from
the tiny building blocks of life



Chromosomes


Long strings formed by connecting genes together



Recombination


Process of two organisms mating, producing offspring that may
end up sharing genes of their parents

Basic ideas of EAs


An EA is an iterative procedure which evolves a
population of individuals


Each individual is a candidate solution to a given problem


Each individual is evaluated by a fitness function, which
measures the quality of its candidate solution



At each iteration (generation):


The best individuals are selected


Genetic operators are applied to selected individuals in order
to produce new individuals (offspring)


New individuals are evaluated by fitness function

Taxonomy

Search Techniques

Uninformed: BFS, DFS

Informed: A*, Hill Climbing, Simulated Annealing, Evolutionary Algorithms, Genetic Programming, Genetic Algorithms, Swarm Intelligence, Evolutionary Strategies

The Genetic Algorithm


Directed search algorithms based on the mechanics of
biological evolution



Developed by John Holland, University of Michigan
(1970s)


To understand the adaptive processes of natural systems


To design artificial systems software that retains the robustness
of natural systems



Provide efficient, effective techniques for optimization and
machine learning applications


Some GA applications

Application: function optimisation (1)

(Plots: axis ticks running from -1 to 1 and from 0 to 1)

$f(x) = x^2$

$g(x) = \sin(x) - 0.1x + 2$

$h(x, y) = x \sin(4\pi x) - y \sin(4\pi y + \pi) + 1$

Application: function optimisation (2)


Conventional approaches:


Often require knowledge of derivatives or other problem-specific
mathematical techniques



Evolutionary algorithm approach:


Requires only a measure of solution quality (fitness function)

Components of a GA

A problem to solve, and ...


Encoding technique (gene, chromosome)

Initialization procedure (creation)

Evaluation function (environment)

Selection of parents (reproduction)

Genetic operators (mutation, recombination)

Parameter settings (practice and art)


GA terminology


Population


The collection of potential solutions (i.e. all the chromosomes)



Parents/Children


Both are chromosomes


Children are generated from the parent chromosomes



Generations


Number of iterations/cycles through the GA process

Simple GA


initialize population;

evaluate population;


while TerminationCriteriaNotSatisfied

{

select parents for reproduction;

perform recombination and mutation;

evaluate population;

}
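The loop above can be sketched in Python. This is only a minimal illustration under stated assumptions: the bit-string encoding, the `one_max` fitness (count of 1s), tournament selection of size 2, single-point crossover, and all parameter values are hypothetical choices, not something the slides prescribe.

```python
import random

def one_max(bits):
    """Hypothetical fitness function: the number of 1s in the bit string."""
    return sum(bits)

def simple_ga(fitness, n_bits=20, pop_size=30, generations=50,
              crossover_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # initialize population; evaluate population
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination: fixed number of generations
        children = []
        while len(children) < pop_size:
            # select parents for reproduction (assumed: tournaments of size 2)
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # perform recombination (assumed: single-point crossover) ...
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # ... and mutation (independent bit flips)
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        population = children  # generational replacement
        # evaluate population, remembering the best individual seen so far
        best = max(population + [best], key=fitness)
    return best

best = simple_ga(one_max)
```

Only the `fitness` argument ties the loop to a particular problem; swapping it changes what is being optimised without touching the rest.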


The GA cycle

(Diagram: population -> selection -> parents -> recombination -> children -> modification -> modified children -> evaluation -> evaluated children -> population; deleted members are discarded)

Population

Chromosomes could be:


Bit strings (0101 ... 1100)


Real numbers (43.2 -33.1 ... 0.0 89.2)


Permutations of elements (E11 E3 E7 ... E1 E15)


Lists of rules (R1 R2 R3 ... R22 R23)


Program elements (genetic programming)


... any data structure ...


An individual can be represented using discrete values
(binary, integer, or any other system with a discrete set of
values)



The following is an example of binary representation:


CHROMOSOME

GENE

Example: Discrete representation

Genotype (8 bits): 1 0 1 0 0 0 1 1

Phenotype:



Integer



Real Number



Schedule



...



Anything?

Example: Discrete representation

Phenotype could be integer numbers

Genotype: 1 0 1 0 0 0 1 1

Phenotype:

$1 \cdot 2^7 + 0 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 128 + 32 + 2 + 1 = 163$
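As a small sketch of this decoding, the function below turns a bit list into an integer, reading the most significant bit first. The name `bits_to_int` is made up for illustration; the bit string is the one whose weights appear in the sum above (1 0 1 0 0 0 1 1).

```python
def bits_to_int(bits):
    """Decode a list of 0/1 genes into an integer (most significant bit first)."""
    value = 0
    for bit in bits:
        value = value * 2 + bit  # shift left by one place, then add the new bit
    return value

genotype = [1, 0, 1, 0, 0, 0, 1, 1]
phenotype = bits_to_int(genotype)  # 128 + 32 + 2 + 1 = 163
```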

Example: Discrete representation

Phenotype could be real numbers

e.g. a number between 2.5 and 20.5 using 8 binary digits

Genotype: 1 0 1 0 0 0 1 1

Phenotype: $x = 2.5 + \frac{163}{256}(20.5 - 2.5) = 13.9609$
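The value 13.9609 is consistent with scaling the integer 163 by 2^8 = 256 and mapping it onto the interval [2.5, 20.5]. The sketch below assumes that mapping; `bits_to_real` is a hypothetical helper name. Dividing by 2^n means the upper bound is never reached exactly; dividing by 2^n - 1 is a common alternative, but it would not reproduce the slide's number.

```python
def bits_to_real(bits, low, high):
    """Map a bit list onto [low, high) by scaling its integer value by 2**len(bits)."""
    n = int("".join(str(b) for b in bits), 2)  # e.g. 10100011 -> 163
    return low + n / (2 ** len(bits)) * (high - low)

x = bits_to_real([1, 0, 1, 0, 0, 0, 1, 1], 2.5, 20.5)
print(round(x, 4))  # 13.9609
```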

Example: Discrete representation

Phenotype could be a schedule

e.g. 8 jobs, 2 time steps

Genotype: 1 0 1 0 0 0 1 1

Phenotype:

Job        1 2 3 4 5 6 7 8
Time Step  2 1 2 1 1 1 2 2



Example: Real-valued representation

A very natural encoding: if the solution we are looking
for is a list of real-valued numbers, then encode it as a list
of real-valued numbers (i.e. not as a string of 1s and 0s)!

Lots of applications, e.g. parameter optimisation

Representation


Task: how to represent the travelling salesman problem (TSP)?

Find a tour of a given set of cities so that

Each city is visited only once

The total distance travelled is minimised


Representation

One possibility: an ordered list of city numbers (this is known as an order-based GA)

1) London 3) Dunedin 5) Beijing 7) Tokyo

2) Venice 4) Singapore 6) Phoenix 8) Victoria

Chromosome 1: (3 5 7 2 1 6 4 8)

Chromosome 2: (2 5 7 6 8 1 3 4)

Selection

(Diagram: population -> selection -> parents)

Selection


Need to choose which chromosomes to use based on
their ‘fitness’


Why not choose the best chromosomes?



We want a balance between exploration and exploitation


Roulette wheel selection
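Roulette wheel selection is usually understood as fitness-proportionate selection: each individual gets a slice of the wheel proportional to its fitness. The sketch below follows that common definition and assumes maximisation with non-negative fitness values that are not all zero; the function name and signature are illustrative only.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)  # where the wheel stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit  # right-hand edge of this individual's slice
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(1)
draws = [roulette_wheel_select(["a", "b"], [1.0, 9.0], rng) for _ in range(2000)]
```

With fitnesses 1 and 9, "b" should be drawn roughly nine times as often as "a", which illustrates how one very fit individual can dominate selection.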


Rank-based selection


1st step


Sort (rank) individuals according to fitness


Ascending or descending order (minimization or maximization)



2nd step


Select individuals with probability proportional to their rank only
(ignoring the fitness value)


The better the rank, the higher the probability of being selected



It avoids most of the problems associated with roulette-wheel
selection, but still requires global sorting of individuals,
reducing potential for parallel processing
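The two steps can be sketched as follows. The linear weighting (worst rank gets weight 1, best gets weight N) is an assumption made for illustration; the slides only say that probability depends on rank. Maximisation is assumed, and `rank_select` is a hypothetical name.

```python
import random

def rank_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its rank."""
    # 1st step: sort indices from worst to best (maximisation assumed)
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    # 2nd step: weights are the ranks 1..N, ignoring the fitness values themselves
    ranks = list(range(1, len(population) + 1))
    chosen = rng.choices(order, weights=ranks, k=1)[0]
    return population[chosen]

rng = random.Random(0)
names = ["low", "mid", "high"]
draws = [rank_select(names, [1.0, 2.0, 1000.0], rng) for _ in range(6000)]
```

Even though "high" has a fitness 500 times larger than "mid", it is selected only about half the time (weight 3 out of 6), because only the ordering matters.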

Tournament selection


A number of “tournaments” are run


Several chromosomes chosen at random


The chromosome with the highest fitness is selected each time


Larger tournament size means that weak chromosomes are
less likely to be selected



Advantages


It is efficient to code


It works on parallel architectures
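A minimal sketch of a single tournament, assuming maximisation and sampling without replacement; the tournament size `k` and the function name are illustrative choices. Each call needs only `k` fitness values, with no global sort, which is why tournaments parallelise easily.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Run one tournament: sample k individuals and return the fittest."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

rng = random.Random(0)
population = list(range(10))  # toy individuals whose fitness is their own value
winner = tournament_select(population, lambda x: x, k=10, rng=rng)
```

When `k` equals the population size the best individual always wins; with `k=1` selection is purely random.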


The GA cycle

(Diagram: the GA cycle, as before: population -> selection -> parents -> recombination -> children -> modification -> modified children -> evaluation -> evaluated children -> population; deleted members are discarded)

Crossover: recombination


P1 (0 1 1 0 1 0 1 1)    (1 1 0 1 1 0 1 1) C1

P2 (1 1 0 1 1 0 0 1)    (0 1 1 0 1 0 0 1) C2



Crossover is a critical feature of GAs:


It greatly accelerates search early in evolution of a population


It leads to effective combination of sub-solutions on different
chromosomes


Several methods for crossover exist…
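One of the simplest such methods is single-point crossover. The children shown for P1 and P2 are consistent with a single cut after the sixth gene, so the sketch below uses that cut point as an assumption; the function name is illustrative.

```python
def single_point_crossover(p1, p2, point):
    """Swap the tails of two equal-length parents after position `point`."""
    c1 = p1[:point] + p2[point:]
    c2 = p2[:point] + p1[point:]
    return c1, c2

p1 = [0, 1, 1, 0, 1, 0, 1, 1]
p2 = [1, 1, 0, 1, 1, 0, 0, 1]
c1, c2 = single_point_crossover(p1, p2, 6)
```

The two results are the same pair of children as on the slide, although which one is labelled C1 and which C2 is swapped here.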

Crossover


How would we implement crossover for TSPs?



Parent 1

(3 5 7 2 1 6 4 8)

Parent 2

(2 5 7 6 8 1 3 4)

Crossover


Parent 1

(3 5 7 2 1 6 4 8)

Parent 2

(2 5 7 6 8 1 3 4)


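Child 1 (3 5 7 6 8 1 3 4)

Child 2 (2 5 7 2 1 6 4 8)

Simply swapping the tails after the third city produces invalid tours: Child 1 visits city 3 twice and never visits city 2; Child 2 visits city 2 twice and never visits city 3.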
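One common way around this, not taken from the slides, is an order-preserving crossover: copy a slice from one parent, then fill the remaining positions with the missing cities in the order they appear in the other parent. The sketch below is a simplified variant of what is often called order crossover; the function name and the choice of slice are assumptions.

```python
def order_preserving_crossover(p1, p2, start, end):
    """Keep p1[start:end] in place and fill the rest in p2's city order."""
    child = [None] * len(p1)
    child[start:end] = p1[start:end]
    kept = set(p1[start:end])
    # cities still missing, in the order they occur in the second parent
    remaining = iter(city for city in p2 if city not in kept)
    for i in range(len(child)):
        if child[i] is None:
            child[i] = next(remaining)
    return child

parent1 = [3, 5, 7, 2, 1, 6, 4, 8]
parent2 = [2, 5, 7, 6, 8, 1, 3, 4]
child = order_preserving_crossover(parent1, parent2, 0, 3)
```

Here the child keeps (3 5 7) from Parent 1 and takes 2, 6, 8, 1, 4 in Parent 2's order, so every city still appears exactly once.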





Mutation: local modification

Before: (1 0 1 1 0 1 1 0)

After: (0 1 1 0 0 1 1 0)

Before: (1.38 -69.4 326.44 0.1)

After: (1.38 -67.5 326.44 0.1)

Causes movement in the search space (local or global)


Restores lost information to the population
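Both kinds of mutation can be sketched in a few lines. The per-gene mutation `rate` and the Gaussian step size `sigma` are assumed parameters, and adding Gaussian noise is just one common choice for real-valued genes; the slides do not specify how the real-valued example was perturbed.

```python
import random

def bit_flip_mutation(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def gaussian_mutation(values, rate, sigma, rng=random):
    """Add zero-mean Gaussian noise to each gene with probability `rate`."""
    return [v + rng.gauss(0.0, sigma) if rng.random() < rate else v
            for v in values]

rng = random.Random(0)
mutated_bits = bit_flip_mutation([1, 0, 1, 1, 0, 1, 1, 0], rate=0.1, rng=rng)
mutated_reals = gaussian_mutation([1.38, -69.4, 326.44, 0.1], rate=0.25,
                                  sigma=2.0, rng=rng)
```

A small `sigma` gives local moves in the search space; a large one allows occasional global jumps.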

Mutation


Given the representation for TSPs, how could we achieve
mutation?

Mutation involves reordering of the list:

Before: (5 8 7 2 1 6 3 4)

After: (5 8 6 2 1 7 3 4)

(* the cities in the third and sixth positions, 7 and 6, are swapped)
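A swap of two positions, as in the Before/After pair, can be sketched like this. The function name is illustrative; the positions are chosen at random unless given explicitly (indices are 0-based, so the example swaps indices 2 and 5).

```python
import random

def swap_mutation(tour, i=None, j=None, rng=random):
    """Return a copy of the tour with the cities at positions i and j swapped."""
    if i is None or j is None:
        i, j = rng.sample(range(len(tour)), 2)  # two distinct random positions
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

before = [5, 8, 7, 2, 1, 6, 3, 4]
after = swap_mutation(before, 2, 5)
```

Because cities are only exchanged, never overwritten, the result is always still a valid tour.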

Mutation

Note


Both mutation and crossover are applied based on user-supplied probabilities

We usually use a fairly high crossover rate and fairly low mutation rate


Why do you think this is?

Evaluation of fitness






The evaluator decodes a chromosome and assigns it a fitness
measure



The evaluator is the only link between a classical GA and the
problem it is solving

(Diagram: modified children -> evaluation -> evaluated children)

Fitness functions


Evaluate the ‘goodness’ of chromosomes


(How well they solve the problem)



Critical to the success of the GA



Often difficult to define well



Must be fairly fast, as each chromosome must be
evaluated each generation (iteration)

Fitness functions


Fitness function for the TSP?


(3 5 7 2 1 6 4 8)



As we’re minimizing the distance travelled, the fitness is
the total distance travelled in the journey defined by the
chromosome
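A sketch of such a fitness function is shown below. The city coordinates are invented for illustration, straight-line distance is assumed, and whether the salesman returns to the starting city is not stated on the slide, so it is exposed as the `closed` parameter. Lower values are better.

```python
import math

# Hypothetical coordinates for four cities (not taken from the slides)
coords = {1: (0, 0), 2: (3, 0), 3: (3, 4), 4: (0, 4)}

def tour_length(tour, coords, closed=True):
    """Total Euclidean distance travelled when visiting cities in tour order."""
    legs = list(zip(tour, tour[1:]))
    if closed:
        legs.append((tour[-1], tour[0]))  # return to the starting city
    return sum(math.dist(coords[a], coords[b]) for a, b in legs)

around_the_edge = tour_length([1, 2, 3, 4], coords)  # 3 + 4 + 3 + 4
criss_cross = tour_length([1, 3, 2, 4], coords)      # 5 + 4 + 5 + 4
```

The tour around the rectangle's edge is shorter than the one that crosses the diagonals, so a minimising GA would prefer it.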

Deletion








Generational GA: entire population replaced with each iteration

Steady-state GA: a few members replaced each generation

(Diagram: population -> discard -> deleted members)
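The two deletion policies can be contrasted in a short sketch. Replacing the worst members is only one possible steady-state policy (others replace random or oldest members); maximisation is assumed, there are assumed to be no more children than population members, and the function names are illustrative.

```python
def generational_replacement(population, children):
    """Discard the whole old population; the children become the new one."""
    return list(children)

def steady_state_replacement(population, children, fitness):
    """Discard only the worst members, making room for a few children."""
    survivors = sorted(population, key=fitness, reverse=True)
    survivors = survivors[:len(population) - len(children)]
    return survivors + list(children)

old = [5, 1, 9, 3]  # toy individuals whose fitness is their own value
new_generational = generational_replacement(old, [7, 8, 2, 6])
new_steady_state = steady_state_replacement(old, [7, 8], fitness=lambda x: x)
```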

The GA cycle

(Diagram: the GA cycle, as before: population -> selection -> parents -> recombination -> children -> modification -> modified children -> evaluation -> evaluated children -> population; deleted members are discarded)

Stopping!


The GA cycle continues until


The system has ‘converged’; or


A specified number of iterations (‘generations’) has been
performed

An abstract example

Distribution of Individuals in Generation 0

Distribution of Individuals in Generation N


Good demo of the GA components


http://www.obitko.com/tutorials/genetic-algorithms/example-function-minimum.php

TSP example: 30 cities

Overview of performance

Example: n-queens

Put n queens on an n × n board with no two queens on
the same row, column, or diagonal
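One way a GA could score a candidate board is to count the pairs of queens that attack each other, with 0 meaning a solution. The sketch assumes an encoding the slides do not specify: one queen per column, with `queens[c]` giving that queen's row, so column clashes are impossible by construction.

```python
from itertools import combinations

def attacking_pairs(queens):
    """Count pairs of queens sharing a row or a diagonal.

    queens[c] is the row of the queen in column c (assumed encoding).
    """
    conflicts = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(queens), 2):
        same_row = r1 == r2
        same_diagonal = abs(r1 - r2) == abs(c1 - c2)
        if same_row or same_diagonal:
            conflicts += 1
    return conflicts

solved = attacking_pairs([1, 3, 0, 2])       # a valid 4-queens placement
all_diagonal = attacking_pairs([0, 1, 2, 3])  # every pair shares the main diagonal
```

Since fewer conflicts is better, this is a cost to minimise; if the chromosome is kept as a permutation of the rows, row clashes disappear too and only diagonals need checking.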

Examples


Eaters


http://math.hws.edu/xJava/GA/



TSP


http://www.heatonresearch.com/articles/65/page1.html


http://www.ads.tuwien.ac.at/raidl/tspga/TSPGA.html




Exercise: The Card Problem


You have 10 cards numbered from 1 to 10. You have to choose a way of
dividing them into 2 piles, so that the cards in Pile0 *sum* to a number as
close as possible to 36, and the remaining cards in Pile1 *multiply* to a
number as close as possible to 360



Encoding


Each card can be in Pile0 or Pile1, so there are 2^10 = 1024 possible ways of
sorting them into 2 piles, and you have to find the best. Think of a
sensible way of encoding any possible solution.



Fitness


Some of these chromosomes will be closer to the target than others.
Think of a sensible way of evaluating any chromosome and scoring it
with a fitness measure.


Issues for GA practitioners


Choosing basic implementation issues:


Representation


Population size, mutation rate, ...


Selection, deletion policies


Crossover, mutation operators



Termination criteria


Performance, scalability


Solution is only as good as the fitness function (often hardest
part)



Your assignment will be to code a GA for a given task! Be
aware of the above issues…


Benefits of GAs

Concept is easy to understand

Supports multi-objective optimization

Good for “noisy” environments

Always an answer; answer gets better with time

Inherently parallel; easily distributed