Genetic Algorithms - International Hellenic University


Oct 23, 2013 (3 years and 9 months ago)


Kokkoras F. | Paraskevopoulos K.



Exercise: Genetic Algorithms (GAs)

In this Exercise

1. Theory (in brief)
2. Things you have to consider
3. GAs and Matlab
4. Part #1 (fitness function, variables, representation, plots)
5. Part #2 (population diversity, size, range, fitness scaling)
6. Part #3 (selection, elitism, mutation)
7. Part #4 (global vs. local minima)

Duration: 120 min


1. Theory (in brief) (5 min)

A Genetic Algorithm (GA) is an optimization technique based on the theory of evolution. Instead of searching for a solution to a problem in the "state space" (as traditional search algorithms do), a GA works in the "solution space" and builds (or better, "breeds") new, hopefully better solutions based on existing ones.


The general idea behind GAs is that we can build a better solution if we somehow combine the "good" parts of other solutions (schemata theory), just as nature does by combining the DNA of living beings. The overall idea of a GA is depicted in Figure 1 (refer to the theory for the details).


Figure 1: The outline of a Genetic Algorithm

2. Things you have to consider (and be aware of) (5 min)

The first thing you must do in order to use a GA is to decide whether it is possible to automatically build solutions to your problem. For example, in the Traveling Salesman Problem, every route that passes through the cities in question is potentially a solution, although probably not the optimal one. You must be able to do this because a GA requires an initial population P of solutions.



Then you must decide what "gene" representation you will use. You have a few alternatives, such as binary, integer, double, permutation, etc., with binary and double being the most commonly used since they are the most flexible.

After selecting the representation you must decide, in order:

• the method to select parents from the population P (Cost Roulette Wheel, Stochastic Universal Sampling, Rank Roulette Wheel, Tournament Selection, etc.)
• the way these parents will "mate" to create descendants (too many methods to mention here; just note that your available options depend on the representation decided earlier)
• the mutation method (optional but useful; again, the options are representation dependent)
• the method you will use to populate the next generation (P_{i+1}) (age based, quality based, etc.; you will probably use elitism as well)
• the algorithm's termination condition (number of generations, time limit, acceptable quality threshold, improvement stall, etc.; a combination of these is commonly used)
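The decisions listed above can be seen side by side in a minimal Python sketch of a real-valued GA. This is not Matlab's solver: the function name `run_ga`, the tournament selection, the per-gene parent choice and the Gaussian mutation with a fixed step of 0.1 are all assumptions picked for brevity, and the mating/mutation choice is made per child at random rather than as an exact fraction.

```python
import random

def run_ga(fitness, n_vars, pop_size=20, generations=100, elite_count=2,
           crossover_fraction=0.8, init_range=(0.0, 1.0), seed=0):
    """Minimise `fitness` with a tiny real-valued ("double") GA (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = init_range
    # Initial population P: random solutions inside the initial range.
    pop = [[rng.uniform(lo, hi) for _ in range(n_vars)] for _ in range(pop_size)]
    best_history = []

    def tournament():
        # Parent selection: the better of two randomly drawn individuals wins.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):            # termination: fixed number of generations
        pop.sort(key=fitness)
        best_history.append(fitness(pop[0]))
        nxt = [ind[:] for ind in pop[:elite_count]]   # elitism: best survive unchanged
        while len(nxt) < pop_size:
            if rng.random() < crossover_fraction:
                # Mating: each gene is copied from one of the two parents.
                p1, p2 = tournament(), tournament()
                child = [rng.choice(pair) for pair in zip(p1, p2)]
            else:
                # Mutation: add small Gaussian noise to a selected parent.
                child = [g + rng.gauss(0.0, 0.1) for g in tournament()]
            nxt.append(child)
        pop = nxt

    pop.sort(key=fitness)
    return pop[0], fitness(pop[0]), best_history
```

Calling `run_ga(lambda x: sum(abs(v) for v in x), 3, init_range=(-1.0, 1.0))` returns the best individual, its fitness and the best fitness per generation; because of elitism, that history never increases.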

3. GAs and Matlab (10 min)


Figure 2: GAs in Matlab's Optimization Toolbox


Matlab provides an optimization toolbox that includes a GA-based solver. You start the toolbox by typing optimtool in Matlab's command line and pressing Enter. As soon as the optimization window appears, select the solver "ga - Genetic Algorithm" and you are ready to go.

Matlab does not provide every method available in the literature for every step, but it does have a lot of options for fine tuning and also provides hooks for customization. The user should program (by writing M-files) any extended functionality required. Take your time and explore the window. If you mess up the settings, close the window and run the toolbox again before you proceed.


Matlab R2008a (v.7.6) was used for this tutorial. Earlier versions are OK as long as the proper toolbox is present and installed.


4. Part #1 (fitness function, variables, representation, plots) (15 min)

The first thing you have to do is to provide the fitness function, that is, the function that calculates the quality of each member of the population (or in plain mathematics, the function you have to optimize). Let's use one provided by Matlab: type @rastriginsfcn in the proper field and set the Number of variables to 2. The representation used is defined in the Options - Population section. The default selection, Double Vector, is fine.


To get an idea of what we are looking for, check the equation of this function and its plot on the right.

$$\mathrm{Ras}(x, y) = 20 + x^2 + y^2 - 10\,(\cos 2\pi x + \cos 2\pi y)$$


We want to find the absolute (global) minimum, which is 0 at (0,0).
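To check values of this function by hand, it can be written in a few lines of Python. The name `rastrigin` is only an illustrative stand-in here; it is not Matlab's @rastriginsfcn, although it follows the same two-variable formula.

```python
import math

def rastrigin(x, y):
    """Two-variable Rastrigin function: 20 + x^2 + y^2 - 10(cos 2*pi*x + cos 2*pi*y)."""
    return 20 + x**2 + y**2 - 10 * (math.cos(2 * math.pi * x) + math.cos(2 * math.pi * y))

value_at_origin = rastrigin(0.0, 0.0)   # the global minimum
value_at_one_one = rastrigin(1.0, 1.0)  # a point in one of the surrounding valleys
```

The value at the origin is 0, while at (1, 1) the function is about 2, which is why a search that stays near (1, 1) reports a fitness of roughly 2 rather than 0.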

Note that by default, only minimization is supported. If, for example, you want to maximize a function f_1(x,y), then build and minimize the following custom function:

$$f_2(x, y) = -f_1(x, y)$$
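As a small illustration of this trick, the Python sketch below wraps a function so that minimizing the wrapper maximizes the original. Both `negated` and the sample function `f1` are hypothetical names made up for this example.

```python
def negated(f1):
    """Return f2 with f2(...) = -f1(...), so that minimising f2 maximises f1."""
    def f2(*args):
        return -f1(*args)
    return f2

# Hypothetical function to maximise: its peak value is 4, reached at (1, 0).
def f1(x, y):
    return 4 - (x - 1) ** 2 - y ** 2

f2 = negated(f1)

# A coarse grid search stands in for the minimiser in this sketch.
grid = [(i / 10, j / 10) for i in range(-30, 31) for j in range(-30, 31)]
best_point = min(grid, key=lambda p: f2(*p))
```

The point that minimizes `f2` on the grid is the one that maximizes `f1`, and the maximum of `f1` is recovered as `-f2(*best_point)`.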


Although you are ready to run, let's ask for some plots, so that we can better figure out what happens. Go to the Options section, scroll down to Plot functions and check the Best Fitness and Distance checkboxes.


Now you are ready (the default settings for everything else are adequate). Press the Start button. The algorithm starts, the plots pop up and soon you have the results at the bottom left of the window. The best fitness function value (the smallest one, since we minimize) and the termination condition met are printed, together with the solution (Final Point; it is very close to (0,0)). Since the method is stochastic, don't expect to be able to reproduce any result found in a different run.


Now check the two plots on the left. It is obvious that the population converges, since the average distance between individuals (solutions) is reduced as the generations pass. This distance is a measure of the diversity of a population. It is hard to avoid convergence, but keeping it low or postponing its appearance is better. Having diversity in the population allows the GA to search the solution space better.
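One way to make this diversity measure concrete is the mean Euclidean distance over all pairs of individuals, sketched below in Python. The helper name `average_distance` is an assumption, and the toolbox may estimate the quantity differently (for example from a sample of pairs).

```python
import itertools
import math

def average_distance(population):
    """Mean Euclidean distance over all pairs of individuals."""
    pairs = list(itertools.combinations(population, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

spread_out = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]   # diverse population
converged = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]    # every individual identical

diversity_high = average_distance(spread_out)
diversity_zero = average_distance(converged)
```

A fully converged population has an average distance of 0, which is the situation the Distance plot approaches as the generations pass.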



Check also the fitness value as it gradually gets smaller. This is required; it is an indication that optimization takes place. Not only was the fitness value of the best individual reduced, but the mean (average) fitness of the population was also reduced (that is, in terms of the fitness value, the whole population improved; we have better solutions in the population at the end).


All the above together are a good indication that the GA did its job well, but we are really happy only because we know where the solution is (at (0,0) with fitness value 0). Note, however, that the stochastic nature of the GA makes it hard to find the best solution (0,0) exactly. It can get very close to this point, but landing exactly on (0,0) could happen only by luck. This is OK, since in this kind of problem (optimization) we are happy even with a good (rather than perfect) solution. If not, the hybrid function option should be used (not discussed here).


Generally speaking, getting the best results from the GA requires experimentation with the different options. Let's see how some of these affect the performance of the GA.

5. Part #2 (population diversity, size, range, fitness scaling) (35 min)

The performance of a GA is affected by the diversity of the initial population. If the average distance between individuals is large, the diversity is high; if the average distance is small, the diversity is low. You should experiment to get the right amount of diversity. If the diversity is too high or too low, the genetic algorithm might not perform well. We will demonstrate this in the following.


By default, the Optimization Tool creates a random initial population using a creation function. You can limit this by setting the Initial range field in Population options. Set it to [1; 1.1]. By doing this we actually make it harder for the GA to search equally well in the whole solution space. We do not prevent it, though. The genetic algorithm can find the solution even if it does not lie in the initial range, provided that the populations have enough diversity.

Note: The initial range only restricts the range of the points in the initial population by specifying the lower and upper bounds. Subsequent generations can contain points whose entries do not lie in the initial range. If you want to bound all the individuals in all generations within a range, you can use the lower and upper bound fields in the Constraints panel, on the left.
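The difference between the initial range and hard bounds can be sketched in Python as follows. The helpers `initial_population` and `mutate` are hypothetical, and clipping mutated genes to the bounds is just one simple way to enforce them; it is an assumption of this sketch, not a description of how Matlab handles its bound constraints.

```python
import random

def initial_population(pop_size, n_vars, init_range, rng):
    """Random individuals whose genes all lie inside the initial range."""
    lo, hi = init_range
    return [[rng.uniform(lo, hi) for _ in range(n_vars)] for _ in range(pop_size)]

def mutate(individual, sigma, rng, bounds=None):
    """Gaussian mutation; genes are clipped only if explicit bounds are given."""
    child = [g + rng.gauss(0.0, sigma) for g in individual]
    if bounds is not None:
        lo, hi = bounds
        child = [min(max(g, lo), hi) for g in child]
    return child

rng = random.Random(1)
pop = initial_population(20, 2, (1.0, 1.1), rng)

free_children = [mutate(ind, 1.0, rng) for ind in pop]                        # may leave the range
bounded_children = [mutate(ind, 1.0, rng, bounds=(1.0, 1.1)) for ind in pop]  # never leave it
```

Only the first generation is confined by the initial range: the unbounded children already spread beyond [1; 1.1], while the bounded ones stay inside it.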


Leave the rest of the settings as in Part #1, except Options - Stopping Criteria - Stall Generations, which should be set to 100. This will let the algorithm run for 100 generations, providing us with better results (and plots). Now click the Start button.


The GA returns a best fitness function value of approximately 2 and displays the plots in the figure on the right. The upper plot, which displays the best fitness at each generation, shows little progress in lowering the fitness value (black dots). The lower plot shows the average distance between individuals at each generation, which is a good measure of the diversity of a population. For this setting of initial range, there is too little diversity for the algorithm to make progress. The algorithm was trapped in a local minimum due to the initial range restriction!




Next, set Initial range to [1; 100] and run the algorithm again. The GA returns a best fitness value of approximately 3.3 and displays the following plots:

This time, the genetic algorithm makes progress, but because the average distance between individuals is so large, the best individuals are far from the optimal solution. Note, though, that if we let the GA run for more generations (by setting Generations and Stall Generations in Stopping Criteria to 200) it will eventually find a better solution.


Note: If you try this, please restore the settings to their initial values before you proceed (default and 100, respectively).


Finally, set Initial range to [1; 2] and run the GA. This returns a best fitness value of approximately 0.012 and displays the plots that follow.

The diversity in this case is better suited to the problem, so the genetic algorithm returns a much better result than in the previous two cases.


In all the examples above, we had the Population Size (Options - Population) set to 20 (the default). This value determines the size of the population at each generation. Increasing the population size enables the genetic algorithm to search more points and thereby obtain a better result. However, the larger the population size, the longer the genetic algorithm takes to compute each generation.


Note, though, that you should set Population Size to be at least the value of Number of variables, so that the individuals in each population span the space being searched. You can experiment with different settings for Population Size that return good results without taking a prohibitive amount of time to run.



Finally, another parameter that affects the diversity of the population (remember, it's vital to have good diversity in the population) is Fitness Scaling (in Options). If the fitness values vary too widely (Figure 3), the individuals with the lowest values (recall that we minimize) reproduce too rapidly, taking over the population pool too quickly and preventing the GA from searching other areas of the solution space. On the other hand, if the values vary only a little, all individuals have approximately the same chance of reproduction and the search will progress very slowly.





Figure 3: Raw fitness values (lower is better) vary too widely on the left. Scaled values (right) do not alter the selection advantage of the good individuals (except that now bigger is better). They just reduce the spread of values we have on the left. This prevents the GA from converging too early.


Fitness Scaling adjusts the fitness values (producing scaled values) before the selection step of the GA. This is done without changing the ranking order, that is, the best individual based on the raw fitness value remains the best in the scaled ranking as well. Only the values are changed, and thus the probability of an individual being selected for mating by the selection procedure. This prevents the GA from converging too fast, which allows the algorithm to search the solution space better.
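A rank-based scaling rule makes the idea tangible. The Python sketch below assumes that the scaled value of the individual with rank n is proportional to 1/sqrt(n) and that the scaled values sum to the population size; both choices, and the name `rank_scale`, are assumptions made for illustration, since the text above does not state which scaling rule is used.

```python
def rank_scale(raw_scores):
    """Map raw fitness values (lower is better) to scaled values (higher is better).

    Only the rank of each raw score matters, not its magnitude.
    """
    order = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
    scaled = [0.0] * len(raw_scores)
    for rank, i in enumerate(order, start=1):
        scaled[i] = 1.0 / rank ** 0.5          # rank 1 is the best individual
    total = sum(scaled)
    return [len(raw_scores) * s / total for s in scaled]

raw = [0.1, 50.0, 900.0, 2.0]    # raw values that vary very widely
scaled = rank_scale(raw)
```

The raw values span almost four orders of magnitude, yet the best scaled value is only twice the worst one, and the ranking order is preserved.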


6. Part #3 (selection, elitism, mutation) (35 min)

We continue this GA tutorial using Rastrigin's function. Use the following settings, leaving everything else at its default value (Fitness function: @rastriginsfcn, Number of Variables: 2, Initial Range: [1; 20], Plots: Best Fitness, Distance).


The Selection panel in Options controls the Selection Function, that is, how individuals are selected to become parents. Note that this mechanism works on the scaled values, as described in the previous section. The most well-known methods are provided (uniform, roulette and tournament). An individual can be selected more than once as a parent, in which case it contributes its genes to more than one child.


Figure 4: Stochastic uniform selection method. For 6 parents we step along the selection line with steps equal to 15/6.


The default selection option, Stochastic Uniform, lays out a line (Figure 4) in which each parent corresponds to a section of the line of length proportional to its scaled value. The algorithm moves along the line in steps of equal size. At each step, the algorithm allocates a parent from the section it lands on. For example, assume a population of 4 individuals with scaled values 7, 4, 3 and 1. The individual with the scaled value of 7 is the best and should contribute its genes more than the rest. We create a line of length 1+3+4+7=15. Now, let's say that we need to select 6 individuals as parents. We step over this line in steps of 15/6 and select the individual we land on (Figure 4).
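The stepping procedure of this example can be traced with a short Python sketch. The function name `stochastic_uniform`, the order in which the sections are laid out on the line (7, 4, 3, 1) and the fixed starting offset of 1.25 are assumptions; in practice the first pointer would be drawn at random from within the first step.

```python
import bisect
import itertools

def stochastic_uniform(scaled_values, n_parents, start):
    """Select parent indices by stepping along a line in equal steps.

    Each individual owns a section whose length equals its scaled value.
    `start` is the position of the first pointer, with 0 <= start < step.
    """
    total = sum(scaled_values)
    step = total / n_parents
    edges = list(itertools.accumulate(scaled_values))   # right edge of each section
    parents = []
    for k in range(n_parents):
        pointer = start + k * step
        parents.append(bisect.bisect_right(edges, pointer))
    return parents

# Scaled values 7, 4, 3 and 1; six parents; steps of 15/6 = 2.5.
chosen = stochastic_uniform([7, 4, 3, 1], 6, start=1.25)
```

With this offset the pointers land at 1.25, 3.75, 6.25, 8.75, 11.25 and 13.75, so the best individual is chosen three times, the second once, the third twice and the weakest not at all.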


The Reproduction panel in Options controls how the GA creates the next generation. Here you specify the amount of elitism and the fraction of the population of the next generation that is generated through mating (the rest is generated by mutation). The options are:



• Elite Count: the number of individuals with the best fitness values in the current generation that are guaranteed to survive to the next generation. These individuals are called elite children. The default value of Elite count is 2.

Try to solve Rastrigin's problem by changing only this parameter. Try values of 10, 3 and 1. You will get results like those depicted in Figure 5. It is obvious that you should keep this value low. 1 (or 2, depending on the population size) is OK. (Why?)




Figure 5: Elite count 10 (left), 3 (middle) and 1 (right). Too much elitism results in early convergence, which can make the search less effective.



• Crossover Fraction: the fraction of individuals in the next generation, other than elite children, that are created by crossover (that is, mating). The rest are generated by mutation. A crossover fraction of 1 means that all children other than elite individuals are crossover children. A crossover fraction of 0 means that all children are mutation children.


The following example shows that neither of these extremes is an effective strategy for optimizing a function. You will now change the problem (you had better restart the optimization toolbox to have everything set to default values). You will optimize this function:

$$f(x_1, x_2, \ldots, x_{10}) = |x_1| + |x_2| + \cdots + |x_{10}|$$


Use the following settings:

o Fitness Function: @(x) sum(abs(x))
o Number of variables: 10
o Initial range: [-1; 1]
o Plots: Best fitness and Distance


Run the example with the default value of 0.8 for Crossover fraction, in the Options > Reproduction panel. This returns a best fitness value of approximately 0.25 and displays plots like those in Figure 6 (left). Note, though, that for another fitness function, a different setting for Crossover fraction might yield the best result.



Figure 6: Plots for Crossover fraction set to 0.8 (left) and 1 (right).


To see how the genetic algorithm performs when there is no mutation, set Crossover fraction to 1.0 and click Start. This returns a best fitness value of approximately 1.1 and displays plots similar to the ones in Figure 6 (right).



In this case, the algorithm selects genes from the individuals in the initial population and recombines them. The algorithm cannot create any new genes because there is no mutation. The algorithm generates the best individual that it can using these genes at about generation 15, where the best fitness plot becomes level. After this, it creates new copies of the best individual, which are then selected for the next generation. By about generation 19, all individuals in the population are the same, namely, the best individual. When this occurs, the average distance between individuals is 0. Since the algorithm cannot improve the best fitness value after about generation 15, it terminates because the average change in the fitness function is less than the tolerance set in the termination conditions.
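The claim that crossover alone cannot create new genes can be checked with a small Python experiment. The sketch assumes a crossover that copies each gene from one of the two parents and picks parents uniformly at random (no fitness-based selection), which is enough to show the effect; the names used are illustrative.

```python
import random

def gene_copy_crossover(p1, p2, rng):
    """Build a child by copying each gene from one of the two parents."""
    return [rng.choice(pair) for pair in zip(p1, p2)]

rng = random.Random(42)
n_vars, pop_size = 10, 20
population = [[rng.uniform(-1, 1) for _ in range(n_vars)] for _ in range(pop_size)]

# Gene values available at each position in the initial population.
initial_genes = [{ind[i] for ind in population} for i in range(n_vars)]

for _ in range(30):   # 30 generations of crossover only, no mutation
    population = [gene_copy_crossover(rng.choice(population), rng.choice(population), rng)
                  for _ in range(pop_size)]

no_new_genes = all(ind[i] in initial_genes[i] for ind in population for i in range(n_vars))

# Best value of sum(abs(x)) that any recombination of the initial genes could reach.
floor = sum(min(abs(g) for g in genes) for genes in initial_genes)
best_found = min(sum(abs(g) for g in ind) for ind in population)
```

Every gene in the final population already existed at the same position in the initial one, so the best fitness can never drop below `floor`, which explains why the best fitness plot becomes level.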



To see how the genetic algorithm performs when there is no crossover, set Crossover fraction to 0 and click Start. This returns a best fitness value of approximately 2.7 and displays plots like that on the right.


In this case, all children are generated through mutation. The random changes that the algorithm applies never improve the fitness value of the best individual at the first generation. While it improves the individual genes of other individuals, as you can see in the upper plot by the decrease in the mean value of the fitness function, these improved genes are never combined with the genes of the best individual because there is no crossover. As a result, the best fitness plot is level and the algorithm stalls at generation number 50.
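How Elite Count and Crossover Fraction divide one generation among the three kinds of children can be worked out with a few lines of Python. The sketch assumes that the crossover fraction is applied to the non-elite part of the population and rounded to the nearest integer; the helper name `children_counts` and the rounding rule are assumptions.

```python
def children_counts(pop_size, elite_count, crossover_fraction):
    """Split a generation into elite, crossover and mutation children."""
    n_elite = elite_count
    n_crossover = round(crossover_fraction * (pop_size - n_elite))
    n_mutation = pop_size - n_elite - n_crossover
    return n_elite, n_crossover, n_mutation

default_split = children_counts(20, 2, 0.8)   # settings used in the exercise
no_mutation = children_counts(20, 2, 1.0)     # Crossover fraction 1
no_crossover = children_counts(20, 2, 0.0)    # Crossover fraction 0
```

For a population of 20 with 2 elite children, a fraction of 0.8 gives 14 crossover and 4 mutation children, while the extremes 1 and 0 leave no mutation children or no crossover children respectively.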

7. Part #4 (global vs. local minima) (15 min)

Optimization algorithms sometimes return a local minimum instead of the global one, that is, a point where the function value is smaller than at the nearby points, but possibly greater than at a distant point in the solution space. The genetic algorithm can sometimes overcome this deficiency with the right settings. As an example, consider the following function, which has the plot depicted on the right:


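$$f(x) = \begin{cases} -\exp\left(-\left(\dfrac{x}{20}\right)^{2}\right) & \text{for } x \le 20 \\ -\exp(-1) + (x - 20)(x - 22) & \text{for } x > 20 \end{cases}$$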





The function has two local minima, one at x = 0, where the function value is -1, and the other at x = 21, where the function value is about -1.37. Since the latter value is smaller, the global minimum occurs at x = 21.
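The two minima can be checked numerically with the Python sketch below. It assumes the piecewise form f(x) = -exp(-(x/20)^2) for x <= 20 and f(x) = -exp(-1) + (x - 20)(x - 22) for x > 20, which reproduces the values quoted above; the Python function is only a stand-in for the two_min M-file used later in the exercise, not a copy of it.

```python
import math

def two_min(x):
    """Assumed piecewise test function with minima at x = 0 and x = 21."""
    if x <= 20:
        return -math.exp(-(x / 20) ** 2)
    return -math.exp(-1) + (x - 20) * (x - 22)

local_value = two_min(0.0)     # local minimum
global_value = two_min(21.0)   # global minimum, equal to -1 - exp(-1)
```

Evaluating the function just to either side of x = 21 (for example at 20.9 and 21.1) gives larger values, so a search has to sample close to 21 to notice that this valley is the deeper one.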


Let us now see how we can define custom fitness functions. Go to Matlab and select File>New>M-File. Define the function in the editor window as shown in the picture. Then save the file to your desktop using the suggested filename two_min (do not change it!).


Now, in Matlab's main toolbar, set the current directory to the Desktop. This way, your M-file will be visible to Matlab.


In the Optimization Toolbox set Fitness function to @two_min, Number of variables to 1, and Stopping criteria>Stall Generations to 100, and click Start. The genetic algorithm returns a point very close to the local minimum at x = 0.


The problem here is the default initial range of [0; 1] (in the Options > Population panel). This range is not large enough to explore points near the global minimum at x = 21.



One way to make the GA explore a wider range of points (that is, to increase the diversity of the populations) is to increase the Initial range. It does not have to include the point x = 21, but it must be large enough so that the algorithm will be able to generate individuals near x = 21. So, set Initial range to [0; 15] and click Start once again. Now the GA returns a point very close to 21.