The George Washington University

The Computer Science Department










Introduction to Artificial Intelligence

Professor Mark Happel





Project II



An Investigation on the Performance of a Genetic Algorithm used to
solve the 15-puzzle







Seyed Ali Ahmadi

6-May-2004






















Abstract:

The 15-puzzle is considered a classic problem in Artificial Intelligence. This paper compares two different solutions to this problem, one using IDA* and one using a Genetic Algorithm. Some of the factors in the Genetic Algorithm were changed to review their effect on the performance of the algorithm.


Keywords: 15-puzzle, Genetic Algorithms, IDA*



Classic 15-puzzle and IDA*


The classic 15-puzzle was solved with two different techniques for this project. In phase one, the problem was solved using the IDA* search method. The solution was found in 13 iterations. Two different heuristic functions were tested for the IDA* algorithm:

1- Number of misplaced tiles

2- Manhattan distance of tiles to their final location

The average run time for both of these admissible heuristics was about 20 seconds on a Pentium 4 2.7 GHz. Although the node generation sequence was different, changing the heuristic function did not cause a major change in performance. It was discussed that switching from IDA* to A* search could show the effect of changing the heuristic function.
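The two heuristics listed above can be sketched in C++ as follows. This is only an illustration, not the project's code: the `Board` type, the goal layout `kGoal` (tiles 1 to 15 in row-major order with the blank last) and the function names are assumptions. The blank is left out of both counts here, which is the usual way to keep the heuristics admissible.

```cpp
#include <array>
#include <cstdlib>

// A board is 16 cells in row-major order; 0 stands for the blank.
using Board = std::array<int, 16>;

// Assumed goal layout: tiles 1..15 in order, blank in the last cell.
constexpr Board kGoal = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0};

// Heuristic 1: number of tiles (blank excluded) that are not in their goal cell.
int misplacedTiles(const Board& b) {
    int count = 0;
    for (int i = 0; i < 16; ++i) {
        if (b[i] != 0 && b[i] != kGoal[i]) ++count;
    }
    return count;
}

// Heuristic 2: sum of the Manhattan distances of every tile to its goal cell.
int manhattanDistance(const Board& b) {
    int total = 0;
    for (int i = 0; i < 16; ++i) {
        if (b[i] == 0) continue;        // the blank is not a tile
        int goalIndex = b[i] - 1;       // tile t belongs at index t - 1
        total += std::abs(i / 4 - goalIndex / 4) + std::abs(i % 4 - goalIndex % 4);
    }
    return total;
}
```

Both functions return 0 on the goal board, and for any other board the Manhattan distance is at least as large as the misplaced-tile count.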


Genetic Algorithm and 15-puzzle


In phase two, a Genetic Algorithm was implemented to solve the puzzle. Genetic Algorithms are best seen as a family of related programs: many parameters of a Genetic Algorithm can change the performance of the program.

For this project the following assumptions have been made:

(assumption1) The algorithm knows that the solution can be found in 13 moves.

(postulate1) The same heuristic functions have been tested as fitness functions.

(postulate2) The fitness value for an illegal move has been changed to 0.0000005.

The most important decision in creating a Genetic Algorithm is how to represent a problem with a bit-string (chromosome).








Problem Domain Representation as a Chromosome


In Genetic Algorithms it is very important to make a good decision on how to represent the problem domain as a set of chromosomes. The interesting point about this problem is that the final state of the puzzle is already known before we start the search. What we have to find is an optimum path from the initial state to the goal state. In other words, encoding the puzzle state into bit-strings does not solve the problem: the program would search through all the states, and when it terminates the result would be the final state, which we already know in advance.

Therefore, in order to find the optimal path, a series of moves was encoded into bit-strings. Based on the first assumption we know that we are looking for a sequence of 13 moves that reaches the final answer, so we need 13 moves encoded into a single bit-string, which represents a hypothesis in our population:


L U U R R D L L D D R R L

An example of a chromosome in the population.


Assumption one was made only to simplify the program; as a consequence, this program cannot find optimal solutions that are longer than 13 moves.

For coding purposes the numbers 1, 2, 3 and 4 have been used for Up, Down, Right and Left, respectively. A typical encoded hypothesis in the program looks like this:


1 1 4 3 2 2 2 4 1 2 3 3 4
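The mapping between the letter form and the numeric form of a chromosome can be sketched as below. The type name `Chromosome` and the helper names `encodeMove`, `encode` and `decode` are hypothetical and chosen only for this illustration; the numeric codes (1 = Up, 2 = Down, 3 = Right, 4 = Left) and the length of 13 come from the text.

```cpp
#include <array>
#include <string>

constexpr int kChromosomeLength = 13;   // from assumption 1
using Chromosome = std::array<int, kChromosomeLength>;

// Map a move letter to its numeric code: U=1, D=2, R=3, L=4.
// Returns 0 for an unrecognised letter.
int encodeMove(char move) {
    switch (move) {
        case 'U': return 1;
        case 'D': return 2;
        case 'R': return 3;
        case 'L': return 4;
        default:  return 0;
    }
}

// Encode a move string such as "LUURRDLLDDRRL".
// Characters beyond the 13th are ignored; missing ones stay 0.
Chromosome encode(const std::string& moves) {
    Chromosome c{};
    for (int i = 0; i < kChromosomeLength && i < static_cast<int>(moves.size()); ++i) {
        c[i] = encodeMove(moves[i]);
    }
    return c;
}

// Decode back to letters for printing; invalid genes become '?'.
std::string decode(const Chromosome& c) {
    static const char letters[] = {'?', 'U', 'D', 'R', 'L'};
    std::string out;
    for (int gene : c) out += (gene >= 1 && gene <= 4) ? letters[gene] : '?';
    return out;
}
```

With this mapping, the numeric hypothesis shown above reads as U U L R D D D L U D R R L.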



Population Size (N) and Performance Graph


Population size (N) is an important variable, which was changed to evaluate its effect on the final performance. Common sense might suggest that a larger population can achieve better results than a smaller one, but will take more time to run. Goldberg showed that the most effective population size depends directly on the problem being solved by the GA, and specifically on the coding scheme (Goldberg, 1989). Determining the best population size still remains a matter of experimentation.

What we are looking for in creating a new generation is a set of hypotheses with higher fitness values, until one of them satisfies the fitness threshold. Graphical curves showing the average fitness of the entire population, and also the fitness of the best individual in the population, are useful tools for examining the behavior of a GA over a chosen number of generations.
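The two quantities plotted in such a performance graph can be computed once per generation. The sketch below assumes that the fitness values of one population are available as a `std::vector<double>`; the struct `GenerationStats` and the function `summarize` are hypothetical names, not part of the original program.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// One data point of the performance graph.
struct GenerationStats {
    double average;   // average fitness of the whole population
    double best;      // fitness of the best individual
};

// Summarise the fitness values of one population.
// An empty population yields zeros rather than dividing by zero.
GenerationStats summarize(const std::vector<double>& fitness) {
    if (fitness.empty()) return {0.0, 0.0};
    double sum = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    double best = *std::max_element(fitness.begin(), fitness.end());
    return {sum / fitness.size(), best};
}
```

Logging one such record per generation is enough to draw both curves afterwards.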



Problem Domain Knowledge and Fitness Function


A Genetic Algorithm program does not have any knowledge about the problem domain. All the GA agent is concerned with is a fitness value. In fact, if the program does not have the right fitness function, it might run forever without converging to a better population and eventually meeting the threshold. The next question, then, is what the correct fitness function is.

In this project the same heuristic functions used in IDA* were used as the fitness function.





Selection, Roulette Rule


Roulette wheel selection (Goldberg, 1989; Davis, 1991) has been used to calculate the probability of each hypothesis surviving to the next generation. The following is an example of a population with six hypotheses and their respective selection probabilities:


In order to pick a chromosome, a random number between 0 and 100 is generated, and the chromosome whose segment spans that number is selected.
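The selection step described above can be sketched as follows. Each chromosome owns a segment of the interval [0, 100) whose width is its share of the total fitness. The function name `rouletteSelect` and the choice to pass the random number in as `spin` are assumptions made for this illustration, and the sketch assumes the total fitness is positive (which holds when illegal chromosomes receive a small positive value).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the index of the chromosome whose segment contains `spin`,
// where `spin` is a number in [0, 100).
std::size_t rouletteSelect(const std::vector<double>& fitness, double spin) {
    assert(!fitness.empty());
    double total = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        cumulative += 100.0 * fitness[i] / total;   // width of segment i in percent
        if (spin < cumulative) return i;
    }
    return fitness.size() - 1;   // guard against rounding when spin is close to 100
}
```

A caller could obtain the spin with, for example, `100.0 * std::rand() / (RAND_MAX + 1.0)`. Passing it as a parameter keeps the function deterministic and easy to check.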





Fitness Value and illegal move sequences


The number of misplaced tiles (the heuristic function used in IDA*) is used as the fitness function for this program. The maximum number of misplaced tiles is 16; therefore the fitness function was defined as:



$$\mathrm{Fitness}(H_i) = 16 - \text{number of misplaced tiles}$$

$$0 < \text{number of misplaced tiles} < 16$$

$$0 < \mathrm{Fitness}(H_i) < 16$$

If there is no misplaced tile:

$$\mathrm{Fitness}(H_i) = 16 \quad \text{(threshold)}$$


Illegal move sequence:


If a generated chromosome (created either randomly or by one of the genetic operations) causes an illegal move (forcing the blank tile to move off the puzzle board), its fitness value is set to a very small number (0.000005) in order to minimize the probability of such a chromosome being selected for the next generation.
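Putting the fitness definition and the illegal-move rule together gives roughly the following evaluation routine. This is a sketch under several assumptions that the text does not state: moves are applied to the blank, the goal layout is tiles 1 to 15 in row-major order with the blank last, the blank is counted as a cell so that the maximum number of misplaced cells is 16, and fitness is measured on the board reached after all 13 moves. The names `Board`, `Chromosome`, `kIllegalFitness` and `fitness` are hypothetical.

```cpp
#include <array>
#include <utility>

using Board = std::array<int, 16>;        // row-major, 0 = blank
using Chromosome = std::array<int, 13>;   // genes: 1=Up, 2=Down, 3=Right, 4=Left

constexpr double kIllegalFitness = 0.000005;   // penalty value quoted in the text

// Fitness = 16 - misplaced cells after applying all moves to the blank,
// or kIllegalFitness if any move pushes the blank off the board.
double fitness(Board board, const Chromosome& moves) {
    int blank = 0;
    while (blank < 16 && board[blank] != 0) ++blank;   // locate the blank
    if (blank == 16) return kIllegalFitness;           // malformed board
    for (int gene : moves) {
        int row = blank / 4, col = blank % 4;
        switch (gene) {
            case 1: --row; break;
            case 2: ++row; break;
            case 3: ++col; break;
            case 4: --col; break;
            default: return kIllegalFitness;           // unknown move code
        }
        if (row < 0 || row > 3 || col < 0 || col > 3) return kIllegalFitness;
        int target = row * 4 + col;
        std::swap(board[blank], board[target]);        // slide the tile into the gap
        blank = target;
    }
    int misplaced = 0;
    for (int i = 0; i < 16; ++i) {
        int goal = (i + 1) % 16;                       // assumed goal: 1..15, then blank
        if (board[i] != goal) ++misplaced;
    }
    return 16.0 - misplaced;
}
```

A chromosome that reaches the goal board scores 16, the threshold; one that leaves the board at any step scores the penalty value regardless of its remaining moves.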

Exhaustive Search and GA


Considering the encoding model for this problem, a brute-force approach can find the optimal solution by examining 4^13 different strings.

4^13 = 67,108,864

In one iteration of the Genetic Algorithm, the program creates a new population of 20 chromosomes. Therefore a reasonable upper bound on the number of iterations is:

Max iterations: 67,108,864 / 20 = 3,355,444 (rounded up)

If the program creates more than 3,355,444 generations, it is actually performing worse than a brute-force approach and will therefore be stopped.
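The bound can be derived at compile time instead of being hard-coded. In the sketch below the constant names and the helper `ipow` are hypothetical; the values 4, 13 and 20 come from the text, and the division is rounded up to match the figure quoted above.

```cpp
#include <cstdint>

// Integer power, usable in constant expressions.
constexpr std::int64_t ipow(std::int64_t base, int exp) {
    return exp == 0 ? 1 : base * ipow(base, exp - 1);
}

constexpr std::int64_t kSearchSpace = ipow(4, 13);   // 4 moves, 13 positions
constexpr std::int64_t kPopulationSize = 20;

// Ceiling division: generations after which the GA has evaluated as many
// strings as an exhaustive search would.
constexpr std::int64_t kMaxIterations =
    (kSearchSpace + kPopulationSize - 1) / kPopulationSize;

static_assert(kSearchSpace == 67108864, "4^13");
static_assert(kMaxIterations == 3355444, "67,108,864 / 20 rounded up");
```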


Random Generation Function


In order to generate random numbers for this program the following functions have been used:

    // requires <cstdlib> and <ctime>
    srand((unsigned int) time(NULL));   // use the system time to seed the rand function
    return (rand() % 4) + 1;            // generate a random integer in [1,4]

Because the program seeds the generator with the system time, a different initial population is generated every time the program is executed, and the run time therefore differs from run to run.




Running the GA and tuning the parameters


Run1


In the first run of the program the following values were chosen for the factors:


1- Population size = 20

2- Crossover ratio r = 0.5
    10 were directly selected and 5 pairs by crossover

3- Mutation rate: one pair (one bit per chromosome)

4- Crossover mask: 0000000111111

5- Starting population: completely random

6- Selection rule: roulette wheel


After a couple of iterations the program showed a very interesting behavior. The algorithm had generated 20 random series of moves for the initial population. Most of them contained illegal moves and were assigned very small fitness values. The program found one good chromosome and, because the population contained so many bad ones, selected this one too many times.

In fact, after several iterations, almost the whole population was identical, except of course for the two bits changed by mutation. None of the genetic operators could make the program create a better population. The selection process kept selecting the same chromosome (because of its high probability), and applying the crossover mask to two identical chromosomes generated the same chromosome again. The only operator that made a small change in the population was mutation. This run never found the optimum solution in fewer than MAX_ITERATION generations.



Performance Graph, Fitness Threshold=16, Run Time ~ 12 hours and 35 min (this graph does
not show the whole data)




Run2


1- Population size = 20

2- Crossover ratio r = 0.5
    10 were directly selected and 5 pairs by crossover

3- Mutation rate: four pairs (one bit per chromosome)

4- Crossover mask: random mask point

5- Starting population: completely random

6- Selection rule: roulette wheel


Based on the experience from the first run, the mutation rate was raised to four pairs, and instead of using a fixed point for the crossover mask, a random position was selected for each crossover operation.
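The difference between the fixed mask and the random mask point can be illustrated with a single-point crossover. The sketch below assumes that a 0 in the mask means "copy the gene from the first parent" and a 1 means "copy it from the second", so that the fixed mask 0000000111111 corresponds to a crossover point of 7. The function names are hypothetical, and `std::rand` is used only because the original program relies on `rand`.

```cpp
#include <array>
#include <cstdlib>

constexpr int kLength = 13;
using Chromosome = std::array<int, kLength>;

// Single-point crossover: genes before `point` come from parent a,
// the remaining genes from parent b. point = 7 matches mask 0000000111111
// under the assumption stated in the lead-in.
Chromosome crossover(const Chromosome& a, const Chromosome& b, int point) {
    Chromosome child = a;
    for (int i = point; i < kLength; ++i) child[i] = b[i];
    return child;
}

// Random crossover point in [1, kLength - 1], so both parents contribute.
int randomCrossoverPoint() {
    return 1 + std::rand() % (kLength - 1);
}

// Mutation: overwrite the gene at `position` with a random move code in [1, 4].
// The new code may coincide with the old one.
void mutate(Chromosome& c, int position) {
    c[position] = (std::rand() % 4) + 1;
}
```

Crossing two identical parents returns the same chromosome for every crossover point, which is why only mutation could change the population once Run1 had converged.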




Average Fitness value in each population (Fitness target = 16, time ~ 3 hours and 4 min; this graph does not show the whole data set)


The above graph shows the average fitness value for the first 30745 populations of Run2. The target fitness value is sixteen and, as the graph shows, the average fitness moves approximately between 6 and 10. The solution was found in generation 893711 after 3 hours and 4 min. The solution chromosome was generated by a mutation operator.




Run3


In the third run of the program the following values were chosen for the factors:


1- Population size = 20

2- Crossover ratio r = 0.5
    10 were directly selected and 5 pairs by crossover

3- Mutation rate: four pairs (one bit per chromosome)

4- Crossover mask: 0000000111111

5- Starting population: completely random

6- Selection rule: roulette wheel


In this Run, the mutation rate was raised to four pairs and a fixed crossover mask was
used.



Average Fitness value in each population (Fitness target = 16, time ~ 18 hours; this graph does not show the whole data set)



Lucky Run!


The final solution was found in less than a couple of seconds! One of the initial randomly generated chromosomes had a fitness value of 13 and with only one mutation was changed to the optimal solution (fitness = 16).





Average Fitness value in each population (Fitness target = 16; solution was found in generation number 24; run time: a couple of seconds)







Discussion


A Genetic Algorithm is a stochastic, parallel beam search. As this small project shows, tuning the GA parameters at run time is very important, because the performance of the algorithm depends on all of these variables. The Performance Graph (average fitness against population number) is a very useful tool that shows whether the average fitness has improved over time and is getting closer to the threshold.

One of the main challenges in creating an effective GA is preventing the program from being trapped in a local optimum (Holland, 1975). As the trial runs in this project showed, the mutation operator is very important in preventing such a situation. Although a higher mutation rate makes the search more random, it creates higher genetic diversity among the chromosomes and guarantees that not all the chromosomes will be identical to a local optimum.

If a search problem can be solved with a technique such as IDA*, this project clearly shows that using a GA can increase the run time from a couple of minutes to hours and hours of searching.







Reference:

- Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, MA.

- Negnevitsky, M. (2002). Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley, Essex, UK.

- Russell, S., Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall, NJ.