Systems Topic 5: Genetic Algorithms

libyantawdryAI and Robotics

Oct 23, 2013 (3 years and 8 months ago)

56 views

ICT619 Intelligent
Systems


Topic 5: Genetic
Algorithms

ICT619

2

Genetic Algorithms


Introduction


How GAs work


The TSP as an example


Business Applications of GA


Advantages of GA systems


Some issues related to GA based systems


Case Study



ICT619

3

What is a genetic algorithm?



GA part of the broader
soft computing

(aka
"computational intelligence") paradigm known as
evolutionary computation




First introduced by Holland (1975)


Inspired by possibility of problem solving through a
process of evolution


ICT619

4

What is a GA? (cont’d)


GA mimics biological evolution to generate better
solutions from existing solutions through



survival of the fittest



crossbreeding and



mutation



ICT619

5

What is a GA? (cont’d)



A GA is capable of finding solutions for many problems
for which no usable algorithmic solutions exist



GA methodology particularly suited for
optimization



Optimization searches a solution space consisting of a
large number of possible solutions



GA reduces the search space through evolution of
solutions, conceived as individuals in a population

ICT619

6

Intelligence and Evolution


One way of understanding intelligence is as the capability of a creature
to adapt itself to an ever
-
changing environment



We normally think of adaptation as changes in the
characteristics
(including behaviours) of a single animal

in response to experiences
over its history



But adaptation is also change in the
characteristics of a species
, over
the generations, in response to environmental change



An individual creature is in
competition

with other individuals of the
same species for resources, mates etc.



There is also
rivalry

from other species which may be a
direct

(predator)

or
indirect

(food, water, land, etc.) threat



In nature, evolution operates on
populations of organisms
, ensuring by
natural selection

that
characteristics

that serve the members well tend
to be passed on to the next generation, while those that don’t die out


ICT619

7

Evolution as Optimisation



Evolution can be seen as a process leading to the
optimisation of a
population’s ability to survive

and thus reproduce in a specific
environment.



Evolutionary fitness

-

the measure of the ability to respond adequately
to the environment, is the quantity that is actually optimised in natural
life



Consider a normal population of rabbits. Some rabbits are naturally
faster than others. Any characteristic has a
range of variation

that is due
to i) sexual reproduction and ii) mutation



We may say that the faster rabbits possess superior fitness, since they
have a greater chance of avoiding foxes, surviving and then breeding



If two parents have superior fitness, there is a good chance that a
combination of their genes will produce an offspring with even higher
fitness. We say that there is
crossover

between the parents’ genes




Over successive generations, the entire population of rabbits tends to
become faster to meet their environment challenges in the face of foxes

ICT619

8

How GAs work



A population of candidate solutions
-

mathematical
representations
-

is repeatedly altered until an optimal

solution is found



The GA evolutionary cycle

1.
Starts with a randomly generated initial population of solutions
(1
st

generation)


2.
Selects a population of better solutions (next generation) by
using a measure of 'goodness' (a fitness evaluation function)


3.
Alters new generation population through crossbreeding and
mutation




Processes of selection (step 2) and alteration (step 3) lead to
a population with a higher proportion of better solutions


ICT619

9

How GAs work (cont’d)



The GA evolutionary cycle continues until an
acceptable solution is found in the current population,


or


some control parameter such as the maximum
number of generations is exceeded

ICT619

10

How solutions are represented



A series of genes, known as a
chromosome
,
represents one possible solution



Each
gene

in the chromosome represents one
component of the solution pattern



Each gene can have one of a number of possible
values known as
alleles



The process of converting a solution from its original
form into a chromosome is known as
coding



ICT619

11

How solutions are represented
(cont’d)



The most common form of representing a solution as a
chromosome is a string of binary digits (aka a binary
vector) eg 1010110101001



Each bit in this string is a gene with two alleles: 0 and 1



Other forms of representation are also used, eg,
integer vectors



Solution bit strings are
decoded

to enable them to be
evaluated using a fitness measure


ICT619

12

GA Selection


Selection in GA based on a process analogous to that
of biological evolution



Only the fittest survive and contribute to the gene pool
of the next generation



Fitness proportional selection



Each chromosome’s likelihood of being selected is
proportional to its
fitness value
.



Solutions failing selection are “bad”, and are discarded

ICT619

13

Alteration = Crossover + Mutation


Alteration r
efines good solutions from current generation to
produce next generation of solutions



Carried out by performing crossover and mutation



Crossover

by splicing two chromosomes at a crossover point and
swapping the spliced parts



A better chromosome may be created by combining genes with
good characteristics from one chromosome with some good genes
in the other chromosome



Crossover carried out with a probability


typically 0.7



Chromosomes not crossed over are cloned

ICT619

14

Crossover and Mutation

Mutation



A random adjustment in the genetic composition


Can be useful for introducing new characteristics in a
population


May be counterproductive


Probability kept low: typically 0.001 to 0.01

ICT619

15


An albino is a common mutation

ICT619

16

The typical Genetic Algorithm


1.
Represent the solution as a chromosome of fixed
length, choose size of population N, crossover
probability

p
c

and mutation probability p
m
.


2.
Define a fitness function
f
for measuring fitness of
chromosomes.


3.
Create an initial solution population randomly of size
N:


x
1
, x
2
, …, x
N


4.
Use the fitness function
f


to evaluate the fitness value
of each solution in the current generation:


f
(x
1
),
f
(x
2
), …,
f
(x
N
)

ICT619

17

The typical Genetic Algorithm
(cont’d)


5.
Select “good” solutions based on fitness value.
Discard rest of the solutions.


6.
If acceptable solution(s) found in the current
generation or maximum number of generations is
exceeded then stop.


7.
Alter the solution population using crossover and
mutation to create a new generation of solutions with
population size N.


8.
Go to step 4.

ICT619

18

The typical Genetic Algorithm
(cont’d)


ICT619

19

The Travelling Salesperson Problem


Given a set of
n
cities (
A, B, C,

...) find a closed tour of all cities
with the shortest total distance
d




Tour 'cost' may be something other than distance d



This is an optimization problem with following constraints




1.

Each city to be visited once and only once



2.

Total distance travelled must be shortest



possible



The time required to find a solution by exhaustive search
increases exponentially
-

the problem is NP
-
hard



Possible number of tours for
n

cities = n!/2n



1 million centuries for 50 cities at the rate of 1 billion tours per sec!

In 1987, Martin Groetschel and Olaf Holland found an optimal tour of 666

interesting places in the world. Source: http://www.tsp.gatech.edu//index.html

The Travelling Salesperson Problem

ICT619

21

The TSP example (cont’d)

Representation and coding of TSP solutions



The representation might be an ordered list of numbers each
representing a city nominally (known as order
-
based GA):




1) London

3) Dunedin

5) Beijing

7) Tokyo


2) Venice

4) Singapore

6) Phoenix

8) Victoria



CityList1

(3 5 7 2 1 6 4 8)


CityList2

(2 5 7 6 8 1 3 4)



Alternatively, the representation of the solution may be
encoded in binary on a matrix...

ICT619

22

The TSP example (cont’d)

Representation and coding of TSP solutions


A solution to the TSP problem is an ordered list of the
n

cities


Each city is assigned 1 out of
n
possible positions


Representation of the solution may be visualised as a
table:


Each row represents a city


Each column associated with a tour position for cities

ICT619

23

The TSP example (cont’d)


The tour represented above is CAEBDC


One possible bit string code for this solution:



01000 00010 1000 00001 00100



(rows written end to end)


Binary bit strings can produce "faulty" chromosomes
needing repair


An integer vector scheme produced a 100 city tour
9.4% above optimal cost


ICT619

24

An Optimal 100
-
City Tour

ICT619

25

Business Applications of GA



Increasing number of industrial and business
applications of GA since late 1980s



In business, applications include (Kingdon 1997)


Portfolio optimisation


Bankruptcy prediction


Financial forecasting


Fraud detection


Scheduling


Design of complex


machines eg. jet engines

ICT619

26

Business Applications of GA (cont’d)



First Quadrant
-

investment firm in
California


Started using GA technique in 1993


Uses GA to manage US$5 billion worth of
investments in 17 different countries


Their evolved model earns, on average, $255 for
every $100 invested over six years, as opposed to
$205 for other types of modeling systems (Begley &
Beals, 1995)

ICT619

27

Advantages of GA systems



Useful when no algorithms or heuristics are
available for solving a problem



No formulation of the solution is required
-

only
"recognition" of a good solution



A GA system can be built as long as a solution
representation and an evaluation scheme can
be worked out


So minimal domain expert access is required

ICT619

28

Advantages of GA systems



GA can act as an alternative to
-


Expert Systems if


number of rules is too large or


the nature of the knowledge
-
base too dynamic



Traditional optimization techniques if


constraints and objective functions are
non
-
linear
and/or
discontinuous

ICT619

29

Advantages of GA systems (cont'd)



GA does not guarantee optimal solutions, but produce
near optimal solutions

which are likely to be very good



Solution time with GA is highly predictable

Determined by


Size of the population


Time taken to decode and evaluate a solution and


Number of generations of population



GA uses simple operations to solve problems that are
computationally prohibitive otherwise


Example: the TSP problem


ICT619

30

Advantages of GA systems (cont'd)



Because of simplicity, GA software are


Reasonably sized and self
-
contained


Easier to embed them as a module in another
system



GA can also aid in developing intelligent
business systems that use other
methodologies, eg,


Building the rule base of an expert system


Finding optimal neural networks

ICT619

31

Some issues related to GA based
systems


Level of explainability


Capability to explain why a particular solution was
arrived at is practically nil


The system does not know what a fitness value really
means


Scalability


Moderately scalable

Accommodates increased number of variables by
increasing the length of the chromosome


But


A longer chromosome means a larger population space
(more potential combinations of genes)


More time required for decoding and fitness evaluation

ICT619

32

Some issues related to GA based
systems (cont’d)

Data requirements


In general, GA do not require extensive access to data
but some applications may need it to evaluate solutions


This makes the quality and quantity of data is important


Local maxima


Local maxima are regions that hold good solutions
relative to regions around them, but which do not
necessarily contain the best overall solutions


The region(s) that contain the best solutions are called
global maxima


GAs are less prone to being trapped in local maxima
because of the use of mutation and crossover

ICT619

33

Some issues related to GA based
systems (cont’d)

Premature convergence


A GA is said to have converged prematurely if it
explores a local maximum extensively



It may be then dominated by very similar solutions
within the region



Most significant factor leading to such convergence is a
mutation rate which is too slow



Mutation interference is an effect opposite to that of
premature convergence


ICT619

34

Some issues related to GA based
systems (cont’d)


Mutation interference


Finding a mutation rate which allows the GA to
converge but which also allows adequate exploration of
the solution space is essential for satisfactory
performance



Mutation interference

occurs when mutation rates in a
GA are too high, and as a result:


Solutions are frequently or drastically mutated


The algorithm never manages to explore any region of
the space thoroughly


Any good solutions found tend to be destroyed rapidly


ICT619

35

Case Study
-

Help Desk Task
Scheduling (Dhar & Stein 1997,
pp.219
-
227)

GA based system developed at Moody’s
for scheduling service tasks to its
customer service representatives


Major constraints



The system


Must minimise computer downtime and customer
dissatisfaction


Must integrate with existing database system which kept track
help desk requests.

ICT619

36

Case study


constraints (cont’d)


Must be flexible to


Accommodate new types of task definitions and changes
in employee, training etc.


Allow administrator to modify solutions



Must generate and reevaluate schedules quickly
(under 15 minutes) and consistently



Must not take administrator or CSRs away from their
jobs for any extended period of time



Must be developed quickly


ICT619

37

Case study


constraints (cont’d)


Should be scalable in case of future growth in number
of requests for help and the number of CSRs



Must not be too complicated for its users


the
administrator and CSRs



The main difficulties in meeting the constraints:


the large number of tasks


the large number of CSRs


the varying capabilities of CSRs, and


the wide variety of tasks

ICT619

38

Case Study
-

Variables and issues
needing to be considered



The priority of a task, which is determined by the severity of
the problem



The length of time required to perform the task and how it
would affect the servicing of other users



The ability of various CSRs to perform different levels of
tasks (expertise must match the complexity)



Low priority tasks must not be kept waiting indefinitely



The measure of goodness of a schedule to be based on
amount of downtime each schedule cost the organisation.

ICT619

39

Possible solution methodologies
considered


1.
Traditional linear programming (a numerical
optimisation technique)

2.
A rule based expert system

3.
A GA based system



ES ruled out because


Expertise to solve this problem not expressible as a set of
rules


Help desk administrator not available for knowledge extraction



Linear programming ruled out because


It fails if no optimal solution can be found


It does not produce any sub
-
optimal solutions, which is the
case with GA.

ICT619

40

Case Study
-

the solution

SOGA (Schedule Optimising for GA)


A hybrid system consisting of GA and fuzzy system
components


The GA component deals with the scheduling task.



Each task in the queue is represented by a gene



The entire task list forms the chromosome



Each chromosome is decoded by a scheduling module
that assigns tasks to available CSRs who can perform
them


ICT619

41

Case Study
-

the solution (cont’d)


Fitness of each chromosome is determined by
calculating the amount of downtime that would result
based on the schedule represented by the
chromosome.



Schedules generated by the GA component are
modified by the FS component



SOGA runs in the background behind the help request
tracking system


Updates schedules based upon a predefined time
interval (eg, every 10 or 15 minutes)


CSRs access their current job queue through their
interface to accept jobs.

ICT619

42

Case Study
-

Results


The system is timely


generating schedules in about 5
minutes.



The solutions are found to be good by the help desk
administrator



The system is flexible enough to allow for task
definitions



The system scales up well to larger domains (higher
number of tasks)



The SOGA system was developed in two months using
one programmer overseen by its designers

ICT619

43

REFERENCES



Begley, S. and Beals, G. "Software au naturel."
Newsweek
, May 8, 1995,
p.70


Dhar, V., & Stein, R
., Seven Methods for Transforming Corporate
Data into Business Intelligence
., Prentice Hall 1997, pp. 126
-
148,
203
-
210.


Goldberg, D. E.,
Genetic and Evolutionary Algorithms Come of
Age
, Communications of the ACM, Vol.37, No.3, March 1994,
pp.113
-
119.


Holland, J. H.,
Adaptation in Natural and Artificial Systems,

Univ.
of Michigan Press, 1975.


Kingdon, J.,
Intelligent Systems and Financial Forecasting
,
Springer Verlag, London 1997.


Medsker,L.,
Hybrid Intelligent Systems
, Kluwer Academic Press,
Boston 1995.


Michalewicz, Z.,
Genetic Algorithms + Data Structures = Evolution
Programs
, Springer
-
Verlag, Berlin 1996.


Negnevitsky, M. Artificial Intelligence A Guide to Intelligent
Systems, Addison
-
Wesley 2005.