Oct 24, 2013

satisfyingcan_aebcb39e-7419-4ae8-a71a-26d71571a2d4.doc

B482 S2-01


B620 Intelligent Systems

Topic 4: Genetic Algorithms




Introduction


How GA work


The TSP as an example


Business Applications of GA


Advantages of GA systems


Case Study




References




Dhar, V., & Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997, pp. 126-148, 203-210.

Goldberg, D. E., Genetic and Evolutionary Algorithms Come of Age, Communications of the ACM, Vol. 37, No. 3, March 1994, pp. 113-119.

Holland, J. H., Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.

Kingdon, J., Intelligent Systems and Financial Forecasting, Springer-Verlag, London, 1997.

Medsker, L., Hybrid Intelligent Systems, Kluwer Academic Press, Boston, 1995.

Michalewicz, Z., Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin, 1996.




Introduction


Genetic algorithms (GA) were first introduced by John Holland in the 1970s (Holland 1975) as a result of investigations into the possibility of computer programs undergoing evolution in the Darwinian sense.


GA are part of a broader soft computing paradigm known as evolutionary computation. They attempt to arrive at optimal solutions through a process similar to biological evolution: the principle of survival of the fittest is combined with crossbreeding and mutation to generate better solutions from a pool of existing solutions.


Genetic algorithms have been found capable of finding solutions to a wide variety of problems for which no acceptable algorithmic solutions exist. The GA methodology is particularly suited to optimisation, a problem-solving technique in which one or more very good solutions are searched for in a solution space consisting of a large number of possible solutions. GA narrow the search by continually evaluating the current generation of candidate solutions, discarding those ranked as poor, and producing a new generation by crossbreeding and mutating those ranked as good. Candidate solutions are ranked using a predetermined measure of goodness, or fitness.




How GA work


A genetic algorithm is a probabilistic search technique that computationally simulates the process of biological evolution. It mimics evolution in nature by repeatedly altering a population of candidate solutions until an optimal (or acceptable) solution is found.


The GA evolutionary cycle starts with a randomly generated initial population. The population changes through selection based on fitness, and through alteration using crossover and mutation. Applying selection and alteration leads to a population with a higher proportion of better solutions. The evolutionary cycle continues until an acceptable solution is found in the current generation, or some control parameter, such as the maximum number of generations, is exceeded.





[Figure: Genetic algorithm evolutionary cycle. Labels: Population; Selection; Alteration (Mutation & Crossover); Discarded Solutions.]



The smallest unit of a genetic algorithm is called a gene, which represents a unit of information in the problem domain. A series of genes, known as a chromosome, represents one possible solution to the problem. Each gene in the chromosome represents one component of the solution pattern.


The most common way of representing a solution as a chromosome is a string of binary digits, in which each bit is a gene. The process of converting a solution from its original form into the bit string is known as coding. The specific coding scheme used is application-dependent. The solution bit strings are decoded so that they can be evaluated using a fitness measure.
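As a minimal sketch of coding and decoding, assume a hypothetical problem whose solution is a single integer x between 0 and 31, coded as a 5-bit chromosome, with an assumed fitness measure f(x) = x squared. The names `encode`, `decode` and `fitness` are illustrative only.

```python
# Hypothetical example: a solution is an integer x in 0..31, coded as a
# fixed-length bit string with one gene per bit.
N_BITS = 5

def encode(x: int, n_bits: int = N_BITS) -> str:
    """Code an integer solution as a bit-string chromosome."""
    if not 0 <= x < 2 ** n_bits:
        raise ValueError("x does not fit in the chromosome")
    return format(x, f"0{n_bits}b")

def decode(chromosome: str) -> int:
    """Decode a bit-string chromosome back into the integer solution."""
    return int(chromosome, 2)

def fitness(chromosome: str) -> int:
    """Assumed fitness measure f(x) = x * x: larger is better."""
    x = decode(chromosome)
    return x * x

print(encode(13))        # 01101
print(fitness("01101"))  # 169
```

Decoding before evaluation keeps the fitness measure expressed in problem terms (x), while the GA operators only ever see bit strings.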


Selection

In biological evolution, only the fittest survive, and their gene pool contributes to the creation of the next generation. Selection in GA is based on a similar process. In a common form of selection, known as fitness proportional selection, each chromosome’s likelihood of being selected is proportional to its fitness value, that is, its fitness divided by the total fitness of the population.
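Fitness proportional selection is often pictured as a roulette wheel in which each chromosome owns a slice as wide as its fitness. The following is a minimal sketch of that idea; `roulette_select` is a hypothetical helper, and the population reuses the assumed f(x) = x squared scoring of 5-bit chromosomes.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Fitness proportional ("roulette-wheel") selection.

    Each chromosome is chosen with probability fitness / total fitness.
    Assumes fitness values are non-negative with a positive total.
    """
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive total")
    spin = rng.random() * total          # where the wheel stops, in [0, total)
    running = 0.0
    for chromosome, f in zip(population, fitnesses):
        running += f                     # each chromosome owns a slice of width f
        if spin < running:
            return chromosome
    # Only reached through floating-point rounding: fall back to the last
    # chromosome that actually owns a slice of the wheel.
    return next(c for c, f in reversed(list(zip(population, fitnesses))) if f > 0)

# Hypothetical population: 5-bit chromosomes scored with f(x) = x squared.
population = ["01101", "11000", "01000", "10011"]
fitnesses = [169, 576, 64, 361]
rng = random.Random(42)
counts = {c: 0 for c in population}
for _ in range(10_000):
    counts[roulette_select(population, fitnesses, rng)] += 1
print(counts)  # counts come out roughly in the ratio 169 : 576 : 64 : 361
```

Python's standard `random.choices(population, weights=fitnesses)` performs the same weighted draw; the explicit loop is shown only to make the wheel visible.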


Alteration to improve good solutions

The alteration step in the genetic algorithm refines the good solutions from the current generation to produce the next generation of candidate solutions. It is carried out by performing crossover and mutation.


Crossover may be regarded as artificial mating, in which the chromosomes of two individuals are combined to create chromosomes for the next generation. This is done by splitting two chromosomes from two different solutions at a crossover point and swapping the resulting parts. The idea is that some genes with good characteristics from one chromosome may, as a result, combine with some good genes in the other chromosome to create a better solution represented by the new chromosome.
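Single-point crossover on bit strings can be sketched in a few lines. `single_point_crossover` is a hypothetical helper; the optional `point` argument is an assumption added so the example is reproducible.

```python
import random

def single_point_crossover(parent_a: str, parent_b: str, point=None, rng=random):
    """Split two equal-length parents at one crossover point and swap the tails."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length (at least 2)")
    if point is None:
        point = rng.randint(1, len(parent_a) - 1)  # never cut at the very ends
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

print(single_point_crossover("01101", "11000", point=4))  # ('01100', '11001')
```

Crossover only redistributes the parents' genes between the children, so if both parents start with a 1, both children do too. This is the limitation that mutation addresses.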



Mutation is a random adjustment to the genetic composition. It is useful for introducing new characteristics into a population, something not achieved through crossover alone. Crossover only rearranges existing characteristics to give new combinations. For example, if the first bit in every chromosome of a generation happens to be a 1, any new chromosome created through crossover will also have 1 as the first bit.


The mutation operator changes the current value of a gene to a different one. For a bit-string chromosome, this change amounts to flipping a 0 bit to a 1 or vice versa.

Although mutations are useful for introducing new traits into the solution pool, they can be counterproductive, and are therefore applied only infrequently and at random.


[Figure: Crossover and mutation applied to binary chromosomes, with the crossover point and the mutation point marked.]
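Bit-flip mutation, as described above, can be sketched as follows. The helper name `mutate` and the default per-gene rate of 0.01 are assumptions chosen to reflect "infrequently and at random"; they are not values given in the text.

```python
import random

def mutate(chromosome: str, rate: float = 0.01, rng=random) -> str:
    """Bit-flip mutation: each gene flips (0 <-> 1) independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[g] if rng.random() < rate else g for g in chromosome)

print(mutate("10110", rate=1.0))  # 01001 (every gene flipped)
print(mutate("10110", rate=0.0))  # 10110 (no gene flipped)
```

Unlike crossover, this operator can turn a leading 1 shared by every chromosome into a 0, which reintroduces a characteristic the population had lost.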



The steps in a typical genetic algorithm for finding a solution to a problem are listed below:

1. Randomly create an initial population of solutions of a chosen size.

2. Evaluate each solution in the current generation and assign it a fitness value.

3. Select “good” solutions based on their fitness values and discard the rest.

4. If an acceptable solution has been found in the current generation, or the maximum number of generations has been exceeded, stop.

5. Alter the population using crossover and mutation to create a new generation of solutions.

6. Go to step 2.
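The six steps can be put together in a short loop. This is a sketch under several assumptions: the toy problem ("OneMax", maximise the number of 1s in a 20-bit string), the population size, the mutation rate and the generation limit are all hypothetical, and step 3 keeps the fitter half of the population (truncation selection) to mirror "select good solutions and discard the rest"; fitness proportional selection could be used instead.

```python
import random

# Hypothetical toy problem ("OneMax"): evolve a bit string of all 1s.
LENGTH = 20            # genes per chromosome
POP_SIZE = 30          # solutions per generation
MAX_GENERATIONS = 200  # control parameter used in step 4
MUTATION_RATE = 0.02   # per-gene flip probability

def fitness(chrom: str) -> int:
    return chrom.count("1")  # more 1s = fitter

def crossover(a: str, b: str, rng: random.Random):
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom: str, rng: random.Random) -> str:
    flip = {"0": "1", "1": "0"}
    return "".join(flip[g] if rng.random() < MUTATION_RATE else g for g in chrom)

def run_ga(seed: int = 0):
    rng = random.Random(seed)
    # Step 1: create a random initial population.
    population = ["".join(rng.choice("01") for _ in range(LENGTH))
                  for _ in range(POP_SIZE)]
    for generation in range(MAX_GENERATIONS):
        # Step 2: evaluate every solution (here, by ranking on fitness).
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 3: keep the fitter half and discard the rest.
        survivors = ranked[:POP_SIZE // 2]
        # Step 4: stop if an acceptable (here: perfect) solution exists.
        if fitness(survivors[0]) == LENGTH:
            return survivors[0], generation
        # Step 5: refill the population by crossover and mutation of survivors.
        children = []
        while len(survivors) + len(children) < POP_SIZE:
            a, b = rng.sample(survivors, 2)
            children.extend(mutate(c, rng) for c in crossover(a, b, rng))
        population = survivors + children[:POP_SIZE - len(survivors)]
        # Step 6: go back to step 2.
    # Maximum number of generations reached: return the best found so far.
    return max(population, key=fitness), MAX_GENERATIONS

best, generations = run_ga(seed=1)
print(best, fitness(best), generations)
```

Because the survivors are carried into the next generation unchanged, the best fitness never decreases from one generation to the next; the exact number of generations needed depends on the random seed.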




The TSP as an example


One classic example of an optimisation problem is the travelling salesperson problem (TSP), which is stated below:

Given a set of $n$ cities ($A, B, C, \ldots$) with distances $d_{AB}, d_{BC}, d_{AC}$, etc., find a closed tour of all the cities with the shortest total distance $d$.



This is an optimisation problem with the following requirements:

1. Each city is to be visited once and only once.

2. The total distance travelled is to be as short as possible.
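For a handful of cities, the problem can still be solved exactly by trying every tour. The sketch below uses five cities with made-up symmetric distances (the `DIST` table is purely illustrative) and scores tours by their closed length, including the return to the starting city; CAEBD is the tour used in the representation example later in this topic.

```python
from itertools import permutations

# Hypothetical symmetric distances for five cities (illustrative values only).
DIST = {
    ("A", "B"): 7, ("A", "C"): 3, ("A", "D"): 6, ("A", "E"): 4,
    ("B", "C"): 5, ("B", "D"): 2, ("B", "E"): 6,
    ("C", "D"): 8, ("C", "E"): 5,
    ("D", "E"): 4,
}

def d(x: str, y: str) -> int:
    """Distance between two cities; the table stores each pair once."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]

def tour_length(tour) -> int:
    """Total distance of a closed tour, including the return to the start."""
    return sum(d(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def brute_force(cities):
    """Exact answer by trying every ordering of the cities after the first."""
    start, rest = cities[0], tuple(cities[1:])
    best = min(permutations(rest), key=lambda p: tour_length((start,) + p))
    return (start,) + best, tour_length((start,) + best)

print(tour_length(("C", "A", "E", "B", "D")))  # 23
print(brute_force(("A", "B", "C", "D", "E")))  # (('A', 'C', 'B', 'D', 'E'), 18)
```

Fixing the first city avoids re-counting rotations of the same tour, but the loop still visits $(n-1)!$ orderings, which is why this approach breaks down quickly as $n$ grows.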


The time required to find an exact solution to the TSP with traditional methods grows exponentially (or faster) with $n$; exhaustive search, for example, must examine $(n-1)!/2$ distinct tours when distances are symmetric. The TSP belongs to a class of computing problems known as NP-complete (NP stands for non-deterministic polynomial time), for which no efficient exact algorithm is known and the computational cost becomes prohibitive beyond a small problem size. Checking every tour of just 50 cities on a computer capable of evaluating 1 billion tours per second would take roughly $10^{46}$ years, far longer than the age of the universe.
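The arithmetic behind the 50-city estimate can be checked directly. This sketch assumes symmetric distances (so a tour and its reverse are the same) and the rate of 1 billion tours per second mentioned above; the function names are illustrative.

```python
from math import factorial

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def distinct_tours(n: int) -> int:
    """Distinct closed tours of n >= 3 cities with symmetric distances.

    Fixing the start city leaves (n - 1)! orderings, and each tour is counted
    twice (once per direction), hence (n - 1)! / 2.
    """
    return factorial(n - 1) // 2

def exhaustive_search_years(n: int, tours_per_second: float = 1e9) -> float:
    """Years needed to evaluate every distinct tour at the given rate."""
    return distinct_tours(n) / tours_per_second / SECONDS_PER_YEAR

for n in (5, 10, 20, 50):
    print(f"{n:>2} cities: {distinct_tours(n):.3e} tours, "
          f"{exhaustive_search_years(n):.1e} years")
```

Five cities give only 12 distinct tours, but 50 cities give about $3 \times 10^{62}$, which is where the figure of roughly $10^{46}$ years comes from.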


Representation and coding of solutions


A solution to the TSP is an ordered list of the $n$ given cities, with each city assigned one of $n$ possible positions. The representation of a solution may be visualised with the help of a matrix or table, as shown below for a five-city problem. Each row in the table represents a city, and each column is associated with a position in the tour; a 1 marks the position at which that city is visited.

        Tour position
City    1  2  3  4  5
A       0  1  0  0  0
B       0  0  0  1  0
C       1  0  0  0  0
D       0  0  0  0  1
E       0  0  1  0  0











The tour represented above is CAEBDC (the final C marks the return to the starting city). It can be coded as a bit string by writing the rows end to end, starting with row one:

01000 00010 10000 00001 00100
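The row-by-row coding of a tour can be sketched as a pair of functions. The names `encode_tour` and `decode_tour` are hypothetical, and the sketch assumes the rows are ordered A to E, which is the ordering that makes the bit string above decode to CAEBD.

```python
CITIES = "ABCDE"  # row order: row 1 = A, ..., row 5 = E

def encode_tour(tour: str, cities: str = CITIES) -> str:
    """Row i holds a single 1 in the column giving city i's tour position."""
    n = len(cities)
    rows = []
    for city in cities:
        position = tour.index(city)  # 0-based position of this city in the tour
        rows.append("".join("1" if col == position else "0" for col in range(n)))
    return "".join(rows)

def decode_tour(bits: str, cities: str = CITIES) -> str:
    """Inverse of encode_tour; assumes the bit string is a legal tour."""
    n = len(cities)
    tour = [""] * n
    for i, city in enumerate(cities):
        row = bits[i * n:(i + 1) * n]
        tour[row.index("1")] = city
    return "".join(tour)

bits = encode_tour("CAEBD")
print(" ".join(bits[i:i + 5] for i in range(0, 25, 5)))  # 01000 00010 10000 00001 00100
print(decode_tour(bits))                                  # CAEBD
```

The coding needs $n^2$ bits for $n$ cities, even though only $n$ of them are ever 1.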




It is worth pointing out, though, that a binary bit string is not the best solution representation scheme for the TSP. The operations of crossover and mutation are likely to lead the search outside the solution space by creating illegal chromosomes, for example ones in which the same city appears more than once in a tour. This would then necessitate some sort of chromosome repair algorithm to place the search back into the solution space.
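A small legality check makes the problem concrete. In the sketch below, `is_legal` is a hypothetical helper; the second parent (tour ABCDE) and the crossover point are chosen purely for illustration.

```python
def is_legal(bits: str, n: int = 5) -> bool:
    """A legal TSP chromosome has exactly one 1 in every row and every column."""
    if len(bits) != n * n:
        return False
    rows = [bits[i * n:(i + 1) * n] for i in range(n)]
    rows_ok = all(row.count("1") == 1 for row in rows)       # each city visited once
    cols_ok = all(sum(row[c] == "1" for row in rows) == 1   # each position used once
                  for c in range(n))
    return rows_ok and cols_ok

tour_caebd = "0100000010100000000100100"  # tour CAEBD, coded as in the text
tour_abcde = "1000001000001000001000001"  # tour ABCDE, a second (hypothetical) parent

point = 12                                 # crossover point inside row 3 (city C)
child = tour_caebd[:point] + tour_abcde[point:]
mutant = "1" + tour_caebd[1:]              # flip the first gene from 0 to 1

print(is_legal(tour_caebd), is_legal(tour_abcde))  # True True
print(is_legal(child), is_legal(mutant))           # False False
```

Both parents are legal, yet the child puts city C in two positions and the mutant puts city A in two positions. In fact, any single bit flip of a legal chromosome changes one row's count of 1s and therefore always produces an illegal tour.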


In one of many reported studies (Michalewicz 1996), an integer rather than a binary vector representation is used together with a variant of the crossover operator. With 100 cities, a tour with a cost 9.4% above the optimum was found after 20,000 generations.
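The text does not say which crossover variant was used in that study. As an illustration only, the sketch below uses order crossover (OX), one well-known operator for permutation chromosomes that always produces a legal tour. Cities are numbered (an integer representation), and the optional `cut` argument is an assumption added for reproducibility.

```python
import random

def order_crossover(parent_a, parent_b, rng=random, cut=None):
    """Order crossover (OX) for permutation chromosomes.

    Copies the slice parent_a[i:j] into the child, then fills the remaining
    positions with the missing cities in the order they appear in parent_b,
    scanning from just after the slice and wrapping around.
    """
    n = len(parent_a)
    if cut is None:
        i, j = sorted(rng.sample(range(n + 1), 2))  # 0 <= i < j <= n
    else:
        i, j = cut
    child = [None] * n
    child[i:j] = parent_a[i:j]
    kept = set(parent_a[i:j])
    # Cities of parent_b, read from position j onwards, skipping those already copied.
    fill = [parent_b[(j + k) % n] for k in range(n)]
    fill = [c for c in fill if c not in kept]
    # Empty positions, also starting at j and wrapping around.
    slots = [(j + k) % n for k in range(n - (j - i))]
    for pos, city in zip(slots, fill):
        child[pos] = city
    return child

print(order_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                      [8, 6, 4, 2, 7, 5, 3, 1], cut=(2, 5)))  # [2, 7, 3, 4, 5, 1, 8, 6]
```

Because each city is copied exactly once, either from the slice of the first parent or from the remaining order of the second, the child is always a valid tour and no repair step is needed.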





Business Applications of GA


Although GA were initially mainly of academic interest, an increasing number of industrial and business applications of GA have appeared since the late 1980s (Goldberg 1994). In business, applications of GA include (Kingdon 1997):

- portfolio optimisation
- bankruptcy prediction
- financial forecasting
- fraud detection
- scheduling.


In Europe, the ESPRIT III project PAPAGEN was the largest pan-European investment in the research, exploration, and commercial development of GA. It demonstrated the potential of genetic algorithm technology in a broad range of business applications, including credit scoring, direct marketing, insurance risk assessment, economic modelling and handwritten character recognition.

GA have also been used in portfolio optimisation and financial time series analysis.


In the US, Prediction Company of California has developed a set of time series prediction and trading
tools in which GA play an important role.

One measure of investment efficacy in financial circles is the Sharpe ratio, the ratio of return to risk (more precisely, the average return in excess of a risk-free rate, divided by the standard deviation of those returns). A known group of currency traders was found to have Sharpe ratios in the range 0.3 to 1.0. Tests with the Prediction Company’s technique demonstrated ratios as good as those of the best of the known currency traders.
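The Sharpe ratio calculation can be sketched as below. The return series is invented for illustration, and the sketch is not annualised; the text does not state how the 0.3 to 1.0 figures were computed.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Sharpe ratio: mean excess return divided by the standard deviation
    of the excess returns.

    `returns` and `risk_free_rate` are per-period figures (e.g. 0.02 = 2%)
    measured over the same periods. The result is not annualised.
    """
    excess = [r - risk_free_rate for r in returns]
    return mean(excess) / stdev(excess)  # stdev needs at least two returns

print(round(sharpe_ratio([0.02, -0.01, 0.03, 0.01]), 2))  # 0.73
```

Dividing by the volatility means a strategy with modest but steady returns can score higher than one with larger but erratic returns.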


First Quadrant, an investment firm also in California, uses GA to manage US$5 billion worth of investments. It started using the GA technique in 1993 and claims to have made substantial profits. Currently the company uses GA to govern tactical asset management in 17 different countries. Many other investment houses in both the US and Europe are rumoured to be using the technique.



Advantages of GA systems


GA can be used when no algorithms or heuristics are available for solving a problem. A GA-based system can be built as long as a solution representation and an evaluation scheme can be worked out. Since it requires only a description of a good solution, and not of how to achieve it, the need for access to experts is minimised.


Even where rules are available for solving a problem, the number of rules may be too large or the
nature of the knowledge base too dynamic. GA can act as alternative problem solving tools in such
cases.


Optimisation problems in which the constraints and objective functions are non-linear and/or discontinuous are not amenable to solution by traditional methods such as linear programming. GA can be applied to such problems.




GA do not guarantee optimal solutions, but they produce near-optimal solutions that are likely to be very good.


Solution time with GA is highly predictable: it is determined by the size of the population, the time taken to decode and evaluate a solution, and the number of generations.
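This predictability amounts to a simple product. The sketch below assumes that decoding and evaluation dominate the cost and lumps everything else into a per-generation overhead; the function name and the example figures are hypothetical.

```python
def estimated_run_time(pop_size: int, generations: int,
                       secs_per_evaluation: float,
                       overhead_secs_per_generation: float = 0.0) -> float:
    """Rough GA run time in seconds.

    Every generation decodes and evaluates every chromosome once; selection,
    crossover and mutation are lumped into a per-generation overhead.
    """
    return generations * (pop_size * secs_per_evaluation
                          + overhead_secs_per_generation)

# Hypothetical figures: 100 chromosomes, 500 generations, 2 ms per evaluation.
print(estimated_run_time(100, 500, 0.002))
```

With these figures the estimate is about 100 seconds, and doubling either the population size or the number of generations doubles it.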


GA use simple operations, but are able to find good solutions to problems that are computationally prohibitive for traditional algorithmic and numerical techniques. One example is the TSP (discussed earlier in this topic).


Because of their relative simplicity, GA programs are reasonably small and self-contained. Their compact nature makes them easier to embed as a module in another system than rule-based systems.


Some possible drawbacks of GA based systems


Level of explainability

GA themselves are blind to the optimisation process: they look only at the fitness value of each chromosome, without knowing what that value actually means. As a result, their capability to explain why a particular solution was arrived at is practically nil.


Scalability

Although GA are moderately scalable (an increased number of variables can be accommodated by increasing the length of the chromosome), a longer chromosome also makes finding the solution more time-consuming. The longer the chromosome, the larger the population needs to be, since there are more potential combinations of genes (a binary chromosome of length $L$ has $2^L$ possible values). This results in more time being required for decoding and fitness evaluation.


Data requirements

In general, GA do not require extensive access to data. However, some applications may need to access and process data from the organisation’s databases to evaluate the fitness of solutions. For these applications, the quality and quantity of the data are important.






Case Study


Help Desk Task Scheduling (Dhar & Stein 1997, pp. 219-227)


Moody’s Investors Service is a large organisation with a help desk dedicated to providing IT user support for its employees. This case study looks at the GA-based system developed at Moody’s for scheduling service tasks to its customer service representatives (CSRs).


The major constraints were that the system:

- Must minimise computer downtime.
- Must minimise customer dissatisfaction through timely service and reporting of the estimated time for service.
- Must be able to integrate with the organisation’s existing database system, which kept track of help desk requests.
- Must be flexible enough to allow new types of task definitions and to accommodate changes in employees, training, etc.
- Must also be flexible enough to allow the administrator to modify solutions.
- Must be able to generate and re-evaluate schedules quickly (in under 15 minutes) and consistently.
- Must not take the administrator or CSRs away from their jobs for any extended period of time.
- Must be developed quickly.
- Should be scalable in case of future growth in the number of requests for help and the number of CSRs.
- Must not be too complicated for its users, the administrator and CSRs.


The main difficulties in meeting the above constraints were due to the large number of tasks, the large number of CSRs, the varying capabilities of CSRs, and the wide variety of tasks. In order to develop an effective scheduling system, the following issues needed to be considered:


- The priority of a task, which is determined by the severity of the problem.
- The length of time required to perform the task and how it would affect the servicing of other users.
- The ability of various CSRs to perform different levels of tasks. The level of CSR expertise must match the complexity of the problem.
- Low-priority tasks must not be kept waiting indefinitely.
- The measure of goodness of a schedule was to be based on the amount of downtime each schedule cost the organisation.


The following three problem-solving methodologies were considered:

1. Traditional linear programming (a numerical optimisation technique).

2. A rule-based expert system.

3. A GA-based system.


The expertise to solve this problem could not be expressed clearly as a set of rules. Besides, the help desk administrator was not available for knowledge extraction (one of the constraints listed above). So an expert system (ES) solution was ruled out.


Although linear programming methods are useful for a wide variety of optimisation problems, one problem with integer programming (the specific linear programming technique to be used in this case) is that it searches for an optimal solution and fails if no such solution can be found. Unlike a GA, which always holds a population of candidate solutions, it does not produce any usable sub-optimal solutions.




The solution

SOGA (Schedule Optimising for GA), a hybrid system consisting of GA and fuzzy system (FS) components, was developed to meet the requirements. The GA component deals with the scheduling task.


Each task in the queue is represented by a gene, and the entire task list forms the chromosome. Each chromosome is decoded by feeding the task list into a scheduling module that assigns tasks only to those CSRs who can perform them and are available.


The fitness of each chromosome is determined by calculating the amount of time that would be lost while Moody’s employees wait for tasks to be completed, based on the schedule represented by the chromosome.
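The source does not describe SOGA’s scheduling module in detail, so the following is only a hypothetical sketch of the decode-then-evaluate idea under a simple assumed model: each task has a duration, a required expertise level and a number of employees waiting on it; each CSR has an expertise level; tasks are taken in chromosome order and given to the qualified CSR who becomes free earliest. All class names, fields, task names and CSR names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    duration: float         # hours of CSR work needed (hypothetical unit)
    level: int              # expertise level the task requires
    users_waiting: int = 1  # employees idle until the task is finished (assumed)

@dataclass
class CSR:
    name: str
    level: int              # highest task level this CSR can handle

def decode(chromosome, tasks, csrs):
    """Turn a task order (the chromosome) into a schedule.

    Tasks are taken in chromosome order; each goes to the qualified CSR who
    becomes free earliest. Returns {task: (csr name, start, finish)}.
    """
    free_at = {c.name: 0.0 for c in csrs}
    schedule = {}
    for task_id in chromosome:
        task = tasks[task_id]
        qualified = [c for c in csrs if c.level >= task.level]
        if not qualified:
            raise ValueError(f"no CSR can perform task {task_id!r}")
        csr = min(qualified, key=lambda c: free_at[c.name])
        start = free_at[csr.name]
        free_at[csr.name] = start + task.duration
        schedule[task_id] = (csr.name, start, start + task.duration)
    return schedule

def lost_time(chromosome, tasks, csrs):
    """Fitness to minimise: employee-hours lost waiting for tasks to finish."""
    schedule = decode(chromosome, tasks, csrs)
    return sum(tasks[t].users_waiting * finish for t, (_, _, finish) in schedule.items())

tasks = {
    "printer": Task(duration=1.0, level=1, users_waiting=1),
    "server": Task(duration=2.0, level=3, users_waiting=20),
    "email": Task(duration=0.5, level=2, users_waiting=3),
}
csrs = [CSR("Kim", level=3), CSR("Lee", level=1)]

print(lost_time(["server", "email", "printer"], tasks, csrs))  # 48.5
print(lost_time(["printer", "email", "server"], tasks, csrs))  # 75.5
```

The GA never needs to know how the scheduler works: it only reorders the task list, and the lower lost time of the first ordering is what lets selection prefer it.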


The schedules generated by the GA component are modified by a “goodwill” function that estimates how dissatisfied each user would be if forced to wait the amount of time prescribed by the schedule. The amount of modification is determined by the FS, which applies rules such as “IF wait time is long THEN satisfaction is decreased”.
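The source gives only the form of the fuzzy rule, not its membership functions or how the FS output is combined with the schedule cost. The following hypothetical sketch shows one simple way such a rule could work: a ramp-shaped membership for "long" wait (the 1-hour and 4-hour breakpoints are assumed), a rule that fires to that degree, and an assumed weight that adds the resulting dissatisfaction to the schedule’s lost time.

```python
def long_wait(hours: float, low: float = 1.0, high: float = 4.0) -> float:
    """Membership (0..1) of a wait in the fuzzy set "long".

    Assumed ramp: waits up to `low` hours are not long at all, waits of
    `high` hours or more are fully long, and membership rises linearly between.
    """
    if hours <= low:
        return 0.0
    if hours >= high:
        return 1.0
    return (hours - low) / (high - low)

def dissatisfaction(hours: float) -> float:
    """Rule: IF wait time is long THEN satisfaction is decreased.

    The rule fires to the degree that the wait is long, so the drop in
    satisfaction is taken to be that degree (0 = no drop, 1 = full drop).
    """
    return long_wait(hours)

def goodwill_adjusted_cost(lost_hours: float, waits, weight: float = 2.0) -> float:
    """Add a weighted dissatisfaction penalty to a schedule's raw lost time."""
    return lost_hours + weight * sum(dissatisfaction(w) for w in waits)

print(long_wait(2.5))                                 # 0.5
print(goodwill_adjusted_cost(10.0, [0.5, 2.5, 5.0]))  # 13.0
```

Unlike a crisp threshold, the ramp means a wait just over the "long" boundary costs only slightly more than one just under it, so small schedule changes produce small changes in the adjusted cost.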


SOGA runs in the background behind the help request tracking system. It updates schedules at a predefined time interval (e.g., every 10 or 15 minutes). CSRs access their current job queue through their interface to accept jobs.


Results

- The system is timely, generating schedules in about 5 minutes.
- The help desk administrator finds the solutions to be good.
- The system is flexible enough to allow for new task definitions.
- The system scales up well to larger domains (higher numbers of tasks).


The SOGA system was developed in two months by one programmer, overseen by its designers.