Genetic Algorithms
Overview
• Genetic Algorithms: a gentle introduction
  – What are GAs?
  – How do they work, and why?
  – Critical issues
• Use in Data Mining
  – GAs and statistics
  – Decile performance maximization
  – Multi-objective models
Natural Genetics to AI
• Computational models inspired by biological evolution
  – survival of the fittest
  – reproduction through cross-breeding
Genetic Algorithms
• Population-based search (parallel)
  – simultaneous search from multiple points in the search space
  – useful in complex, unstructured search spaces (less prone to local failures)
• Population members: potential solutions
• Population of solutions evolves from one generation to the next
Genetic Algorithms
• Search objective
  – fitness score for population members (fitness function)
• Survival of the fittest
  – selection
• Generating new solutions
  – “mating” and reproduction of individuals (crossover, mutation)
Basic Operation
[Figure: Generation t → Selection → Recombination (Crossover, Mutation) → Generation t+1]
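The cycle on this slide (selection, then recombination by crossover and mutation, producing generation t+1 from generation t) can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the lecture's implementation: the OneMax fitness (count of 1 bits), tournament selection, one-point crossover, and every parameter value are hypothetical choices made to keep the example small.

```python
import random

def one_max(bits):
    # Hypothetical fitness function: the number of 1s in the bit string.
    return sum(bits)

def tournament(pop, fitness, rng, k=2):
    # Assumed selection scheme: draw k individuals, keep the fittest.
    return max(rng.sample(pop, k), key=fitness)

def one_point_crossover(a, b, rng):
    # Swap the tails of the two parents at a random cut point.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rate, rng):
    # Flip each gene independently with a small probability.
    return [1 - g if rng.random() < rate else g for g in bits]

def run_ga(n_genes=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=one_max)
    for _ in range(generations):              # generation t -> generation t+1
        nxt = []
        while len(nxt) < pop_size:
            p1 = tournament(pop, one_max, rng)
            p2 = tournament(pop, one_max, rng)
            c1, c2 = one_point_crossover(p1, p2, rng)
            nxt.extend([mutate(c1, p_mut, rng), mutate(c2, p_mut, rng)])
        pop = nxt[:pop_size]
        best = max(pop + [best], key=one_max)  # remember the best seen so far
    return best

best = run_ga()
print(one_max(best), best)
```

Swapping in a different fitness function, selection scheme, or crossover operator changes only the corresponding helper; the loop itself stays the same.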
GAs: Parallel Search
[Figure: Fitness plotted against x, with hill-climber starting points marked X]
GAs: Basic Principles
• Representation of individuals
  – String of parameters (genes): chromosome
    e.g. optimize a function F(p,q,r,s,t)
    Population members: p q r s t
  – genotype and phenotype
Binary representation?
• Population members as bit strings
  F(p,q,r,s,t) as:
  1 0 0 1 1 0 1 0 1 1 0 1 1 0 0 1 1 0 1 0
  p q r s t
  – early theory in terms of binary strings (schema theorem)
  – unnecessary perversity?
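To make the genotype/phenotype distinction concrete, the 20-bit string above (the genotype) can be decoded into five parameter values (the phenotype). The sketch assumes, without confirmation from the slide, that each of p, q, r, s, t occupies an equal 4-bit field read as an unsigned integer; `decode` is a hypothetical helper name.

```python
def decode(bits, n_params=5):
    # Assumption: the chromosome splits into equal-width fields, each
    # read as an unsigned binary integer (most significant bit first).
    width = len(bits) // n_params
    fields = [bits[i * width:(i + 1) * width] for i in range(n_params)]
    return [int("".join(map(str, f)), 2) for f in fields]

chromosome = [1, 0, 0, 1,  1, 0, 1, 0,  1, 1, 0, 1,  1, 0, 0, 1,  1, 0, 1, 0]
p, q, r, s, t = decode(chromosome)
print(p, q, r, s, t)  # 9 10 13 9 10
```

A real-valued parameter would typically be obtained by rescaling each integer into the parameter's range, which is one reason the slide asks whether a binary coding is needed at all.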
GAs: Basic Principles
• Survival of the fittest (fitness function)
  – numerical “figure of merit”/utility measure of an individual
  – tradeoff among multiple evaluation criteria
  – efficient evaluation
GAs: Basic Principles
• Iterative search
  – population evolves over generations
• Convergence
  – progression towards uniformity in population
  – premature convergence? (local optima)
Typical GA Run
[Figure: fitness against generations, with curves for the best and the average fitness in the population]
Operators: Selection
• Fitness-proportionate selection ($f_i / \bar{f}$, with $\bar{f}$ the average fitness of the population)
• number of reproductive trials for individuals
Selection
• Roulette-wheel selection (stochastic sampling with replacement)
  – wheel spaced in proportion to fitness values
  – N (pop size) spins of the wheel
• Stochastic universal sampling
  – N equally spaced pins on wheel
  – single turn of the wheel
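The two sampling schemes differ only in how the wheel is read: roulette-wheel selection spins N times, while stochastic universal sampling spins once and reads N equally spaced pointers. A minimal sketch follows; the function names and the example fitness values are illustrative assumptions, and fitness values are assumed to be non-negative with a positive sum.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_wheel(fitness, n, rng):
    # n independent spins; each slot's width is proportional to fitness.
    cum = list(accumulate(fitness))
    total = cum[-1]
    return [min(bisect_right(cum, rng.random() * total), len(cum) - 1)
            for _ in range(n)]

def stochastic_universal_sampling(fitness, n, rng):
    # One spin: n equally spaced pointers share a single random offset.
    cum = list(accumulate(fitness))
    step = cum[-1] / n
    start = rng.random() * step
    picks, i = [], 0
    for k in range(n):
        pointer = start + k * step
        while cum[i] <= pointer and i < len(cum) - 1:
            i += 1
        picks.append(i)
    return picks

rng = random.Random(1)
fitness = [1, 1, 2, 4]          # expected copies for n = 8: 1, 1, 2, 4
print(roulette_wheel(fitness, 8, rng))                  # varies from spin to spin
print(stochastic_universal_sampling(fitness, 8, rng))   # [0, 1, 2, 2, 3, 3, 3, 3]
```

With these values every individual's expected number of copies is a whole number, so stochastic universal sampling returns exactly those counts whatever the offset, whereas the roulette wheel only matches them on average.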
Selection
• Premature convergence
• Fitness scaling
  $f' = f - (2 \cdot \text{avg} - \text{max})$
• Ranked fitness
• Elitism
• Steady-state selection
• Demetic grouping
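Two of the remedies listed here are easy to show in code. The sketch below reads the scaling rule on this slide as f' = f - (2*avg - max), which assumes the minus signs were lost in extraction. Under that reading the scaled average is max - avg and the scaled maximum is 2*(max - avg), so the best individual ends up with exactly twice the average scaled fitness. Clipping negative results to zero and the helper names are additional assumptions.

```python
def scale_fitness(fitness):
    # f' = f - (2*avg - max). Negative results are clipped to zero (an
    # added assumption) so the values remain usable as selection weights.
    avg = sum(fitness) / len(fitness)
    shift = 2 * avg - max(fitness)
    return [max(0.0, f - shift) for f in fitness]

def rank_fitness(fitness):
    # Ranked fitness: replace each raw score by its rank (1 = worst).
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

raw = [10.0, 12.0, 14.0, 16.0]
scaled = scale_fitness(raw)
print(scaled)                              # [0.0, 2.0, 4.0, 6.0]
print(max(scaled), sum(scaled) / 4)        # 6.0 3.0
print(rank_fitness(raw))                   # [1, 2, 3, 4]
```

Both transformations bound how strongly one unusually fit individual can dominate selection, which is the mechanism behind premature convergence.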
Operators: Crossover
Parent 1:    axpsqvqbtpihd
Parent 2:    qzxxaycgbtphw
Offspring 1: azpsavcbtpphd
Offspring 2: qxxxqyqgbtihw
(Uniform crossover; crossover sites at positions 2, 5, 7 and 11)
• combining good building blocks
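Uniform crossover can be written as a per-gene swap controlled by a mask. The sketch below reproduces the offspring on this slide by swapping at the positions where they differ from their parents; `uniform_crossover` and `random_mask` are hypothetical helper names, and the 0.5 swap probability is a common default rather than something the slide states.

```python
import random

def uniform_crossover(p1, p2, swap_mask):
    # Where the mask is True the offspring exchange genes; elsewhere
    # each offspring copies the gene of its own parent.
    c1 = "".join(b if s else a for a, b, s in zip(p1, p2, swap_mask))
    c2 = "".join(a if s else b for a, b, s in zip(p1, p2, swap_mask))
    return c1, c2

def random_mask(n, rng, p_swap=0.5):
    # In an actual run the mask is drawn at random for every mating.
    return [rng.random() < p_swap for _ in range(n)]

p1, p2 = "axpsqvqbtpihd", "qzxxaycgbtphw"
sites = {2, 5, 7, 11}   # 1-based positions swapped in the slide's example
mask = [(i + 1) in sites for i in range(len(p1))]
print(uniform_crossover(p1, p2, mask))   # ('azpsavcbtpphd', 'qxxxqyqgbtihw')

rng = random.Random(0)
print(uniform_crossover(p1, p2, random_mask(len(p1), rng)))
```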
Operators: Mutation
• alters each gene with small probability
  Before: x 1 y x 0 y 0 y y 0 x y x y
  After:  x 1 y x 0 y 1 y y 0 x x x y
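A per-gene mutation operator for a chromosome like the one above might look as follows. It assumes, as the example suggests but the slide does not state, that digit genes toggle between 0 and 1 and letter genes toggle between x and y; the rate of 0.1 is an arbitrary illustrative value.

```python
import random

# Assumed alternative value for each gene symbol.
FLIP = {"0": "1", "1": "0", "x": "y", "y": "x"}

def mutate(genes, rate, rng):
    # Visit every gene; with probability `rate` replace it by its alternative.
    return [FLIP[g] if rng.random() < rate else g for g in genes]

before = "x 1 y x 0 y 0 y y 0 x y x y".split()
rng = random.Random(42)
after = mutate(before, rate=0.1, rng=rng)
print(" ".join(before))
print(" ".join(after))
print("changed positions:", [i + 1 for i, (a, b) in enumerate(zip(before, after)) if a != b])
```

Because every gene is tested independently, the number of changes per chromosome varies from call to call; on average it is the rate times the chromosome length.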
Non-Binary Representations
• Integer, real-number, order-based, rules, ...
• Binary or real-valued?
  real-valued representations often give faster, more consistent, more accurate results
• High-level representation
  – intuitive, can utilize specialized operators
  – effective search over complex spaces
Real-valued representation
Parent1:    3.45  0.56  6.78  0.976  2.5
Parent2:    0.98  1.06  4.20  0.34   1.8
Offspring1: 3.22  0.56  6.78  0.65   2.12
Offspring2: 1.43  1.06  4.20  0.41   1.93
(Arithmetic crossover)
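In the example above, genes 1, 4 and 5 of each offspring lie between the parents' values while genes 2 and 3 are copied unchanged. One common form of arithmetic crossover that produces this pattern is a weighted average at selected positions. The slide does not give the weights it used, so the single weight `alpha` below is an assumption and the sketch does not try to reproduce the slide's offspring values.

```python
def arithmetic_crossover(p1, p2, positions, alpha):
    # At each chosen position the offspring are complementary weighted
    # averages of the parents' values; other genes are copied unchanged.
    c1, c2 = list(p1), list(p2)
    for i in positions:
        c1[i] = alpha * p1[i] + (1 - alpha) * p2[i]
        c2[i] = (1 - alpha) * p1[i] + alpha * p2[i]
    return c1, c2

p1 = [3.45, 0.56, 6.78, 0.976, 2.5]
p2 = [0.98, 1.06, 4.20, 0.34, 1.8]
c1, c2 = arithmetic_crossover(p1, p2, positions=[0, 3, 4], alpha=0.9)
print([round(v, 3) for v in c1])
print([round(v, 3) for v in c2])
```

With complementary weights the two offspring values at a position always sum to the parents' sum, and each stays inside the interval spanned by the parents.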
High-level representation
[Figure: Parent1 and Parent2 with Offspring1 and Offspring2]
High-level representation
• Generalize/Specialize
Tree-structured representation (GP)
[Figure: parse tree for (x * log(y)) / 5, with root "/" whose children are "*" (over x and log(y)) and the constant 5]
• Automated learning of programs (originally)
  parse tree expressions
• Non-linear interaction terms
• Function set: internal nodes
  {+, -, *, /, log}
• Terminal set: leaf nodes
  {constants, variables}
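A parse tree of this kind can be stored as nested tuples and evaluated recursively. The sketch below builds the expression (x * log(y)) / 5 from the function and terminal sets listed on the slide; the tuple encoding and the names `FUNCTIONS` and `evaluate` are illustrative assumptions. GP systems commonly guard division and log against invalid arguments, which is omitted here for brevity.

```python
import math

# Function set: internal nodes of the tree.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "log": lambda a: math.log(a),
}

def evaluate(node, env):
    # A node is a constant, a variable name, or a tuple (function, child, ...).
    if isinstance(node, tuple):
        op, *children = node
        return FUNCTIONS[op](*(evaluate(c, env) for c in children))
    if isinstance(node, str):
        return env[node]          # terminal: variable
    return node                   # terminal: constant

tree = ("/", ("*", "x", ("log", "y")), 5)
print(evaluate(tree, {"x": 10.0, "y": math.e}))   # 2.0, up to floating-point rounding
```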
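Tree-structured representation
• Representing complex patterns
[Figure: parse tree with root "if" and three subtrees: the condition AND(y < 7, x > 2), the constant 0, and the expression (2 * x) + y]
If (y < 7) and (x > 2)
then 0
else 2x + y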
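The rule above fits the same tree form once comparison, logical and conditional nodes are allowed. A minimal sketch, assuming a nested-tuple encoding in which an "if" node has three children (condition, then-branch, else-branch); the `OPS` table and `evaluate` are hypothetical names.

```python
import operator

OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "+": operator.add,
    "*": operator.mul,
    "AND": lambda a, b: a and b,
}

def evaluate(node, env):
    if isinstance(node, tuple):
        op, *args = node
        if op == "if":
            cond, then_branch, else_branch = args
            chosen = then_branch if evaluate(cond, env) else else_branch
            return evaluate(chosen, env)      # only the selected branch is evaluated
        return OPS[op](*(evaluate(a, env) for a in args))
    return env[node] if isinstance(node, str) else node

rule = ("if",
        ("AND", ("<", "y", 7), (">", "x", 2)),
        0,
        ("+", ("*", 2, "x"), "y"))

print(evaluate(rule, {"x": 3, "y": 5}))   # 0  (y < 7 and x > 2)
print(evaluate(rule, {"x": 1, "y": 5}))   # 7  (2*1 + 5)
```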
Genetic search: Issues
• Coding scheme, fitness function critical
  – the “art” in GA design!
  – General mechanism so robust that, within reasonable margins, parameter settings are not critical.
• Representation to match problem, domain
  – utilizing domain knowledge
• Problem-specific crossover, mutation, selection
• Flexibility in fitness function formulation
  – modeling business objectives
Genetic search: Issues
• Stochastic search
  – initial populations, probabilistic operators
  – multiple runs with different random streams
  – initializing population with known solutions
  – seeding initial population with solutions from multiple, independent runs
Genetic search: Issues
• Guarantees optimality?
  – But...
• GAs and traditional techniques
  – especially useful where traditional approaches fail
  – in conjunction with traditional techniques
• Parallelizable for large data
  – multi-processor, networked machines
Using GAs?
• When to use a GA?
• GA and traditional techniques
• How long does it take?
• Will it perform better?
Using GAs
• population size
• mutation, crossover rates
• how many generations
• multiple runs
Is it a “black-box”?
• Data characteristics
• Fitness function
• GA parameters
GA Application Examples
• Function optimizers
  – difficult, discontinuous, multi-modal, noisy functions
• Combinatorial optimization
  – layout of VLSI circuits, factory scheduling, traveling salesman problem
• Design and Control
  – bridge structures, neural networks, communication networks design; control of chemical plants, pipelines
GA Application Examples
• Machine learning
  – classification rules, economic modeling, scheduling strategies
• Portfolio design, optimized trading models, direct marketing models, sequencing of TV advertisements, adaptive agents, data mining, etc.