Genetic Algorithms
Overview
• Genetic Algorithms: a gentle introduction
  – What are GAs?
  – How do they work, and why?
  – Critical issues
• Use in Data Mining
  – GAs and statistics
  – decile performance maximization
  – multi-objective models
Natural Genetics to AI
• Computational models inspired by biological evolution
  – survival of the fittest
  – reproduction through cross-breeding
Genetic Algorithms
• Population-based search (parallel)
  – simultaneous search from multiple points in the search space
  – useful in complex, unstructured search spaces (less prone to getting trapped in local optima)
  Population members: potential solutions
• The population of solutions evolves from one generation to the next
Genetic Algorithms
• Search objective
  – Fitness score for population members (fitness function)
• Survival of the fittest
  – selection
• Generating new solutions
  – “Mating” and reproduction of individuals (crossover, mutation)
Basic Operation
[Figure: Generation t → Selection → Recombination (Crossover, Mutation) → Generation t+1]
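The cycle on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `evolve`, the bit-string encoding, one-point crossover, and every default parameter value are assumptions chosen for the example. It also assumes a non-negative fitness and an even population size.

```python
import random

def evolve(fitness, n_genes=20, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings with a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Selection: fitness-proportionate sampling with replacement.
        # The small constant keeps the weights valid if every score is zero.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            # Recombination: one-point crossover (an assumed operator choice).
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring += [a, b]
        # Mutation: flip each bit independently with a small probability.
        pop = [[1 - g if rng.random() < mutation_rate else g for g in ind]
               for ind in offspring]
    return max(pop, key=fitness)

# Toy objective ("count the ones"), used only to exercise the loop.
best = evolve(fitness=sum)
print(best, sum(best))
```

Each pass through the outer loop turns generation t into generation t+1.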
GAs: Parallel Search
[Figure: fitness plotted against x, with population points marked X and a single hill climber]
GAs: Basic Principles
• Representation of individuals
  – String of parameters (genes): chromosome
    e.g. optimize a function F(p,q,r,s,t)
    Population members: p q r s t
  – genotype and phenotype
Binary representation?
• Population members as bit strings
  F(p,q,r,s,t) as:
  1 0 0 1   1 0 1 0   1 1 0 1   1 0 0 1   1 0 1 0
     p         q         r         s         t
  – early theory in terms of binary strings (schema theorem)
  – unnecessary perversity?
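To make the genotype/phenotype distinction concrete, the bit string above can be decoded back into the five parameters. The sketch below assumes each parameter occupies four consecutive bits read as an unsigned integer; the slide does not state the encoding, so both the gene width and the helper name `decode` are illustrative.

```python
def decode(bits, n_params=5):
    """Split a bit string into equal-width genes; read each as an unsigned integer."""
    width = len(bits) // n_params
    return [int("".join(str(b) for b in bits[i * width:(i + 1) * width]), 2)
            for i in range(n_params)]

# Genotype: the 20-bit chromosome from the slide.
chromosome = [1, 0, 0, 1,  1, 0, 1, 0,  1, 1, 0, 1,  1, 0, 0, 1,  1, 0, 1, 0]

# Phenotype: the parameter values handed to F(p, q, r, s, t).
p, q, r, s, t = decode(chromosome)
print(p, q, r, s, t)  # 9 10 13 9 10
```

In practice the integers would usually be rescaled into each parameter's real range before F is evaluated.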
GAs: Basic Principles
• Survival of the fittest (fitness function)
  – numerical “figure of merit”/utility measure of an individual
  – tradeoff among multiple evaluation criteria
  – efficient evaluation
GAs: Basic Principles
• Iterative search
  – population evolves over generations
• Convergence
  – progression towards uniformity in population
  – premature convergence? (local optima)
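One simple way to watch the "progression towards uniformity" is to count how many gene positions have become identical across the whole population. The measure below is just one possible diagnostic, and the name `uniformity` is made up for this sketch. A value that reaches 1.0 while the best fitness is still poor hints at premature convergence.

```python
def uniformity(population):
    """Fraction of gene positions at which every individual carries the same value."""
    n_genes = len(population[0])
    fixed = sum(1 for i in range(n_genes)
                if len({ind[i] for ind in population}) == 1)
    return fixed / n_genes

# Only the first of the three positions is identical in all individuals.
print(uniformity([[0, 1, 1], [0, 1, 0], [0, 0, 1]]))
```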
Typical GA Run
[Figure: fitness plotted against generations, with one curve for the best individual and one for the population average]
Operators: Selection
• Fitness-proportionate selection (f_i / f_avg)
• number of reproductive trials for individuals
Selection
• Roulette-wheel selection (stochastic sampling with replacement)
  – wheel spaced in proportion to fitness values
  – N (pop size) spins of the wheel
• Stochastic universal sampling
  – N equally spaced pins on wheel
  – single turn of the wheel
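The two schemes differ only in how the pointers are placed on the wheel, which the following sketch tries to make visible. The function names `roulette` and `sus` are illustrative, and non-negative fitness values with a positive total are assumed.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette(fitness, n, rng):
    """n independent spins: index i is drawn with probability fitness[i] / total."""
    cum = list(accumulate(fitness))
    return [min(bisect_right(cum, rng.random() * cum[-1]), len(cum) - 1)
            for _ in range(n)]

def sus(fitness, n, rng):
    """One spin: n pointers spaced total / n apart, starting at a random offset."""
    cum = list(accumulate(fitness))
    step = cum[-1] / n
    start = rng.random() * step
    return [min(bisect_right(cum, start + k * step), len(cum) - 1)
            for k in range(n)]

rng = random.Random(1)
fitness = [1, 2, 3, 4]
print(roulette(fitness, 10, rng))  # counts vary from run to run
print(sus(fitness, 10, rng))       # counts stay close to the expected 1, 2, 3, 4
```

With roulette-wheel selection an individual's number of copies can stray far from its expected value by chance. Stochastic universal sampling keeps each count within one of the expected value, which is the usual reason for preferring it.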
Selection
• Premature convergence
• Fitness scaling: f' = f − (2*avg. − max.)
• Ranked fitness
• Elitism
• Steady-state selection
• Demetic grouping
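The fitness-scaling rule subtracts the constant (2*avg. − max.) from every raw fitness. After the shift the average becomes max. − avg. and the maximum becomes 2*(max. − avg.), so the best individual scores exactly twice the scaled average, however tightly the raw values are bunched. A small sketch follows; the name `scale_fitness` is made up, and clipping negative results to zero is an added assumption that the slide does not mention.

```python
def scale_fitness(fitness):
    """Shift raw fitness so the best individual scores twice the scaled average."""
    avg = sum(fitness) / len(fitness)
    shift = 2 * avg - max(fitness)
    # Assumption: individuals that fall below the shift are clipped to zero.
    # When clipping occurs, the factor-of-two property holds only approximately.
    return [max(f - shift, 0.0) for f in fitness]

raw = [10, 11, 12]           # best / average is only about 1.09
print(scale_fitness(raw))    # [0.0, 1.0, 2.0] -> best / average is 2
```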
Operators: Crossover
Parent 1: axpsqvqbtpihd
Parent 2: qzxxaycgbtphw
crossover sites
Offspring 1: azpsavcbtpphd
Offspring 2: qxxxqyqgbtihw
(Uniform crossover)
• combining good building blocks
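Uniform crossover can be written as a per-position choice between the two parents. In the sketch below the mask is fixed by hand so that it reproduces the slide's offspring; in an actual run each mask bit would normally be drawn at random. The function name `uniform_crossover` and the mask convention (1 = take the gene from Parent 1) are assumptions.

```python
def uniform_crossover(p1, p2, mask):
    """Child 1 takes p1's gene where mask is 1, else p2's; child 2 is the complement."""
    c1 = "".join(a if m else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

# Mask inferred from the slide's example (position 12 is 'h' in both parents).
mask = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
print(uniform_crossover("axpsqvqbtpihd", "qzxxaycgbtphw", mask))
# ('azpsavcbtpphd', 'qxxxqyqgbtihw')
```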
Operators: Mutation
• alters each gene with small probability
  Before: x 1 y x 0 y 0 y y 0 x y x y
  After:  x 1 y x 0 y 1 y y 0 x x x y
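A per-gene mutation operator might look like the sketch below. It assumes every gene is drawn from a single shared alphabet, which is simpler than the slide's mixed example, and the name `mutate` and its parameters are illustrative.

```python
import random

def mutate(genes, alphabet, rate, rng):
    """Independently, with probability `rate`, replace a gene by a different symbol."""
    return [rng.choice([s for s in alphabet if s != g]) if rng.random() < rate else g
            for g in genes]

rng = random.Random(0)
# With a small rate most genes are copied unchanged.
print(mutate(list("0100110"), "01", 0.05, rng))
```

Typical rates are small, so that on average only about one gene per chromosome changes and mutation perturbs good solutions rather than randomizing them.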
Non-Binary Representations
• Integer, real-number, order-based, rules, ...
• Binary or real-valued?
  real representations give faster, more consistent, more accurate results
• High-level representation
  – intuitive, can utilize specialized operators
  – effective search over complex spaces
Real-valued representation
Parent1:     3.45  0.56  6.78  0.976  2.5
Parent2:     0.98  1.06  4.20  0.34   1.8
Offspring1:  3.22  0.56  6.78  0.65   2.12
Offspring2:  1.43  1.06  4.20  0.41   1.93
(Arithmetic crossover)
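In its textbook form, arithmetic crossover produces each child gene as a weighted average of the two parent genes. The sketch below applies one weight `alpha` to every gene; this is an assumption, and it does not reproduce the slide's numbers, which blend only some genes and appear to be illustrative. The function name is made up.

```python
def arithmetic_crossover(p1, p2, alpha):
    """Each child gene is a weighted average of the corresponding parent genes."""
    c1 = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
    c2 = [(1 - alpha) * a + alpha * b for a, b in zip(p1, p2)]
    return c1, c2

parent1 = [3.45, 0.56, 6.78, 0.976, 2.5]
parent2 = [0.98, 1.06, 4.20, 0.34, 1.8]
child1, child2 = arithmetic_crossover(parent1, parent2, alpha=0.7)
print(child1)
print(child2)
```

Every child gene lies between the two parent values, so the operator on its own cannot move outside the range spanned by the current population; mutation is what reintroduces new values.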
High-level representation
[Figure: Parent1, Parent2, Offspring1 and Offspring2 shown in a high-level representation]
High-level representation
• Generalize/Specialize
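Tree-structured representation (GP)
[Figure: parse tree for (x * log(y)) / 5. The root "/" has children "*" and 5; "*" has children x and log; log has the single child y.]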
• Automated learning of programs (originally): parse tree expressions
• Non-linear interaction terms
• Function set: internal nodes {+, −, *, /, log}
• Terminal set: leaf nodes {constants, variables}
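A parse tree of this kind can be held as nested tuples and evaluated recursively. The encoding below (tuples for internal nodes, strings for variables, numbers for constants) and the names `FUNCTIONS` and `evaluate` are assumptions made for illustration. GP systems usually also protect division and log against invalid arguments, which this sketch leaves out.

```python
import math

# Function set: internal nodes.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "log": lambda a: math.log(a),
}

def evaluate(node, env):
    """Evaluate a parse tree given as nested tuples (function, child, ...)."""
    if isinstance(node, tuple):
        op, *children = node
        return FUNCTIONS[op](*(evaluate(c, env) for c in children))
    if isinstance(node, str):
        return env[node]   # terminal: variable
    return node            # terminal: constant

# (x * log(y)) / 5
tree = ("/", ("*", "x", ("log", "y")), 5)
print(evaluate(tree, {"x": 10.0, "y": math.e}))
```

Crossover in this representation swaps whole subtrees between two parents, which is how non-linear interaction terms such as x * log(y) get built and recombined.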
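Tree-structured representation
• Representing complex patterns
[Figure: parse tree with root "if". Condition: AND of (y < 7) and (x > 2); then-branch: 0; else-branch: "+" applied to (2 * x) and y.]
if (y < 7) and (x > 2)
then 0
else 2x + y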
Genetic search: Issues
• Coding scheme, fitness function critical
  – the “art” in GA design!
  – General mechanism so robust that, within reasonable margins, parameter settings are not critical.
• Representation to match problem, domain
  – utilizing domain knowledge
    • problem-specific crossover, mutation, selection
• Flexibility in fitness function formulation
  – modeling business objectives
Genetic search: Issues
• Stochastic search
  – initial populations, probabilistic operators
  – multiple runs with different random streams
  – initializing population with known solutions
  – seeding initial population with solutions from multiple, independent runs
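Seeding can be as simple as placing the known solutions in the initial population and filling the remaining slots at random. The sketch below assumes bit-string individuals; the name `init_population` and its parameters are illustrative. Passing a differently seeded `random.Random` for each run gives the "different random streams" mentioned above.

```python
import random

def init_population(pop_size, n_genes, known=(), rng=None):
    """Start from known solutions, then fill the remaining slots with random bit strings."""
    rng = rng or random.Random()
    population = [list(s) for s in known][:pop_size]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n_genes)])
    return population

# One known solution (e.g. the best of an earlier run) plus four random individuals.
for individual in init_population(5, 4, known=[[1, 1, 1, 1]], rng=random.Random(3)):
    print(individual)
```

Seeding too many copies of one solution can itself trigger premature convergence, so the seeded fraction is usually kept small.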
Genetic search: Issues
• Guarantees optimality?
  – But...
• GAs and traditional techniques
  – especially useful where traditional approaches fail
  – in conjunction with traditional techniques
• Parallelizable for large data
  – multi-processor, networked machines
Using GAs?
• When to use a GA?
• GA and traditional techniques
• How long does it take?
• Will it perform better?
Using GAs
• population size
• mutation, crossover rates
• how many generations
• multiple runs
Is it a “black-box”?
• Data characteristics
• Fitness function
• GA parameters
GA Application Examples
• Function optimizers
  – difficult, discontinuous, multi-modal, noisy functions
• Combinatorial optimization
  – layout of VLSI circuits, factory scheduling, traveling salesman problem
• Design and Control
  – design of bridge structures, neural networks, communication networks; control of chemical plants, pipelines
GA Application Examples
• Machine learning
  – classification rules, economic modeling, scheduling strategies
Portfolio design, optimized trading models, direct marketing models, sequencing of TV advertisements, adaptive agents, data mining, etc.