A Genetic Algorithm for Learning
Bayesian Network Adjacency Matrices from Data

Benjamin B. Perry
M.S. Thesis Defense
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences, Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/~bbp9857
Overview
• Bayesian Networks
  – Definitions and examples
  – Inference and learning
• Genetic Algorithms
• Structure Learning Background
  – Problem
  – K2 algorithm
  – Sparse Candidate
• Improving K2: Permutation Genetic Algorithm (GASLEAK)
  – Shortcoming: greedy, sensitive to ordering
  – Permutation GA
• Master's thesis: Adjacency Matrix GA (SLAM GA)
  – Rationale
• Evaluation with Known Bayesian Networks
• Summary
Bayesian Belief Networks (BBNs): Definition
• Bayesian Network
  – Directed acyclic graph
  – Vertices (nodes): denote events, or states of affairs (each a random variable)
  – Edges (arcs, links): denote conditional dependencies, causalities
  – Model of conditional dependence assertions (or CI assumptions)
• Example ("Ben's Presentation" BBN)
  – X1, Sleep: Narcoleptic, Well, Bad, All-nighter
  – X2, Appearance: Good, Bad
  – X3, Memory: Elephant, Good, Bad, None
  – X4, Ben is nervous: Extremely, Yes, No
  – X5, Ben's presentation: Good, Not so good, Failed miserably
  – P(Well, Good, Good, No, Good) = P(Well) · P(Good | Well) · P(Good | Well) · P(No | Good, Good) · P(Good | No)
• General Product (Chain) Rule for BBNs: P(X1, …, Xn) = Π_i P(Xi | Parents(Xi))
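The chain-rule factorization above can be computed directly from the network's CPTs. A minimal sketch, assuming a CPT layout keyed by parent-value tuples; the function names and the probability values are illustrative, not taken from the thesis:

```python
def joint_probability(cpts, parents, assignment):
    """BBN chain rule: P(x1, ..., xn) = product over i of P(xi | Parents(xi))."""
    p = 1.0
    for node, value in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][parent_vals][value]
    return p

# Structure of the "Ben's Presentation" network:
# Sleep -> Appearance, Sleep -> Memory,
# (Appearance, Memory) -> Nervous, Nervous -> Presentation.
parents = {
    "Sleep": (),
    "Appearance": ("Sleep",),
    "Memory": ("Sleep",),
    "Nervous": ("Appearance", "Memory"),
    "Presentation": ("Nervous",),
}
# Illustrative CPT entries (only those needed for this one assignment).
cpts = {
    "Sleep": {(): {"Well": 0.4}},
    "Appearance": {("Well",): {"Good": 0.9}},
    "Memory": {("Well",): {"Good": 0.7}},
    "Nervous": {("Good", "Good"): {"No": 0.8}},
    "Presentation": {("No",): {"Good": 0.9}},
}
assignment = {"Sleep": "Well", "Appearance": "Good", "Memory": "Good",
              "Nervous": "No", "Presentation": "Good"}
print(joint_probability(cpts, parents, assignment))  # 0.4 * 0.9 * 0.7 * 0.8 * 0.9
```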
Graphical Models of Probability Distributions
• Idea
  – Want: a model that can be used to perform inference
  – Desired properties
    • Correlations among variables
    • Ability to represent functional, logical, stochastic relationships
    • Probability of certain events
• Inference: Decision Support Problems
  – Diagnosis (medical, equipment)
  – Pattern recognition (image, speech)
  – Prediction
• Want to Learn: Most Likely Model that Generates the Observed Data
  – Under certain assumptions (Causal Markovity), it has been shown that this is possible
  – Given: data D (tuples or vectors containing observed values of variables)
  – Return: directed graph (V, E) expressing target CPTs
  – Next: genetic algorithms
Genetic Algorithms
• Idea
  – Emulate the natural process of survival of the fittest (example: roaches adapt)
  – Each generation has many diverse individuals
  – Each individual competes for the chance to survive
  – Most common approach: the best individuals live on to the next generation and mate
  – Children are produced with traits from both parents
  – If the parents are strong, the children might be stronger
• Major components (operators)
  – Fitness function
  – Chromosome manipulation
  – Crossover (not the "John Edward" type!), mutation
• From (Educated?) Guess to Gold
  – Initial population is typically random or not much better than random (bad scores)
  – Performs well with a non-deceptive search space and good genetic operators
  – Ability to escape local optima through mutation
  – Not guaranteed to find the best answer, but usually gets close
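The components listed above fit together in a short loop. A minimal elitist GA sketch on a toy "one-max" problem (all names and parameter values are illustrative, not from the thesis):

```python
import random

def genetic_algorithm(fitness, init, crossover, mutate,
                      pop_size=20, generations=50, elite=2, seed=0):
    """Minimal elitist GA: the best individuals survive and mate each generation."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:elite]                                # elitism: keep the best
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)    # mate among the fitter half
            nxt.append(mutate(crossover(a, b, rng), rng))
        pop = nxt
    return max(pop, key=fitness)

# Toy problem: maximize the number of 1-bits in a 16-bit chromosome.
best = genetic_algorithm(
    fitness=sum,
    init=lambda r: [r.randint(0, 1) for _ in range(16)],
    crossover=lambda a, b, r: a[:8] + b[8:],             # one-point crossover
    mutate=lambda c, r: [bit ^ (r.random() < 0.05) for bit in c],  # 5% bit flips
)
print(sum(best))
```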
Learning Structure: K2 Algorithm
• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
    FOR i ← 1 TO n DO                                    // arbitrary ordering of variables {x1, x2, …, xn}
      WHILE (Parents[xi].Size < Max-Parents) DO          // find best candidate parent
        Best ← argmax_{j>i} (P(D | xj ∪ Parents[xi]))    // max Dirichlet score
        IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN
          Parents[xi] += Best
        ELSE BREAK                                       // stop when no candidate improves the score
    RETURN ({Parents[xi] : i ∈ {1, 2, …, n}})
• ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]
  – BBN model for patient monitoring in surgical anesthesia
  – Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  – K2: found a BBN differing in only 1 edge from the gold standard (elicited from experts)
[Figure: the 37-node ALARM network]
Learning Structure: K2 Downfalls
• Greedy (may fall into local maxima)
• Highly dependent upon node ordering
• An optimal node ordering must be given
• If the optimal order is already known, an expert could probably create the network directly
• The number of orderings consistent with DAGs grows factorially (n!)
Learning Structure: Sparse Candidate
• General Idea
  – Inspect the k best parent candidates at a time (K2 only inspects one)
  – k is typically very small: 5 ≤ k ≤ 15
  – Exponential in k
• Algorithm
  Loop until no improvement or the iteration limit is exceeded:
    [Restrict] For each node, select the top k parent candidates (mutual information or m_disc)
    [Maximize] Build a network by manipulating parents (add, remove, reverse from the candidate set for each node); only accept changes that improve the network score (Minimum Description Length)
• Must handle cycles, which is expensive
  – K2 gives this to us for free
  – Next: improving K2
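The Restrict step described above can be sketched with empirical mutual information. This is an illustrative sketch, not the thesis's implementation; `mutual_information` and `restrict` are hypothetical names:

```python
import math
from collections import Counter

def mutual_information(data, i, j):
    """Empirical mutual information I(Xi; Xj) from joint counts (natural log)."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        # (c/n) * log( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * math.log((c * n) / (pi[a] * pj[b]))
    return mi

def restrict(data, n_vars, k):
    """Restrict step: for each node, keep the k most informative other
    variables as candidate parents."""
    return {i: sorted((j for j in range(n_vars) if j != i),
                      key=lambda j: mutual_information(data, i, j),
                      reverse=True)[:k]
            for i in range(n_vars)}
```

On data where variable 1 mirrors variable 0 and variable 2 is independent, `restrict(data, 3, 1)[0]` picks variable 1 as the sole candidate.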
GASLEAK: A Permutation GA for Variable Ordering
Genetic Algorithm for Structure Learning from Evidence, AIS, and K2
[Diagram]
• [1] Permutation Genetic Algorithm: proposes a candidate ordering α, receives its ordering fitness f(α), and outputs an optimized ordering
• [2] Representation Evaluator for Bayesian Network Structure Learning Problems: scores each candidate ordering
• D: training data, split into Dtrain (structure learning) and Dval (inference), plus an evidence specification
Properties of the Genetic Algorithm
• Elitist
• Chromosome representation
  – Integer permutation ordering
  – A sample chromosome in a BBN of 5 nodes might look like: 3 1 2 0 4
• Seeding
  – Random shuffle
• Operators
  – Order crossover
  – Swap mutation
• Fitness
  – RMSE
• Job farm
  – Java-based; utilizes many machines regardless of OS
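The two operators named above are standard for permutation chromosomes. A sketch (textbook order crossover and swap mutation, not the thesis's exact code; both always produce valid permutations):

```python
import random

def order_crossover(a, b, rng):
    """Order crossover (OX): copy a random slice from parent a, then fill the
    remaining positions with parent b's genes in their original order."""
    lo, hi = sorted(rng.sample(range(len(a)), 2))
    child = [None] * len(a)
    child[lo:hi] = a[lo:hi]
    fill = [g for g in b if g not in a[lo:hi]]
    for idx in (i for i in range(len(a)) if child[i] is None):
        child[idx] = fill.pop(0)
    return child

def swap_mutation(perm, rng):
    """Swap the genes at two random positions of the permutation."""
    p = list(perm)
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

rng = random.Random(0)
print(order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4], rng))
```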
GASLEAK Results
Histogram of estimated fitness for all 8! = 40320 permutations of Asia variables.
• Not encouraging
  – Bad fitness function or bad evidence b.v.
  – Many graph errors
Master's Thesis: SLAM GA
• SLAM GA: Structure Learning Adjacency Matrix Genetic Algorithm
• Initial population: tried several approaches
  – Completely random Bayesian networks (Box-Muller, max parents)
    • Many illegal structures; wrote the fixCycles algorithm
  – Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
    • Performed better than random
  – Aggregate of k networks learned by K2 given random orderings (cycles eliminated)
    • Best approach
Aggregator Instantiater
For small networks, k=1 is best. For larger networks, k=2 is best.
[Diagram: the K2 Manager runs k instances of K2 with random orderings on the training data D; the Aggregator combines the k learned BBNs into a single aggregate BBN.]
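The slides do not spell out how the k learned networks are combined, so the sketch below assumes a simple majority vote over adjacency matrices with cycle elimination; `has_cycle` and `aggregate` are hypothetical names:

```python
def has_cycle(adj):
    """DFS cycle test on an adjacency matrix (adj[i][j] == 1 means edge i -> j)."""
    n = len(adj)
    state = [0] * n                       # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in range(n):
            if adj[u][v]:
                if state[v] == 1 or (state[v] == 0 and dfs(v)):
                    return True
        state[u] = 2
        return False
    return any(state[u] == 0 and dfs(u) for u in range(n))

def aggregate(matrices):
    """Majority-vote aggregation of k adjacency matrices; edges are added in
    decreasing vote order and dropped again if they would create a cycle."""
    n = len(matrices[0])
    votes = [(sum(m[i][j] for m in matrices), i, j)
             for i in range(n) for j in range(n) if i != j]
    agg = [[0] * n for _ in range(n)]
    for v, i, j in sorted(votes, reverse=True):
        if v * 2 > len(matrices):         # strict majority of the k networks
            agg[i][j] = 1
            if has_cycle(agg):
                agg[i][j] = 0             # cycle eliminated
    return agg
```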
SLAM GA: Crossover
SLAM GA
• Chromosome representation
  – Edge (adjacency) matrix: n² bits
  – Each bit represents a parent edge to a node
  – 1 = parent, 0 = not parent
• Operators
  – Crossover: swap parents, fix cycles
  – Mutation: reverse, delete, or add a random number of edges; fix cycles
• Fitness
  – Total Bayesian Dirichlet equivalence score for each node
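The parent-swapping crossover can be sketched on the adjacency-matrix representation: each child column (one node's parent set) comes from one of the two parents, and any cycles introduced are repaired. This is a sketch of the idea, not the thesis's exact code; `find_cycle` and the random-edge repair policy are assumptions:

```python
import random

def find_cycle(adj):
    """Return the edge list of one directed cycle, or None if the graph is acyclic."""
    n = len(adj)
    state, path = [0] * n, []             # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        path.append(u)
        for v in range(n):
            if adj[u][v]:
                if state[v] == 1:         # back edge: cycle found
                    cyc = path[path.index(v):] + [v]
                    return list(zip(cyc, cyc[1:]))
                if state[v] == 0:
                    found = dfs(v)
                    if found:
                        return found
        state[u] = 2
        path.pop()
        return None
    for u in range(n):
        if state[u] == 0:
            found = dfs(u)
            if found:
                return found
    return None

def fix_cycles(adj, rng):
    """Remove a random edge from each remaining cycle until the graph is acyclic."""
    while True:
        cycle = find_cycle(adj)
        if cycle is None:
            return
        i, j = cycle[rng.randrange(len(cycle))]
        adj[i][j] = 0

def crossover(a, b, rng):
    """Parent-swapping crossover: for each node, the child inherits that node's
    entire parent set (one matrix column) from either parent, then fixes cycles."""
    n = len(a)
    child = [[0] * n for _ in range(n)]
    for j in range(n):                    # column j holds the parents of node j
        src = a if rng.random() < 0.5 else b
        for i in range(n):
            child[i][j] = src[i][j]
    fix_cycles(child, rng)
    return child
```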
Results – Asia
[Figure: best of first generation (15 graph errors), learned network (1 graph error), and actual network]
Results – Asia
Results – Poker
[Figure: best of first generation (11 graph errors), learned network (2 graph errors), and actual network]
Results – Poker
Results – Golf
[Figure: best of first generation (11 graph errors), learned network (4 graph errors), and actual network]
Results – Golf
Results – Boerlage92
[Figure: initial, learned, and actual networks]
Results – Boerlage92
Results – Alarm
Final Fitness Values
K2 vs. SLAM GA
• K2
  – Very good if the ordering is known
  – The ordering is often not known
  – Greedy; very dependent on ordering
• SLAM GA
  – Stochastic; escapes the local-optima trap
  – Can improve on bad structures learned by K2
  – Takes much longer than K2
GASLEAK vs. SLAM GA
• GASLEAK
  – Gold network never recovered
  – Much more computationally expensive
    • K2 is run on each [new] individual each generation
    • Each chromosome must be scored
  – Final network has many graph errors
• SLAM GA
  – For small networks, the gold-standard network is often recovered
  – Relatively few graph errors in the final network
  – Less computationally intensive
    • The initial population is the most expensive step
    • Each chromosome must be scored
SLAM GA: Ramifications
• Effective structure-learning algorithm
  – Ideal for small networks
• Improvement over GASLEAK
  – SLAM GA is faster in spite of the same GA parameters
  – SLAM GA is more accurate
• Improvement over K2
• The aggregate algorithm produces a better initial population
• The parent-swapping crossover technique is effective
  – Diversifies the search space while retaining past information
SLAM GA: Future Work
• Parameter tweaking
• Better fitness function
  – Several 'bad' structures score better than the gold standard
  – The GA itself works fine
• 'Intelligent' mutation operator
  – Add edges from a pre-qualified set of candidate parents
• New instantiation methods
  – Use GASLEAK
  – Other structure-learning algorithms
• Scalability
  – Job farm
Summary
• Bayesian Networks
• Genetic Algorithms
• Learning Structure: K2, Sparse Candidate
• GASLEAK
• SLAM GA