A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data


Benjamin B. Perry
M.S. Thesis Defense

Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University

http://www.kddresearch.org
http://www.cis.ksu.edu/~bbp9857


Overview

Bayesian Networks
  Definitions and examples
  Inference and learning
Genetic Algorithms
Structure Learning Background
  Problem
  K2 algorithm
  Sparse Candidate
Improving K2: Permutation Genetic Algorithm (GASLEAK)
  Shortcoming: greedy, sensitive to ordering
  Permutation GA
Master's thesis: Adjacency Matrix GA (SLAM GA)
  Rationale
Evaluation with Known Bayesian Networks
Summary



Bayesian Belief Networks (BBNs): Definition

Bayesian Network
  Directed acyclic graph
    Vertices (nodes): denote events or states of affairs (each a random variable)
    Edges (arcs, links): denote conditional dependencies, causalities
  Model of conditional dependence assertions (or CI assumptions)
  Example: the "Ben's Presentation" BBN (in the style of the classic sprinkler network)
    X1  Sleep: Narcoleptic, Well, Bad, All-nighter
    X2  Appearance: Good, Bad
    X3  Memory: Elephant, Good, Bad, None
    X4  Ben is nervous: Extremely, Yes, No
    X5  Ben's presentation: Good, Not so good, Failed miserably

General Product (Chain) Rule for BBNs:
  P(X1, X2, ..., Xn) = ∏_i P(Xi | parents(Xi))

Applied to the example:
  P(Well, Good, Good, No, Good) = P(W) ∙ P(G | W) ∙ P(G | W) ∙ P(N | G, G) ∙ P(G | N)
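
To make the factorization concrete, the short Python sketch below evaluates the example joint probability from per-node CPTs. The structure (Sleep → Appearance, Sleep → Memory, {Appearance, Memory} → Nervous, Nervous → Presentation) is read off the factorization above, and every CPT number is invented purely for illustration; none are taken from the thesis.

    # Chain-rule evaluation for the "Ben's Presentation" BBN.
    # CPT values are hypothetical; only the entries needed for the example query are filled in.
    cpt_sleep        = {"Well": 0.5, "Bad": 0.3, "All-nighter": 0.15, "Narcoleptic": 0.05}
    cpt_appearance   = {"Well": {"Good": 0.8, "Bad": 0.2}}                      # P(Appearance | Sleep)
    cpt_memory       = {"Well": {"Good": 0.6, "Elephant": 0.2, "Bad": 0.15, "None": 0.05}}
    cpt_nervous      = {("Good", "Good"): {"No": 0.7, "Yes": 0.25, "Extremely": 0.05}}
    cpt_presentation = {"No": {"Good": 0.9, "Not so good": 0.08, "Failed miserably": 0.02}}

    def joint(sleep, appearance, memory, nervous, presentation):
        """P(S, A, M, N, Pr) = P(S) * P(A | S) * P(M | S) * P(N | A, M) * P(Pr | N)."""
        return (cpt_sleep[sleep]
                * cpt_appearance[sleep][appearance]
                * cpt_memory[sleep][memory]
                * cpt_nervous[(appearance, memory)][nervous]
                * cpt_presentation[nervous][presentation])

    print(joint("Well", "Good", "Good", "No", "Good"))   # the P(Well, Good, Good, No, Good) term above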



Graphical Models of Probability Distributions

Idea
  Want: a model that can be used to perform inference
  Desired properties
    Correlations among variables
    Ability to represent functional, logical, stochastic relationships
    Probability of certain events

Inference: Decision Support Problems
  Diagnosis (medical, equipment)
  Pattern recognition (image, speech)
  Prediction

Want to Learn: Most Likely Model that Generates the Observed Data
  Under certain assumptions (Causal Markovity), it has been shown that this is possible
  Given: data D (tuples or vectors containing observed values of the variables)
  Return: directed graph (V, E) expressing the target CPTs

NEXT: Genetic Algorithms



Genetic Algorithms

Idea
  Emulate the natural process of survival of the fittest (example: roaches adapt)
  Each generation has many diverse individuals
  Each individual competes for the chance to survive
  Most common approach: the best individuals live on to the next generation and mate
    Produce children with traits from both parents
    If the parents are strong, the children might be stronger

Major components (operators)
  Fitness function
  Chromosome manipulation
    Crossover (not the "John Edward" type!), mutation

From (Educated?) Guess to Gold
  Initial population is typically random, or not much better than random (bad scores)
  Performs well with a non-deceptive search space and good genetic operators
  Ability to escape local optima through mutation
  Not guaranteed to find the best answer, but usually gets close
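
The components above fit together in a short generational loop. The sketch below is a generic Python illustration, not the thesis implementation; the problem-specific fitness, crossover, and mutate operators (and the initial population) are assumed to be supplied by the caller.

    import random

    def genetic_algorithm(init_population, fitness, crossover, mutate,
                          generations=100, elite=2, mutation_rate=0.1):
        """Generic elitist, generational GA skeleton."""
        population = list(init_population)
        for _ in range(generations):
            ranked = sorted(population, key=fitness, reverse=True)
            next_gen = ranked[:elite]                                  # elitism: best individuals survive
            while len(next_gen) < len(population):
                mom, dad = random.sample(ranked[:max(2, len(ranked) // 2)], 2)   # favor the fitter half
                child = crossover(mom, dad)                            # child inherits traits from both parents
                if random.random() < mutation_rate:
                    child = mutate(child)                              # mutation helps escape local optima
                next_gen.append(child)
            population = next_gen
        return max(population, key=fitness)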


Learning Structure: K2 Algorithm


Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
  FOR i ← 1 TO n DO                           // arbitrary ordering of variables {x1, x2, ..., xn}
    WHILE Parents[xi].Size < Max-Parents DO   // greedily add the best candidate parent
      Best ← argmax over candidates xj preceding xi of P(D | xj ∪ Parents[xi])   // max Dirichlet score
      IF (Parents[xi] + Best).Score > Parents[xi].Score THEN
        Parents[xi] += Best
      ELSE
        BREAK                                 // no remaining candidate improves the score
  RETURN {Parents[xi] | i ∈ {1, 2, ..., n}}
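
For reference, a compact Python sketch of the same greedy loop follows. It assumes complete, integer-coded discrete data in a NumPy matrix and uses the Cooper-Herskovits (Bayesian Dirichlet) metric with uniform priors as the local score; the function names are mine and this is not the thesis code.

    import numpy as np
    from math import lgamma
    from itertools import product

    def k2_local_score(data, i, parents, arities):
        """Log Cooper-Herskovits (BD) score of node i with the given parent set, uniform priors."""
        r = arities[i]
        score = 0.0
        for ps in product(*(range(arities[p]) for p in parents)):   # every parent configuration
            mask = np.ones(len(data), dtype=bool)
            for p, v in zip(parents, ps):
                mask &= data[:, p] == v
            counts = np.bincount(data[mask, i], minlength=r)        # N_ijk for this configuration
            n_ij = counts.sum()
            score += lgamma(r) - lgamma(n_ij + r) + sum(lgamma(c + 1) for c in counts)
        return score

    def k2(data, order, arities, max_parents):
        """Greedy K2: for each node, add the best preceding parent while the score improves."""
        parents = {x: [] for x in order}
        for idx, x in enumerate(order):
            candidates = set(order[:idx])                            # only nodes earlier in the ordering
            best = k2_local_score(data, x, parents[x], arities)
            while candidates and len(parents[x]) < max_parents:
                scored = {c: k2_local_score(data, x, parents[x] + [c], arities) for c in candidates}
                c, s = max(scored.items(), key=lambda kv: kv[1])
                if s <= best:
                    break                                            # no candidate improves the score
                parents[x].append(c)
                candidates.remove(c)
                best = s
        return parents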


ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al., 1989]
  A BBN model for patient monitoring in surgical anesthesia
  Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  K2 found a BBN differing in only 1 edge from the gold standard (elicited from an expert)

[Figure: the 37-node ALARM network structure.]


Learning Structure: K2 Drawbacks

  Greedy (may fall into local maxima)
  Highly dependent upon the node ordering
  An optimal node ordering must be given
    If the optimal order were already known, an expert could probably create the network directly
  The number of possible node orderings is n! (grows faster than exponentially)





Learning Structure: Sparse Candidate

General idea
  Inspect the k best parent candidates for each node at a time (K2 inspects only one)
  k is typically very small (5 ≤ k ≤ 15)
  Complexity is exponential in k

Algorithm: loop until there is no improvement or the iteration limit is exceeded:
  [Restrict] For each node, select the top k parent candidates (by mutual information or m_disc)
  [Maximize] Build a network by manipulating parents (add, remove, or reverse edges drawn from each node's candidate set), accepting only changes that improve the network score (Minimum Description Length)

Cycles must be handled explicitly, which is expensive; K2 gives acyclicity for free
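
As an illustration of the Restrict step only, the sketch below picks each node's k highest-scoring candidates by empirical mutual information. The function names and the use of NumPy are my own, and the Maximize phase and cycle handling are omitted.

    import numpy as np
    from itertools import combinations

    def mutual_information(x, y):
        """Empirical mutual information between two discrete (integer-coded) columns."""
        joint = np.zeros((x.max() + 1, y.max() + 1))
        for a, b in zip(x, y):
            joint[a, b] += 1
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    def restrict(data, k):
        """Restrict phase: return the k highest-MI candidate parents for every variable."""
        n = data.shape[1]
        mi = np.zeros((n, n))
        for i, j in combinations(range(n), 2):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
        return {i: [j for j in np.argsort(mi[i])[::-1] if j != i][:k] for i in range(n)}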


Next: Improving K2


GASLEAK: A Permutation GA for Variable Ordering

GASLEAK: Genetic Algorithm for Structure Learning from Evidence, AIS, and K2

[Figure: GASLEAK system diagram. Inputs: training data D and an evidence specification, with D split into D_train (structure learning) and D_val (inference). Component [1], the permutation genetic algorithm, proposes a candidate ordering α; component [2], the representation evaluator for Bayesian network structure learning problems, returns the ordering fitness f(α); the loop outputs an optimized ordering.]



Properties of the Genetic Algorithm

  Elitist
  Chromosome representation
    Integer permutation encoding the node ordering
    A sample chromosome for a 5-node BBN might look like: 3 1 2 0 4
  Seeding
    Random shuffle
  Operators (sketched below)
    Order crossover
    Swap mutation
  Fitness
    RMSE
  Job farm
    Java-based; utilizes many machines regardless of OS
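
For concreteness, one common formulation of the two permutation operators is sketched below in Python; the exact variants used in the thesis may differ.

    import random

    def order_crossover(mom, dad):
        """Copy a random slice from one parent, then fill the remaining positions in the
        other parent's relative order, so the child is still a valid permutation."""
        n = len(mom)
        a, b = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[a:b] = mom[a:b]
        fill = [gene for gene in dad if gene not in set(mom[a:b])]
        for i in list(range(0, a)) + list(range(b, n)):
            child[i] = fill.pop(0)
        return child

    def swap_mutation(chrom):
        """Exchange two randomly chosen positions of the permutation."""
        chrom = list(chrom)
        i, j = random.sample(range(len(chrom)), 2)
        chrom[i], chrom[j] = chrom[j], chrom[i]
        return chrom

    print(order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4]))   # e.g. a child of two 5-node orderings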


GASLEAK Results

[Figure: histogram of estimated fitness for all 8! = 40320 permutations of the Asia variables.]

  Not encouraging
  Bad fitness function or bad evidence b.v.
  Many graph errors



Master's Thesis: SLAM GA

SLAM GA: Structure Learning Adjacency Matrix Genetic Algorithm

Initial population: several approaches were tried
  Completely random Bayesian networks (Box-Muller, Max parents)
    Many illegal structures; a fixCycles algorithm was written
  Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
    Performed better than random
  Aggregate of k networks learned by K2 from random orderings (cycles eliminated)
    Best approach


Aggregator Instantiater

For small networks, k = 1 is best; for larger networks, k = 2 is best.

[Figure: aggregator pipeline. A K2 Manager runs K2 on the training data D with a random ordering k times (instances 1, 2, ..., k), each run producing a BBN; an Aggregator then combines the k learned networks into a single aggregate BBN.]
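
A minimal sketch of this initializer, assuming a learn_parents callable that stands in for one K2 run under a given ordering (for example, a wrapper around the k2 sketch earlier). Purely for brevity, cycles are avoided incrementally while merging rather than repaired afterwards, which may differ from the thesis procedure.

    import random
    import numpy as np

    def aggregate_individual(data, learn_parents, k):
        """OR together the parent sets from k K2 runs with random orderings into one adjacency
        matrix, skipping any edge that would close a cycle."""
        n = data.shape[1]
        adjacency = np.zeros((n, n), dtype=np.uint8)         # adjacency[p, c] = 1: p is a parent of c
        for _ in range(k):
            order = random.sample(range(n), n)                # fresh random node ordering for this run
            for child, parent_set in learn_parents(data, order).items():
                for p in parent_set:
                    if not reachable(adjacency, child, p):    # adding p -> child must not create a cycle
                        adjacency[p, child] = 1
        return adjacency

    def reachable(adj, start, target):
        """Depth-first search: is `target` reachable from `start` along directed edges?"""
        stack, seen = [start], set()
        while stack:
            u = stack.pop()
            if u == target:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(v for v in range(len(adj)) if adj[u, v])
        return False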



SLAM GA

Chromosome representation
  Edge (adjacency) matrix: n^2 bits
  Each bit represents a parent edge into a node: 1 = parent, 0 = not a parent

Operators
  Crossover: swap parents, fix cycles


SLAM GA: Crossover
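
A Python sketch of the parent-swapping idea on adjacency matrices: the child inherits each node's parent set (one column of the matrix) from one of the two parent networks, and edges that would close a cycle are then dropped. The slides do not specify how fixCycles chooses which edges to remove, so the repair below is just one simple possibility.

    import random
    import numpy as np

    def crossover(mom, dad):
        """mom, dad: (n, n) 0/1 adjacency matrices where adj[p, c] = 1 means p is a parent of c.
        The child takes each node's entire parent set (one column) from either parent network."""
        n = mom.shape[0]
        child = np.empty_like(mom)
        for c in range(n):
            child[:, c] = (mom if random.random() < 0.5 else dad)[:, c]
        return fix_cycles(child)

    def fix_cycles(adj):
        """Rebuild the graph edge by edge, keeping only edges that do not close a cycle."""
        n = adj.shape[0]
        dag = np.zeros_like(adj)
        for p in range(n):
            for c in range(n):
                if adj[p, c] and not reachable(dag, c, p):    # keep p -> c only if c cannot reach p yet
                    dag[p, c] = 1
        return dag

    def reachable(adj, start, target):
        """Depth-first search: is `target` reachable from `start` along directed edges?"""
        stack, seen = [start], set()
        while stack:
            u = stack.pop()
            if u == target:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(v for v in range(len(adj)) if adj[u, v])
        return False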



SLAM GA

Chromosome representation
  Edge (adjacency) matrix: n^2 bits
  Each bit represents a parent edge into a node: 1 = parent, 0 = not a parent

Operators
  Crossover: swap parents, fix cycles
  Mutation: reverse, delete, or add a random number of edges; fix cycles

Fitness
  Total Bayesian Dirichlet equivalence score, summed over the nodes (see the sketch below)
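
Because a Bayesian Dirichlet style score decomposes by node, the fitness of a candidate adjacency matrix is the sum of per-node local scores. A minimal sketch, parameterized by a per-node scoring function (for example, the k2_local_score sketch earlier):

    import numpy as np

    def network_fitness(adj, data, arities, local_score):
        """Sum of per-node local scores; `local_score(data, i, parents, arities)` is any
        decomposable Bayesian scoring function (e.g., the k2_local_score sketch above)."""
        n = adj.shape[0]
        return sum(local_score(data, i, [p for p in range(n) if adj[p, i]], arities)
                   for i in range(n))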


Results: Asia

[Figure: Asia networks. Best of the first generation: 15 graph errors. Learned network: 1 graph error. The actual (gold-standard) network is shown for comparison.]


Results: Poker

[Figure: Poker networks. Best of the first generation: 11 graph errors. Learned network: 2 graph errors. The actual (gold-standard) network is shown for comparison.]


Results: Golf

[Figure: Golf networks. Best of the first generation: 11 graph errors. Learned network: 4 graph errors. The actual (gold-standard) network is shown for comparison.]


Results: Boerlage92

[Figure: Boerlage92 networks. The initial, actual, and learned networks are shown.]


Results: Alarm


Final Fitness Values


K2 vs. SLAM GA

K2
  Very good if the ordering is known
  The ordering is often not known
  Greedy; very dependent on the ordering

SLAM GA
  Stochastic; can escape the local-optima trap
  Can improve on bad structures learned by K2
  Takes much longer than K2



GASLEAK vs. SLAM GA

GASLEAK
  Gold network never recovered
  Much more computationally expensive
    K2 is run on each [new] individual each generation
    Each chromosome must be scored
  Final network has many graph errors

SLAM GA
  For small networks, the gold-standard network is often recovered
  Relatively few graph errors in the final network
  Less computationally intensive
    The initial population is the most expensive step
    Each chromosome must be scored


SLAM GA: Ramifications

  An effective structure-learning algorithm
    Ideal for small networks
  Improvement over GASLEAK
    SLAM GA is faster, even with the same GA parameters
    SLAM GA is more accurate
  Improvement over K2
    The aggregate algorithm produces a better initial population
  The parent-swapping crossover technique is effective
    Diversifies the search space while retaining past information



SLAM GA: Future Work

  Parameter tweaking
  Better fitness function
    Several 'bad' structures score better than the gold standard
    The GA itself works fine
  'Intelligent' mutation operator
    Add edges from a pre-qualified set of candidate parents
  New instantiation methods
    Use GASLEAK
    Other structure-learning algorithms
  Scalability
    Job farm


Summary

  Bayesian Networks
  Genetic Algorithms
  Learning Structure: K2, Sparse Candidate
  GASLEAK
  SLAM GA