Bayesian network learning
By Zuko Fani and Victor Zimu
Introduction
• The attributes of a domain can be represented by a joint probability distribution.
• P(X1, X2, …, XN) can be factored as a chain of conditional distributions, one for each node given its parents:
• P(X1, X2, …, XN) = ∏ P(Xi | Parents(Xi))
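• As a concrete illustration, a minimal Python sketch of this factorization for a hypothetical three-node chain A -> B -> C (all probability values are made up for the example):

# Chain-rule factorization for a hypothetical network A -> B -> C.
P_A = {True: 0.3, False: 0.7}                    # P(A)
P_B_given_A = {True: {True: 0.8, False: 0.2},    # P(B | A), outer key is A
               False: {True: 0.1, False: 0.9}}
P_C_given_B = {True: {True: 0.5, False: 0.5},    # P(C | B), outer key is B
               False: {True: 0.2, False: 0.8}}

def joint(a, b, c):
    # P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | b)
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

print(joint(True, True, False))   # 0.3 * 0.8 * 0.5 = 0.12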
Introduction
• A Bayesian network (BN) is a simplification of the joint probability distribution underlying the data, obtained by exploiting conditional independencies.
Graph Theory
• Graph theory offers an intuitively appealing interface for humans to visualize a representation of interacting sets of variables.
• A graphical model can compactly represent a joint probability distribution.
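• To see the compactness concretely: for N binary variables the full joint needs 2^N - 1 parameters, while a BN needs one free parameter per parent configuration at each node. A small Python count for the four-node sprinkler network used later in these slides (the structure is from the slides; the counting code is a sketch of our own):

# Parameter count: full joint vs. BN factorization (binary variables).
parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}

full_joint = 2 ** len(parents) - 1                 # 2^4 - 1 = 15 parameters
bn = sum(2 ** len(ps) for ps in parents.values())  # 1 + 2 + 2 + 4 = 9
print(full_joint, bn)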
Directed Acyclic Graph
• There has to be a one-to-one correspondence between the variables in the joint probability distribution and the nodes in the probabilistic graphical model.
• The structure of the probabilistic graphical model is a directed acyclic graph (DAG).
• A set of directed arcs (or links) connects pairs of nodes (random variables), representing the direct dependencies between variables.
Directed Acyclic Graph
• The graph represents direct dependencies with an arrow (edge) from a node X to a node Y; X is said to be a parent of Y, and Y is said to be a child (or descendant) of its ancestor X.
• The important constraint in identifying a DAG is that there must not be any directed cycles: it should not be possible to return to a node by following directed arcs.
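• A minimal sketch of checking this constraint: depth-first search over the arcs, flagging any path that returns to a node still on the current path (the parent-map encoding is an assumption of this sketch):

# Acyclicity check: DFS that fails if it revisits a node on the current path.
def is_dag(parents):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}

    def visit(v):
        color[v] = GRAY                     # v is on the current DFS path
        for p in parents[v]:                # follow arcs from child to parent
            if color[p] == GRAY:            # returned to the path: a cycle
                return False
            if color[p] == WHITE and not visit(p):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in parents if color[v] == WHITE)

print(is_dag({"A": [], "B": ["A"], "C": ["B"]}))     # True
print(is_dag({"A": ["C"], "B": ["A"], "C": ["B"]}))  # False: C->A->B->C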
Definition
• Each node in the DAG has an associated conditional probability table (CPT) that represents the node's conditional probability distribution.
• Each row in a conditional probability table contains the conditional probability of each node value for one possible combination of values of the parent nodes.
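• One way to encode such a table, sketched here for P(WetGrass | Sprinkler, Rain) from the network shown later; each key is one combination of parent values, and the probabilities are illustrative, not taken from the slides:

# A CPT as a map from parent configurations to distributions over the node.
cpt_wetgrass = {
    # (Sprinkler, Rain): {WetGrass value: probability}
    (True,  True):  {True: 0.99, False: 0.01},
    (True,  False): {True: 0.90, False: 0.10},
    (False, True):  {True: 0.90, False: 0.10},
    (False, False): {True: 0.00, False: 1.00},
}
print(cpt_wetgrass[(True, False)][True])  # P(WetGrass=T | S=T, R=F) = 0.9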
Markov Blanket
• Each node is conditionally independent of all others given its Markov blanket:
– parents + children + children's parents
Markov Blanket
[Figure: the Markov blanket of a node: its parents, children, and the children's other parents]
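• The blanket can be read directly off the structure; a small Python sketch using the parent-map encoding from the earlier examples:

# Markov blanket = parents + children + children's other parents.
def markov_blanket(parents, x):
    children = [v for v, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # co-parents of each child
    blanket.discard(x)               # a node is not in its own blanket
    return blanket

sprinkler = {"Cloudy": [], "Sprinkler": ["Cloudy"],
             "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
print(markov_blanket(sprinkler, "Sprinkler"))  # {'Cloudy', 'Rain', 'WetGrass'}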
Bayesian Network
• Learning an optimal probabilistic model is known to be NP-hard, and performing optimal inference is also known to be NP-hard.
• Firstly, the construction of the model becomes a trade-off between avoiding overfitting the data and avoiding model complexity.
• Secondly, the other purpose of the model is to summarize or compress the data distribution from the database.
Bayesian Networks
[Figure: a Bayesian network over Cloudy, Sprinkler, Rain, and WetGrass]
Bayesian Networks
• In the figure, each attribute is a node.
• WetGrass and Cloudy are conditionally independent given the values of Sprinkler and Rain.
• The absence of an arc can also indicate conditional independence.
Concept
• Given data instances: find the best BN that models the observed correlations.
• The optimal BN is chosen according to a scoring metric: a trade-off between complexity (number of arcs, number of parameters) and accuracy.
Scoring Metric
• Finding the best BN is computationally intractable: there are too many possible structures to consider.
• Different metrics evaluate the quality of the network differently.
Scoring Metric
• Bayesian metric:
– The best network is the most probable one.
– Maximize the posterior probability of the model given the data:
• arg max_M P(M | D) = arg max_M P(D | M) P(M)
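• In practice the score is computed in log space: up to a constant that does not depend on M, log P(M | D) = log P(D | M) + log P(M), and the prior P(M) is where the complexity penalty enters. A toy Python sketch (the per-arc penalty is an illustrative assumption, not a specific published prior):

import math

def log_posterior_score(log_likelihood, num_arcs, arc_penalty=math.log(2)):
    log_prior = -arc_penalty * num_arcs   # simpler structures preferred
    return log_likelihood + log_prior

# A denser model must gain enough likelihood to pay for its extra arcs:
print(log_posterior_score(-1000.0, num_arcs=4))  # about -1002.8
print(log_posterior_score(-998.0, num_arcs=8))   # about -1003.5: sparser wins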
Search Strategy
• After scoring comes the search algorithm, which is responsible for finding the structure with the maximum score.
• In the hill-climbing approach, the model with the highest score is chosen.
Search Strategy
• Within the hill-climbing family is the K2 algorithm.
• K2 takes a domain-specific approach in which the variables are arranged in ancestral order before the search begins.
• K2 is a dedicated search algorithm for the case where the parent-child attribute order is known.
K2 search algorithm
do {
(1) find a new BN structure candidate
(2) learn the conditional probabilities
(3) score the candidate using the metric
(4) update the best found structure
} while (not stop-criterion reached)
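• A compressed Python sketch of this loop for K2: each node greedily adopts the predecessor (in the given ancestral order) that most improves its local score. The local score here is a BIC-style penalized log-likelihood, a stand-in for the original Cooper-Herskovits metric:

from collections import Counter
import math

def local_score(data, child, parents):
    # Penalized log-likelihood of `child` given `parents` (BIC-like stand-in).
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    totals = Counter()
    for (cfg, _), n in counts.items():
        totals[cfg] += n
    loglik = sum(n * math.log(n / totals[cfg]) for (cfg, _), n in counts.items())
    n_params = len(totals) * (len({r[child] for r in data}) - 1)
    return loglik - 0.5 * math.log(len(data)) * n_params

def k2(data, order, max_parents=2):
    # `data` is a list of dicts mapping attribute name to value.
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        best = local_score(data, v, parents[v])
        improved = True
        while improved and len(parents[v]) < max_parents:
            improved = False
            for cand in order[:i]:           # only predecessors are eligible
                if cand in parents[v]:
                    continue
                s = local_score(data, v, parents[v] + [cand])
                if s > best:
                    best, best_cand, improved = s, cand, True
            if improved:
                parents[v].append(best_cand)
    return parents

# e.g. k2(records, ["Cloudy", "Sprinkler", "Rain", "WetGrass"])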
Inference
• Inference computes the probability distribution of any combination of the other attributes:
– P(Rain)
– P(Rain | WetGrass = Yes, Cloudy = No)
– P(Rain | WetGrass = Yes, Cloudy = No, Sprinkler = On)
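• For a small network these queries can be answered exactly by enumerating the full joint. A brute-force Python sketch (binary nodes, CPTs encoded as in the earlier CPT example; an illustrative encoding of our own):

from itertools import product

def enumerate_query(parents, cpts, query_var, evidence):
    # Sum the factored joint over all assignments consistent with evidence.
    nodes = list(parents)
    def joint(assign):
        p = 1.0
        for v in nodes:
            cfg = tuple(assign[u] for u in parents[v])
            p *= cpts[v][cfg][assign[v]]
        return p
    dist = {True: 0.0, False: 0.0}
    for combo in product((True, False), repeat=len(nodes)):
        assign = dict(zip(nodes, combo))
        if all(assign[e] == val for e, val in evidence.items()):
            dist[assign[query_var]] += joint(assign)
    z = dist[True] + dist[False]
    return {v: p / z for v, p in dist.items()}   # normalized over the query

# e.g. enumerate_query(parents, cpts, "Rain", {"WetGrass": True, "Cloudy": False})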
Inference
• Exact inference can become intractable for highly connected Bayesian networks.
• To solve this problem, approximate solutions are obtained through sampling.
Approximate Inference
• The importance sampling algorithm is given an evidence file in which the states of certain nodes are already fixed.
• Using this evidence file and the conditional probability table of each node, the program generates sample instantiations of the network (i.e., defines the state of each node in the network).
Approximate Inference
• From these sample instantiations, it can find the approximate probabilities of each state for each node in the network.
• Examples of approximate inference methods:
– Logic Sampling
– Likelihood Weighting
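• A minimal likelihood-weighting sketch: non-evidence nodes are sampled from their CPTs in ancestral order, evidence nodes are clamped, and each sample is weighted by the probability of the clamped values (CPT encoding as in the earlier examples; a sketch, not a library implementation):

import random

def likelihood_weighting(order, parents, cpts, query_var, evidence, n=100_000):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        assign, weight = {}, 1.0
        for v in order:                        # ancestral (topological) order
            dist = cpts[v][tuple(assign[u] for u in parents[v])]
            if v in evidence:
                assign[v] = evidence[v]
                weight *= dist[evidence[v]]    # weight by evidence likelihood
            else:
                assign[v] = random.random() < dist[True]
        totals[assign[query_var]] += weight
    return totals[True] / (totals[True] + totals[False])  # P(query=T | evidence)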
Reference
• Richard E. Neapolitan, Learning Bayesian Networks
• Software Packages
– Weka
– Kevin Murphy's MATLAB Bayes Net Toolbox (BNT)