Bayesian network learning


By Zuko Fani and Victor Zimu

Introduction


The attributes of a domain can be
represented by a joint probability
distribution.


P(X_1, X_2, …, X_N) can be written as a chain of conditional distributions, one for each node given its parents.


P(X_1, X_2, …, X_N) = ∏_i P(X_i | Parents(X_i))
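
For example, for the sprinkler network used later in these slides, assuming the usual structure (Cloudy → Sprinkler, Cloudy → Rain, and Sprinkler, Rain → WetGrass), the factorization reads:

P(Cloudy, Sprinkler, Rain, WetGrass) = P(Cloudy) P(Sprinkler | Cloudy) P(Rain | Cloudy) P(WetGrass | Sprinkler, Rain)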


Introduction


The BN is a simplification of the joint
probability distribution underlying the
data, using conditional independencies.



Graph Theory


Graph theory offers an intuitively appealing interface for humans to visualize the relationships among interacting sets of variables.


A graphical model can compactly represent a joint probability distribution.


Directed Acyclic Graph


There has to be a one-to-one correspondence between the variables in the joint probability distribution and the nodes in the probabilistic graphical model.


The structure of the probabilistic graphical model is a directed acyclic graph (DAG).


A set of directed arcs (or links) connects pairs
of nodes (random variables), representing the
direct dependencies between variables.

Directed Acyclic Graph


The graph represents a direct dependency by an arrow (edge) from a node X to a node Y; X is said to be a parent of Y, and Y is said to be a child (descendant) of X.


The important constraint on a DAG is that there must not be any directed cycles: it should not be possible to return to a node by following directed arcs.


Definition



Each node in the DAG has a conditional probability table (CPT) that represents the conditional probability distribution of that node given its parents.


Each row in a conditional probability table contains the conditional probability of each node value for one possible combination of values of the parent nodes.
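
As a concrete illustration of such a table, here is a minimal Python sketch of a CPT for WetGrass with parents Sprinkler and Rain, stored as one row per parent-value combination; all names and numbers below are illustrative assumptions, not taken from the slides.

# Hypothetical CPT for P(WetGrass | Sprinkler, Rain): one row per
# combination of parent values, each row a distribution over WetGrass.
cpt_wetgrass = {
    ("on",  "yes"): {"yes": 0.99, "no": 0.01},
    ("on",  "no"):  {"yes": 0.90, "no": 0.10},
    ("off", "yes"): {"yes": 0.90, "no": 0.10},
    ("off", "no"):  {"yes": 0.00, "no": 1.00},
}
# Every row sums to 1 over the node's possible values.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in cpt_wetgrass.values())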


Markov Blanket


Each node is conditionally independent of all others given its Markov blanket:


parents + children + children's parents
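
A minimal sketch of this definition in Python, assuming the DAG is given as a list of directed (parent, child) edges; the example edges follow the usual structure of the sprinkler network used later in the slides.

# Sketch: compute the Markov blanket of a node from a list of directed edges.
def markov_blanket(node, edges):
    parents  = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    spouses  = {p for p, c in edges if c in children and p != node}
    return parents | children | spouses

edges = [("Cloudy", "Sprinkler"), ("Cloudy", "Rain"),
         ("Sprinkler", "WetGrass"), ("Rain", "WetGrass")]
print(markov_blanket("Sprinkler", edges))   # {'Cloudy', 'WetGrass', 'Rain'}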



Markov Blanket

Bayesian Network


Learning an optimal probabilistic model is known to be NP-hard, and performing optimal inference is also known to be NP-hard.



Firstly, the construction of the model becomes a trade-off between avoiding overfitting the data and avoiding model complexity.



Secondly, the other purpose of the model is to summarize or compress the data distribution from the database.


Bayesian Networks

[Figure: the sprinkler Bayesian network, with nodes Cloudy, Sprinkler, Rain, and WetGrass]


In the figure, each attribute is a node.


WetGrass and Cloudy are conditionally independent given the values of Sprinkler and Rain.


Absence of an arc can also show
conditional independence.


Concept


Given data instances: find the best BN
that models the observed correlations.


The optimal BN is chosen according to a scoring metric: a trade-off between complexity (number of arcs, number of parameters) and accuracy.

Scoring Metric


Finding the best BN is computationally intractable: there are too many possibilities to consider.


Different metrics evaluate the quality of
the network differently.


Scoring Metric


Bayesian metric:


The best network is the most probable one


Maximize the posterior probability of the
model given the data:


arg max_M P(M | D) = arg max_M P(D | M) P(M)
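
As a rough numeric illustration (all numbers made up), the arg max only needs log P(D | M) + log P(M), since the normalizing term P(D) is the same for every candidate model; the sketch below compares two hypothetical structures and shows the accuracy-versus-complexity trade-off from the previous slide.

import math

# Made-up scores for two candidate structures: the simpler model M1 gets a
# higher prior, the denser model M2 fits the data slightly better.
candidates = {
    "M1 (fewer arcs)": {"loglik": -120.0, "logprior": math.log(0.7)},
    "M2 (more arcs)":  {"loglik": -118.5, "logprior": math.log(0.3)},
}

def log_posterior(m):
    # log P(M | D) = log P(D | M) + log P(M) - log P(D); the last term is
    # constant across models, so it can be dropped for the arg max.
    return candidates[m]["loglik"] + candidates[m]["logprior"]

print(max(candidates, key=log_posterior))   # here the better fit of M2 wins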



Search Strategy


After scoring, a search algorithm is responsible for finding the structure with the maximum score.


In the hill-climbing approach, the model with the highest score is chosen at each step.



Search Strategy


Within the hill-climbing family is the K2 algorithm.


K2 takes a domain-specific approach in which the variables are arranged in ancestral order before the search is run.


K2 is a dedicated search algorithm for the case where the parent-child ordering of the attributes is known.


K2 search algorithm


do {
    (1) find a new BN structure candidate
    (2) learn the conditional probabilities
    (3) score the candidate using the metric
    (4) update the best found structure
} while (not stop-criterion-reached)
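
Below is a minimal Python sketch in the spirit of this loop: nodes are visited in the given ancestral order, and for each node the algorithm greedily adds the predecessor that most improves a local score, stopping when no addition helps or a parent limit is reached. The count-based score, the parent limit, and the tiny data set are illustrative assumptions, not the exact metric or data from the slides.

import math
from collections import Counter

def local_score(node, parents, data):
    # Maximum-likelihood log P(node | parents) from counts, minus a crude
    # penalty per parent (a stand-in for the scoring metric discussed earlier).
    joint, base = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        joint[(key, row[node])] += 1
        base[key] += 1
    loglik = sum(n * math.log(n / base[key]) for (key, _), n in joint.items())
    return loglik - len(parents)

def k2_search(order, data, max_parents=2):
    # Greedy K2-style search: only predecessors in the ancestral order are
    # considered as parents of a node.
    parents = {x: [] for x in order}
    for i, node in enumerate(order):
        best = local_score(node, parents[node], data)
        improved = True
        while improved and len(parents[node]) < max_parents:
            improved, best_cand = False, None
            for cand in order[:i]:
                if cand in parents[node]:
                    continue
                score = local_score(node, parents[node] + [cand], data)
                if score > best:
                    best, best_cand, improved = score, cand, True
            if improved:
                parents[node].append(best_cand)
    return parents

# Tiny made-up data set where Rain tracks Cloudy most of the time.
data = ([{"Cloudy": "y", "Rain": "y"}] * 6 + [{"Cloudy": "y", "Rain": "n"}] * 1
        + [{"Cloudy": "n", "Rain": "n"}] * 6 + [{"Cloudy": "n", "Rain": "y"}] * 1)
print(k2_search(["Cloudy", "Rain"], data))   # {'Cloudy': [], 'Rain': ['Cloudy']}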




Inference


Given evidence on some attributes, inference computes the probability distribution of any combination of the other attributes.


P(Rain)


P(Rain | WetGrass = Yes, Cloudy = No)


P(Rain | WetGrass = Yes, Cloudy = No,
Sprinkler = On)
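
A minimal sketch of answering such queries by enumeration, assuming the usual sprinkler structure (Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass) with made-up CPT numbers: the joint probability of each full assignment is the chain-rule product from the introduction, and the query is obtained by summing over the assignments consistent with the evidence and normalizing.

from itertools import product

# Illustrative CPTs (all numbers made up): probability that each node is True.
p_cloudy = 0.5
p_sprinkler = {True: 0.1, False: 0.5}                     # keyed by Cloudy
p_rain      = {True: 0.8, False: 0.2}                     # keyed by Cloudy
p_wetgrass  = {(True, True): 0.99, (True, False): 0.90,
               (False, True): 0.90, (False, False): 0.0}  # keyed by (Sprinkler, Rain)

def bern(p, value):
    return p if value else 1.0 - p

def joint(c, s, r, w):
    # Chain-rule factorization: P(C) P(S|C) P(R|C) P(W|S,R)
    return (bern(p_cloudy, c) * bern(p_sprinkler[c], s)
            * bern(p_rain[c], r) * bern(p_wetgrass[(s, r)], w))

def prob_rain(evidence):
    # P(Rain = True | evidence), by enumerating every full assignment.
    totals = {True: 0.0, False: 0.0}
    for c, s, r, w in product([True, False], repeat=4):
        values = {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}
        if all(values[var] == val for var, val in evidence.items()):
            totals[r] += joint(c, s, r, w)
    return totals[True] / (totals[True] + totals[False])

print(prob_rain({}))                                    # P(Rain)
print(prob_rain({"WetGrass": True, "Cloudy": False}))   # P(Rain | WetGrass=Yes, Cloudy=No)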


Inference


Exact inference can become intractable
for highly connected Bayesian
networks.


To address this problem, approximate solutions are obtained through sampling.


Approximate Inference


The importance sampling algorithm is given an evidence file in which the states of certain nodes are already specified.


Using this evidence file and the
conditional probability table of each
node, the program generates sample
instantiations of the network (i.e.,
defines the state of each node in the
network).


Approximate Inference


From these sample instantiations, it can
find the approximate probabilities of
each state for each node in the network.


Examples of approximate inference methods:


Logic Sampling


Likelihood Weighting (sketched below)
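
A minimal sketch of likelihood weighting, one of the sampling schemes above: nodes are sampled from their CPTs in topological order, evidence nodes are clamped to their observed values, and each sample is weighted by the probability of that evidence under the CPTs. The two-node network (A → B), its CPT numbers, and the sample size are illustrative assumptions.

import random

# CPTs for binary nodes: each entry maps a node to (parent list, table of
# P(node = True | parent values)). The A -> B network is made up.
cpts = {
    "A": ([], {(): 0.3}),
    "B": (["A"], {(True,): 0.9, (False,): 0.2}),
}
order = ["A", "B"]   # a topological ordering of the nodes

def likelihood_weighting(query, evidence, n_samples=100_000):
    weighted = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        sample, weight = {}, 1.0
        for node in order:
            parents, table = cpts[node]
            p_true = table[tuple(sample[p] for p in parents)]
            if node in evidence:
                # Clamp the evidence node and weight the sample by its likelihood.
                sample[node] = evidence[node]
                weight *= p_true if evidence[node] else 1.0 - p_true
            else:
                sample[node] = random.random() < p_true
        weighted[sample[query]] += weight
    return weighted[True] / (weighted[True] + weighted[False])

# Estimate P(A = True | B = True); the exact answer is 0.27 / 0.41, about 0.66.
print(likelihood_weighting("A", {"B": True}))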



Reference



Richard Neapolitan, book on Bayesian networks


Software Packages


Weka


Kevin Murphy's Bayes Net Toolbox (BNT) for Matlab