# Bayesian network learning

Artificial Intelligence and Robotics

7 Nov 2013



By Zuko Fani and Victor Zimu

## Introduction

The attributes of a domain can be
represented by a joint probability
distribution.

The joint distribution $P(X_1, X_2, \ldots, X_N)$ can be represented as a chain of conditional distributions, one for each node given its parents:

$$P(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)$$


A Bayesian network (BN) is a simplification of the joint probability distribution underlying the data, obtained by exploiting conditional independencies.

## Graph Theory

Graph theory offers an intuitively appealing interface through which humans can visualize a representation of interacting sets of variables.

A graphical model can compactly represent a joint probability distribution.

## Directed Acyclic Graph

There has to be a one-to-one correspondence between the variables in the joint probability distribution and the nodes in the probabilistic graphical model.

The structure of the probabilistic graphical model is a directed acyclic graph (DAG).

A set of directed arcs (or links) connects pairs of nodes (random variables), representing the direct dependencies between variables.


The graph represents direct dependencies by an arrow (edge) from a node X to a node Y; X is said to be a parent of Y, and Y a child (or descendant) of its ancestor X.

The important constraint on a DAG is that there must not be any directed cycles: it must not be possible to return to a node by following directed arcs. A minimal sketch of this representation and an acyclicity check is given below.
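The following sketch (the names and the parent-dictionary encoding are mine, not from the slides) stores a DAG as a mapping from each node to its list of parents and verifies acyclicity with a Kahn-style pass:

```python
def is_acyclic(parents):
    """Kahn-style check: repeatedly remove nodes whose parents have all
    been removed; the graph is a DAG iff every node can be removed."""
    remaining = dict(parents)
    removed = set()
    while remaining:
        ready = [n for n, ps in remaining.items() if set(ps) <= removed]
        if not ready:        # every remaining node still waits on a parent:
            return False     # there must be a directed cycle
        for n in ready:
            removed.add(n)
            del remaining[n]
    return True

# The sprinkler network from these slides, encoded as node -> parents.
sprinkler = {
    "Cloudy": [],
    "Sprinkler": ["Cloudy"],
    "Rain": ["Cloudy"],
    "WetGrass": ["Sprinkler", "Rain"],
}
assert is_acyclic(sprinkler)
```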

## Definition

Each node in the DAG has a conditional probability table (CPT) that represents the conditional probability distribution of the node given its parents.

Each row in a conditional probability table contains the conditional probability of each node value for one possible combination of values of the parent nodes.
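As an illustration (the numbers below are hypothetical, not taken from the slides), the CPT of a node WetGrass with parents Sprinkler and Rain has one row per combination of parent values:

| Sprinkler | Rain | P(WetGrass = yes) | P(WetGrass = no) |
|-----------|------|-------------------|------------------|
| on        | yes  | 0.99              | 0.01             |
| on        | no   | 0.90              | 0.10             |
| off       | yes  | 0.90              | 0.10             |
| off       | no   | 0.00              | 1.00             |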

## Markov Blanket

Each node is conditionally independent of all other nodes given its Markov blanket:

parents + children + children's parents
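In symbols, the Markov blanket of a node $X$ is

$$\mathrm{MB}(X) = \mathrm{Parents}(X) \cup \mathrm{Children}(X) \cup \mathrm{Parents}(\mathrm{Children}(X)).$$

In the sprinkler network used later in these slides, for example, the Markov blanket of Sprinkler is {Cloudy, WetGrass, Rain}: its parent, its child, and its child's other parent.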


## Bayesian Network Learning

Learning an optimal probabilistic model is known to be NP-hard, and performing an optimal inference search is also known to be NP-hard.

Firstly, the construction of the model becomes a trade-off between avoiding overfitting the data and avoiding model complexity.

Secondly, the other purpose of the model is to summarize or compress the data distribution of the database.

## Bayesian Networks

*(Figure: the example network, with nodes Cloudy, Sprinkler, Rain, and WetGrass.)*

In the figure, each attribute of the domain is a node.

WetGrass and Cloudy are conditionally independent given the values of Sprinkler and Rain.

The absence of an arc between two nodes likewise expresses a conditional independence.

## Concept

Given a set of data instances, find the best BN that models the observed correlations.

The optimal BN is chosen according to a trade-off between complexity (number of arcs, number of parameters) and accuracy.
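One common way to make this trade-off concrete (an assumption here; the slides do not commit to a specific formula) is an MDL/BIC-style score that rewards fit and penalizes parameters:

$$\mathrm{score}(M) = \log P(D \mid M, \hat\theta) - \frac{k}{2}\log N,$$

where $\hat\theta$ are the maximum-likelihood CPT entries, $k$ is the number of free parameters, and $N$ is the number of data instances.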

## Scoring Metric

Finding the best BN is computationally intractable: there are too many possible structures to consider.

Different metrics evaluate the quality of the network differently.


Bayesian metric: the best network is the most probable one. Maximize the posterior probability of the model given the data:

$$\arg\max_M P(M \mid D) = \arg\max_M P(D \mid M)\,P(M)$$
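The equality is one step of Bayes' rule, with the data-dependent normalizer dropped:

$$\arg\max_M P(M \mid D) = \arg\max_M \frac{P(D \mid M)\,P(M)}{P(D)} = \arg\max_M P(D \mid M)\,P(M),$$

since $P(D)$ does not depend on $M$. In practice one maximizes $\log P(D \mid M) + \log P(M)$ to avoid numerical underflow.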

## Search Strategy

After scoring comes the search algorithm, which is responsible for finding the structure with the maximum score.

In the hill-climbing approach, at each step the candidate model with the highest score is chosen.


One algorithm within the hill-climbing family is K2.

K2 takes a domain-specific approach in which the variables are arranged in ancestral order before the search begins.

K2 is a dedicated search algorithm for the case where the parent-child attribute order is known.

## K2 Search Algorithm

```
do {
    (1) find a new BN structure candidate
    (2) learn the conditional probabilities
    (3) score the candidate using the metric
    (4) update the best found structure
} while (not stop-criterion-reached)
```
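The loop above leaves the candidate-generation step abstract. Below is a minimal Python sketch of a K2-style instantiation: variables are visited in the given ancestral order, and each node greedily adds the predecessor that most improves a local score. The `local_bic` function is an assumption on my part (a simplified BIC-style metric), not necessarily the metric used in any particular package.

```python
from collections import Counter
from math import log

def local_bic(child, parents, data):
    """Simplified BIC-style local score for `child` given `parents`:
    maximum log-likelihood minus a penalty on free CPT parameters.
    `data` is a list of dicts mapping variable name -> discrete value."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
    n_child_vals = len({r[child] for r in data})
    n_params = len(marg) * (n_child_vals - 1)  # free parameters in the CPT
    return loglik - 0.5 * n_params * log(n)

def k2(order, data, max_parents=2):
    """Greedy K2-style search: `order` is the assumed ancestral ordering;
    each node may only take parents that precede it in `order`."""
    parents = {v: [] for v in order}
    for i, child in enumerate(order):
        best = local_bic(child, parents[child], data)
        improved = True
        while improved and len(parents[child]) < max_parents:
            improved = False
            candidates = [p for p in order[:i] if p not in parents[child]]
            scored = [(local_bic(child, parents[child] + [p], data), p)
                      for p in candidates]
            if scored:
                score, p = max(scored)
                if score > best:        # keep the addition only if it helps
                    best = score
                    parents[child].append(p)
                    improved = True
    return parents

# e.g. k2(["Cloudy", "Sprinkler", "Rain", "WetGrass"], data)
```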

## Inference

Inference computes the probability distribution of any attribute (or combination of attributes), optionally conditioned on evidence about other attributes, for example:

- P(Rain)
- P(Rain | WetGrass = Yes, Cloudy = No)
- P(Rain | WetGrass = Yes, Cloudy = No, Sprinkler = On)
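A minimal sketch of exact inference by enumeration for queries like these, assuming the sprinkler network from the figure; the CPT numbers are the usual textbook values and are only illustrative (True stands for Yes/On):

```python
from itertools import product

# node -> (parents, CPT mapping parent-value tuple -> P(node=True | parents))
NET = {
    "Cloudy":    ((), {(): 0.5}),
    "Sprinkler": (("Cloudy",), {(True,): 0.1, (False,): 0.5}),
    "Rain":      (("Cloudy",), {(True,): 0.8, (False,): 0.2}),
    "WetGrass":  (("Sprinkler", "Rain"),
                  {(True, True): 0.99, (True, False): 0.9,
                   (False, True): 0.9, (False, False): 0.0}),
}

def joint(assign):
    """P(full assignment) as the product of CPT entries (chain rule)."""
    p = 1.0
    for node, (parents, cpt) in NET.items():
        pt = cpt[tuple(assign[pa] for pa in parents)]
        p *= pt if assign[node] else 1.0 - pt
    return p

def query(target, evidence):
    """P(target=True | evidence) by summing the joint over hidden variables."""
    hidden = [v for v in NET if v != target and v not in evidence]
    probs = {True: 0.0, False: 0.0}
    for t in (True, False):
        for vals in product([True, False], repeat=len(hidden)):
            assign = {target: t, **evidence, **dict(zip(hidden, vals))}
            probs[t] += joint(assign)
    return probs[True] / (probs[True] + probs[False])

print(query("Rain", {}))                                   # P(Rain)
print(query("Rain", {"WetGrass": True, "Cloudy": False}))  # P(Rain | W=yes, C=no)
```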


Exact inference can become intractable for highly connected Bayesian networks.

To address this, approximate solutions are obtained through sampling.

## Approximate Inference

The importance sampling algorithm is given an evidence file in which the states of certain nodes are fixed.

Using this evidence file and the
conditional probability table of each
node, the program generates sample
instantiations of the network (i.e.,
defines the state of each node in the
network).


From these sample instantiations, it can
find the approximate probabilities of
each state for each node in the network.

Two examples of approximate inference methods:

- Logic Sampling
- Likelihood Weighting


## Software Packages

- Weka
- Kevin Murphy's Matlab BNT (Bayes Net Toolbox)