An introduction to Bayesian networks
Stochastic Processes Course
Hossein Amirkhani
Spring 2011
Outline
Introduction,
Bayesian Networks,
Probabilistic Graphical Models,
Conditional Independence,
I-equivalence.
Introduction
Our goal is to represent a joint distribution $P$ over some set of random variables $\mathcal{X} = \{X_1, \ldots, X_n\}$.
Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of $2^n - 1$ numbers.
The explicit representation of the joint distribution is unmanageable from every perspective: computationally, cognitively, and statistically.
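To get a feel for how quickly the $2^n - 1$ count grows, here is a small Python sketch. The function name `joint_table_size` is illustrative and does not come from the slides.

```python
def joint_table_size(n: int) -> int:
    """Free parameters in an explicit joint table over n binary variables.

    There are 2**n assignments; their probabilities must sum to 1,
    so one entry is determined by the others.
    """
    return 2 ** n - 1


# Print the count for a few values of n.
for n in (5, 10, 20, 30):
    print(n, joint_table_size(n))
```

Even at 30 binary variables the table has over a billion entries, which is the motivation for a more compact representation.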
Bayesian Networks
Bayesian networks exploit independence properties of the distribution in order to allow a compact and natural representation.
They are a specific type of graphical model.
BNs are directed acyclic graphs (DAGs).
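As a minimal sketch of the DAG requirement, the structure of a network can be stored as a list of (parent, child) pairs and checked for directed cycles. The function `is_dag` and the example graph below are assumptions made for illustration; the check removes nodes with no incoming edges one at a time (Kahn's algorithm).

```python
from collections import deque


def is_dag(edges):
    """Return True if the (parent, child) pairs contain no directed cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Every node is removed exactly when there is no cycle.
    return visited == len(nodes)


# Hypothetical network structure used only for illustration.
network = [("Smoking", "Lung Cancer"), ("Smoking", "Yellow Teeth")]
cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
print(is_dag(network), is_dag(cyclic))
```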
Probabilistic Graphical Models
Nodes are the random variables in our domain.
Edges correspond, intuitively, to direct influence of one node on another.
[Figure: Factor Graph, Markov Random Field, Bayesian Network]
Graphs are an intuitive way of representing and visualising the relationships between many variables.
A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms.
Thus we can answer questions like: “Is A dependent on B given that we know the value of C?” just by looking at the graph.
Graphical models allow us to define general algorithms that implement probabilistic inference efficiently.
Graphical models = statistics × graph theory × computer science.
Conditional Independence: Example 1
Tail-to-tail at c.
Example: Smoking (node c) is a common parent of Lung Cancer and Yellow Teeth.
The two children are dependent in general, but become conditionally independent once c is observed.
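The tail-to-tail behaviour can be checked numerically on this example. The sketch below assumes the factorization P(S, C, T) = P(S) P(C | S) P(T | S); all probability values are made up for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative values only).
p_s = {True: 0.3, False: 0.7}            # P(Smoking)
p_c_given_s = {True: 0.2, False: 0.02}   # P(Lung Cancer = True | Smoking)
p_t_given_s = {True: 0.6, False: 0.1}    # P(Yellow Teeth = True | Smoking)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(s, c, t):
    """P(S, C, T) = P(S) P(C | S) P(T | S)."""
    return p_s[s] * bern(p_c_given_s[s], c) * bern(p_t_given_s[s], t)


def prob(**fixed):
    """Sum the joint over every assignment consistent with the fixed values."""
    total = 0.0
    for s, c, t in product([True, False], repeat=3):
        assignment = {"s": s, "c": c, "t": t}
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(s, c, t)
    return total


# Without observing Smoking: P(C, T) - P(C) P(T) is non-zero (dependent).
marginal_gap = prob(c=True, t=True) - prob(c=True) * prob(t=True)

# Given Smoking = True: P(C, T | S) - P(C | S) P(T | S) vanishes.
p_smoke = prob(s=True)
conditional_gap = (
    prob(c=True, t=True, s=True) / p_smoke
    - (prob(c=True, s=True) / p_smoke) * (prob(t=True, s=True) / p_smoke)
)
print(marginal_gap, conditional_gap)
```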
Conditional Independence: Example 2
Head-to-tail at c.
Example: Type of Car → Speed → Amount of Speeding Fine, with Speed as node c.
The two end nodes are dependent in general, but become conditionally independent once c is observed.
Conditional Independence: Example 3
Head-to-head at c (v-structure).
Example: Ability of team A → Outcome of A vs. B game ← Ability of team B, with the outcome as node c.
Here the behaviour is reversed: the two parents are independent when c is unobserved, and become dependent once c is observed.
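The head-to-head case can be checked the same way. The sketch below assumes binary abilities (0 = weak, 1 = strong) with a made-up prior and a made-up win probability; none of these numbers come from the slides.

```python
from itertools import product

P_STRONG = 0.5  # hypothetical prior that a team is strong


def p_a_wins(a, b):
    """Hypothetical P(A wins | abilities): the stronger team is favoured."""
    return 0.5 + 0.4 * (a - b)


def joint(a, b, win):
    """P(a, b, win) = P(a) P(b) P(win | a, b)."""
    prior = (P_STRONG if a else 1 - P_STRONG) * (P_STRONG if b else 1 - P_STRONG)
    p_win = p_a_wins(a, b)
    return prior * (p_win if win else 1 - p_win)


def prob(pred):
    """Total probability of all assignments satisfying the predicate."""
    return sum(
        joint(a, b, w)
        for a, b, w in product([0, 1], [0, 1], [True, False])
        if pred(a, b, w)
    )


# Outcome unobserved: the abilities are independent.
marginal_gap = (
    prob(lambda a, b, w: a == 1 and b == 1)
    - prob(lambda a, b, w: a == 1) * prob(lambda a, b, w: b == 1)
)

# Outcome observed (A wins): the abilities become dependent.
p_win = prob(lambda a, b, w: w)
conditional_gap = (
    prob(lambda a, b, w: a == 1 and b == 1 and w) / p_win
    - (prob(lambda a, b, w: a == 1 and w) / p_win)
    * (prob(lambda a, b, w: b == 1 and w) / p_win)
)
print(marginal_gap, conditional_gap)
```

With these numbers, learning that A won makes "A is strong" more likely, and, given the win, also learning that B is strong makes it more likely still; the two abilities are no longer independent.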
D-separation
• A, B, and C are non-intersecting subsets of nodes in a directed graph.
• A path from A to B is blocked if it contains a node such that either
a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C.
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies $A \perp B \mid C$.
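The definition above can be turned into code almost literally by enumerating every path and testing whether it is blocked. This is a sketch for small graphs only (path enumeration is exponential in general); the function names and the example graph are assumptions made for illustration.

```python
from itertools import product


def descendants(children, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(neighbors, start, goal, path=None):
    """Yield every simple path from start to goal, ignoring edge direction."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in neighbors.get(start, ()):
        if nxt not in path:
            yield from simple_paths(neighbors, nxt, goal, path + [nxt])


def d_separated(edges, A, B, C):
    """True if every path between node sets A and B is blocked by C."""
    edge_set = set(edges)
    children, neighbors = {}, {}
    for u, v in edge_set:
        children.setdefault(u, set()).add(v)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    C = set(C)

    def blocked(path):
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            head_to_head = (prev, node) in edge_set and (nxt, node) in edge_set
            if head_to_head:
                # Rule (b): neither the node nor any descendant is in C.
                if node not in C and not (descendants(children, node) & C):
                    return True
            elif node in C:
                # Rule (a): head-to-tail or tail-to-tail at an observed node.
                return True
        return False

    return all(
        blocked(p)
        for a, b in product(A, B)
        for p in simple_paths(neighbors, a, b)
    )


# Hypothetical graph: the team example plus a report of the outcome.
game = [("Ability A", "Outcome"), ("Ability B", "Outcome"), ("Outcome", "Report")]
print(d_separated(game, {"Ability A"}, {"Ability B"}, set()))
print(d_separated(game, {"Ability A"}, {"Ability B"}, {"Report"}))
```

The second call conditions on a descendant of the head-to-head node, which unblocks the path according to rule (b).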
I-equivalence
Let $P$ be a distribution over $\mathcal{X}$. We define $I(P)$ to be the set of independence assertions that hold in $P$.
Two graph structures $K_1$ and $K_2$ over $\mathcal{X}$ are I-equivalent if $I(K_1) = I(K_2)$.
The set of all graphs over $\mathcal{X}$ is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes.
The skeleton of a Bayesian network
The skeleton of a Bayesian network graph $G$ over $\mathcal{X}$ is an undirected graph over $\mathcal{X}$ that contains an edge $\{X, Y\}$ for every edge $(X, Y)$ in $G$.
Immorality
A v-structure $X \to Z \leftarrow Y$ is an immorality if there is no direct edge between $X$ and $Y$.
Relationship between immorality, skeleton and I-equivalence
Let $G_1$ and $G_2$ be two graphs over $\mathcal{X}$. Then $G_1$ and $G_2$ have the same skeleton and the same set of immoralities if and only if they are I-equivalent.
We can use this theorem to recognize whether two BNs are I-equivalent or not.
In addition, this theorem can be used for learning the structure of the Bayesian network related to a distribution.
We can construct the I-equivalence class for a distribution by determining its skeleton and its immoralities from the independence properties of the given distribution.
We then use both of these components to build a representation of the equivalence class.
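The theorem gives a purely graphical test for I-equivalence. The following sketch compares skeletons and immoralities of two DAGs given as lists of (parent, child) pairs. The function names are illustrative, and the sketch assumes both graphs are over the same variables and that every variable appears in at least one edge.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: one unordered pair per edge."""
    return {frozenset(edge) for edge in edges}


def immoralities(edges):
    """Set of (pair of non-adjacent parents, common child) triples."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for x, y in combinations(sorted(pa), 2):
            if frozenset((x, y)) not in skel:
                found.add((frozenset((x, y)), child))
    return found


def i_equivalent(edges1, edges2):
    """Same skeleton and same immoralities, as in the theorem."""
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))


chain = [("X", "Z"), ("Z", "Y")]        # X -> Z -> Y
reverse = [("Y", "Z"), ("Z", "X")]      # X <- Z <- Y
fork = [("Z", "X"), ("Z", "Y")]         # X <- Z -> Y
collider = [("X", "Z"), ("Y", "Z")]     # X -> Z <- Y

print(i_equivalent(chain, reverse), i_equivalent(chain, fork))
print(i_equivalent(chain, collider))
```

The chain, its reversal, and the fork share a skeleton and have no immoralities, so they fall in one class; the collider has the same skeleton but one immorality, so it sits in a class of its own.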
Identifying the Undirected Skeleton
The basic idea is to use independence queries of the form $(X \perp Y \mid \boldsymbol{U})$ for different sets of variables $\boldsymbol{U}$.
If $X$ and $Y$ are adjacent in $G^*$, we cannot separate them with any set of variables.
Conversely, if $X$ and $Y$ are not adjacent in $G^*$, we would hope to be able to find a set of variables that makes these two variables conditionally independent: we call this set a witness of their independence.
Let $G^*$ be an I-map of a distribution $P$, and let $X$ and $Y$ be two variables that are not adjacent in $G^*$. Then either $P \models (X \perp Y \mid \mathrm{Pa}_X^{G^*})$ or $P \models (X \perp Y \mid \mathrm{Pa}_Y^{G^*})$.
Thus, if $X$ and $Y$ are not adjacent in $G^*$, then we can find a witness of bounded size.
Thus, if we assume that $G^*$ has bounded indegree, say less than or equal to $d$, then we do not need to consider witness sets larger than $d$.
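A minimal sketch of this search: for every pair of variables, try all candidate witness sets of size at most d and keep the edge only if none of them separates the pair. The independence oracle `indep(x, y, U)` is a hypothetical callable standing in for whatever independence test is available; here it is hand-written for the chain Type of Car → Speed → Fine.

```python
from itertools import combinations


def build_skeleton(variables, indep, d):
    """Return (undirected edges, witness sets) using queries of size <= d.

    `indep(x, y, U)` is an assumed oracle answering whether x and y are
    independent given the set U.
    """
    edges, witnesses = set(), {}
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        witness = None
        for size in range(min(d, len(others)) + 1):
            for U in combinations(others, size):
                if indep(x, y, set(U)):
                    witness = set(U)
                    break
            if witness is not None:
                break
        if witness is None:
            # No separating set found: keep the edge.
            edges.add(frozenset((x, y)))
        else:
            witnesses[frozenset((x, y))] = witness
    return edges, witnesses


def chain_oracle(x, y, U):
    """Hand-written oracle for the chain Type of Car -> Speed -> Fine."""
    return {x, y} == {"Type of Car", "Fine"} and "Speed" in U


variables = ["Type of Car", "Speed", "Fine"]
skeleton_edges, witnesses = build_skeleton(variables, chain_oracle, d=1)
print(skeleton_edges)
print(witnesses)
```

This version tries every subset of the remaining variables up to size d, which is simple but not the most economical choice of candidate sets.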
Identifying Immoralities
At this stage we have reconstructed the undirected skeleton. Now, we want to reconstruct edge direction.
Our goal is to consider potential immoralities in the skeleton and for each one determine whether it is indeed an immorality.
A triplet of variables $X, Z, Y$ is a potential immorality if the skeleton contains $X - Z - Y$ but does not contain an edge between $X$ and $Y$.
A potential immorality $X - Z - Y$ is an immorality if and only if $Z$ is not in the witness set(s) for $X$ and $Y$.
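The test above can be sketched as follows. The function name `mark_immoralities` and the data layout (edges as `frozenset` pairs, witnesses as a dictionary keyed by non-adjacent pairs) are assumptions for illustration; the sketch expects a recorded witness set for every non-adjacent pair it examines.

```python
from itertools import combinations


def mark_immoralities(skeleton, witnesses):
    """Return the directed edges (parent, child) forced by immoralities."""
    neighbors = {}
    for edge in skeleton:
        x, y = tuple(edge)
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)

    directed = set()
    for z, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            pair = frozenset((x, y))
            if pair in skeleton:
                continue  # X and Y are adjacent: not a potential immorality
            if z not in witnesses[pair]:
                # Z is not needed to separate X and Y, so X -> Z <- Y.
                directed.add((x, z))
                directed.add((y, z))
    return directed


# Team example: the abilities are independent given the empty set.
team_skeleton = {
    frozenset(("Ability A", "Outcome")),
    frozenset(("Ability B", "Outcome")),
}
team_witnesses = {frozenset(("Ability A", "Ability B")): set()}
team_directed = mark_immoralities(team_skeleton, team_witnesses)

# Chain example: Speed is in the witness set, so nothing is oriented.
chain_skeleton = {
    frozenset(("Type of Car", "Speed")),
    frozenset(("Speed", "Fine")),
}
chain_witnesses = {frozenset(("Type of Car", "Fine")): {"Speed"}}
chain_directed = mark_immoralities(chain_skeleton, chain_witnesses)
print(team_directed, chain_directed)
```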
Representing Equivalence Classes
An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph (PDAG) or chain graph.
Let $G$ be a DAG. A chain graph $K$ is a class PDAG of the equivalence class of $G$ if it shares the same skeleton as $G$, and contains a directed edge $X \to Y$ if and only if all $G'$ that are I-equivalent to $G$ contain the edge $X \to Y$.
If the edge is directed, then all the members of the equivalence class agree on the orientation of the edge.
If the edge is undirected, there are two DAGs in the equivalence class that disagree on the orientation of the edge.
Is the output of Mark-Immoralities the class PDAG?
Clearly, edges involved in immoralities must be directed in $K$.
The obvious question is whether $K$ can contain directed edges that are not involved in immoralities.
In other words, can there be additional edges whose direction is necessarily the same in every member of the equivalence class?
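For small graphs the question can be explored by brute force. The sketch below applies the class PDAG definition directly rather than any orientation rules: it enumerates every orientation of the skeleton, keeps the acyclic ones with the same immoralities (which, by the earlier theorem, are exactly the I-equivalent DAGs), and reports the edges on which all of them agree. The function names and the four-node example are assumptions for illustration.

```python
from itertools import combinations, product


def immoralities(edges):
    """Set of (pair of non-adjacent parents, common child) triples."""
    adjacent = {frozenset(e) for e in edges}
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return {
        (frozenset((x, y)), child)
        for child, pa in parents.items()
        for x, y in combinations(sorted(pa), 2)
        if frozenset((x, y)) not in adjacent
    }


def is_acyclic(edges):
    """Peel off nodes with no incoming edges until none remain."""
    remaining = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(c != n for _, c in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(p, c) for p, c in remaining if p not in sources}
    return True


def class_pdag(dag):
    """Return (directed edges, undirected edges) of the class PDAG of `dag`."""
    dag = list(dag)
    target = immoralities(dag)
    members = []
    for flips in product([False, True], repeat=len(dag)):
        candidate = [(c, p) if flip else (p, c)
                     for (p, c), flip in zip(dag, flips)]
        if is_acyclic(candidate) and immoralities(candidate) == target:
            members.append(set(candidate))
    directed = set.intersection(*members)
    undirected = {frozenset(e) for e in dag} - {frozenset(e) for e in directed}
    return directed, undirected


# Hypothetical example: X -> Z <- Y with an extra edge Z -> W.
directed, undirected = class_pdag([("X", "Z"), ("Y", "Z"), ("Z", "W")])
print(directed, undirected)
```

In this example the edge Z → W is not part of any immorality, yet reversing it would create new immoralities at Z, so every member of the class orients it the same way. The answer to the question above is therefore yes.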
[Figure: Rules]
[Figure: Example]