# P - IIIA - CSIC

Artificial Intelligence and Robotics

7 Nov 2013

Bayesian Networks

A causal probabilistic network, or Bayesian network, is a directed acyclic graph (DAG) where nodes represent variables and edges represent dependency relations, e.g. of the cause-effect type, between variables, quantified by (conditional) probabilities.

Qualitative component + quantitative component

Bayesian Networks

Qualitative component: relations of conditional dependence / independence

I(A, B | C): A and B are independent given C

I(A, B) = I(A, B | Ø): A and B are a priori independent

Formal study of the properties of the ternary relation I.

A Bayesian network may encode three fundamental types of relations among neighbour variables.

Qualitative Relations: type I

F → G → H

Ex: F: smoke, G: bronchitis, H: respiratory problems (dyspnea)

Relations: ¬I(F, H), I(F, H | G)

Qualitative Relations: type II

E ← F → G

Ex: F: smoke, G: bronchitis, E: lung cancer

Relations: ¬I(E, G), I(E, G | F)

Qualitative Relations: type III

B → C ← E

Ex: C: alarm, B: movement detection, E: rain

Relations: I(B, E), ¬I(B, E | C)
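The type III ("explaining away") relations can be checked numerically by brute force. A minimal sketch with made-up numbers — the CPT values below are illustrative assumptions, not from the slides:

```python
from itertools import product

# Hypothetical numbers for B (movement detection), E (rain), C (alarm):
# B and E are a priori independent, and C depends on both.
pB = {1: 0.1, 0: 0.9}                      # P(B)
pE = {1: 0.2, 0: 0.8}                      # P(E)
pC = {(1, 1): 0.95, (1, 0): 0.9,           # P(C = 1 | B, E)
      (0, 1): 0.3, (0, 0): 0.01}

def joint(b, e, c):
    pc = pC[(b, e)] if c else 1 - pC[(b, e)]
    return pB[b] * pE[e] * pc

def prob(query, given):
    """P(query | given) by brute-force summation over the joint."""
    num = den = 0.0
    for b, e, c in product([1, 0], repeat=3):
        point = {'B': b, 'E': e, 'C': c}
        w = joint(b, e, c)
        if all(point[k] == v for k, v in given.items()):
            den += w
            if all(point[k] == v for k, v in query.items()):
                num += w
    return num / den

# I(B, E): a priori, observing E tells nothing about B (both ≈ 0.1)
print(round(prob({'B': 1}, {}), 4), round(prob({'B': 1}, {'E': 1}), 4))
# ¬I(B, E | C): once the alarm is known, B and E become dependent
print(round(prob({'B': 1}, {'C': 1}), 3),
      round(prob({'B': 1}, {'C': 1, 'E': 1}), 3))
```

Observing that it is raining "explains away" the alarm, lowering the probability of a movement detection.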

Probabilistic component

Qualitative knowledge: a directed acyclic graph G (DAG)

Nodes(G) = V = {X1, …, Xn} (discrete variables)

Edges(G) ⊆ V × V

Parents(Xi) = {Xj : (Xj, Xi) ∈ Edges(G)}

Probabilistic knowledge: P(Xi | parents(Xi))

These probabilities determine a joint probability distribution P over V = {X1, …, Xn}:

P(X1, …, Xn) = P(X1 | parents(X1)) ∙∙∙ P(Xn | parents(Xn))

Bayesian Network = (G, P)

Joint Distribution

By the chain rule,

P(X1, X2, ..., Xn) = P(Xn | Xn-1, ..., X1) ∙∙∙ P(X3 | X2, X1) ∙ P(X2 | X1) ∙ P(X1).

Independence relations of each variable Xi with its non-parent predecessor variables Y1, ..., Yk:

P(Xi | parents(Xi), Y1, ..., Yk) = P(Xi | parents(Xi))

P(X1, X2, ..., Xn) = ∏i=1..n P(Xi | parents(Xi))

• Having in each node Xi the conditional probability distribution P(Xi | parents(Xi)) is therefore enough to determine the full joint probability distribution P(X1, X2, ..., Xn).

Example

A: visit to Asia, B: tuberculosis, F: smoke, E: lung cancer, G: bronchitis, C: B or E, D: X-ray, H: dyspnea

P(A): P(a) = 0.01

P(B | A): P(b | a) = 0.05, P(b | ¬a) = 0.01

P(C | B,E): P(c | b,e) = 1, P(c | b,¬e) = 1, P(c | ¬b,e) = 1, P(c | ¬b,¬e) = 0

P(D | C): P(d | c) = 0.98, P(d | ¬c) = 0.05

P(F): P(f) = 0.5

P(E | F): P(e | f) = 0.1, P(e | ¬f) = 0.01

P(G | F): P(g | f) = 0.6, P(g | ¬f) = 0.3

P(H | C,G): P(h | c,g) = 0.9, P(h | c,¬g) = 0.7, P(h | ¬c,g) = 0.8, P(h | ¬c,¬g) = 0.1

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C,G) P(C | B,E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(a,¬b,c,¬d,e,f,g,¬h) = P(¬d | c) P(¬h | c,g) P(c | ¬b,e) P(g | f) P(e | f) P(f) P(¬b | a) P(a)
= (1 - 0.98) ∙ (1 - 0.9) ∙ 1 ∙ 0.6 ∙ 0.1 ∙ 0.5 ∙ (1 - 0.05) ∙ 0.01 = 5.7 ∙ 10^-7.
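The factorized joint of the example can be evaluated directly from its CPTs. A minimal sketch, with truth values encoded as 1/0 (a stands for A = 1, etc.):

```python
# Each CPT of the example network as a small Python function.
def P_A(a):       return 0.01 if a else 0.99
def P_B(b, a):    p = 0.05 if a else 0.01; return p if b else 1 - p
def P_C(c, b, e): p = 1.0 if (b or e) else 0.0; return p if c else 1 - p
def P_D(d, c):    p = 0.98 if c else 0.05; return p if d else 1 - p
def P_F(f):       return 0.5
def P_E(e, f):    p = 0.1 if f else 0.01; return p if e else 1 - p
def P_G(g, f):    p = 0.6 if f else 0.3; return p if g else 1 - p
def P_H(h, c, g):
    p = {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.1}[(c, g)]
    return p if h else 1 - p

def joint(a, b, c, d, e, f, g, h):
    # The factorization P(D|C) P(H|C,G) P(C|B,E) P(G|F) P(E|F) P(F) P(B|A) P(A)
    return (P_D(d, c) * P_H(h, c, g) * P_C(c, b, e) * P_G(g, f)
            * P_E(e, f) * P_F(f) * P_B(b, a) * P_A(a))

# The instantiation computed on the slide:
print(joint(1, 0, 1, 0, 1, 1, 1, 0))   # ≈ 5.7e-7
```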

D-separation relations and probabilistic independence

Goal: precisely determine which independence relations are (graphically) defined by a DAG.

Previous definitions:

A path is a sequence of connected nodes in the graph.

A non-directed path is a path without taking into account the directions of the arrows.

A head-to-head node in a (non-directed) path is a node y on a sub-path of the form x → y ← w; the node y is called a head-to-head node of the path.

D-separation

A path c is said to be activated by a set of nodes Z if the following two conditions are satisfied:

1) Every head-to-head node in c is in Z or has a descendant in Z.

2) No other node in c belongs to Z.

Otherwise, the path c is said to be blocked by Z.
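The activation test above can be sketched directly in code. A minimal, inefficient sketch over the example network of these slides (a path is given explicitly as a node sequence):

```python
# The example DAG, as a map child -> list of parents.
dag = {'B': ['A'], 'C': ['B', 'E'], 'D': ['C'],
       'E': ['F'], 'G': ['F'], 'H': ['C', 'G']}

def descendants(node):
    kids = [c for c, ps in dag.items() if node in ps]
    out = set(kids)
    for k in kids:
        out |= descendants(k)
    return out

def is_active(path, Z):
    """Activation test for a non-directed path (sequence of nodes)."""
    for i in range(1, len(path) - 1):
        x, y, w = path[i - 1], path[i], path[i + 1]
        head_to_head = y in dag and x in dag[y] and w in dag[y]
        if head_to_head:
            # condition 1: y must be in Z or have a descendant in Z
            if y not in Z and not (descendants(y) & Z):
                return False
        elif y in Z:
            # condition 2: any other node of the path must not be in Z
            return False
    return True

# In the example network, the path B - C - E is head-to-head at C:
print(is_active(['B', 'C', 'E'], Z=set()))    # blocked a priori
print(is_active(['B', 'C', 'E'], Z={'C'}))    # activated by {C}
```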

Definition. If X, Y and Z are three disjoint subsets of nodes in a DAG G, then Z d-separates X from Y, or equivalently, X and Y are graphically independent given Z, when all the paths between any node from X and any node from Y are blocked by Z.

D-separation

Theorem. Let G be a DAG and let X, Y and Z be subsets of nodes such that X and Y are d-separated by Z. Then X and Y are conditionally independent given Z for any probability P such that (G, P) is a causal network over G, that is, s.t. P(X | Y,Z) = P(X | Z) and P(Y | X,Z) = P(Y | Z).

{B} and {C} are d-separated by {A}:

Path B-E-C: E, G ∉ {A} → {A} blocks the path B-E-C

Path B-A-C: A ∈ {A} → {A} blocks the path B-A-C

Inference in Bayesian Networks

Knowledge about a domain is encoded by a Bayesian network BN = (G, P).

Inference = updating probabilities: evidence E on the values taken by some variables modifies the probabilities of the rest of the variables:

P(X) ---> P'(X) = P(X | E)

Direct Method:

BN = < G = {A,B,C,D,E}, P(A,B,C,D,E) >

Evidence: A = ai, B = bj

P(C = ck | A = ai, B = bj) = P(C = ck, A = ai, B = bj) / P(A = ai, B = bj), where both terms are obtained by summing the joint distribution over the remaining variables.

Inference in Bayesian Networks

Bayesian networks allow local computations, which exploit the independence relations among variables explicitly induced by the corresponding DAG of the network.

They allow updating the probability of a variable using only the probabilities of the immediate predecessor nodes (parents), and in this way, step by step, updating the probabilities of all non-instantiated variables in the network ---> propagation methods

Two main propagation methods:

Pearl's method: message passing over the DAG

Lauritzen & Spiegelhalter's method: previous transformation of the DAG into a tree of cliques

Propagation method in trees of cliques

1) Transformation of the initial network into another graphical structure, a tree of cliques (subsets of nodes), with equivalent probabilistic information:

BN = (G, P) ----> [Tree, P]

2) Propagation algorithm over the new structure

Graphical Transformation

Definition: a "clique" in a non-directed graph is a complete and maximal subgraph.

To transform a DAG G into a tree of cliques:

1) Delete directions in the edges of G: G'

2) Moralization of G': add edges between nodes with common children in the original DAG G: G''

3) Triangulation of G'': G*

4) Identification of the cliques in G*

5) Suitable enumeration of the cliques (Running Intersection Property)

6) Construction of the tree according to the enumeration
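Steps 1) and 2) can be sketched in a few lines, using the example DAG from these slides (edges are stored as frozensets, so direction is forgotten):

```python
# The example DAG, as a map child -> list of parents.
dag = {'B': ['A'], 'C': ['B', 'E'], 'D': ['C'],
       'E': ['F'], 'G': ['F'], 'H': ['C', 'G']}

undirected = set()
for child, parents in dag.items():
    for p in parents:                      # step 1: delete directions
        undirected.add(frozenset((p, child)))
    for i in range(len(parents)):          # step 2: moralization --
        for j in range(i + 1, len(parents)):   # "marry" co-parents
            undirected.add(frozenset((parents[i], parents[j])))

print(sorted(tuple(sorted(e)) for e in undirected))
# moral edges added: B-E (common child C) and C-G (common child H)
```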

Example (1): steps 1) and 2) [figures: the undirected and the moralized graph]

Example (2): triangulation, step 3) [figures]

Example (3): cliques, step 4)

Cliques: {A,B}, {B,C,E}, {E,F,G}, {C,E,G}, {C,G,H}, {C,D}

Ordering of cliques

Enumeration of cliques Clq1, Clq2, …, Clqn such that the following property holds:

Running Intersection Property: for all i = 1, …, n there exists j < i such that Si ⊆ Clqj, where Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1).

This property is guaranteed if:

(i) nodes of the graph are enumerated following the criterion of "maximum cardinality search"

(ii) cliques are ordered according to the node of the clique with the highest ranking in the former enumeration.
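Maximum cardinality search itself is short: repeatedly number the node with the most already-numbered neighbours. A sketch over the triangulated graph G* of the example, with its adjacency read off the cliques (ties are broken arbitrarily here):

```python
# Adjacency of G* reconstructed from its cliques.
cliques = [{'A', 'B'}, {'B', 'C', 'E'}, {'E', 'F', 'G'},
           {'C', 'E', 'G'}, {'C', 'G', 'H'}, {'C', 'D'}]
adj = {v: set() for c in cliques for v in c}
for c in cliques:
    for v in c:
        adj[v] |= c - {v}

def max_cardinality_search(adj, start):
    order = [start]
    while len(order) < len(adj):
        # next node: the one with most already-numbered neighbours
        nxt = max((v for v in adj if v not in order),
                  key=lambda v: len(adj[v] & set(order)))
        order.append(nxt)
    return order

print(max_cardinality_search(adj, 'A'))
```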

Example (4): ordering cliques

[figure: node numbering 1-8 obtained by maximum cardinality search]

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

Tree Construction

Let [Clq1, Clq2, …, Clqn] be an ordering satisfying the R.I.P.

For each clique Clqi, define

Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1)

Ri = Clqi - Si.

Tree of cliques:

- (hyper) nodes: cliques

- root: Clq1

- for each clique Clqi, its "father" candidates are the cliques Clqk with k < i and s.t. Si ⊆ Clqk (if more than one candidate, random selection)
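The construction above can be sketched directly, using the clique ordering of Example (4) (here the first father candidate is taken instead of a random one):

```python
# Ordered cliques Clq1 .. Clq6 from Example (4).
cliques = [{'A', 'B'}, {'B', 'E', 'C'}, {'E', 'C', 'G'},
           {'E', 'G', 'F'}, {'C', 'G', 'H'}, {'C', 'D'}]

seen = set(cliques[0])
S, R, father = [set()], [cliques[0]], [None]   # root Clq1 has S1 = Ø
for i in range(1, len(cliques)):
    Si = cliques[i] & seen                     # separator Si
    S.append(Si)
    R.append(cliques[i] - Si)                  # residual Ri
    # father candidates: earlier cliques containing Si (first one picked)
    cands = [j for j in range(i) if Si <= cliques[j]]
    father.append(cands[0])
    seen |= cliques[i]

for i in range(1, len(cliques)):
    print(f"S{i + 1} = {sorted(S[i])}, father: Clq{father[i] + 1}")
```

On this ordering the separators come out as in Example (5): S2 = {B}, S3 = {E,C}, S4 = {E,G}, S5 = {C,G}, S6 = {C}.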

Example (5): trees

S2 = Clq2 ∩ Clq1 = {B} → father: Clq1

S3 = Clq3 ∩ (Clq1 ∪ Clq2) = {E,C} → father: Clq2

S4 = Clq4 ∩ (Clq1 ∪ Clq2 ∪ Clq3) = {E,G} → father: Clq3

S5 = Clq5 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4) = {C,G} → father: Clq3

S6 = Clq6 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4 ∪ Clq5) = {C} → father candidates: Clq2, Clq3, Clq5

Propagation Algorithm

Potential Representation of the distribution P(X1, …, Xn):

([W1, ..., Wp], ψ) is a potential representation of P, where the Wi are subsets of V = {X1, …, Xn}, if P(V) = ψ(W1) ∙ ... ∙ ψ(Wp).

In a Bayesian network (G, P):

P(X1, ..., Xn) = P(Xn | parents(Xn)) ∙ ... ∙ P(X1 | parents(X1))

P(X1, ..., Xn) = ψ(Clq1) ∙ ψ(Clq2) ∙ ... ∙ ψ(Clqm)

with ψ(Clqi) = ∏ { P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi }, each factor P(Xj | parents(Xj)) being assigned to exactly one such clique.

Propagation Algorithm (2)

Fundamental property of the potential representations:

Let ([W1, ..., Wm], ψ) be a potential representation for P.

Evidence: X3 = a and X5 = b.

Problem: update the probability

P'(X1, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b) ??

Define:

W^i = Wi - {X3, X5}

ψ^(W^i) = ψ(Wi(X3 = a, X5 = b)), i.e. ψ restricted to the instantiation X3 = a, X5 = b.

Then ([W^1, ..., W^m], ψ^) is a potential representation for P'.

Example (6): potentials

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

ψ(Clq1) = P(A) ∙ P(B | A)

ψ(Clq2) = P(C | B,E), ψ(Clq3) = 1

ψ(Clq4) = P(F) ∙ P(E | F) ∙ P(G | F), ψ(Clq5) = P(H | C,G)

ψ(Clq6) = P(D | C)

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C,G) P(C | B,E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(A,B,C,D,E,F,G,H) = ψ(Clq1) ∙ ... ∙ ψ(Clq6)

Example (6): potentials

ψ(Clq1) = P(A) ∙ P(B | A):

ψ(a,b) = P(a) P(b | a) = 0.0005

ψ(¬a,b) = P(¬a) P(b | ¬a) = 0.0099

ψ(a,¬b) = P(a) P(¬b | a) = 0.0095

ψ(¬a,¬b) = P(¬a) P(¬b | ¬a) = 0.9801

ψ(Clq5) = P(H | C, G):

ψ(c,g,h) = P(h | c,g) = 0.9

ψ(c,g,¬h) = P(¬h | c,g) = 0.1

ψ(c,¬g,h) = P(h | c,¬g) = 0.7

ψ(c,¬g,¬h) = P(¬h | c,¬g) = 0.3

ψ(¬c,g,h) = P(h | ¬c,g) = 0.8

ψ(¬c,g,¬h) = P(¬h | ¬c,g) = 0.2

ψ(¬c,¬g,h) = P(h | ¬c,¬g) = 0.1

ψ(¬c,¬g,¬h) = P(¬h | ¬c,¬g) = 0.9
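The table of ψ(Clq1) can be recomputed from the CPTs of the example (P(a) = 0.01, P(b | a) = 0.05, P(b | ¬a) = 0.01):

```python
pa, pb_a, pb_na = 0.01, 0.05, 0.01
psi1 = {
    ('a', 'b'):   pa * pb_a,                 # ψ(a, b)   = 0.0005
    ('¬a', 'b'):  (1 - pa) * pb_na,          # ψ(¬a, b)  = 0.0099
    ('a', '¬b'):  pa * (1 - pb_a),           # ψ(a, ¬b)  = 0.0095
    ('¬a', '¬b'): (1 - pa) * (1 - pb_na),    # ψ(¬a, ¬b) = 0.9801
}
for k, v in psi1.items():
    print(k, round(v, 6))
# ψ(Clq1) = P(A) · P(B | A) = P(A, B), so the four entries sum to 1:
print(round(sum(psi1.values()), 6))
```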

Propagation algorithm: theoretical results

Causal network (G, P); ([Clq1, ..., Clqp], ψ) is a potential representation for P.

1) P(Clqi) = P(Ri | Si) ∙ P(Si)

2) P(Rp | Sp) = ψ(Clqp) / λp(Sp), where λp(Sp) = ΣRp ψ(Clqp) is the marginal of the function ψ with respect to the variables of Rp.

3) If father(Clqp) = Clqj, then ([Clq1, ..., Clqp-1], ψ') is a potential representation for the marginal distribution of P(V - Rp), where:

ψ'(Clqi) = ψ(Clqi) for all i ≠ j, i < p

ψ'(Clqj) = ψ(Clqj) ∙ λp(Sp)

Propagation algorithm: step by step

Goal: to compute P(Clqi) for all cliques.

Two graph traversals: one bottom-up and one top-down.

BU) Combining properties 2) and 3) we have an iterative form of computing the conditional distributions P(Ri | Si) in each clique, until reaching the root clique Clq1:

P(Ri | Si) = ψ'(Clqi) / ΣRi ψ'(Clqi), with ψ'(Clqi) = ψ(Clqi) ∙ λj(Sj) ∙ λk(Sk) for children cliques Clqj, Clqk of Clqi, and λi(Si) = ΣRi ψ'(Clqi) the message sent by Clqi to its father.

Root: P(Clq1) = P(R1 | S1).

TD) P(S2) = ΣClq1-S2 P(Clq1), and from there P(Si) = ΣClqj-Si P(Clqj) -- we can always compute in a clique Clqi the distribution P(Si) whenever we have already computed the distribution of its father clique Clqj:

P(Clqi) = P(Ri, Si) = P(Ri | Si) ∙ P(Si)

[figure: the tree of cliques with the bottom-up messages λ2(S2), λ3(S3), λ4(S4), λ5(S5), λ6(S6); case 1) a leaf clique, case 2) an inner clique with children]

Example (7)

A) Bottom-up traversal: passing messages λk(Sk) = ΣRk ψ(Clqk).

Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}):

P(R6 | S6) = P(D | C) = ψ(Clq6) / λ6(C), where

λ6(c) = ψ(c, d) + ψ(c, ¬d) = 0.98 + 0.02 = 1

λ6(¬c) = ψ(¬c, d) + ψ(¬c, ¬d) = 0.05 + 0.95 = 1,

so P(d | c) = 0.98, P(¬d | c) = 0.02, P(d | ¬c) = 0.05, P(¬d | ¬c) = 0.95.

Example (7)

Clique Clq5 = {C,G,H} (R5 = {H}, S5 = {C,G}). This node is clique Clq6's father. According to point 3), we modify the potential function of the clique Clq5:

ψ'(Clq5) = ψ(Clq5) ∙ λ6(S6)

P(R5 | S5) = P(H | C,G) = ψ'(Clq5) / λ5(C,G), where λ5(C,G) = ΣH ψ'(Clq5):

λ5(c,g) = ψ'(c, g, h) + ψ'(c, g, ¬h) = 0.9 + 0.1 = 1

λ5(c,¬g) = ψ'(c, ¬g, h) + ψ'(c, ¬g, ¬h) = 0.7 + 0.3 = 1

λ5(¬c,g) = λ5(¬c,¬g) = ... = 1

Example (7)

Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C}). Clq3 is the father of two cliques, Clq4 and Clq5:

ψ'(Clq3) = ψ(Clq3) ∙ ΣR4 ψ(Clq4) ∙ ΣR5 ψ'(Clq5) = ψ(Clq3) ∙ λ4(S4) ∙ λ5(S5)

ψ'(E,C,G) = ψ(E,C,G) ∙ λ4(E,G) ∙ λ5(C,G)

P(R3 | S3) = P(G | E, C) = ψ'(E,C,G) / λ3(E,C), where λ3(E,C) = ΣG ψ'(E,C,G)

Example (7)

Root: Clique Clq1 = {A, B} (R1 = {A, B}, S1 = Ø).

ψ'(A,B) = ψ(A,B) ∙ λ2(B)

P(R1) = P(R1 | S1) = ψ'(A,B) / λ1, where λ1 = ψ'(a,b) + ψ'(a,¬b) + ψ'(¬a,b) + ψ'(¬a,¬b).

P(A,B) = ψ'(A,B):

P(a,b) = 0.0005, P(a, ¬b) = 0.0095,

P(¬a, b) = 0.0099, P(¬a, ¬b) = 0.9801

For a clique Clqi with children Clqj and Clqk in the tree:

P(Clqi) = P(Ri | Si) ∙ P(Si)

P(Sj) = ΣClqi-Sj P(Clqi) = πi(Sj)

P(Sk) = ΣClqi-Sk P(Clqi) = πi(Sk)

[figure: the tree of cliques with the top-down messages π1(S2), π2(S3), π3(S4), π3(S5), π5(S6)]

Example (7)

B) Top-down traversal:

Clique Clq2 = {B,E,C} (R2 = {E,C}, S2 = {B}).

P(B) = P(S2) = ΣA P(A,B):

P(b) = P(a, b) + P(¬a, b) = 0.0005 + 0.0099 = 0.0104,

P(¬b) = P(a, ¬b) + P(¬a, ¬b) = 1 - 0.0104 = 0.9896

*** P(Clq2) = P(R2 | S2) ∙ P(S2)

Example (7)

Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C}): we have to compute P(S3) and P(Clq3).

Clique Clq4 = {E,G,F} (R4 = {F}, S4 = {E,G}): we have to compute P(S4) and P(Clq4).

Clique Clq5 = {C,G,H} (R5 = {H}, S5 = {C,G}): we have to compute P(S5) and P(Clq5).

Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}): we have to compute P(S6) and P(Clq6).

Summary

Given a Bayesian network BN = (G, P), we have seen how

1) To transform G into a tree of cliques and factorize P as

P(X1, ..., Xn) = ψ(Clq1) ∙ ψ(Clq2) ∙ ... ∙ ψ(Clqm)

where ψ(Clqi) = ∏ { P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi }

2) To compute the probability distributions P(Clqi) with a propagation algorithm, and from there, to compute the probabilities P(Xj) for Xj ∈ Clqi, by marginalization.

Probability updating

It remains to see how to perform inference, i.e. how to update the probabilities P(Xj) when some information (evidence E) is available about some variables:

P(Xj) ---> P*(Xj) = P(Xj | E)

The updating mechanism is based on a fundamental property of the potential representations when applied to P(X1, ..., Xn) and its potential representation in terms of cliques:

P(X1, ..., Xn) = ψ(Clq1) ∙ ψ(Clq2) ∙ ... ∙ ψ(Clqm)

Updating mechanism

Recall: Let ([Clq1, ..., Clqm], ψ) be a potential representation for P(X1, …, Xn).

We observe: X3 = a and X5 = b.

Updating the probability:

P*(X1, X2, X4, X6, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b)

Define:

Clq^i = Clqi - {X3, X5}

ψ^(Clq^i) = ψ(Clqi(X3 = a, X5 = b))

([Clq^1, ..., Clq^m], ψ^) is a potential representation for P*.

Updating mechanism

Based on three steps:

A) Build the new tree of cliques obtained by deleting from the original tree the instantiated variables;

B) Re-compute the new potential functions ψ^ corresponding to the new cliques; and, finally,

C) Apply the propagation algorithm over the new tree of cliques and potential functions.

[figure: the original tree of cliques Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G}, Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}, and, after instantiating the evidence A = a, H = h, the new tree Clq'1 = {B}, Clq'2 = {B,E,C}, Clq'3 = {E,C,G}, Clq'4 = {E,G,F}, Clq'5 = {C,G}, Clq'6 = {C,D}]

Evidence A = a, H = h:

P(Xj) ---> P*(Xj) = P(Xj | A = a, H = h)

P(D = d | A = a, H = h)?
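As a cross-check on the clique-tree answer, this last query can also be computed by brute-force enumeration of the example joint (the slides answer it with the propagation algorithm instead):

```python
from itertools import product

def cpt(val, p_true):
    """P(variable = val) given the probability of the true case."""
    return p_true if val else 1 - p_true

def joint(a, b, c, d, e, f, g, h):
    # The example factorization, with the CPTs of the slides.
    return (cpt(a, 0.01) * cpt(b, 0.05 if a else 0.01)
            * cpt(c, 1.0 if (b or e) else 0.0)
            * cpt(d, 0.98 if c else 0.05) * cpt(f, 0.5)
            * cpt(e, 0.1 if f else 0.01) * cpt(g, 0.6 if f else 0.3)
            * cpt(h, {(1, 1): 0.9, (1, 0): 0.7,
                      (0, 1): 0.8, (0, 0): 0.1}[(c, g)]))

num = den = 0.0
for b, c, d, e, f, g in product([1, 0], repeat=6):
    w = joint(1, b, c, d, e, f, g, 1)   # evidence A = a, H = h
    den += w
    if d:
        num += w
print(round(num / den, 4))              # P(D = d | A = a, H = h)
```

Since P(d | c) = 0.98 and P(d | ¬c) = 0.05, the answer must lie strictly between those two values.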