IIIA - CSIC

Bayesian Networks

A causal probabilistic network, or Bayesian network, is a directed acyclic graph (DAG) where nodes represent variables and links represent dependency relations between variables (e.g. of the cause-effect type), quantified by (conditional) probabilities.

Qualitative component + quantitative component

Bayesian Networks

Qualitative component: relations of conditional dependence / independence

I(A, B | C): A and B are independent given C
I(A, B) = I(A, B | Ø): A and B are a priori independent

Formal study of the properties of the ternary relation I.

A Bayesian network may encode three fundamental types of relations among neighbour variables.

Qualitative Relations: type I

F → G → H

Ex: F: smoke, G: bronchitis, H: respiratory problems (dyspnea)

Relations:
¬I(F, H)
I(F, H | G)

Qualitative Relations : type II

E

F

G


Ex
: F: smoke,


G: bronchitis,



E: lung cancer



Relations:



¬ I(E, G)





I(E, G | F)

Qualitative Relations: type III

B → C ← E

Ex: C: alarm, B: movement detection, E: rain

Relations:
I(B, E)
¬I(B, E | C)

Probabilistic component

Qualitative knowledge: a directed acyclic graph G (DAG)

Nodes(G) = V = {X1, …, Xn} -- discrete variables --
Edges(G) ⊆ V × V
Parents(Xi) = {Xj : (Xj, Xi) ∈ Edges(G)}

Probabilistic knowledge: P(Xi | parents(Xi))

These probabilities determine a joint probability distribution P over V = {X1, …, Xn}:

P(X1, …, Xn) = P(X1 | parents(X1)) ∙ ∙ ∙ P(Xn | parents(Xn))

Bayesian Network = (G, P)

Joint Distribution

P(X1, X2, ..., Xn) = P(Xn | Xn-1, ..., X1) ∙ ... ∙ P(X3 | X2, X1) ∙ P(X2 | X1) ∙ P(X1).

Independence relation of each variable Xi from its predecessor variables given the parents of Xi:

P(Xi | parents(Xi), Y1, ..., Yk) = P(Xi | parents(Xi))

P(X1, X2, ..., Xn) = ∏ i=1,n P(Xi | parents(Xi))

• To have in each node Xi the conditional probability distribution P(Xi | parents(Xi)) is enough to determine the full joint probability distribution P(X1, X2, ..., Xn)

Example

P(A):       P(a) = 0.01
P(B | A):   P(b | a) = 0.05,   P(b | ¬a) = 0.01
P(C | B,E): P(c | b,e) = 1,    P(c | b,¬e) = 1,   P(c | ¬b,e) = 1,   P(c | ¬b,¬e) = 0
P(F):       P(f) = 0.5
P(D | C):   P(d | c) = 0.98,   P(d | ¬c) = 0.05
P(E | F):   P(e | f) = 0.1,    P(e | ¬f) = 0.01
P(G | F):   P(g | f) = 0.6,    P(g | ¬f) = 0.3
P(H | C,G): P(h | c,g) = 0.9,  P(h | c,¬g) = 0.7, P(h | ¬c,g) = 0.8, P(h | ¬c,¬g) = 0.1

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C,G) P(C | B,E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(a,¬b,c,¬d,e,f,g,¬h) = P(¬d | c) P(¬h | c,g) P(c | ¬b,e) P(g | f) P(e | f) P(f) P(¬b | a) P(a)
  = (1 - 0.98) ∙ (1 - 0.9) ∙ 1 ∙ 0.6 ∙ 0.1 ∙ 0.5 ∙ (1 - 0.05) ∙ 0.01 = 5.7 ∙ 10^-7.

A: visit to Asia
B: tuberculosis
F: smoke
E: lung cancer
G: bronchitis
C: B or E
D: X-ray
H: dyspnea
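The computation above can be reproduced by brute force; the sketch below hard-codes the slide's CPTs in an illustrative dict encoding (True/False stand for a/¬a, etc.):

```python
# CPTs of the example network, taken from the slide.
parents = {"A": (), "B": ("A",), "C": ("B", "E"), "D": ("C",),
           "E": ("F",), "F": (), "G": ("F",), "H": ("C", "G")}
cpt = {
    "A": {(): 0.01},
    "B": {(True,): 0.05, (False,): 0.01},
    "C": {(True, True): 1.0, (True, False): 1.0,
          (False, True): 1.0, (False, False): 0.0},
    "D": {(True,): 0.98, (False,): 0.05},
    "E": {(True,): 0.1, (False,): 0.01},
    "F": {(): 0.5},
    "G": {(True,): 0.6, (False,): 0.3},
    "H": {(True, True): 0.9, (True, False): 0.7,
          (False, True): 0.8, (False, False): 0.1},
}

def joint(x):
    """Product of P(Xi | parents(Xi)) over all eight variables."""
    p = 1.0
    for var, value in x.items():
        pt = cpt[var][tuple(x[u] for u in parents[var])]
        p *= pt if value else 1.0 - pt
    return p

x = {"A": True, "B": False, "C": True, "D": False,
     "E": True, "F": True, "G": True, "H": False}
print(joint(x))  # ~5.7e-07, matching the slide
```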

D-separation relations and probabilistic independence

Goal: precisely determine which independence relations are (graphically) defined by a DAG.

Previous definitions:

• A path is a sequence of connected nodes in the graph.
• A non-directed path is a path without taking into account the directions of the arrows.
• A head-to-head link in a node is a (non-directed) path of the form x → y ← w; the node y is called a head-to-head node.


D-separation

A path c is said to be activated by a set of nodes Z if the following two conditions are satisfied:

1) Every node in c with head-to-head links is in Z or has a descendant in Z.
2) No other node in c belongs to Z.

Otherwise, the path c is said to be blocked by Z.


Definition. If X, Y and Z are three disjoint subsets of nodes in a DAG G, then Z d-separates X from Y, or equivalently X and Y are graphically independent given Z, when all the paths between any node from X and any node from Y are blocked by Z.

D-separation

Theorem. Let G be a DAG and let X, Y and Z be subsets of nodes such that X and Y are d-separated by Z. Then, X and Y are conditionally independent given Z for any probability P such that (G, P) is a causal network over G, that is, s.t. P(X | Y,Z) = P(X | Z) and P(Y | X,Z) = P(Y | Z).

{B} and {C} are d-separated by {A}:

Path B-E-C: E is a head-to-head node, and neither E nor its descendant G belongs to {A}
  → {A} blocks the path B-E-C

Path B-A-C: A is not head-to-head and A ∈ {A}
  → {A} blocks the path B-A-C
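The blocking test used above can be sketched directly; parents are given as a dict of sets, and the helper only checks one given path (an illustrative sketch, not a full d-separation algorithm):

```python
def descendants(node, parents):
    """All nodes reachable from `node` following the arrows."""
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)
    seen, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def path_blocked(path, parents, Z):
    """True iff the non-directed path is blocked by the set of nodes Z."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[node] and nxt in parents[node]:  # head-to-head
            if node not in Z and not (descendants(node, parents) & Z):
                return True   # condition 1 violated
        elif node in Z:
            return True       # condition 2 violated
    return False

# Type-III slide example: B -> C <- E (the alarm C is head-to-head).
parents = {"B": set(), "E": set(), "C": {"B", "E"}}
print(path_blocked(["B", "C", "E"], parents, set()))   # True: I(B, E)
print(path_blocked(["B", "C", "E"], parents, {"C"}))   # False: ¬I(B, E | C)
```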

Inference in Bayesian Networks

Knowledge about a domain is encoded by a Bayesian network BN = (G, P).

Inference = updating probabilities: evidence E on the values taken by some variables modifies the probabilities of the rest of the variables

P(X) ---> P’(X) = P(X | E)

Direct Method:

BN = < G = {A,B,C,D,E}, P(A,B,C,D,E) >

Evidence: A = ai, B = bj

P(C = ck | A = ai, B = bj) = Σ_{D,E} P(ai, bj, ck, D, E) / Σ_{C,D,E} P(ai, bj, C, D, E)
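The direct method is just a normalized sum over the joint distribution; a toy sketch on a hypothetical two-variable network A → B:

```python
from itertools import product

def joint(x):
    """Toy network A -> B: P(a) = 0.01, P(b | a) = 0.05, P(b | ¬a) = 0.01."""
    pa = 0.01 if x["A"] else 0.99
    pb = 0.05 if x["A"] else 0.01
    return pa * (pb if x["B"] else 1.0 - pb)

def conditional(var, val, evidence, variables):
    """P(var = val | evidence) by brute-force marginalization of the joint."""
    hidden = [v for v in variables if v != var and v not in evidence]
    num = den = 0.0
    for vals in product([False, True], repeat=len(hidden)):
        x = dict(evidence, **dict(zip(hidden, vals)))
        for qv in (False, True):
            x[var] = qv
            p = joint(x)
            den += p
            if qv == val:
                num += p
    return num / den

# P(A = true | B = true): evidence raises the prior 0.01 to about 0.048.
print(conditional("A", True, {"B": True}, ["A", "B"]))
```

The cost of this method is exponential in the number of non-instantiated variables, which is what motivates the propagation methods below.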

Inference in Bayesian Networks

Bayesian networks allow local computations, which exploit the independence relations among variables explicitly induced by the corresponding DAG of the network.

They allow updating the probability of a variable using only the probabilities of the immediate predecessor nodes (parents), and in this way, step by step, updating the probabilities of all non-instantiated variables in the network ---> propagation methods

Two main propagation methods:

• Pearl's method: message passing over the DAG
• Lauritzen & Spiegelhalter's method: previous transformation of the DAG into a tree of cliques

Propagation method in trees of cliques

1) Transformation of the initial network into another graphical structure, a tree of cliques (subsets of nodes), with equivalent probabilistic information:

   BN = (G, P) ----> [Tree, P]

2) Propagation algorithm over the new structure


Graphical Transformation

Definition: a “clique” in a non-directed graph is a complete and maximal subgraph.

To transform a DAG G into a tree of cliques:

1) Delete directions in the edges of G: G’
2) Moralization of G’: add edges between nodes with common children in the original DAG G: G’’
3) Triangulation of G’’: G*
4) Identification of the cliques in G*
5) Suitable enumeration of the cliques (Running Intersection Property)
6) Construction of the tree according to the enumeration
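Steps 1) and 2) can be sketched as follows (an illustrative helper; `parents` maps each node to its parents in the DAG):

```python
from itertools import combinations

def moralize(parents):
    """Steps 1-2: undirected edges of G plus links between co-parents."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))      # step 1: drop directions
        for u, v in combinations(sorted(ps), 2):
            edges.add(frozenset((u, v)))          # step 2: marry co-parents
    return edges

# The example network: moralization adds B-E (parents of C) and C-G (parents of H).
parents = {"A": (), "B": ("A",), "C": ("B", "E"), "D": ("C",),
           "E": ("F",), "F": (), "G": ("F",), "H": ("C", "G")}
moral = moralize(parents)
print(frozenset(("B", "E")) in moral, frozenset(("C", "G")) in moral)  # True True
```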

Example (1)

[figure: steps 1) and 2) applied to the example DAG]

Example (2): triangulation

[figure: step 3) applied to the moralized graph]

Example (3): cliques

Cliques: {A,B}, {B,C,E}, {E,F,G}, {C,E,G}, {C,G,H}, {C,D}

[figure: step 4), the cliques identified in G*]

Ordering of cliques

Enumeration of the cliques Clq1, Clq2, …, Clqn such that the following property holds:

Running Intersection Property: for all i = 1, …, n there exists j < i such that Si ⊆ Clqj, where Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1).


This property is guaranteed if:

(i) nodes of the graph are enumerated following the criterion of “maximum cardinality search”
(ii) cliques are ordered according to the node of the clique with the highest ranking in the former enumeration.

Example (4): ordering cliques

[figure: nodes ranked 1-8 by maximum cardinality search]

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G},
Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}
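The Running Intersection Property for this enumeration can be verified mechanically (a sketch):

```python
def satisfies_rip(cliques):
    """Check: for each i there is some j < i with S_i a subset of Clq_j."""
    for i in range(1, len(cliques)):
        s_i = cliques[i] & set().union(*cliques[:i])
        if not any(s_i <= cliques[j] for j in range(i)):
            return False
    return True

order = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
         {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
print(satisfies_rip(order))        # True
print(satisfies_rip(order[::-1]))  # False: the reversed order violates it
```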

Tree Construction

Let [Clq1, Clq2, …, Clqn] be an ordering satisfying the R.I.P.

For each clique Clqi, define

  Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1)
  Ri = Clqi - Si.

Tree of cliques:

- (hyper) nodes: cliques
- root: Clq1
- for each clique Clqi, its “father” candidates are the cliques Clqk with k < i and s.t. Si ⊆ Clqk
  (if more than one candidate, random selection)
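The construction can be sketched as follows; this sketch picks the first candidate father instead of a random one (any candidate is valid):

```python
def build_tree(cliques):
    """Map each clique index i > 0 to a father j < i with S_i a subset of Clq_j."""
    father = {}
    for i in range(1, len(cliques)):
        s_i = cliques[i] & set().union(*cliques[:i])
        for j in range(i):
            if s_i <= cliques[j]:
                father[i] = j
                break
    return father

order = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
         {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
print(build_tree(order))  # {1: 0, 2: 1, 3: 2, 4: 2, 5: 1} (0-based indices)
```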

Example (5): trees

S2 = Clq2 ∩ Clq1 = {B}                                     father: Clq1
S3 = Clq3 ∩ (Clq1 ∪ Clq2) = {E,C}                          father: Clq2
S4 = Clq4 ∩ (Clq1 ∪ Clq2 ∪ Clq3) = {E,G}                   father: Clq3
S5 = Clq5 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4) = {C,G}            father: Clq3
S6 = Clq6 ∩ (Clq1 ∪ Clq2 ∪ Clq3 ∪ Clq4 ∪ Clq5) = {C}       father candidates: Clq2, Clq3, Clq5

Propagation Algorithm

Potential Representation of the distribution P(X1, …, Xn):

([W1, ..., Wp], Ψ) is a potential representation of P, where the Wi are subsets of V = {X1, …, Xn}, if P(V) = ∏ i=1,p Ψ(Wi)

In a Bayesian network (G, P):

P(X1, ..., Xn) = P(Xn | parents(Xn)) ∙ ... ∙ P(X1 | parents(X1))

admits a potential representation

P(X1, ..., Xn) = Ψ(Clq1) ∙ Ψ(Clq2) ∙ ... ∙ Ψ(Clqm)

with Ψ(Clqi) = ∏ { P(Xj | parents(Xj)) | Xj ∈ Clqi, parents(Xj) ⊆ Clqi }, each factor being assigned to exactly one such clique.

Propagation Algorithm (2)

Fundamental property of the potential representations:

Let ([W1, ..., Wm], Ψ) be a potential representation for P.

Evidence: X3 = a and X5 = b.

Problem: update the probability

P’(X1, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b) ??

Define:

W^i = Wi - {X3, X5}
Ψ^(W^i) = Ψ(Wi restricted to X3 = a, X5 = b)

([W^1, ..., W^m], Ψ^) is a potential representation for P'.
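The restriction Ψ^ can be sketched as follows, assuming potentials stored as dicts from assignment tuples to numbers (an illustrative encoding):

```python
def restrict(scope, potential, evidence):
    """Drop instantiated variables and keep only the consistent entries."""
    new_scope = tuple(v for v in scope if v not in evidence)
    new_pot = {}
    for assign, value in potential.items():
        full = dict(zip(scope, assign))
        if all(full[v] == val for v, val in evidence.items() if v in full):
            new_pot[tuple(full[v] for v in new_scope)] = value
    return new_scope, new_pot

# Potential over (X3, X4); observing X3 = True keeps two of four entries.
scope = ("X3", "X4")
pot = {(True, True): 0.2, (True, False): 0.3,
       (False, True): 0.1, (False, False): 0.4}
print(restrict(scope, pot, {"X3": True}))
# -> (('X4',), {(True,): 0.2, (False,): 0.3})
```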



Example (6): potentials

Clq1 = {A,B}, Clq2 = {B,E,C}, Clq3 = {E,C,G},
Clq4 = {E,G,F}, Clq5 = {C,G,H}, Clq6 = {C,D}

Ψ(Clq1) = P(A) ∙ P(B | A)
Ψ(Clq2) = P(C | B,E)
Ψ(Clq3) = 1
Ψ(Clq4) = P(F) ∙ P(E | F) ∙ P(G | F)
Ψ(Clq5) = P(H | C,G)
Ψ(Clq6) = P(D | C)

P(A,B,C,D,E,F,G,H) = P(D | C) P(H | C,G) P(C | B,E) P(G | F) P(E | F) P(F) P(B | A) P(A)

P(A,B,C,D,E,F,G,H) = Ψ(Clq1) ∙ ... ∙ Ψ(Clq6)

Example (6): potentials

Ψ(Clq1) = P(A) ∙ P(B | A):

Ψ(a,b) = P(a) ∙ P(b | a) = 0.0005
Ψ(¬a,b) = P(¬a) ∙ P(b | ¬a) = 0.0099
Ψ(a,¬b) = P(a) ∙ P(¬b | a) = 0.0095
Ψ(¬a,¬b) = P(¬a) ∙ P(¬b | ¬a) = 0.9801

Ψ(Clq5) = P(H | C, G):

Ψ(c,g,h) = P(h | c,g) = 0.9        Ψ(c,g,¬h) = P(¬h | c,g) = 0.1
Ψ(c,¬g,h) = P(h | c,¬g) = 0.7      Ψ(c,¬g,¬h) = P(¬h | c,¬g) = 0.3
Ψ(¬c,g,h) = P(h | ¬c,g) = 0.8      Ψ(¬c,g,¬h) = P(¬h | ¬c,g) = 0.2
Ψ(¬c,¬g,h) = P(h | ¬c,¬g) = 0.1    Ψ(¬c,¬g,¬h) = P(¬h | ¬c,¬g) = 0.9
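A quick numeric cross-check: the four entries of Ψ(Clq1) form the joint P(A, B), so they must sum to 1 (which is also why P(a) ∙ P(b | a) = 0.01 ∙ 0.05 = 0.0005):

```python
p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.05, 0.01  # CPTs from the example
psi1 = {
    ("a", "b"):   p_a * p_b_given_a,                      # 0.0005
    ("¬a", "b"):  (1 - p_a) * p_b_given_not_a,            # 0.0099
    ("a", "¬b"):  p_a * (1 - p_b_given_a),                # 0.0095
    ("¬a", "¬b"): (1 - p_a) * (1 - p_b_given_not_a),      # 0.9801
}
print(sum(psi1.values()))  # 1.0 up to rounding
```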



Propagation algorithm: theoretical results

Causal network (G, P)

([Clq1, ..., Clqp], Ψ) is a potential representation for P

1) P(Clqi) = P(Ri | Si) ∙ P(Si)

2) P(Rp | Sp) = Ψ(Clqp) / λp(Sp), where λp(Sp) = Σ_{Rp} Ψ(Clqp) is the marginal of the function Ψ(Clqp) with respect to the variables of Rp.

3) If father(Clqp) = Clqj, then ([Clq1, ..., Clqp-1], Ψ') is a potential representation for the marginal distribution of P(V - Rp), where:

   Ψ'(Clqi) = Ψ(Clqi) for all i ≠ j, i < p
   Ψ'(Clqj) = Ψ(Clqj) ∙ λp(Sp)

Propagation algorithm: step by step (2)

Goal: to compute P(Clqi) for all cliques.

Two graph traversals: one bottom-up and one top-down.

BU) Start with clique Clqp. Combining properties 2 and 3 we have an iterative way of computing the conditional distributions P(Ri | Si) in each clique until reaching the root clique Clq1.

Root: P(Clq1) = P(R1 | S1).

TD) P(S2) = Σ_{Clq1 - S2} P(Clq1), and from there P(Si) = Σ_{Clqj - Si} P(Clqj): we can always compute in a clique Clqi the distribution P(Si) whenever we have already computed the distribution of its father clique Clqj.

P(Clqi) = P(Ri, Si) = P(Ri | Si) ∙ P(Si)



In the bottom-up traversal, each clique Clqi computes

  P(Ri | Si) = Ψ(Clqi) / λi(Si), where λi(Si) = Σ_{Ri} Ψ(Clqi),

and sends λi(Si) to its father clique.

Case 1) Clqi is a leaf: Ψ(Clqi) is used as is.
Case 2) Clqi has children Clqj and Clqk: its potential is first updated to Ψ’(Clqi) = Ψ(Clqi) ∙ λj(Sj) ∙ λk(Sk).

[figure: the tree of cliques with the upward messages λ2(S2), λ3(S3), λ4(S4), λ5(S5), λ6(S6)]

Example (7)

A) Bottom-up traversal: passing λk(Sk) = Σ_{Rk} Ψ(Clqk).

Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}).

P(R6 | S6) = P(D | C) = Ψ(Clq6) / λ6(S6)

λ6(c) = Ψ(c, d) + Ψ(c, ¬d) = 0.98 + 0.02 = 1
λ6(¬c) = Ψ(¬c, d) + Ψ(¬c, ¬d) = 0.05 + 0.95 = 1

P(d | c) = 0.98,   P(¬d | c) = 0.02
P(d | ¬c) = 0.05,  P(¬d | ¬c) = 0.95
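The marginalization at Clq6 in code (values from the CPT P(D | C)):

```python
# psi(Clq6) = P(D | C) as a table over (C, D); marginalizing D out gives lambda_6(C).
psi6 = {(True, True): 0.98, (True, False): 0.02,    # (c, d), (c, ¬d)
        (False, True): 0.05, (False, False): 0.95}  # (¬c, d), (¬c, ¬d)
lam6 = {c: psi6[(c, True)] + psi6[(c, False)] for c in (True, False)}
print(lam6)  # both values are 1 up to rounding, so P(D | C) = psi6 here
```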


Example (7)

Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}).

This node is clique Clq6’s father. According to point [3], we modify the potential function of the clique Clq5:

Ψ'(Clq5) = Ψ(Clq5) ∙ λ6(S6)

P(R5 | S5) = P(H | C,G) = Ψ'(Clq5) / λ5(C,G), where λ5(C,G) = Σ_H Ψ'(C,G,H):

λ5(c,g) = Ψ'(c, g, h) + Ψ'(c, g, ¬h) = 0.9 + 0.1 = 1
λ5(c,¬g) = Ψ'(c, ¬g, h) + Ψ'(c, ¬g, ¬h) = 0.7 + 0.3 = 1
λ5(¬c,g) = λ5(¬c,¬g) = ... = 1

Example (7)

Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C})

Clq3 is the father of two cliques, Clq4 and Clq5, both already processed:

Ψ'(Clq3) = Ψ(Clq3) ∙ Σ_{R4} Ψ(Clq4) ∙ Σ_{R5} Ψ'(Clq5) = Ψ(Clq3) ∙ λ4(S4) ∙ λ5(S5)

Ψ'(E,C,G) = Ψ(E,C,G) ∙ λ4(E,G) ∙ λ5(C,G)

P(R3 | S3) = P(G | E, C) = Ψ'(E,C,G) / λ3(E,C), where λ3(E,C) = Σ_G Ψ'(E,C,G)

Example (7)

Root: Clique Clq1 = {A, B} (R1 = {A, B}, S1 = Ø).

Ψ'(A,B) = Ψ(A,B) ∙ λ2(B)

P(R1) = P(R1 | S1) = Ψ'(A,B) / λ1, where λ1 = Ψ'(a,b) + Ψ'(a,¬b) + Ψ'(¬a,b) + Ψ'(¬a,¬b).

P(A,B) = Ψ'(A,B):

P(a,b) = 0.0005, P(a,¬b) = 0.0095,
P(¬a,b) = 0.0099, P(¬a,¬b) = 0.9801





P(Clqi) = P(Ri | Si) ∙ P(Si)

In the top-down traversal, a clique Clqi with children Clqj and Clqk sends them

  P(Sj) = Σ_{Clqi - Sj} P(Clqi) = λi(Sj)
  P(Sk) = Σ_{Clqi - Sk} P(Clqi) = λi(Sk)

[figure: the tree of cliques with the downward messages λ1(S2), λ2(S3), λ3(S4), λ3(S5), λ5(S6)]

Example (7)

B) Top-down traversal:

Clique Clq2 = {B,E,C} (R2 = {E,C}, S2 = {B}).

P(B) = P(S2) = Σ_A P(A, B):

P(b) = P(a, b) + P(¬a, b) = 0.0005 + 0.0099 = 0.0104,
P(¬b) = P(a, ¬b) + P(¬a, ¬b) = 1 - 0.0104 = 0.9896

*** P(Clq2) = P(R2 | S2) ∙ P(S2)
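The top-down step at the root in code, using the entries of P(A, B) consistent with the CPTs (P(a, b) = 0.01 ∙ 0.05 = 0.0005):

```python
# Joint P(A, B) delivered at the root, and the top-down marginal P(B).
p_ab = {("a", "b"): 0.0005, ("¬a", "b"): 0.0099,
        ("a", "¬b"): 0.0095, ("¬a", "¬b"): 0.9801}
p_b = p_ab[("a", "b")] + p_ab[("¬a", "b")]        # 0.0104
p_not_b = p_ab[("a", "¬b")] + p_ab[("¬a", "¬b")]  # 0.9896
print(p_b, p_not_b)
```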

Example (7)

Clique Clq3 = {E,C,G} (R3 = {G}, S3 = {E,C}).
  We have to compute P(S3) and P(Clq3).

Clique Clq4 = {E, G, F} (R4 = {F}, S4 = {E,G}).
  We have to compute P(S4) and P(Clq4).

Clique Clq5 = {C, G, H} (R5 = {H}, S5 = {C, G}).
  We have to compute P(S5) and P(Clq5).

Clique Clq6 = {C,D} (R6 = {D}, S6 = {C}).
  We have to compute P(S6) and P(Clq6).

Summary

Given a Bayesian network BN = (G, P), we have seen how

1) To transform G into a tree of cliques and factorize P as

   P(X1, ..., Xn) = Ψ(Clq1) ∙ Ψ(Clq2) ∙ ... ∙ Ψ(Clqm)

   where Ψ(Clqi) = ∏ { P(Xj | parents(Xj)) | Xj ∈ Clqi, parents(Xj) ⊆ Clqi },

2) To compute the probability distributions P(Clqi) with a propagation algorithm, and from there, to compute the probabilities P(Xj) for Xj ∈ Clqi, by marginalization.



Probability updating

It remains to see how to perform inference, i.e. how to update the probabilities P(Xj) when some information (evidence E) is available about some variables:

P(Xj) ---> P*(Xj) = P(Xj | E)

The updating mechanism is based on a fundamental property of the potential representations when applied to P(X1, ..., Xn) and its potential representation in terms of cliques:

P(X1, ..., Xn) = Ψ(Clq1) ∙ Ψ(Clq2) ∙ ... ∙ Ψ(Clqm)

Updating mechanism

Recall:

Let ([Clq1, ..., Clqm], Ψ) be a potential representation for P(X1, …, Xn).

We observe: X3 = a and X5 = b.

Updating the probability:

P*(X1, X2, X4, X6, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b)

Define:

Clq^i = Clqi - {X3, X5}
Ψ^(Clq^i) = Ψ(Clqi restricted to X3 = a, X5 = b)

([Clq^1, ..., Clq^m], Ψ^) is a potential representation for P*.

Updating mechanism

Based on three steps:

A) Build the new tree of cliques obtained by deleting from the original tree the instantiated variables,
B) Re-compute the new potential functions Ψ^ corresponding to the new cliques and, finally,
C) Apply the propagation algorithm over the new tree of cliques and potential functions.


Tree of cliques before and after instantiating A = a, H = h:

  Clq1 = {A,B}    →  Clq’1 = {B}
  Clq2 = {B,E,C}  →  Clq’2 = {B,E,C}
  Clq3 = {E,C,G}  →  Clq’3 = {E,C,G}
  Clq4 = {E,G,F}  →  Clq’4 = {E,G,F}
  Clq5 = {C,G,H}  →  Clq’5 = {C,G}
  Clq6 = {C,D}    →  Clq’6 = {C,D}

P(Xj) ---> P*(Xj) = P(Xj | A = a, H = h)

P(D = d | A = a, H = h) ?
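The closing query can be answered by brute-force enumeration of the full joint; the propagation algorithm computes the same number without visiting all 2^8 assignments. A sketch reusing the example CPTs:

```python
from itertools import product

# CPTs of the example network (from the earlier slides).
parents = {"A": (), "B": ("A",), "C": ("B", "E"), "D": ("C",),
           "E": ("F",), "F": (), "G": ("F",), "H": ("C", "G")}
cpt = {
    "A": {(): 0.01},
    "B": {(True,): 0.05, (False,): 0.01},
    "C": {(True, True): 1.0, (True, False): 1.0,
          (False, True): 1.0, (False, False): 0.0},
    "D": {(True,): 0.98, (False,): 0.05},
    "E": {(True,): 0.1, (False,): 0.01},
    "F": {(): 0.5},
    "G": {(True,): 0.6, (False,): 0.3},
    "H": {(True, True): 0.9, (True, False): 0.7,
          (False, True): 0.8, (False, False): 0.1},
}

def joint(x):
    p = 1.0
    for var, value in x.items():
        pt = cpt[var][tuple(x[u] for u in parents[var])]
        p *= pt if value else 1.0 - pt
    return p

def query(var, val, evidence):
    """P(var = val | evidence), summing over all free variables."""
    free = [v for v in parents if v != var and v not in evidence]
    num = den = 0.0
    for vals in product([False, True], repeat=len(free)):
        x = dict(evidence, **dict(zip(free, vals)))
        for qv in (False, True):
            x[var] = qv
            p = joint(x)
            den += p
            if qv == val:
                num += p
    return num / den

print(query("D", True, {"A": True, "H": True}))
```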