# Bayesian Networks

Artificial Intelligence and Robotics

7 Nov 2013


Bayesian Networks

Introduction

A problem domain is modeled by a list of variables X1, …, Xn

Knowledge about the problem domain is represented by the joint probability P(X1, …, Xn)

Introduction

Example: Alarm

The story: In LA, burglaries and earthquakes are not uncommon. Either one can set off the alarm. When the alarm sounds, the two neighbors, John and Mary, may call

Problem: Estimate the probability of a burglary based on who has or has not called

Variables: Burglary (B), Earthquake (E), Alarm
(A), JohnCalls (J), MaryCalls (M)

Knowledge required to solve the problem:

P(B, E, A, J, M)

Introduction

What is the probability of burglary given
that Mary called, P(B = y | M = y)?

Compute the marginal probability:

P(B, M) = Σ_{E, A, J} P(B, E, A, J, M)

Then use the definition of conditional probability:

P(B = y | M = y) = P(B = y, M = y) / P(M = y)

Introduction

Difficulty: Complexity in model
construction and inference

In the Alarm example:

31 numbers needed

Computing P(B = y | M = y) takes 29 additions

In general

P(X1, …, Xn) needs at least 2^n − 1 numbers to specify the joint probability

Exponential storage and inference

Conditional Independence

Overcome the problem of exponential size
by exploiting conditional independence

The chain rule of probabilities:

P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1)

Conditional Independence

Conditional independence in the problem domain:

The domain usually allows us to identify a subset pa(Xi) ⊆ {X1, …, Xi−1} such that, given pa(Xi), Xi is independent of all variables in {X1, …, Xi−1} \ pa(Xi), i.e.

P(Xi | X1, …, Xi−1) = P(Xi | pa(Xi))

Then

P(X1, …, Xn) = ∏i P(Xi | pa(Xi))
Conditional Independence

As a result, the joint probability P(X1, …, Xn) can be represented by the conditional probabilities P(Xi | pa(Xi))

Example continued:

P(B, E, A, J, M)
= P(B) P(E | B) P(A | B, E) P(J | A, B, E) P(M | B, E, A, J)
= P(B) P(E) P(A | B, E) P(J | A) P(M | A)

pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}

The conditional probability tables specify: P(B), P(E), P(A | B, E), P(M | A), P(J | A)

Conditional Independence

As a result:

Model size is reduced: for the Alarm network, 1 + 1 + 4 + 2 + 2 = 10 numbers instead of 31

Model construction is easier

Inference is easier

Graphical Representation

To graphically represent the conditional independence relationships, construct a directed graph by drawing an arc from Xj to Xi iff Xj ∈ pa(Xi)

pa(B) = {}, pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}

[Figure: the resulting graph, with arcs B → A, E → A, A → J, A → M]

Graphical Representation

We also attach the conditional probability table P(Xi | pa(Xi)) to node Xi

The result: a Bayesian network

[Figure: the Alarm Bayesian network, with P(B) attached to B, P(E) to E, P(A | B, E) to A, P(J | A) to J, and P(M | A) to M]
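To make the structure concrete, here is a minimal sketch (not from the slides) of how such a network can be stored in code: a plain Python dictionary mapping each node to its parents and its conditional probability table. The names alarm_net and prob are mine, and the probability numbers are illustrative placeholders only.

```python
# A minimal representation of the Alarm Bayesian network:
# each node lists its parents and a CPT indexed by parent values.
# The probabilities below are illustrative placeholders only.
alarm_net = {
    "B": {"parents": [], "cpt": {(): 0.001}},                 # P(B = y)
    "E": {"parents": [], "cpt": {(): 0.002}},                 # P(E = y)
    "A": {"parents": ["B", "E"],                              # P(A = y | B, E)
          "cpt": {("y", "y"): 0.95, ("y", "n"): 0.94,
                  ("n", "y"): 0.29, ("n", "n"): 0.001}},
    "J": {"parents": ["A"],                                   # P(J = y | A)
          "cpt": {("y",): 0.90, ("n",): 0.05}},
    "M": {"parents": ["A"],                                   # P(M = y | A)
          "cpt": {("y",): 0.70, ("n",): 0.01}},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from assignment); binary y/n variables."""
    key = tuple(assignment[p] for p in alarm_net[node]["parents"])
    p_yes = alarm_net[node]["cpt"][key]
    return p_yes if value == "y" else 1.0 - p_yes

# Joint probability of one full assignment via the factorization
# P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
a = {"B": "n", "E": "n", "A": "n", "J": "n", "M": "n"}
joint = 1.0
for node in ["B", "E", "A", "J", "M"]:
    joint *= prob(node, a[node], a)
print(joint)
```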

Formal Definition

A Bayesian network is:

A directed acyclic graph (DAG), where

Each node represents a random variable

And is associated with the conditional
probability of the node given its parents

Intuition

A BN can be understood as a DAG where arcs represent direct probabilistic dependence

The absence of an arc indicates conditional independence: a variable is conditionally independent of all its nondescendants given its parents

From the graph: B ⊥ E, J ⊥ B | A, J ⊥ E | A

[Figure: the Alarm network B → A ← E, A → J, A → M]

Construction

Procedure for constructing a BN:

Choose a set of variables describing the application domain

Choose an ordering of the variables

Add the variables to the network one by one according to the ordering

Construction

To add the i-th variable Xi:

Determine pa(Xi) among the variables already in the network (X1, …, Xi−1) such that

P(Xi | X1, …, Xi−1) = P(Xi | pa(Xi))

(domain knowledge is needed here)

Draw an arc from each variable in pa(Xi) to Xi

Example

Order: B, E, A, J, M

pa(B) = pa(E) = {}, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}

Order: M, J, A, B, E

pa(M) = {}, pa(J) = {M}, pa(A) = {M, J}, pa(B) = {A}, pa(E) = {A, B}

Order: M, J, E, B, A

Fully connected graph

[Figure: the three networks over B, E, A, J, M obtained from the three orderings]

Construction

Which variable order?

Naturalness of probability assessment: M, J, E, B, A is bad because P(B | J, M, E) is not natural to assess

Minimize the number of arcs: M, J, E, B, A is bad (too many arcs), the first ordering is good

Use causal relationships: causes come before their effects. M, J, E, B, A is bad because M and J are effects of A but come before A

[Figure: the network from the ordering B, E, A, J, M vs. the fully connected network from the ordering M, J, E, B, A]

Causal Bayesian Networks

A causal Bayesian network, or simply causal network, is a Bayesian network whose arcs are interpreted as indicating cause-effect relationships

To build a causal network:

Choose a set of variables that describes the domain

Draw an arc to each variable from each of its direct causes (domain knowledge required)

Example

[Figure: a causal network with nodes Visit Africa, Tuberculosis, Smoking, Lung Cancer, Bronchitis, Tuberculosis or Lung Cancer, X-Ray, Dyspnea]

Causal BN

Causality is not a well-understood concept.

There is no widely accepted definition.

There is no consensus on whether it is a property of the world or a concept in our minds.

Sometimes causal relations are obvious:

An alarm causes people to leave the building.

Lung cancer causes a mass on a chest X-ray.

At other times, they are not that clear. Doctors believe smoking causes lung cancer, but the tobacco industry has a different story:

Surgeon General (1964): S → C

Tobacco Industry: S and C are both effects of a hidden common factor (*)

Inference

Posterior queries to a BN:

We have observed the values of some variables

What are the posterior probability distributions of the other variables?

Example: Both John and Mary reported the alarm

What is the probability of burglary, P(B | J = y, M = y)?

Inference

General form of a query: P(Q | E = e) = ?

Q is a list of query variables

E is a list of evidence variables

e denotes the observed values of E

Inference Types

Diagnostic inference: P(B | M = y)

Predictive/causal inference: P(M | B = y)

Intercausal inference (between causes of a common effect): P(B | A = y, E = y)

Mixed inference (combining two or more of the above): P(A | J = y, E = y) (diagnostic and causal)

All the types are handled in the same way

Naïve Inference

Naïve algorithm for solving P(Q | E = e) in a BN:

Obtain the joint distribution P(X) over all variables X by multiplying the conditional probabilities, then sum out the variables that are neither queried nor observed

The BN structure is not used; for many variables the algorithm is not practical

In general, exact inference is NP-hard
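As a rough illustration of this naive approach (my own sketch, not code from the slides), the example below builds the full joint of a tiny three-variable network B → A ← E by multiplying the CPTs and then answers P(B = y | A = y) by summing the relevant entries. The probability numbers are made up for the example.

```python
from itertools import product

# Illustrative CPTs for a tiny network B -> A <- E (placeholder numbers).
p_b = {"y": 0.01, "n": 0.99}
p_e = {"y": 0.02, "n": 0.98}
p_a_given_be = {("y", "y"): 0.95, ("y", "n"): 0.94,
                ("n", "y"): 0.29, ("n", "n"): 0.001}

# Naive inference: materialize the full joint P(B, E, A), exponential in size.
joint = {}
for b, e, a in product("yn", repeat=3):
    p_a = p_a_given_be[(b, e)] if a == "y" else 1.0 - p_a_given_be[(b, e)]
    joint[(b, e, a)] = p_b[b] * p_e[e] * p_a

# P(B = y | A = y) = P(B = y, A = y) / P(A = y)
num = sum(p for (b, e, a), p in joint.items() if b == "y" and a == "y")
den = sum(p for (b, e, a), p in joint.items() if a == "y")
print(num / den)
```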

Inference

Though exact inference is generally NP-hard, in some cases the problem is tractable: e.g. if the BN has a (poly)tree structure, an efficient algorithm exists

(a polytree is a directed acyclic graph in which no two nodes have more than one path between them)

Another practical approach: stochastic simulation

A general sampling algorithm

For i = 1 to n (with the variables ordered so that parents come before their children):

1. Find the parents of Xi: (X_p(i,1), …, X_p(i,n))

2. Recall the values that those parents were randomly given

3. Look up the table for P(Xi | X_p(i,1) = x_p(i,1), …, X_p(i,n) = x_p(i,n))

4. Randomly set xi according to this probability
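A minimal Python sketch of this forward-sampling loop is below (my illustration), assuming the same tiny B → A ← E network with placeholder probabilities; variables are visited parents-first, and each value is drawn from its CPT row given the already-sampled parent values.

```python
import random

# Tiny network B -> A <- E with placeholder probabilities (assumed, not from the slides).
NET = {
    "B": {"parents": [], "cpt": {(): 0.01}},
    "E": {"parents": [], "cpt": {(): 0.02}},
    "A": {"parents": ["B", "E"],
          "cpt": {("y", "y"): 0.95, ("y", "n"): 0.94,
                  ("n", "y"): 0.29, ("n", "n"): 0.001}},
}
ORDER = ["B", "E", "A"]  # parents always precede children

def sample_once(rng=random):
    """Draw one full sample (x1, ..., xn) by the general sampling algorithm."""
    x = {}
    for node in ORDER:
        parents = NET[node]["parents"]                    # step 1: find parents
        key = tuple(x[p] for p in parents)                # step 2: recall their values
        p_yes = NET[node]["cpt"][key]                     # step 3: look up the CPT row
        x[node] = "y" if rng.random() < p_yes else "n"    # step 4: set xi randomly
    return x

print(sample_once())
```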

Stochastic Simulation

We want to know P(Q = q | E = e)

Do many random samplings and count:

Nc: the number of samples in which E = e

Ns: the number of samples in which Q = q and E = e

N: the total number of random samples

If N is big enough:

Nc / N is a good estimate of P(E = e)

Ns / N is a good estimate of P(Q = q, E = e)

Ns / Nc is then a good estimate of P(Q = q | E = e)
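Continuing the sketch above (again my own illustration, not the slides' code), the estimate of P(Q = q | E = e) is obtained by counting Ns and Nc over a batch of samples and taking their ratio.

```python
def estimate(samples, query, evidence):
    """Estimate P(query | evidence) as Ns / Nc over a list of sampled assignments."""
    n_c = sum(all(s[v] == val for v, val in evidence.items()) for s in samples)
    n_s = sum(all(s[v] == val for v, val in {**evidence, **query}.items()) for s in samples)
    return n_s / n_c if n_c else float("nan")

# Usage with a few hand-written samples (in practice, draw many with sample_once):
samples = [
    {"B": "n", "E": "n", "A": "n"},
    {"B": "y", "E": "n", "A": "y"},
    {"B": "n", "E": "y", "A": "y"},
    {"B": "n", "E": "n", "A": "n"},
]
print(estimate(samples, query={"B": "y"}, evidence={"A": "y"}))  # Ns / Nc = 1/2
```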

Parameter Learning

Example:

Given a BN structure and a dataset, estimate the conditional probabilities P(Xi | pa(Xi))

[Figure: the BN structure over X1, …, X5]

X1  X2  X3  X4  X5
0   0   1   1   0
1   0   0   1   0
0   ?   0   0   ?

? means missing values

Parameter Learning

We consider the case of complete (full) data

Use the maximum likelihood (ML) algorithm or Bayesian estimation

Modes of learning:

Sequential learning

Batch learning

Bayesian estimation is suitable for both sequential and batch learning

ML is suitable only for batch learning

ML in BN with Complete Data

n variables X1, …, Xn

Number of states of Xi: ri = |Xi|

Number of configurations of the parents of Xi: qi = |pa(Xi)|

Parameters to be estimated:

θijk = P(Xi = j | pa(Xi) = k),  i = 1, …, n;  j = 1, …, ri;  k = 1, …, qi
ML in BN with Complete Data

Example: consider a BN over X1, X2, X3. Assume all variables are binary, taking values 1 and 2.

θijk = P(Xi = j | pa(Xi) = k)

[Figure: the example network over X1, X2, X3, annotated with the number of parent configurations of each node]
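As a small illustration of ri and qi (my own sketch; the structure X1 → X3 ← X2 is an assumption here, consistent with the θ list recalled later in these slides), the snippet below counts the CPT entries θijk for each node.

```python
from math import prod

# Assumed example structure: binary X1 and X2 are parents of binary X3.
states = {"X1": 2, "X2": 2, "X3": 2}                 # r_i: number of states of X_i
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"]}

total = 0
for x in states:
    r_i = states[x]
    q_i = prod(states[p] for p in parents[x])        # number of parent configurations
    total += r_i * q_i
    print(f"{x}: r_i = {r_i}, q_i = {q_i}, CPT entries = {r_i * q_i}")

print("total CPT entries (theta_ijk):", total)       # 2 + 2 + 8 = 12
```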

ML in BN with Complete Data

A complete case: Dl is a vector of values, one for each variable (all values are observed).

Example: Dl = (X1 = 1, X2 = 2, X3 = 2)

Given:

A set of complete cases: D = {D1, …, Dm}

Find: the ML estimate of the parameters θ

ML in BN with Complete Data

Log-likelihood:

l(θ | D) = log L(θ | D) = log P(D | θ) = log ∏l P(Dl | θ) = Σl log P(Dl | θ)

The term log P(Dl | θ), e.g. for D4 = (1, 2, 2):

log P(D4 | θ) = log P(X1 = 1, X2 = 2, X3 = 2 | θ)
= log P(X1 = 1 | θ) P(X2 = 2 | θ) P(X3 = 2 | X1 = 1, X2 = 2, θ)
= log θ111 + log θ221 + log θ322

Recall: θ = {θ111, θ121, θ211, θ221, θ311, θ312, θ313, θ314, θ321, θ322, θ323, θ324}

[Figure: the example network X1 → X3 ← X2]

ML in BN with Complete Data

Define the characteristic function of Dl:

χ(i, j, k : Dl) = 1 if Xi = j and pa(Xi) = k in case Dl, and 0 otherwise

When l = 4, D4 = (1, 2, 2):

χ(1, 1, 1 : D4) = χ(2, 2, 1 : D4) = χ(3, 2, 2 : D4) = 1,
χ(i, j, k : D4) = 0 for all other i, j, k

So log P(D4 | θ) = Σijk χ(i, j, k : D4) log θijk

In general,

log P(Dl | θ) = Σijk χ(i, j, k : Dl) log θijk
ML in BN with Complete Data

Define: mijk = Σl χ(i, j, k : Dl), the number of data cases where Xi = j and pa(Xi) = k

Then

l(θ | D) = Σl log P(Dl | θ)
= Σl Σi,j,k χ(i, j, k : Dl) log θijk
= Σi,j,k Σl χ(i, j, k : Dl) log θijk
= Σi,j,k mijk log θijk
= Σi,k Σj mijk log θijk

ML in BN with Complete Data

We want to find:

argmax_θ l(θ | D) = argmax_θ Σi,k Σj mijk log θijk

Assume that θijk = P(Xi = j | pa(Xi) = k) is not related to θi'j'k' provided that i ≠ i' or k ≠ k'

Consequently we can maximize each term in the summation over i, k separately:

argmax_{θijk} Σj mijk log θijk

ML in BN with Complete Data

As a result we have:

θ̂ijk = mijk / Σj' mij'k

In words, the ML estimate for θijk = P(Xi = j | pa(Xi) = k) is

(number of cases where Xi = j and pa(Xi) = k) / (number of cases where pa(Xi) = k)
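A minimal sketch of this counting estimate (my own illustration, assuming the X1 → X3 ← X2 example with made-up complete binary data): for each node it tallies mijk over the cases and normalizes per parent configuration.

```python
from collections import Counter

# Assumed structure: X1 and X2 are parents of X3; values are 1 or 2.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"]}

# A small set of complete cases D = {D1, ..., Dm} (made-up data).
data = [
    {"X1": 1, "X2": 2, "X3": 2},
    {"X1": 1, "X2": 1, "X3": 1},
    {"X1": 2, "X2": 2, "X3": 2},
    {"X1": 1, "X2": 2, "X3": 1},
]

def ml_estimates(node):
    """ML estimate of P(node = j | pa(node) = k): m_ijk / sum over j' of m_ij'k."""
    m = Counter()                                   # m[(k, j)] = m_ijk
    for case in data:
        k = tuple(case[p] for p in parents[node])   # parent configuration
        m[(k, case[node])] += 1
    n_k = Counter()                                 # number of cases with pa = k
    for (k, j), count in m.items():
        n_k[k] += count
    return {(k, j): count / n_k[k] for (k, j), count in m.items()}

print(ml_estimates("X3"))
# e.g. P(X3 = 2 | X1 = 1, X2 = 2) = 1/2 in this toy dataset
```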

More to do with BN

Learning parameters with some values
missing

Learning the structure of BN from training
data

Many more…

References

Pearl, Judea. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.

Heckerman, David. "A Tutorial on Learning with Bayesian Networks." Technical Report MSR-TR-95-06, Microsoft Research, 1995.

www.ai.mit.edu/~murphyk/Software

http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag, 1999.

http://www.ets.org/research/conferences/almond2004.html#software