# Intelligent Systems (2II40)

Artificial Intelligence and Robotics

7 Nov 2013

Intelligent Systems (2II40)

C7

Alexandra I. Cristea

October 2005

Computing posterior probability from the full joint distribution

P(Cavity | Toothache) = ?

Basic rules

Conditional probability:
P(A | B) = P(A ∧ B) / P(B), if P(B) ≠ 0

Product rule:
P(A ∧ B) = P(A | B) P(B)

Bayes' rule:
P(A | B) = P(B | A) P(A) / P(B)

Basic rules cont.

P(A | B) = P(A ∧ B) / P(B)

P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)

P(Cavity ∧ Toothache) = ? : read from the table:
P(Cavity ∧ Toothache) = 0.04

P(Toothache) = ?

Full joint distribution cont.

P(Toothache) = Σ_c P(c, t)   (c ranges over the values of Cavity; t stands for "Toothache = true")

P(Toothache) = 0.04 + 0.01 = 0.05

P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = 0.04 / 0.05 = 0.8
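The computation above can be sketched in Python. The 0.04 and 0.01 entries come from the lecture's table; the other two joint entries are illustrative values chosen only so the table sums to 1.

```python
# Computing P(Cavity | Toothache) from a full joint distribution.
joint = {
    (True, True): 0.04,   # Cavity ∧ Toothache (from the lecture table)
    (True, False): 0.06,  # Cavity ∧ ¬Toothache (assumed)
    (False, True): 0.01,  # ¬Cavity ∧ Toothache (from the lecture table)
    (False, False): 0.89, # ¬Cavity ∧ ¬Toothache (assumed)
}

def p_toothache():
    # Marginalize over Cavity: sum all entries with Toothache = true.
    return sum(p for (cavity, toothache), p in joint.items() if toothache)

def p_cavity_given_toothache():
    # Conditional probability: P(Cavity ∧ Toothache) / P(Toothache).
    return joint[(True, True)] / p_toothache()

print(round(p_toothache(), 3))               # 0.05
print(round(p_cavity_given_toothache(), 3))  # 0.8
```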

Inference from joint distribution

VI.2. Probabilistic reasoning

A. Conditional independence
B. Bayesian networks: syntax and semantics
C. Exact inference
D. Approximate inference

Independence

Two random variables A, B are (absolutely) independent iff
P(A, B) = P(A) P(B)

If n Boolean variables are independent, the full joint is
P(X1, …, Xn) = Π_i P(Xi)

Two random variables A, B are conditionally independent given C iff
P(A, B | C) = P(A | C) P(B | C)

Conditional independence example

VI.2.B. Belief networks (Bayesian networks)

Belief network example

Neighbors John and Mary promised to call
if the alarm goes off; sometimes it starts
because of earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (n = 5 variables)

Network topology reflects "causal" knowledge

Belief network example cont.

For k parents: O(d^k · n) numbers vs. O(d^n)

Semantics in belief networks

"Global" semantics defines the full joint distribution as the product of the local conditional distributions:

P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi))

e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) is given by:

= P(¬B) P(¬E) P(A | ¬B ∧ ¬E) P(J | A) P(M | A)
= 0.999 × 0.998 × 0.001 × 0.90 × 0.70
≈ 0.000628
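A quick check of this product, with the five CPT entries listed above:

```python
# Joint probability of J ∧ M ∧ A ∧ ¬B ∧ ¬E as a product of the local
# conditional probabilities from the burglary network:
# P(¬B), P(¬E), P(A | ¬B, ¬E), P(J | A), P(M | A).
import math

factors = [0.999, 0.998, 0.001, 0.90, 0.70]
p = math.prod(factors)
print(round(p, 6))  # 0.000628
```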

"Local" semantics: each node is conditionally independent of its non-descendants given its parents

Theorem: local semantics ⇔ global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents

Homework 7

1. In the Burglary example*), compute the following:
   a) The Markov blanket of node 'Alarm'
   b) The probability that Mary calls, given there is a burglary: P(M=true | B=true).

2. Continue with Steps 14, 15 of your project.

3. Perform Step 16 of your project (Final Presentation).

*) Point 1) of this homework is not obligatory (deadline: before the project); it cannot lower your average, only increase it.

Constructing belief networks

Locally testable assertions of conditional independence ⇒ global semantics

1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n: select parents from X1, …, Xi−1 such that
   P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees the global semantics:

P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | X1, …, Xi−1)   # chain rule
             = Π_{i=1}^{n} P(Xi | Parents(Xi))   # by construction

Constructing belief networks example

Example: car diagnosis

(initial evidence; testable vars; diagnosis vars; hidden vars)

Example: car insurance

(predict claim from application form data)

Efficient conditional distributions

A CPT grows exponentially with the number of parents

A CPT becomes infinite with continuous variables

⇒ other, more compact methods are needed

Compact conditional distributions cont.

Noisy-OR distributions model multiple non-interacting causes:

1. Parents U1 … Uk include all causes (can add a leak node)
2. Independent failure probability qi for each cause alone

P(X | U1 … Uj, ¬U_{j+1} … ¬Uk) = 1 − Π_{i=1}^{j} qi

Number of parameters: linear in the number of parents
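A minimal noisy-OR sketch; the cause names and q values below are illustrative assumptions, not from the slides:

```python
# Noisy-OR: probability the effect is true given which causes are
# active; each active cause i fails independently with probability q_i.
import math

def noisy_or(q_active):
    """P(X = true | active causes) = 1 - product of their q_i."""
    return 1.0 - math.prod(q_active)

# Assumed example values: q = P(effect absent although cause present).
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

# All three causes active: 1 - 0.6 * 0.2 * 0.1 = 0.988
print(noisy_or([q["cold"], q["flu"], q["malaria"]]))
# One cause active: only k parameters needed, not 2^k CPT rows.
print(noisy_or([q["flu"]]))
```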

Hybrid (discrete + continuous) networks

Discrete (Subsidy? and …); continuous (Harvest and Cost)

How to deal with this?

Probability density functions (PDF) for continuous variables

Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven

Belief that X is distributed uniformly between 18 and 26 degrees Celsius:

P(X = x) = U[18,26](x)

P(X = 20.5) = U[18,26](20.5) = 0.125/C
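The uniform-density example can be checked with a small sketch:

```python
# Uniform density U[18,26]: constant 1/(26-18) = 0.125 per degree
# Celsius inside the interval, 0 outside (the lecture's example).
def uniform_pdf(x, lo=18.0, hi=26.0):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

print(uniform_pdf(20.5))  # 0.125
print(uniform_pdf(30.0))  # 0.0
```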

Probability density functions (PDF)

Cumulative density functions (CDF)

Continuous Random Variable

Probability distribution (density function) over continuous values, e.g., X ∈ [0, 10]:

P(x) ≥ 0

∫_0^10 P(x) dx = 1

P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
More on PDFs at:

http://people.hofstra.edu/faculty/Stefan_Waner/cprob/cprob2.html


Hybrid (discrete + continuous) networks

Discrete (Subsidy? and …); continuous (Harvest and Cost)

Option 1: discretization (possibly large errors, large CPTs)

Option 2: finitely parameterized canonical (PDF) families

a) Continuous variable, discrete + continuous parents (e.g., Cost)
b) Discrete variable, continuous parents (e.g., …)

a) Continuous child variables

Conditional density functions for the child variable, e.g., linear Gaussian:

the mean of Cost varies linearly with Harvest; the variance is fixed

linear variation is unreasonable over the full range (why?)

Continuous child variables ex.

An all-continuous network with LG distributions: the full joint is a multivariate Gaussian

A discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values

b) Discrete child, continuous parent

Probability that the discrete child is true, given continuous parent Cost = c:

P(child = true | Cost = c) = Φ((−c + μ) / σ)

Probit distribution: Φ(x) = ∫_{−∞}^{x} N(0,1)(u) du, the integral of the standard normal distribution

Logit distribution: uses the sigmoid function 1 / (1 + e^{−2x}) in place of Φ(x)

VI.2. Probabilistic reasoning

A. Conditional independence
B. Bayesian networks: syntax and semantics
C. Exact inference
   i. Exact inference by enumeration
   ii. Exact inference by variable elimination
D. Approximate inference
   i. Approximate inference by stochastic simulation
   ii. Approximate inference by Markov chain Monte Carlo

Exact inference w. enumeration

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

e.g., for the burglary network:

P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a)
= α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)

With n variables of domain size d the enumeration has O(d^n) terms; for n Boolean variables, 2^n.

Enumeration algorithm

Exhaustive depth-first enumeration: O(n) space, O(d^n) time

Inference by variable elimination

Enumeration is inefficient: repeated computation

e.g., it computes P(J = true | a) P(M = true | a) for each value of e

Variable elimination: summation from right to left, storing intermediate results (factors) to avoid recomputation

Variable elimination: basic operations

Pointwise product of factors f1 and f2:

f1(x1,…,xj, y1,…,yk) × f2(y1,…,yk, z1,…,zl) = f(x1,…,xj, y1,…,yk, z1,…,zl)

e.g., f1(a,b) × f2(b,c) = f(a,b,c)

Summing out a variable from a product of factors: move any constant factors outside the summation:

Σ_x f1 × … × fk = f1 × … × fi × Σ_x f_{i+1} × … × fk = f1 × … × fi × f_X̄

assuming f1, …, fi do not depend on X

Example pointwise product

| A | B | f1(A,B) |
|---|---|---------|
| T | T | .3 |
| T | F | .7 |
| F | T | .9 |
| F | F | .1 |

| B | C | f2(B,C) |
|---|---|---------|
| T | T | .2 |
| T | F | .8 |
| F | T | .6 |
| F | F | .4 |

| A | B | C | f3(A,B,C) |
|---|---|---|-----------|
| T | T | T | |
| T | T | F | |
| T | F | T | |
| T | F | F | |
| F | T | T | |
| F | T | F | |
| F | F | T | |
| F | F | F | |

Example pointwise product

| A | B | C | f3(A,B,C) = f1(A,B) × f2(B,C) |
|---|---|---|-------------------------------|
| T | T | T | .3 × .2 |
| T | T | F | .3 × .8 |
| T | F | T | .7 × .6 |
| T | F | F | .7 × .4 |
| F | T | T | .9 × .2 |
| F | T | F | .9 × .8 |
| F | F | T | .1 × .6 |
| F | F | F | .1 × .4 |
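The pointwise product, and the summing-out operation, can be sketched with factors as Python dicts (a sketch of the operations, not the lecture's full algorithm):

```python
# Pointwise product f3(A,B,C) = f1(A,B) * f2(B,C), joining on the
# shared variable B, then summing out B to get a factor over (A, C).
from itertools import product

f1 = {(a, b): v for (a, b), v in zip(
    product([True, False], repeat=2), [0.3, 0.7, 0.9, 0.1])}
f2 = {(b, c): v for (b, c), v in zip(
    product([True, False], repeat=2), [0.2, 0.8, 0.6, 0.4])}

f3 = {(a, b, c): f1[(a, b)] * f2[(b, c)]
      for a, b, c in product([True, False], repeat=3)}

# Summing out B: add the entries that agree on (A, C).
f_ac = {}
for (a, b, c), v in f3.items():
    f_ac[(a, c)] = f_ac.get((a, c), 0.0) + v

print(round(f3[(True, True, True)], 2))  # 0.3 * 0.2 = 0.06
print(round(f_ac[(True, True)], 2))      # 0.3*0.2 + 0.7*0.6 = 0.48
```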

Variable elimination algorithm

Complexity of exact inference

Polytrees (singly connected networks) = networks in which there is at most one undirected path between any two nodes

Time and space complexity of exact inference on polytrees: linear in the size of the network

Multiply connected networks (≠ polytrees):

Variable elimination can have exponential time and space complexity

Inference in Bayesian networks is NP-hard

Includes inference in propositional logic as a special case

VI.2. D. Approximate inference

Inference by stochastic simulation

Basic idea:

1. Draw N samples from a sampling distribution S
2. Compute an approximate posterior probability P'
3. Show it converges to the true probability P

VI.2. D. Approximate inference

i. Sampling from an empty network
ii. Rejection sampling: reject samples disagreeing with the evidence
iii. Likelihood weighting: use evidence to weight samples
iv. MCMC: sample from a stochastic process whose stationary distribution is the true posterior

i. Sampling from an empty network

function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
    x ← an event with n elements
    for i = 1 to n do
        xi ← a random sample from P(Xi | parents(Xi))
    return x

P(Cloudy) = <0.5, 0.5>
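A minimal Python rendering of PRIOR-SAMPLE on the sprinkler network; the CPT numbers are the standard textbook values and should be treated as assumptions here:

```python
# PRIOR-SAMPLE sketch: sample each variable in topological order
# from P(Xi | parents(Xi)).
import random

def prior_sample(rng):
    c = rng.random() < 0.5                  # P(Cloudy) = <0.5, 0.5>
    s = rng.random() < (0.1 if c else 0.5)  # P(Sprinkler=true | Cloudy)
    r = rng.random() < (0.8 if c else 0.2)  # P(Rain=true | Cloudy)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}
    w = rng.random() < p_w[(s, r)]          # P(WetGrass=true | S, R)
    return {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}

event = prior_sample(random.Random(0))
print(event)
```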

i. Sampling from an empty network cont.

Probability that PRIOR-SAMPLE generates a particular event:

S_PS(x1, …, xn) = Π_{i=1}^{n} P(xi | Parents(Xi)) = P(x1, …, xn)

Let N_PS(Y = y) be the number of samples generated for which Y = y, for any set of variables Y.

Then P'(Y = y) = N_PS(Y = y) / N, and

lim_{N→∞} P'(Y = y) = Σ_h S_PS(Y = y, H = h) = Σ_h P(Y = y, H = h) = P(Y = y)

⇒ estimates derived from PRIOR-SAMPLE are consistent

ii. Rejection sampling example

Estimate P(Rain | Sprinkler = true) using 100 samples.

27 samples have Sprinkler = true; out of these, 8 have Rain = true and 19 have Rain = false.

P'(Rain | Sprinkler = true) = NORMALIZE(<8, 19>) = <0.296, 0.704>

Similar to a basic real-world empirical estimation procedure.
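The NORMALIZE step is just division by the total count:

```python
# NORMALIZE(<8, 19>): divide the counts by their sum to obtain a
# probability distribution over Rain = <true, false>.
def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

dist = normalize([8, 19])
print([round(p, 3) for p in dist])  # [0.296, 0.704]
```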

ii. Rejection sampling

P'(X | e) is estimated from the samples that agree with the evidence e

PROBLEM: a lot of the collected samples are thrown away!

iii. Likelihood weighting

Idea: fix the evidence variables E; sample only the nonevidence variables X, Y; weight each sample by the likelihood it accords to the evidence E

iii. Likelihood weighting example

Estimate P(Rain | Sprinkler = true, WetGrass = true)

iii. Likelihood weighting example cont.

Sample generation process:

1. w ← 1.0
2. Sample P(Cloudy) = <0.5, 0.5>; say true
3. Sprinkler has value true, so w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
4. Sample P(Rain | Cloudy = true) = <0.8, 0.2>; say true
5. WetGrass has value true, so w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099
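Tracing the weight through these steps (the CPT entries 0.1 and 0.99 are the standard textbook values, assumed here):

```python
# The weight starts at 1.0 and is multiplied by the likelihood of
# each evidence variable given the values already sampled.
w = 1.0
w *= 0.1   # P(Sprinkler = true | Cloudy = true)
w *= 0.99  # P(WetGrass = true | Sprinkler = true, Rain = true)
print(round(w, 3))  # 0.099
```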

iii. Likelihood weighting function

iii. Likelihood weighting analysis

Sampling probability for WEIGHTED
-
SAMPLE is

S
WS
(y,e) =

l

i=1
P(yi|
Parents
(
Yi
))

Note: pays attention to
evidence in ancestors only

somewhere “in between”
prior

and
posterior

distribution

Weight for a given sample y,e, is

w
(y,e) =

n

i=1
P(ei|
Parents
(
Ei
))

Weighted sampling probability is

S
WS
(y,e) w(y,e) =

l

i=1
P(
yi
|
Parents
(
Yi
)
)

m

i=1
P(
ei
|
Parents
(
Ei
)
)
=
P
(y,e)

# by standard global semantics of network

Hence, likelihood weighting is
consistent

But performance still degrades w. many evidence variables

iv. MCMC inference

"State" of the network = current assignment to all variables

Generate the next state by sampling one variable given its Markov blanket

Sample each variable in turn, keeping the evidence fixed

Approaches the stationary distribution: the long-run fraction of time spent in each state is exactly proportional to its posterior probability
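A minimal Gibbs-sampling sketch estimating P(Rain | Sprinkler = true, WetGrass = true) on the sprinkler network; the CPT values are the standard textbook numbers and are assumptions here:

```python
# Gibbs sampling: repeatedly resample each nonevidence variable
# (Cloudy, Rain) from its distribution given its Markov blanket,
# keeping the evidence (Sprinkler=true, WetGrass=true) fixed.
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}   # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}   # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}

def gibbs_rain(n, rng):
    c, r = True, True            # arbitrary initial state
    rain_true = 0
    for _ in range(n):
        # P(c | blanket) ∝ P(c) P(Sprinkler=true | c) P(r | c)
        pt = P_C * P_S[True] * (P_R[True] if r else 1 - P_R[True])
        pf = (1 - P_C) * P_S[False] * (P_R[False] if r else 1 - P_R[False])
        c = rng.random() < pt / (pt + pf)
        # P(r | blanket) ∝ P(r | c) P(WetGrass=true | Sprinkler=true, r)
        qt = P_R[c] * P_W[(True, True)]
        qf = (1 - P_R[c]) * P_W[(True, False)]
        r = rng.random() < qt / (qt + qf)
        rain_true += r
    return rain_true / n

est = gibbs_rain(50_000, random.Random(42))
print(round(est, 2))  # the exact posterior is ≈ 0.32
```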

Markov blanket (reminder)

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents

MCMC algorithm

Conclusion on Uncertainty

We discussed:

Decision theory basics: Uncertainty, Probability, Syntax,
Semantics, Inference Rules

Probabilistic reasoning: Conditional independence, Bayesian
networks, Exact inference, Approximate inference

Just as in reasoning without uncertainty, prior knowledge
can be used to diminish the state space (see inference
with Joint Distribution vs. Bayesian networks)

If probabilities are unknown, or if computation has to be
reduced, they can be estimated with Approximate
Inference

Questions?

Information Final Project Presentations

Info:
http://www.win.tue.nl/~acristea/IS/InformationProjectPresentation.txt

Appointments:
http://www.win.tue.nl/~acristea/IS/FinalPresentationAppointments.txt

Information Exam

Subject: course material (up to and including constructing belief networks) & homeworks

Samples (& some solutions) at:
http://wwwis.win.tue.nl/~acristea/HTML/IS/CoursePowerpoints&Demos/OldExams/

Check the sample homework solutions, compare with your own!

Place & time on the course site (but ALWAYS check http://owinfo.tue.nl/ as well)

Closed book (simple calculators allowed; NOT computers, phones, PDAs, or other communication devices!)

No communication

No communication

Any Questions?

Good luck!