Reasoning under uncertainty

lettuceescargatoire · Artificial Intelligence and Robotics

7 Nov 2013

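Bayes' Rule

$$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)} = \alpha\,P(a \mid b)\,P(b), \quad \text{where } \alpha = \frac{1}{P(a)}$$

$$P(b \mid a, c) = \frac{P(a \mid b, c)\,P(b \mid c)}{P(a \mid c)} = \alpha\,P(a \mid b, c)\,P(b \mid c)$$

Naive Bayes (full joint probability distribution):

$$P(\mathit{Cause}, \mathit{Effect}_1, \ldots, \mathit{Effect}_n) = P(\mathit{Cause}) \prod_{i=1}^{n} P(\mathit{Effect}_i \mid \mathit{Cause})$$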
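As a small numeric check of Bayes' rule, the sketch below inverts the Study CPT of the College example used later in these slides (P(c) = 0.2, P(s|c) = 0.8, P(s|¬c) = 0.2) to get P(c|s). The helper name `posterior` is ours, chosen for illustration; the normalizing constant α is computed as 1/P(a).

```python
def posterior(prior, likelihood_true, likelihood_false):
    """Bayes' rule for a Boolean hypothesis b given one observation a.

    prior            -- P(b)
    likelihood_true  -- P(a | b)
    likelihood_false -- P(a | not b)
    Returns P(b | a) = P(a | b) P(b) / P(a).
    """
    joint_true = likelihood_true * prior            # P(a, b)
    joint_false = likelihood_false * (1.0 - prior)  # P(a, not b)
    evidence = joint_true + joint_false             # P(a), so alpha = 1 / P(a)
    return joint_true / evidence


# P(College | Study) from the College CPTs: P(c) = 0.2, P(s|c) = 0.8, P(s|not c) = 0.2
p_c_given_s = posterior(0.2, 0.8, 0.2)
print(round(p_c_given_s, 4))  # 0.5
```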
Bayes' Rule and Reasoning

Allows use of uncertain causal knowledge


Knowledge: given a cause, what is the likelihood of seeing particular effects (conditional probabilities)?


Reasoning: seeing some effects, how do we infer the likelihood of a cause?





This can be very complicated: it needs the joint probability distribution of $(k+1)$ variables, i.e., $2^{k+1}$ numbers.


Use conditional independence to simplify expressions.


Allows sequential, step-by-step computation:

$$P(H \mid e_1, e_2, \ldots, e_k) = \frac{P(e_1, e_2, \ldots, e_k \mid H)\,P(H)}{P(e_1, e_2, \ldots, e_k)}$$
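When the effects are conditionally independent given the hypothesis, the evidence can be folded in one piece at a time. A minimal sketch, assuming exactly that independence (it holds for Study and Party given College in the College network later in these slides): each step multiplies in P(e_i | H) and renormalizes. `update` is a hypothetical helper name.

```python
def update(belief, likelihoods):
    """One sequential Bayes step for a Boolean hypothesis H.

    belief      -- current P(H = true | evidence so far)
    likelihoods -- (P(e | H = true), P(e | H = false)) for the new effect e
    """
    num_true = belief * likelihoods[0]
    num_false = (1.0 - belief) * likelihoods[1]
    return num_true / (num_true + num_false)  # normalize over H


# H = College; Study and Party are children of College in the College network.
belief = 0.2                          # P(c)
belief = update(belief, (0.8, 0.2))   # observe Study = true
belief = update(belief, (0.6, 0.5))   # observe Party = true
print(round(belief, 4))  # 0.5455
```

Processing both effects at once, P(c)P(s|c)P(pr|c) normalized, gives the same 0.096 / 0.176.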
Bayesian/Belief Network

To avoid problems of enumerating large joint
probabilities


Use causal knowledge and independence to simplify
reasoning, and draw inferences


$$P(X_1, \ldots, X_n) = P(X_n \mid X_{n-1}, \ldots, X_1)\,P(X_{n-1}, \ldots, X_1) = P(X_n \mid X_{n-1}, \ldots, X_1)\,P(X_{n-1} \mid X_{n-2}, \ldots, X_1) \cdots P(X_2 \mid X_1)\,P(X_1)$$



Bayesian Networks

Also called Belief Network or probabilistic
network


Nodes


random variables, one variable per node


Directed links between pairs of nodes: A → B means A has a direct influence on B.


With no directed cycles.


A conditional distribution for each node given its parents:

$$P(X_i \mid \mathit{Parents}(X_i))$$
[Figure: example network with nodes Cavity, Toothache, Catch, Weather]

Must determine the domain-specific topology.

Bayesian Networks

Next step is to determine the conditional
probability distribution for each variable.


Represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of the parent values.

Once the CPTs are determined, the network represents the full joint probability distribution.

The network provides a complete description of a domain.

Belief Networks: Example

If you go to college, this will affect the likelihood that you will study and the likelihood that you will party.
Studying and partying affect your chances of exam success, and partying affects your chances of having fun.

Variables:
College, Study, Party, Exam (success), Fun

Causal Relations:


College will affect studying


College will affect partying


Studying and partying will affect exam success


Partying affects having fun.

[Figure: College network: College → Study, College → Party, Study → Exam, Party → Exam, Party → Fun]

College example: CPTs

CPT

Discrete variables only in this format.

P(C) = 0.2

C      P(S)
True   0.8
False  0.2

C      P(P)
True   0.6
False  0.5

S      P      P(E)
True   True   0.6
True   False  0.9
False  True   0.1
False  False  0.2

P      P(F)
True   0.9
False  0.7

Belief Networks: Compactness

A CPT for a Boolean variable $X_i$ with $k$ Boolean parents has $2^k$ rows, one for each combination of parent values.

Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1 - p$).

Each row must sum to 1.


Conditional Probability

If each variable has no more than $k$ parents, then the complete network requires $O(n \cdot 2^k)$ numbers,

i.e., the numbers grow linearly in $n$, vs. $O(2^n)$ for the full joint distribution.

College net has 1+2+2+4+2 = 11 numbers.
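The count above can be reproduced from the graph structure alone. A minimal sketch, assuming Boolean nodes so that a node with k parents needs 2^k numbers; for comparison, a full joint over n Boolean variables has 2^n entries, of which 2^n - 1 are independent because they sum to 1. The helper names are ours.

```python
# Parents of each Boolean node in the College network (from the slides).
PARENTS = {
    "College": [],
    "Study": ["College"],
    "Party": ["College"],
    "Exam": ["Study", "Party"],
    "Fun": ["Party"],
}


def network_size(parents):
    """Numbers needed for Boolean CPTs: one per row, 2**k rows for k parents."""
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_size(n):
    """Independent numbers in a full joint over n Boolean variables."""
    return 2 ** n - 1


print(network_size(PARENTS))          # 11
print(full_joint_size(len(PARENTS)))  # 31
```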

Belief Networks: Joint Probability Distribution Calculation

Global semantics defines the full joint distribution as the product of local distributions:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i))$$

Probability of going to college and that you will study and be successful on your exams, but not party or have fun:

$$P(C, S, \neg P, E, \neg F) = P(C)\,P(S \mid C)\,P(\neg P \mid C)\,P(E \mid S, \neg P)\,P(\neg F \mid \neg P)$$

0.2*0.8*0.4*0.9*0.3 = 0.01728
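The global semantics translates directly into code: the probability of a complete assignment is the product of one CPT entry per node. A minimal sketch, assuming the College CPTs above are stored as "probability that the node is True given its parent values"; `joint` is a hypothetical helper name.

```python
# College network: parents of each node and P(node = True | parent values).
PARENTS = {"C": (), "S": ("C",), "P": ("C",), "E": ("S", "P"), "F": ("P",)}
CPT = {
    "C": {(): 0.2},
    "S": {(True,): 0.8, (False,): 0.2},
    "P": {(True,): 0.6, (False,): 0.5},
    "E": {(True, True): 0.6, (True, False): 0.9,
          (False, True): 0.1, (False, False): 0.2},
    "F": {(True,): 0.9, (False,): 0.7},
}


def joint(assignment):
    """P(x_1, ..., x_n) = product over nodes of P(x_i | parents(X_i))."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[u] for u in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


# Go to college, study, succeed on the exam, but neither party nor have fun.
p = joint({"C": True, "S": True, "P": False, "E": True, "F": False})
print(round(p, 5))  # 0.01728
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check on the CPTs.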

Can use the networks to make inferences.


Every value in the full joint probability distribution can be calculated.

Network Construction

Must ensure network and distribution are
good representations of the domain.


Want to rely on conditional independence
relationships.


First, rewrite the joint distribution in terms of a conditional probability:

$$P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\,P(x_{n-1}, \ldots, x_1)$$

Repeat for each conjunctive probability (the chain rule):

$$P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\,P(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots P(x_2 \mid x_1)\,P(x_1) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$$

Network Construction


Note that, comparing the chain rule

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$$

with the global semantics, the network's joint distribution is equivalent to requiring:

$$P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid \mathit{Parents}(X_i)), \quad \text{provided } \mathit{Parents}(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$$

where the partial order is defined by the graph structure.

The above equation says that the network correctly represents the domain only if each node is conditionally independent of its predecessors in the node ordering, given the node's parents.

That is, the parents of $X_i$ need to contain all nodes in $X_1, \ldots, X_{i-1}$ that have a direct influence on $X_i$.

College example:

P(F | C, S, P, E) = P(F | P)
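This independence claim can be checked numerically by enumerating the joint distribution defined by the College CPTs. A minimal sketch, assuming Boolean variables and exact enumeration over all 32 worlds (feasible only for tiny networks); `prob` and `p_fun` are hypothetical helper names. It compares P(F | C, S, P, E) with P(F | P) for every assignment.

```python
from itertools import product

# College network: parents and P(node = True | parent values).
PARENTS = {"C": (), "S": ("C",), "P": ("C",), "E": ("S", "P"), "F": ("P",)}
CPT = {
    "C": {(): 0.2},
    "S": {(True,): 0.8, (False,): 0.2},
    "P": {(True,): 0.6, (False,): 0.5},
    "E": {(True, True): 0.6, (True, False): 0.9,
          (False, True): 0.1, (False, False): 0.2},
    "F": {(True,): 0.9, (False,): 0.7},
}
VARS = list(PARENTS)


def joint(world):
    """Global semantics: product of P(x_i | parents(X_i))."""
    p = 1.0
    for v in VARS:
        p_true = CPT[v][tuple(world[u] for u in PARENTS[v])]
        p *= p_true if world[v] else 1.0 - p_true
    return p


def prob(event):
    """Marginal probability of a partial assignment (dict var -> bool)."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[k] == v for k, v in event.items()):
            total += joint(world)
    return total


def p_fun(evidence):
    """P(F = true | evidence)."""
    return prob({**evidence, "F": True}) / prob(evidence)


gaps = []
for c, s, p, e in product([True, False], repeat=4):
    full = p_fun({"C": c, "S": s, "P": p, "E": e})  # P(F | C, S, P, E)
    local = p_fun({"P": p})                         # P(F | P)
    gaps.append(abs(full - local))
print(max(gaps) < 1e-9)  # True
```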


Compact Networks

Bayesian networks are sparse and therefore much more compact than the full joint distribution.


Sparse: each subcomponent interacts directly with a bounded number of other components, independent of the total number of components.


The number of parameters usually grows only linearly with the number of nodes.


College net has 1+2+2+4+2=11 numbers


Fully connected domain = full joint distribution.


Must determine the correct network topology.


Add "root causes" first, then the variables they influence.

Network Construction

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables $X_1, \ldots, X_n$.

2. For $i = 1$ to $n$:
   add $X_i$ to the network;
   select parents from $X_1, \ldots, X_{i-1}$ such that
   $P(X_i \mid \mathit{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$

The choice of parents guarantees the global semantics:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \;\text{(chain rule)} = \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i)) \;\text{(by construction)}$$
Constructing Bayes networks: Example

Choose an ordering F, E, P, S, C

[Figure: network built from the ordering F, E, P, S, C over the nodes Fun, Exam, Party, Study, College]

P(E|F) = P(E)?

P(P|F) = P(P)?

P(S|F,E) = P(S|E)?

P(S|F,E) = P(S)?

P(C|F,E,P,S) = P(C|P,S)?

P(C|F,E,P,S) = P(C)?

Note that this network has additional dependencies.
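The construction algorithm can be run mechanically if the "true" distribution is known. A minimal sketch under that assumption: the College network stands in for the domain, and the test "P(X_i | Parents) = P(X_i | X_1, ..., X_{i-1})" is checked numerically for every assignment, picking the smallest parent set that passes. The function names `screens_off` and `build_network` are hypothetical, and exact enumeration makes this exponential, so it only suits tiny examples.

```python
from itertools import combinations, product

# College network, used here as the true domain distribution.
PARENTS = {"C": (), "S": ("C",), "P": ("C",), "E": ("S", "P"), "F": ("P",)}
CPT = {  # P(node = True | parent values)
    "C": {(): 0.2},
    "S": {(True,): 0.8, (False,): 0.2},
    "P": {(True,): 0.6, (False,): 0.5},
    "E": {(True, True): 0.6, (True, False): 0.9,
          (False, True): 0.1, (False, False): 0.2},
    "F": {(True,): 0.9, (False,): 0.7},
}
VARS = list(PARENTS)


def joint(world):
    p = 1.0
    for v in VARS:
        p_true = CPT[v][tuple(world[u] for u in PARENTS[v])]
        p *= p_true if world[v] else 1.0 - p_true
    return p


WORLDS = [dict(zip(VARS, vals)) for vals in product([True, False], repeat=len(VARS))]


def prob(event):
    """Marginal probability of a partial assignment."""
    return sum(joint(w) for w in WORLDS if all(w[k] == val for k, val in event.items()))


def cond(x, given):
    """P(x = True | given)."""
    return prob({**given, x: True}) / prob(given)


def screens_off(x, subset, preds, tol=1e-9):
    """True if P(x | subset) == P(x | preds) for every assignment of preds."""
    for vals in product([True, False], repeat=len(preds)):
        ctx = dict(zip(preds, vals))
        if abs(cond(x, ctx) - cond(x, {k: ctx[k] for k in subset})) > tol:
            return False
    return True


def build_network(order):
    """Add variables in order; give each the smallest screening-off parent set."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for k in range(len(preds) + 1):
            found = next((s for s in combinations(preds, k) if screens_off(x, s, preds)), None)
            if found is not None:
                parents[x] = list(found)
                break
    return parents


def size(parents):
    return sum(2 ** len(ps) for ps in parents.values())


causal = build_network(["C", "S", "P", "E", "F"])
reordered = build_network(["F", "E", "P", "S", "C"])
print(causal, size(causal))
print(reordered, size(reordered))
```

With the causal ordering the sketch recovers the original arcs (11 numbers); with F, E, P, S, C it has to add extra arcs, e.g. E gets F as a parent because P(E|F) differs from P(E).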

Compact Networks

[Figure: the network built from the ordering F, E, P, S, C shown beside the original College network; nodes Party, Study, College, Fun, Exam]

Network Construction: Alternative

Start with topological semantics that specify the conditional independence relationships.


Defined by either:


A node is conditionally independent of its non-descendants, given its parents.


A node is conditionally independent of all other nodes given its parents, children, and children's parents: its Markov blanket.

Then construct the CPTs.

Network Construction: Alternative

Each node is conditionally independent of its non-descendants given its parents.

Local semantics ⇔ global semantics

Exam is independent of College, given the values of Study and Party.

Network Construction: Alternative

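Each node is conditionally independent of all other nodes given its parents, children, and children's parents: its Markov blanket.

[Figure: node X with parents U_1, …, U_m, children Y_1, …, Y_n, and the children's other parents Z_1j, …, Z_nj]

College is independent of Fun, given Party.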
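The Markov blanket can be read off the graph structure alone. A minimal sketch over the College network's parent lists; `markov_blanket` is a hypothetical helper name. College's blanket comes out as {Study, Party}, so Fun and Exam lie outside it.

```python
# Parents of each node in the College network (from the slides).
PARENTS = {
    "College": [],
    "Study": ["College"],
    "Party": ["College"],
    "Exam": ["Study", "Party"],
    "Fun": ["Party"],
}


def children(node, parents):
    return [n for n, ps in parents.items() if node in ps]


def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    blanket = set(parents[node])
    for child in children(node, parents):
        blanket.add(child)
        blanket.update(p for p in parents[child] if p != node)
    return blanket


print(sorted(markov_blanket("College", PARENTS)))  # ['Party', 'Study']
print(sorted(markov_blanket("Study", PARENTS)))    # ['College', 'Exam', 'Party']
```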

Canonical Distribution

Completing a node's CPT requires up to $O(2^k)$ numbers ($k$ = number of parents).


If the parent-child relationship is arbitrary, this can be difficult to do.


Standard patterns can be named, along with a few parameters, to specify the CPT.


Canonical distribution


Deterministic Nodes

Simplest form is to use deterministic nodes.


A value is specified exactly by its parents' values.


No uncertainty.


But what about relationships that are uncertain?


If someone has a fever, do they have a cold, the flu, or a stomach bug?

Can you have a cold or stomach bug without a fever?


Noisy-OR Relationships

A noisy-OR relationship permits uncertainty about whether each parent causes the child to be true.


The causal relationship may be inhibited.


Assumes:


All possible causes are known.


Can have a miscellaneous category if necessary (leak
node)


The inhibition of each parent is independent of the inhibition of any other parent.

Can you have a cold or stomach bug without a fever?


Fever is true if and only if Cold, Flu, or Malaria is true.

Example

Given:


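$$P(\neg \mathit{fever} \mid \mathit{cold}, \neg \mathit{flu}, \neg \mathit{malaria}) = 0.6$$

$$P(\neg \mathit{fever} \mid \neg \mathit{cold}, \mathit{flu}, \neg \mathit{malaria}) = 0.2$$

$$P(\neg \mathit{fever} \mid \neg \mathit{cold}, \neg \mathit{flu}, \mathit{malaria}) = 0.1$$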
Example

Cold   Flu   Malaria   P(¬Fever)
F      F     F         1.0
F      F     T         0.1
F      T     F         0.2
F      T     T         0.2 * 0.1 = 0.02
T      F     F         0.6
T      F     T         0.6 * 0.1 = 0.06
T      T     F         0.6 * 0.2 = 0.12
T      T     T         0.6 * 0.2 * 0.1 = 0.012

Requires $O(k)$ parameters rather than $O(2^k)$.
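The table above follows mechanically from the three inhibition probabilities. A minimal sketch, assuming the noisy-OR assumptions listed earlier (all causes known, no leak node, independent inhibition): P(¬fever | parents) is the product of the inhibition probabilities of the parents that are true. `noisy_or_cpt` is a hypothetical helper name.

```python
from itertools import product

# Inhibition probabilities q_j = P(not fever | only parent j is true), from the example.
INHIBIT = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or_cpt(inhibit):
    """Build the full CPT P(not child | parents) from one parameter per parent.

    The child is false only if every true parent is independently inhibited,
    so P(not child) is the product of q_j over the true parents
    (1.0 when no parent is true, since there is no leak node here).
    """
    names = list(inhibit)
    table = {}
    for values in product([False, True], repeat=len(names)):
        p_false = 1.0
        for name, on in zip(names, values):
            if on:
                p_false *= inhibit[name]
        table[values] = p_false
    return table


cpt = noisy_or_cpt(INHIBIT)
for values, p_false in cpt.items():
    # (Cold, Flu, Malaria), P(not Fever), P(Fever)
    print(values, round(p_false, 3), round(1.0 - p_false, 3))
```

Three parameters generate all eight rows, which is the O(k) vs. O(2^k) saving.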

Networks with Continuous
Variables

How are continuous variables represented?


Discretization using intervals


Can result in loss of accuracy and large CPTs


Define probability density functions specified
by a finite number of parameters.


e.g., a Gaussian distribution

Hybrid Bayesian Networks

Contains both discrete and continuous
variables.


Specification of such a network requires:


Conditional distribution for a continuous
variable with discrete or continuous parents.


Conditional distribution for a discrete variable
with continuous parents.

Example

[Figure: Subsidy (discrete parent) and Harvest (continuous parent) → Cost; Cost → Buys]

The discrete parent is explicitly enumerated.

The continuous parent is represented as a distribution.

Cost $c$ depends on the distribution function for harvest $h$.

A linear Gaussian distribution can be used.

We have to define the distribution for both values of subsidy.

Continuous child with a discrete parent and a continuous parent
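A linear Gaussian CPT for Cost can be sketched as one Gaussian per value of the discrete parent, with a mean that is linear in the continuous parent: c ~ N(a·h + b, σ²). The parameter values in `PARAMS` are hypothetical, chosen only for illustration, and `cost_density` is a hypothetical helper name.

```python
import math

# Hypothetical (a, b, sigma) for each value of the discrete parent Subsidy;
# these numbers are illustrative only, not taken from the slides.
PARAMS = {
    True: (-0.5, 10.0, 1.0),
    False: (-0.5, 12.0, 1.0),
}


def cost_density(c, h, subsidy):
    """Linear Gaussian: density of Cost = c given Harvest = h and Subsidy.

    c ~ N(a * h + b, sigma^2), with (a, b, sigma) chosen by the discrete parent.
    """
    a, b, sigma = PARAMS[subsidy]
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# With a < 0, a larger harvest shifts the whole cost distribution down.
print(cost_density(8.0, 4.0, True), cost_density(8.0, 4.0, False))
```

Two parameter triples cover both values of subsidy, as the slide requires.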

Example

[Figure: the same network, with Buys as a discrete child of the continuous parent Cost]

Discrete child with a continuous parent

Set a threshold for cost.


Can use an integral of the standard normal distribution.


The underlying decision process has a hard threshold, but the threshold's location moves based upon random Gaussian noise.

Probit Distribution

Example

Probit distribution


Usually a better fit for real problems

Logit distribution


Uses sigmoid function to determine
threshold.


Can be mathematically easier to work with.
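The two threshold models can be compared side by side. A minimal sketch, assuming a threshold location μ and noise scale σ for Cost; the probit uses the standard normal CDF Φ((μ - c)/σ), written with `math.erf`, and the logit uses a sigmoid. The 2/σ scaling inside the sigmoid is one common choice that makes its slope roughly match the probit; treat it, and the numbers μ = 6, σ = 1, as assumptions.

```python
import math


def probit_buys(c, mu, sigma):
    """P(buys | Cost = c) = Phi((mu - c) / sigma), Phi = standard normal CDF."""
    return 0.5 * (1.0 + math.erf((mu - c) / (sigma * math.sqrt(2.0))))


def logit_buys(c, mu, sigma):
    """Logit alternative: sigmoid 1 / (1 + exp(-2 (mu - c) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - c) / sigma))


# Hypothetical threshold mu = 6.0 and noise scale sigma = 1.0 (not from the slides).
for c in (4.0, 6.0, 8.0):
    print(c, round(probit_buys(c, 6.0, 1.0), 3), round(logit_buys(c, 6.0, 1.0), 3))
```

Both curves pass through 0.5 at the threshold; the logit has heavier tails, and its closed form is what makes it easier to work with.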

Bayes Networks and Exact Inference

Notation


X: query variable


E: set of evidence variables $E_1, \ldots, E_m$


e: a particular observed event


Y: set of nonevidence variables $Y_1, \ldots, Y_l$


Also called hidden variables.


The complete set of variables: $\mathbf{X} = \{X\} \cup \mathbf{E} \cup \mathbf{Y}$


A query: $P(X \mid \mathbf{e})$

Example Query

If you succeeded on an exam and had
fun, what is the probability of partying?


P(Party | Exam = true, Fun = true)

Inference by Enumeration

From Chap. 13 we know:

$$P(X \mid \mathbf{e}) = \alpha\,P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y})$$

From this chapter we have:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{Parents}(X_i))$$

so each term $P(x, \mathbf{e}, \mathbf{y})$ in the joint distribution can be represented as a product of conditional probabilities.
Inference by Enumeration

A query can be answered using a Bayes net by computing sums of products of the conditional probabilities from the network.

Example Query

If you succeeded on an exam and had
fun, what is the probability of partying?


P(Party | Exam = true, Fun = true)



What are the hidden variables?

Example Query

Let:


C = College


PR = Party


S = Study


E = Exam


F = Fun

Then we have from eq. 13.6 (p.476):






$$P(pr \mid e, f) = \alpha\,P(pr, e, f) = \alpha \sum_{C} \sum_{S} P(pr, e, f, S, C)$$
Example Query

Using

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{Parents}(X_i))$$

we can put $P(pr, e, f, S, C)$ in terms of the CPT entries:

$$P(pr \mid e, f) = \alpha \sum_{C} \sum_{S} P(C)\,P(pr \mid C)\,P(S \mid C)\,P(e \mid S, pr)\,P(f \mid pr)$$
The worst-case complexity of this equation is $O(n 2^n)$ for $n$ variables:

$$P(pr \mid e, f) = \alpha\,P(pr, e, f) = \alpha \sum_{C} \sum_{S} P(pr, e, f, S, C)$$


Example Query

Improving the calculation

$P(f \mid pr)$ is a constant, so it can be moved out of the summation over $C$ and $S$:

$$P(pr \mid e, f) = \alpha\,P(f \mid pr) \sum_{C} \sum_{S} P(C)\,P(pr \mid C)\,P(S \mid C)\,P(e \mid S, pr)$$

Then move the elements that involve only $C$ and not $S$ outside the summation over $S$:

$$P(pr \mid e, f) = \alpha\,P(f \mid pr) \sum_{C} P(C)\,P(pr \mid C) \sum_{S} P(S \mid C)\,P(e \mid S, pr)$$
College example:

P(C) = 0.2

C      P(S)
True   0.8
False  0.2

C      P(PR)
True   0.6
False  0.5

S      PR     P(E)
True   True   0.6
True   False  0.9
False  True   0.1
False  False  0.2

PR     P(F)
True   0.9
False  0.7

Example Query

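Evaluating the nested sum for $pr$ (Party = true) with the CPT entries:

$P(f \mid pr) = .9$

For $c$: $P(c) = .2$, $P(pr \mid c) = .6$, and
$P(s \mid c)\,P(e \mid s, pr) + P(\neg s \mid c)\,P(e \mid \neg s, pr) = .8 \times .6 + .2 \times .1 = .48 + .02 = .5$

For $\neg c$: $P(\neg c) = .8$, $P(pr \mid \neg c) = .5$, and
$P(s \mid \neg c)\,P(e \mid s, pr) + P(\neg s \mid \neg c)\,P(e \mid \neg s, pr) = .2 \times .6 + .8 \times .1 = .12 + .08 = .2$

Summing over $C$: $.2 \times .6 \times .5 + .8 \times .5 \times .2 = .06 + .08 = .14$

$P(f \mid pr) \times .14 = .9 \times .14 = .126$

$$P(pr \mid e, f) = \alpha\,P(f \mid pr) \sum_{C} P(C)\,P(pr \mid C) \sum_{S} P(S \mid C)\,P(e \mid S, pr) = \alpha \times .126$$

Similarly for $P(\neg pr \mid e, f)$:

$$\mathbf{P}(PR \mid e, f) = \alpha \left\langle P(f \mid pr) \sum_{C} P(C)\,P(pr \mid C) \sum_{S} P(S \mid C)\,P(e \mid S, pr),\; P(f \mid \neg pr) \sum_{C} P(C)\,P(\neg pr \mid C) \sum_{S} P(S \mid C)\,P(e \mid S, \neg pr) \right\rangle$$

Still $O(2^n)$.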
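The nested sums above can be checked mechanically. A minimal enumeration sketch, assuming Boolean variables and the College CPTs with PR = Party; `local` and `enumerate_query` are hypothetical helper names. It sums the product of CPT entries over the hidden variables C and S, reproducing the .126 above for pr, and then normalizes with α.

```python
from itertools import product

# College network with PR = Party; CPT entries are P(node = True | parents).
PARENTS = {"C": (), "S": ("C",), "PR": ("C",), "E": ("S", "PR"), "F": ("PR",)}
CPT = {
    "C": {(): 0.2},
    "S": {(True,): 0.8, (False,): 0.2},
    "PR": {(True,): 0.6, (False,): 0.5},
    "E": {(True, True): 0.6, (True, False): 0.9,
          (False, True): 0.1, (False, False): 0.2},
    "F": {(True,): 0.9, (False,): 0.7},
}


def local(var, world):
    """P(var = world[var] | parents(var)) read from the CPT."""
    p_true = CPT[var][tuple(world[u] for u in PARENTS[var])]
    return p_true if world[var] else 1.0 - p_true


def enumerate_query(query, evidence):
    """Return (posterior, unnormalized) for a Boolean query variable."""
    hidden = [v for v in PARENTS if v != query and v not in evidence]
    unnormalized = {}
    for q in (True, False):
        total = 0.0
        for values in product([True, False], repeat=len(hidden)):
            world = {**evidence, query: q, **dict(zip(hidden, values))}
            term = 1.0
            for var in PARENTS:   # product of CPT entries
                term *= local(var, world)
            total += term         # sum over the hidden variables
        unnormalized[q] = total
    alpha = 1.0 / sum(unnormalized.values())
    posterior = {q: alpha * v for q, v in unnormalized.items()}
    return posterior, unnormalized


posterior, raw = enumerate_query("PR", {"E": True, "F": True})
print(round(raw[True], 3))        # 0.126
print(round(posterior[True], 4))  # 0.4777
```

The unnormalized value for ¬pr comes out as 0.13776, so α = 1 / 0.26376.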
Variable Elimination

A problem with the enumeration method is that
particular products can be computed multiple
times, thus reducing efficiency.


Reduce the number of duplicate calculations by doing
the calculation once and saving it for later.


Variable elimination evaluates expressions from
right to left, stores the intermediate results and
sums over each variable for the portions of the
expression dependent upon the variable.

Variable Elimination

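First, factor the equation:

$$\mathbf{P}(PR \mid e, f) = \alpha\,\underbrace{P(f \mid PR)}_{F} \sum_{C} \underbrace{P(C)}_{C}\,\underbrace{P(PR \mid C)}_{PR} \sum_{S} \underbrace{P(S \mid C)}_{S}\,\underbrace{P(e \mid S, PR)}_{E}$$


Second, store the factor for E: a 2x2 matrix $f_E(S, PR)$.


Third, store the factor for S: a 2x2 matrix

$$f_S(S, C) = \begin{pmatrix} P(s \mid c) & P(s \mid \neg c) \\ P(\neg s \mid c) & P(\neg s \mid \neg c) \end{pmatrix}$$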
Variable Elimination

Fourth, sum out S from the product of the first two factors:

$$f_{\bar{S}E}(C, PR) = \sum_{s} f_E(s, PR) \times f_S(s, C) = f_E(s, PR) \times f_S(s, C) + f_E(\neg s, PR) \times f_S(\neg s, C)$$

This is called a pointwise product.

It creates a new factor whose variables are the union of the variables of the two factors in the product:

$$f(X_1, \ldots, X_j, Y_1, \ldots, Y_k, Z_1, \ldots, Z_l) = f_1(X_1, \ldots, X_j, Y_1, \ldots, Y_k) \times f_2(Y_1, \ldots, Y_k, Z_1, \ldots, Z_l)$$

Any factor that does not depend on the variable to be summed out can be moved outside the summation.
Variable Elimination


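Fifth, store the factor for PR: a 2x2 matrix

$$f_{PR}(C, PR) = \begin{pmatrix} P(pr \mid c) & P(pr \mid \neg c) \\ P(\neg pr \mid c) & P(\neg pr \mid \neg c) \end{pmatrix}$$


Sixth, store the factor for C:

$$f_C(C) = \begin{pmatrix} P(c) \\ P(\neg c) \end{pmatrix}$$

$$\mathbf{P}(PR \mid e, f) = \alpha\,P(f \mid PR) \sum_{C} P(C)\,P(PR \mid C)\,f_{\bar{S}E}(C, PR)$$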
Variable Elimination


Seventh, sum out C from the product of the factors in

$$\mathbf{P}(PR \mid e, f) = \alpha\,P(f \mid PR) \sum_{C} P(C)\,P(PR \mid C)\,f_{\bar{S}E}(C, PR),$$

where

$$f_{\bar{C}\bar{S}E}(PR) = \sum_{c} f_C(c) \times f_{PR}(c, PR) \times f_{\bar{S}E}(c, PR) = f_C(c) \times f_{PR}(c, PR) \times f_{\bar{S}E}(c, PR) + f_C(\neg c) \times f_{PR}(\neg c, PR) \times f_{\bar{S}E}(\neg c, PR)$$
Variable Elimination


Next, store the factor for F:

$$f_F(PR) = \begin{pmatrix} P(f \mid pr) \\ P(f \mid \neg pr) \end{pmatrix}$$


Finally, calculate the final result:

$$\mathbf{P}(PR \mid e, f) = \alpha\,P(f \mid PR)\,f_{\bar{C}\bar{S}E}(PR) = \alpha\,f_F(PR) \times f_{\bar{C}\bar{S}E}(PR)$$
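The steps above can be sketched with explicit factors. A minimal sketch, assuming Boolean variables, the College CPTs, and evidence e = f = true baked into f_E and f_F; a factor is a (variables, table) pair, and `make_factor`, `pointwise_product`, and `sum_out` are hypothetical helper names. The code's f_SE and f_CSE correspond to the slides' f with S, and with C and S, summed out.

```python
from itertools import product

# College CPTs (probability that the node is True); evidence e = f = True.
P_C = 0.2
P_S = {True: 0.8, False: 0.2}                    # P(s | C)
P_PR = {True: 0.6, False: 0.5}                   # P(pr | C)
P_E = {(True, True): 0.6, (True, False): 0.9,
       (False, True): 0.1, (False, False): 0.2}  # P(e | S, PR)
P_F = {True: 0.9, False: 0.7}                    # P(f | PR)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def make_factor(variables, fn):
    """A factor is (variables, table) with one entry per Boolean assignment."""
    variables = tuple(variables)
    table = {}
    for vals in product([True, False], repeat=len(variables)):
        table[vals] = fn(dict(zip(variables, vals)))
    return variables, table


def pointwise_product(f1, f2):
    """New factor over the union of the two factors' variables."""
    (v1, t1), (v2, t2) = f1, f2
    union = v1 + tuple(v for v in v2 if v not in v1)
    return make_factor(union, lambda a: t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)])


def sum_out(var, f):
    """Sum a variable out of a factor."""
    variables, table = f
    keep = tuple(v for v in variables if v != var)

    def value(a):
        total = 0.0
        for x in (True, False):
            full = {**a, var: x}
            total += table[tuple(full[v] for v in variables)]
        return total

    return make_factor(keep, value)


f_E = make_factor(("S", "PR"), lambda a: P_E[(a["S"], a["PR"])])
f_S = make_factor(("S", "C"), lambda a: bern(P_S[a["C"]], a["S"]))
f_PR = make_factor(("C", "PR"), lambda a: bern(P_PR[a["C"]], a["PR"]))
f_C = make_factor(("C",), lambda a: bern(P_C, a["C"]))
f_F = make_factor(("PR",), lambda a: P_F[a["PR"]])

f_SE = sum_out("S", pointwise_product(f_E, f_S))                               # over (PR, C)
f_CSE = sum_out("C", pointwise_product(pointwise_product(f_C, f_PR), f_SE))  # over (PR,)
_, final = pointwise_product(f_F, f_CSE)
alpha = 1.0 / sum(final.values())
posterior = {vals[0]: alpha * p for vals, p in final.items()}
print(round(f_CSE[1][(True,)], 2), round(posterior[True], 4))  # 0.14 0.4777
```

The intermediate entries match the worked example (.5 and .2 for f_SE at pr, .14 after summing out C), and each product is computed once and reused.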
Elimination Simplification

Any leaf node that is not a query variable
or an evidence variable can be removed.

Every variable that is not an ancestor of a
query variable or an evidence variable is
irrelevant to the query and can be
eliminated.


Elimination Simplification

Book Example:


What is the probability that John calls if there is a burglary?

$$\mathbf{P}(J \mid b) = \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,\mathbf{P}(J \mid a) \sum_{m} P(m \mid a)$$

Does this matter?

[Figure: burglary network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls]
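The simplification rule can be applied by collecting ancestors. A minimal sketch over the burglary network's parent lists; `ancestors` and `relevant_variables` are hypothetical helper names. For the query JohnCalls with evidence Burglary, MaryCalls is neither the query, evidence, nor an ancestor of either, so its term Σ_m P(m|a) (which equals 1) can be dropped.

```python
# Burglary network structure (parents of each node).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def ancestors(nodes, parents):
    """The given nodes plus every ancestor of them."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents[n])
    return result


def relevant_variables(query, evidence, parents):
    """Keep only the query, evidence, and their ancestors; the rest can be pruned."""
    return ancestors([query, *evidence], parents)


keep = relevant_variables("JohnCalls", ["Burglary"], PARENTS)
print(sorted(set(PARENTS) - keep))  # ['MaryCalls']
```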

Complexity of Exact Inference

Variable elimination is more efficient than
enumeration.


Time and space requirements are dominated
by the size of the largest factor constructed
which is determined by the order of variable
elimination and the network structure.

Polytrees

Polytrees are singly connected networks


At most one undirected path between any two nodes.


Time and space requirements are linear in the
size of the network.


Size is the number of CPT entries.

Polytrees

[Figure: the burglary network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) and the College network (College, Party, Study, Fun, Exam)]

Are these networks polytrees?
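One way to answer the question mechanically: a network is a polytree when its undirected skeleton has no cycle, i.e. there is at most one undirected path between any two nodes. A minimal union-find sketch over the two networks' parent lists; `is_polytree` is a hypothetical helper name.

```python
# Parent lists for the two networks in the figure.
BURGLARY = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
COLLEGE = {
    "College": [], "Study": ["College"], "Party": ["College"],
    "Exam": ["Study", "Party"], "Fun": ["Party"],
}


def is_polytree(parents):
    """A DAG is a polytree iff its undirected skeleton has no cycle.

    Union-find over the edges: an edge whose endpoints are already connected
    would create a second undirected path between them.
    """
    root = {n: n for n in parents}

    def find(n):
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    for child, ps in parents.items():
        for p in ps:
            a, b = find(p), find(child)
            if a == b:
                return False
            root[a] = b
    return True


print(is_polytree(BURGLARY), is_polytree(COLLEGE))  # True False
```

The College network fails because College-Study-Exam-Party-College forms an undirected cycle.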

Applying variable elimination to multiply connected networks has worst-case exponential time and space complexity.