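Bayes' Rule
$P(a \mid b) = \dfrac{P(b \mid a)\, P(a)}{P(b)} = \alpha\, P(b \mid a)\, P(a)$, where $\alpha = \dfrac{1}{P(b)}$
$P(a \mid b, c) = \alpha\, P(b, c \mid a)\, P(a)$
$P(b, c \mid a) = P(b \mid a)\, P(c \mid a)$ (when $b$ and $c$ are conditionally independent given $a$)
Naive Bayes' Full Joint Probability Distribution
$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_{i=1}^{n} P(Effect_i \mid Cause)$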
Bayes' Rule and Reasoning
Allows use of uncertain causal knowledge
Knowledge: given a cause, what is the likelihood of seeing particular effects (conditional probabilities)
Reasoning: seeing some effects, how do we infer the likelihood of a cause?
$P(H \mid e_1, e_2, \ldots, e_k) = \dfrac{P(e_1, e_2, \ldots, e_k \mid H)\, P(H)}{P(e_1, e_2, \ldots, e_k)}$
This can be very complicated: need the joint probability distribution of $(k+1)$ variables, i.e., $2^{k+1}$ numbers.
Use conditional independence to simplify expressions.
Allows sequential step by step computation
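A minimal sketch of the sequential computation, assuming the effects are conditionally independent given the hypothesis. The prior and likelihood numbers are made up purely for illustration, and the helper name `update` is hypothetical.

```python
def update(prior, likelihoods):
    """One Bayes step: prior P(H) and likelihoods P(e | H) give the posterior P(H | e)."""
    unnormalized = {h: likelihoods[h] * p for h, p in prior.items()}
    alpha = 1.0 / sum(unnormalized.values())   # normalization constant
    return {h: alpha * v for h, v in unnormalized.items()}

# Illustrative (made-up) numbers: a Boolean hypothesis H and two observed effects.
prior = {True: 0.1, False: 0.9}
evidence = [
    {True: 0.8, False: 0.3},   # P(e1 | H)
    {True: 0.7, False: 0.2},   # P(e2 | H)
]

# Sequential computation: fold in one effect at a time.
belief = prior
for likelihoods in evidence:
    belief = update(belief, likelihoods)

# Same answer in one shot: alpha * P(H) * P(e1 | H) * P(e2 | H)
batch = {h: prior[h] * evidence[0][h] * evidence[1][h] for h in prior}
total = sum(batch.values())
batch = {h: v / total for h, v in batch.items()}

print(belief[True], batch[True])
```

Both routes give the same posterior because, under conditional independence, each effect contributes one multiplicative factor.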
Bayesian/Belief Network
To avoid problems of enumerating large joint
probabilities
Use causal knowledge and independence to simplify
reasoning, and draw inferences
$P(X_1, X_2, \ldots, X_n) = P(X_1 \mid X_2, \ldots, X_n)\, P(X_2, \ldots, X_n)$
$= P(X_1 \mid X_2, \ldots, X_n)\, P(X_2 \mid X_3, \ldots, X_n) \cdots P(X_{n-1} \mid X_n)\, P(X_n)$
Bayesian Networks
Also called Belief Network or probabilistic
network
Nodes: random variables, one variable per node
Directed links between pairs of nodes.
A → B: A has a direct influence on B
With no directed cycles
A conditional distribution for each node given its parents:
$P(X_i \mid Parents(X_i))$
[Figure: example network with nodes Cavity, Toothache, Catch, Weather]
Must determine the domain-specific topology.
Bayesian Networks
Next step is to determine the conditional
probability distribution for each variable.
Represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of the parent values.
Once CPT is determined, the full joint probability
distribution is represented by the network.
The network provides a complete description of a domain.
Belief Networks: Example
If you go to college, this will affect the likelihood that
you will study and the likelihood that you will party.
Studying and partying affect your chances of exam
success, and partying affects your chances of having fun.
Variables:
College, Study, Party, Exam (success), Fun
Causal Relations:
College will affect studying
College will affect partying
Studying and partying will affect exam success
Partying affects having fun.
[Figure: network with edges College → Study, College → Party, Study → Exam, Party → Exam, Party → Fun]
College example: CPTs
CPT: discrete variables only in this format.

P(C) = 0.2

C      P(S | C)
True   0.8
False  0.2

C      P(P | C)
True   0.6
False  0.5

S      P      P(E | S, P)
True   True   0.6
True   False  0.9
False  True   0.1
False  False  0.2

P      P(F | P)
True   0.9
False  0.7
Belief Networks: Compactness
A CPT for a Boolean variable $X_i$ with $k$ Boolean parents has $2^k$ rows, one for each combination of parent values.
Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is $1 - p$).
Each row is a conditional probability distribution and must sum to 1.
If each variable has no more than $k$ parents, then the complete network requires $O(n \cdot 2^k)$ numbers,
i.e., the numbers grow linearly in $n$, vs. $O(2^n)$ for the full joint distribution.
College net has 1+2+2+4+2=11 numbers
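The count can be reproduced with a short sketch. The parent lists follow the causal relations of the College example; the variable names in the code are illustrative only.

```python
# Parents of each node in the College network.
parents = {
    "College": [],
    "Study": ["College"],
    "Party": ["College"],
    "Exam": ["Study", "Party"],
    "Fun": ["Party"],
}

# A Boolean node with k Boolean parents needs 2**k numbers (one per CPT row).
network_numbers = sum(2 ** len(p) for p in parents.values())

# The full joint distribution over n Boolean variables has 2**n entries.
n = len(parents)
joint_entries = 2 ** n

print(network_numbers, joint_entries)   # 11 32
```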
Belief Networks:
Joint Probability Distribution Calculation
Global semantics defines the full joint
distribution as the product of local
distributions:
$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
Probability of going to college and that you will study and be successful on your exams, but not party or have fun:
$P(C, S, \lnot P, E, \lnot F) = P(C)\, P(S \mid C)\, P(\lnot P \mid C)\, P(E \mid S, \lnot P)\, P(\lnot F \mid \lnot P)$
$= 0.2 * 0.8 * 0.4 * 0.9 * 0.3 = 0.01728$
Can use the networks to make inferences.
Every value in a full joint probability distribution
can be calculated.
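As a sketch of how any entry of the full joint distribution can be computed from the CPTs, the snippet below encodes the College example tables and multiplies the local distributions. The function names `prob` and `joint` are hypothetical helpers, not part of any library.

```python
from itertools import product

# CPT entries give P(variable = true | parents); values are from the College example.
P_C = 0.2
P_S = {True: 0.8, False: 0.2}                      # P(S = true | C)
P_P = {True: 0.6, False: 0.5}                      # P(P = true | C)
P_E = {(True, True): 0.6, (True, False): 0.9,
       (False, True): 0.1, (False, False): 0.2}    # P(E = true | S, P)
P_F = {True: 0.9, False: 0.7}                      # P(F = true | P)

def prob(p_true, value):
    """Probability of `value` for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(c, s, p, e, f):
    """P(C=c, S=s, P=p, E=e, F=f) as the product of the local distributions."""
    return (prob(P_C, c) * prob(P_S[c], s) * prob(P_P[c], p)
            * prob(P_E[(s, p)], e) * prob(P_F[p], f))

# College, study, exam success, but no party and no fun.
example = joint(c=True, s=True, p=False, e=True, f=False)

# All 32 entries of the joint distribution add up to 1.
total = sum(joint(*values) for values in product([True, False], repeat=5))

print(round(example, 5))   # 0.01728
```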
Network Construction
Must ensure network and distribution are
good representations of the domain.
Want to rely on conditional independence
relationships.
First, rewrite the joint distribution in terms of a conditional probability:
$P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1)$
Repeat for each conjunctive probability (chain rule):
$P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1)$
$= \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$
Network Construction
Note that representing the joint distribution as
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$
is equivalent to:
$P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid Parents(X_i))$
provided $Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$, where the partial order is defined by the graph structure.
The above equation says that the network correctly represents the domain
only if each node is conditionally independent of its predecessors in the
node ordering, given the node's parents.
Means: the parents of $X_i$ need to contain all nodes in $X_1, \ldots, X_{i-1}$ that have a direct influence on $X_i$.
College example:
$P(F \mid C, S, P, E) = P(F \mid P)$
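A small numerical check of the College example statement, as a sketch: it builds the joint distribution from the CPTs and confirms that conditioning Fun on all of its predecessors gives the same number as conditioning on Party alone. Because the joint here is itself built from the CPTs, this is a consistency check rather than an independent test; the helper names are hypothetical.

```python
from itertools import product

# P(variable = true | parents), copied from the College example CPTs.
P_C = 0.2
P_S = {True: 0.8, False: 0.2}
P_P = {True: 0.6, False: 0.5}
P_E = {(True, True): 0.6, (True, False): 0.9,
       (False, True): 0.1, (False, False): 0.2}
P_F = {True: 0.9, False: 0.7}

def prob(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(c, s, p, e, f):
    return (prob(P_C, c) * prob(P_S[c], s) * prob(P_P[c], p)
            * prob(P_E[(s, p)], e) * prob(P_F[p], f))

def fun_given_all(c, s, p, e):
    """P(F = true | C=c, S=s, P=p, E=e), computed from the joint distribution."""
    numerator = joint(c, s, p, e, True)
    return numerator / (numerator + joint(c, s, p, e, False))

# For every setting of the predecessors, the answer depends only on Party.
max_gap = max(abs(fun_given_all(c, s, p, e) - P_F[p])
              for c, s, p, e in product([True, False], repeat=4))

print(max_gap < 1e-12)   # True
```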
Compact Networks
Bayesian networks are usually sparse and therefore much
more compact than the full joint distribution.
Sparse: each subcomponent interacts directly with a
bounded number of other nodes independent of the
total number of components.
Usually linearly bounded complexity.
College net has 1+2+2+4+2=11 numbers
Fully connected domain = full joint distribution.
Must determine the correct network topology.
Add “root causes” first then the variables that they
influence.
Network Construction
Need a method such that a series of locally testable
assertions of conditional independence guarantees the
required global semantics
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that
   $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
The choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)
Constructing Bayes' networks:
Example
Choose an ordering
F, E, P, S, C
[Figure: network over Fun, Exam, Party, Study, College built with this ordering]
P(E | F) = P(E)?
P(P | F) = P(P)?
P(S | F, E) = P(S | E)?
P(S | F, E) = P(S)?
P(C | F, E, P, S) = P(C | P, S)?
P(C | F, E, P, S) = P(C)?
Note that this network has additional dependencies.
Compact Networks
[Figure: two networks over the same variables (College, Party, Study, Fun, Exam), shown for comparison]
Network Construction: Alternative
Start with topological semantics that
specifies the conditional independence
relationships.
Defined by either:
A node is conditionally independent of its non-descendants, given its parents.
A node is conditionally independent of all other nodes given its parents, children, and children's parents: its Markov blanket.
Then reconstruct the CPTs.
Network Construction: Alternative
Each node is conditionally independent of its non-descendants given its parents.
Local semantics are equivalent to global semantics.
Exam is independent of College, given the values of Study and Party.
Network Construction: Alternative
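Each node is conditionally independent of all other nodes, given its parents, children and children's parents (its Markov blanket).
[Figure: node X with parents U_1, …, U_m, children Y_1, …, Y_n, and the children's other parents Z_1j, …, Z_nj]
College is independent of Fun, given Party.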
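A minimal sketch of reading a Markov blanket off the graph structure, using the College network. The function name `markov_blanket` and the dictionary layout are illustrative choices.

```python
# Parents of each node in the College network.
parents = {
    "College": [],
    "Study": ["College"],
    "Party": ["College"],
    "Exam": ["Study", "Party"],
    "Fun": ["Party"],
}

def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = [v for v, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)   # a node is not part of its own blanket
    return blanket

print(sorted(markov_blanket("Study", parents)))     # ['College', 'Exam', 'Party']
print(sorted(markov_blanket("College", parents)))   # ['Party', 'Study']
```

Party appears in the blanket of Study because both are parents of Exam.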
Canonical Distribution
Completing a node's CPT requires up to $O(2^k)$ numbers ($k$ = number of parents).
If the parent-child relationship is arbitrary, then this can be difficult to do.
Standard patterns can be named, along with a few parameters, to specify the CPT: a canonical distribution.
Deterministic Nodes
The simplest form is to use deterministic nodes.
A value is specified exactly by its parents' values.
No uncertainty.
But what about relationships that are uncertain?
If someone has a fever do they have a cold, the flu, or
a stomach bug?
Can you have a cold or stomach bug without a fever?
Noisy-OR Relationships
A noisy-OR relationship permits uncertainty about the ability of each parent to cause the child to be true.
The causal relationship may be inhibited.
Assumes:
All possible causes are known.
Can have a miscellaneous category if necessary (leak node)
Inhibition of a particular parent is independent of
inhibiting other parents.
Can you have a cold or stomach bug without a fever?
Fever is true iff Cold, Flu, or Malaria is true.
Example
Given:
$P(\lnot fever \mid cold, \lnot flu, \lnot malaria) = 0.6$
$P(\lnot fever \mid \lnot cold, flu, \lnot malaria) = 0.2$
$P(\lnot fever \mid \lnot cold, \lnot flu, malaria) = 0.1$
Example
Cold  Flu  Malaria  P(¬Fever)
F     F    F        1.0
F     F    T        0.1
F     T    F        0.2
F     T    T        0.2 * 0.1 = 0.02
T     F    F        0.6
T     F    T        0.6 * 0.1 = 0.06
T     T    F        0.6 * 0.2 = 0.12
T     T    T        0.6 * 0.2 * 0.1 = 0.012
Requires $O(k)$ parameters rather than $O(2^k)$
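The table can be regenerated from the three inhibition probabilities. The sketch below assumes the standard noisy-OR rule (the probability that the child is false is the product of the inhibition probabilities of the parents that are true); the names `q` and `p_not_fever` are illustrative.

```python
from itertools import product

# Inhibition probabilities q_i = P(not fever | only cause i is present), from the example.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_not_fever(present):
    """Noisy-OR: multiply q_i over the causes that are true (empty product = 1.0)."""
    result = 1.0
    for cause, is_true in present.items():
        if is_true:
            result *= q[cause]
    return result

# Rebuild all 8 rows of the table from the 3 parameters.
for values in product([False, True], repeat=3):
    present = dict(zip(q, values))          # keys in order: cold, flu, malaria
    print(values, round(p_not_fever(present), 3))
```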
Networks with Continuous
Variables
How are continuous variables represented?
Discretization using intervals
Can result in loss of accuracy and large CPTs
Define probability density functions specified
by a finite number of parameters.
e.g., a Gaussian distribution
Hybrid Bayesian Networks
Contains both discrete and continuous
variables.
Specification of such a network requires:
Conditional distribution for a continuous
variable with discrete or continuous parents.
Conditional distribution for a discrete variable
with continuous parents.
Example
Continuous child with a discrete parent and a continuous parent
[Figure: network in which Subsidy (discrete parent) and Harvest (continuous parent) point to Cost, and Cost points to Buys]
The discrete parent is explicitly enumerated.
The continuous parent is represented as a distribution.
Cost $c$ depends on the distribution function for $h$.
A linear Gaussian distribution can be used.
Have to define the distribution for both values of subsidy.
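One common way to write a linear Gaussian model for this example is sketched below. The parameter names $a_t, b_t, \sigma_t$ (subsidy true) and $a_f, b_f, \sigma_f$ (subsidy false) are notation assumed here, not taken from the slide: the mean of Cost varies linearly with the harvest $h$, and the standard deviation is fixed for each value of subsidy.

$$P(c \mid h, subsidy) = N(a_t h + b_t,\, \sigma_t^2)(c) = \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^2\right)$$

$$P(c \mid h, \lnot subsidy) = N(a_f h + b_f,\, \sigma_f^2)(c) = \frac{1}{\sigma_f \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_f h + b_f)}{\sigma_f}\right)^2\right)$$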
Example
Discrete child with a continuous parent
[Figure: the same network; Buys is the discrete child and Cost is its continuous parent]
Set a threshold for cost.
Can use the integral of the standard normal distribution.
The underlying decision process has a hard threshold, but the
threshold's location moves based upon random Gaussian noise.
Probit Distribution
Example
Probit distribution
Usually a better fit for real problems
Logit distribution
Uses sigmoid function to determine
threshold.
Can be mathematically easier to work with.
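A minimal sketch of the two soft-threshold models for P(buys | cost), assuming the usual textbook parameterization with a threshold location `mu` and a width `sigma`. Both parameter values below are made up for illustration.

```python
import math

def probit(cost, mu, sigma):
    """P(buys | cost) = Phi((mu - cost) / sigma), where Phi is the standard normal CDF."""
    z = (mu - cost) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit(cost, mu, sigma):
    """P(buys | cost) = 1 / (1 + exp(-2 (mu - cost) / sigma)), a sigmoid threshold."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - cost) / sigma))

mu, sigma = 6.0, 1.0   # hypothetical threshold location and width
for cost in (4.0, 6.0, 8.0):
    print(cost, round(probit(cost, mu, sigma), 3), round(logit(cost, mu, sigma), 3))
```

Both curves pass through 0.5 at the threshold and fall as cost rises; the logit curve has heavier tails.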
Bayes' Networks and Exact Inference
Notation
$X$: query variable
$\mathbf{E}$: set of evidence variables $E_1, \ldots, E_m$
$\mathbf{e}$: a particular observed event
$\mathbf{Y}$: set of nonevidence variables $Y_1, \ldots, Y_m$
Also called hidden variables.
The complete set of variables: $\mathbf{X} = \{X\} \cup \mathbf{E} \cup \mathbf{Y}$
A query: $P(X \mid \mathbf{e})$
Example Query
If you succeeded on an exam and had
fun, what is the probability of partying?
$P(Party \mid Exam = true, Fun = true)$
Inference by Enumeration
From Chap 13 we know:
$P(X \mid \mathbf{e}) = \alpha\, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y})$
From this Chapter we have:
$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
Each term $P(x, \mathbf{e}, \mathbf{y})$ in the joint distribution can be
represented as a product of the conditional probabilities.
Inference by Enumeration
A query can be answered using a Bayes' net by computing the sums of products of
the conditional probabilities from the network.
Example Query
If you succeeded on an exam and had
fun, what is the probability of partying?
$P(Party \mid Exam = true, Fun = true)$
What are the hidden variables?
Example Query
Let:
C = College
PR = Party
S = Study
E = Exam
F = Fun
Then we have from eq. 13.6 (p.476):
$P(pr \mid e, f) = \alpha\, P(pr, e, f) = \alpha \sum_{C} \sum_{S} P(pr, e, f, C, S)$
Example Query
Using
$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
we can put
$P(pr \mid e, f) = \alpha\, P(pr, e, f) = \alpha \sum_{C} \sum_{S} P(pr, e, f, C, S)$
in terms of the CPT entries:
$P(pr \mid e, f) = \alpha \sum_{C} \sum_{S} P(pr \mid C)\, P(S \mid C)\, P(e \mid S, pr)\, P(f \mid pr)\, P(C)$
The worst case complexity of this equation is $O(n 2^n)$ for $n$ variables.
Example Query
Improving the calculation
$P(f \mid pr)$ is a constant, so it can be moved out of the summation over $C$ and $S$:
$P(pr \mid e, f) = \alpha\, P(f \mid pr) \sum_{C} \sum_{S} P(pr \mid C)\, P(S \mid C)\, P(e \mid S, pr)\, P(C)$
Then move the elements that only involve $C$ and not $S$ to outside the summation over $S$:
$P(pr \mid e, f) = \alpha\, P(f \mid pr) \sum_{C} P(C)\, P(pr \mid C) \sum_{S} P(S \mid C)\, P(e \mid S, pr)$
Example Query
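Evaluating
$P(pr \mid e, f) = \alpha\, P(f \mid pr) \sum_{C} P(C)\, P(pr \mid C) \sum_{S} P(S \mid C)\, P(e \mid S, pr)$
with the CPT entries:
$P(f \mid pr) = .9$
Branch $c$: $P(c) = .2$, $P(pr \mid c) = .6$
  $P(s \mid c) = .8$, $P(e \mid s, pr) = .6$
  $P(\lnot s \mid c) = .2$, $P(e \mid \lnot s, pr) = .1$
  Sum over $S$: .48 + .02 = .5
Branch $\lnot c$: $P(\lnot c) = .8$, $P(pr \mid \lnot c) = .5$
  $P(s \mid \lnot c) = .2$, $P(e \mid s, pr) = .6$
  $P(\lnot s \mid \lnot c) = .8$, $P(e \mid \lnot s, pr) = .1$
  Sum over $S$: .12 + .08 = .2
Sum over $C$: .2 * .6 * .5 + .8 * .5 * .2 = .06 + .08 = .14
Multiply by $P(f \mid pr)$: .9 * .14 = .126
Similarly for $P(\lnot pr \mid e, f)$.
Still $O(2^n)$
$P(PR \mid e, f) = \alpha \left\langle P(f \mid pr) \sum_{c} P(c)\, P(pr \mid c) \sum_{s} P(s \mid c)\, P(e \mid s, pr),\;\; P(f \mid \lnot pr) \sum_{c} P(c)\, P(\lnot pr \mid c) \sum_{s} P(s \mid c)\, P(e \mid s, \lnot pr) \right\rangle$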
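The hand calculation can be checked with a short enumeration sketch that sums the joint distribution over the hidden variables College and Study. The CPT values come from the College example; the helper names (`prob`, `joint`, `enumerate_party`) are hypothetical.

```python
from itertools import product

# P(variable = true | parents), from the College example CPTs.
P_C = 0.2
P_S = {True: 0.8, False: 0.2}
P_PR = {True: 0.6, False: 0.5}
P_E = {(True, True): 0.6, (True, False): 0.9,
       (False, True): 0.1, (False, False): 0.2}    # keyed by (S, PR)
P_F = {True: 0.9, False: 0.7}

def prob(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(c, s, pr, e, f):
    return (prob(P_C, c) * prob(P_S[c], s) * prob(P_PR[c], pr)
            * prob(P_E[(s, pr)], e) * prob(P_F[pr], f))

def enumerate_party(e=True, f=True):
    """P(Party | Exam=e, Fun=f): sum the joint over hidden C and S, then normalize."""
    unnormalized = {
        pr: sum(joint(c, s, pr, e, f) for c, s in product((True, False), repeat=2))
        for pr in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return unnormalized, {pr: alpha * v for pr, v in unnormalized.items()}

unnormalized, posterior = enumerate_party()
print(round(unnormalized[True], 5))   # 0.126, matching the hand calculation
print(posterior)
```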
Variable Elimination
A problem with the enumeration method is that
particular products can be computed multiple
times, thus reducing efficiency.
Reduce the number of duplicate calculations by doing
the calculation once and saving it for later.
Variable elimination evaluates expressions from
right to left, stores the intermediate results and
sums over each variable for the portions of the
expression dependent upon the variable.
Variable Elimination
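First, factor the equation:
$P(PR \mid e, f) = \alpha\, \underbrace{P(f \mid PR)}_{F} \sum_{C} \underbrace{P(C)}_{C}\, \underbrace{P(PR \mid C)}_{PR} \sum_{S} \underbrace{P(S \mid C)}_{S}\, \underbrace{P(e \mid S, PR)}_{E}$
Second, store the factor for E:
a 2x2 matrix $f_E(S, PR)$.
Third, store the factor for S:
a 2x2 matrix
$f_S(S, C) = \begin{pmatrix} P(s \mid c) & P(s \mid \lnot c) \\ P(\lnot s \mid c) & P(\lnot s \mid \lnot c) \end{pmatrix}$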
Variable Elimination
Fourth, sum out S from the product of the first two factors:
$f_{\bar{S}E}(C, PR) = \sum_{S} f_S(S, C) * f_E(S, PR) = f_S(s, C) * f_E(s, PR) + f_S(\lnot s, C) * f_E(\lnot s, PR)$
The product of two factors is called a pointwise product.
It creates a new factor whose variables are the union of the variables of the two factors in the product:
$f(X_1 \ldots X_j, Y_1 \ldots Y_k, Z_1 \ldots Z_l) = f_1(X_1 \ldots X_j, Y_1 \ldots Y_k) * f_2(Y_1 \ldots Y_k, Z_1 \ldots Z_l)$
Any factor that does not depend on the variable to be
summed out can be moved outside the summation.
Variable Elimination
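Fifth, store the factor for PR:
a 2x2 matrix
$f_{PR}(PR, C) = \begin{pmatrix} P(pr \mid c) & P(pr \mid \lnot c) \\ P(\lnot pr \mid c) & P(\lnot pr \mid \lnot c) \end{pmatrix}$
Sixth, store the factor for C:
$f_C(C) = \begin{pmatrix} P(c) \\ P(\lnot c) \end{pmatrix}$
The expression is now:
$P(PR \mid e, f) = \alpha\, P(f \mid PR) \sum_{C} P(C)\, P(PR \mid C)\, f_{\bar{S}E}(C, PR)$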
Variable Elimination
Seventh, sum out C from the product of the factors:
$f_{\bar{C}PR\bar{S}E}(PR) = \sum_{C} f_C(C) * f_{PR}(PR, C) * f_{\bar{S}E}(C, PR)$
$= f_C(c) * f_{PR}(PR, c) * f_{\bar{S}E}(c, PR) + f_C(\lnot c) * f_{PR}(PR, \lnot c) * f_{\bar{S}E}(\lnot c, PR)$
where
$P(PR \mid e, f) = \alpha\, P(f \mid PR) \sum_{C} P(C)\, P(PR \mid C)\, f_{\bar{S}E}(C, PR)$
Variable Elimination
Next, store the factor for F:
$f_F(PR) = \begin{pmatrix} P(f \mid pr) \\ P(f \mid \lnot pr) \end{pmatrix}$
Finally, calculate the final result:
$P(PR \mid e, f) = \alpha\, P(f \mid PR)\, f_{\bar{C}PR\bar{S}E}(PR) = \alpha\, f_F(PR) * f_{\bar{C}PR\bar{S}E}(PR)$
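The elimination steps can be sketched with factors stored as plain dictionaries keyed by variable values. This is an illustrative layout under the evidence e = true, f = true, not a general-purpose implementation; the factor names mirror the ones used in the steps, and `f_SE` and `f_CPRSE` stand for the factors left after summing out S and then C.

```python
B = (True, False)

# P(variable = true | parents), from the College example CPTs.
P_C = 0.2
P_S = {True: 0.8, False: 0.2}
P_PR = {True: 0.6, False: 0.5}
P_E = {(True, True): 0.6, (True, False): 0.9,
       (False, True): 0.1, (False, False): 0.2}    # keyed by (S, PR)
P_F = {True: 0.9, False: 0.7}

def prob(p_true, value):
    return p_true if value else 1.0 - p_true

# Stored factors, with the evidence e = true and f = true already fixed.
f_E = {(s, pr): P_E[(s, pr)] for s in B for pr in B}          # f_E(S, PR)
f_S = {(s, c): prob(P_S[c], s) for s in B for c in B}         # f_S(S, C)
f_PR = {(pr, c): prob(P_PR[c], pr) for pr in B for c in B}    # f_PR(PR, C)
f_C = {c: prob(P_C, c) for c in B}                            # f_C(C)
f_F = {pr: P_F[pr] for pr in B}                               # f_F(PR)

# Sum out S from the pointwise product f_S * f_E: a factor over (C, PR).
f_SE = {(c, pr): sum(f_S[(s, c)] * f_E[(s, pr)] for s in B)
        for c in B for pr in B}

# Sum out C from f_C * f_PR * f_SE: a factor over PR only.
f_CPRSE = {pr: sum(f_C[c] * f_PR[(pr, c)] * f_SE[(c, pr)] for c in B)
           for pr in B}

# Multiply by f_F and normalize.
unnormalized = {pr: f_F[pr] * f_CPRSE[pr] for pr in B}
alpha = 1.0 / sum(unnormalized.values())
posterior = {pr: alpha * unnormalized[pr] for pr in B}

print(round(f_SE[(True, True)], 3), round(f_CPRSE[True], 3), round(unnormalized[True], 3))
```

Each intermediate factor is computed once and reused for both values of PR, which is the saving over plain enumeration.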
Elimination Simplification
Any leaf node that is not a query variable
or an evidence variable can be removed.
Every variable that is not an ancestor of a
query variable or an evidence variable is
irrelevant to the query and can be
eliminated.
Elimination Simplification
Book Example:
What is the probability that John calls if
there is a burglary?
$P(J \mid b) = \alpha\, P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(J \mid a) \sum_{m} P(m \mid a)$
Does this matter?
[Figure: burglary network with nodes Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]
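Assuming the question refers to the summation over MaryCalls, a short derivation shows why that term drops out. MaryCalls is a leaf node that is neither a query nor an evidence variable, and its conditional distribution sums to one for every value of Alarm:

$$\sum_{m} P(m \mid a) = P(m \mid a) + P(\lnot m \mid a) = 1$$

so the query reduces to

$$P(J \mid b) = \alpha\, P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(J \mid a)$$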
Complexity of Exact Inference
Variable elimination is more efficient than
enumeration.
Time and space requirements are dominated
by the size of the largest factor constructed
which is determined by the order of variable
elimination and the network structure.
Polytrees
Polytrees are singly connected networks
At most one undirected path between any two
nodes.
Time and space requirements are linear in the
size of the network.
Size is the number of CPT entries.
Polytrees
[Figure: the burglary network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) and the College network (College, Study, Party, Exam, Fun)]
Are these networks polytrees?
Applying variable elimination
to multiply connected networks
has worst case exponential
time and space complexity.
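A sketch of testing the two example networks, using the standard definition of a polytree: a network with no loop once edge directions are ignored. The check uses a small union-find. The function name `is_polytree` is illustrative, the burglary edges follow the textbook network, and the College edges follow the causal relations given earlier.

```python
def is_polytree(edges):
    """True if the graph has no cycle when edge directions are ignored,
    i.e. there is at most one undirected path between any two nodes."""
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:                  # already connected: this edge closes a loop
            return False
        root[ra] = rb
    return True

burglary = [("Burglary", "Alarm"), ("Earthquake", "Alarm"),
            ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")]
college = [("College", "Study"), ("College", "Party"),
           ("Study", "Exam"), ("Party", "Exam"), ("Party", "Fun")]

print(is_polytree(burglary), is_polytree(college))   # True False
```

The College network fails because College reaches Exam through both Study and Party, so it is multiply connected.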