Guidance: Assignment 3 Part 1
MATLAB functions in the Statistics Toolbox: betacdf, betapdf, betarnd, betastat, betafit
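For those working outside MATLAB, the same five operations are available in Python via `scipy.stats.beta` (this assumes SciPy is acceptable for the assignment; the MATLAB-to-SciPy mapping is shown in comments):

```python
# Python/SciPy equivalents of the MATLAB beta functions listed above
# (a sketch; shape parameters are illustrative).
from scipy.stats import beta

a, b = 2.0, 5.0                # shape parameters of Beta(a, b)

print(beta.cdf(0.3, a, b))     # betacdf: P(X <= 0.3)
print(beta.pdf(0.3, a, b))     # betapdf: density at 0.3
print(beta.rvs(a, b, size=4, random_state=0))  # betarnd: random draws
m, v = beta.stats(a, b)        # betastat: mean and variance
print(m, v)                    # mean = a/(a+b)

# betafit: maximum-likelihood shape estimates from data
samples = beta.rvs(a, b, size=2000, random_state=1)
a_hat, b_hat, loc, scale = beta.fit(samples, floc=0, fscale=1)
print(a_hat, b_hat)
```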
Guidance: Assignment 3 Part 2
You will explore the role of the priors.
The Weiss model showed that priors play an important role when
• observations are noisy
• observations don’t provide strong constraints
• there aren’t many observations.
Guidance: Assignment 3 Part 3
Implement a model a bit like Weiss et al. (2002).
Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time.
Assume distinctive features that make it easy to identify the location of each feature at successive times.
Assignment 2 Guidance
• Bx: the x displacement of the blue square (= Δx in one unit of time)
• By: the y displacement of the blue square
• Rx: the x displacement of the red square
• Ry: the y displacement of the red square
These observations are corrupted by measurement noise: Gaussian, mean zero, standard deviation σ.
D: direction of motion (up, down, left, right). Assume the only possibilities are one unit of motion in any of the four directions.
Assignment 2: Generative Model
Rx conditioned on D = up is drawn from a Gaussian.
Same assumptions for Ry, Bx, By.
Assignment 2 Math
Conditional independence
Assignment 2 Implementation
Quiz: do we need to worry about the Gaussian density function’s normalization term?
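The quiz answer can be checked numerically. Below is a minimal sketch of the Assignment 2 posterior computation, assuming the four-direction model and Gaussian noise described above; the observation values and σ are made up:

```python
# Sketch of the Assignment 2 posterior over direction D (up/down/left/right),
# assuming one unit of displacement per time step and Gaussian measurement
# noise with std sigma; the observation values below are made up.
import math

sigma = 0.5
means = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
prior = {d: 0.25 for d in means}

# Observed (noisy) displacements of the blue and red squares:
# (Bx, By) and (Rx, Ry).
obs = [(0.2, 0.9), (-0.1, 1.1)]

def gauss_unnorm(x, mu, s):
    # The normalization constant 1/(s*sqrt(2*pi)) is identical for every
    # hypothesis D, so it cancels when the posterior is normalized --
    # the answer to the quiz is: no, we can drop it.
    return math.exp(-(x - mu) ** 2 / (2 * s ** 2))

post = {}
for d, (mx, my) in means.items():
    like = 1.0
    for (ox, oy) in obs:
        like *= gauss_unnorm(ox, mx, sigma) * gauss_unnorm(oy, my, sigma)
    post[d] = prior[d] * like

Z = sum(post.values())
post = {d: p / Z for d, p in post.items()}
print(post)  # "up" gets most of the mass for these observations
```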
Introduction To Bayes Nets
(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)
What Do You Need To Do Probabilistic Inference In A Given Domain?
A joint probability distribution over all variables in the domain.
Qualitative part: a directed acyclic graph (DAG)
• Nodes: random variables
• Edges: direct influence
Quantitative part: a set of conditional probability distributions
Bayes Nets (a.k.a. Belief Nets)
Example conditional probability table for P(A | E, B):

E    B    P(a)   P(¬a)
e    b    0.9    0.1
e    ¬b   0.2    0.8
¬e   b    0.9    0.1
¬e   ¬b   0.01   0.99

[Figure: the alarm network — Earthquake and Burglary are parents of Alarm; Earthquake is a parent of Radio; Alarm is a parent of Call. Figure from N. Friedman.]
Compact representation of joint probability distributions via conditional independence.
Together, the graph and the conditional distributions define a unique distribution in a factored form.
What Is A Bayes Net?
[Figure: the alarm network — Earthquake, Burglary → Alarm → Call; Earthquake → Radio.]
A node is conditionally independent of its non-descendants given its parents.
E.g., C is conditionally independent of R, E, and B given A.
Notation: C ⊥ R, B, E | A
Quiz: What sort of parameter reduction do we get?
From 2^5 − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10.
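The quiz arithmetic can be checked mechanically from the parent structure; the only assumption below is the standard parent list of the alarm network:

```python
# Verify the parameter-count quiz for the 5-node alarm network
# (Burglary, Earthquake, Alarm, Radio, Call; all binary).
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Radio": ["Earthquake"],
    "Call": ["Alarm"],
}

# Full joint over 5 binary variables: 2^5 - 1 independent entries.
full_joint = 2 ** len(parents) - 1
# Factored form: each binary node needs one parameter per parent configuration.
factored = sum(2 ** len(p) for p in parents.values())
print(full_joint, factored)  # 31 vs 10
```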
Conditional Distributions Are Flexible
E.g., Earthquake and Burglary might have independent effects on Alarm.
A.k.a. noisy-OR:
P(A = 1 | B, E) = 1 − (1 − p_B)^B (1 − p_E)^E
where p_B and p_E are the probabilities of the alarm given burglary alone and earthquake alone.
This constraint reduces the number of free parameters to 8!
[Figure: Earthquake and Burglary as parents of Alarm.]
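A minimal noisy-OR sketch, with p_B and p_E as illustrative values rather than numbers from the slide:

```python
# Noisy-OR sketch for P(Alarm | Burglary, Earthquake): each present cause
# independently fails to trigger the alarm with probability (1 - p_cause).
# p_B and p_E are illustrative values.
p_B, p_E = 0.9, 0.2   # P(alarm | burglary alone), P(alarm | earthquake alone)

def p_alarm(b, e):
    # P(A=1 | B=b, E=e) = 1 - (1 - p_B)^b * (1 - p_E)^e
    return 1 - (1 - p_B) ** b * (1 - p_E) ** e

for b in (0, 1):
    for e in (0, 1):
        print(b, e, p_alarm(b, e))
# Two free parameters (p_B, p_E) generate the whole four-row CPT,
# instead of four independent entries.
```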
Why Are Bayes Nets Useful?
• Factored representation may have exponentially fewer parameters than the full joint
  – Lower time complexity (i.e., easier inference)
  – Lower sample complexity (i.e., less data for learning)
• Graph structure supports
  – Modular representation of knowledge
  – Local, distributed algorithms for inference and learning
  – Intuitive (possibly causal) interpretation
• Strong theory about the nature of cognition or the generative process that produces observed data
  – Can’t represent arbitrary contingencies among variables, so the theory can be rejected by data
Inference
•
Computing posterior probabilities
–
Probability of hidden events given any evidence
•
Most likely explanation
–
Scenario that explains evidence
•
Rational decision making
–
Maximize expected utility
–
Value of Information
•
Effect of intervention
–
Causal analysis
[Figure: the alarm network, highlighting the Radio and Call nodes. Figure from N. Friedman.]
Explaining-away effect
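Explaining away can be demonstrated by brute-force enumeration; the CPT values below are illustrative assumptions, not the slide's numbers:

```python
# Numeric illustration of explaining away on the alarm network.
# All CPT values here are illustrative assumptions.
P_B, P_E = 0.01, 0.02                 # priors on Burglary and Earthquake
P_A = {(1, 1): 0.95, (1, 0): 0.90,    # P(Alarm=1 | B, E)
       (0, 1): 0.20, (0, 0): 0.01}

def joint(b, e, a):
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def posterior_b(given_e=None):
    # P(B=1 | A=1 [, E=given_e]) by brute-force enumeration of the joint.
    num = den = 0.0
    for b in (0, 1):
        for e in (0, 1):
            if given_e is not None and e != given_e:
                continue
            p = joint(b, e, 1)
            den += p
            if b == 1:
                num += p
    return num / den

p_b_given_a = posterior_b()             # alarm alone makes burglary likely
p_b_given_a_e = posterior_b(given_e=1)  # earthquake "explains away" the alarm
print(p_b_given_a, p_b_given_a_e)       # the second is much smaller
```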
A Real Bayes Net: Alarm
Domain: monitoring intensive-care patients
• 37 variables
• 509 parameters …instead of 2^37
[Figure: the ALARM network over 37 ICU variables (e.g., MINVOLSET, VENTMACH, INTUBATION, PULMEMBOLUS, LVFAILURE, HYPOVOLEMIA, CVP, BP). Figure from N. Friedman.]
More Real-World Bayes Net Applications
“Microsoft’s competitive advantage lies in its expertise in Bayesian networks”
— Bill Gates, quoted in the LA Times, 1996
• MS Answer Wizards, (printer) troubleshooters
• Medical diagnosis
• Speech recognition (HMMs)
• Gene sequence/expression analysis
• Turbocodes (channel coding)
Conditional Independence
A node is conditionally independent of its non-descendants given its parents.
What about conditional independence between variables that aren’t directly connected (e.g., Burglary and Radio)?
[Figure: the alarm network — Earthquake, Radio, Burglary, Alarm, Call.]
d-separation
Criterion for deciding if nodes are conditionally independent.
A path from node u to node v is d-separated by a set of nodes Z if the path matches one of these templates:
• chain: u → z → v, with z ∈ Z
• chain: u ← z ← v, with z ∈ Z
• fork: u ← z → v, with z ∈ Z
• collider: u → z ← v, with z ∉ Z and no descendant of z in Z
Conditional Independence
Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.
[Figure: example graph with nodes u and v and a separating set Z.]
d-separation Along Paths
For paths involving more than one intermediate node, the path is d-separated if the outer two nodes of any triple are d-separated.
[Figure: the path templates applied along longer paths; example paths labeled “d-separated”, “d-separated”, and “not d-separated”.]
[Figure: two copies of the ALARM network illustrating conditional independence relationships among its 37 variables.]
Sufficiency For Conditional Independence: Markov Blanket
The Markov blanket of node u consists of the parents, children, and children’s parents of u.
P(u | MB(u), v) = P(u | MB(u))
[Figure: node u with its Markov blanket shaded.]
Probabilistic Models
Probabilistic models include graphical models, which come in two flavors:
• Directed (Bayesian belief nets): Alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA
• Undirected (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models
Turning A Directed Graphical Model Into An
Undirected Model Via Moralization
Moralization: connect all parents of each node and
remove arrows
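A sketch of moralization on parent lists, using the alarm network's structure (node names abbreviated; the parent dictionary is the only assumption):

```python
# Moralization sketch: given parent lists for a DAG, connect all pairs of
# parents of each node ("marry" them) and drop edge directions.
from itertools import combinations

def moralize(parents):
    edges = set()
    for child, ps in parents.items():
        for p in ps:                        # original edges, now undirected
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(ps, 2):  # marry co-parents
            edges.add(frozenset((p1, p2)))
    return edges

# Alarm network: B, E -> A; E -> R; A -> C.
dag = {"A": ["B", "E"], "R": ["E"], "C": ["A"], "B": [], "E": []}
moral = moralize(dag)
print(sorted(tuple(sorted(e)) for e in moral))
# B and E become connected because they share the child A.
```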
Toy Example Of A Markov Net
[Figure: a small undirected graph over X1–X5.]
e.g., X1 ⊥ X4, X5 | X2, X3
Xi ⊥ X_rest | X_nbrs
The joint is a product of potential functions over cliques, normalized by the partition function:
P(x) = (1/Z) ∏_c ψ_c(x_c),   Z = Σ_x ∏_c ψ_c(x_c)
Clique: (largest) subset of vertices such that each pair is connected by an edge.
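The potential/partition-function machinery can be made concrete with a brute-force sketch on a tiny chain (the topology and Ising-style potentials are illustrative):

```python
# Brute-force partition function for a tiny pairwise Markov net (a sketch).
import math
from itertools import product

J = 1.0
def psi(xi, xj):
    # Pairwise potential favoring agreement between neighbors.
    return math.exp(J if xi == xj else -J)

edges = [(0, 1), (1, 2)]   # a 3-node chain X1 - X2 - X3, all binary

def unnorm(x):
    # Unnormalized probability: product of clique (edge) potentials.
    p = 1.0
    for i, j in edges:
        p *= psi(x[i], x[j])
    return p

# Partition function Z = sum of unnormalized weights over all configurations.
Z = sum(unnorm(x) for x in product((0, 1), repeat=3))
probs = {x: unnorm(x) / Z for x in product((0, 1), repeat=3)}
print(Z, sum(probs.values()))  # probabilities sum to 1 by construction
```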
A Real Markov Net
• Estimate P(x_1, …, x_n | y_1, …, y_n)
• Ψ(x_i, y_i) = P(y_i | x_i): local evidence likelihood
• Ψ(x_i, x_j) = exp(−J(x_i, x_j)): compatibility matrix
[Figure: grid MRF — observed pixels y_i attached to latent causes x_i.]
Example Of Image Segmentation With MRFs
Sziranyi et al. (2000)
Graphical Models Are A Useful Formalism
E.g., the naïve Bayes model from Assignment 2: D is the parent of Rx, Ry, Bx, By.
By the definition of conditional probability, marginalizing over D in the denominator:
P(D | Bx, By, Rx, Ry) = P(D) P(Bx|D) P(By|D) P(Rx|D) P(Ry|D) / Σ_{D′} P(D′) P(Bx|D′) P(By|D′) P(Rx|D′) P(Ry|D′)
Graphical Models Are A Useful Formalism
E.g., a feedforward neural net with noise = a sigmoid belief net
[Figure: layered graph — input layer → hidden layer → output layer.]
Graphical Models Are A Useful Formalism
E.g., the Restricted Boltzmann Machine (Hinton), also known as a Harmony network (Smolensky)
[Figure: bipartite graph of hidden units and visible units.]
Graphical Models Are A Useful Formalism
E.g., Gaussian Mixture Model
Graphical Models Are A Useful Formalism
E.g., dynamical (time-varying) models in which data arrives sequentially or output is produced as a sequence.
Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data.
Special cases of DBNs include:
• Hidden Markov Models (HMMs)
• State-space models
Hidden Markov Model (HMM)
[Figure: hidden chain X1 → X2 → X3 with observations Y1, Y2, Y3.]
Hidden states: phones/words; observations: acoustic signal.
Transitions: transition matrix; emissions: Gaussian observations.
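The HMM likelihood computation can be sketched with the standard forward recursion; all transition, emission, and initial probabilities below are made-up values:

```python
# Forward-algorithm sketch for a tiny discrete HMM; all numbers are
# illustrative. Computes P(y_1:T) and the filtered posterior P(X_T | y_1:T).
pi = [0.6, 0.4]                  # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j] = P(X_t = j | X_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]     # B[i][k] = P(Y_t = k | X_t = i)
ys = [0, 0, 1]                   # observed symbol sequence

# alpha[j] = P(y_1:t, X_t = j), updated left to right.
alpha = [pi[i] * B[i][ys[0]] for i in range(2)]
for y in ys[1:]:
    alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][y]
             for j in range(2)]

p_obs = sum(alpha)                     # likelihood P(y_1:T)
filtered = [a / p_obs for a in alpha]  # P(X_T | y_1:T)
print(p_obs, filtered)
```

The recursion costs O(T·K²) for K states, versus O(Kᵀ) for enumerating all hidden paths.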
State-Space Model (SSM) / Linear Dynamical System (LDS)
[Figure: same chain structure as the HMM — hidden X1 → X2 → X3 with observations Y1, Y2, Y3.]
X: “true” state; Y: noisy observations.
Example: LDS For 2D Tracking
[Figure: tracking a 2D position (x1, x2) over time from noisy observations (y1, y2); the state-transition and observation equations are shown as matrices.]
A sparse linear-Gaussian system.
Kalman Filtering (Recursive State Estimation In An LDS)
[Figure: hidden chain X1 → X2 → X3 with observations Y1, Y2, Y3.]
Estimate P(X_t | y_1:t) from P(X_{t−1} | y_1:t−1) and y_t
• Predict: P(X_t | y_1:t−1) = ∫ P(X_t | x_{t−1}) P(x_{t−1} | y_1:t−1) dx_{t−1}
• Update: P(X_t | y_1:t) ∝ P(y_t | X_t) P(X_t | y_1:t−1)
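For a 1-D random-walk state with a direct noisy observation, the predict/update recursion above reduces to familiar mean/variance updates; all numbers below are illustrative:

```python
# 1-D Kalman filter sketch implementing the predict/update recursion above
# (random-walk state, noisy direct observation; numbers are illustrative).
process_var, obs_var = 0.1, 0.5   # process noise Q and observation noise R
mu, var = 0.0, 1.0                # prior on X_0

observations = [1.2, 0.9, 1.1, 1.0]
for y in observations:
    # Predict: P(X_t | y_1:t-1) -- the variance grows by the process noise.
    mu_pred = mu
    var_pred = var + process_var
    # Update: P(X_t | y_1:t) proportional to P(y_t | X_t) P(X_t | y_1:t-1).
    K = var_pred / (var_pred + obs_var)   # Kalman gain
    mu = mu_pred + K * (y - mu_pred)
    var = (1 - K) * var_pred
    print(round(mu, 3), round(var, 3))
```

Note how the posterior variance shrinks with each observation: the filter grows more confident as evidence accumulates.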
Mike’s Project of the Week
[Figure: plate diagram of the IRT model — response X per student, trial, and problem, with parameters G, α, P, δ.]
Mike’s Project of the Week
[Figure: plate diagram of the BKT model — response X per student and trial, with parameters L0, T, τ, G, S.]
Mike’s Project of the Week
[Figure: plate diagram of the combined IRT+BKT model — parameters L0, T, τ, α, P, δ, η, G, S, γ, σ per student, trial, and problem.]