# Guidance: Assignment 3 Part 1

AI and Robotics

Oct 19, 2013

MATLAB functions in the Statistics Toolbox: `betacdf`, `betapdf`, `betarnd`, `betastat`, `betafit`
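If you prefer Python for prototyping, `scipy.stats.beta` offers rough analogues of these toolbox functions. A sketch only; the assignment itself assumes MATLAB:

```python
# Python analogues of the MATLAB Statistics Toolbox beta functions,
# using scipy.stats.beta. Shape parameters a, b are arbitrary examples.
from scipy import stats

a, b = 2.0, 5.0
dist = stats.beta(a, b)

cdf_val = dist.cdf(0.3)                                  # ~ betacdf(0.3, a, b)
pdf_val = dist.pdf(0.3)                                  # ~ betapdf(0.3, a, b)
samples = dist.rvs(size=1000, random_state=0)            # ~ betarnd(a, b, ...)
mean, var = stats.beta.stats(a, b, moments="mv")         # ~ betastat(a, b)
a_hat, b_hat, _, _ = stats.beta.fit(samples, floc=0, fscale=1)  # ~ betafit
```

Fixing `floc`/`fscale` keeps the fit on the standard [0, 1] support, matching what `betafit` estimates.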

## Guidance: Assignment 3 Part 2

You will explore the role of the priors.

The Weiss model showed that priors play an important role when:

- observations are noisy
- observations don't provide strong constraints
- there aren't many observations.

## Guidance: Assignment 3 Part 3

Implement a model somewhat like that of Weiss et al. (2002).

Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time.

Assume distinctive features that make it easy to identify the location of each feature at successive times.

## Assignment 2 Guidance

- Bx: the x displacement of the blue square (= Δx in one unit of time)
- By: the y displacement of the blue square
- Rx: the x displacement of the red square
- Ry: the y displacement of the red square

These observations are corrupted by measurement noise: Gaussian, mean zero, standard deviation σ.

- D: direction of motion (up, down, left, right)

Assume the only possibilities are one unit of motion in any of these four directions.

## Assignment 2: Generative Model

Rx conditioned on D = up is drawn from a Gaussian (an upward step produces no x displacement, so Rx | D = up ~ N(0, σ²)). The same assumptions hold for Bx and By.

## Assignment 2 Math

Conditional independence: given D, the observations Bx, By, Rx, Ry are independent.

## Assignment 2 Implementation

Quiz: do we need to worry about the Gaussian density function's normalization term?
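Answer sketch: no. The factor 1/(√(2π)σ) is identical for every hypothesized direction, so it cancels when the posterior over D is normalized. A small illustration (all numbers hypothetical):

```python
import math

sigma = 0.5
directions = {            # hypothesized (dx, dy) for one unit of motion
    "up":    (0.0, 1.0),
    "down":  (0.0, -1.0),
    "left":  (-1.0, 0.0),
    "right": (1.0, 0.0),
}
obs = {"Bx": 0.1, "By": 0.9, "Rx": -0.1, "Ry": 1.1}  # noisy displacements

def unnorm_gauss(x, mu):
    # Gaussian density WITHOUT the 1/(sqrt(2*pi)*sigma) factor
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def posterior(include_norm):
    scores = {}
    for d, (dx, dy) in directions.items():
        lik = 1.0
        for x in (obs["Bx"], obs["Rx"]):
            lik *= unnorm_gauss(x, dx)
        for y in (obs["By"], obs["Ry"]):
            lik *= unnorm_gauss(y, dy)
        if include_norm:
            lik *= (1.0 / (math.sqrt(2 * math.pi) * sigma)) ** 4
        scores[d] = lik * 0.25        # uniform prior over D
    z = sum(scores.values())
    return {d: s / z for d, s in scores.items()}

p1, p2 = posterior(False), posterior(True)
# the two posteriors over D are identical: the constant cancels
```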

## Introduction To Bayes Nets

(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

## What Do You Need To Do Probabilistic Inference In A Given Domain?

A joint probability distribution over all variables in the domain.

Qualitative part: a directed acyclic graph (DAG)
- Nodes: random variables
- Edges: direct influence

Quantitative part: a set of conditional probability distributions

## Bayes Nets (a.k.a. Belief Nets)

Compact representation of joint probability distributions via conditional independence.

Together, the qualitative and quantitative parts define a unique distribution in a factored form.

Family-of-Alarm example (figure from N. Friedman): Earthquake → Alarm ← Burglary, Alarm → Call.

Example CPT, P(A | E, B):

    B    E    | P(a)   P(¬a)
    b    e    | 0.9    0.1
    b    ¬e   | 0.9    0.1
    ¬b   e    | 0.2    0.8
    ¬b   ¬e   | 0.01   0.99
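The "define a unique distribution" claim can be checked numerically: with one CPD per node, the product of factors sums to one over all joint states. A sketch for the alarm family, where the priors P(b), P(e) and the CPD P(C | A) are invented for illustration and the P(A | B, E) values come from the slide's CPT:

```python
import itertools

p_b, p_e = 0.05, 0.01                                        # hypothetical priors
p_a = {(1, 1): 0.9, (1, 0): 0.9, (0, 1): 0.2, (0, 0): 0.01}  # P(a=1 | b, e)
p_c = {1: 0.8, 0: 0.05}                                      # P(c=1 | a) (assumed)

def joint(b, e, a, c):
    # Factored form: P(B) P(E) P(A | B, E) P(C | A)
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pc = p_c[a] if c else 1 - p_c[a]
    return pb * pe * pa * pc

# summing the factored product over all 2^4 states gives exactly 1
total = sum(joint(*v) for v in itertools.product((0, 1), repeat=4))
```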

## What Is A Bayes Net?

(Network: Earthquake → Alarm ← Burglary, Alarm → Call)

A node is conditionally independent of its ancestors given its parents.

E.g., C is conditionally independent of R, E, and B given A.

Notation: C ⊥ R, B, E | A

Quiz: What sort of parameter reduction do we get?

From 2^5 − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10.

## Conditional Distributions Are Flexible

E.g., Earthquake and Burglary might have independent effects on Alarm.

A.k.a. noisy-OR:

P(a | b, e) = 1 − (1 − p_B)^b (1 − p_E)^e

where p_B and p_E are the probabilities of the alarm going off given burglary alone and earthquake alone.

This constraint reduces the number of free parameters to 8!

(Network: Earthquake → Alarm ← Burglary)
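A minimal sketch of the noisy-OR CPD, with hypothetical values for p_B and p_E:

```python
# Noisy-OR: each present cause independently fails to trigger the alarm
# with probability (1 - p_cause); causes combine like an OR gate.
p_B, p_E = 0.8, 0.6   # hypothetical single-cause trigger probabilities

def noisy_or(b, e):
    # P(alarm | b, e) = 1 - (1 - p_B)**b * (1 - p_E)**e
    return 1.0 - (1.0 - p_B) ** b * (1.0 - p_E) ** e

assert noisy_or(0, 0) == 0.0   # no cause, no alarm (no leak term here)
# burglary alone fires with probability p_B, earthquake alone with p_E,
# and both together with 1 - (1 - p_B)(1 - p_E)
```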

## Why Are Bayes Nets Useful?

- Factored representation may have exponentially fewer parameters than the full joint
  - Lower time complexity (i.e., easier inference)
  - Lower sample complexity (i.e., less data for learning)
- Graph structure supports
  - Modular representation of knowledge
  - Local, distributed algorithms for inference and learning
  - Intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data
  - Can't represent arbitrary contingencies among variables, so the theory can be rejected by data

## Inference

- Computing posterior probabilities
  - Probability of hidden events given any evidence
- Most likely explanation
  - Scenario that explains the evidence
- Rational decision making
  - Maximize expected utility
  - Value of information
- Effect of intervention
  - Causal analysis

(Figure from N. Friedman: the Earthquake/Burglary/Alarm/Call network, illustrating the explaining-away effect.)

## A Real Bayes Net: Alarm

Domain: monitoring intensive-care patients

- 37 variables
- 509 parameters (instead of 2^37)

(Figure from N. Friedman: the ALARM network, with nodes such as HR, BP, CO, SAO2, PCWP, INTUBATION, VENTMACH, HYPOVOLEMIA, LVFAILURE, etc.)

## More Real-World Bayes Net Applications

"Microsoft's competitive advantage lies in its expertise in Bayesian networks"
-- Bill Gates, quoted in LA Times, 1996

- Troubleshooters
- Medical diagnosis
- Speech recognition (HMMs)
- Gene sequence/expression analysis
- Turbocodes (channel coding)

## Conditional Independence

A node is conditionally independent of its ancestors given its parents.

What about independence between variables that aren't directly connected (e.g., Burglary and Earthquake)?

(Network: Earthquake → Alarm ← Burglary, Alarm → Call)

## d-separation

Criterion for deciding if nodes are conditionally independent.

A path from node u to node v is d-separated by a set of nodes Z if the path matches one of these templates (with z ∈ Z except where noted):

- u → z → v
- u ← z ← v
- u ← z → v
- u → z ← v, where neither z nor any descendant of z is in Z

## Conditional Independence

Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.

(Figure: an example graph in which every path from u to v passes through a node of Z.)

## d-separation Along Paths

For paths involving more than one intermediate node, the path is d-separated if the outer two nodes of any triple are d-separated.

(Figure: three example paths through intermediate nodes z; the first two are d-separated, the third is not.)
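A d-separation claim can be sanity-checked numerically by enumerating the joint of a small net. A sketch on the Earthquake/Burglary/Alarm/Call network (all CPD values hypothetical), verifying that Call is independent of Burglary given Alarm, as the chain B → A → C predicts:

```python
import itertools

# Hypothetical CPDs for the B -> A <- E, A -> C network.
p_b, p_e = 0.05, 0.01
p_a = {(1, 1): 0.9, (1, 0): 0.9, (0, 1): 0.2, (0, 0): 0.01}  # P(a=1 | b, e)
p_c = {1: 0.8, 0: 0.05}                                      # P(c=1 | a)

def joint(b, e, a, c):
    f = lambda p, v: p if v else 1 - p
    return f(p_b, b) * f(p_e, e) * f(p_a[(b, e)], a) * f(p_c[a], c)

def cond(c, b, a):
    # P(C=c | B=b, A=a), computed by summing out E
    num = sum(joint(b, e, a, c) for e in (0, 1))
    den = sum(joint(b, e, a, cc) for e in (0, 1) for cc in (0, 1))
    return num / den

# Given A, further conditioning on B changes nothing: C is d-separated
# from B by A.
for a in (0, 1):
    assert abs(cond(1, 0, a) - cond(1, 1, a)) < 1e-12
```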

(Figure: d-separation examples highlighted on two copies of the ALARM network.)

## Sufficiency For Conditional Independence: Markov Blanket

The Markov blanket of node u consists of the parents, children, and children's parents of u.

P(u | MB(u), v) = P(u | MB(u))
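A Markov blanket can be read off a DAG mechanically. A sketch, representing the alarm net (used here as a running example) as a child → parents map:

```python
# Markov blanket = parents + children + children's other parents.
parents = {
    "A": ["B", "E"],   # Alarm has parents Burglary, Earthquake
    "C": ["A"],        # Call has parent Alarm
    "B": [], "E": [],
}

def markov_blanket(u):
    blanket = set(parents.get(u, []))                    # parents of u
    children = [v for v, ps in parents.items() if u in ps]
    blanket.update(children)                             # children of u
    for c in children:                                   # co-parents
        blanket.update(p for p in parents[c] if p != u)
    return blanket

# e.g., MB(E) = {A, B}: E's child A brings in co-parent B
```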

## Probabilistic Models

Probabilistic graphical models divide into directed and undirected families.

Directed (Bayesian belief nets):
- Alarm network
- State-space models
- HMMs
- Naïve Bayes classifier
- PCA/ICA

Undirected (Markov nets):
- Markov random field
- Boltzmann machine
- Ising model
- Max-ent model
- Log-linear models

## Turning A Directed Graphical Model Into An Undirected Model Via Moralization

Moralization: connect ("marry") all parents of each node, then drop the arrow directions.
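A sketch of moralization on the same style of child → parents map, again using the alarm net as the example:

```python
import itertools

parents = {"A": ["B", "E"], "C": ["A"], "B": [], "E": []}  # alarm net

def moralize(parents):
    edges = set()
    for child, ps in parents.items():
        for p in ps:                          # keep every original edge
            edges.add(frozenset((p, child)))
        for p, q in itertools.combinations(ps, 2):
            edges.add(frozenset((p, q)))      # marry co-parents
    return edges

moral = moralize(parents)
# B-E is the new "moral" edge; B-A, E-A, A-C are the undirected originals
```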

## Toy Example Of A Markov Net

(Figure: a small undirected graph over X1, …, X5.)

e.g., X1 ⊥ X4, X5 | X2, X3

In general, Xi ⊥ X_rest | X_nbrs

P(x) = (1/Z) ∏_c ψ_c(x_c)

where ψ_c is a potential function over clique c and Z is the partition function.

Clique: (largest) subset of vertices such that each pair is connected by an edge.
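The factored form above can be made concrete by brute-force enumeration on a tiny binary net (structure and potential values are hypothetical):

```python
import itertools

# Toy pairwise Markov net: P(x) = (1/Z) * prod over edges psi(x_i, x_j).
edges = [(0, 1), (1, 2), (0, 2)]        # a 3-node fully connected graph

def psi(xi, xj):
    return 2.0 if xi == xj else 1.0     # potential favoring agreement

def unnorm(x):
    p = 1.0
    for i, j in edges:
        p *= psi(x[i], x[j])
    return p

# Partition function Z: sum of unnormalized potentials over all 2^3 states
Z = sum(unnorm(x) for x in itertools.product((0, 1), repeat=3))
prob_all_equal = 2 * unnorm((0, 0, 0)) / Z   # P(all three nodes agree)
```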

## A Real Markov Net

Estimate P(x1, …, xn | y1, …, yn)

- Ψ(xi, yi) = P(yi | xi): local evidence likelihood
- Ψ(xi, xj) = exp(−J(xi, xj)): compatibility matrix

(Figure: a grid of observed pixels yi, each attached to a latent cause xi.)

## Example Of Image Segmentation With MRFs

Szirányi et al. (2000)

## Graphical Models Are A Useful Formalism

E.g., the naïve Bayes model of Assignment 2: D → Rx, Ry, Bx, By.

Marginalizing over D and applying the definition of conditional probability gives the posterior over directions.

## Graphical Models Are A Useful Formalism

E.g., a feedforward neural net with noise, i.e., a sigmoid belief net (input layer → hidden layer → output layer).

## Graphical Models Are A Useful Formalism

E.g., the restricted Boltzmann machine (Hinton), also known as a harmony network (Smolensky): a bipartite graph of hidden and visible units.

## Graphical Models Are A Useful Formalism

E.g., the Gaussian mixture model.

## Graphical Models Are A Useful Formalism

E.g., dynamical (time-varying) models in which data arrives sequentially or output is produced as a sequence.

Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data.

Special cases of DBNs include:
- Hidden Markov models (HMMs)
- State-space models
Model
(HMM)

Y
1

Y
3

X
1

X
2

X
3

Y
2

Phones/ words

acoustic signal

transition

matrix

Gaussian

observations
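Filtering in an HMM is the discrete predict/update recursion, i.e., the forward algorithm. A sketch with hypothetical two-state parameters and discrete (rather than Gaussian) observations:

```python
# Forward algorithm for a two-state discrete HMM (all parameters hypothetical):
# computes the filtering distribution P(X_t | y_1, ..., y_t).
T = [[0.9, 0.1],        # transition matrix P(X_t | X_{t-1})
     [0.2, 0.8]]
E = [[0.7, 0.3],        # emission matrix P(Y_t | X_t)
     [0.1, 0.9]]
prior = [0.5, 0.5]

def forward(obs):
    # Initialize with prior * evidence, then normalize
    alpha = [prior[s] * E[s][obs[0]] for s in range(2)]
    z = sum(alpha); alpha = [a / z for a in alpha]
    for y in obs[1:]:
        # Predict: push belief through the transition matrix
        pred = [sum(T[sp][s] * alpha[sp] for sp in range(2)) for s in range(2)]
        # Update: weight by the observation likelihood and renormalize
        alpha = [pred[s] * E[s][y] for s in range(2)]
        z = sum(alpha); alpha = [a / z for a in alpha]
    return alpha

belief = forward([0, 0, 1])   # P(X_3 | y_1, y_2, y_3)
```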

## State-Space Model (SSM) / Linear Dynamical System (LDS)

(Figure: "true" states X1 → X2 → X3 with noisy observations Y1, Y2, Y3.)

## Example: LDS For 2D Tracking

A sparse linear-Gaussian system.

(Figure: sparse transition and observation matrices relating the 2D states X and observations y at successive times, with process noise Q and observation noise R.)

## Kalman Filtering (Recursive State Estimation In An LDS)

(Figure: states X1 → X2 → X3 with observations Y1, Y2, Y3.)

Estimate P(X_t | y_{1:t}) from P(X_{t−1} | y_{1:t−1}) and y_t.

Predict: P(X_t | y_{1:t−1}) = Σ_{X_{t−1}} P(X_t | X_{t−1}) P(X_{t−1} | y_{1:t−1})

Update: P(X_t | y_{1:t}) ∝ P(y_t | X_t) P(X_t | y_{1:t−1})
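When all densities are linear-Gaussian, the predict/update recursion has a closed form: the Kalman filter. A minimal 1D sketch with a random-walk state (all parameter values hypothetical):

```python
# Minimal 1D Kalman filter: random-walk dynamics, direct noisy observation.
q, r = 0.1, 1.0            # process and observation noise variances

def kalman_step(mu, var, y):
    # Predict: push the Gaussian belief through the dynamics (identity + q)
    mu_pred, var_pred = mu, var + q
    # Update: combine prediction with observation y (noise variance r)
    k = var_pred / (var_pred + r)          # Kalman gain
    mu_new = mu_pred + k * (y - mu_pred)
    var_new = (1 - k) * var_pred
    return mu_new, var_new

mu, var = 0.0, 10.0        # broad initial belief
for y in [1.2, 0.9, 1.1, 1.0]:
    mu, var = kalman_step(mu, var, y)
# the posterior mean moves toward the observations and the variance shrinks
```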

## Mike's Project of the Week

IRT model (plate diagram over students, trials, and problems; observed responses X, variables G and P, parameters α and δ).

## Mike's Project of the Week

BKT model (plate diagram over students and trials; observed responses X, parameters L0, T, τ, G, S).

## Mike's Project of the Week

IRT+BKT model (plate diagram combining the two: observed responses X, parameters L0, T, τ, α, δ, η, γ, σ, G, S over students, trials, and problems).

## Why Are Bayes Nets Useful?

- Factored representation may have exponentially fewer parameters than the full joint
  - Lower time complexity (i.e., easier inference)
  - Lower sample complexity (i.e., less data for learning)
- Graph structure supports
  - Modular representation of knowledge
  - Local, distributed algorithms for inference and learning
  - Intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data
  - Can't represent arbitrary contingencies among variables, so the theory can be rejected by data