Guidance: Assignment 3 Part 1

Relevant MATLAB functions in the Statistics Toolbox: betacdf, betapdf, betarnd, betastat, betafit
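
If you are working in Python instead of MATLAB, scipy.stats.beta provides analogous calls; a minimal sketch (the shape parameters 2 and 5 are arbitrary illustrations, not assignment values):

```python
from scipy import stats

a, b = 2.0, 5.0                      # Beta shape parameters (illustrative)
rv = stats.beta(a, b)

print(rv.cdf(0.3))                   # like betacdf(0.3, a, b)
print(rv.pdf(0.3))                   # like betapdf(0.3, a, b)
samples = rv.rvs(size=1000)          # like betarnd(a, b, ...)
print(rv.stats(moments='mv'))        # like betastat(a, b): mean, variance
print(stats.beta.fit(samples, floc=0, fscale=1))  # like betafit(samples)
```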

Guidance: Assignment 3 Part 2

You will explore the role of the priors.

The Weiss model showed that priors play an important role when:

- observations are noisy
- observations don't provide strong constraints
- there aren't many observations.

Guidance: Assignment 3 Part 3

Implement a model a bit like Weiss et al. (2002).

Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time.

Assume distinctive features that make it easy to identify the location of each feature at successive times.

Assignment 2 Guidance

- Bx: the x displacement of the blue square (= delta x in one unit of time)
- By: the y displacement of the blue square
- Rx: the x displacement of the red square
- Ry: the y displacement of the red square
- These observations are corrupted by measurement noise: Gaussian, mean zero, standard deviation σ
- D: direction of motion (up, down, left, right)
- Assume the only possibilities are one unit of motion in any direction

Assignment 2: Generative Model

- Rx conditioned on D = up is drawn from a Gaussian.
- Same assumptions for Bx, By.
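
A minimal sketch of this generative process in Python (the unit displacements and noise model follow the slides; σ is a free parameter):

```python
import numpy as np

# Map each direction to the true (dx, dy) unit displacement.
DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def sample_observations(sigma, rng=np.random.default_rng()):
    """Sample D uniformly, then noisy displacements for both squares."""
    d = rng.choice(list(DIRECTIONS))
    dx, dy = DIRECTIONS[d]
    # Each observed displacement = true displacement + Gaussian noise (mean 0, std sigma).
    bx, by = dx + rng.normal(0, sigma), dy + rng.normal(0, sigma)
    rx, ry = dx + rng.normal(0, sigma), dy + rng.normal(0, sigma)
    return d, (bx, by, rx, ry)
```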

Assignment 2 Math

Key fact: the observations are conditionally independent given D.
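
The slide's equations did not survive extraction; a reconstruction of the factorization that this conditional-independence assumption licenses (consistent with the naïve Bayes figure later in the deck):

```latex
P(D \mid B_x, B_y, R_x, R_y)
  \propto P(D)\, P(B_x \mid D)\, P(B_y \mid D)\, P(R_x \mid D)\, P(R_y \mid D),
\qquad
P(B_x \mid D = d) = \mathcal{N}\!\left(B_x;\, \Delta x(d),\, \sigma^2\right)
```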

Assignment 2 Implementation

Quiz: do we need to worry about the Gaussian density function's normalization term?
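
A minimal sketch of the posterior computation in Python; one way to answer the quiz for yourself is to compare the result with and without the Gaussian normalizer:

```python
import numpy as np

DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def posterior_over_D(obs, sigma):
    """P(D | Bx, By, Rx, Ry) with a uniform prior over the four directions."""
    bx, by, rx, ry = obs
    scores = {}
    for d, (dx, dy) in DIRECTIONS.items():
        # Unnormalized likelihood: the Gaussian normalizer is identical for
        # every hypothesis D (same sigma), so it cancels when we normalize below.
        err = (bx - dx)**2 + (by - dy)**2 + (rx - dx)**2 + (ry - dy)**2
        scores[d] = np.exp(-err / (2 * sigma**2))
    z = sum(scores.values())
    return {d: s / z for d, s in scores.items()}
```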

Introduction To Bayes Nets

(Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

What Do You Need To Do Probabilistic Inference In A Given Domain?

A joint probability distribution over all variables in the domain.

Qualitative part:
- Directed acyclic graph (DAG)
- Nodes: random variables
- Edges: direct influence

Quantitative part:
- Set of conditional probability distributions

Example CPT, P(A | E, B), reconstructed from the scattered table entries:

  E    B    P(a)   P(¬a)
  e    b    0.9    0.1
  e    ¬b   0.2    0.8
  ¬e   b    0.9    0.1
  ¬e   ¬b   0.01   0.99

This table is the family of Alarm.

[Figure: the burglary network — nodes Earthquake, Radio, Burglary, Alarm, Call.]

Bayes Nets (a.k.a. Belief Nets)

- Compact representation of joint probability distributions via conditional independence
- Together, the qualitative and quantitative parts define a unique distribution in a factored form

Figure from N. Friedman
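
For the burglary network, the factored form follows directly from the DAG (it is not spelled out in the surviving text, but it is standard):

```latex
P(B, E, A, C, R) = P(B)\, P(E)\, P(A \mid B, E)\, P(R \mid E)\, P(C \mid A)
```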

What Is A Bayes Net?

[Figure: the burglary network — Earthquake, Radio, Burglary, Alarm, Call.]

A node is conditionally independent of its ancestors given its parents.

- E.g., C is conditionally independent of R, E, and B given A
- Notation: C ⊥ R, B, E | A

Quiz: What sort of parameter reduction do we get?

From 2^5 − 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10 (one each for P(B) and P(E), two for P(R|E), four for P(A|B,E), two for P(C|A)).

Conditional Distributions Are Flexible

- E.g., Earthquake and Burglary might have independent effects on Alarm
- A.k.a. noisy-or
- where p_B and p_E are the probabilities that burglary alone and earthquake alone trigger the alarm
- This constraint reduces the number of free parameters to 8!

[Figure: Earthquake and Burglary each pointing to Alarm.]
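
The noisy-or equation itself is missing from the extracted text; the standard form it almost certainly took, given the definitions of p_B and p_E above:

```latex
P(A = 1 \mid B, E) = 1 - (1 - p_B)^{B}\,(1 - p_E)^{E},
\qquad B, E \in \{0, 1\}
```

With this constraint the Alarm CPT needs only 2 parameters instead of 4, hence 10 − 2 = 8 total.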

Why Are Bayes Nets Useful?

- Factored representation may have exponentially fewer parameters than the full joint
  - Lower time complexity (i.e., easier inference)
  - Lower sample complexity (i.e., less data for learning)
- Graph structure supports
  - Modular representation of knowledge
  - Local, distributed algorithms for inference and learning
  - Intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data
  - Can't represent arbitrary contingencies among variables, so the theory can be rejected by data


Inference

- Computing posterior probabilities
  - Probability of hidden events given any evidence
- Most likely explanation
  - Scenario that explains the evidence
- Rational decision making
  - Maximize expected utility
  - Value of information
- Effect of intervention
  - Causal analysis

[Figure: the burglary network with Radio and Call observed, illustrating the explaining-away effect. Figure from N. Friedman.]
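
A minimal sketch of posterior inference by enumeration on the burglary network. Only P(A | E, B) comes from the slide; the priors and the CPTs for Call and Radio below are invented for illustration. It also demonstrates explaining away: evidence for an earthquake (the radio report) lowers the posterior probability of burglary given the alarm.

```python
import itertools

# CPTs. P(A|E,B) is from the slide; priors and P(C|A), P(R|E) are illustrative.
p_b, p_e = 0.01, 0.02
p_a = {(1, 1): 0.9, (1, 0): 0.2, (0, 1): 0.9, (0, 0): 0.01}  # key: (E, B)
p_c = {1: 0.7, 0: 0.05}   # P(C=1 | A)
p_r = {1: 0.6, 0: 0.01}   # P(R=1 | E)

def joint(b, e, a, c, r):
    """P(B,E,A,C,R) in factored form."""
    pa = p_a[(e, b)]
    return ((p_b if b else 1 - p_b) * (p_e if e else 1 - p_e)
            * (pa if a else 1 - pa) * (p_c[a] if c else 1 - p_c[a])
            * (p_r[e] if r else 1 - p_r[e]))

def posterior(query, evidence):
    """P(query var = 1 | evidence), summing the joint over hidden variables."""
    names = ("b", "e", "a", "c", "r")
    num = den = 0.0
    for vals in itertools.product((0, 1), repeat=5):
        world = dict(zip(names, vals))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*vals)
        den += p
        if world[query] == 1:
            num += p
    return num / den

print(posterior("b", {"a": 1}))            # alarm raises belief in burglary
print(posterior("b", {"a": 1, "r": 1}))    # radio report explains the alarm away
```

Explaining away shows up as the second probability being much lower than the first.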

A Real Bayes Net: ALARM

- Domain: monitoring intensive-care patients
- 37 variables
- 509 parameters
- …instead of 2^37

[Figure: the ALARM network — 37 nodes including HR, BP, CO, SAO2, PCWP, VENTLUNG, INTUBATION, PULMEMBOLUS, HYPOVOLEMIA, and others. Figure from N. Friedman.]

More Real-World Bayes Net Applications

- "Microsoft's competitive advantage lies in its expertise in Bayesian networks"
  -- Bill Gates, quoted in the LA Times, 1996
- MS Answer Wizards, (printer) troubleshooters
- Medical diagnosis
- Speech recognition (HMMs)
- Gene sequence/expression analysis
- Turbocodes (channel coding)

Conditional Independence

- A node is conditionally independent of its ancestors given its parents.
- What about conditional independence between variables that aren't directly connected (e.g., Burglary and Radio)?

[Figure: the burglary network — Earthquake, Radio, Burglary, Alarm, Call.]

d-separation

- Criterion for deciding if nodes are conditionally independent.
- A path from node u to node v is d-separated by a set of nodes Z if the path matches one of these templates:

  u → z → v        (z ∈ Z)
  u ← z ← v        (z ∈ Z)
  u ← z → v        (z ∈ Z)
  u → z ← v        (z ∉ Z, and no descendant of z is in Z)

Conditional Independence

- Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z.

[Figure: example graph in which u and v are connected through several z nodes.]

d-separation Along Paths

- For paths involving > 1 intermediate node, the path is d-separated if the outer two nodes of any triple are d-separated.

[Figure: three example paths from u to v through z nodes; the first two are d-separated, the third is not.]
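
A minimal sketch of a path-based d-separation check in Python, applying the triple templates above literally. This is fine for tiny graphs like the ones here; practical implementations use the linear-time Bayes-ball algorithm instead.

```python
def d_separated(children, u, v, Z):
    """True if u and v are d-separated by Z in the DAG given as
    {node: set(children)}. Small graphs only: enumerates all simple paths."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    undirected = {n: set() for n in nodes}
    for p, kids in children.items():
        for c in kids:
            undirected[p].add(c)
            undirected[c].add(p)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def blocked(a, z, b):
        a_to_z = z in children.get(a, ())
        b_to_z = z in children.get(b, ())
        if a_to_z and b_to_z:            # collider: a -> z <- b
            return z not in Z and not (descendants(z) & set(Z))
        return z in Z                    # chain or fork

    def paths(a, seen):
        if a == v:
            yield [a]
            return
        for nxt in undirected[a] - seen:
            for rest in paths(nxt, seen | {nxt}):
                yield [a] + rest

    return all(any(blocked(*p[i:i + 3]) for i in range(len(p) - 2))
               for p in paths(u, {u}))

# The burglary network: C is d-separated from R given A,
# but B and E are NOT d-separated given A (unblocked collider).
g = {"B": {"A"}, "E": {"A", "R"}, "A": {"C"}, "R": set(), "C": set()}
print(d_separated(g, "C", "R", {"A"}))   # True
print(d_separated(g, "B", "E", {"A"}))   # False
```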


Sufficiency For Conditional Independence: Markov Blanket

The Markov blanket of node u consists of the parents, children, and children's parents of u.

P(u | MB(u), v) = P(u | MB(u))

[Figure: the ALARM network with a node u and its Markov blanket highlighted.]

Probabilistic Models

Probabilistic models ⊃ graphical models:

- Directed (Bayesian belief nets): Alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA
- Undirected (Markov nets): Markov Random Field, Boltzmann machine, Ising model, max-ent model, log-linear models

Turning A Directed Graphical Model Into An Undirected Model Via Moralization

Moralization: connect all parents of each node and remove arrows.
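
A minimal sketch of moralization in Python (the graph is encoded as {node: set(parents)}, an arbitrary choice for this example):

```python
import itertools

def moralize(parents):
    """Return undirected adjacency {node: set(neighbors)} for the moral graph."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:                                  # drop arrow directions
            adj[child].add(p)
            adj[p].add(child)
        for a, b in itertools.combinations(ps, 2):    # "marry" co-parents
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Burglary network: moralization adds the B-E edge (co-parents of A).
print(moralize({"A": {"B", "E"}, "R": {"E"}, "C": {"A"}}))
```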

Toy Example Of A Markov Net

[Figure: Markov net over X1, …, X5.]

- e.g., X1 ⊥ X4, X5 | X2, X3
- In general: Xi ⊥ Xrest | Xnbrs
- Clique: (largest) subset of vertices such that each pair is connected by an edge
- The joint factors over cliques: P(x) = (1/Z) ∏_c ψ_c(x_c), where ψ_c is a potential function and Z is the partition function
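
A minimal sketch of computing the partition function by brute force for a tiny binary Markov net (the graph and pairwise potentials here are illustrative, not from the slides):

```python
import itertools
import numpy as np

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]   # toy 5-node graph
psi = np.array([[2.0, 0.5],
                [0.5, 2.0]])    # pairwise potential favoring agreement

def unnormalized(x):
    """Product of clique potentials — here, one potential per edge."""
    return np.prod([psi[x[i], x[j]] for i, j in edges])

# Partition function: sum over all 2^5 joint configurations.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=5))
p = lambda x: unnormalized(x) / Z      # normalized joint probability
print(Z, p((0, 0, 0, 0, 0)))
```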

A Real Markov Net

- Estimate P(x_1, …, x_n | y_1, …, y_n)
- Ψ(x_i, y_i) = P(y_i | x_i): local evidence likelihood
- Ψ(x_i, x_j) = exp(−J(x_i, x_j)): compatibility matrix
- Observed pixels y; latent causes x

Example Of Image Segmentation With MRFs: Sziranyi et al. (2000)

Graphical Models Are A Useful Formalism

- E.g., the Naïve Bayes model: class D with conditionally independent observations Bx, By, Rx, Ry (the Assignment 2 model)

[Figure: D with arrows to Bx, By, Rx, Ry.]

- Marginalizing over D: P(Bx, By, Rx, Ry) = Σ_D P(D) P(Bx|D) P(By|D) P(Rx|D) P(Ry|D)
- Definition of conditional probability: P(D | Bx, By, Rx, Ry) = P(D, Bx, By, Rx, Ry) / P(Bx, By, Rx, Ry)

Graphical Models Are A Useful Formalism

- E.g., a feedforward neural net
- With noise: a sigmoid belief net

[Figure: input layer → hidden layer → output layer.]

Graphical Models Are A Useful Formalism

- E.g., Restricted Boltzmann machine (Hinton)
- Also known as Harmony network (Smolensky)

[Figure: bipartite graph of hidden units and visible units.]

Graphical Models Are A Useful Formalism

- E.g., Gaussian Mixture Model
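
A minimal sketch of the GMM as a directed generative model (latent component z → observation x; all mixture parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])                      # P(z)
means, stds = np.array([-2.0, 3.0]), np.array([1.0, 0.5])

# Generate: sample the latent component, then the observation given it.
z = rng.choice(2, size=1000, p=weights)
x = rng.normal(means[z], stds[z])

def responsibilities(x):
    """P(z | x): Bayes rule, marginalizing the joint over components."""
    lik = np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    joint = weights * lik
    return joint / joint.sum(axis=1, keepdims=True)

print(responsibilities(x[:5]))
```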

Graphical Models Are A Useful Formalism

- E.g., dynamical (time varying) models in which data arrives sequentially or output is produced as a sequence
- Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data
- Special cases of DBNs include
  - Hidden Markov Models (HMMs)
  - State-space models

Hidden Markov Model (HMM)

[Figure: chain X1 → X2 → X3 of hidden states (e.g., phones/words) emitting Y1, Y2, Y3 (acoustic signal); a transition matrix links successive states, with Gaussian observations.]
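
A minimal sketch of the HMM forward recursion in Python (discrete observations for simplicity, rather than the Gaussian emissions mentioned above; all parameter values are illustrative):

```python
import numpy as np

A = np.array([[0.9, 0.1],     # transition matrix P(x_t | x_{t-1})
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],     # emission matrix P(y_t | x_t)
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])     # initial state distribution

def forward(ys):
    """Filtered state posteriors P(x_t | y_{1:t}), normalized each step."""
    alpha = pi * B[:, ys[0]]
    alpha /= alpha.sum()
    out = [alpha]
    for y in ys[1:]:
        alpha = (alpha @ A) * B[:, y]   # predict, then weight by evidence
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)

print(forward([0, 0, 1, 1, 1]))
```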

State-Space Model (SSM) / Linear Dynamical System (LDS)

[Figure: chain X1 → X2 → X3 ("true" state) with emissions Y1, Y2, Y3 (noisy observations).]

Example: LDS For 2D Tracking

[Figure: the transition and observation matrices written out for 2D tracking — a sparse linear-Gaussian system. The state holds the position components x1, x2 (and their velocities), with process noise terms Q and observation noise terms R:

  X_t = A X_{t−1} + q_t,  q_t ~ N(0, Q)
  Y_t = C X_t + r_t,      r_t ~ N(0, R)  ]

Kalman Filtering (Recursive State Estimation In An LDS)

[Figure: the X → Y chain again.]

Estimate P(X_t | y_{1:t}) from P(X_{t−1} | y_{1:t−1}) and y_t:

- Predict: P(X_t | y_{1:t−1}) = ∫ P(X_t | X_{t−1}) P(X_{t−1} | y_{1:t−1}) dX_{t−1}
- Update: P(X_t | y_{1:t}) ∝ P(y_t | X_t) P(X_t | y_{1:t−1})
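
A minimal sketch of these predict/update steps in their closed Gaussian form (standard Kalman filter algebra; the 2D-tracking matrices and noise levels are illustrative):

```python
import numpy as np

dt = 1.0
A = np.array([[1, 0, dt, 0],      # state: (x1, x2, dx1, dx2)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
C = np.array([[1, 0, 0, 0],       # observe positions only
              [0, 1, 0, 0]], float)
Q, R = np.eye(4) * 0.01, np.eye(2) * 0.25   # process / observation noise

def kalman_step(mu, Sigma, y):
    """One predict + update cycle: P(X_t | y_{1:t}) from P(X_{t-1} | y_{1:t-1})."""
    # Predict: push the Gaussian through the linear dynamics.
    mu_pred = A @ mu
    S_pred = A @ Sigma @ A.T + Q
    # Update: condition on the new observation y_t.
    K = S_pred @ C.T @ np.linalg.inv(C @ S_pred @ C.T + R)   # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    S_new = (np.eye(len(mu)) - K @ C) @ S_pred
    return mu_new, S_new

mu, Sigma = np.zeros(4), np.eye(4)
for y in [np.array([0.1, 0.0]), np.array([1.2, 0.9]), np.array([2.0, 2.1])]:
    mu, Sigma = kalman_step(mu, Sigma, y)
print(mu)
```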

Mike's Project of the Week

[Plate diagram: IRT model — variables G, X, α, P, δ; plates over student, trial, and problem.]

Mike's Project of the Week

[Plate diagram: BKT model — variables X, L0, T, τ, G, S; plates over student and trial.]
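
A minimal sketch of the textbook BKT filtering update, using the parameter names in the diagram (L0 = initial knowledge, T = learning rate, G = guess, S = slip); τ and the exact variant in Mike's model aren't recoverable from the slides, so treat this as the standard version only:

```python
def bkt_filter(observations, L0=0.3, T=0.1, G=0.2, S=0.1):
    """P(student knows the skill) after each observed response (1 = correct)."""
    pL = L0
    history = []
    for correct in observations:
        # Condition on the response via Bayes rule (guess/slip likelihoods).
        if correct:
            pL = pL * (1 - S) / (pL * (1 - S) + (1 - pL) * G)
        else:
            pL = pL * S / (pL * S + (1 - pL) * (1 - G))
        # Learning transition: the skill may be acquired between trials.
        pL = pL + (1 - pL) * T
        history.append(pL)
    return history

print(bkt_filter([1, 0, 1, 1, 1]))
```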

Mike's Project of the Week

[Plate diagram: IRT+BKT model — variables X, γ, σ, L0, T, τ, α, P, δ, η, G, S; plates over student, trial, and problem.]

Why Are Bayes Nets Useful?

- Factored representation may have exponentially fewer parameters than the full joint
  - Lower time complexity (i.e., easier inference)
  - Lower sample complexity (i.e., less data for learning)
- Graph structure supports
  - Modular representation of knowledge
  - Local, distributed algorithms for inference and learning
  - Intuitive (possibly causal) interpretation
- Strong theory about the nature of cognition or the generative process that produces observed data
  - Can't represent arbitrary contingencies among variables, so the theory can be rejected by data