Kalman Filters and Dynamic Bayesian Networks
Markoviana Reading Group
Srinivas Vadrevu
Arizona State University
Source 1
Introduction to Kalman Filters
CEE 6430: Probabilistic Methods in Hydrosciences
Fall 2008
Acknowledgements: Numerous sources on WWW,
book, papers
Source 2
Markoviana Reading Group: Week3
Outline
Introduction
Gaussian Distribution
– Introduction
– Examples (Linear and Multivariate)
Kalman Filters
– General Properties
– Updating Gaussian Distributions
– One-dimensional Example
– Notes about general case
– Applicability of Kalman Filtering
Dynamic Bayesian Networks (DBNs)
– Introduction
– DBNs and HMMs
– DBNs and Kalman Filters
– Constructing DBNs
A “Hydro” Example
Suppose you have a hydrologic model that predicts river
water level every hour (using the usual inputs).
You know that your model is not perfect and you don’t
trust it 100%. So you want to send someone to check
the river level in person.
However, the river level can only be checked once a day
around noon and not every hour.
Furthermore, the person who measures the river level
cannot be trusted 100% either.
So how do you combine both outputs of river level (from
model and from measurement) so that you get a ‘fused’
and better estimate?
– Kalman filtering
Graphically speaking
What is a Filter by the way?
Other applications of Kalman
Filtering (or Filtering in general):
1) Your car GPS (predict and update location)
2) Surface-to-air missile (hitting the target)
3) Ship or rocket navigation (Apollo 11 used some sort of filtering to make sure it didn’t miss the Moon!)
The Problem in General
(let’s get a little more technical)
System state cannot be measured directly
Need to estimate “optimally” from measurements
[Figure: block diagram. External controls and system error sources act on the system (a black box), whose state is desired but not known. Measuring devices, affected by measurement error sources, produce the observed measurements. The estimator turns these into an optimal estimate of the system state.]
Sometimes the system state and the measurement may be two different things (unlike the river level example)
What is a Kalman Filter?
Recursive data processing algorithm
Generates optimal estimate of desired quantities given the set of measurements
Optimal?
– For linear system and white Gaussian errors, Kalman filter is “best” estimate based on all previous measurements
– For non-linear system optimality is ‘qualified’
Recursive?
– Doesn’t need to store all previous measurements and reprocess all data each time step
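The "recursive" property can be illustrated with a minimal sketch. It assumes the simplest possible case (estimating a constant from equally trusted measurements), where the gain reduces to 1/k; the function name and this special case are assumptions for illustration, not part of the slides.

```python
def recursive_mean(measurements):
    """Update a running estimate one measurement at a time.

    Only the current estimate and the count are kept, never the
    full measurement history.
    """
    estimate = 0.0
    for k, z in enumerate(measurements, start=1):
        gain = 1.0 / k  # weight given to the newest measurement
        estimate = estimate + gain * (z - estimate)  # old estimate + gain * residual
    return estimate


batch = sum([2.0, 4.0, 9.0]) / 3
recursive = recursive_mean([2.0, 4.0, 9.0])
```

Both `batch` and `recursive` give the same value, but the recursive version never revisits old data.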
Conceptual Overview
Simple example to motivate the workings of the Kalman Filter
The essential equations you need to know (Kalman Filtering for Dummies!)
Examples: Prediction and Correction
Conceptual Overview
Lost on the 1-dimensional line (imagine that you are guessing your position by looking at the stars using a sextant)
Position – y(t)
Assume Gaussian distributed measurements
[Figure: a one-dimensional position axis y]
Conceptual Overview
[Figure: Gaussian probability density over position; horizontal axis 0 to 100, vertical axis 0 to 0.16]
• Sextant measurement at $t_1$: mean = $z_1$ and variance = $\sigma^2_{z_1}$
• Optimal estimate of position is: $\hat{y}(t_1) = z_1$
• Variance of error in estimate: $\sigma^2_x(t_1) = \sigma^2_{z_1}$
• Boat in same position at time $t_2$: predicted position is $z_1$
State space – position
Measurement – position
Sextant is not perfect
Conceptual Overview
[Figure: two Gaussian densities over position, horizontal axis 0 to 100, vertical axis 0 to 0.16: the prediction $\hat{y}^-(t_2)$ (state, by looking at the stars at $t_2$) and the measurement $z(t_2)$ using GPS]
• So we have the prediction $\hat{y}^-(t_2)$
• GPS measurement at $t_2$: mean = $z_2$ and variance = $\sigma^2_{z_2}$
• Need to correct the sextant-based prediction using the measurement to get $\hat{y}(t_2)$
• Closer to more trusted measurement – should we do linear interpolation?
Conceptual Overview
[Figure: three Gaussian densities over position, horizontal axis 0 to 100, vertical axis 0 to 0.16: prediction $\hat{y}^-(t_2)$, measurement $z(t_2)$, and corrected optimal estimate $\hat{y}(t_2)$]
• Corrected mean is the new optimal estimate of position (basically you have ‘updated’ the predicted position by sextant using GPS)
• New variance is smaller than either of the previous two variances
Kalman filter helps you fuse measurement and prediction on the basis of how much you trust each (I would trust the GPS more than the sextant)
Conceptual Overview
(The Kalman Equations)
Lessons so far:
Make prediction based on previous data: $\hat{y}^-$, $\sigma^-$
Take measurement: $z_k$, $\sigma_z$
Optimal estimate ($\hat{y}$) = Prediction + (Kalman Gain) * (Measurement – Prediction)
Variance of estimate = Variance of prediction * (1 – Kalman Gain)
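A minimal sketch of these two word equations for the one-dimensional boat example. It assumes the state is measured directly, so the gain is prediction variance divided by the sum of prediction and measurement variances; the function name and the numbers are illustrative assumptions.

```python
def fuse(pred_mean, pred_var, meas_mean, meas_var):
    """Blend a Gaussian prediction with a Gaussian measurement (1-D case)."""
    gain = pred_var / (pred_var + meas_var)            # Kalman gain, between 0 and 1
    mean = pred_mean + gain * (meas_mean - pred_mean)  # prediction + gain * residual
    var = pred_var * (1.0 - gain)                      # never larger than pred_var
    return mean, var, gain


# Sextant-based prediction (less trusted) fused with a GPS measurement (more trusted).
mean, var, gain = fuse(pred_mean=10.0, pred_var=4.0, meas_mean=12.0, meas_var=1.0)
```

With these assumed numbers the fused mean lands closer to the GPS value, and the fused variance is smaller than both input variances.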
Conceptual Overview
[Figure: two Gaussian densities over position, horizontal axis 0 to 100, vertical axis 0 to 0.16: $\hat{y}(t_2)$ and the naïve prediction (sextant) $\hat{y}^-(t_3)$]
What if the boat was now moving?
• At time $t_3$, boat moves with velocity dy/dt = u
• Naïve approach: shift probability to the right to predict
• This would work if we knew the velocity exactly (perfect model)
Conceptual Overview
[Figure: three Gaussian densities over position, horizontal axis 0 to 100, vertical axis 0 to 0.16: $\hat{y}(t_2)$, the naïve prediction $\hat{y}^-(t_3)$, and the wider prediction $\hat{y}^-(t_3)$]
But you may not be so sure about the exact velocity
• Better to assume imperfect model by adding Gaussian noise
• dy/dt = u + w
• Distribution for prediction moves and spreads out
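A minimal sketch of "moves and spreads out" in one dimension. It assumes the noise w contributes a known variance `process_variance` over the interval `dt`; both parameter names and the numbers are assumptions for illustration.

```python
def predict(mean, variance, velocity, dt, process_variance):
    """Shift the estimate by the modelled motion and widen it by the model noise."""
    predicted_mean = mean + velocity * dt                # distribution moves
    predicted_variance = variance + process_variance     # distribution spreads out
    return predicted_mean, predicted_variance


pred_mean, pred_var = predict(mean=11.6, variance=0.8, velocity=2.0, dt=1.0,
                              process_variance=0.5)
```

Setting `process_variance` to zero recovers the naïve prediction, which only shifts the distribution.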
Conceptual Overview
[Figure: three Gaussian densities over position, horizontal axis 0 to 100, vertical axis 0 to 0.16: prediction $\hat{y}^-(t_3)$ (sextant), measurement $z(t_3)$ (GPS), and corrected optimal estimate $\hat{y}(t_3)$ (updated sextant position using GPS)]
• Now we take a measurement at $t_3$
• Need to once again correct the prediction
• Same as before
Conceptual Overview
Lessons learnt from conceptual overview:
– Initial conditions ($\hat{y}_{k-1}$ and $\sigma_{k-1}$)
– Prediction ($\hat{y}^-_k$, $\sigma^-_k$)
• Use initial conditions and model (e.g. constant velocity) to make prediction
– Measurement ($z_k$)
• Take measurement
– Correction ($\hat{y}_k$, $\sigma_k$)
• Use measurement to correct prediction by ‘blending’ prediction and residual – always a case of merging only two Gaussians
• Optimal estimate with smaller variance
Blending Factor
• If we are sure about measurements:
– Measurement error covariance (R) decreases to zero
– K increases and weights residual more heavily than prediction
• If we are sure about prediction:
– Prediction error covariance $P^-_k$ decreases to zero
– K decreases towards zero and weights prediction more heavily than residual
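The two limits can be checked numerically with a small sketch of the scalar gain $K = P^- H / (H P^- H + R)$. The function name and the test values are assumptions for illustration.

```python
def kalman_gain(p_prior, r, h=1.0):
    """Scalar Kalman gain for prior variance p_prior, measurement variance r, sensor factor h."""
    return p_prior * h / (h * p_prior * h + r)


sure_about_measurement = kalman_gain(p_prior=1.0, r=1e-12)  # R -> 0: gain approaches 1/h
sure_about_prediction = kalman_gain(p_prior=1e-12, r=1.0)   # P -> 0: gain approaches 0
equal_trust = kalman_gain(p_prior=1.0, r=1.0)               # equal variances: gain is 0.5
```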
The set of Kalman Filtering Equations in Detail
Prediction (Time Update)
(1) Project the state ahead
$\hat{y}^-_k = A \hat{y}_{k-1} + B u_k$
(2) Project the error covariance ahead
$P^-_k = A P_{k-1} A^T + Q$
Correction (Measurement Update)
(1) Compute the Kalman Gain
$K = P^-_k H^T (H P^-_k H^T + R)^{-1}$
(2) Update estimate with measurement $z_k$
$\hat{y}_k = \hat{y}^-_k + K (z_k - H \hat{y}^-_k)$
(3) Update Error Covariance
$P_k = (I - K H) P^-_k$
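A minimal sketch of the five equations for a scalar state, so that A, B, H, Q and R are plain numbers and the matrix inverse becomes a division. The function names, default values and measurement list are assumptions for illustration.

```python
def kalman_step(y_prev, p_prev, z, u=0.0, a=1.0, b=0.0, h=1.0, q=0.0, r=1.0):
    """One predict/correct cycle of a scalar Kalman filter."""
    # Prediction (time update)
    y_pred = a * y_prev + b * u                 # project the state ahead
    p_pred = a * p_prev * a + q                 # project the error covariance ahead
    # Correction (measurement update)
    k = p_pred * h / (h * p_pred * h + r)       # Kalman gain
    y_new = y_pred + k * (z - h * y_pred)       # update estimate with measurement z
    p_new = (1.0 - k * h) * p_pred              # update error covariance
    return y_new, p_new


def run_filter(measurements, y0, p0, **model):
    """Apply kalman_step to each measurement in turn, keeping only the latest estimate."""
    y, p = y0, p0
    history = []
    for z in measurements:
        y, p = kalman_step(y, p, z, **model)
        history.append((y, p))
    return history


history = run_filter([1.0, 1.0, 1.0], y0=0.0, p0=1.0, q=0.0, r=1.0)
```

In this run the estimate moves from the initial guess towards the repeated measurement, while the variance shrinks at every step.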
Assumptions behind Kalman Filter
• The model you use to predict the ‘state’ needs to be LINEAR, and the measurement needs to be a linear function of the state (so how do we use non-linear rainfall-runoff models?)
• The model error and the measurement error (noise) must be Gaussian with zero mean
HMMs and Kalman Filters
Hidden Markov Models (HMMs)
– Discrete State Variables
– Used to model sequence of events
Kalman Filters
– Continuous State Variables, with Gaussian Distribution
– Used to model noisy continuous observations
– Examples
• Predict the motion of a bird through dense jungle foliage at dusk
• Predict the direction of the missile through intermittent radar movement observations
Gaussian (Normal) Distribution
Central Limit Theorem: The suitably normalized sum of n statistically independent, identically distributed random variables with finite variance converges for n → ∞ towards the Gaussian distribution (Applet Illustration)
Unlike the binomial and Poisson distribution, the Gaussian is a continuous distribution:
$$p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$
– $\mu$ = mean of distribution (also at the same place as mode and median)
– $\sigma^2$ = variance of distribution
– y is a continuous variable ($-\infty < y < \infty$)
Gaussian distribution is fully defined by its mean and variance
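A minimal sketch of the Gaussian density as a function of its two defining parameters. The function name and the evaluation points are assumptions for illustration.

```python
import math


def gaussian_pdf(y, mean, variance):
    """Density of a Gaussian with the given mean and variance, evaluated at y."""
    normalizer = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return normalizer * math.exp(-((y - mean) ** 2) / (2.0 * variance))


peak = gaussian_pdf(0.0, mean=0.0, variance=1.0)    # highest value sits at the mean
left = gaussian_pdf(-1.5, mean=0.0, variance=1.0)
right = gaussian_pdf(1.5, mean=0.0, variance=1.0)   # symmetric about the mean
```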
Gaussian Distribution: Examples
Linear Gaussian Distribution
– Mean, $\mu$, and Variance, $\sigma^2$
Multivariate Gaussian Distribution
– For 3 random variables
• Mean, $\mu = [m_1 \; m_2 \; m_3]$
• Covariance Matrix, $\Sigma = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix}$
Kalman Filters: General Properties
Estimate the state and the covariance of the state at any time t, given observations $x_{1:T} = \{x_1, \dots, x_T\}$
– E.g., estimate the state (location and velocity) of an airplane and its uncertainty, given some measurements from an array of sensors
The probability of interest is $P(y_t \mid x_{1:T})$
Filtering the state: T = current time, t
Predicting the state: T < current time, t
Smoothing the state: T > current time, t
Gaussian Noise & Example
Next State is linear function of current
state, plus some Gaussian noise
–
Position Update:
–
Gaussian Noise:
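One common way to write such a model is the position/velocity example in Russell & Norvig, reproduced here as a sketch. The time step $\Delta$, the velocity $\dot{X}$ and the noise variance $\sigma^2$ are assumed symbols and may differ from the notation originally used on this slide.

```latex
\begin{align}
X_{t+\Delta} &= X_t + \dot{X}\,\Delta \\
P(X_{t+\Delta} = x_{t+\Delta} \mid X_t = x_t, \dot{X}_t = \dot{x}_t)
  &= N(x_t + \dot{x}_t\,\Delta,\ \sigma^2)(x_{t+\Delta})
\end{align}
```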
Updating Gaussian Distributions
Linear Gaussian family of distributions remains closed under standard Bayesian network operations (this means we end up with Gaussian distributions, a very nice property)
One-step predicted distribution
– Current distribution $P(X_t \mid e_{1:t})$ is Gaussian
– Transition model $P(X_{t+1} \mid x_t)$ is linear Gaussian
The updated distribution
– Predicted distribution $P(X_{t+1} \mid e_{1:t})$ is Gaussian
– Sensor model $P(e_{t+1} \mid X_{t+1})$ is linear Gaussian
Filtering and Prediction (From 15.2):
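The two steps can be written as follows. This is the standard predict-then-update recursion for continuous variables as given in Russell & Norvig; $\alpha$ is a normalizing constant, and the exact layout used on the slide is an assumption.

```latex
\begin{align}
P(X_{t+1} \mid e_{1:t}) &= \int_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})\, dx_t \\
P(X_{t+1} \mid e_{1:t+1}) &= \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})
\end{align}
```

The first line is the one-step prediction and the second is the update with the new evidence. Both stay Gaussian when the transition and sensor models are linear Gaussian.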
One-dimensional Example
Update Rule (Derivations from Russell & Norvig)
– Compute new mean and covariance matrix from the previous mean and covariance matrix
– Variance update is independent of the observation
Another variation of the update rule (from Max Welling, Caltech)
– $\sigma^2$ is variance or uncertainty, K is the Kalman gain
– K = 0: no attention to measurement
– K = 1: complete attention to measurement
– $\mu_{t+1}$ is weighted mean of new observation $z_{t+1}$ and the old mean $\mu_t$
– Observation unreliable: $\sigma^2_z$ is large (more attention to old mean)
– Old mean unreliable: $\sigma^2_t$ is large (more attention to observation)
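A minimal sketch comparing the two forms of the one-dimensional update. The first function follows the random-walk update from Russell & Norvig, with transition noise variance `var_x` and sensor noise variance `var_z`. The second writes the same update with an explicit gain; that gain form is an assumed standard rearrangement, not a transcription of the slide, and all names and numbers are illustrative.

```python
def update_rn(mu_t, var_t, z_next, var_x, var_z):
    """Weighted-mean form of the 1-D update."""
    pred_var = var_t + var_x                         # variance after the transition step
    mu_next = (pred_var * z_next + var_z * mu_t) / (pred_var + var_z)
    var_next = pred_var * var_z / (pred_var + var_z)  # does not depend on z_next
    return mu_next, var_next


def update_gain(mu_t, var_t, z_next, var_x, var_z):
    """Same update written with an explicit Kalman gain K."""
    pred_var = var_t + var_x
    k = pred_var / (pred_var + var_z)                # K near 0: ignore z, K near 1: trust z
    return mu_t + k * (z_next - mu_t), (1.0 - k) * pred_var


a = update_rn(mu_t=0.0, var_t=1.0, z_next=2.5, var_x=2.0, var_z=1.0)
b = update_gain(mu_t=0.0, var_t=1.0, z_next=2.5, var_x=2.0, var_z=1.0)
```

Both forms return the same mean and variance. A large `var_z` pushes K towards 0 (more attention to the old mean), and a large `var_t` pushes K towards 1 (more attention to the observation).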
The General Case
Multivariate Gaussian Distribution
– Exponent is a quadratic function of the random variables $x_i$ in $\mathbf{x}$
Temporal model with Kalman filtering
– F: linear transition model
– H: linear sensor model
– $\Sigma_x$: transition noise covariance
– $\Sigma_z$: sensor noise covariance
Update equations for mean and covariance
– $K_{t+1}$: the Kalman gain matrix
– $F\mu_t$: predicted state at t+1
– $HF\mu_t$: the predicted observation
– $z_{t+1} - HF\mu_t$: error in predicted observation
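Written out with these symbols, the update equations take the following form. This is the form given in Russell & Norvig; $\mu_t$ and $\Sigma_t$ denote the mean and covariance of the current estimate, and the layout used on the slide is an assumption.

```latex
\begin{align}
\mu_{t+1} &= F\mu_t + K_{t+1}\,(z_{t+1} - HF\mu_t) \\
\Sigma_{t+1} &= (I - K_{t+1}H)\,(F\Sigma_t F^T + \Sigma_x) \\
K_{t+1} &= (F\Sigma_t F^T + \Sigma_x)\,H^T\,\bigl(H(F\Sigma_t F^T + \Sigma_x)H^T + \Sigma_z\bigr)^{-1}
\end{align}
```

Here $F\Sigma_t F^T + \Sigma_x$ is the predicted covariance, so these are the same prediction and correction steps as in the scalar case, with matrices in place of numbers.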
Illustration
Applicability of Kalman Filtering
Popular applications
– Navigation, guidance, radar tracking, sonar ranging, satellite orbit computation, stock price prediction, landing of Eagle on Moon, gyroscopes in airplanes, etc.
Extended Kalman Filters (EKF) can handle nonlinearities in Gaussian distributions
– Model the system as locally linear in $x_t$ in the region of $x_t = \mu_t$
– Works well for smooth, well-behaved systems
Switching Kalman Filters: multiple Kalman filters in parallel, each using different model of the system
– A weighted sum of predictions used
Applicability of Kalman Filters
Dynamic Bayesian Networks
Directed graphical models of stochastic processes
Extend HMMs by representing hidden (and observed) state in terms of state variables, with possible complex interdependencies
Any number of state variables and evidence variables
Dynamic or Temporal Bayesian Network?
– Model structure does not change over time
– Parameters do not change over time
– Extra hidden nodes can be added (mixture of models)
DBNs and HMMs
HMM as a DBN
– Single state variable and single evidence variable
Discrete variable DBN as an HMM
– Combine all state variables in DBN into a single state variable (with all possible values of individual state variables)
– Efficient representation (with 20 boolean state variables, each with three parents, DBN needs 20 × 2^3 = 160 probabilities, whereas HMM needs 2^40, roughly a trillion, probabilities)
Analogous to Ordinary Bayesian Networks vs Fully Tabulated Joint Distributions
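The two counts can be reproduced with a short sketch. It assumes each boolean state variable has three parents in the previous time slice (the setting used in Russell & Norvig for this comparison); the function names are illustrative.

```python
def dbn_parameter_count(num_vars, parents_per_var):
    """One probability per parent configuration, for each boolean state variable."""
    return num_vars * 2 ** parents_per_var


def hmm_parameter_count(num_vars):
    """Transition matrix over the combined state: one entry per pair of joint states."""
    num_states = 2 ** num_vars
    return num_states * num_states


dbn_count = dbn_parameter_count(num_vars=20, parents_per_var=3)
hmm_count = hmm_parameter_count(num_vars=20)
```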
DBNs and Kalman Filters
Kalman filter as a DBN
– Continuous variables and linear Gaussian conditional distributions
DBN as a Kalman Filter
– Not possible
– DBN allows arbitrary distributions
– Lost keys example
Constructing DBNs
Required information
– Prior distributions over state variables $P(X_0)$
– The transition model $P(X_{t+1} \mid X_t)$
– The sensor model $P(E_t \mid X_t)$
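A minimal sketch of how these three ingredients fit together in the simplest discrete case (one state variable, one evidence variable). The rain/umbrella model and its numbers are an assumed textbook-style illustration, not taken from the slides, and `forward_step` is a hypothetical helper name.

```python
def forward_step(belief, evidence, transition, sensor):
    """One filtering step: predict with the transition model, then weight by the sensor model."""
    states = list(belief)
    predicted = {s1: sum(transition[s0][s1] * belief[s0] for s0 in states)
                 for s1 in states}
    unnormalized = {s: sensor[s][evidence] * predicted[s] for s in states}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


# Prior P(X_0), transition model P(X_{t+1} | X_t), sensor model P(E_t | X_t)
prior = {"rain": 0.5, "dry": 0.5}
transition = {"rain": {"rain": 0.7, "dry": 0.3},
              "dry": {"rain": 0.3, "dry": 0.7}}
sensor = {"rain": {"umbrella": 0.9, "no_umbrella": 0.1},
          "dry": {"umbrella": 0.2, "no_umbrella": 0.8}}

belief = prior
for observed in ["umbrella", "umbrella"]:
    belief = forward_step(belief, observed, transition, sensor)
```

After two umbrella observations the belief in rain rises well above the prior of 0.5.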