# Kalman Filters and Dynamic Bayesian Networks

Nov 7, 2013

Markoviana

Srinivas

Arizona State University

Source 1

Introduction to Kalman Filters

CEE 6430: Probabilistic Methods in Hydrosciences

Fall 2008

Acknowledgements: Numerous sources on WWW,
book, papers

Source 2

Outline

Introduction

Gaussian Distribution

Introduction

Examples (Linear and Multivariate)

Kalman Filters

General Properties

Updating Gaussian Distributions

One-dimensional Example

Applicability of Kalman Filtering

Dynamic Bayesian Networks (DBNs)

Introduction

DBNs and HMMs

DBNs and Kalman Filters

Constructing DBNs

A Hydro Example

Suppose you have a hydrologic model that predicts river
water level every hour (using the usual inputs).

You know that your model is not perfect and you don’t
trust it 100%. So you want to send someone to check
the river level in person.

However, the river level can only be checked once a day
around noon and not every hour.

Furthermore, the person who measures the river level
cannot be trusted 100% either.

So how do you combine both outputs of river level (from
model and from measurement) so that you get a ‘fused’
and better estimate?

Kalman filtering

Graphically speaking

What is a Filter by the way?

Other applications of Kalman filtering (or filtering in general):

1)
update location)

2) Surface-to-air missile (hitting the target)

3) (Apollo 11 used some sort of filtering to make sure it didn't miss the Moon!)

The Problem in General

System state cannot be measured directly

Need to estimate it "optimally" from measurements

[Figure: block diagram. External controls and system error sources act on the System (a black box), whose System State is desired but not known. Measuring Devices, subject to measurement error sources, produce the Observed Measurements. The Estimator turns these into the Optimal Estimate of System State.]

Sometimes the system state and the measurement may be two different things (unlike the river level example)

What is a Kalman Filter?

Recursive data processing algorithm

Generates optimal estimate of desired quantities given the set of measurements

Optimal?

For linear system and white Gaussian errors, Kalman filter is the "best" estimate based on all previous measurements

For non-linear system optimality is "qualified"

Recursive?

Doesn't need to store all previous measurements and reprocess all data each time step
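To make the "recursive" property concrete, here is a minimal sketch for the simplest possible case: estimating a constant from equally trusted measurements with no prior knowledge. It is only an illustration of recursion, not the Kalman filter itself. The function name `recursive_mean` and the sample readings are made up for this example.

```python
def recursive_mean(measurements):
    """Running estimate of a constant; only the latest estimate is kept."""
    estimate = 0.0
    for k, z in enumerate(measurements, start=1):
        gain = 1.0 / k                     # weight given to the newest measurement
        estimate += gain * (z - estimate)  # correct the old estimate by the residual
    return estimate

readings = [10.2, 9.7, 10.4, 9.9]          # hypothetical measurements
batch = sum(readings) / len(readings)      # reprocesses all stored data
recursive = recursive_mean(readings)       # never stores the history
```

Both approaches give the same answer, but the recursive version only carries the previous estimate and a counter from one step to the next.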

Conceptual Overview

Simple example to motivate the workings
of the Kalman Filter

The essential equations you need to know
(Kalman Filtering for Dummies!)

Examples: Prediction and Correction

Conceptual Overview

Lost on the 1-dimensional line (imagine that you are guessing your position by looking at the stars using a sextant)

Position: $y(t)$

Assume Gaussian distributed measurements

[Figure: boat on a line, position axis $y$]

Conceptual Overview

[Figure: Gaussian density over position (x-axis 0 to 100, y-axis 0 to 0.16). State space: position. Measurement: position. Annotation: sextant is not perfect.]

Sextant measurement at $t_1$: mean $= z_1$ and variance $= \sigma^2_{z_1}$

Optimal estimate of position is: $\hat{y}(t_1) = z_1$

Variance of error in estimate: $\sigma^2_x(t_1) = \sigma^2_{z_1}$

Boat in same position at time $t_2$: predicted position is $z_1$

Conceptual Overview

[Figure: Gaussian densities over position (x-axis 0 to 100, y-axis 0 to 0.16), showing the prediction $\hat{y}^-(t_2)$ (state, by looking at the stars at $t_2$) and the measurement using GPS, $z(t_2)$.]

So we have the prediction $\hat{y}^-(t_2)$

GPS measurement at $t_2$: mean $= z_2$ and variance $= \sigma^2_{z_2}$

Need to correct the prediction by sextant due to measurement to get $\hat{y}(t_2)$

Closer to more trusted measurement: should we do linear interpolation?

Conceptual Overview

[Figure: Gaussian densities over position (x-axis 0 to 100, y-axis 0 to 0.16), showing the prediction $\hat{y}^-(t_2)$, the measurement $z(t_2)$, and the corrected optimal estimate $\hat{y}(t_2)$.]

Corrected mean is the new optimal estimate of position (basically you have updated the predicted position by sextant using GPS)

New variance is smaller than either of the previous two variances

Kalman filter helps you fuse measurement and prediction on the basis of how much you trust each (I would trust the GPS more than the sextant)

Conceptual Overview (The Kalman Equations)

Lessons so far:

Make prediction based on previous data: $\hat{y}^-$, $\sigma^-$

Take measurement: $z_k$, $\sigma_z$

Optimal estimate ($\hat{y}$) = Prediction + (Kalman Gain) * (Measurement - Prediction)

Variance of estimate = Variance of prediction * (1 - Kalman Gain)
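The two update rules above can be sketched in a few lines for the one-dimensional case. This is a minimal sketch: the scalar gain formula (prediction variance divided by the sum of prediction and measurement variances) is the standard form when the state is measured directly, and the function name `fuse` and the numbers are assumptions made for illustration.

```python
def fuse(pred_mean, pred_var, meas_mean, meas_var):
    """Blend a prediction and a measurement, both Gaussian, into one estimate."""
    gain = pred_var / (pred_var + meas_var)            # scalar Kalman gain
    mean = pred_mean + gain * (meas_mean - pred_mean)  # prediction + gain * residual
    var = pred_var * (1.0 - gain)                      # variance of the estimate
    return mean, var, gain

# Hypothetical numbers: a vague sextant prediction and a sharper GPS measurement.
mean, var, gain = fuse(pred_mean=30.0, pred_var=16.0, meas_mean=40.0, meas_var=4.0)
```

With these inputs the fused mean lands closer to the more trusted measurement, and the fused variance is smaller than either input variance.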

Conceptual Overview

[Figure: Gaussian densities over position (x-axis 0 to 100, y-axis 0 to 0.16), showing $\hat{y}(t_2)$ and the naïve prediction (sextant) $\hat{y}^-(t_3)$.]

What if the boat was now moving?

At time $t_3$, boat moves with velocity $dy/dt = u$

Naïve approach: shift probability to the right to predict

This would work if we knew the velocity exactly (perfect model)

Conceptual Overview

[Figure: Gaussian densities over position (x-axis 0 to 100, y-axis 0 to 0.16), showing $\hat{y}(t_2)$, the naïve prediction $\hat{y}^-(t_3)$, and the prediction $\hat{y}^-(t_3)$.]

But you may not be so sure about the velocity

Better to assume imperfect model by adding Gaussian noise

$dy/dt = u + w$

Distribution for prediction moves and spreads out
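A minimal sketch of this prediction step, assuming the noise $w$ adds a fixed amount of variance per unit time, so the added variance grows linearly with the time step. The names `predict` and `noise_var_per_time` and the numbers are hypothetical.

```python
def predict(mean, var, velocity, dt, noise_var_per_time):
    """Shift the estimate by velocity * dt and widen it by the model noise."""
    new_mean = mean + velocity * dt            # distribution moves
    new_var = var + noise_var_per_time * dt    # distribution spreads out
    return new_mean, new_var

# Naive prediction: perfect model, so the variance is unchanged.
naive_mean, naive_var = predict(38.0, 3.2, velocity=2.0, dt=5.0, noise_var_per_time=0.0)

# Imperfect model: same shift, but the uncertainty grows.
pred_mean, pred_var = predict(38.0, 3.2, velocity=2.0, dt=5.0, noise_var_per_time=0.5)
```

Both calls move the mean by the same amount. Only the second one widens the distribution, which is what lets the next measurement pull the estimate back if the assumed velocity was off.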

Conceptual Overview

[Figure: Gaussian densities over position (x-axis 0 to 100, y-axis 0 to 0.16), showing the prediction $\hat{y}^-(t_3)$ (sextant), the measurement $z(t_3)$ (GPS), and the corrected optimal estimate $\hat{y}(t_3)$ (updated sextant position using GPS).]

Now we take a measurement at $t_3$

Need to once again correct the prediction

Same as before

Conceptual Overview

Lessons learnt from conceptual overview:

Initial conditions ($\hat{y}_{k-1}$ and $\sigma_{k-1}$)

Prediction ($\hat{y}^-_k$, $\sigma^-_k$)

Use initial conditions and model (e.g. constant velocity) to make prediction

Measurement ($z_k$)

Take measurement

Correction ($\hat{y}_k$, $\sigma_k$)

Use measurement to correct prediction by "blending" prediction and residual: always a case of merging only two Gaussians

Optimal estimate with smaller variance

Blending Factor

If we are sure about measurements:

Measurement error covariance (R) decreases to zero

K increases and weights residual more heavily than prediction

If we are sure about prediction:

Prediction error covariance $P^-_k$ decreases to zero

K decreases to zero and weights prediction more heavily than residual
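The limiting behaviour of the blending factor can be checked numerically. The sketch below assumes the scalar case with the state measured directly, where the gain reduces to $K = P^-/(P^- + R)$. The function name `kalman_gain` and the variance values are illustrative only.

```python
def kalman_gain(pred_var, meas_var):
    """Scalar Kalman gain when the state itself is measured (H = 1)."""
    return pred_var / (pred_var + meas_var)

# Measurement error variance R shrinking towards zero, prediction variance fixed.
gains_as_R_shrinks = [kalman_gain(4.0, r) for r in (4.0, 1.0, 0.01)]

# Prediction error variance P- shrinking towards zero, measurement variance fixed.
gains_as_P_shrinks = [kalman_gain(p, 4.0) for p in (4.0, 1.0, 0.01)]
```

In this scalar setting the gain climbs towards 1 as R goes to zero, so the residual dominates. It falls towards 0 as the prediction variance goes to zero, so the prediction dominates.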

The set of Kalman Filtering Equations in Detail

Prediction (Time Update)

(1) Project the state ahead

$$\hat{y}^-_k = A\hat{y}_{k-1} + Bu_k$$

(2) Project the error covariance ahead

$$P^-_k = AP_{k-1}A^T + Q$$

Correction (Measurement Update)

(1) Compute the Kalman Gain

$$K = P^-_k H^T (HP^-_k H^T + R)^{-1}$$

(2) Update estimate with measurement $z_k$

$$\hat{y}_k = \hat{y}^-_k + K(z_k - H\hat{y}^-_k)$$

(3) Update Error Covariance

$$P_k = (I - KH)P^-_k$$
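The predict and correct equations can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: matrices are lists of rows, the measurement is a single scalar (so the inverse reduces to a division), and the constant-velocity model, noise values, and measurements are all made up for the example.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def predict(y, P, A, B, u, Q):
    """Time update: project the state and the error covariance ahead."""
    y_pred = add(matmul(A, y), matmul(B, u))
    P_pred = add(matmul(matmul(A, P), transpose(A)), Q)
    return y_pred, P_pred

def correct(y_pred, P_pred, z, H, R):
    """Measurement update for one scalar measurement z (H is 1 x n, R is 1 x 1)."""
    Ht = transpose(H)
    S = add(matmul(matmul(H, P_pred), Ht), R)[0][0]    # H P- H^T + R, a scalar here
    K = [[row[0] / S] for row in matmul(P_pred, Ht)]   # Kalman gain, n x 1
    residual = z - matmul(H, y_pred)[0][0]             # z_k - H y-_k
    y = [[yp[0] + k[0] * residual] for yp, k in zip(y_pred, K)]
    n = len(P_pred)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = matmul(sub(I, matmul(K, H)), P_pred)           # (I - K H) P-
    return y, P

# Hypothetical constant-velocity model: state = [position, velocity].
dt = 1.0
A = [[1.0, dt], [0.0, 1.0]]
B = [[0.0], [0.0]]                 # no control input in this example
u = [[0.0]]
H = [[1.0, 0.0]]                   # only position is measured
Q = [[0.01, 0.0], [0.0, 0.01]]     # assumed model noise covariance
R = [[4.0]]                        # assumed measurement noise variance

y = [[0.0], [0.0]]                 # vague initial guess
P = [[100.0, 0.0], [0.0, 100.0]]
for z in [1.1, 1.9, 3.2, 3.9, 5.1]:    # made-up position measurements
    y, P = predict(y, P, A, B, u, Q)
    y, P = correct(y, P, z, H, R)
```

After the five measurements the estimate sits at roughly position 5 and velocity 1, even though velocity was never measured directly. For a vector of measurements the division by `S` would become a matrix inverse.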

Assumptions behind Kalman Filter

The model you use to predict the "state" needs to be LINEAR, and the measurement needs to be a LINEAR function of the state (so how do we use non-linear rainfall-runoff models?)

The model error and the measurement error
(noise) must be Gaussian with zero mean

HMMs and Kalman Filters

Hidden Markov Models (HMMs)

Discrete State Variables

Used to model sequence of events

Kalman Filters

Continuous State Variables, with Gaussian
Distribution

Used to model noisy continuous observations

Examples

Predict the motion of a bird through dense jungle foliage at
dusk

Predict the direction of the missile through intermittent radar
movement observations

Gaussian (Normal) Distribution

Central Limit Theorem: The sum of n statistically independent random variables converges for n → ∞ towards the Gaussian distribution (Applet Illustration)

Unlike the binomial and Poisson distribution, the Gaussian is a continuous distribution:

$$p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$

$\mu$ = mean of distribution (also at the same place as mode and median)

$\sigma^2$ = variance of distribution

$y$ is a continuous variable ($-\infty \le y \le \infty$)

Gaussian distribution is fully defined by its mean and variance
Gaussian Distribution: Examples

Linear Gaussian Distribution

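Mean, $\mu$, and Variance, $\sigma^2$

Multivariate Gaussian Distribution

For 3 random variables

Mean, $\mu = [m_1 \; m_2 \; m_3]$

Covariance Matrix,

$$\Sigma = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix}$$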

Kalman Filters: General Properties

Estimate the state and the covariance of the state at any time T, given observations $\mathbf{x}_T = \{x_1, \ldots, x_T\}$

E.g., Estimate the state (location and velocity) of airplane and its uncertainty, given some measurements from an array of sensors

The probability of interest is $P(y_t \mid \mathbf{x}_T)$

Filtering the state: T = current time, t

Predicting the state: T < current time, t

Smoothing the state: T > current time, t

Gaussian Noise & Example

Next State is linear function of current
state, plus some Gaussian noise

Position Update:

Gaussian Noise:

Updating Gaussian Distributions

Linear Gaussian family of distributions remains closed under standard Bayesian network operations (this means we end up with Gaussian distributions: a very nice property.)

One-step predicted distribution

Current distribution $P(X_t \mid e_{1:t})$ is Gaussian

Transition model $P(X_{t+1} \mid x_t)$ is linear Gaussian

The updated distribution

Predicted distribution $P(X_{t+1} \mid e_{1:t})$ is Gaussian

Sensor model $P(e_{t+1} \mid X_{t+1})$ is linear Gaussian

Filtering and Prediction (From 15.2):
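The equations for this item did not survive in the transcript. As a hedged reconstruction, the standard filtering recursion for a continuous state, written with the same symbols as the distributions listed above, has the following two steps, where $\alpha$ is a normalizing constant:

$$P(X_{t+1} \mid e_{1:t}) = \int_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})\, dx_t$$

$$P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})$$

The first line is the one-step prediction and the second is the update with the new evidence. When every factor is Gaussian or linear Gaussian, both results stay Gaussian.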

One-dimensional Example

Update Rule (Derivations from Russell & Norvig)

Compute new mean and covariance matrix from the previous mean and covariance matrix

Variance update is independent of the observation

Another variation of the update rule (from Max Welling, Caltech)

- $\sigma^2$ is variance or uncertainty, K is the Kalman gain

- K = 0: no attention to measurement

- K = 1: complete attention to measurement

- $\mu_{t+1}$ is weighted mean of new observation $z_{t+1}$ and the old mean $\mu_t$

- Observation unreliable: $\sigma^2_z$ is large (more attention to old mean)

- Old mean unreliable: $\sigma^2_t$ is large (more attention to observation)
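The update formulas themselves are missing from the transcript, so the sketch below uses the textbook one-dimensional random-walk form as an assumption: transition noise variance `var_x`, sensor noise variance `var_z`. It writes the update both as a weighted mean and in gain form, and the two agree. The function names and numbers are illustrative.

```python
def update_weighted(mu, var, z, var_x, var_z):
    """Weighted-mean form of the 1-D update for a random-walk model."""
    pred_var = var + var_x                          # uncertainty after the transition
    new_mu = (pred_var * z + var_z * mu) / (pred_var + var_z)
    new_var = pred_var * var_z / (pred_var + var_z)  # does not depend on z
    return new_mu, new_var

def update_gain(mu, var, z, var_x, var_z):
    """Same update written with an explicit Kalman gain K."""
    pred_var = var + var_x
    k = pred_var / (pred_var + var_z)   # K = 0 ignores z, K = 1 ignores the old mean
    return mu + k * (z - mu), (1.0 - k) * pred_var, k

# Hypothetical numbers: prior N(0, 1), transition variance 4, sensor variance 1.
mu_a, var_a = update_weighted(0.0, 1.0, 2.5, var_x=4.0, var_z=1.0)
mu_b, var_b, k = update_gain(0.0, 1.0, 2.5, var_x=4.0, var_z=1.0)
```

Because the old mean is unreliable relative to the sensor here, the gain is well above one half and the new mean lands close to the observation.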

The General Case

Multivariate Gaussian Distribution

Exponent is a quadratic function of the random variables $x_i$ in $\mathbf{x}$

Temporal model with Kalman filtering

F: linear transition model

H: linear sensor model

$\Sigma_x$: transition noise covariance

$\Sigma_z$: sensor noise covariance

Update equations for mean and covariance

$K_{t+1}$: the Kalman gain matrix

$F\mu_t$: predicted state at t+1

$HF\mu_t$: the predicted observation

$z_{t+1} - HF\mu_t$: error in predicted observation
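The update equations themselves are not in the transcript. As a hedged reconstruction, the standard textbook form using the same symbols ($F$, $H$, $\Sigma_x$, $\Sigma_z$, $K_{t+1}$) is:

$$\mu_{t+1} = F\mu_t + K_{t+1}\,(z_{t+1} - HF\mu_t)$$

$$\Sigma_{t+1} = (I - K_{t+1}H)\,(F\Sigma_t F^T + \Sigma_x)$$

$$K_{t+1} = (F\Sigma_t F^T + \Sigma_x)\,H^T \left(H(F\Sigma_t F^T + \Sigma_x)H^T + \Sigma_z\right)^{-1}$$

Reading the first line term by term: the new mean is the predicted state $F\mu_t$ plus the gain times the error in the predicted observation. As in the one-dimensional case, the covariance update does not depend on the observation.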

Illustration

Applicability of Kalman Filtering

Popular applications

orbit computation, stock price prediction, landing of Eagle on
Moon, gyroscopes in airplanes, etc.

Extended Kalman Filters (EKF) can handle nonlinearities in Gaussian distributions

Model the system as locally linear in $x_t$ in the region of $x_t = \mu_t$

Works well for smooth, well-behaved systems
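A minimal scalar sketch of the "locally linear" idea: the nonlinear models are evaluated at the current mean, and their derivatives at that point stand in for the linear transition and sensor coefficients. The models `f` and `h`, their derivatives, and all numbers are hypothetical choices made for illustration.

```python
import math

def ekf_step(mu, var, z, f, f_prime, h, h_prime, q, r):
    """One predict-and-correct cycle of a scalar extended Kalman filter."""
    # Predict: push the mean through f, linearize f around the current mean.
    F = f_prime(mu)
    mu_pred = f(mu)
    var_pred = F * var * F + q
    # Correct: linearize the sensor model h around the predicted mean.
    H = h_prime(mu_pred)
    K = var_pred * H / (H * var_pred * H + r)
    mu_new = mu_pred + K * (z - h(mu_pred))
    var_new = (1.0 - K * H) * var_pred
    return mu_new, var_new

# Hypothetical smooth models: mildly nonlinear dynamics, quadratic sensor.
f = lambda x: x + 0.1 * math.sin(x)
f_prime = lambda x: 1.0 + 0.1 * math.cos(x)
h = lambda x: x * x
h_prime = lambda x: 2.0 * x

mu_new, var_new = ekf_step(mu=1.0, var=0.5, z=1.3,
                           f=f, f_prime=f_prime, h=h, h_prime=h_prime,
                           q=0.01, r=0.1)
```

The linearization is only trustworthy close to the current mean, which is why the approach suits smooth, well-behaved systems.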

Switching Kalman Filters: multiple Kalman filters in
parallel, each using different model of the system

A weighted sum of predictions used

Applicability of Kalman Filters

Dynamic Bayesian Networks

Directed graphical models of stochastic processes

Extend HMMs by representing hidden (and observed) state in terms
of state variables, with possible complex interdependencies

Any number of state variables and evidence variables

Dynamic or Temporal Bayesian Network???

Model structure does not change over time

Parameters do not change over time

Extra hidden nodes can be added (mixture of models)

DBNs and HMMs

HMM as a DBN

Single state variable and single evidence variable

Discrete variable DBN as an HMM

Combine all state variables in DBN into a single state
variable (with all possible values of individual state
variables)

Efficient Representation (with 20 boolean state
variables, DBN needs 160 probabilities, whereas
HMM needs roughly a trillion probabilities)

Analogous to Ordinary Bayesian Networks vs
Fully Tabulated Joint Distributions
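The two figures quoted above can be reproduced with a short calculation. The assumption, which is not stated on the slide, is that each Boolean state variable has three parents in the preceding time slice. The variable names are illustrative.

```python
n_vars = 20            # Boolean state variables
parents_per_var = 3    # assumption: parents of each variable in the previous slice

# DBN: one conditional probability per parent configuration, per variable.
dbn_params = n_vars * 2 ** parents_per_var

# HMM: a single state variable with 2**20 values and a full transition matrix.
hmm_states = 2 ** n_vars
hmm_params = hmm_states ** 2
```

Under that assumption the DBN needs 160 numbers, while the HMM transition matrix has 2^40 entries, a little over a trillion.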

DBNs and Kalman Filters

Kalman filter as a DBN

Continuous variables and linear Gaussian
conditional distributions

DBN as a Kalman Filter

Not possible

DBN allows any arbitrary distributions

Lost keys example

Constructing DBNs

Required information

Prior distributions over state variables $P(X_0)$

The transition model $P(X_{t+1} \mid X_t)$

The sensor model $P(E_t \mid X_t)$
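As a minimal sketch of how these three ingredients are used together, the example below specifies a tiny DBN with one Boolean state variable and one Boolean evidence variable, then filters over two observations. The variable meanings and probability values are hypothetical (a rain and umbrella style toy model), and the function name `forward` is made up for this sketch.

```python
# Prior P(X_0)
prior = {True: 0.5, False: 0.5}

# Transition model P(X_{t+1} | X_t), indexed as transition[x_t][x_next]
transition = {True: {True: 0.7, False: 0.3},
              False: {True: 0.3, False: 0.7}}

# Sensor model P(E_t | X_t), indexed as sensor[x_t][e_t]
sensor = {True: {True: 0.9, False: 0.1},
          False: {True: 0.2, False: 0.8}}

def forward(belief, evidence):
    """One filtering step: predict with the transition model, then weight by the sensor model."""
    predicted = {x1: sum(transition[x0][x1] * belief[x0] for x0 in belief)
                 for x1 in belief}
    unnormalized = {x: sensor[x][evidence] * predicted[x] for x in predicted}
    total = sum(unnormalized.values())
    return {x: p / total for x, p in unnormalized.items()}

belief = prior
history = []
for e in [True, True]:             # two consecutive positive observations
    belief = forward(belief, e)
    history.append(belief[True])
```

With more state variables, the same three pieces are specified per variable, and the transition and sensor tables stay small as long as each variable has few parents.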