Kalman Filters and Dynamic Bayesian Networks


Nov 7, 2013


Markoviana Reading Group

Srinivas Vadrevu

Arizona State University

Source 1


Introduction to Kalman Filters

CEE 6430: Probabilistic Methods in Hydrosciences

Fall 2008

Acknowledgements: Numerous sources on WWW,
book, papers

Source 2

Markoviana Reading Group: Week3

Outline

Introduction

Gaussian Distribution
  Introduction
  Examples (Linear and Multivariate)

Kalman Filters
  General Properties
  Updating Gaussian Distributions
  One-dimensional Example
  Notes about general case
  Applicability of Kalman Filtering

Dynamic Bayesian Networks (DBNs)
  Introduction
  DBNs and HMMs
  DBNs and Kalman Filters
  Constructing DBNs

A Hydro Example

Suppose you have a hydrologic model that predicts river
water level every hour (using the usual inputs).

You know that your model is not perfect and you don’t
trust it 100%. So you want to send someone to check
the river level in person.

However, the river level can only be checked once a day
around noon and not every hour.

Furthermore, the person who measures the river level
cannot be trusted 100% either.

So how do you combine both outputs of river level (from
model and from measurement) so that you get a ‘fused’
and better estimate?


Kalman filtering



Graphically speaking


What is a Filter by the way?

Other applications of Kalman Filtering (or Filtering in general):

1) Your car GPS (predict and update location)
2) Surface-to-air missile (hitting the target)
3) Ship or rocket navigation (Apollo 11 used some sort of filtering to make sure it didn't miss the Moon!)

The Problem in General (let's get a little more technical)

System state cannot be measured directly

Need to estimate "optimally" from measurements

[Figure: block diagram. External controls and system error sources act on the system (a black box), whose state is desired but not known. Measuring devices, affected by measurement error sources, produce the observed measurements, which the estimator turns into an optimal estimate of the system state.]

Sometimes the system state and the measurement may be two different things (unlike in the river level example)

What is a Kalman Filter?

Recursive data processing algorithm

Generates "optimal" estimate of desired quantities given the set of measurements

Optimal?
  For linear system and white Gaussian errors, Kalman filter is "best" estimate based on all previous measurements
  For non-linear system, optimality is "qualified"

Recursive?
  Doesn't need to store all previous measurements and reprocess all data each time step
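The "recursive" property can be illustrated with something much simpler than a Kalman filter: a running mean. The sketch below is only an analogy, and the function name `update_mean` and the sample values are made up for illustration. It keeps just the previous estimate and a count, yet ends up with the same answer as averaging the stored measurements in one batch.

```python
def update_mean(prev_mean, count, measurement):
    """Fold one new measurement into the mean of `count` earlier ones."""
    gain = 1.0 / (count + 1)                      # weight given to the new value
    return prev_mean + gain * (measurement - prev_mean)


measurements = [4.0, 6.0, 5.0, 9.0]               # hypothetical readings
mean = 0.0
for n, z in enumerate(measurements):
    # Only the previous mean and the count are needed, not the full history.
    mean = update_mean(mean, n, z)

print(mean)
```

The update has the same shape as the filter's correction step: old estimate plus a gain times the difference between the new measurement and the old estimate.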

Conceptual Overview

Simple example to motivate the workings
of the Kalman Filter

The essential equations you need to know
(Kalman Filtering for Dummies!)

Examples: Prediction and Correction


Conceptual Overview

Lost on a 1-dimensional line (imagine that you are guessing your position by looking at the stars using a sextant)

Position: $y(t)$

Assume Gaussian distributed measurements

[Figure: the one-dimensional line with position axis $y$]

Conceptual Overview

[Figure: Gaussian density of the sextant measurement over position (horizontal axis 0 to 100, vertical axis 0 to 0.16)]

Sextant measurement at $t_1$: mean = $z_1$ and variance = $\sigma^2_{z_1}$

Optimal estimate of position is: $\hat{y}(t_1) = z_1$

Variance of error in estimate: $\sigma^2_x(t_1) = \sigma^2_{z_1}$

Boat in same position at time $t_2$: predicted position is $z_1$

State space: position

Measurement: position

Sextant is not perfect

Conceptual Overview

[Figure: densities of the prediction $\hat{y}^-(t_2)$ (state, by looking at the stars at $t_2$) and of the GPS measurement $z(t_2)$, over position 0 to 100]

So we have the prediction $\hat{y}^-(t_2)$

GPS measurement at $t_2$: mean = $z_2$ and variance = $\sigma^2_{z_2}$

Need to correct the sextant prediction using the measurement to get $\hat{y}(t_2)$

Closer to the more trusted measurement: should we do linear interpolation?

Conceptual Overview

[Figure: densities of the prediction $\hat{y}^-(t_2)$, the measurement $z(t_2)$, and the corrected optimal estimate $\hat{y}(t_2)$, over position 0 to 100]

Corrected mean is the new optimal estimate of position (basically you have updated the position predicted by the sextant using GPS)

New variance is smaller than either of the previous two variances

Kalman filter helps you fuse measurement and prediction on the basis of how much you trust each (I would trust the GPS more than the sextant)
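A worked form of this fusion step may help. The formulas below are the standard result for combining two independent Gaussian estimates of the same quantity, written for the sextant-based prediction (mean $z_1$, variance $\sigma^2_{z_1}$) and the GPS measurement (mean $z_2$, variance $\sigma^2_{z_2}$). The symbol $\sigma^2(t_2)$ for the fused variance is notation assumed here.

```latex
\begin{align*}
\hat{y}(t_2) &= \frac{\sigma_{z_2}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_1
              + \frac{\sigma_{z_1}^2}{\sigma_{z_1}^2 + \sigma_{z_2}^2}\, z_2 \\
\frac{1}{\sigma^2(t_2)} &= \frac{1}{\sigma_{z_1}^2} + \frac{1}{\sigma_{z_2}^2}
\end{align*}
```

Each mean is weighted by the other source's variance, so the estimate lands closer to the more trusted source. Because the inverse variances add, the fused variance is smaller than either input variance.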

Conceptual Overview (The Kalman Equations)

Lessons so far:

Make prediction based on previous data: $\hat{y}^-$, $\sigma^-$

Take measurement: $z_k$, $\sigma_z$

Optimal estimate ($\hat{y}$) = Prediction + (Kalman Gain) * (Measurement - Prediction)

Variance of estimate = Variance of prediction * (1 - Kalman Gain)
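The two word equations can be sketched in a few lines for the scalar case where the measurement observes the state directly. The function name `correct`, the gain formula `pred_var / (pred_var + meas_var)`, and the numbers are assumptions for illustration, not taken from the slides.

```python
def correct(pred_mean, pred_var, z, meas_var):
    """Blend a scalar prediction with a direct measurement of the same quantity."""
    gain = pred_var / (pred_var + meas_var)       # Kalman gain, between 0 and 1
    mean = pred_mean + gain * (z - pred_mean)     # prediction + gain * residual
    var = pred_var * (1.0 - gain)                 # variance of prediction * (1 - gain)
    return mean, var, gain


# Hypothetical numbers: an uncertain prediction and a more trusted measurement.
mean, var, gain = correct(pred_mean=50.0, pred_var=16.0, z=60.0, meas_var=4.0)
print(mean, var, gain)
```

With these inputs the gain is 0.8, so the estimate moves most of the way from the prediction (50) toward the measurement (60), and the resulting variance is below both 16 and 4.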

Conceptual Overview

What if the boat was now moving?

[Figure: density of $\hat{y}(t_2)$ and the naive (sextant) prediction $\hat{y}^-(t_3)$ shifted to the right, over position 0 to 100]

At time $t_3$, boat moves with velocity $dy/dt = u$

Naïve approach: shift probability to the right to predict

This would work if we knew the velocity exactly (perfect model)

Conceptual Overview

But you may not be so sure about the exact velocity

[Figure: densities of $\hat{y}(t_2)$, the naive prediction $\hat{y}^-(t_3)$, and the wider prediction $\hat{y}^-(t_3)$ that includes noise, over position 0 to 100]

Better to assume imperfect model by adding Gaussian noise

$dy/dt = u + w$

Distribution for prediction moves and spreads out
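"Moves and spreads out" can be sketched as a scalar prediction step. This is a minimal illustration under one stated assumption: the effect of the noise $w$ over one step is summarized by a single variance `process_var` added to the position variance. The function name and numbers are hypothetical.

```python
def predict(mean, var, velocity, dt, process_var):
    """Move a scalar Gaussian position estimate forward by one time step."""
    pred_mean = mean + velocity * dt      # "moves": the naive shift by u * dt
    pred_var = var + process_var          # "spreads out": uncertainty added by w
    return pred_mean, pred_var


pred_mean, pred_var = predict(mean=58.0, var=3.2, velocity=2.0, dt=5.0,
                              process_var=1.5)
print(pred_mean, pred_var)
```

Setting `process_var` to zero recovers the naive prediction, which only shifts the distribution and keeps its width.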

Conceptual Overview

[Figure: densities of the sextant prediction $\hat{y}^-(t_3)$, the GPS measurement $z(t_3)$, and the corrected optimal estimate $\hat{y}(t_3)$ (sextant position updated using GPS), over position 0 to 100]

Now we take a measurement at $t_3$

Need to once again correct the prediction

Same as before

Conceptual Overview

Lessons learnt from conceptual overview:

Initial conditions ($\hat{y}_{k-1}$ and $\sigma_{k-1}$)

Prediction ($\hat{y}^-_k$, $\sigma^-_k$)
  Use initial conditions and model (e.g. constant velocity) to make prediction

Measurement ($z_k$)
  Take measurement

Correction ($\hat{y}_k$, $\sigma_k$)
  Use measurement to correct prediction by "blending" prediction and residual: always a case of merging only two Gaussians
  Optimal estimate with smaller variance

Blending Factor


If we are sure about measurements:
  Measurement error covariance ($R$) decreases to zero
  $K$ increases (toward $H^{-1}$) and weights the residual more heavily than the prediction

If we are sure about prediction:
  Prediction error covariance $P^-_k$ decreases to zero
  $K$ decreases (toward zero) and weights the prediction more heavily than the residual
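The two limiting cases can be checked numerically with the scalar form of the gain, $K = P^- H / (H P^- H + R)$. The function name `kalman_gain` and the values are illustrative assumptions.

```python
def kalman_gain(pred_var, meas_var, h=1.0):
    """Scalar Kalman gain K = P- * H / (H * P- * H + R)."""
    return pred_var * h / (h * pred_var * h + meas_var)


# Measurement almost perfectly trusted (R close to 0): K approaches 1/H.
k_sure_measurement = kalman_gain(pred_var=4.0, meas_var=1e-9)

# Prediction almost perfectly trusted (P- close to 0): K approaches 0.
k_sure_prediction = kalman_gain(pred_var=1e-9, meas_var=4.0)

print(k_sure_measurement, k_sure_prediction)
```

A gain near 1/H means the residual dominates the corrected estimate. A gain near 0 means the prediction is left almost unchanged.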

The Set of Kalman Filtering Equations in Detail

Prediction (Time Update)

(1) Project the state ahead

$\hat{y}^-_k = A\hat{y}_{k-1} + Bu_k$

(2) Project the error covariance ahead

$P^-_k = AP_{k-1}A^T + Q$

Correction (Measurement Update)

(1) Compute the Kalman Gain

$K = P^-_k H^T (HP^-_k H^T + R)^{-1}$

(2) Update estimate with measurement $z_k$

$\hat{y}_k = \hat{y}^-_k + K(z_k - H\hat{y}^-_k)$

(3) Update Error Covariance

$P_k = (I - KH)P^-_k$
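The five equations translate almost line for line into NumPy. The following is a minimal sketch rather than a production filter. The constant-velocity model, the noise levels, and the measurement values are all hypothetical, chosen only so the example runs.

```python
import numpy as np


def predict(y, P, A, B, u, Q):
    """Time update: project the state and the error covariance ahead."""
    y_pred = A @ y + B @ u
    P_pred = A @ P @ A.T + Q
    return y_pred, P_pred


def correct(y_pred, P_pred, z, H, R):
    """Measurement update: Kalman gain, state estimate, error covariance."""
    S = H @ P_pred @ H.T + R                      # covariance of the residual
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    y = y_pred + K @ (z - H @ y_pred)
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return y, P, K


# Hypothetical constant-velocity model: state = [position, velocity].
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.zeros((2, 1))                              # no control input here
u = np.zeros(1)
H = np.array([[1.0, 0.0]])                        # only position is measured
Q = 0.01 * np.eye(2)                              # assumed model noise
R = np.array([[4.0]])                             # assumed measurement noise

y, P = np.array([0.0, 1.0]), np.eye(2)            # assumed initial conditions
for z in [1.2, 1.9, 3.1, 4.0]:                    # made-up position readings
    y_pred, P_pred = predict(y, P, A, B, u, Q)
    y, P, K = correct(y_pred, P_pred, np.array([z]), H, R)

print(y, np.diag(P))
```

Inverting `S` directly mirrors the equation on the slide. For larger systems, solving the linear system instead of forming the inverse is usually the numerically safer choice.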

Assumptions behind Kalman Filter


The model you use to predict the "state" needs to be LINEAR, and the measurement must be a LINEAR function of the state (so how do we use non-linear rainfall-runoff models?)

The model error and the measurement error (noise) must be Gaussian with zero mean

HMMs and Kalman Filters

Hidden Markov Models (HMMs)
  Discrete State Variables
  Used to model sequence of events

Kalman Filters
  Continuous State Variables, with Gaussian Distribution
  Used to model noisy continuous observations
  Examples
    Predict the motion of a bird through dense jungle foliage at dusk
    Predict the direction of the missile through intermittent radar movement observations

Gaussian (Normal) Distribution

Central Limit Theorem: The sum of n statistically independent random variables converges for n → ∞ towards the Gaussian distribution (Applet Illustration)

Unlike the binomial and Poisson distributions, the Gaussian is a continuous distribution:

$p(y) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}$

$\mu$ = mean of distribution (also at the same place as mode and median)

$\sigma^2$ = variance of distribution

$y$ is a continuous variable ($-\infty < y < \infty$)

Gaussian distribution is fully defined by its mean and variance

Gaussian Distribution: Examples

Linear Gaussian Distribution
  Mean, $\mu$, and Variance, $\sigma^2$

Multivariate Gaussian Distribution
  For 3 random variables
  Mean, $\mu = [m_1 \; m_2 \; m_3]$
  Covariance Matrix, $\Sigma = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix}$

Kalman Filters: General Properties

Estimate the state and the covariance of the state at any time $t$, given observations $x^T = \{x_1, \dots, x_T\}$
  E.g., estimate the state (location and velocity) of an airplane and its uncertainty, given some measurements from an array of sensors

The probability of interest is $P(y_t \mid x^T)$
  Filtering the state: $T$ = current time, $t$
  Predicting the state: $T$ < current time, $t$
  Smoothing the state: $T$ > current time, $t$

Gaussian Noise & Example

Next State is linear function of current
state, plus some Gaussian noise


Position Update:


Gaussian Noise:
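A sketch of what these two labels usually stand for, assuming the position-and-velocity example and notation of Russell & Norvig (time step $\Delta$, velocity $\dot{X}$, noise variance $\sigma^2$). The symbols are that textbook's, not recovered from the slide.

```latex
\begin{align*}
X_{t+\Delta} &= X_t + \dot{X}\,\Delta \\
P(X_{t+\Delta} = x_{t+\Delta} \mid X_t = x_t, \dot{X}_t = \dot{x}_t)
  &= N(x_t + \dot{x}_t\,\Delta,\ \sigma^2)(x_{t+\Delta})
\end{align*}
```

The first line is the noise-free position update. The second says the next position is Gaussian, centered on that update.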


Updating Gaussian Distributions

Linear Gaussian family of distributions remains closed under standard Bayesian network operations (this means we end up with Gaussian distributions: a very nice property)

One-step predicted distribution
  Current distribution $P(X_t \mid e_{1:t})$ is Gaussian
  Transition model $P(X_{t+1} \mid x_t)$ is linear Gaussian

The updated distribution
  Predicted distribution $P(X_{t+1} \mid e_{1:t})$ is Gaussian
  Sensor model $P(e_{t+1} \mid X_{t+1})$ is linear Gaussian

Filtering and Prediction (From 15.2):
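For reference, the two steps can be written out as the general filtering recursion for continuous state, in the notation of the distributions above. This is the standard textbook form, given here as a sketch with $\alpha$ denoting a normalizing constant.

```latex
\begin{align*}
P(X_{t+1} \mid e_{1:t}) &= \int P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})\, dx_t \\
P(X_{t+1} \mid e_{1:t+1}) &= \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})
\end{align*}
```

The first line is the one-step prediction and the second is the update with the new evidence. When every factor on the right is Gaussian or linear Gaussian, both results are Gaussian again.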

One-dimensional Example

Update Rule (Derivations from Russell & Norvig)
  Compute new mean and covariance matrix from the previous mean and covariance matrix
  Variance update is independent of the observation

Another variation of the update rule (from Max Welling, Caltech)
  - $\sigma^2$ is variance or uncertainty, $K$ is the Kalman gain
  - $K = 0$: no attention to measurement
  - $K = 1$: complete attention to measurement
  - $\mu_{t+1}$ is weighted mean of new observation $z_{t+1}$ and the old mean $\mu_t$
  - Observation unreliable: $\sigma^2_z$ is large (more attention to old mean)
  - Old mean unreliable: $\sigma^2_t$ is large (more attention to observation)
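A minimal sketch of the one-dimensional update, assuming the random-walk model from Russell & Norvig: the state drifts with transition noise variance `var_x` and is observed with sensor noise variance `var_z`. The function name `update_1d` and the numbers are illustrative.

```python
def update_1d(mu, var, z, var_x, var_z):
    """One filtering step for a 1-D random walk observed with Gaussian noise."""
    pred_var = var + var_x                        # variance after the transition
    denom = pred_var + var_z
    new_mu = (pred_var * z + var_z * mu) / denom  # weighted mean of z and old mean
    new_var = pred_var * var_z / denom            # does not depend on z
    return new_mu, new_var


mu, var = 0.0, 1.0                                # hypothetical prior
for z in [2.5, 2.1, 2.4]:                         # made-up observations
    mu, var = update_1d(mu, var, z, var_x=2.0, var_z=1.0)

print(mu, var)
```

A large `var_z` pulls the new mean toward the old mean, while a large `var` or `var_x` pulls it toward the observation. The observation never enters the variance calculation.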

The General Case

Multivariate Gaussian Distribution
  Exponent is a quadratic function of the random variables $x_i$ in $\mathbf{x}$

Temporal model with Kalman filtering
  $F$: linear transition model
  $H$: linear sensor model
  $\Sigma_x$: transition noise covariance
  $\Sigma_z$: sensor noise covariance

Update equations for mean and covariance
  $K_{t+1}$: the Kalman gain matrix
  $F\mu_t$: predicted state at $t+1$
  $HF\mu_t$: the predicted observation
  $z_{t+1} - HF\mu_t$: error in predicted observation
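For reference, the quantities named above fit together as follows. This is the standard textbook form (as in Russell & Norvig), given as a sketch in the same symbols, with $\alpha$ a normalizing constant.

```latex
\begin{align*}
N(\mu, \Sigma)(\mathbf{x}) &= \alpha\, \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mu)^{T}\Sigma^{-1}(\mathbf{x}-\mu)\right) \\
P(\mathbf{x}_{t+1} \mid \mathbf{x}_t) &= N(F\mathbf{x}_t, \Sigma_x)(\mathbf{x}_{t+1}) \\
P(\mathbf{z}_t \mid \mathbf{x}_t) &= N(H\mathbf{x}_t, \Sigma_z)(\mathbf{z}_t) \\
\mu_{t+1} &= F\mu_t + K_{t+1}\,(\mathbf{z}_{t+1} - HF\mu_t) \\
\Sigma_{t+1} &= (I - K_{t+1}H)(F\Sigma_t F^{T} + \Sigma_x) \\
K_{t+1} &= (F\Sigma_t F^{T} + \Sigma_x)H^{T}\left(H(F\Sigma_t F^{T} + \Sigma_x)H^{T} + \Sigma_z\right)^{-1}
\end{align*}
```

Neither $\Sigma_{t+1}$ nor $K_{t+1}$ depends on the observations, so both can be computed ahead of time.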

Illustration


Applicability of Kalman Filtering

Popular applications


Navigation, guidance, radar tracking, sonar ranging, satellite
orbit computation, stock price prediction, landing of Eagle on
Moon, gyroscopes in airplanes, etc.


Extended Kalman Filters (EKF) can handle nonlinearities in Gaussian distributions
  Model the system as locally linear in $x_t$ in the region of $x_t = \mu_t$
  Works well for smooth, well-behaved systems

Switching Kalman Filters: multiple Kalman filters in parallel, each using a different model of the system
  A weighted sum of predictions is used
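The "locally linear" idea can be sketched for a scalar system: the nonlinear transition and sensor functions are replaced by their slopes at the current estimate, and the ordinary Kalman update is applied. Everything below (the function `ekf_step`, the example functions `f` and `h`, and the numbers) is a hypothetical illustration, not a specific published model.

```python
import math


def ekf_step(mu, var, z, f, f_prime, h, h_prime, var_x, var_z):
    """One scalar EKF step: linearize f and h around the current estimate."""
    # Predict: the mean goes through f, the variance through the slope of f at mu.
    F = f_prime(mu)
    mu_pred = f(mu)
    var_pred = F * var * F + var_x
    # Correct: linearize the sensor model around the predicted mean.
    H = h_prime(mu_pred)
    K = var_pred * H / (H * var_pred * H + var_z)
    mu_new = mu_pred + K * (z - h(mu_pred))
    var_new = (1.0 - K * H) * var_pred
    return mu_new, var_new


def f(x):                                  # hypothetical smooth transition model
    return x + 0.1 * math.sin(x)


def f_prime(x):                            # its derivative
    return 1.0 + 0.1 * math.cos(x)


def h(x):                                  # hypothetical sensor: squared position
    return x * x


def h_prime(x):                            # its derivative
    return 2.0 * x


mu, var = ekf_step(2.0, 1.0, 4.6, f, f_prime, h, h_prime, var_x=0.01, var_z=0.5)
print(mu, var)
```

When `f` and `h` are linear, the slopes are constants and the step reduces to the ordinary Kalman filter. The approximation degrades when the functions curve sharply within the spread of the current estimate.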

Applicability of Kalman Filters


Dynamic Bayesian Networks

Directed graphical models of stochastic processes


Extend HMMs by representing hidden (and observed) state in terms
of state variables, with possible complex interdependencies


Any number of state variables and evidence variables


Dynamic or Temporal Bayesian Network???


Model structure does not change over time


Parameters do not change over time


Extra hidden nodes can be added (mixture of models)



DBNs and HMMs

HMM as a DBN


Single state variable and single evidence variable


Discrete variable DBN as an HMM


Combine all state variables in DBN into a single state
variable (with all possible values of individual state
variables)


Efficient Representation (with 20 boolean state
variables, DBN needs 160 probabilities, whereas
HMM needs roughly a trillion probabilities)


Analogous to Ordinary Bayesian Networks vs
Fully Tabulated Joint Distributions
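The two counts can be reproduced with a little arithmetic. The sketch assumes, as in the Russell & Norvig version of this comparison, that each boolean state variable has three parents in the previous time slice. That parent count is an assumption not stated on the slide.

```python
n_vars = 20                  # boolean state variables
n_parents = 3                # assumed parents per variable in the previous slice

# DBN: one conditional probability table per variable,
# with one entry per setting of its parents.
dbn_params = n_vars * 2 ** n_parents

# HMM: a single state variable with 2**20 values and a full transition matrix.
hmm_states = 2 ** n_vars
hmm_params = hmm_states ** 2

print(dbn_params)            # 160
print(hmm_params)            # 1099511627776, roughly a trillion
```

The DBN count grows linearly in the number of variables, while the HMM count grows exponentially.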


DBNs and Kalman Filters

Kalman filter as a DBN
  Continuous variables and linear Gaussian conditional distributions

DBN as a Kalman Filter
  Not possible in general
  A DBN allows arbitrary distributions
  Lost keys example

Constructing DBNs

Required information
  Prior distributions over state variables $P(X_0)$
  The transition model $P(X_{t+1} \mid X_t)$
  The sensor model $P(E_t \mid X_t)$
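The three ingredients are enough to run filtering. The sketch below uses the smallest possible DBN, one boolean state variable and one boolean evidence variable. The probabilities are illustrative values chosen for the example, not taken from the slides.

```python
def filter_step(belief, transition, sensor, evidence):
    """One filtering step for a DBN with a single boolean state variable.

    belief:     P(X_t = true | e_1:t)
    transition: maps x_t to P(X_t+1 = true | X_t = x_t)
    sensor:     maps x to P(E = true | X = x)
    """
    # Predict: push the current belief through the transition model.
    pred = belief * transition[True] + (1.0 - belief) * transition[False]
    # Update: weight by the likelihood of the evidence, then normalize.
    like_true = sensor[True] if evidence else 1.0 - sensor[True]
    like_false = sensor[False] if evidence else 1.0 - sensor[False]
    num = like_true * pred
    return num / (num + like_false * (1.0 - pred))


prior = 0.5                                   # P(X_0 = true), assumed
transition = {True: 0.7, False: 0.3}          # P(X_t+1 = true | X_t), assumed
sensor = {True: 0.9, False: 0.2}              # P(E_t = true | X_t), assumed

belief = prior
for e in [True, True]:                        # two positive observations
    belief = filter_step(belief, transition, sensor, e)

print(belief)
```

With more state variables, the same predict-then-update pattern applies, but the belief becomes a joint distribution over all of them.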