Probabilistic Plan Recognition
Kathryn Blackmond Laskey
Department of Systems Engineering and Operations Research
George Mason University
Dagstuhl Seminar, April 2011
1
“The problem of plan recognition is to take as input a sequence of actions performed by an actor and to infer the goal pursued by the actor and also to organize the action sequence in terms of a plan structure” (Schmidt, Sridharan and Goodson, 1978)
“…the problem of plan recognition is largely a problem of inference under conditions of uncertainty.” (Charniak and Goldman, 1993)
2
PPR in a Nutshell
Represent
• set of possible plans
• anticipated evidence for each plan
Specify
• prior probabilities for plans
• likelihood for evidence given plans
Infer plans using Bayes Rule
Thomas Bayes (1702–1762)
Bayes, Thomas. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370–418, 1763.
…or just directly specify P(plan | obs)
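As a minimal sketch of the "represent, specify, infer" recipe, the following Python applies Bayes rule to a small set of plans. The plan names, observation names and all numbers are hypothetical, and the sketch assumes observations are conditionally independent given the plan.

```python
def posterior_over_plans(prior, likelihood, observations):
    """Return P(plan | observations) by Bayes rule.

    Assumes (for this sketch only) that observations are conditionally
    independent given the plan.
    """
    unnormalized = {}
    for plan, p_plan in prior.items():
        weight = p_plan
        for obs in observations:
            weight *= likelihood[plan].get(obs, 0.0)
        unnormalized[plan] = weight
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observations have zero probability under every plan")
    return {plan: w / total for plan, w in unnormalized.items()}


# Hypothetical priors and likelihoods, for illustration only.
prior = {"shopping": 0.7, "robbery": 0.3}
likelihood = {
    "shopping": {"enter_store": 0.9, "draw_gun": 0.01},
    "robbery": {"enter_store": 0.8, "draw_gun": 0.6},
}
posterior = posterior_over_plans(prior, likelihood, ["enter_store", "draw_gun"])
print(posterior)
```

With these illustrative numbers the less likely prior plan dominates the posterior once the second observation arrives, which is the behavior Bayes rule is meant to capture.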
3
Why Probability?
• Theoretically well-founded representation for relative plausibility of competing explanations
• Unified approach to inference and learning
• Combine engineered and learned knowledge
• Many general-purpose exact and approximate algorithms with strong theoretical justification and practical success
• Good results on many interesting problems
• But…
  – Inference and learning (exact and approximate) are NP-hard
  – Balancing tractability and expressiveness is a major research and engineering challenge
4
Representing Plans and Observations
• Plan recognition requires a computational representation of possible plans and observable evidence
  – Goals
  – Actions
    • When executed in combination, actions are expected (with high probability) to achieve the goal
  – Preconditions / postconditions of actions
  – Constraints
    • Most notably, temporal ordering
  – Observables
    • Actions may or may not be directly observable
    • Sometimes we observe effects of actions
  – Hierarchical decomposition of the above
• For probabilistic plan recognition, we need to assign probabilities to these elements
  – Balance expressivity against tractability of inference & learning
5
Some Representations for PPR
• Bayesian networks
• Hidden Markov Models / Dynamic Bayesian Networks
• Plan Recognition Bayesian Networks / Probabilistic Relational Models / Multi-Entity Bayesian Networks
• Bayesian Abductive Logic Programs
• Stochastic Grammars
• Conditional Random Fields
• Markov Logic Networks
Each of these formalisms can be thought of as a way of representing a set of “possible worlds” and defining a probability measure on an algebra of subsets
6
Graphical Probability Models
• Factorize joint distribution into factors involving only a few variables each
  – Graph represents conditional independence assumptions
  – Local distributions specify probability information for small groups of related variables
  – Factors are combined into joint distribution
• Drastically simplifies specification, inference and learning
• 20 possible goals, 100 possible actions
  – Fully general model: $2.5 \times 10^{31}$ probabilities
  – “Naïve Bayes model”: 19×20×100 = 38,000 probabilities
  – If each goal has only 10 associated actions then “naïve Bayes model”: 19×10 = 190 probabilities
  – Naïve Bayes inference scales as #variables × #states/variable
[Figure: Naïve Bayes Model, with goal node G and action nodes A1–A5]
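The gap between a fully general model and a factored one can be checked with a few lines of arithmetic. This sketch assumes every action is a binary variable and counts only free parameters (one fewer than the number of states in each distribution); those conventions are assumptions of the sketch, so its naïve Bayes total need not match a count made under other conventions.

```python
def full_joint_parameters(num_goals, num_actions, action_states=2):
    """Free parameters of an unrestricted joint over one goal and all actions."""
    return num_goals * action_states ** num_actions - 1


def naive_bayes_parameters(num_goals, num_actions, action_states=2):
    """Free parameters when actions are independent given the goal."""
    prior = num_goals - 1
    conditionals = num_goals * num_actions * (action_states - 1)
    return prior + conditionals


full = full_joint_parameters(20, 100)
nb = naive_bayes_parameters(20, 100)
print(f"full joint: {full:.2e} parameters")
print(f"naive Bayes (binary actions, free parameters): {nb}")
```

The full-joint count grows exponentially in the number of actions, while the factored count grows linearly, which is the point of the comparison.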
7
Bayesian Network (BN)
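• Directed graph represents dependencies
• Joint distribution factors as $\Pr(X_1,\dots,X_n) = \prod_i \Pr(X_i \mid \mathrm{pa}(X_i))$, where $\mathrm{pa}(X_i)$ denotes the parents of $X_i$ in the graph
• Factored representation makes specification, inference and learning tractable for interesting classes of problems
• Directed graph naturally represents causality
  – Effects of intervention via “do” operator
  – Explaining away
[Figure: example Bayesian network over Regional Domination (R), Exercise (E), Invasion (I), Nuclear Weapons (W), Troops Moving (T), Buy Reactors (B) and Steal Materials (S). Full joint distribution: 127 probabilities; factored representation: 14 probabilities]
$\Pr(R,E,I,W,T,B,S) = \Pr(R)\,\Pr(E)\,\Pr(I \mid R)\,\Pr(W \mid R)\,\Pr(T \mid E,I)\,\Pr(B \mid W)\,\Pr(S \mid W)$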
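The factorization of the seven-variable example can be written out directly. In this sketch the graph structure follows the factorization on the slide, but every number in the conditional probability tables is made up for illustration. The code counts the parameters, checks that the factored joint sums to one, and shows explaining away by brute-force enumeration.

```python
from itertools import product

# Parents of each variable, read off the factorization on the slide.
PARENTS = {
    "R": (), "E": (), "I": ("R",), "W": ("R",),
    "T": ("E", "I"), "B": ("W",), "S": ("W",),
}
# P(variable = True | parent values). All numbers are hypothetical.
CPT = {
    "R": {(): 0.1},
    "E": {(): 0.3},
    "I": {(True,): 0.4, (False,): 0.01},
    "W": {(True,): 0.5, (False,): 0.02},
    "T": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.05},
    "B": {(True,): 0.7, (False,): 0.05},
    "S": {(True,): 0.6, (False,): 0.01},
}


def joint(world):
    """Probability of one complete truth assignment (a possible world)."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(world[q] for q in parents)]
        p *= p_true if world[var] else 1.0 - p_true
    return p


worlds = [dict(zip(PARENTS, values))
          for values in product([True, False], repeat=len(PARENTS))]
num_parameters = sum(len(table) for table in CPT.values())
total = sum(joint(w) for w in worlds)


def prob(query, evidence):
    """P(query | evidence) by summing over all 128 worlds."""
    def matches(world, assignment):
        return all(world[k] == v for k, v in assignment.items())
    numerator = sum(joint(w) for w in worlds
                    if matches(w, evidence) and matches(w, query))
    denominator = sum(joint(w) for w in worlds if matches(w, evidence))
    return numerator / denominator


print(num_parameters, len(worlds) - 1)
print(prob({"I": True}, {"T": True}))
print(prob({"I": True}, {"T": True, "E": True}))
```

Learning that an exercise is under way lowers the probability of invasion given troop movement, because the exercise already accounts for the observation.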
8
Possible and Probable Worlds
• “Traditional or deductive logic admits only three attitudes to any proposition: definite proof, disproof, or blank ignorance.” (Jeffreys)
• Semantics of classical logic is based on possible worlds
  – Set of possible worlds defined by language, domain, and axioms
  – In propositional logic, possible worlds assign truth values to atoms (e.g., R → T; W → T; E → F)
• Probability theory
  – Set of possible worlds is called the sample space
  – Probability measure maps subsets to real numbers
  – Probability axioms are a natural extension of classical propositional logic to likelihood
• BN combines propositional logic with probability

R  E  I  W  T  B  S  Pr
T  T  T  T  T  T  T  p_1
T  T  T  T  T  T  F  p_2
T  T  T  T  T  F  T  p_3
…

[Figure: the example Bayesian network over Regional Domination (R), Exercise (E), Invasion (I), Nuclear Weapons (W), Troops Moving (T), Buy Reactors (B) and Steal Materials (S)]
9
Other Factored Representations
• Markov network: factorization specified by undirected graph
  – More natural for domains without natural causal direction
  – Joint distribution factorizes as: $P(x) = \frac{1}{Z} \prod_C \psi_C(x_{1C}, \dots, x_{k_C C})$, writing $\psi_C$ for the potential function on clique C, where
    • C indexes cliques in the graph
    • $x_{iC}$ is the $i$th variable in clique C
    • $k_C$ is the size of clique C
    • Z is a normalization constant
• Chain graph: factorization specified by graph with both directed and undirected edges
• Representations to exploit context-specific independence
  – Probability trees
  – Tree-structured parameterization for local distributions in a BN
10
Conditional Random Fields
• Bayesian networks are generative models
  – Represent joint probability over plans and observations
  – Realistic dependence models often yield intractable inference
• Conditional (or discriminative) model directly represents probability of plans given observations
  – Can allow some dependencies to be relaxed
• CRFs are discriminative
  – Undirected graph represents local dependencies
  – Potential function represents strength of dependence
• A CRF is a family of MRFs (a mapping from observations to potentials)
11
Inference in Graphical Models
• Exact inference
  – E.g., belief propagation, junction tree, bucket elimination, symbolic probabilistic inference, cutset conditioning
  – Exploit graph structure / factorization to simplify computation
  – Infeasible for complex problems
• Approximate (deterministic)
  – E.g., loopy BP, variational Bayes
• Approximate (stochastic)
  – E.g., Gibbs sampling, Metropolis-Hastings sampling, likelihood weighting
• Combinations
  – E.g., cutset sampling (Bidyuk and Dechter, 2007)
12
Belief Propagation for Singly Connected BNs
• Goal: compute probability distribution of random variable B given evidence (assume B itself is not known)
• Key idea: impact on belief in B of evidence “above” B and evidence “below” B can be processed separately
• Justification: B d-separates “above” random variables from “below” random variables
[Figure: singly connected network with query node B (marked “?”) and nodes A1–A6 and D1–D7, divided into random variables “above” B and random variables “below” B; a legend marks evidence random variables; π and λ messages are shown at B]
• This picture depicts the updating process for one node.
• The algorithm simultaneously updates beliefs for all the nodes.
• Loopy BP applies BP to network with loops; often results in good approximation
13
Likelihood Weighting (for BNs)
1. Proceed through non-evidence variables in order consistent with partial ordering induced by graph
   – Sample variable according to its local probability distribution
   – Calculate weight proportional to Pr(evidence | sampled values)
2. Repeat Step 1 until done
3. Estimate Pr(Variable = value) by weighted sample frequency
[Figure: example Bayesian network over nodes A, B, C, D, E, F, G, H, J, K, L]
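The three steps above can be sketched on a small network. The four-node network below (R → I, and E, I → T) and its numbers are hypothetical, chosen only to keep the example short; the sampler itself follows the steps as listed.

```python
import random

# Hypothetical network: R -> I, and (E, I) -> T. ORDER is a topological order.
ORDER = ["R", "E", "I", "T"]
PARENTS = {"R": (), "E": (), "I": ("R",), "T": ("E", "I")}
# P(variable = True | parent values); illustrative numbers only.
CPT = {
    "R": {(): 0.1},
    "E": {(): 0.3},
    "I": {(True,): 0.4, (False,): 0.01},
    "T": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.05},
}


def likelihood_weighting(query_var, evidence, num_samples, rng):
    """Estimate P(query_var = True | evidence) by likelihood weighting."""
    weighted_true = 0.0
    total_weight = 0.0
    for _ in range(num_samples):
        sample, weight = {}, 1.0
        for var in ORDER:
            p_true = CPT[var][tuple(sample[p] for p in PARENTS[var])]
            if var in evidence:
                # Evidence is clamped, not sampled; it contributes to the weight.
                sample[var] = evidence[var]
                weight *= p_true if evidence[var] else 1.0 - p_true
            else:
                sample[var] = rng.random() < p_true
        total_weight += weight
        if sample[query_var]:
            weighted_true += weight
    return weighted_true / total_weight


estimate = likelihood_weighting("I", {"T": True}, 50_000, random.Random(0))
print(f"P(I = True | T = True) is approximately {estimate:.3f}")
```

For these numbers the exact answer, obtained by enumeration, is about 0.125, so the estimate should land close to that value.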
14
Junction Tree Algorithm
1. Compile BN into junction tree
   – Tree of clusters of nodes
   – Has JT property: variable belonging to 2 clusters must belong to all clusters along path connecting them
   – Becomes part of the knowledge representation
   – Changes only if the graph changes
2. Use local message-passing algorithm to propagate beliefs in the junction tree
3. Query on any node or any set of nodes in same cluster can be computed from cluster joint distribution
[Figure: example Bayesian network over nodes A, B, C, D, E, F, G, H, J, K, L and its junction tree with clusters ABC, BCDEH, CDGEH, DEGHJ, DFGHJ, FGJK, JKL]
15
Gibbs Sampling
1. Initialize
   – Evidence variables assigned to observed values
   – Arbitrary value for other variables
2. Sample non-evidence nodes one at a time:
   – Sample with probability Pr(variable | Markov blanket)
   – Replace with newly sampled value
3. Repeat Step 2 until done
4. Estimate Pr(Variable = value) by sample frequency
[Figure: three copies of the example Bayesian network over nodes A, B, C, D, E, F, G, H, J, K, L]
• Markov blanket
  – In BN: parents, children, co-parents
  – In MN: neighbors
• Variable is conditionally independent of rest of network given its Markov blanket
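A minimal Gibbs sampler for a small network follows the four steps above. The network and its numbers are hypothetical. For brevity the conditional Pr(variable | Markov blanket) is computed as a ratio of full joint probabilities; the factors outside the Markov blanket cancel in that ratio, so the result is the same. The burn-in period is an addition of this sketch, not part of the steps listed.

```python
import random

# Hypothetical network: R -> I, and (E, I) -> T.
PARENTS = {"R": (), "E": (), "I": ("R",), "T": ("E", "I")}
# P(variable = True | parent values); illustrative numbers only.
CPT = {
    "R": {(): 0.1},
    "E": {(): 0.3},
    "I": {(True,): 0.4, (False,): 0.01},
    "T": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.05},
}


def joint(state):
    """Joint probability of a complete assignment."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(state[q] for q in parents)]
        p *= p_true if state[var] else 1.0 - p_true
    return p


def gibbs(query_var, evidence, num_sweeps, burn_in, rng):
    """Estimate P(query_var = True | evidence) by Gibbs sampling."""
    # Step 1: clamp evidence, start other variables at an arbitrary value.
    state = {var: evidence.get(var, True) for var in PARENTS}
    free = [var for var in PARENTS if var not in evidence]
    hits = 0
    for sweep in range(num_sweeps):
        # Step 2: resample each non-evidence variable given all the others.
        for var in free:
            state[var] = True
            p_true = joint(state)
            state[var] = False
            p_false = joint(state)
            state[var] = rng.random() < p_true / (p_true + p_false)
        # Step 4: accumulate the sample frequency after burn-in.
        if sweep >= burn_in and state[query_var]:
            hits += 1
    return hits / (num_sweeps - burn_in)


estimate = gibbs("I", {"T": True}, 50_000, 1_000, random.Random(0))
print(f"P(I = True | T = True) is approximately {estimate:.3f}")
```

Because successive Gibbs samples are correlated, more sweeps are typically needed than independent samples for the same accuracy.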
16
Cutset Sampling (for BNs)
[Figure: example Bayesian network over nodes A, B, C, D, E, F, G, H, J, K, L, shown a second time with cutset variables instantiated as J = j and D = d]
• Find a loop cutset
• Initialize cutset variables
• Do until done
  – Propagate beliefs on non-cutset variables
  – Do Gibbs iteration on cutset
• Estimate P(Variable = value) by averaging probability over samples
• This is a kind of “Rao-Blackwellization”
  – Reduce variance of Monte Carlo estimator by replacing a sampling step with an exact computation with same expected value
17
Variational Inference
• Method for approximating posterior distribution of unobserved variables given observed variables
• Approximation finds distribution in family with simpler functional form (e.g., remove some arcs in graph) by minimizing a measure of distance from true posterior
• Estimation via “variational EM”
  – Alternate between “expectation” and “maximization” steps
  – Converges to local minimum of distance function
  – Yields lower bound for marginal likelihood
• Often faster but less accurate than MC
18
Extending Expressive Power of BNs
Charniak and Goldman (1993)
• Propositional logic + probability is insufficiently expressive for requirements of plan recognition
  – Repeated structure
  – Multiple interrelated entities (e.g., plans, actors, actions)
  – Type hierarchy and inheritance
  – Unbounded number of potentially relevant variables
• Some formalisms with greater expressive power:
  – PBN (Plan recognition BN)
  – PRM (Probabilistic Relational Models)
  – OOBN (Object-Oriented Bayesian Networks)
  – MEBN (Multi-Entity BN)
  – Plates
  – BALP (Bayesian Abductive Logic Programs)
19
Example: Maritime Domain Awareness
Entities, attributes and relations
20
MDA Probabilistic Ontology
Built in UnBBayes-MEBN
21
MDA SSBN
Screenshot of situation-specific BN in UnBBayes-MEBN (open-source tool for building & reasoning with PR-OWL ontologies)
22
Protégé Plugin for UnBBayes
23
Drag-and-Drop Mapping
24
Markov Logic Networks
• First-order knowledge base with weight attached to formulas and clauses
• KB + individual constants → ground Markov network containing a variable for each ground atom and a feature for each grounding of a formula in the KB
• Compact language for specifying large Markov networks
25
MLN Example
(Richardson and Domingos, 2006)
26
CRFs for Chat Recognition
(Hsu, Lian and Jih, 2011)
• Subscript indexes pairs of individuals
• $Y_i^t$ represents chatting activity of pair $i$ at time $t$
• $X_i^t$ represents observed acoustic features
• Dependence structure:
  – Within-pair temporal dependence
  – Between-pair concurrent dependence
• Can be represented as MLN
27
Possible and Probable FO Worlds
• In first-order logic, a possible world (aka “structure”) assigns:
  – Each constant symbol to a domain element (e.g., go3 → obj23)
  – Each n-ary function symbol to a function on n-tuples of domain elements (e.g., (go-stp pln1) → obj23)
  – Each n-ary relation symbol to a set of n-tuples of domain elements (e.g., inst → {(obj23, go-), (obj78, liquor-store), (obj78, store), …})
• A first-order probabilistic logic assigns a probability measure to first-order structures
  – This is called “measure model” semantics (Gaifman, 1964)
28
FOL + Probability: Issues
• Probability zero ≠ unsatisfiable
  – E.g., every possible value of a continuous distribution has probability zero
• FOL is undecidable; FOL + probability is not even semi-decidable
  – Example: IID sequence of coin tosses, 0 < P(H) < 1
    • Given any finite sequence of prior tosses, both H and T are possible
    • We cannot disprove any non-extreme probability distribution from a finite sequence of tosses
• Wrong solution: “We will prevent you from expressing this query because we cannot tractably compute the answer.”
• Better solution: “Represent the problem you really want to solve, and then figure out a way to approximate the answer.”
  – Think carefully about what the real problem is!
29
Knowledge Based Model Construction
• KBMC system contains:
  – Base representation that represents goals, plans, actions, actors, observables, constraints, etc.
  – Model construction procedure that maps a context and/or query into a target model
• At problem solving time
  – Construct a problem-specific Bayesian network
  – Process queries on constructed model using general-purpose BN algorithm
• Advantages of expressive representation
  – Understandability
  – Maintainability
  – Knowledge reuse
  – Exploit repeated structure (representation, inference, learning)
  – Construct only as much of model as needed for query
30
Hypothesis Management
• Constructed BN rapidly becomes intractable, especially in presence of existence and association uncertainty
• What do we really need to represent?
• Heuristics help to avoid constructing (or prune) very unlikely hypotheses (or variables with very weak impact on conclusions)
  – E.g., from only “John went to the airport” do not nominate hypothesis that John intends to set off a bomb
  – But a security system needs to be on the alert for prospective bombers!
31
Lifted Inference
• Constructed BN (propositionalized theory) typically contains repeated structure
• Applying standard BN inference often results in many repetitions of the identical computation
• Lifted inference algorithms detect such repetitions
  – “Lift” problem from ground to first-order level
  – Perform computation only once
• Very active area of research (Braz et al., 2005)
32
Learning = Inference
… in theory, at least
[Figure: plate model for parameter learning of the store-of local distribution, with plates i = 1,…,N and j = 1,…,M over the nodes (inst xi liquorshopping), (inst yj liquorstore) and (= (storeof xi) yj)]
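When a parameter is treated as one more random variable, learning it reduces to computing a posterior. The sketch below illustrates this for a single discrete local distribution, assuming a Dirichlet prior (a choice made here because it gives a closed-form update; the slide does not specify one). The store types and counts are hypothetical.

```python
from collections import Counter


def dirichlet_posterior(prior_counts, observations):
    """Posterior Dirichlet pseudo-counts after observing discrete outcomes."""
    counts = Counter(observations)
    return {value: prior_counts[value] + counts.get(value, 0)
            for value in prior_counts}


def predictive(posterior_counts):
    """Posterior predictive probability of each outcome for the next case."""
    total = sum(posterior_counts.values())
    return {value: count / total for value, count in posterior_counts.items()}


# Hypothetical store types visited in observed shopping episodes.
prior_counts = {"liquor_store": 1, "grocery": 1, "pharmacy": 1}
episodes = ["liquor_store"] * 6 + ["grocery"] * 2
posterior_counts = dirichlet_posterior(prior_counts, episodes)
next_visit = predictive(posterior_counts)
print(posterior_counts)
print(next_visit)
```

The update is ordinary Bayesian inference on the parameter node of the plate model; the observed cases inside the plate play the role of evidence.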
33
Representing Temporal Evolution
• Plans evolve in time
• HMM / DBN / PDBN replicate variables describing temporally evolving situation
[Figure: three example models]
  – Hidden Markov Model (HMM): unobservable evolving state + observable indicator (nodes Position and Report at successive time steps)
  – Dynamic Bayesian Network (DBN): factored representation of state / observable (nodes Position, ApparentShape, PosReport and ShpReport at successive time steps)
  – Partially Dynamic Bayesian Network (PDBN): some variables not time-dependent (the DBN nodes plus Destination, Shape and Mission)
34
DBN Inference
• Any BN inference algorithm can be applied to a finite-horizon DBN
• Special-case inference algorithms exploit DBN structure
  – “Rollup” algorithm marginalizes out past hidden states given past observations to explicitly represent only a sliding window
  – Viterbi algorithm finds most probable values of hidden states given observations
  – Forward-backward algorithm estimates marginal distributions for hidden states given observations
• Exact inference is generally intractable
  – Factored frontier algorithm approximates marginalization of past hidden state for intractable DBNs
  – Particle filter is a temporal variant of likelihood weighting with resampling
• Beware of static nodes!
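For the simplest case, an HMM, two of the algorithms named above fit in a few lines. This sketch shows forward filtering (the forward half of forward-backward, which also acts as a one-step rollup) and the Viterbi algorithm. The two hidden states, the observation symbols and all probabilities are hypothetical.

```python
STATES = ("transit", "loiter")
INIT = {"transit": 0.8, "loiter": 0.2}
TRANS = {"transit": {"transit": 0.9, "loiter": 0.1},
         "loiter": {"transit": 0.3, "loiter": 0.7}}
EMIT = {"transit": {"fast": 0.8, "slow": 0.2},
        "loiter": {"fast": 0.1, "slow": 0.9}}


def forward_filter(observations):
    """Return P(current hidden state | all observations so far)."""
    belief = dict(INIT)
    for t, obs in enumerate(observations):
        if t > 0:
            # Marginalize out the previous hidden state.
            belief = {s: sum(belief[r] * TRANS[r][s] for r in STATES)
                      for s in STATES}
        belief = {s: belief[s] * EMIT[s][obs] for s in STATES}
        norm = sum(belief.values())
        belief = {s: v / norm for s, v in belief.items()}
    return belief


def viterbi(observations):
    """Return the most probable hidden state sequence."""
    score = {s: INIT[s] * EMIT[s][observations[0]] for s in STATES}
    back_pointers = []
    for obs in observations[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] * TRANS[r][s])
            pointers[s] = best_prev
            new_score[s] = score[best_prev] * TRANS[best_prev][s] * EMIT[s][obs]
        back_pointers.append(pointers)
        score = new_score
    path = [max(STATES, key=score.get)]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


observations = ["fast", "fast", "slow", "slow"]
belief = forward_filter(observations)
path = viterbi(observations)
print(belief)
print(path)
```

Filtering keeps only the current belief state, so its cost per step does not grow with the length of the observation sequence.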
35
Resampling Particle Filter
[Figure: particle filter cycle with stages labeled initialization, likelihood weighting, resampling and evolution]
• Maintains sample of weighted particles
• Each particle is a single realization of all non-evidence nodes
• Particle is weighted by likelihood of observation given particle
• Particles are resampled with probability proportional to weight
From van der Merwe et al. (undated)
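A bootstrap particle filter for a one-dimensional random walk illustrates the evolution, weighting and resampling steps. The model (Gaussian random-walk state, Gaussian observation noise), the initial particle spread and all parameter values are assumptions made for this sketch.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def particle_filter(observations, num_particles, process_std, obs_std, rng):
    """Return the filtered posterior mean of the state after each observation."""
    # Initialization: draw particles from a broad (assumed) prior.
    particles = [rng.gauss(0.0, 5.0) for _ in range(num_particles)]
    estimates = []
    for y in observations:
        # Evolution: propagate each particle through the process model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Likelihood weighting: weight by P(observation | particle).
        weights = [gaussian_pdf(y, x, obs_std) for x in particles]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("all particle weights underflowed to zero")
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resampling: draw particles with probability proportional to weight.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


# Simulate a hypothetical trajectory and noisy reports of it.
rng = random.Random(1)
truth, observations, x = [], [], 0.0
for _ in range(30):
    x += rng.gauss(0.0, 0.5)
    truth.append(x)
    observations.append(x + rng.gauss(0.0, 1.0))

estimates = particle_filter(observations, 1000, 0.5, 1.0, rng)
print(f"final truth {truth[-1]:.2f}, final estimate {estimates[-1]:.2f}")
```

Resampling on every step is the simplest policy; many implementations resample only when the effective sample size drops below a threshold.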
36
Particle Impoverishment
• Particles with large weights are sampled more often, leading to low particle diversity
• This effect is counteracted by “spreading” effects of process noise
• Impoverishment is very serious when:
  – Observations are extremely unlikely
  – Low “process noise” leads to long dwell times in widely separated basins of attraction
• “In fact, for the case of very small process noise, all particles will collapse to a single point within a few iterations.”
• “If the process noise is zero, then using a particle filter is not entirely appropriate.” (Arulampalam et al., 2002)
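The collapse described in these quotes is easy to reproduce. In this sketch each particle carries a fixed hypothesis about a static quantity (a coin bias), so there is no process noise to spread the particles out; the setup and numbers are hypothetical. The code tracks how many distinct particle values survive repeated weighting and resampling.

```python
import random


def distinct_after_resampling(num_particles, num_steps, rng):
    """Count distinct particle values before and after each resampling step."""
    # Each particle is a fixed guess at a static parameter (probability of heads).
    particles = [rng.uniform(0.0, 1.0) for _ in range(num_particles)]
    true_theta = 0.3
    history = [len(set(particles))]
    for _ in range(num_steps):
        heads = rng.random() < true_theta
        # Weight each particle by the likelihood of the new observation.
        weights = [p if heads else 1.0 - p for p in particles]
        # No evolution step: the static parameter never changes.
        particles = rng.choices(particles, weights=weights, k=num_particles)
        history.append(len(set(particles)))
    return history


history = distinct_after_resampling(500, 20, random.Random(0))
print(history)
```

Resampling can only copy existing values, so the count of distinct hypotheses never increases; lost values cannot be recovered without some form of rejuvenation.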
37
Particle Filter with Static Nodes
• PF cannot recover from impoverishment of static node
• Some approaches:
  – Estimate separate PF for each combination of static node values
    • Only if static node has small state space
  – Regularized PF: artificial evolution of static node
    • Ad hoc; no justification for amount of perturbation; information loss over time
  – Shrinkage (Liu & West)
    • Combines ideas from artificial evolution & kernel smoothing
    • Perturbation “shrinks” static node for each particle toward weighted sample mean
      – Perturbation holds variance of set of particles constant
      – Correlation in disturbances compensates for information loss
  – Resample-Move (Gilks & Berzuini)
    • Metropolis-Hastings step corrects for particle impoverishment
    • MH sampling of static node involves entire trajectory but is performed less frequently as runs become longer
38
Stochastic Grammars
• Motivation: find representation that is sufficiently expressive for plan recognition but more tractable than general DBN inference
• A stochastic grammar is a set of stochastic production rules for generating sequences of actions (terminal symbols in the grammar)
• Modularity of production rules yields factored joint distribution
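A toy stochastic grammar makes the definition concrete. The nonterminals, actions and rule probabilities below are hypothetical, and the grammar is deliberately non-recursive so that every action sequence can be enumerated. The probability of a sequence is the product of the probabilities of the rules used to derive it, which is the factored joint distribution mentioned above.

```python
import random

# Each nonterminal maps to a list of (probability, right-hand side) rules.
# Symbols that are not keys of GRAMMAR are terminal actions.
GRAMMAR = {
    "Plan": [(0.6, ["GetMoney", "Buy"]), (0.4, ["Buy"])],
    "GetMoney": [(0.7, ["go_to_atm", "withdraw"]),
                 (0.3, ["go_to_bank", "withdraw"])],
    "Buy": [(1.0, ["go_to_store", "pay"])],
}


def generate(symbol, rng):
    """Sample one action sequence derived from the given symbol."""
    if symbol not in GRAMMAR:
        return [symbol]
    rules = GRAMMAR[symbol]
    _, rhs = rng.choices(rules, weights=[p for p, _ in rules], k=1)[0]
    actions = []
    for child in rhs:
        actions.extend(generate(child, rng))
    return actions


def enumerate_sequences(symbols):
    """Yield (probability, action tuple) for every expansion of a symbol list."""
    if not symbols:
        yield 1.0, ()
        return
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        for p, tail in enumerate_sequences(rest):
            yield p, (head,) + tail
        return
    for rule_p, rhs in GRAMMAR[head]:
        for p, seq in enumerate_sequences(list(rhs) + list(rest)):
            yield rule_p * p, seq


dist = {}
for p, seq in enumerate_sequences(["Plan"]):
    dist[seq] = dist.get(seq, 0.0) + p
sample = generate("Plan", random.Random(0))
print(dist)
print(sample)
```

Exhaustive enumeration only works for small non-recursive grammars like this one; realistic plan grammars need parsing or the DBN-based methods discussed in these slides.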
39
Stochastic Grammar: Example
• Taken from Geib and Goldman (2009)
• Plans are represented as and/or tree with temporal constraints
40
Stochastic Grammar: Inference
• Parsing algorithms can be applied to compute restricted class of queries
  – If plans can be represented in a given formalism then that formalism’s inference algorithms can be applied to process queries
  – We are often interested in a broader class of queries than traditional parsing algorithms can handle (e.g., we usually have not observed all actions)
• Parse tree can be converted to DBN
  – Enables answering a broader class of queries
  – Can exploit structure of grammar to improve tractability of inference
• Special-purpose algorithms exploit grammar structure
41
Where Do We Stand?
• Contributions of probabilistic methods
  – Useful way of thinking about problems
  – Unified approach to reasoning, parameter learning, structure learning
  – Principled combination of KE with learning
  – Can learn from small, moderate and large samples
  – Many general-purpose exact and approximate algorithms with strong theoretical justification and practical success
  – Good results (better than previous state of the art) on many interesting problems
• Many challenging problems remain
  – Exact learning and inference are intractable
  – High-dimensional multi-modal distributions are just plain ugly
    • All inference algorithms break down on the toughest cases
  – Asymptotics doesn’t mean much when the long run is millions of years!
  – With good engineering backed by solid theory, we will continue to make progress
42
Bibliography (1 of 2)
Arulampalam, M., Maskell, S., Gordon, N. and Clapp, T. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Transactions on Signal Processing, 50, pp. 174–188, 2002.
Bidyuk, B. and Dechter, R. Cutset Sampling for Bayesian Networks. Journal of Artificial Intelligence Research, 28, pp. 1–48, 2007.
Braz, R., Amir, E. and Roth, D. Lifted First-Order Probabilistic Inference. Proceedings of the International Joint Conference on Artificial Intelligence, 2005.
Bui, H., Venkatesh, S. and West, G. Policy Recognition in the Abstract Hidden Markov Model. Journal of Artificial Intelligence Research, 17, pp. 451–499, 2002.
Charniak, E. and Goldman, R. A Bayesian Model of Plan Recognition. Artificial Intelligence, 64, pp. 53–79, 1993.
Charniak, E. and Goldman, R. A Probabilistic Model of Plan Recognition. Proceedings of the Ninth Conference on Artificial Intelligence, 1991.
Darwiche, A. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.
Gaifman, H. Concerning measures in First-Order calculi. Israel Journal of Mathematics, 2, pp. 1–18, 1964.
Geib, C.W. and Goldman, R.P. A Probabilistic Plan Recognition Algorithm Based on Plan Tree Grammars. Artificial Intelligence, 173, pp. 1101–1132, 2009.
Gilks, W.R. and Berzuini, C. Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models. Journal of the Royal Statistical Society B, 63, pp. 127–146, 2001.
Hsu, J., Lian, C. and Jih, W. Probabilistic Models for Concurrent Chatting Activity Recognition. ACM Transactions on Intelligent Systems and Technology, 2(1), 2011.
Jensen, F. Bayesian Networks and Decision Graphs (2nd edition). Springer, 2007.
Korb, K. and Nicholson, A. Bayesian Artificial Intelligence. Chapman and Hall, 2003.
Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press, 2009.
Lafferty, J., McCallum, A. and Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning, 2001.
Laskey, K.B. MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial Intelligence, 172(2–3), pp. 140–178, 2008.
43
Bibliography (2 of 2)
Liao, L., Patterson, D.J., Fox, D. and Kautz, H. Learning and Inferring Transportation Routines. Artificial Intelligence, 2007.
Liao, L., Fox, D. and Kautz, H. Hierarchical Conditional Random Fields for GPS-based Activity Recognition. In Springer Tracts in Advanced Robotics. Springer, 2007.
Liu, J. and West, M. Combined Parameter and State Estimation in Simulation-Based Filtering. In Sequential Monte Carlo Methods in Practice, A. Doucet, J.F.G. de Freitas and N.J. Gordon, Eds. New York: Springer-Verlag, 2001.
Musso, C., Oudjane, N. and LeGland, F. Improving Regularised Particle Filters. In Sequential Monte Carlo Methods in Practice, A. Doucet, J.F.G. de Freitas and N.J. Gordon, Eds. New York: Springer-Verlag, 2001.
Neapolitan, R. Learning Bayesian Networks. Prentice Hall, 2003.
Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
Pynadath, D.V. and Wellman, M.P. Probabilistic State-Dependent Grammars for Plan Recognition. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000.
Pynadath, D.V. and Wellman, M.P. Generalized queries on probabilistic context-free grammars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), pp. 65–77, 1998.
Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning, 62, pp. 107–136, 2006.
Schmidt, C., Sridharan, N. and Goodson, J. The plan recognition problem: An Intersection of psychology and Artificial Intelligence. Artificial Intelligence, 11, pp. 45–83, 1978.
van der Merwe, R., Doucet, A., de Freitas, N. and Wan, E. The Unscented Particle Filter. Adv. Neural Inform. Process. Syst., 2000.
van der Merwe, R., Doucet, A., de Freitas, N. and Wan, E. The unscented particle filter (undated). http://cslu.cse.ogi.edu/publications/ps/UPF_CSLU_talk.pdf
Wellman, M.P., Breese, J.S. and Goldman, R.P. From knowledge bases to decision models. The Knowledge Engineering Review, 7(1), pp. 35–53, 1992.