Probabilistic Plan Recognition



Kathryn Blackmond Laskey


Department of Systems Engineering and Operations Research

George Mason University


Dagstuhl Seminar, April 2011


1


"The problem of plan recognition is to take as input a sequence of actions performed by an actor and to infer the goal pursued by the actor and also to organize the action sequence in terms of a plan structure."

  - Schmidt, Sridharan and Goodson, 1978

"…the problem of plan recognition is largely a problem of inference under conditions of uncertainty."

  - Charniak and Goldman, 1993

2


PPR in a Nutshell

Represent
  the set of possible plans
  the anticipated evidence for each plan

Specify
  prior probabilities for the plans
  likelihoods for the evidence given each plan

Infer
  plans, using Bayes' Rule

Bayes, Thomas. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418, 1763.

Thomas Bayes (1702-1762)

…or just directly specify P(plan | obs)

3
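The "Infer" step is Bayes' Rule applied to the represented plans. Writing obs for the observed evidence, a minimal statement is:

$$ P(\mathrm{plan}_i \mid \mathrm{obs}) \;=\; \frac{P(\mathrm{obs} \mid \mathrm{plan}_i)\,P(\mathrm{plan}_i)}{\sum_j P(\mathrm{obs} \mid \mathrm{plan}_j)\,P(\mathrm{plan}_j)} $$

The numerator combines the specified prior and likelihood for plan i; the denominator normalizes over the whole set of possible plans.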


Why Probability?


Theoretically well-founded representation for relative plausibility of competing explanations
Unified approach to inference and learning
Combine engineered and learned knowledge
Many general-purpose exact and approximate algorithms with strong theoretical justification and practical success
Good results on many interesting problems

But…
  Inference and learning (exact and approximate) are NP-hard
  Balancing tractability and expressiveness is a major research and engineering challenge


4


Representing Plans and Observations


Plan recognition requires a computational representation of possible plans and observable evidence:

Goals
Actions
  When executed in combination, actions are expected (with high probability) to achieve the goal
  Preconditions / postconditions of actions
Constraints
  Most notably, temporal ordering
Observables
  Actions may or may not be directly observable
  Sometimes we observe effects of actions
Hierarchical decomposition of the above

For probabilistic plan recognition, we need to assign probabilities to these elements
Balance expressivity against tractability of inference & learning

5
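As a purely illustrative reading of the elements listed above, here is a minimal Python sketch of one way such a plan representation could be encoded; the class names, fields, and example values are assumptions for illustration, not a representation used in the talk.

```python
# Minimal sketch of a plan representation: goal, actions with pre/postconditions,
# temporal-ordering constraints, and observability flags. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)        # facts required before execution
    postconditions: set = field(default_factory=set)       # facts expected after execution
    observable: bool = True                                 # is the action directly observable?
    effect_observations: set = field(default_factory=set)  # observable effects, if any

@dataclass
class Plan:
    goal: str
    actions: list     # component actions (possibly a hierarchical decomposition)
    ordering: list    # temporal constraints as (earlier, later) action-name pairs
    prior: float      # prior probability that this plan is being pursued

shop = Plan(
    goal="buy-liquor",
    actions=[Action("go-to-store"), Action("pay", preconditions={"at-store"})],
    ordering=[("go-to-store", "pay")],
    prior=0.05,
)
```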


Some Representations for PPR


Bayesian networks
Hidden Markov Models / Dynamic Bayesian Networks
Plan Recognition Bayesian Networks / Probabilistic Relational Models / Multi-Entity Bayesian Networks
Bayesian Abductive Logic Programs
Stochastic Grammars
Conditional Random Fields
Markov Logic Networks

Each of these formalisms can be thought of as a way of representing a set of "possible worlds" and defining a probability measure on an algebra of subsets.

6


Graphical Probability Models


Factorize the joint distribution into factors involving only a few variables each
  Graph represents conditional independence assumptions
  Local distributions specify probability information for small groups of related variables
  Factors are combined into the joint distribution
Drastically simplifies specification, inference and learning

Example: 20 possible goals, 100 possible actions
  Fully general model: 2.5×10^31 probabilities
  "Naïve Bayes model": 19×20×100 = 38,000 probabilities
  If each goal has only 10 associated actions, then the "naïve Bayes model" needs 19×10 = 190 probabilities
  Naïve Bayes inference scales as #variables × #states/variable

[Figure: naïve Bayes model with goal node G and action nodes A1-A5]

7
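A minimal runnable sketch of inference in a naïve Bayes model of this kind (a goal node with conditionally independent action observations); the goals, actions, and probability values below are illustrative placeholders, not numbers from the slide.

```python
# Naive Bayes goal recognition: P(G | observed actions) is proportional to
# P(G) * product over i of P(A_i | G). All values below are illustrative.

PRIOR = {"shopping": 0.6, "commuting": 0.4}

# P(action observed = True | goal)
LIKELIHOOD = {
    "shopping":  {"enter-store": 0.9, "board-train": 0.05},
    "commuting": {"enter-store": 0.1, "board-train": 0.85},
}

def goal_posterior(observed_actions):
    """Return P(goal | observed actions) under the naive Bayes factorization."""
    scores = {}
    for goal, prior in PRIOR.items():
        score = prior
        for action, seen in observed_actions.items():
            p = LIKELIHOOD[goal][action]
            score *= p if seen else (1.0 - p)
        scores[goal] = score
    total = sum(scores.values())
    return {goal: score / total for goal, score in scores.items()}

print(goal_posterior({"enter-store": True, "board-train": False}))
# -> the posterior strongly favors "shopping"
```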


Bayesian Network (BN)


Directed graph represents dependencies
Joint distribution factors as Pr(X1, …, Xn) = ∏_i Pr(Xi | parents(Xi))
Factored representation makes specification, inference and learning tractable for interesting classes of problems
Directed graph naturally represents causality
  Effects of intervention via the "do" operator
  Explaining away

Example: Regional Domination (R), Exercise (E), Invasion (I), Nuclear Weapons (W), Troops Moving (T), Buy Reactors (B), Steal Materials (S)

Pr(R,E,I,W,T,B,S) = Pr(R) Pr(E) Pr(I|R) Pr(W|R) Pr(T|E,I) Pr(B|W) Pr(S|W)

Full joint distribution: 127 probabilities. Factored BN: 14 probabilities.

8
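The two counts above can be checked directly, assuming all seven variables are binary:

$$ 2^7 - 1 = 127 $$

free parameters for an unrestricted joint distribution, versus

$$ 1 + 1 + 2 + 2 + 4 + 2 + 2 = 14 $$

for the factored form: one parameter each for Pr(R) and Pr(E), two each for Pr(I|R), Pr(W|R), Pr(B|W) and Pr(S|W), and four for Pr(T|E,I) (one per configuration of the parents).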


Possible and Probable Worlds


"Traditional or deductive logic admits only three attitudes to any proposition: definite proof, disproof, or blank ignorance." (Jeffreys)

The semantics of classical logic is based on possible worlds
  The set of possible worlds is defined by the language, domain, and axioms
  In propositional logic, possible worlds assign truth values to atoms (e.g., R = T; W = T; E = F)

Probability theory
  The set of possible worlds is called the sample space
  A probability measure maps subsets to real numbers
  The probability axioms are a natural extension of classical propositional logic to likelihood

A BN combines propositional logic with probability

R  E  I  W  T  B  S   Pr
T  T  T  T  T  T  T   p1
T  T  T  T  T  T  F   p2
T  T  T  T  T  F  T   p3
…  (one row for each of the 2^7 = 128 possible worlds)



[Figure: the Bayesian network from the previous slide over Regional Domination (R), Exercise (E), Invasion (I), Nuclear Weapons (W), Troops Moving (T), Buy Reactors (B), Steal Materials (S)]
9


Other Factored Representations


Markov network: factorization specified by an undirected graph
  More natural for domains without a natural causal direction
  Joint distribution factorizes as:

    Pr(x) = (1/Z) ∏_C φ_C(x_C)

  where C indexes cliques in the graph, x_C = (x_1C, …, x_k_C C) collects the variables in clique C (x_iC is the i-th variable in clique C, k_C is the size of clique C), and Z is a normalization constant

Chain graph: factorization specified by a graph with both directed and undirected edges

Representations to exploit context-specific independence
  Probability trees
  Tree-structured parameterization for local distributions in a BN

10


Conditional Random Fields


Bayesian networks are generative models
  They represent a joint probability over plans and observations
  Realistic dependence models often yield intractable inference

A conditional (or discriminative) model directly represents the probability of plans given observations
  Can allow some dependencies to be relaxed

CRFs are discriminative
  An undirected graph represents local dependencies
  A potential function represents the strength of dependence
  A CRF is a family of MRFs (a mapping from observations to potentials)

11
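A minimal way to write the discriminative form (following Lafferty et al., 2001): the CRF models the conditional distribution directly,

$$ P(y \mid x) \;=\; \frac{1}{Z(x)} \prod_{C} \psi_C(y_C, x) $$

where y is the label (plan or activity) sequence, x the observations, the ψ_C are potential functions on cliques of the undirected graph, and Z(x) normalizes separately for each observation x, which is what makes the CRF a family of MRFs indexed by the observations.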


Inference in Graphical Models


Exact inference
  E.g., belief propagation, junction tree, bucket elimination, symbolic probabilistic inference, cutset conditioning
  Exploit graph structure / factorization to simplify computation
  Infeasible for complex problems

Approximate (deterministic)
  E.g., loopy BP, variational Bayes

Approximate (stochastic)
  E.g., Gibbs sampling, Metropolis-Hastings sampling, likelihood weighting

Combinations
  E.g., Bidyuk and Dechter (2007): cutset sampling

12


Belief Propagation for Singly Connected BNs


Goal: compute the probability distribution of random variable B given evidence (assume B itself is not known)

Key idea: the impact on belief in B from evidence "above" B and evidence "below" B can be processed separately

Justification: B d-separates the "above" random variables from the "below" random variables

[Figure: a singly connected network around node B, with evidence random variables "above" B (contributing π messages) and "below" B (contributing λ messages)]

This picture depicts the updating process for one node.
The algorithm simultaneously updates beliefs for all the nodes.
Loopy BP applies BP to a network with loops; it often results in a good approximation.

13
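In Pearl's (1988) notation, this separation is usually written as

$$ \mathrm{BEL}(b) \;=\; \alpha\,\pi(b)\,\lambda(b) $$

where π(b) summarizes the evidence connected to B from "above" (through its parents), λ(b) summarizes the evidence from "below" (through its children), and α is a normalizing constant.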


Likelihood Weighting (for BNs)

1. Proceed through the non-evidence variables in an order consistent with the partial ordering induced by the graph
   Sample each variable according to its local probability distribution
   Calculate a weight proportional to Pr(evidence | sampled values)
2. Repeat Step 1 until done
3. Estimate Pr(Variable = value) by weighted sample frequency

[Figure: example BN with nodes A-L]
14
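A minimal runnable sketch of these steps on a tiny three-node chain (Goal -> Scouting -> SensorReport); the network and all CPT values are illustrative assumptions, not taken from the slides.

```python
# Likelihood weighting on a small illustrative BN (all binary variables).
import random

P_GOAL = 0.1                               # Pr(Goal = True)
P_SCOUT = {True: 0.8, False: 0.2}          # Pr(Scouting = True | Goal)
P_REPORT = {True: 0.7, False: 0.05}        # Pr(SensorReport = True | Scouting)

def likelihood_weighting(evidence_report=True, n_samples=100_000):
    """Estimate Pr(Goal = True | SensorReport = evidence_report)."""
    weighted = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        # Sample the non-evidence variables in topological order.
        goal = random.random() < P_GOAL
        scouting = random.random() < P_SCOUT[goal]
        # The evidence variable is NOT sampled: weight by its likelihood instead.
        p = P_REPORT[scouting]
        weight = p if evidence_report else 1.0 - p
        weighted[goal] += weight
    total = weighted[True] + weighted[False]
    return weighted[True] / total

if __name__ == "__main__":
    print("Pr(Goal | report) is approximately", round(likelihood_weighting(), 3))
```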


Junction Tree Algorithm

1. Compile the BN into a junction tree
   A tree of clusters of nodes
   Has the JT property: a variable belonging to two clusters must belong to all clusters along the path connecting them
   Becomes part of the knowledge representation
   Changes only if the graph changes
2. Use a local message-passing algorithm to propagate beliefs in the junction tree
3. A query on any node, or any set of nodes in the same cluster, can be computed from the cluster joint distribution

[Figure: example BN with nodes A-L compiled into a junction tree with clusters ABC, BCDEH, CDGEH, DEGHJ, DFGHJ, FGJK, JKL]

15


Gibbs Sampling

1. Initialize
   Evidence variables are assigned their observed values
   Arbitrary values for the other variables
2. Sample the non-evidence nodes one at a time:
   Sample with probability Pr(variable | Markov blanket)
   Replace with the newly sampled value
3. Repeat Step 2 until done
4. Estimate Pr(Variable = value) by sample frequency

[Figure: three snapshots of Gibbs sampling on the example BN with nodes A-L]

Markov blanket
  In a BN: parents, children, co-parents
  In a MN: neighbors
A variable is conditionally independent of the rest of the network given its Markov blanket.

16
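A minimal sketch of the Markov-blanket sampling step on a three-node chain A -> B -> C with evidence on A and C; with only one non-evidence variable this degenerates to repeatedly drawing Pr(B | A, C) directly, but the Markov-blanket computation is the part worth seeing. The CPT values are illustrative placeholders.

```python
# Gibbs sampling sketch for a chain A -> B -> C (binary), evidence A=True, C=True.
import random

P_B_GIVEN_A = {True: 0.6, False: 0.1}   # Pr(B = True | A)
P_C_GIVEN_B = {True: 0.9, False: 0.2}   # Pr(C = True | B)

def gibbs_estimate_b(a=True, c=True, n_iter=50_000, burn_in=1_000):
    """Estimate Pr(B = True | A=a, C=c) by Gibbs sampling."""
    count_true, kept = 0, 0
    for i in range(n_iter):
        # B's Markov blanket is {A, C}; Pr(B | A, C) is proportional to Pr(B | A) * Pr(C | B).
        w_true = P_B_GIVEN_A[a] * (P_C_GIVEN_B[True] if c else 1 - P_C_GIVEN_B[True])
        w_false = (1 - P_B_GIVEN_A[a]) * (P_C_GIVEN_B[False] if c else 1 - P_C_GIVEN_B[False])
        b = random.random() < w_true / (w_true + w_false)   # resample B from its conditional
        if i >= burn_in:
            kept += 1
            count_true += b
    return count_true / kept

if __name__ == "__main__":
    print("Pr(B | A, C) is approximately", round(gibbs_estimate_b(), 3))
```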


Cutset Sampling (for BNs)

[Figure: example BN with nodes A-L; the cutset variables J and D are instantiated to J=j and D=d]

Find a loop cutset
Initialize the cutset variables
Do until done:
  Propagate beliefs on the non-cutset variables
  Do a Gibbs iteration on the cutset
Estimate P(Variable = value) by averaging the probability over samples

This is a kind of "Rao-Blackwellization": reduce the variance of a Monte Carlo estimator by replacing a sampling step with an exact computation with the same expected value.

17


Variational Inference


Method for approximating the posterior distribution of unobserved variables given observed variables

The approximation finds a distribution in a family with a simpler functional form (e.g., remove some arcs in the graph) by minimizing a measure of distance from the true posterior

Estimation via "variational EM"
  Alternate between "expectation" and "maximization" steps
  Converges to a local minimum of the distance function

Yields a lower bound for the marginal likelihood
Often faster but less accurate than MC

18
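When the distance measure is the KL divergence KL(q ‖ p), the lower bound mentioned above is the standard evidence lower bound: for observed x, latent z, and approximating distribution q,

$$ \log p(x) \;\geq\; \mathbb{E}_q[\log p(x, z)] - \mathbb{E}_q[\log q(z)], $$

with the gap equal to KL(q(z) ‖ p(z | x)), so maximizing the bound over q is the same as minimizing the KL divergence from the true posterior.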


Extending Expressive Power of BNs

Charniak and Goldman (1993)

Propositional logic + probability is insufficiently expressive for the requirements of plan recognition:
  Repeated structure
  Multiple interrelated entities (e.g., plans, actors, actions)
  Type hierarchy and inheritance
  Unbounded number of potentially relevant variables

Some formalisms with greater expressive power:
  PBN (Plan recognition BN)
  PRM (Probabilistic Relational Models)
  OOBN (Object-Oriented Bayesian Networks)
  MEBN (Multi-Entity BN)
  Plates
  BALP (Bayesian Abductive Logic Programs)


19


Example: Maritime Domain Awareness

Entities, attributes and relations

20


MDA Probabilistic Ontology

Built in UnBBayes-MEBN

21


MDA SSBN

Screenshot of a situation-specific BN in UnBBayes-MEBN (an open-source tool for building & reasoning with PR-OWL ontologies)

22


Protégé Plugin for UnBBayes

23


Drag-and-Drop Mapping

24


Markov Logic Networks


First-order knowledge base with weights attached to formulas and clauses

KB + individual constants → a ground Markov network containing a variable for each grounding of a formula in the KB

Compact language for specifying large Markov networks

25
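The resulting ground network defines, in Richardson and Domingos' formulation,

$$ P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big) $$

where the sum ranges over the formulas, w_i is the weight of formula i, n_i(x) is the number of true groundings of formula i in world x, and Z is a normalization constant.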


MLN Example

(Richardson and Domingos, 2006)

26


CRFs for Chat Recognition

(Hsu, Lian and Jih, 2011)

The subscript indexes pairs of individuals
  Y_i^t represents the chatting activity of pair i
  X_i^t represents observed acoustic features

Dependence structure:
  Within-pair temporal dependence
  Between-pair concurrent dependence

Can be represented as an MLN


27


Possible and Probable FO Worlds

In first-order logic, a possible world (aka "structure") assigns:
  Each constant symbol to a domain element (e.g., go3 → obj23)
  Each n-ary function symbol to a function on n-tuples of domain elements (e.g., (go-stp pln1) → obj23)
  Each n-ary relation symbol to a set of n-tuples of domain elements (e.g., inst → {(obj23, go-), (obj78, liquor-store), (obj78, store), … })

A first-order probabilistic logic assigns a probability measure to first-order structures

This is called "measure model" semantics (Gaifman, 1964)

28


FOL + Probability: Issues


Probability zero ≠ unsatisfiable
  E.g., every possible value of a continuous distribution has probability zero

FOL is undecidable; FOL + probability is not even semi-decidable
  Example: an IID sequence of coin tosses with 0 < P(H) < 1
    Given any finite sequence of prior tosses, both H and T are possible
    We cannot disprove any non-extreme probability distribution from a finite sequence of tosses

Wrong solution: "We will prevent you from expressing this query because we cannot tractably compute the answer."

Better solution: "Represent the problem you really want to solve, and then figure out a way to approximate the answer."

Think carefully about what the real problem is!

29


Knowledge Based Model Construction


A KBMC system contains:
  A base representation that represents goals, plans, actions, actors, observables, constraints, etc.
  A model construction procedure that maps a context and/or query into a target model

At problem-solving time:
  Construct a problem-specific Bayesian network
  Process queries on the constructed model using a general-purpose BN algorithm

Advantages of an expressive representation:
  Understandability
  Maintainability
  Knowledge reuse
  Exploit repeated structure (representation, inference, learning)
  Construct only as much of the model as needed for the query


30


Hypothesis Management


Constructed BN rapidly becomes intractable,
especially in presence of existence and
association uncertainty


What do we really need to represent?


Heuristics help to avoid constructing (or prune)
very unlikely hypotheses (or variables with very
weak impact on conclusions)


E.g., from only “John went to the airport” do not
nominate hypothesis that John intends to set off a
bomb


But a security system needs to be on the alert for
prospective bombers!

31


Lifted Inference


The constructed BN (propositionalized theory) typically contains repeated structure

Applying standard BN inference often results in many repetitions of the identical computation

Lifted inference algorithms detect such repetitions
  "Lift" the problem from the ground to the first-order level
  Perform the computation only once

Very active area of research
(Braz et al., 2005)

32


Learning = Inference

… in theory, at least

[Plate model for parameter learning of the store-of local distribution: atoms (= (store-of xi) yj), (inst xi liquor-shopping), (inst yj liquor-store), with plates i = 1,…,N and j = 1,…,M]

33
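A one-line illustration of "learning = inference" with a conjugate prior (a standard textbook identity, not something specific to the plate model above): if the parameter θ of a binary local distribution has a Beta(α, β) prior and the data contain k positive cases out of n, then

$$ p(\theta \mid \text{data}) \;=\; \mathrm{Beta}(\alpha + k,\; \beta + n - k), $$

i.e., learning the local distribution is just Bayesian inference on a parameter node that sits outside the plates.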


Representing Temporal Evolution


Plans evolve in time

HMM / DBN / PDBN replicate the variables describing a temporally evolving situation

Hidden Markov Model (HMM): unobservable evolving state + observable indicator
Dynamic Bayesian Network (DBN): factored representation of state / observables
Partially Dynamic Bayesian Network (PDBN): some variables are not time-dependent

[Figures: HMM over Position(k) and Report(k); DBN over Position(k), ApparentShape(k), PosReport(k), ShpReport(k); PDBN adding the static nodes Destination, Shape, Mission]
34
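The common structure behind all three is the standard hidden-state factorization; for an HMM with hidden state x_k and observation y_k,

$$ P(x_{0:K},\, y_{1:K}) \;=\; P(x_0) \prod_{k=1}^{K} P(x_k \mid x_{k-1})\, P(y_k \mid x_k). $$

A DBN factors x_k and y_k further into several variables per time slice, and a PDBN adds static (non-replicated) variables such as Destination, Shape, and Mission above.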


DBN Inference


Any BN inference algorithm can be applied to a finite-horizon DBN

Special-case inference algorithms exploit DBN structure
  The "rollup" algorithm marginalizes out past hidden states given past observations, to explicitly represent only a sliding window
  The Viterbi algorithm finds the most probable values of the hidden states given the observations
  The forward-backward algorithm estimates marginal distributions for the hidden states given the observations

Exact inference is generally intractable
  The factored frontier algorithm approximates marginalization of the past hidden state for intractable DBNs
  The particle filter is a temporal variant of likelihood weighting with resampling
  Beware of static nodes!

35


Resampling Particle Filter

[Figure: the particle filter cycle: initialization, likelihood weighting, resampling, likelihood weighting, evolution]

Maintains a sample of weighted particles
  Each particle is a single realization of all non-evidence nodes
  A particle is weighted by the likelihood of the observation given the particle
  Particles are resampled with probability proportional to weight

From van der Merwe et al. (undated)

36
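A minimal runnable sketch of this cycle for a one-dimensional random-walk state with Gaussian observation noise; the model and all parameter values are illustrative assumptions, not from the slides.

```python
# Minimal bootstrap (resampling) particle filter for a 1-D random-walk state.
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=1000, process_std=0.5, obs_std=1.0):
    """Return the filtered mean of the hidden state at each time step."""
    particles = rng.normal(0.0, 1.0, n_particles)          # initialization
    means = []
    for y in observations:
        # Evolution: propagate each particle through the process model.
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Likelihood weighting: weight each particle by Pr(observation | particle).
        weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        weights /= weights.sum()
        means.append(float(np.sum(weights * particles)))
        # Resampling: draw particles with probability proportional to weight.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return means

if __name__ == "__main__":
    true_x = np.cumsum(rng.normal(0, 0.5, 50))              # simulated hidden trajectory
    obs = true_x + rng.normal(0, 1.0, 50)                   # noisy observations
    print(particle_filter(obs)[:5])
```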


Particle Impoverishment


Particles with large weights are sampled more often, leading to low particle diversity

This effect is counteracted by the "spreading" effect of process noise

Impoverishment is very serious when:
  Observations are extremely unlikely
  Low "process noise" leads to long dwell times in widely separated basins of attraction

"In fact, for the case of very small process noise, all particles will collapse to a single point within a few iterations."

"If the process noise is zero, then using a particle filter is not entirely appropriate." (Arulampalam et al., 2002)


37


Particle Filter with Static Nodes


A PF cannot recover from impoverishment of a static node

Some approaches:
  Estimate a separate PF for each combination of static node values
    Only if the static node has a small state space
  Regularized PF: artificial evolution of the static node
    Ad hoc; no justification for the amount of perturbation; information loss over time
  Shrinkage (Liu & West)
    Combines ideas from artificial evolution & kernel smoothing
    Perturbation "shrinks" the static node for each particle toward the weighted sample mean
    Perturbation holds the variance of the set of particles constant
    Correlation in the disturbances compensates for the information loss
  Resample-Move (Gilks & Berzuini)
    A Metropolis-Hastings step corrects for particle impoverishment
    MH sampling of the static node involves the entire trajectory, but is performed less frequently as the runs become longer

38
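A sketch of the shrinkage step, under my reading of Liu and West (2001): each particle's static value θ⁽ⁱ⁾ is replaced by a draw from a kernel

$$ \theta^{(i)} \;\sim\; N\big(a\,\theta^{(i)} + (1-a)\,\bar{\theta},\; h^2 V\big), \qquad a^2 + h^2 = 1, $$

where θ̄ and V are the weighted mean and variance of the current particle set; shrinking toward the mean by the factor a while adding kernel noise of variance h²V leaves the overall mean and variance of the particle set unchanged, which is the "holds variance constant" property noted above.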


Stochastic Grammars


Motivation: find representation that is sufficiently
expressive for plan recognition but more
tractable than general DBN inference


A stochastic grammar is a set of stochastic
production rules for generating sequences of
actions (terminal symbols in the grammar)


Modularity of production rules yields factored
joint distribution

39
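A toy sketch of such a generator; the grammar, symbols, and probabilities below are invented for illustration (they are not the grammar of Geib and Goldman, 2009).

```python
# A toy stochastic grammar that generates action sequences (terminal symbols).
import random

# Each nonterminal maps to a list of (probability, expansion) production rules.
GRAMMAR = {
    "PLAN":    [(0.7, ["SHOP"]), (0.3, ["COMMUTE"])],
    "SHOP":    [(1.0, ["go-to-store", "PAY"])],
    "PAY":     [(0.8, ["pay-cash"]), (0.2, ["pay-card"])],
    "COMMUTE": [(1.0, ["go-to-station", "board-train"])],
}

def sample(symbol="PLAN"):
    """Sample a terminal action sequence by expanding productions left to right."""
    if symbol not in GRAMMAR:                 # terminal symbol: an observable action
        return [symbol]
    r, cum = random.random(), 0.0
    for prob, expansion in GRAMMAR[symbol]:
        cum += prob
        if r <= cum:
            return [a for s in expansion for a in sample(s)]
    return []                                  # unreachable if rule probabilities sum to 1

if __name__ == "__main__":
    print(sample())    # e.g. ['go-to-store', 'pay-cash']
```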


Stochastic Grammar - Example

Taken from Geib and Goldman (2009)

Plans are represented as an and/or tree with temporal constraints

40


Stochastic Grammar - Inference


Parsing algorithms can be applied to compute
restricted class of queries


If plans can be represented in a given formalism then that
formalism’s inference algorithms can be applied to process
queries


We are often interested in a broader class of queries than
traditional parsing algorithms can handle (e.g., we usually
have not observed all actions)


Parse tree can be converted to DBN


Enables answering a broader class of queries


Can exploit structure of grammar to improve tractability of
inference


Special-purpose algorithms exploit grammar structure

41


Where Do We Stand?


Contributions of probabilistic methods
  A useful way of thinking about problems
  Unified approach to reasoning, parameter learning, and structure learning
  Principled combination of KE with learning
  Can learn from small, moderate and large samples
  Many general-purpose exact and approximate algorithms with strong theoretical justification and practical success
  Good results (better than the previous state of the art) on many interesting problems

Many challenging problems remain
  Exact learning and inference are intractable
  High-dimensional multi-modal distributions are just plain ugly
  All inference algorithms break down on the toughest cases
  Asymptotics doesn't mean much when the long run is millions of years!

With good engineering backed by solid theory, we will continue to make progress

42


Bibliography (1 of 2)

Arulampalam, M., Maskell, S., Gordon, N. and Clapp, T. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Transactions on Signal Processing, 50, pp. 174-188, 2002.

Bidyuk, B. and Dechter, R. Cutset Sampling for Bayesian Networks. Journal of Artificial Intelligence Research, 28, pp. 1-48, 2007.

Braz, R., Amir, E. and Roth, D. Lifted First-Order Probabilistic Inference. Proceedings of the International Joint Conference on Artificial Intelligence, 2005.

Bui, H., Venkatesh, S. and West, G. Policy Recognition in the Abstract Hidden Markov Model. Journal of Artificial Intelligence Research, 17, pp. 451-499, 2002.

Charniak, E. and Goldman, R. A Bayesian Model of Plan Recognition. Artificial Intelligence, 64, pp. 53-79, 1993.

Charniak, E. and Goldman, R. A Probabilistic Model of Plan Recognition. Proceedings of the Ninth Conference on Artificial Intelligence, 1991.

Darwiche, A. Modeling and Reasoning with Bayesian Networks. Cambridge University Press, 2009.

Gaifman, H. Concerning Measures in First-Order Calculi. Israel Journal of Mathematics, 2, pp. 1-18, 1964.

Geib, C.W. and Goldman, R.P. A Probabilistic Plan Recognition Algorithm Based on Plan Tree Grammars. Artificial Intelligence, 173, pp. 1101-1132, 2009.

Gilks, W.R. and Berzuini, C. Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models. Journal of the Royal Statistical Society B, 63, pp. 127-146, 2001.

Hsu, J., Lian, C., and Jih, W. Probabilistic Models for Concurrent Chatting Activity Recognition. ACM Transactions on Intelligent Systems and Technology, 2(1), 2011.

Jensen, F. Bayesian Networks and Decision Graphs (2nd edition). Springer, 2007.

Korb, K. and Nicholson, A. Bayesian Artificial Intelligence. Chapman and Hall, 2003.

Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press, 2009.

Lafferty, J., McCallum, A., and Pereira, F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the 18th International Conference on Machine Learning, 2001.

Laskey, K.B. MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial Intelligence, 172(2-3), pp. 140-178, 2008.

43


Bibliography (2 of 2)

Liao, L., Patterson, D.J., Fox, D. and Kautz, H. Learning and Inferring Transportation Routines. Artificial Intelligence, 2007.

Liao, L., Fox, D., and Kautz, H. Hierarchical Conditional Random Fields for GPS-based Activity Recognition. In Springer Tracts in Advanced Robotics. Springer, 2007.

Liu, J. and West, M. Combined Parameter and State Estimation in Simulation-Based Filtering. In Sequential Monte Carlo Methods in Practice, A. Doucet, J.F.G. de Freitas, and N.J. Gordon, Eds. New York: Springer-Verlag, 2001.

Musso, C., Oudjane, N. and LeGland, F. Improving Regularised Particle Filters. In Sequential Monte Carlo Methods in Practice, A. Doucet, J.F.G. de Freitas, and N.J. Gordon, Eds. New York: Springer-Verlag, 2001.

Neapolitan, R. Learning Bayesian Networks. Prentice Hall, 2003.

Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.

Pynadath, D.V. and Wellman, M.P. Probabilistic State-Dependent Grammars for Plan Recognition. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000.

Pynadath, D.V. and Wellman, M.P. Generalized Queries on Probabilistic Context-Free Grammars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), pp. 65-77, 1998.

Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning, 62, pp. 107-136, 2006.

Schmidt, C., Sridharan, N., and Goodson, J. The Plan Recognition Problem: An Intersection of Psychology and Artificial Intelligence. Artificial Intelligence, 11, pp. 45-83, 1978.

van der Merwe, R., Doucet, A., de Freitas, N. and Wan, E. The Unscented Particle Filter. Advances in Neural Information Processing Systems, 2000.

van der Merwe, R., Doucet, A., de Freitas, N. and Wan, E. (undated) The Unscented Particle Filter. http://cslu.cse.ogi.edu/publications/ps/UPF_CSLU_talk.pdf

Wellman, M.P., Breese, J.S., and Goldman, R.P. From Knowledge Bases to Decision Models. The Knowledge Engineering Review, 7(1), pp. 35-53, 1992.