Artificial Intelligence (AI)
Planning
Sungwook Yoon
What do we (AI researchers) mean by Plan?
plan
n.
1. A scheme, program, or method worked out beforehand for the accomplishment of an objective: a plan of attack (or exit).
2. A proposed or tentative project or course of action: had no plans for the evening.
3. A systematic arrangement of elements or important parts; a configuration or outline: a seating plan; the plan of a story.
4. A drawing or diagram made to scale showing the structure or arrangement of something.
5. In perspective rendering, one of several imaginary planes perpendicular to the line of vision between the viewer and the object being depicted.
6. A program or policy stipulating a service or benefit: a pension plan.
Synonyms: blueprint, design, project, scheme, strategy
Automated Planning concerns …
• Mainly synthesizing a course of action to achieve a given goal
• Finding the actions that need to be taken in each situation
– When you are going to Chicago
– In Tempe, “take a cab”
– At Sky Harbor, “take the plane”
• In summary, planning tries to find a plan (a course of action) given the initial state (you are in Tempe) and the goal (you want to be in Chicago)
What is a Planning Problem?
• Any problem that needs sequential decisions
– For a single decision, you should look to Machine Learning
• Classification
• Given a picture: “is this a cat or a dog?”
• Any examples?
– FreeCell
– Sokoban
– Micromouse
– Bridge Game
– Football
What is a Planner?
[Figure: a Planner box emitting example action sequences]
• Card-game output: 1. Move spade 2 to the cell; 2. Move spade 3 to the cell; 3. …
• Blocks output: 1. Move block 1 to the left; 2. Move block 1 above; 3. …
Planning Involves Deciding a Course of Action to achieve a desired state of affairs
[Figure: an agent asking “What action next?” of its Environment and Goals, annotated with problem dimensions: Static vs. Dynamic; Observable vs. Partially Observable; perfect vs. imperfect; Deterministic vs. Stochastic; Instantaneous vs. Durative; Full vs. Partial satisfaction]
Any real-world applications for planning, please?
Space Exploration
• Autonomous planning, scheduling, control
– NASA: JPL and Ames
• Remote Agent Experiment (RAX) on Deep Space 1
• Mars Exploration Rover (MER)
Manufacturing
• Sheet-metal bending machines (Amada Corporation)
– Software to plan the sequence of bends [Gupta and Bourne, J. Manufacturing Sci. and Engr., 1999]
Games
• Bridge Baron (Great Game Products)
– 1997 world champion of computer bridge [Smith, Nau, and Throop, AI Magazine, 1998]
– 2004: 2nd place
[Figure: HTN game tree for a bridge finesse. Us: East declarer, West dummy; Opponents: defenders, South & North; Contract: East, 3NT; On lead: West at trick 3; East: KJ74; West: A2; Out: QT98653. The tree decomposes Finesse(P1; S) through LeadLow, FinesseTwo, and StandardFinesse (vs. EasyFinesse and BustedFinesse) into PlayCard steps for each player.]
Planning Involves Deciding a Course of Action to achieve a desired state of affairs
[Figure: the same dimensions as above, locating “Classical Planning” at the simplest setting: Static, Deterministic, Observable, Instantaneous, Propositional. Each dimension can be relaxed toward Dynamic, Stochastic, Partially Observable, Durative, and Continuous.]
Classical Planning Assumptions
[Figure: the agent exchanges Percepts and Actions with the World]
• Percepts are perfect; the environment is fully observable
• Actions are instantaneous and deterministic
• The agent is the sole source of change
Representing States
State 1: { holding(A), clear(B), on(B,C), onTable(C) }
State 2: { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }
World states are represented as sets of facts. We will also refer to facts as propositions.
Closed World Assumption (CWA): facts not listed in a state are assumed to be false. Under the CWA we are assuming the agent has full observability.
Representing Goals
Goals are also represented as sets of facts. For example, { on(A,B) } is a goal in the blocks world.
A goal state is any state that contains all the goal facts.
State 1: { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }
State 2: { holding(A), clear(B), on(B,C), onTable(C) }
State 1 is a goal state for the goal { on(A,B) }. State 2 is not a goal state.
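To make the set-of-facts representation concrete, here is a minimal Python sketch; the string encoding of facts and the helper names are illustrative, not part of the slides:

```python
# States and goals as sets of ground facts (propositions).
state1 = frozenset({"handEmpty", "clear(A)", "on(A,B)", "on(B,C)", "onTable(C)"})
state2 = frozenset({"holding(A)", "clear(B)", "on(B,C)", "onTable(C)"})
goal = frozenset({"on(A,B)"})

def holds(fact, state):
    # Closed World Assumption: any fact not listed in the state is false.
    return fact in state

def is_goal_state(state, goal):
    # A goal state is any state that contains all the goal facts.
    return goal <= state  # subset test

print(is_goal_state(state1, goal))  # True
print(is_goal_state(state2, goal))  # False
```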
Representing Action in STRIPS
[Figure: Stack(A,B) transforms State 1 into State 2]
State 1: { holding(A), clear(B), on(B,C), onTable(C) }
State 2: { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }
A STRIPS action definition specifies:
1) a set PRE of precondition facts
2) a set ADD of add-effect facts
3) a set DEL of delete-effect facts
Stack(x,y):
PRE: { holding(x), clear(y) }
ADD: { on(x,y), handEmpty }
DEL: { holding(x), clear(y) }
Grounding with x←A, y←B gives Stack(A,B):
PRE: { holding(A), clear(B) }
ADD: { on(A,B), handEmpty }
DEL: { holding(A), clear(B) }
Semantics of STRIPS Actions
[Figure: Stack(A,B) maps state S to state (S ∪ ADD) − DEL]
S: { holding(A), clear(B), on(B,C), onTable(C) }
Result: { handEmpty, clear(A), on(A,B), on(B,C), onTable(C) }
• A STRIPS action is applicable (or allowed) in a state when its preconditions are contained in the state.
• Taking an action in a state S results in a new state (S ∪ ADD) − DEL (i.e., add the add effects and remove the delete effects)
Stack(A,B):
PRE: { holding(A), clear(B) }
ADD: { on(A,B), handEmpty }
DEL: { holding(A), clear(B) }
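A minimal sketch of these semantics in Python, using the ground Stack(A,B) action above; the dictionary encoding is an assumption of this sketch:

```python
# Ground STRIPS action as (PRE, ADD, DEL) sets of facts.
stack_A_B = {
    "pre": frozenset({"holding(A)", "clear(B)"}),
    "add": frozenset({"on(A,B)", "handEmpty"}),
    "del": frozenset({"holding(A)", "clear(B)"}),
}

def applicable(action, state):
    # Applicable when the preconditions are contained in the state.
    return action["pre"] <= state

def apply_action(action, state):
    # New state: add the add effects, remove the delete effects.
    assert applicable(action, state)
    return (state | action["add"]) - action["del"]

s = frozenset({"holding(A)", "clear(B)", "on(B,C)", "onTable(C)"})
print(sorted(apply_action(stack_A_B, s)))
# ['handEmpty', 'on(A,B)', 'on(B,C)', 'onTable(C)']
```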
STRIPS Planning Problems
A STRIPS planning problem specifies:
1) an initial state S
2) a goal G
3) a set of STRIPS actions
Objective: find a “short” action sequence reaching a goal state, or report that the goal is unachievable
Example Problem:
Initial State: { holding(A), clear(B), onTable(B) }
Goal: { on(A,B) }
STRIPS Actions:
Stack(A,B): PRE: { holding(A), clear(B) }; ADD: { on(A,B), handEmpty }; DEL: { holding(A), clear(B) }
Stack(B,A): PRE: { holding(B), clear(A) }; ADD: { on(B,A), handEmpty }; DEL: { holding(B), clear(A) }
Solution: (Stack(A,B))
Properties of Planners
A planner is sound if any action sequence it returns is a true solution.
A planner is complete if it outputs an action sequence or “no solution” for any input problem.
A planner is optimal if it always returns the shortest possible solution.
Is optimality an important requirement? Is it a reasonable requirement?
Complexity of STRIPS Planning
PlanSAT
Given: a STRIPS planning problem
Output: “yes” if the problem is solvable, otherwise “no”
PlanSAT is decidable. Why?
In general, PlanSAT is PSPACE-complete! Just finding a plan is hard in the worst case, even when actions are limited to just 2 preconditions and 2 effects.
NOTE: PSPACE is the set of all problems that are decidable in polynomial space. PSPACE-complete is believed to be harder than NP-complete.
Does this mean that we should give up on AI planning?
Satisficing vs. Optimality
While just finding a plan is hard in the worst case, for many planning domains finding a plan is easy. However, finding optimal solutions can still be hard in those domains. For example, optimal planning in the blocks world is NP-complete.
In practice it is often sufficient to find “good” solutions “quickly”, although they may not be optimal. For example, finding sub-optimal blocks world solutions can be done in linear time. How?
Search Space: Blocks World
Search space is finite.
Forward-Chaining Search
[Figure: search tree fanning out from the initial state toward the goal]
Breadth-first and best-first search are sound and complete (a minimal sketch follows below).
A very large branching factor can cause search to waste time and space trying many irrelevant actions: O(b^d) worst case, where b = branching factor and d = depth limit. We need a good heuristic function and/or pruning procedure.
Early AI researchers gave up on forward search. But there has been a recent resurgence. More on this later in the course.
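For a concrete (if tiny) illustration, here is a breadth-first forward-chaining planner sketch in Python, run on the example problem from the STRIPS slide; it is a teaching toy, not a real planner:

```python
from collections import deque

# Ground STRIPS actions: name -> (PRE, ADD, DEL).
actions = {
    "Stack(A,B)": (frozenset({"holding(A)", "clear(B)"}),
                   frozenset({"on(A,B)", "handEmpty"}),
                   frozenset({"holding(A)", "clear(B)"})),
    "Stack(B,A)": (frozenset({"holding(B)", "clear(A)"}),
                   frozenset({"on(B,A)", "handEmpty"}),
                   frozenset({"holding(B)", "clear(A)"})),
}

def bfs_plan(init, goal, actions):
    """Breadth-first forward search: sound, complete, returns a shortest plan."""
    frontier = deque([(frozenset(init), [])])
    visited = {frozenset(init)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # goal test
            return plan
        for name, (pre, add, dele) in actions.items():
            if pre <= state:                   # applicability test
                nxt = (state | add) - dele     # STRIPS semantics
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # goal unachievable

init = {"holding(A)", "clear(B)", "onTable(B)"}
goal = frozenset({"on(A,B)"})
print(bfs_plan(init, goal, actions))  # ['Stack(A,B)']
```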
Backward-Chaining Search
[Figure: search regressing from the goal back toward the initial state]
Backward search can focus on more “goal-relevant” actions, but the branching factor is typically still huge. Again, a good heuristic function and/or pruning procedure is needed.
Early AI researchers gave up on forward and backward search. But there has been recent progress in developing general planning heuristics, leading to a resurgence. More on this later in the course.
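A sketch of the core backward step, goal regression, under the standard textbook formulation (the helper names are mine): an action is relevant if it adds some goal fact and deletes none, and regressing the goal through it yields (goal − ADD) ∪ PRE.

```python
def relevant(action, goal):
    pre, add, dele = action
    # Relevant: achieves at least one goal fact and clobbers none.
    return bool(add & goal) and not (dele & goal)

def regress(action, goal):
    pre, add, dele = action
    # Subgoal that must hold before the action for `goal` to hold after it.
    return (goal - add) | pre

stack_A_B = (frozenset({"holding(A)", "clear(B)"}),
             frozenset({"on(A,B)", "handEmpty"}),
             frozenset({"holding(A)", "clear(B)"}))

goal = frozenset({"on(A,B)"})
if relevant(stack_A_B, goal):
    print(sorted(regress(stack_A_B, goal)))
    # ['clear(B)', 'holding(A)'] -- search continues from this subgoal
```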
Total-Order vs. Partial-Order Planning (POP)
State-space planning techniques produce totally-ordered plans, i.e. plans consisting of a strict sequence of actions. Often, however, many orderings of the actions have equivalent effects.
[Figure: four-block problem; initially A is on B and C is on D; the goal stacks B on A and D on C]
There are many possible plans:
1) move(A, B, TABLE) ; move(B, TABLE, A) ; move(C, D, TABLE) ; move(D, TABLE, C)
2) move(A, B, TABLE) ; move(C, D, TABLE) ; move(D, TABLE, C) ; move(B, TABLE, A)
3) move(C, D, TABLE) ; move(D, TABLE, C) ; move(A, B, TABLE) ; move(B, TABLE, A)
etc.
• These plans share some common structure. They are all different interleavings of two separate subplans:
1) move(A, B, TABLE) ; move(B, TABLE, A)
2) move(C, D, TABLE) ; move(D, TABLE, C)
• A partial-order plan is one which specifies only the necessary ordering information. One partial-order plan may have many total orderings.
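To see why one partial-order plan stands for many total orders, this short sketch enumerates every interleaving of the two independent subplans above (the enumeration helper is illustrative):

```python
from itertools import combinations

plan1 = ["move(A,B,TABLE)", "move(B,TABLE,A)"]
plan2 = ["move(C,D,TABLE)", "move(D,TABLE,C)"]

def interleavings(p, q):
    # Choose which positions of the merged plan hold steps of p;
    # within each subplan the ordering constraints are preserved.
    n = len(p) + len(q)
    for idx in combinations(range(n), len(p)):
        merged, pi, qi = [], iter(p), iter(q)
        for i in range(n):
            merged.append(next(pi) if i in idx else next(qi))
        yield merged

orders = list(interleavings(plan1, plan2))
print(len(orders))  # 6 total orders represented by one partial-order plan
```

Here C(4,2) = 6 linearizations exist; a partial-order planner avoids committing to any one of them until it has to.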
Planning Techniques in Summary
• Forward State-Space Search
• Backward State-Space Search
• Partial-Order Planning (plan-space search)
• What is the state-of-the-art technique?
Exercise
What is a Planning Problem?
• Any problem that needs sequential decisions
– For a single decision, you should look to Machine Learning
• Any examples?
– FreeCell
– Sokoban
– Micromouse
– Bridge Game
– Football
Markov Decision Process (MDP)
• Sequential decision problems under uncertainty
– Not just the immediate utility, but the longer-term utility as well
– Uncertainty in outcomes
• Roots in operations research
• Also used in economics, communications engineering, ecology, performance modeling and, of course, AI!
– Also referred to as stochastic dynamic programs
Markov Decision Process (MDP)
• Defined as a tuple: <S, A, P, R>
– S: States
– A: Actions
– P: Transition function
• Table P(s′ | s, a): probability of s′ given action “a” in state “s”
– R: Reward
• R(s, a) = cost or reward of taking action a in state s
• Choose a sequence of actions (not just one decision or one action)
– Utility based on a sequence of decisions
Example: What SEQUENCE of actions should our agent take?
[Figure: 4×3 grid world with a Start cell, a blocked cell, a reward +1 cell, and a reward −1 cell; an action goes as intended with probability 0.8 and slips sideways with probability 0.1 each]
• Each action costs −1/25
• The agent can take actions N, E, S, W
• It faces uncertainty in every state
MDP Tuple: <S, A, P, R>
• S: State of the agent on the grid, e.g., (4,3)
– Note that a cell is denoted by (x,y)
• A: Actions of the agent, i.e., N, E, S, W
• P: Transition function
– Table P(s′ | s, a): probability of s′ given action “a” in state “s”
– E.g., P((4,3) | (3,3), N) = 0.1
– E.g., P((3,2) | (3,3), N) = 0.8
– (Robot movement, uncertainty of another agent’s actions, …)
• R: Reward (more comments on the reward function later)
– R((3,3), N) = −1/25
– R((4,1)) = +1
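A sketch of this transition table in Python, assuming the usual grid-world noise model (probability 0.8 for the intended direction, 0.1 for each perpendicular slip). The blocked-cell location, the bounce-back rule, and the axis orientation (N decreases y, matching the slide's example probabilities) are assumptions read off the figure:

```python
# Moves as (dx, dy); on this slide's indexing, N decreases y.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
PERP = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
BLOCKED = {(2, 2)}  # assumed blocked cell; the slides only show it in the figure

def step(s, a):
    x, y = s
    dx, dy = MOVES[a]
    nxt = (x + dx, y + dy)
    # Bounce back if the move leaves the 4x3 grid or hits the blocked cell.
    if not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3) or nxt in BLOCKED:
        return s
    return nxt

def P(s2, s, a):
    # P(s' | s, a): 0.8 intended direction, 0.1 each perpendicular slip.
    prob = 0.0
    for act, pr in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)]:
        if step(s, act) == s2:
            prob += pr
    return prob

print(P((3, 2), (3, 3), "N"))  # 0.8, as on the slide
print(P((4, 3), (3, 3), "N"))  # 0.1 (slip east), as on the slide
```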
Terminology
• Before describing policies, let’s go through some terminology
• This terminology is useful throughout this set of lectures
• Policy: a complete mapping from states to actions
MDP Basics and Terminology
An agent must make decisions or control a probabilistic system.
• The goal is to choose a sequence of actions for optimality
• Defined as <S, A, P, R>
• MDP models:
– Finite horizon: maximize the expected reward for the next n steps
– Infinite horizon: maximize the expected discounted reward
– Transition model: maximize the average expected reward per transition
– Goal state: maximize the expected reward (minimize the expected cost) to some target state G
Reward Function
• According to Chapter 2, directly associated with the state
– Denoted R(I)
– Simplifies computations seen later in the algorithms presented
• Sometimes, the reward is assumed to be associated with a (state, action) pair
– R(S, A)
– We could also assume a mix of R(S,A) and R(S)
• Sometimes, the reward is associated with a (state, action, destination-state) triple
– R(S, A, J)
– R(S, A) = Σ_J P(J | S, A) · R(S, A, J)
J
Markov Assumption
•
Markov Assumption:
Transition probabilities (and rewards) from
any given state depend only on the state and not on previous
history
•
Where you end up after action depends only on current state
–
After Russian Mathematician A. A. Markov (1856

1922)
–
(He did not come up with markov decision processes
however)
–
Transitions in state (1,2) do not depend on prior state (1,1) or
(1,2)
MDP vs. POMDPs
• Accessibility: the agent’s percepts in any given state identify the state that it is in, e.g., state (4,3) vs. (3,3)
– Given the observations, the state is uniquely determined
– Hence, we will not explicitly consider observations, only states
• Inaccessibility: the agent’s percepts in any given state DO NOT identify the state that it is in, e.g., it may be (4,3) or (3,3)
– Given the observations, the state is not uniquely determined
– POMDP: Partially Observable MDP, for inaccessible environments
• We will focus on MDPs in this presentation.
MDP vs. POMDP
[Figure: in an MDP the agent reads States from the World and emits Actions; in a POMDP the agent receives Observations, which a state estimator (SE) turns into a belief state b that the policy P maps to Actions]
Policy
• A policy is like a plan
– Certainly generated ahead of time, like a plan
• Unlike a traditional plan, it is not a sequence of actions that the agent must execute
– If there are failures in execution, the agent can continue to execute a policy
• Prescribes an action for all the states
• Maximizes expected reward, rather than just reaching a goal state
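A sketch of what “prescribes an action for all the states” buys you at execution time; the two-state example and helper names are made up:

```python
import random

def execute_policy(policy, step, s, horizon=20, is_goal=lambda s: False):
    # Because `policy` maps every state to an action, stochastic outcomes
    # (including "failures") never leave the agent without a prescription,
    # unlike replaying a fixed action sequence.
    trajectory = [s]
    for _ in range(horizon):
        if is_goal(s):
            break
        s = step(s, policy[s])  # next state is sampled by the environment
        trajectory.append(s)
    return trajectory

# Tiny two-state example: action "go" reaches the goal with probability 0.7.
policy = {"start": "go", "goal": "stay"}
step = lambda s, a: "goal" if s == "start" and random.random() < 0.7 else s
print(execute_policy(policy, step, "start", is_goal=lambda s: s == "goal"))
```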
MDP Problem
• The MDP problem consists of:
– Finding the optimal control policy for all possible states;
– Finding the sequence of optimal control functions for a specific initial state;
– Finding the best control action (decision) for a specific state.
Non-Optimal vs. Optimal Policy
[Figure: the 4×3 grid world with Start, +1, and −1 cells, overlaid with red, yellow, and blue candidate policies]
• Choose the Red policy or the Yellow policy?
• Choose the Red policy or the Blue policy?
Which is optimal (if any)?
• Value iteration: one popular algorithm to determine the optimal policy
Value Iteration: Key Idea
• Iterate: update the utility of state “I” using the old utilities of the neighbor states “J”, given actions “A”
– U_{t+1}(I) = max_A [ R(I,A) + Σ_J P(J | I,A) · U_t(J) ]
– P(J | I,A): probability of J if A is taken in state I
– max_A F(A) returns the highest F(A)
– Both the immediate reward and the longer-term reward are taken into account
Value Iteration: Algorithm
• Initialize: U_0(I) = 0
• Iterate:
U_{t+1}(I) = max_A [ R(I,A) + Σ_J P(J | I,A) · U_t(J) ]
– until close-enough(U_{t+1}, U_t)
• At the end of the iteration, calculate the optimal policy:
Policy(I) = argmax_A [ R(I,A) + Σ_J P(J | I,A) · U_{t+1}(J) ]
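A direct transcription of the algorithm into Python, applied to a made-up two-state MDP; here convergence comes from the substochastic transition function (outcome probabilities summing to 0.9), which plays the role of discounting:

```python
def value_iteration(S, A, P, R, eps=1e-6):
    """U_{t+1}(i) = max_a [ R(i,a) + sum_j P(j|i,a) * U_t(j) ]."""
    U = {s: 0.0 for s in S}  # Initialize: U_0(i) = 0
    while True:
        U1 = {s: max(R(s, a) + sum(p * U[j] for j, p in P(s, a).items())
                     for a in A)
              for s in S}
        if max(abs(U1[s] - U[s]) for s in S) < eps:  # close-enough test
            break
        U = U1
    # Extract the optimal policy from the converged utilities.
    policy = {s: max(A, key=lambda a: R(s, a) +
                     sum(p * U1[j] for j, p in P(s, a).items()))
              for s in S}
    return U1, policy

# Tiny 2-state example: action "a" earns 1 in s1 and tends to lead to s2;
# action "b" tends to lead back to s1.
S, A = ["s1", "s2"], ["a", "b"]
P = lambda s, a: {"s2": 0.9} if a == "a" else {"s1": 0.9}
R = lambda s, a: 1.0 if (s, a) == ("s1", "a") else 0.0
print(value_iteration(S, A, P, R))
# Utilities near U(s1) = 5.26, U(s2) = 4.74; policy: s1 -> a, s2 -> b
```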
MDP Solution Techniques
• Value Iteration
• Policy Iteration
• Matrix Inversion
• Linear Programming
• LAO*
Planning vs. MDP
• Common
– Both try to act better
• Differences
– Relational vs. Propositional
– Symbolic vs. Value
– Less Toyish vs. More Toyish
– Solution Techniques
– Classic vs. More General
Planning vs. MDP, recent trends
• Recent trend in Planning
– Adding diverse aspects
• Probabilistic, temporal, oversubscribed, etc.
– Getting closer to MDPs, but with a relational representation
• More real-world-like …
• Recent trend in MDPs
– More structure
• Relational, options, hierarchy, finding harmonic functions, …
– Getting closer to Planning!
Planning better than MDP?
• They deal with different objectives
– MDPs focus more on optimality in a general planning setting
• The size of the domain is too small
– Planning focused on the classic setting (unrealistic)
• Well, still many interesting problems can be coded into the classic setting
• Sokoban, FreeCell
• Planning’s biggest advances come from fast pre-processing of relational problems
– Actually, planners turn the problems into propositional ones.
Can we solve real-world problems?
• Suppose all the Planning and MDP techniques are well developed
– Temporal, partial observability, continuous variables, etc.
• Well, who will code such problems into an AI agent?
• One should consider the cost of developing such problem definitions and developing a “very” general planner
– It might be better to use a domain-specific planner
• A Sokoban solver, a FreeCell solver, etc.
What is AI?
• An AI is 99 percent Engineering and 1 percent Intelligent Algorithms
– Sungwook Yoon