Rational Agent


Artificial Intelligence


Thinking humanly: decision making, problem solving, learning.

Thinking rationally: logic, reasoning systems.

Acting humanly: doing human tasks well; the Turing Test. Needed skills: NLP, knowledge representation, automated reasoning, machine learning, computer vision, robotics.

Acting rationally: builds on the skills needed for the Turing Test (knowledge representation, reasoning, NLP); a rational agent needs to reason logically.


This book therefore concentrates on general principles of rational agents and on components for constructing them.










Rational Agent

agent = architecture (sensors, actuators) + agent program.

The job of AI is to design an agent program that implements the agent function: the mapping from percepts to actions.

The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators. Notice the difference between the agent program, which takes the current percept as input, and the agent function, which takes the entire percept history. The agent program takes just the current percept as input because nothing more is available from the environment; if the agent's actions need to depend on the entire percept sequence, the agent will have to remember the percepts.

There are four basic kinds of agent programs that embody the principles underlying almost all intelligent systems:

• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.


Simple reflex agents

Condition-action rule (if car-in-front-is-braking then initiate-braking).

Select an action according to the current percept only (without the history):

rule ← RULE-MATCH(state, rules)
return rule.ACTION
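
A minimal sketch of such an agent in Python; the percept keys and the braking rule are illustrative assumptions, not part of the original program:

# Simple reflex agent: maps the current percept directly to an action through
# condition-action rules, keeping no percept history and no internal state.
def simple_reflex_agent(percept):
    # Hypothetical rule in the spirit of the example above.
    if percept.get("car_in_front_is_braking"):
        return "initiate-braking"
    return "keep-driving"

print(simple_reflex_agent({"car_in_front_is_braking": True}))   # initiate-braking
print(simple_reflex_agent({"car_in_front_is_braking": False}))  # keep-driving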


Model-based reflex agents

• How the world evolves independently of the agent.
• How the agent's own actions affect the world.
• How the world works: UPDATE-STATE(state, action, percept, model).
• Has internal state.

Chapters: 4, 11, 12, 15, 17, 25
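
A minimal sketch of that loop in Python, assuming a dictionary-based internal state and a hypothetical update rule (not from the original notes):

# Model-based reflex agent skeleton: it keeps internal state and updates it
# using the model of how the world evolves and how actions affect it.
def update_state(state, last_action, percept, model):
    # Hypothetical model: merge the new percept into the remembered state.
    new_state = dict(state)
    new_state.update(percept)
    new_state["last_action"] = last_action
    return new_state

def make_model_based_agent(rules, model):
    state = {}
    last_action = None
    def agent(percept):
        nonlocal state, last_action
        state = update_state(state, last_action, percept, model)
        last_action = rules(state)      # condition-action rules applied to the state
        return last_action
    return agent

agent = make_model_based_agent(rules=lambda s: "brake" if s.get("car_ahead") else "drive",
                               model=None)
print(agent({"car_ahead": True}))       # brake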


Goal-based agents

Search, planning.

The agent needs some sort of goal information that describes situations that are desirable (happy/unhappy states).


Utility-based agents

• Utility function (captures trade-offs such as speed vs. safety).
• Decision making under uncertainty.
• Perception, representation, reasoning and learning.
• "How happy will I be in such a state?"


Learning agents

Learning in intelligent agents can be summarized as a process of modification of each component of the agent to bring the components into closer agreement with the available feedback information, thereby improving the overall performance of the agent.


Representation

Atomic representation

The algorithms underlying search and game-playing (Chapters 3-5), hidden Markov models (Chapter 15), and Markov decision processes (Chapter 17) all work with atomic representations, or, at least, they treat representations as if they were atomic.

Factored representation

Many important areas of AI are based on factored representations, including constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapters 10 and 11), Bayesian networks (Chapters 13-16), and the machine learning algorithms in Chapters 18, 20, and 21.

Structured representation

Structured representations underlie relational databases and first-order logic (Chapters 8, 9, and 12), first-order probability models (Chapter 14), knowledge-based learning (Chapter 19) and much of natural language understanding (Chapters 22 and 23).












Problem-solving by searching

Problem-solving agent

This chapter describes one kind of goal-based agent called a problem-solving agent.

Problem-solving agents use atomic representations. Goal-based agents that use more advanced factored or structured representations are usually called planning agents and are discussed in Chapters 7 and 10.


Well-defined problems and solutions

A problem over a given states domain can be defined formally by five components:

• The initial state that the agent starts in.
• A description of the possible actions available to the agent. Given a particular state s, ACTIONS(s) returns the set of actions that can be executed in s.
• A description of what each action does; the formal name for this is the transition model, specified by a function RESULT(s, a) that returns the state that results from doing action a in state s.
• The goal test, which determines whether a given state is a goal state.
• A path cost function that assigns a numeric cost to each path.
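
A minimal sketch of this interface in Python; the class and method names are illustrative, not from the book:

# Hypothetical formulation of a search problem with the five components above.
class Problem:
    def __init__(self, initial_state, goal_state):
        self.initial_state = initial_state
        self.goal_state = goal_state

    def actions(self, s):
        """Return the set of actions executable in state s."""
        raise NotImplementedError

    def result(self, s, a):
        """Transition model: the state that results from doing action a in s."""
        raise NotImplementedError

    def goal_test(self, s):
        """True if s is a goal state."""
        return s == self.goal_state

    def step_cost(self, s, a, s2):
        """Cost of going from s to s2 via action a (the path cost sums these)."""
        return 1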


We can evaluate an algorithm's performance in four ways:

• Completeness: Is the algorithm guaranteed to find a solution when there is one?
• Optimality: Does the strategy find the optimal solution? (An optimal solution has the lowest path cost among all solutions.)
• Time complexity: How long does it take to find a solution?
• Space complexity: How much memory is needed to perform the search?


Uninformed search strategies

BFS, uniform-cost search, DFS, depth-limited search, iterative deepening DFS, bidirectional search.

Informed search strategies

Greedy best-first search, A*.

Memory-bounded heuristic search: iterative-deepening A* (IDA*), recursive best-first search (RBFS).


Learning to search better

A metalevel learning algorithm can learn from these experiences to avoid exploring unpromising subtrees. The techniques used for this kind of learning are described in Chapter 21 (reinforcement learning).

The goal of learning is to minimize the total cost of problem solving, trading off computational expense and path cost.







Uninformed Search

Closed list: a hash table which holds the visited nodes.

• BFS: chooses the shallowest node in the frontier.
• DFS: always expands the deepest node in the current frontier.
• Iterative deepening depth-first search.
• Uniform-cost search: expand the node with the minimum path cost first. Implementation: priority queue (see the sketch after this list).
• Bidirectional search.
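
A minimal uniform-cost search sketch in Python, assuming the graph is given as an adjacency dictionary of step costs (the names are illustrative):

import heapq

def uniform_cost_search(graph, start, goal):
    # graph: dict mapping node -> list of (neighbor, step_cost) pairs.
    # The frontier is a priority queue ordered by path cost g(n).
    frontier = [(0, start, [start])]
    closed = set()                      # closed list of already-expanded nodes
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in closed:
                heapq.heappush(frontier, (g + cost, neighbor, path + [neighbor]))
    return None

# Example: the cheapest path from A to C is A -> B -> C with cost 3.
g = {"A": [("B", 1), ("C", 5)], "B": [("C", 2)], "C": []}
print(uniform_cost_search(g, "A", "C"))   # (3, ['A', 'B', 'C'])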
















Informed Search

Conditions for optimality: admissibility and consistency

The first condition we require for optimality is that h(n) be an admissible heuristic. An admissible heuristic is one that never overestimates the cost to reach the goal. Because g(n) is the actual cost to reach n along the current path, and f(n) = g(n) + h(n), we have as an immediate consequence that f(n) never overestimates the true cost of a solution along the current path through n.

For any node, the estimated cost of reaching the goal from that node is no greater than the actual cost of reaching the goal:

H(node) <= Cost(node, goal)



A second, slightly stronger condition called consistency (or sometimes monotonicity) is required only for applications of A* to graph search. A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n':

h(n) ≤ c(n, a, n') + h(n')

In other words, for any node and any successor generated by any action, the estimated cost of reaching the goal from the node is no greater than the step cost of getting to the successor plus the estimated cost of reaching the goal from the successor:

H(node) <= Cost(node, action, successor) + H(successor)

It is fairly easy to show (Exercise 3.29) that every consistent heuristic is also admissible. Consistency is therefore a stricter requirement than admissibility.


Semi-Formal Proof:

The true cost from a node to the goal decomposes over the first step on a cheapest path:

Cost(node, goal) = Cost(node, action, successor) + Cost(successor, goal)

Consistency means

H(node) <= Cost(node, action, successor) + H(successor)

and, applying consistency again to any node in the graph,

H(successor) <= Cost(successor, action, descendant) + H(descendant)

Inductively, expanding along the path until the goal is reached:

H(node) <= Cost(node, action, successor) + Cost(successor, action, descendant) + ... + H(goal)

Since H(goal) = 0, the right-hand side is exactly the cost of the path to the goal. Hence:

H(node) <= Cost(node, goal)

that is, every consistent heuristic is admissible.










Greedy best-first search

f(n) = h(n)


A* search: Minimizing the total estimated solution cost

f(n) = g(n) + h(n)



IDA* Algorithm

• Each iteration is a depth-first search that keeps track of the cost evaluation f = g + h of each node generated.
• If a node is generated whose cost exceeds the threshold for that iteration, its path is cut off.
• The cost threshold is initialized to the heuristic of the initial state.
• The cost threshold increases in each iteration to the total cost of the lowest-cost node that was pruned during the previous iteration.
• The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold.
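
A minimal IDA* sketch in Python, assuming the graph and the heuristic are given as dictionaries (illustrative names, not from the notes):

import math

def ida_star(graph, h, start, goal):
    # graph: node -> list of (neighbor, step_cost); h: node -> heuristic value.
    def dfs(node, g, threshold, path):
        f = g + h[node]
        if f > threshold:
            return f, None              # path cut off; report the f that exceeded
        if node == goal:
            return f, path
        minimum = math.inf
        for neighbor, cost in graph.get(node, []):
            if neighbor in path:        # avoid cycles on the current branch
                continue
            t, found = dfs(neighbor, g + cost, threshold, path + [neighbor])
            if found is not None:
                return t, found
            minimum = min(minimum, t)
        return minimum, None

    threshold = h[start]                # threshold starts at h(initial state)
    while True:
        t, found = dfs(start, 0, threshold, [start])
        if found is not None:
            return found
        if t == math.inf:
            return None                 # no solution
        threshold = t                   # raise threshold to the lowest pruned f

# Example on the same graph as before, with an admissible heuristic.
g = {"A": [("B", 1), ("C", 5)], "B": [("C", 2)], "C": []}
print(ida_star(g, {"A": 2, "B": 2, "C": 0}, "A", "C"))   # ['A', 'B', 'C']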



Duplicate Pruning

• Do not re-insert the parent of the current state (with or without using a closed list).
• Using a closed list: check the closed list before inserting new nodes into the open list. Note: in A*, h has to be consistent! Do not remove the original check.
• Using a stack: check the current branch and stack status before inserting new nodes.

public CNode TreeSearch(CProblem problem)
{
    // The open list holds the frontier; it starts with the initial state.
    CQueue open_list = new CQueue(problem.initState);
    while (!open_list.IsEmpty())
    {
        current = open_list.Pop();
        if (problem.GoalTest(current))
            return current;
        // Expand: insert each child unless it is already in the open list
        // or already appears on the path leading to the current node.
        for each Child of current state current:
            if (!open_list.IsExist(Child) && !IsInPath(current, Child))
                open_list.Insert(Child);
    }
    return null;
}

public CNode GraphSearch(CProblem problem)
{
    CQueue open_list = new CQueue(problem.initState);
    CQueue close_list = new CQueue();   // closed list of already-expanded nodes
    while (!open_list.IsEmpty())
    {
        // Sort the open list so the best node (e.g., lowest f in A*) is popped first.
        current = open_list.Sort().Pop();
        if (problem.GoalTest(current))
            return current;
        if (close_list.IsExist(current))
            continue;
        // Insert children that are in neither the open list nor the closed list.
        for each Child of current state current:
            if (!open_list.IsExist(Child) && !close_list.IsExist(Child))
                open_list.Insert(Child);
        close_list.Insert(current);
    }
    return null;
}

Local Search

If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node. There is no "goal test" and no "path cost" for this kind of problem.



Hill-climbing search

Simulated Annealing

• Permits moves to states with lower values.
• Gradually decreases the frequency of such moves and their size.
• Schedule():
  o Returns the current temperature.
  o Depends on the start temperature and the round number.
• Acceptor():
  o Returns the probability of choosing a bad node.
  o Depends on h(n) - h(n_son) and the current temperature.
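
A minimal simulated-annealing sketch in Python, assuming a numeric objective to maximize and an exponential cooling schedule (both are illustrative assumptions):

import math, random

def simulated_annealing(start, neighbors, value, start_temp=1.0, cooling=0.95, rounds=1000):
    # value(state): score to maximize; neighbors(state): list of successor states.
    current = start
    temp = start_temp
    for _ in range(rounds):
        if temp <= 1e-9:
            break
        nxt = random.choice(neighbors(current))
        delta = value(nxt) - value(current)
        # Always accept improvements; accept worse moves with probability
        # exp(delta / temp), which shrinks as the temperature decreases.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = nxt
        temp *= cooling                  # schedule: exponential cooling
    return current

# Example: maximize -(x - 3)^2 over integer states.
best = simulated_annealing(
    start=0,
    neighbors=lambda x: [x - 1, x + 1],
    value=lambda x: -(x - 3) ** 2)
print(best)   # usually 3 or very close to it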









Genetic algorithms







Bayesian networks

P(x1, . . . , xn) = ∏(i = 1..n) P(xi | parents(Xi))


We can satisfy this condition with this methodology:

1. Nodes: First determine the set of variables that are required to model the domain. Now order them, {X1, . . . , Xn}. Any order will work, but the resulting network will be more compact if the variables are ordered such that causes precede effects.

2. Links: For i = 1 to n do:
   • Choose, from X1, . . . , X(i-1), a minimal set of parents for Xi, such that the equation above is satisfied.
   • For each parent insert a link from the parent to Xi.
   • CPTs: Write down the conditional probability table, P(Xi | Parents(Xi)).
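
A minimal sketch of the factored joint distribution in Python, assuming a toy two-node network Rain -> WetGrass with made-up probabilities:

# Joint probability as the product of each variable's CPT entry given its parents:
# P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain); the numbers are illustrative.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {(True, True): 0.9, (False, True): 0.1,
                    (True, False): 0.2, (False, False): 0.8}   # (wet, rain) -> prob

def joint(rain, wet):
    return p_rain[rain] * p_wet_given_rain[(wet, rain)]

print(joint(True, True))    # 0.2 * 0.9 = 0.18
print(sum(joint(r, w) for r in (True, False) for w in (True, False)))  # 1.0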


Properties:

• The topological semantics specifies that each variable is conditionally independent of its non-descendants, given its parents.



• A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents, that is, given its Markov blanket.





Reachability rules for dependence:

When no evidence is given:
• If node B cannot be reached from node A, then A is conditionally independent of B.
• From A one can reach every child of A and every parent reachable from A.
• Therefore, if two nodes have only a common child, they are conditionally independent (provided there is no other dependence in the graph); and if they have a common parent, the nodes are dependent.

Given a node X as evidence:
• One can reach from an ancestor of X to another ancestor of X. Therefore, if two nodes have a known (observed) common child, they are dependent (provided there is no other dependence in the graph); if a common parent is also known, the nodes are conditionally independent given that parent.
• Given X: a node whose parent is X and a node that is a parent of X are conditionally independent.
• Given X: a node whose child is X and a node that is a descendant of X are conditionally independent.

Note: nodes are conditionally independent only if no dependence of any kind exists. Recommendation: write "not conditionally independent" instead of "dependent".







Naïve Bayes Classifier






Naïve Bayes Algorithm







Decision tree algorithm

• Can be characterized as searching a space of hypotheses for one that fits the training examples.
• Performs a simple-to-complex, hill-climbing search through its hypothesis space, beginning with the empty tree.
• The evaluation function that guides this hill-climbing search is the information gain measure (see the sketch after this list).
• Some insight into its capabilities and limitations:
  o The hypothesis space of all decision trees is a complete space of finite discrete-valued functions, relative to the available attributes. Every finite discrete-valued function can be represented by some decision tree.
  o It maintains only a single current hypothesis as it searches through the space of decision trees, so it loses the capabilities that follow from explicitly representing all consistent hypotheses.
  o As hill-climbing, it has no backtracking and can therefore converge to locally optimal solutions that are not globally optimal.
  o It can be easily extended to handle noisy training data by modifying its termination criterion to accept hypotheses that imperfectly fit the training data.
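
A minimal sketch of the information gain measure in Python, assuming binary class labels; the dataset below is made up for illustration:

import math
from collections import Counter

def entropy(labels):
    # H = -sum of p_i * log2(p_i) over the class proportions.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    # examples: list of (features_dict, label);
    # gain = H(S) - sum over values v of |Sv|/|S| * H(Sv).
    labels = [label for _, label in examples]
    gain = entropy(labels)
    values = {features[attribute] for features, _ in examples}
    for v in values:
        subset = [label for features, label in examples if features[attribute] == v]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

data = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
        ({"outlook": "rain"}, "yes"), ({"outlook": "rain"}, "yes")]
print(information_gain(data, "outlook"))   # 1.0: this attribute splits the data perfectly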













Games - adversarial search

A game can be formally defined as a kind of search problem with the following elements:

• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff function), defines the final numeric value for a game that ends in terminal state s for a player p.












Optimal decisions in multiplayer games

First, we need to replace the single value for each node with a vector of values. For example, in a three-player game with players A, B, and C, a vector (vA, vB, vC) is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. The simplest way to implement this is to have the UTILITY function return a vector of utilities.

Now we have to consider nonterminal states. Consider the node marked X in the game tree shown in Figure 5.4. In that state, player C chooses what to do. The two choices lead to terminal states with utility vectors (vA=1, vB=2, vC=6) and (vA=4, vB=2, vC=3). Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities (vA=1, vB=2, vC=6). Hence, the backed-up value of X is this vector. The backed-up value of a node n is always the utility vector of the successor state with the highest value for the player choosing at n.
















Alpha-Beta Pruning

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
             = max(3, min(2, x, y), 2)
             = max(3, z, 2)   where z = min(2, x, y) ≤ 2
             = 3


α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.

β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.

Alpha-beta search updates the values of α and β as it goes along and prunes the remaining branches at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current α or β value for MAX or MIN, respectively.
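
A minimal alpha-beta sketch in Python over an explicit game tree (leaves are utilities, internal nodes are lists of children); the tree below reproduces the example above, with made-up values standing in for x and y:

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # Leaves carry their utility directly; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                   # beta cutoff: MIN will never allow this branch
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break                   # alpha cutoff: MAX will never allow this branch
        return value

# The tree from the example: the leaves standing in for x and y are never examined.
tree = [[3, 12, 8], [2, 100, 200], [14, 5, 2]]
print(alphabeta(tree, maximizing=True))   # 3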


Alpha-beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax. This means that the effective branching factor becomes √b instead of b: alpha-beta can solve a tree roughly twice as deep as minimax in the same amount of time. If successors are examined in random order rather than best-first, the total number of nodes examined will be roughly O(b^(3m/4)) for moderate b.





In many games, repeated states occur frequently because of transpositions: different permutations of the move sequence that end up in the same position. It is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered so that we don't have to recompute it on subsequent occurrences. The hash table of previously seen positions is traditionally called a transposition table; it is essentially identical to the explored list in GRAPH-SEARCH.


Alpha-beta still has to search all the way to terminal states for at least a portion of the search space. This depth is usually not practical, because moves must be made in a reasonable amount of time.

Cutting off search

Replace the utility function by a heuristic evaluation function EVAL, which estimates the position's utility, and replace the terminal test by a cutoff test that decides when to apply EVAL.

We replace the two lines in alpha-beta search that mention TERMINAL-TEST with the following line:

if CUTOFF-TEST(state, depth) then return EVAL(state)

A more robust approach is to apply iterative deepening. When time runs out, the program returns the move selected by the deepest completed search. As a bonus, iterative deepening also helps with move ordering.


Evaluation functions

First, the evaluation function should order the terminal states in the same way as the true utility function: states that are wins must evaluate better than draws, which in turn must be better than losses. Otherwise, an agent using the evaluation function might err even if it can see ahead all the way to the end of the game. Second, the computation must not take too long! (The whole point is to search faster.) Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

Forward pruning

It is also possible to do forward pruning, meaning that some moves at a given node are pruned immediately without further consideration. One approach to forward pruning is beam search: on each ply, consider only a "beam" of the n best moves (according to the evaluation function) rather than considering all possible moves. Unfortunately, this approach is rather dangerous because there is no guarantee that the best move will not be pruned away. (PROBCUT)

Stochastic games

In real life, many unpredictable external events can put us into unforeseen situations. Many games mirror this unpredictability by including a random element, such as the throwing of dice. We call these stochastic games.

A stochastic game tree must include chance nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles.

The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, positions do not have definite minimax values. Instead, we can only calculate the expected value of a position: the average over all possible outcomes of the chance nodes. This leads us to generalize the minimax value for deterministic games to an expecti-minimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before. For chance nodes we compute the expected value, which is the sum of the value over all outcomes, weighted by the probability of each chance action:

EXPECTIMINIMAX(s) =
  UTILITY(s)                                   if TERMINAL-TEST(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))           if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))           if PLAYER(s) = MIN
  sum_r P(r) EXPECTIMINIMAX(RESULT(s, r))      if PLAYER(s) = CHANCE

where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s, with the additional fact that the result of the dice roll is r.

It will take O(b^m * n^m), where n is the number of distinct rolls.

Evaluation functions for games of chance

As with minimax, the obvious approximation to make with expectiminimax is to cut the search off at some point and apply an evaluation function to each leaf. One might think that evaluation functions for games such as backgammon should be just like evaluation functions for chess: they just need to give higher scores to better positions. But in fact, the presence of chance nodes means that one has to be more careful about what the evaluation values mean. Hence, the program behaves totally differently if we make a change in the scale of some evaluation values! It turns out that to avoid this sensitivity, the evaluation function must be a positive linear transformation of the probability of winning from a position (or, more generally, of the expected utility of the position). This is an important and general property of situations in which uncertainty is involved.

Perceptron

Finds weights for a Boolean function given a training set (if the function is linearly separable).

http://page.mi.fu-berlin.de/rojas/neural/chapter/K3.pdf (page 9 of 22)

Let w1 and w2 be the weights of a perceptron with two inputs, and θ its threshold. If the perceptron computes the XOR function the following four inequalities must be fulfilled:

w1*0 + w2*0 = 0       < θ
w1*1 + w2*0 = w1      >= θ
w1*0 + w2*1 = w2      >= θ
w1*1 + w2*1 = w1 + w2 < θ

Since θ is positive, according to the first inequality, w1 and w2 are positive too, according to the second and third inequalities. Therefore the inequality w1 + w2 < θ cannot be true. This contradiction implies that no perceptron capable of computing the XOR function exists. An analogous proof holds for the negation of XOR.
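
By contrast, a perceptron can learn linearly separable functions such as AND. A minimal sketch of the perceptron learning rule in Python; the learning rate and epoch count are arbitrary choices:

def train_perceptron(samples, lr=0.1, epochs=20):
    # samples: list of ((x1, x2), target) with 0/1 targets; learns w1, w2 and threshold.
    w1 = w2 = theta = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 >= theta else 0
            error = target - output
            w1 += lr * error * x1
            w2 += lr * error * x2
            theta -= lr * error          # the threshold moves like a negative bias weight
    return w1, w2, theta

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, theta = train_perceptron(and_samples)
print([(1 if w1 * x1 + w2 * x2 >= theta else 0) for (x1, x2), _ in and_samples])
# [0, 0, 0, 1] -- AND is linearly separable, so the rule converges.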



BACK-PROP-LEARNING Sample

Inputs: I1 = -1, I2 = 0.5

Calculate outputs:

H1 = g(-1*1 + 0.5*(-0.7724)) = g(-1.3862) = 1/(1 + e^1.3862) = 0.2
H2 = g(-1*1 + 0.5*(-0.1972)) = g(-1.0986) = 1/(1 + e^1.0986) = 0.25
O1 = g(0.2*(-5.466) + 0.25*(-0.0216)) = g(-1.0986) = 1/(1 + e^1.0986) = 0.25
O2 = g(0.2*5.4655 + 0.25*(-1.6)) = g(0.6931) = 1/(1 + e^-0.6931) = 0.6667

Calculate ΔO1, ΔO2:

ΔO1 = g(-1.0986)*(1 - g(-1.0986))*(0.3 - 0.25) = 0.25*(1 - 0.25)*0.05 = 0.0094
ΔO2 = g(0.6931)*(1 - g(0.6931))*(0.6667 - 0.6667) = 0

Calculate ΔH1, ΔH2:

ΔH1 = g(-1.3862)*(1 - g(-1.3862))*(-5.466*ΔO1 + 5.4655*ΔO2) = 0.2*(1 - 0.2)*(-0.0514) = -0.0082
ΔH2 = g(-1.0986)*(1 - g(-1.0986))*(-0.0216*ΔO1 + (-1.6)*ΔO2) = 0.25*(1 - 0.25)*(-0.0002) = -0.0000375 ≈ 0

Update all the weights (W = W + 0.3*Input*Δ):

WI1H1 = WI1H1 + 0.3*I1*ΔH1 = 1 + 0.3*(-1)*(-0.0082) = 1.0025
WI1H2 = WI1H2 + 0.3*I1*ΔH2 = 1 + 0.3*(-1)*0 = 1
WI2H1 = WI2H1 + 0.3*I2*ΔH1 = -0.7724 + 0.3*0.5*(-0.0082) = -0.7736
WI2H2 = WI2H2 + 0.3*I2*ΔH2 = -0.1972 + 0.3*0.5*0 = -0.1972
WH1O1 = WH1O1 + 0.3*H1*ΔO1 = -5.466 + 0.3*0.2*0.0094 = -5.4654
WH1O2 = WH1O2 + 0.3*H1*ΔO2 = 5.4655 + 0.3*0.2*0 = 5.4655
WH2O1 = WH2O1 + 0.3*H2*ΔO1 = -0.0216 + 0.3*0.25*0.0094 = -0.0209
WH2O2 = WH2O2 + 0.3*H2*ΔO2 = -1.6 + 0.3*0.25*0 = -1.6
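
A minimal sketch reproducing this single back-propagation step in Python, under the assumption that g is the logistic sigmoid, the target vector is (0.3, 0.6667), and the learning rate is 0.3, as in the calculation above:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One backprop step for the 2-2-2 network with the weights used in the example.
inputs  = [-1.0, 0.5]
targets = [0.3, 0.6667]
lr = 0.3
w_ih = [[1.0, 1.0], [-0.7724, -0.1972]]      # w_ih[i][h]: input i -> hidden h
w_ho = [[-5.466, 5.4655], [-0.0216, -1.6]]   # w_ho[h][o]: hidden h -> output o

# Forward pass.
hidden = [sigmoid(sum(inputs[i] * w_ih[i][h] for i in range(2))) for h in range(2)]
outputs = [sigmoid(sum(hidden[h] * w_ho[h][o] for h in range(2))) for o in range(2)]

# Backward pass: delta = g'(in) * error, with g'(in) = g(in) * (1 - g(in)).
d_out = [outputs[o] * (1 - outputs[o]) * (targets[o] - outputs[o]) for o in range(2)]
d_hid = [hidden[h] * (1 - hidden[h]) * sum(w_ho[h][o] * d_out[o] for o in range(2))
         for h in range(2)]

# Weight updates: W = W + lr * activation_of_source * delta_of_destination.
for h in range(2):
    for o in range(2):
        w_ho[h][o] += lr * hidden[h] * d_out[o]
for i in range(2):
    for h in range(2):
        w_ih[i][h] += lr * inputs[i] * d_hid[h]

print([round(x, 2) for x in hidden])    # roughly [0.2, 0.25]
print([round(x, 4) for x in outputs])   # roughly [0.25, 0.6667]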























Planning

Strips

• STRIPS is the simplest and the second oldest representation of operators in AI.
• When the initial state is represented by a database of positive facts, STRIPS can be viewed as being simply a way of specifying an update to this database.


Representing States & Goals

STRIPS describes states and operators in a restricted language.

States: a conjunction of facts (ground literals that do not contain variable symbols).

Goals: a conjunction of positive literals.


STRIPS: Goal-Stack Planning

Given a goal stack:

1. Initialize: Push the goal to the stack.
2. If the top of the stack is satisfied in the current state, pop.
3. Otherwise, if the top is a conjunction, push the individual conjuncts to the stack.
4. Otherwise, check if the add-list of any operator can be unified with the top; push/replace the operator and push its preconditions to the stack.
5. If the top is an action/operator t, pop and execute it (see the sketch after this list):
   state = state + t.add-list - t.delete-list
   plan = [plan | t]
6. Loop 2-5 till the stack is empty.
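
A minimal sketch in Python of step 5, applying a STRIPS operator to a state; the set-of-facts state encoding and the blocks-world style operator are illustrative assumptions, not the STRIPS syntax itself:

# Apply a STRIPS operator: the new state is state + add-list - delete-list,
# provided the operator's preconditions hold in the current state.
def apply_operator(state, op):
    if not op["preconditions"] <= state:
        raise ValueError("preconditions not satisfied")
    return (state | op["add"]) - op["delete"]

# Illustrative operator and state.
pickup_a = {
    "preconditions": {"ontable(A)", "clear(A)", "handempty"},
    "add":           {"holding(A)"},
    "delete":        {"ontable(A)", "clear(A)", "handempty"},
}
state = {"ontable(A)", "clear(A)", "handempty", "ontable(B)", "clear(B)"}
print(sorted(apply_operator(state, pickup_a)))
# ['clear(B)', 'holding(A)', 'ontable(B)']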