Artificial Intelligence
Thinking humanly: decision making, problem solving, learning.
Thinking rationally: logic, reasoning systems.
Acting humanly: doing human tasks well; the Turing Test. Needed skills: NLP, knowledge representation, automated reasoning, machine learning, computer vision, robotics.
Acting rationally: a rational agent needs to reason logically; the skills needed for the Turing Test (knowledge representation, reasoning, NLP) are needed here as well.
This book therefore concentrates on general principles of rational agents and on components for constructing them.
Rational Agent
agent = architecture (sensors, actuators) + agent program.
The job of AI is to design an agent program that implements the agent function: the mapping from percepts to actions. The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators.
Notice the difference between the agent program, which takes the current percept as input, and the agent function, which takes the entire percept history. The agent program takes just the current percept as input because nothing more is available from the environment; if the agent's actions need to depend on the entire percept sequence, the agent will have to remember the percepts.
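The skeleton above can be sketched in Python. The class name, the lookup-table design, and the percept strings are illustrative assumptions, not from the text:

```python
# Minimal agent-program sketch: maps the current percept to an action,
# remembering past percepts internally so the agent function's view of the
# whole history can be reconstructed. All names here are hypothetical.
class TableDrivenAgent:
    def __init__(self, table):
        self.table = table      # maps percept-history tuples to actions
        self.percepts = []      # remembered percept history

    def program(self, percept):
        self.percepts.append(percept)
        # The program only receives the current percept; the history lives
        # in the agent's own memory.
        return self.table.get(tuple(self.percepts), "NoOp")

agent = TableDrivenAgent({("dirty",): "Suck", ("dirty", "clean"): "Right"})
print(agent.program("dirty"))   # Suck
print(agent.program("clean"))   # Right
```

The table grows exponentially with history length, which is why the later agent designs replace it with rules, models, goals, or utilities.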
There are four basic kinds of agent programs that embody the principles underlying almost all intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
Simple reflex agents
Select an action according to the current percept only (without the history), using condition–action rules (e.g., if car-in-front-is-braking then initiate-braking):
rule ← RULE-MATCH(state, rules)
return rule.ACTION
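A minimal Python sketch of this loop; the rule encoding and the braking example's field name are assumptions for illustration:

```python
# Simple reflex agent sketch: the action depends only on the current percept.
def rule_match(state, rules):
    """Return the action of the first rule whose condition matches."""
    for condition, action in rules:
        if condition(state):
            return action
    return None

rules = [
    (lambda p: p.get("car_in_front_is_braking"), "initiate_braking"),
    (lambda p: True, "drive_on"),               # default rule
]

def simple_reflex_agent(percept):
    state = percept                  # interpret-input is the identity here
    return rule_match(state, rules)  # i.e., rule.ACTION

print(simple_reflex_agent({"car_in_front_is_braking": True}))   # initiate_braking
print(simple_reflex_agent({"car_in_front_is_braking": False}))  # drive_on
```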
Model-based reflex agents
Have an internal state that tracks:
• how the world evolves independently of the agent;
• how the agent's own actions affect the world;
in short, how the world works (UPDATE-STATE(state, action, percept, model)).
Chapters: 4, 11, 12, 15, 17, 25.
Goal-based agents
Search, planning. The agent needs some sort of goal information that describes situations that are desirable (a binary happy/unhappy distinction).
Utility-based agents
A utility function expresses how happy the agent would be in a given state, allowing trade-offs (such as speed vs. safety) and decision making under uncertainty. Requires perception, representation, reasoning, and learning.
Learning agents
Learning in intelligent agents can be summarized as a process of modification of each component of the agent to bring the components into closer agreement with the available feedback information, thereby improving the overall performance of the agent.
Representation
Atomic representation
The algorithms underlying search and game playing (Chapters 3–5), hidden Markov models (Chapter 15), and Markov decision processes (Chapter 17) all work with atomic representations, or at least treat representations as if they were atomic.
Factored representation
Many important areas of AI are based on factored representations, including constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapters 10 and 11), Bayesian networks (Chapters 13–16), and the machine learning algorithms in Chapters 18, 20, and 21.
Structured representation
Structured representations underlie relational databases and first-order logic (Chapters 8, 9, and 12), first-order probability models (Chapter 14), knowledge-based learning (Chapter 19), and much of natural language understanding (Chapters 22 and 23).
Problem-solving by searching
This chapter describes one kind of goal-based agent called a problem-solving agent. Problem-solving agents use atomic representations. Goal-based agents that use more advanced factored or structured representations are usually called planning agents and are discussed in Chapters 7 and 10.
Well-defined problems and solutions
A problem can be defined formally by five components (together they implicitly determine the state space, the set of states reachable from the initial state):
1. The initial state that the agent starts in.
2. A description of the possible actions available to the agent. Given a particular state s, ACTIONS(s) returns the set of actions that can be executed in s.
3. A description of what each action does; the formal name for this is the transition model, specified by a function RESULT(s, a) that returns the state that results from doing action a in state s.
4. The goal test, which determines whether a given state is a goal state.
5. A path cost function that assigns a numeric cost to each path.
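The five components can be collected into a small Python class. The class layout is an illustrative assumption; the city distances are the standard Romania-map values:

```python
# A problem as its five components; actions here are simply "move to
# neighbor", so RESULT(s, a) returns the chosen neighbor.
class Problem:
    def __init__(self, initial, goal, graph):
        self.initial = initial        # 1. initial state
        self.goal = goal
        self.graph = graph            # {state: {neighbor: step cost}}

    def actions(self, s):             # 2. ACTIONS(s)
        return list(self.graph[s])

    def result(self, s, a):           # 3. transition model RESULT(s, a)
        return a

    def goal_test(self, s):           # 4. goal test
        return s == self.goal

    def path_cost(self, path):        # 5. sum of step costs along a path
        return sum(self.graph[s][t] for s, t in zip(path, path[1:]))

romania = {"Arad": {"Sibiu": 140, "Timisoara": 118},
           "Sibiu": {"Fagaras": 99}, "Fagaras": {"Bucharest": 211},
           "Timisoara": {}, "Bucharest": {}}
p = Problem("Arad", "Bucharest", romania)
print(p.path_cost(["Arad", "Sibiu", "Fagaras", "Bucharest"]))  # 450
```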
We can evaluate an algorithm's performance in four ways:
Completeness: Is the algorithm guaranteed to find a solution when there is one?
Optimality: Does the strategy find the optimal solution (the one with the lowest path cost among all solutions)?
Time complexity: How long does it take to find a solution?
Space complexity: How much memory is needed to perform the search?
Uninformed search strategies
BFS, uniform-cost search, DFS, depth-limited search, iterative deepening DFS, bidirectional search.
Informed search strategies
Greedy best-first search, A*.
Memory-bounded heuristic search: iterative-deepening A* (IDA*), recursive best-first search (RBFS).
Learning to search better
A metalevel learning algorithm can learn from its search experiences to avoid exploring unpromising subtrees. The techniques used for this kind of learning are described in Chapter 21 (reinforcement learning). The goal of learning is to minimize the total cost of problem solving, trading off computational expense and path cost.
Uninformed Search
Closed list: a hash table which holds the visited nodes.
BFS chooses the shallowest node in the frontier.
DFS always expands the deepest node in the current frontier.
Iterative deepening depth-first search.
Uniform-cost search: expand the node with the minimum path cost first. Implementation: priority queue.
Bidirectional search.
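A minimal uniform-cost search sketch with a priority-queue frontier and a hash-set closed list; the toy graph is invented:

```python
import heapq

# Uniform-cost search: always expand the frontier node with minimum path
# cost g, so the first time the goal is popped, its cost is optimal.
def uniform_cost_search(start, goal, graph):
    frontier = [(0, start, [start])]          # (path cost g, state, path)
    closed = set()                            # the closed list
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in closed:
            continue
        closed.add(state)
        for nxt, cost in graph.get(state, {}).items():
            if nxt not in closed:
                heapq.heappush(frontier, (g + cost, nxt, path + [nxt]))
    return None

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
print(uniform_cost_search("A", "D", graph))  # (3, ['A', 'B', 'C', 'D'])
```

Note that the cheap detour A-B-C-D (cost 3) beats both A-C-D (5) and A-B-D (6), which BFS by node depth would not notice.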
Informed Search
Conditions for optimality: Admissibility and consistency
The first condition we require for optimality is that h(n) be an admissible heuristic. An admissible heuristic is one that never overestimates the cost to reach the goal. Because g(n) is the actual cost to reach n along the current path, and f(n) = g(n) + h(n), we have as an immediate consequence that f(n) never overestimates the true cost of a solution along the current path through n.
For any node, the estimated cost of reaching the goal from that node is no greater than the cost of reaching the goal:
H(node) <= Cost(node, goal)
A second, slightly stronger condition called consistency (or sometimes monotonicity) is required only for applications of A* to graph search. A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n':
h(n) ≤ c(n, a, n') + h(n')
In other words: for any node and any successor generated by any action, the estimated cost of reaching the goal from the node is no greater than the step cost of getting to the successor plus the estimated cost of reaching the goal from the successor:
H(node) <= Cost(node, action, successor) + H(successor)
It is fairly easy to show (Exercise 3.29) that every consistent heuristic is also admissible. Consistency is therefore a stricter requirement than admissibility.
Semi-Formal Proof (consistency implies admissibility):
The true cost decomposes along any path to the goal:
Cost(node, goal) = Cost(node, action, successor) + Cost(successor, goal)
Consistency holds at every node in the graph:
H(node) <= Cost(node, action, successor) + H(successor)
H(successor) <= Cost(successor, action, descendant) + H(descendant)
Applying consistency inductively along an optimal path down to the goal:
H(node) <= Cost(node, action, successor) + Cost(successor, action, descendant) + ... + H(goal)
Since H(goal) = 0, the right-hand side is exactly the path cost to the goal. Hence:
H(node) <= Cost(node, goal)
which is admissibility.
Greedy/Best-first Search
f(n) = h(n)
A* search: Minimizing the total estimated solution cost
f(n) = g(n) + h(n)
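A compact A* sketch minimizing f(n) = g(n) + h(n); the graph and heuristic values are invented for illustration (and chosen to be consistent):

```python
import heapq

# A* search: order the frontier by f = g + h. With a consistent heuristic
# and a closed list, the first expansion of any state is optimal.
def astar(start, goal, graph, h):
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    closed = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in closed:
            continue
        closed.add(state)
        for nxt, cost in graph.get(state, {}).items():
            if nxt not in closed:
                g2 = g + cost
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

graph = {"S": {"A": 1, "B": 4}, "A": {"G": 5}, "B": {"G": 1}, "G": {}}
h = {"S": 4, "A": 5, "B": 1, "G": 0}.get     # consistent for this graph
print(astar("S", "G", graph, h))  # (5, ['S', 'B', 'G'])
```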
IDA* Algorithm
Each iteration is a depth-first search that keeps track of the cost evaluation f = g + h of each node generated. If a node is generated whose cost exceeds the threshold for that iteration, its path is cut off. The cost threshold is initialized to the heuristic of the initial state, and increases in each iteration to the total cost of the lowest-cost node that was pruned during the previous iteration. The algorithm terminates when a goal state is reached whose total cost does not exceed the current threshold.
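The IDA* steps above can be sketched as follows; the toy graph and heuristic values are invented:

```python
# IDA*: iterative deepening on the f = g + h cost rather than on depth.
def ida_star(start, goal, graph, h):
    def dfs(state, g, bound, path):
        f = g + h(state)
        if f > bound:
            return f, None              # pruned; report f for the next bound
        if state == goal:
            return f, path
        minimum = float("inf")          # lowest cost pruned below this node
        for nxt, cost in graph.get(state, {}).items():
            if nxt not in path:         # avoid cycles on the current branch
                t, found = dfs(nxt, g + cost, bound, path + [nxt])
                if found:
                    return t, found
                minimum = min(minimum, t)
        return minimum, None

    bound = h(start)                    # threshold starts at h(initial state)
    while True:
        bound, found = dfs(start, 0, bound, [start])
        if found:
            return found
        if bound == float("inf"):
            return None                 # no solution

graph = {"S": {"A": 1, "B": 4}, "A": {"G": 5}, "B": {"G": 1}, "G": {}}
h = {"S": 4, "A": 5, "B": 1, "G": 0}.get
print(ida_star("S", "G", graph, h))  # ['S', 'B', 'G']
```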
Duplicate Pruning
• Do not re-enter the parent of the current state (with or without using a closed list).
• Using a closed list: check the closed list before entering new nodes into the open list. Note: in A*, h then has to be consistent! Do not remove the original check.
• Using a stack: check the current branch and stack status before entering new nodes.
public CNode TreeSearch(CProblem problem)
{
    CQueue open_list = new CQueue(problem.initState);
    while (!open_list.IsEmpty())
    {
        current = open_list.Pop();
        if (problem.GoalTest(current))
            return current;
        for each Child of current:
            if (!open_list.IsExist(Child) && !IsInPath(current, Child))
                open_list.Insert(Child);
    }
    return null;
}
public CNode GraphSearch(CProblem problem)
{
    CQueue open_list = new CQueue(problem.initState);
    CQueue close_list = new CQueue();
    while (!open_list.IsEmpty())
    {
        current = open_list.Sort().Pop();
        if (problem.GoalTest(current))
            return current;
        if (close_list.IsExist(current))
            continue;
        for each Child of current:
            if (!open_list.IsExist(Child) && !close_list.IsExist(Child))
                open_list.Insert(Child);
        close_list.Insert(current);
    }
    return null;
}
Local Search
If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node. There is no "goal test" and no "path cost" for this problem.
Hill-climbing search
Simulated Annealing
Permits moves to states with lower values, and gradually decreases the frequency of such moves and their size.
Schedule()
o Returns the current temperature.
o Depends on the start temperature and the round number.
Acceptor()
o Returns the probability of choosing a "bad" node.
o Depends on h(n) − h(n_son) and the current temperature.
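A rough sketch of this scheme; the exponential cooling schedule and the exp(delta/T) acceptance rule are common choices assumed here, not taken from the notes:

```python
import math, random

# Simulated annealing sketch: Schedule() plays the role of the temperature
# schedule; a value-decreasing ("bad") move is accepted with probability
# exp(delta / T), which shrinks as the temperature cools.
def schedule(t, start_temp=10.0, decay=0.95):
    return start_temp * (decay ** t)          # exponential cooling (assumed)

def simulated_annealing(value, neighbors, start, steps=1000, seed=0):
    random.seed(seed)
    current = start
    for t in range(steps):
        T = schedule(t)
        if T < 1e-9:
            break
        nxt = random.choice(neighbors(current))
        delta = value(nxt) - value(current)
        # Always accept improvements; accept worse moves with prob e^(delta/T)
        if delta > 0 or random.random() < math.exp(delta / T):
            current = nxt
    return current

# Toy 1-D landscape: local maximum at x=2, global maximum at x=8.
value = lambda x: [0, 1, 3, 1, 0, 2, 4, 6, 9, 5][x]
neighbors = lambda x: [max(0, x - 1), min(9, x + 1)]
best = simulated_annealing(value, neighbors, start=2)
```

Unlike plain hill climbing from x=2, the early high-temperature moves let the search escape the local maximum and drift toward the global one.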
Genetic algorithms
Bayesian networks
P(x1, . . . , xn) = ∏ i=1 to n of P(xi | parents(Xi))
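A tiny illustration of this chain rule on a two-node network; the CPT numbers are made up:

```python
# Chain rule P(x1..xn) = Π P(xi | parents(Xi)) on the net Cloudy -> Rain.
P_cloudy = {True: 0.5, False: 0.5}                    # P(Cloudy): no parents
P_rain = {True: {True: 0.8, False: 0.2},              # P(Rain | Cloudy)
          False: {True: 0.1, False: 0.9}}

def joint(cloudy, rain):
    # Each variable contributes one factor, conditioned on its parents.
    return P_cloudy[cloudy] * P_rain[cloudy][rain]

print(joint(True, True))    # 0.5 * 0.8 = 0.4
# The four joint entries sum to 1, as a full joint distribution must:
total = sum(joint(c, r) for c in (True, False) for r in (True, False))
print(round(total, 10))     # 1.0
```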
We can satisfy this condition with this methodology:
1. Nodes: First determine the set of variables that are required to model the domain. Now order them, {X1, . . . , Xn}. Any order will work, but the resulting network will be more compact if the variables are ordered such that causes precede effects.
2. Links: For i = 1 to n do:
• Choose, from X1, . . . , Xi−1, a minimal set of parents for Xi, such that the equation above is satisfied.
• For each parent insert a link from the parent to Xi.
• CPTs: Write down the conditional probability table, P(Xi | Parents(Xi)).
Properties:
The topological semantics specifies that each variable is conditionally independent of its non-descendants, given its parents.
A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents; that is, given its Markov blanket.
Rules of thumb for reading independence off the graph:
• If node B cannot be reached from node A along any dependence path, then A is conditionally independent of B.
• With no evidence: if two nodes share only a common child, they are conditionally independent (assuming no other dependence path in the graph); if they share a common parent, they are dependent.
• Given evidence X, dependence can flow from one ancestor of X to another ancestor of X; therefore, if a common child is known (observed), its parents become dependent. If a common parent is known, its children are conditionally independent (assuming no other dependence path in the graph).
• Given X itself, chains through X are blocked: a node that is a parent of X and a node that is a child of X are conditionally independent, and likewise a parent of X and any descendant of X.
• Note: nodes are conditionally independent only if no dependence path of any kind exists. Recommendation: write "not conditionally independent" rather than "dependent."
Naïve Bayes Classifier
Naïve Bayes Algorithm
Decision tree algorithm
Can be characterized as searching a space of hypotheses for one that fits the training examples. Performs a simple-to-complex, hill-climbing search through this hypothesis space, beginning with the empty tree. The evaluation function that guides this hill-climbing search is the information gain measure.
Some insights into its capabilities and limitations:
o The hypothesis space of all decision trees is a complete space of finite discrete-valued functions, relative to the available attributes. Every finite discrete-valued function can be represented by some decision tree.
o Maintains only a single current hypothesis as it searches through the space of decision trees; it loses the capabilities that follow from explicitly representing all consistent hypotheses.
o As hill climbing, it has no backtracking and can therefore converge to locally optimal solutions that are not globally optimal.
o Can be easily extended to handle noisy training data by modifying its termination criterion to accept hypotheses that imperfectly fit the training data.
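The information gain measure named above is entropy minus the expected entropy remaining after the split; a sketch for binary (positive/negative) labels:

```python
import math

# Information gain for a binary classification split.
def entropy(pos, neg):
    """Entropy in bits of a pos/neg count pair."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(pos, neg, splits):
    """splits: list of (pos, neg) counts in each branch of the attribute."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(pos, neg) - remainder

# A perfectly separating attribute on 6+/6- examples gains a full bit;
# a useless 50/50 split gains nothing.
print(information_gain(6, 6, [(6, 0), (0, 6)]))  # 1.0
print(information_gain(6, 6, [(3, 3), (3, 3)]))  # 0.0
```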
Games – adversarial search
A game can be formally defined as a kind of search problem with the following elements:
S0: The initial state, which specifies how the game is set up at the start.
PLAYER(s): Defines which player has the move in a state.
ACTIONS(s): Returns the set of legal moves in a state.
RESULT(s, a): The transition model, which defines the result of a move.
TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
UTILITY(s, p): A utility function (also called an objective function or payoff function), which defines the final numeric value for a game that ends in terminal state s for a player p.
Optimal decisions in multiplayer games
First, we need to replace the single value for each node with a vector of values. For example, in a three-player game with players A, B, and C, a vector (vA, vB, vC) is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. The simplest way to implement this is to have the UTILITY function return a vector of utilities.
Now we have to consider nonterminal states. Consider the node marked X in the game tree shown in Figure 5.4. In that state, player C chooses what to do. The two choices lead to terminal states with utility vectors (vA=1, vB=2, vC=6) and (vA=4, vB=2, vC=3). Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities (vA=1, vB=2, vC=6). Hence, the backed-up value of X is this vector. The backed-up value of a node n is always the utility vector of the successor state with the highest value for the player choosing at n.
Alpha–Beta Pruning
MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
             = max(3, min(2, x, y), 2)
             = max(3, z, 2)   where z = min(2, x, y) ≤ 2
             = 3
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.
Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current α or β value for MAX or MIN, respectively.
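A sketch of alpha–beta over an explicit tree. The nested-list encoding is an illustration device; the second MIN node's unknown x/y leaves are replaced here by concrete large values so the cutoff is visible:

```python
# Alpha-beta pruning over a game tree given as nested lists (leaves are
# terminal utilities; each nesting level alternates MAX and MIN).
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):               # terminal state
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:
                break                            # beta cutoff
        return v
    else:
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:
                break                            # alpha cutoff
        return v

# Three MIN nodes under the MAX root; the 100 and 200 leaves are never
# evaluated because the 2 already makes that branch worse than alpha = 3.
tree = [[3, 12, 8], [2, 100, 200], [14, 5, 2]]
print(alphabeta(tree))  # 3
```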
With perfect move ordering, alpha–beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax. This means that the effective branching factor becomes √b instead of b: alpha–beta can solve a tree roughly twice as deep as minimax in the same amount of time. If successors are examined in random order rather than best-first, the total number of nodes examined will be roughly O(b^(3m/4)) for moderate b.
In many games, repeated states occur frequently because of transpositions: different permutations of the move sequence that end up in the same position. It is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered so that we don't have to recompute it on subsequent occurrences. The hash table of previously seen positions is traditionally called a transposition table; it is essentially identical to the explored list in GRAPH-SEARCH.
Alpha–beta still has to search all the way to terminal states for at least a portion of the search space. This depth is usually not practical, because moves must be made in a reasonable amount of time.
Cutting off search
Replace the utility function by a heuristic evaluation function EVAL, which estimates the position's utility, and replace the terminal test by a cutoff test that decides when to apply EVAL. We replace the two lines in alpha–beta search that mention TERMINAL-TEST with the following line:
if CUTOFF-TEST(state, depth) then return EVAL(state)
A more robust approach is to apply iterative deepening. When time runs out, the program returns the
move selected by the deepest completed search. As a bonus, iterative
deepening also helps with move
ordering.
Evaluation functions
First, the evaluation function should order the
terminal
states in the same way as the true utility function:
states that are wins must evaluate better than draws, which in turn must be better
than losses. Otherwise,
an agent using the evaluation function might err even if it can see ahead all the way to the end of the
game. Second, the computation must not take too long! (The whole point is to search faster.) Third, for
nonterminal states, the
evaluation function should be strongly correlated with the actual chances of
winning.
Forward pruning
It is also possible to do forward pruning, meaning that some moves at a given node are pruned immediately without further consideration. One approach to forward pruning is beam search: on each ply, consider only a "beam" of the n best moves (according to the evaluation function) rather than considering all possible moves. Unfortunately, this approach is rather dangerous because there is no guarantee that the best move will not be pruned away. (PROBCUT uses statistics from prior experience to reduce that chance.)
Stochastic games
In real life, many unpredictable external events can put us into unforeseen situations. Many games mirror this unpredictability by including a random element, such as the throwing of dice. We call these stochastic games. A stochastic game tree must include chance nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles.
The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, positions do not have definite minimax values. Instead, we can only calculate the expected value of a position: the average over all possible outcomes of the chance nodes. This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before. For chance nodes we compute the expected value, which is the sum of the value over all outcomes, weighted by the probability of each chance action:
EXPECTIMINIMAX(s) =
  UTILITY(s)                                  if TERMINAL-TEST(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MIN
  Σ_r P(r) · EXPECTIMINIMAX(RESULT(s, r))     if PLAYER(s) = CHANCE
where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s, with the additional fact that the result of the dice roll is r.
Expectiminimax will take O(b^m · n^m) time, where n is the number of distinct rolls.
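The four-case definition can be sketched directly; the tuple encoding of nodes is an assumption for illustration:

```python
# Expectiminimax over a tree of tuples ("max"|"min"|"chance", children).
# Chance children carry their probabilities; bare numbers are terminal
# utilities.
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                                  # UTILITY(s)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted sum over outcomes r
    return sum(p * expectiminimax(c) for p, c in children)

# A fair coin flip between two MIN nodes:
tree = ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [1, 9]))])
print(expectiminimax(tree))  # 0.5*3 + 0.5*1 = 2.0
```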
Evaluation functions for games of chance
As with minimax, the obvious approximation to make with expectiminimax is to cut the search off at some point and apply an evaluation function to each leaf. One might think that evaluation functions for games such as backgammon should be just like evaluation functions for chess: they just need to give higher scores to better positions. But in fact, the presence of chance nodes means that one has to be more careful about what the evaluation values mean. The program can behave totally differently if we make a change in the scale of some evaluation values! It turns out that to avoid this sensitivity, the evaluation function must be a positive linear transformation of the probability of winning from a position (or, more generally, of the expected utility of the position). This is an important and general property of situations in which uncertainty is involved.
Perceptron
Finds weights, given a training set, for a Boolean function (which must be linearly separable).
http://page.mi.fu-berlin.de/rojas/neural/chapter/K3.pdf (page 9 of 22)
Let w1 and w2 be the weights of a perceptron with two inputs, and θ its threshold (the unit outputs 1 iff w1·x1 + w2·x2 ≥ θ). If the perceptron computes the XOR function, the following four inequalities must be fulfilled (reconstructed here from the four rows of the XOR truth table):
  0 < θ            (XOR(0,0) = 0)
  w1 ≥ θ           (XOR(1,0) = 1)
  w2 ≥ θ           (XOR(0,1) = 1)
  w1 + w2 < θ      (XOR(1,1) = 0)
Since θ is positive, according to the first inequality, w1 and w2 are positive too, according to the second and third inequalities. Therefore the inequality w1 + w2 < θ cannot be true. This contradiction implies that no perceptron capable of computing the XOR function exists. An analogous proof holds for the complementary function.
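A brute-force sanity check of this argument; the finite weight grid is an illustrative sample (scaling invariance makes it representative), not a proof by itself:

```python
# Search a grid of (w1, w2, theta) for a threshold unit computing XOR.
# The contradiction above says none exists, so the search must come up empty.
def perceptron(w1, w2, theta):
    return lambda x1, x2: 1 if w1 * x1 + w2 * x2 >= theta else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 2 for i in range(-8, 9)]     # -4.0 .. 4.0 in steps of 0.5

found = False
for w1 in grid:
    for w2 in grid:
        for theta in grid:
            f = perceptron(w1, w2, theta)
            if all(f(x1, x2) == y for (x1, x2), y in xor_table.items()):
                found = True
print(found)  # False: no weight setting reproduces XOR
```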
BACK-PROP-LEARNING Sample
I1 = -1, I2 = 0.5
Calculate outputs:
H1 = g(-1*1 + 0.5*(-0.7724)) = g(-1.3862) = 1/(1+e^1.3862) = 0.2
H2 = g(-1*1 + 0.5*(-0.1972)) = g(-1.0986) = 1/(1+e^1.0986) = 0.25
O1 = g(0.2*(-5.466) + 0.25*(-0.0216)) = g(-1.0986) = 1/(1+e^1.0986) = 0.25
O2 = g(0.2*5.4655 + 0.25*(-1.6)) = g(0.6931) = 1/(1+e^-0.6931) = 0.6667
Calculate ΔO1, ΔO2:
ΔO1 = g(-1.0986)*(1-g(-1.0986))*(0.3-0.25) = 0.25*(1-0.25)*0.05 = 0.0094
ΔO2 = g(0.6931)*(1-g(0.6931))*(0.6667-0.6667) = 0
Calculate ΔH1, ΔH2:
ΔH1 = g(-1.3862)*(1-g(-1.3862))*(-5.466*ΔO1 + 5.4655*ΔO2) = 0.2*(1-0.2)*(-0.0514) = -0.0082
ΔH2 = g(-1.0986)*(1-g(-1.0986))*(-0.0216*ΔO1 + (-1.6)*ΔO2) = 0.25*(1-0.25)*(-0.0002) = -0.0000375 ≈ 0
Update the weights in every layer (W = W + 0.3*Input*Δ):
WI1H1 = WI1H1 + 0.3*I1*ΔH1 = 1 + 0.3*(-1)*(-0.0082) = 1.0025
WI1H2 = WI1H2 + 0.3*I1*ΔH2 = 1 + 0.3*(-1)*0 = 1
WI2H1 = WI2H1 + 0.3*I2*ΔH1 = -0.7724 + 0.3*0.5*(-0.0082) = -0.7736
WI2H2 = WI2H2 + 0.3*I2*ΔH2 = -0.1972 + 0.3*0.5*0 = -0.1972
WH1O1 = WH1O1 + 0.3*H1*ΔO1 = -5.466 + 0.3*0.2*0.0094 = -5.4654
WH1O2 = WH1O2 + 0.3*H1*ΔO2 = 5.4655 + 0.3*0.2*0 = 5.4655
WH2O1 = WH2O1 + 0.3*H2*ΔO1 = -0.0216 + 0.3*0.25*0.0094 = -0.0209
WH2O2 = WH2O2 + 0.3*H2*ΔO2 = -1.6 + 0.3*0.25*0 = -1.6
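The forward pass and the output deltas can be replayed in code to check the arithmetic (g is the logistic sigmoid; the weights are those of the sample):

```python
import math

# Replay of the worked forward pass: two inputs, two hidden units, two
# outputs, logistic activation g.
g = lambda x: 1 / (1 + math.exp(-x))

I1, I2 = -1, 0.5
WI = {("I1", "H1"): 1, ("I1", "H2"): 1,
      ("I2", "H1"): -0.7724, ("I2", "H2"): -0.1972}
WH = {("H1", "O1"): -5.466, ("H1", "O2"): 5.4655,
      ("H2", "O1"): -0.0216, ("H2", "O2"): -1.6}

H1 = g(I1 * WI[("I1", "H1")] + I2 * WI[("I2", "H1")])
H2 = g(I1 * WI[("I1", "H2")] + I2 * WI[("I2", "H2")])
O1 = g(H1 * WH[("H1", "O1")] + H2 * WH[("H2", "O1")])
O2 = g(H1 * WH[("H1", "O2")] + H2 * WH[("H2", "O2")])
print(round(H1, 2), round(H2, 2), round(O1, 2), round(O2, 4))
# 0.2 0.25 0.25 0.6667

# Output deltas for targets (0.3, 0.6667): delta = g'(in) * error,
# with g'(in) = out * (1 - out) for the sigmoid.
dO1 = O1 * (1 - O1) * (0.3 - O1)
dO2 = O2 * (1 - O2) * (0.6667 - O2)
print(round(dO1, 4))  # 0.0094
```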
Planning – STRIPS
STRIPS is the simplest and the second-oldest representation of operators in AI. When the initial state is represented by a database of positive facts, STRIPS can be viewed as simply a way of specifying an update to this database.
Representing States & Goals
STRIPS describes states & operators in a restricted language.
States: a conjunction of "facts" (ground literals that do not contain variable symbols).
Goals: a conjunction of positive literals.
STRIPS: Goal-Stack Planning
Given a goal stack:
1. Initialize: push the goal onto the stack.
2. If the top of the stack is satisfied in the current state, pop it.
3. Otherwise, if the top is a conjunction, push the individual conjuncts onto the stack.
4. Otherwise, check if the add-list of any operator can be unified with the top; push/replace the operator and push its preconditions onto the stack.
5. If the top is an action/operator t, pop and execute it:
   state = state + t.add-list − t.delete-list
   plan = plan + [t]
6. Loop over steps 2–5 till the stack is empty.
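Step 5's state update can be sketched as set operations. The blocks-world operator is invented for illustration, and the update applies the delete list before the add list (the conventional STRIPS semantics, equivalent here to the formula above since the lists are disjoint):

```python
# STRIPS state update: a state is a set of ground facts; executing an
# operator removes its delete list and adds its add list.
def apply_operator(state, op):
    assert op["preconditions"] <= state, "preconditions not satisfied"
    return (state - op["delete"]) | op["add"]

# Hypothetical blocks-world operator: stack block A onto block B.
stack_A_on_B = {
    "preconditions": {"clear(A)", "clear(B)", "ontable(A)"},
    "add": {"on(A,B)"},
    "delete": {"clear(B)", "ontable(A)"},
}
state = {"clear(A)", "clear(B)", "ontable(A)", "ontable(B)"}
state = apply_operator(state, stack_A_on_B)
print(sorted(state))  # ['clear(A)', 'on(A,B)', 'ontable(B)']
```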