Back Propagation Agent Network

Chung Chen Chen

Abstract
The back-propagation neural network successfully resolves many problems in computer science. However, it is not a good model for symbolic problems such as natural language processing (NLP), because a neuron cannot represent a symbol intuitively. In this paper, we propose an agent network based on the back-propagation algorithm to resolve not only the problem of weight adjustment, but also the problem of symbolic computation.
Keywords : Agent, Neural Network, Back-propagation, Natural Language
1. Introduction
The agent has recently become a fashionable idea in computer science, though it still lacks a formal definition. Informally speaking, an agent is an object that perceives its environment through sensors and acts upon that environment through effectors [1]. Several questions arise from this informal definition, including what the terms "environment", "sensor", and "effector" mean. In this paper, we propose a learning model for a network of agents, and define the "environment", "sensor" and "effector" in this model. The learning model is similar to the back-propagation neural network (BPNN), so we call it the back-propagation agent network (BPAN).
The differences between BPAN and BPNN are listed as follows.
(1). All neurons in BPNN are replaced with agents in BPAN.
(2). In BPNN, all inputs and outputs of neurons are numbers, but in BPAN, symbolic inputs and outputs are allowed for each agent.
(3). In BPNN, the weights of links are adjusted to approximate the target function, but in BPAN, each agent adjusts its hypothesis to approximate the target function.
2. Background
Although BPAN is similar to BPNN, the idea comes from the LMS (least mean squares) learning approach invented by Widrow and Hoff in 1960 [2]. The LMS approach is a classic method in machine learning and is categorized as a reinforcement learning approach in several books [1][3]. The goal of the LMS approach is to learn a set of weights for a utility function that minimizes the error E between the training values (V_train) and the values predicted by the hypothesis (V_predict). However, for many problems, such as machine control and chess playing, a program does not get a response from the environment at every step, so no training value is available to it. Widrow and Hoff proposed a smart workaround: use the average utility of the successor states as the expected value (V_expect) when no response is available at the current step. The following formulas show the function V_expect and the error function.
V_expect(e) = Σ_{s = successor(e)} V_predict(s)

E = Σ_e ( V_expect(e) − V_predict(e) )²        (1)
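On a toy state graph, equation (1) can be evaluated directly. The states, successor function, and predicted values below are invented purely for illustration; only the two formulas come from the text.

```python
# Evaluating V_expect and the error E of Eq. (1) on a toy state graph.
# The graph, state names, and V_predict values are illustrative assumptions.

def v_expect(e, successors, v_predict):
    """Expected value of state e: sum over predicted values of its successors."""
    return sum(v_predict[s] for s in successors[e])

def error(states, successors, v_predict):
    """Squared error E between expected and predicted values, per Eq. (1)."""
    return sum((v_expect(e, successors, v_predict) - v_predict[e]) ** 2
               for e in states)

successors = {"a": ["b"], "b": ["c"], "c": []}   # a toy chain of states
v_predict  = {"a": 0.5, "b": 0.8, "c": 1.0}      # current hypothesis values

# Only states with successors contribute a meaningful V_expect here:
e_val = error(["a", "b"], successors, v_predict)  # (0.8-0.5)^2 + (1.0-0.8)^2 ~ 0.13
```

Note that terminal states (here "c") have no successors, so V_expect is undefined for them; in practice the true response from the environment is used there instead.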
Widrow and Hoff proposed an algorithm, shown in Figure 1, to learn the weights that minimize this error function.
Algorithm LMS-Learning
Input : a set of training examples {(e_1, v_1), …, (e_m, v_m)}
Output : a set of weights {w_1, …, w_n} to approximate target function f
  Initialize the weights randomly.
  For each training example (e, v)
    Use the current weights to calculate V_predict(e)
    For each weight w_i, update it as
      w_i ← w_i + η (V_expect(e) − V_predict(e)) x_i
End
Figure 1 : LMS-Learning Algorithm
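The loop in Figure 1 translates almost directly into code. In the sketch below, the feature mapping x(e), the learning rate, the epoch count, and the toy training data are all illustrative assumptions; only the weight-update rule comes from the algorithm, with the training value v standing in for V_expect(e).

```python
# A minimal sketch of the LMS-Learning loop of Figure 1.
import random

def lms_learn(examples, n_features, x, rate=0.1, epochs=200):
    """examples: list of (e, v) pairs, v playing the role of V_expect(e);
    x(e) returns the feature vector of example e."""
    w = [random.uniform(-0.1, 0.1) for _ in range(n_features)]  # random init
    for _ in range(epochs):
        for e, v in examples:
            xs = x(e)
            v_pred = sum(wi * xi for wi, xi in zip(w, xs))  # V_predict(e)
            # w_i <- w_i + rate * (V_expect(e) - V_predict(e)) * x_i
            for i in range(n_features):
                w[i] += rate * (v - v_pred) * xs[i]
    return w

# Toy usage: learn f(e) = 2*e + 1 with a constant bias feature.
x = lambda e: [1.0, e]
w = lms_learn([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)], 2, x)  # w converges near [1, 2]
```

Because the error is quadratic in the weights, this per-example update converges for any small enough learning rate when the examples are consistent with some weight vector.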
Several applications adopt the LMS approach successfully, such as game playing and robot control. For game playing, Arthur Samuel built a successful checkers program by modifying the LMS approach [4][5]. The modified approach is now called Temporal Difference Learning (TD-learning) [6][7], presented together with the Q-function [][] in several books [1][3]. A more recent application, Tesauro's TD-gammon system [10], used TD-learning and achieved better results than his earlier Neurogammon system [11], which was based on a neural network. For robot control, Michie and Chambers designed a system called BOXES to resolve the cart-pole problem [12].
Algorithm Back-Propagation
Input : N : a multi-layer network
        E : a set of training examples {(e_1, v_1), …, (e_m, v_m)}
        η : the learning rate
Output : a set of weights {w_1, …, w_n} to approximate target function f
For each training example (e, v)
  Propagate the input forward through the network :
  1. Input the instance e to the network and compute the output o(u) for every neuron u in the network
  Propagate the error backward through the network :
  2. For each network output unit k, calculate its error term δ_k
       δ_k ← o_k (1 − o_k)(t_k − o_k)
  3. For each hidden unit h, calculate its error term δ_h
       δ_h ← o_h (1 − o_h) Σ_k w_kh δ_k
  4. For each network weight w_ji, update it as
       w_ji ← w_ji + Δw_ji , where Δw_ji = η δ_j x_ji
End
Figure 2 : Back-Propagation Algorithm
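The four steps of Figure 2 can be traced on a tiny 2-2-1 sigmoid network. This sketch is illustrative: the architecture, initial weights, learning rate, and training instance are assumptions, not values from the text.

```python
# One pass of the Figure 2 updates on a tiny 2-2-1 sigmoid network.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w_hid, w_out, e, t, rate=0.5):
    """w_hid: 2x2 hidden-layer weights, w_out: 2 output weights,
    e: input vector, t: target value for the single output unit."""
    # Step 1: forward pass, compute the output o(u) of every unit.
    o_h = [sigmoid(sum(w * xi for w, xi in zip(row, e))) for row in w_hid]
    o_k = sigmoid(sum(w * oh for w, oh in zip(w_out, o_h)))
    # Step 2: output error term  delta_k = o_k (1 - o_k)(t_k - o_k)
    d_k = o_k * (1 - o_k) * (t - o_k)
    # Step 3: hidden error terms delta_h = o_h (1 - o_h) sum_k w_kh delta_k
    d_h = [oh * (1 - oh) * w_out[h] * d_k for h, oh in enumerate(o_h)]
    # Step 4: w_ji <- w_ji + rate * delta_j * x_ji
    w_out = [w + rate * d_k * o_h[h] for h, w in enumerate(w_out)]
    w_hid = [[w + rate * d_h[j] * e[i] for i, w in enumerate(row)]
             for j, row in enumerate(w_hid)]
    return w_hid, w_out

w_hid, w_out = backprop_step([[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5],
                             e=[1.0, 0.0], t=1.0)
```

Since the target exceeds the initial output, the step increases the weights on positively contributing paths and decreases the others, nudging the prediction toward the target.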
If we read the algorithms in Figure 1 and Figure 2 carefully, we may find that both of them can be abstracted into the following algorithm.
For each example e
  For each node n in the graph, in order
    Calculate the predicted output of n
  For each node n in reverse order
    Calculate the expected input of n
    Adjust the weights so that the predicted output approaches the expected input
Figure 3 : Abstract Back-Propagation Algorithm
3. The Model of Back-Propagation Agent Network
For each example e
  For each agent a in order
    a.predict()
  For each agent a in reverse order
    a.expect()
    a.adjust()
Figure 4 : Back-Propagation Algorithm for Agent Network
A BPAN is a network with shared variables. Agents in the network "sense" the values of the variables, adjust their hypotheses, and modify the values of the variables as their "effects". Shared variables in the network may be categorized into two groups: prediction variables and expectation variables. We also define a set of functions for each agent in this model.
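As one concrete reading of this model, the Figure 4 loop and an agent's predict/expect/adjust functions might be sketched as follows. The LinearAgent hypothesis, the variable names "x" and "y", the learning rate, and the training loop are all assumptions made for illustration; the text does not fix these interfaces.

```python
# A speculative sketch of the BPAN loop of Figure 4 over shared
# prediction and expectation variables. All names are illustrative.

class LinearAgent:
    def __init__(self, net, in_var, out_var, w=0.5, rate=0.1):
        self.net, self.in_var, self.out_var = net, in_var, out_var
        self.w, self.rate = w, rate

    def predict(self):
        # Sense the input prediction variable; write the output as an "effect".
        self.net.prediction[self.out_var] = self.w * self.net.prediction[self.in_var]

    def expect(self):
        # Propagate the expected output backward into an expected input.
        if self.w != 0:
            self.net.expectation[self.in_var] = self.net.expectation[self.out_var] / self.w

    def adjust(self):
        # Move the hypothesis so the predicted output approaches the expected one.
        err = self.net.expectation[self.out_var] - self.net.prediction[self.out_var]
        self.w += self.rate * err * self.net.prediction[self.in_var]

class BPAN:
    def __init__(self):
        self.prediction, self.expectation, self.agents = {}, {}, []

    def train(self, examples):
        # The Figure 4 loop: forward predict, then backward expect/adjust.
        for e, v in examples:
            self.prediction["x"] = e
            for a in self.agents:
                a.predict()
            self.expectation["y"] = v
            for a in reversed(self.agents):
                a.expect()
                a.adjust()

# Toy usage: a single agent learning the target y = 2x.
net = BPAN()
net.agents = [LinearAgent(net, "x", "y")]
for _ in range(200):
    net.train([(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)])
```

With symbolic agents, predict() and expect() would read and write symbols (e.g. parse trees) rather than numbers, while the loop structure stays the same.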
4. Examples for Back-Propagation Agent Network
4.1 Reinforcement Learning
5. Conclusion and Future Work
References
1. Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall.
2. Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pages 96-104, New York.
3. Mitchell, T. M. (1997). Machine Learning. McGraw-Hill International Editions.
4. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210-229.
5. Samuel, A. L. (1967). Some studies in machine learning using the game of checkers II - Recent progress. IBM Journal of Research and Development, 11(6):601-617.
6. TD-Learning
7. Watkins, C. J. (1989). Models of Delayed Reinforcement Learning. PhD thesis, Psychology Department, Cambridge University, Cambridge, United Kingdom.
8. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4):257-277.
9. Tesauro, G., & Sejnowski, T. J. (1989). A parallel network that learns to play backgammon. Artificial Intelligence, 39(3):357-390.
10. Michie, D., & Chambers, R. A. (1968). BOXES: An experiment in adaptive control. In Dale, E., & Michie, D., editors, Machine Intelligence 2, pages 125-133. Elsevier/North-Holland, Amsterdam, London, New York.
11. Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
12. Newell, A., Shaw, J. C., & Simon, H. A. (1958). Chess Playing Programs and the Problem of Complexity. IBM Journal of Research and Development, 4(2):320-335. Reprinted in Feigenbaum and Feldman (1963).
13. Genesereth, M., & Ketchpel, S. (1994). Agent-Based Software Engineering.