Back-Propagation Agent Network



Chung-Chen Chen


Abstract

The back-propagation neural network successfully solves many problems in computer science. However, it is not a good model for symbolic problems such as natural language processing (NLP), because a neuron cannot represent a symbol intuitively. In this paper, we propose an agent network based on the back-propagation algorithm that addresses not only the problem of weight adjustment but also the problem of symbolic computation.


Keywords : Agent, Neural Network, Back-propagation, Natural Language


1. Introduction

Agent has recently become a fashionable idea in computer science, although it still lacks a formal definition. Informally speaking, an agent is an object that perceives its environment through sensors and acts upon that environment through effectors [1]. Several questions arise from this informal definition, including the meaning of the terms "environment", "sensor", and "effector". In this paper, we propose a learning model for a network of agents, and define the "environment", "sensor", and "effector" in this model. The learning model is similar to the back-propagation neural network (BPNN), so we call it the back-propagation agent network (BPAN).


The differences between BPAN and BPNN are listed as follows; a minimal sketch contrasting the two kinds of unit appears after the list.

(1). All neurons in BPNN are replaced with agents in BPAN.

(2). In BPNN, all inputs and outputs of the neurons are numbers; in BPAN, symbolic inputs and outputs are allowed for each agent.

(3). In BPNN, the weights of the links are adjusted to approximate the target function; in BPAN, each agent adjusts its hypothesis to approximate the target function.
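To make differences (1)-(3) concrete, the following Python sketch (our own illustration, not code from the paper) contrasts a BPNN neuron, whose inputs, output and learned state are numbers, with a BPAN-style agent, whose inputs and output may be symbols and whose learned state is a hypothesis. The class names and the dictionary-based hypothesis are assumptions made for illustration only.

    import math

    class Neuron:
        # BPNN unit: numeric inputs, numeric weights, numeric output.
        def __init__(self, weights):
            self.weights = weights                      # learned state: a list of numbers

        def predict(self, xs):
            s = sum(w * x for w, x in zip(self.weights, xs))
            return 1.0 / (1.0 + math.exp(-s))           # sigmoid activation

    class SymbolicAgent:
        # BPAN-style unit (illustrative): symbolic inputs/output, hypothesis as state.
        def __init__(self):
            self.hypothesis = {}                        # learned state: maps input symbols to an output symbol

        def predict(self, symbols):
            return self.hypothesis.get(tuple(symbols))  # may return a symbol, not a number

        def adjust(self, symbols, expected):
            self.hypothesis[tuple(symbols)] = expected  # adjust the hypothesis, not numeric weights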




2. Background

Although BPAN is similar to BPNN, the idea comes from the LMS (least mean squares) learning approach invented by Widrow and Hoff in 1960 [2]. The LMS approach is a classic method for machine learning and is categorized as a reinforcement learning approach in several books [1][3]. The goal of the LMS approach is to learn a set of weights for a utility function that minimizes the error E between the training values (V_train) and the values predicted by the hypothesis (V_predict). However, for many problems, such as machine control and chess playing, the program does not receive a response from the environment at every step, so no training value is available to the program. Widrow and Hoff proposed a smart approach: use the average utility of the successor states as the expected value (V_expect) when no response is available at the current step. The following formulas show the function V_expect and the error function.



    V_expect(e) = ( Σ_{s ∈ successor(e)} V_predict(s) ) / |successor(e)|

    E = Σ_e ( V_expect(e) − V_predict(e) )^2                                  (1)
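As a concrete illustration (our own, with made-up numbers, and assuming the averaging reading of V_expect given above), the following Python snippet computes V_expect for one state with two successor states and the squared error that this state contributes to E.

    def v_expect(successor_predictions):
        # Average predicted utility of the successor states.
        return sum(successor_predictions) / len(successor_predictions)

    # One state e whose two successors have predicted utilities 0.8 and 0.4.
    ve = v_expect([0.8, 0.4])             # V_expect(e) = 0.6
    vp = 0.5                              # current V_predict(e)
    print((ve - vp) ** 2)                 # this state contributes 0.01 to the error E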


Widrow and Hoff proposed an algorithm, shown in Figure 1, that learns the weights to minimize this error function.


Algorithm LMS-Learning
Input : a set of training examples {(e_1, v_1), ..., (e_m, v_m)}
Output : a set of weights {w_1, ..., w_n} to approximate the target function f

Initialize the weights randomly.
For each training example (e, v)
    Use the current weights to calculate V_predict(e)
    For each weight w_i, update it as
        w_i ← w_i + η ( V_expect(e) − V_predict(e) ) x_i
    where η is the learning rate and x_i is the i-th feature of example e
End
Figure 1 : LMS Learning Algorithm
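A minimal Python sketch of the LMS-Learning algorithm in Figure 1 (our own rendering; the function name lms_learning, the feature function, the learning-rate value and the toy data are assumptions, not from the paper):

    import random

    def lms_learning(examples, features, n_features, eta=0.1, epochs=50):
        # examples : list of (e, v_expect) pairs
        # features : function mapping an example e to a list of n_features numbers
        # eta      : learning rate
        w = [random.uniform(-0.1, 0.1) for _ in range(n_features)]   # initialize weights randomly
        for _ in range(epochs):
            for e, v_expect in examples:
                x = features(e)
                v_predict = sum(wi * xi for wi, xi in zip(w, x))      # prediction with current weights
                # w_i <- w_i + eta * (V_expect(e) - V_predict(e)) * x_i
                w = [wi + eta * (v_expect - v_predict) * xi for wi, xi in zip(w, x)]
        return w

    # Toy usage: learn weights approximating f(e) = 2*e[0] + 3*e[1].
    data = [((1, 0), 2.0), ((0, 1), 3.0), ((1, 1), 5.0), ((2, 1), 7.0)]
    print(lms_learning(data, features=list, n_features=2))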

Several applications have adopted the LMS approach successfully, such as game playing and robot control. For game playing, Arthur Samuel built a successful checkers program by modifying the LMS approach [4][5]. The modified approach is now called Temporal Difference Learning (TD-learning) [6][7] and is presented together with the Q-function in several books [1][3]. A more recent application, Tesauro's TD-Gammon system [10], uses TD-learning and achieves better results than his earlier Neurogammon system [11], which was based on a neural network. For robot control, Michie and Chambers designed a system called BOXES to solve the cart-pole problem [12].


Algorithm Back-Propagation
Input : N : a multi-layer network
        E : a set of training examples {(e_1, v_1), ..., (e_m, v_m)}
        η : the learning rate
Output : a set of weights {w_1, ..., w_n} to approximate the target function f

For each training example (e, v)
    Propagate the input forward through the network :
    1. Input the instance e to the network and compute the output o_u for every neuron u in the network
    Propagate the errors backward through the network :
    2. For each network output unit k, calculate its error term δ_k
           δ_k ← o_k ( 1 − o_k ) ( t_k − o_k )
    3. For each hidden unit h, calculate its error term δ_h
           δ_h ← o_h ( 1 − o_h ) Σ_k w_kh δ_k
    4. For each weight w_ji, update it as
           w_ji ← w_ji + Δw_ji ,  where Δw_ji = η δ_j x_ji
End
Figure 2 : Back-Propagation Algorithm
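For comparison with the agent version that follows, here is a compact Python sketch of the update rules in Figure 2 for a network with one hidden layer (our own illustration; the function name backprop_step, the network shape and the learning-rate value are assumptions):

    import math

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))

    def backprop_step(x, t, w_hidden, w_out, eta=0.5):
        # One training example (x, t) through a network with one hidden layer.
        # w_hidden[h][i] : weight from input i to hidden unit h
        # w_out[k][h]    : weight from hidden unit h to output unit k
        # Step 1: propagate the input forward
        o_h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        o_k = [sigmoid(sum(w * oh for w, oh in zip(ws, o_h))) for ws in w_out]
        # Step 2: output error terms, delta_k = o_k (1 - o_k)(t_k - o_k)
        d_k = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_k, t)]
        # Step 3: hidden error terms, delta_h = o_h (1 - o_h) sum_k w_kh delta_k
        d_h = [oh * (1 - oh) * sum(w_out[k][h] * d_k[k] for k in range(len(w_out)))
               for h, oh in enumerate(o_h)]
        # Step 4: weight updates, w_ji <- w_ji + eta * delta_j * x_ji
        for k, ws in enumerate(w_out):
            for h in range(len(o_h)):
                ws[h] += eta * d_k[k] * o_h[h]
        for h, ws in enumerate(w_hidden):
            for i in range(len(x)):
                ws[i] += eta * d_h[h] * x[i]
        return o_k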


If we read the algorithms in Figure 1 and Figure 2 carefully, we find that both of them can be abstracted into the following algorithm.


For each example e
    For each node n in the graph, in order
        Calculate the predicted output of n
    For each node n in reverse order
        Calculate the expected input of n
        Adjust the weights so that the predicted output approaches the expected input
Figure 3 : Abstract Back-Propagation Algorithm


3. The Model of Back-Propagation Agent Network


For each example e


For each agent a in order



a.predict()


For each agent a in reverse order



a.expect()



a.adjust()

Figure 4 : Back-Propagation Algorithm for Agent Network


A BPAN is a network with shared variables. Each agent in the network senses the values of the variables, adjusts its hypothesis, and modifies the values of the variables as its effect. The shared variables in the network are categorized into two groups: prediction variables and expectation variables. For each agent in this model we define a set of functions, predict(), expect() and adjust(), which are the operations used in Figure 4.
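The following Python sketch is our own reading of this model, not the paper's implementation; the class name, the dictionary-based hypothesis, the way an inner agent derives its expectation from its successors, and the driver function bpan_train are all assumptions made for illustration. Each agent senses the prediction variables of its input agents, writes its own prediction variable, later senses the expectation variables of its successors, and finally adjusts its hypothesis so that its prediction approaches its expectation.

    class Agent:
        # One node of a BPAN (illustrative sketch).
        def __init__(self, inputs):
            self.inputs = inputs        # upstream agents whose prediction variables we sense
            self.successors = []        # downstream agents whose expectation variables we sense
            self.hypothesis = {}        # symbolic hypothesis: maps a tuple of input symbols to an output symbol
            self.prediction = None      # shared prediction variable, written by predict()
            self.expectation = None     # shared expectation variable, written by expect()
            self.target = None          # training value, set externally on output agents only

        def predict(self):
            # Sense the inputs' prediction variables and write our own prediction.
            # Input agents (no inputs) keep the symbol placed on them by the environment.
            if self.inputs:
                key = tuple(a.prediction for a in self.inputs)
                self.prediction = self.hypothesis.get(key)

        def expect(self):
            # Write the expectation variable: the training value at an output agent,
            # otherwise a value derived from the successors' expectations (here, as a
            # placeholder, the first one available).
            if self.target is not None:
                self.expectation = self.target
            else:
                self.expectation = next(
                    (s.expectation for s in self.successors if s.expectation is not None),
                    self.prediction)

        def adjust(self):
            # Adjust the hypothesis so that predict() would output the expectation.
            if self.inputs and self.expectation is not None:
                key = tuple(a.prediction for a in self.inputs)
                self.hypothesis[key] = self.expectation

    def bpan_train(agents, examples, load_example):
        # Training loop of Figure 4. `agents` must be listed in topological order, and
        # load_example(e) (assumed) writes the example's input symbols onto the input
        # agents' prediction variables and its target symbol onto the output agents.
        for e in examples:
            load_example(e)
            for a in agents:                 # forward pass
                a.predict()
            for a in reversed(agents):       # backward pass
                a.expect()
                a.adjust()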


4. Examples for Back-Propagation Agent Network

4.1 Reinforcement Learning


5. Conclusion and Future Work


References

1. Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall.

2. Widrow, B., & Hoff, M.E. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pages 96-104, New York.

3. Mitchell, T.M. (1997). Machine Learning. McGraw-Hill International Editions.

4. Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210-229.

5. Samuel, A.L. (1967). Some studies in machine learning using the game of checkers II - Recent progress. IBM Journal of Research and Development, 11(6):601-617.

6. TD-Learning.

7. Watkins, C.J. (1989). Models of Delayed Reinforcement Learning. PhD thesis, Psychology Department, Cambridge University, Cambridge, United Kingdom.

8. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4):257-277.

9. Tesauro, G., & Sejnowski, T.J. (1989). A parallel network that learns to play backgammon. Artificial Intelligence, 39(3):357-390.

10. Michie, D., & Chambers, R.A. (1968). BOXES: An experiment in adaptive control. In Dale, E., & Michie, D., editors, Machine Intelligence 2, pages 125-133. Elsevier/North-Holland, Amsterdam, London, New York.

11. Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.

12. Newell, A., Shaw, J.C., & Simon, H.A. (1958). Chess Playing Programs and the Problem of Complexity. IBM Journal of Research and Development, 4(2):320-335. Reprinted in Feigenbaum and Feldman (1963).

13. Genesereth, M., & Ketchpel, S. (1994). Agent-Based Software Engineering.