Artificial Intelligence,
An Introductory Course
The International Congress for global Science and Technology
www.icgst.com
Instructor: Ashraf Aboshosha, Dr. rer. nat.
Engineering Dept., Atomic Energy Authority,
8th Section, Nasr City, Cairo, P.O. Box 29
Email: aboshosha@icgst.com, Tel.: 0121804952
Lecture (3): Artificial Neural Networks
Course Syllabus:
• An introduction to artificial intelligence and
machine learning
• Artificial Neural Networks
• Intelligent search techniques
• Neural programming
This is free educational material
6. Network learning strategies
Under the notion of learning in a network, we will consider a process
of forcing a network to yield a particular response to a specific input. A
particular response may or may not be specified to provide external
correction. Learning is necessary when the information about inputs/outputs is
unknown or incomplete a priori, so that no design of a network can be
performed in advance. The majority of networks require training in a
supervised or unsupervised learning mode. Some networks, however,
can be designed without incremental training. They are designed by batch
learning rather than stepwise training.
Batch learning takes place when the network weights are adjusted in
a single training step. In this mode of learning, the complete set of input/output
training data is needed to determine the weights, and feedback information
produced by the network itself is not involved in developing the network. This
learning technique is also called recording. Learning with feedback, either
from the teacher or from the environment rather than a teacher, is however
more typical for neural networks. Such learning is called incremental and is
usually performed in steps. The concept of feedback plays a central role in
learning. The concept is highly elusive and somewhat paradoxical. In a broad
sense it can be understood as an introduction of a pattern of relationships into
the cause-and-effect path. We will distinguish two different types of learning:
1) supervised learning, 2) unsupervised learning.
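The distinction between batch and incremental learning can be illustrated for a single linear unit. The following is a minimal Python sketch (illustrative only, not part of the lecture): batch learning ("recording") computes the weight once from the whole training set, while incremental learning reaches essentially the same weight through many small stepwise feedback corrections.

```python
# Batch ("recording"): determine the weight in a single step from the
# complete training set.  For a linear unit o = w*x, least squares gives
# a closed-form answer.
xs = [1.0, 2.0, 3.0]
ds = [2.0, 4.0, 6.0]                       # targets lie on the line d = 2*x

w_batch = sum(x * d for x, d in zip(xs, ds)) / sum(x * x for x in xs)

# Incremental: the same weight emerges from repeated small corrections,
# each driven by the feedback error (d - w*x) on a single example.
w_inc, c = 0.0, 0.05                       # c is the learning constant
for _ in range(100):
    for x, d in zip(xs, ds):
        w_inc += c * (d - w_inc * x) * x
```

Both approaches converge to the same weight (here w = 2); batch learning needs all the data at once, while incremental learning only ever sees one example at a time.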
[Figure: a network F(W, X) maps the network input X to the network output O; a learning algorithm compares O with the desired response d and produces the learning signal that adjusts W.]
fig. (9) Supervised Learning Rule
In supervised learning, fig. (9), we assume that at each instant of time
when the input is applied, the desired response d of the system is provided by
the teacher. The distance between the actual and the desired response serves as
an error measure and is used to correct network parameters externally. Since
we assume adjustable weights, the teacher may implement a reward-and-
punishment scheme to adapt the network's weight matrix W. For instance, in
learning classifications of input patterns or situations with known responses,
the error can be used to modify weights so that the error decreases. This
mode of learning is very pervasive; it is also used in many situations of
natural learning. A set of input and output patterns called a training set is
required for this learning mode. Typically, supervised learning rewards
accurate classifications or associations and punishes those which yield
inaccurate responses. The teacher estimates the negative error-gradient
direction and reduces the error accordingly. In many situations the inputs,
outputs and the computed gradient are deterministic; however, the
minimization of error proceeds over all its random realizations. As a result,
most supervised learning algorithms reduce to stochastic minimization of error
in multidimensional weight space. In learning without supervision, the
desired response is not known; thus, explicit error information cannot be
used to improve network behavior. Since no information is available as to the
correctness or incorrectness of responses, learning must somehow be
accomplished based on observations of responses to inputs about which we
have marginal or no knowledge. For example, unsupervised learning can
easily result in finding the boundary between classes of input patterns.
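The reward-and-punishment scheme described above can be illustrated with the discrete perceptron rule. This is a minimal Python sketch (not taken from the lecture): the teacher supplies the desired response d, and the weights are corrected only when the network's response is wrong.

```python
def sign(net):
    """Hard-limiting activation: the discrete perceptron's output."""
    return 1 if net >= 0 else -1

def perceptron_step(w, x, d, c=1.0):
    """Reward-and-punishment: adjust w only when the response is wrong."""
    o = sign(sum(wi * xi for wi, xi in zip(w, x)))
    if o != d:                        # teacher compares o with desired d
        w = [wi + c * (d - o) / 2 * xi for wi, xi in zip(w, x)]
    return w

# Tiny training set of (input pattern, desired class); the last input
# component is a constant bias term.
training = [([2.0, 1.0, 1.0], 1), ([1.0, -1.0, 1.0], -1), ([-1.0, 2.0, 1.0], 1)]
w = [0.0, 0.0, 0.0]
for _ in range(10):
    for x, d in training:
        w = perceptron_step(w, x, d)
```

After a few passes the weights separate the two classes, and no further punishment occurs; the training set here is hypothetical, chosen only to be linearly separable.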
[Figure: a network F(W, X) maps the network input X to the network output O; the weights W are adjusted without any externally supplied desired response.]
Fig. (10) Unsupervised Learning Rule
Unsupervised learning algorithms, fig. (10), use patterns that are
typically redundant raw data having no labels regarding their class
membership or association. In this mode of learning the network must
discover for itself any possibly existing patterns, regularities, separating
properties, etc. While discovering these, the network undergoes a change of its
parameters, which is called self-organization. The technique of unsupervised
learning is often used to perform clustering, the unsupervised classification
of objects without providing information about the actual classes. This kind of
learning corresponds to minimal a priori information being available. Some
information about the number of clusters, or the similarity versus dissimilarity of
patterns, can be helpful for this mode of learning. Unsupervised learning is
sometimes called learning without a teacher. This terminology is not the most
appropriate, because learning without a teacher is not possible at all. Although
the teacher does not have to be involved in every training step, he has to set
goals even in an unsupervised learning mode. We may think of the following
analogy: learning with supervision corresponds to classroom learning, with the
teacher's questions answered by students and corrected, if needed, by the
teacher. Learning without supervision corresponds to learning the subject
from a videotape lecture covering the material but not including any other
teacher's involvement. Therefore, the student cannot get explanations of
unclear questions, check answers, and become fully informed.
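As an illustration of unsupervised clustering, a winner-take-all update can move the nearest cluster center toward each unlabeled input. The following Python sketch is hypothetical (the data and learning constant are invented for illustration); note that no class labels are ever supplied.

```python
def closest(centers, x):
    """Index of the center nearest to x (by squared distance)."""
    d2 = [sum((ci - xi) ** 2 for ci, xi in zip(c, x)) for c in centers]
    return d2.index(min(d2))

def unsupervised_step(centers, x, c=0.2):
    """Move the winning center toward x; self-organization, no labels."""
    k = closest(centers, x)
    centers[k] = [ci + c * (xi - ci) for ci, xi in zip(centers[k], x)]
    return centers

# Two obvious clusters around (0, 0) and (5, 5); the network must
# discover them on its own from the raw, unlabeled data.
data = [[0.1, 0.0], [0.0, 0.2], [5.1, 4.9], [4.8, 5.2]]
centers = [[1.0, 1.0], [4.0, 4.0]]
for _ in range(20):
    for x in data:
        centers = unsupervised_step(centers, x)
```

After a few passes the centers settle near the two cluster means, so the boundary between the classes has been found without any teacher-provided labels.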
6.1. The general algorithm of learning
Neural networks come in different types, and every type has its own
learning rule. All the methods of learning, however, follow the same general
algorithm: the network parameters are changed according to the learning
rule so as to accommodate the network's characteristics to the desired pattern. In
general, for neuron i with weight vector

w_i = [w_i1  w_i2  ...  w_in]^t

the weight increment is proportional to the product of the input x and the
learning signal r. The learning signal r is in general a function of w_i and x, and
sometimes of the teacher's signal d_i. We thus have for the network shown in
fig. (11):

r = r(w_i, x, d_i) ..........................................(16)
The increment of the weight vector w_i produced by the learning step at time t
according to the general learning rule is

Δw_i(t) = c r[w_i(t), x(t), d_i(t)] x(t) .................... (17)
where c is a positive number called the learning constant that determines the rate of
learning. The weight vector adapted at time t becomes at the next instant, or learning
step,

w_i(t+1) = w_i(t) + c r[w_i(t), x(t), d_i(t)] x(t) ....... (18)
The superscript convention will be used in this text to index the discrete-time
training steps as in equ. (18). For the k'th step we thus have from (18), using this
convention,

w_i^(k+1) = w_i^k + c r(w_i^k, x^k, d_i^k) x^k ......................... (19)
The learning in (18), (19) assumes the form of a sequence of discrete-time weight
modifications. Continuous-time learning can be expressed as

dw_i(t)/dt = c r x(t) ........................................ (20)
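The discrete-time update of equations (17)-(19) can be sketched in code. This is a minimal Python illustration (not from the lecture): the learning signal r is passed in as a rule-specific function, so that different learning rules plug into the same general loop. The Widrow-Hoff signal used in the example anticipates section 6.3.

```python
# General incremental learning step:  w <- w + c * r(w, x, d) * x
# r is the rule-specific learning signal function r(w_i, x, d_i).

def learning_step(w, x, d, r, c=0.1):
    """One discrete-time weight update, as in equation (18)."""
    signal = r(w, x, d)
    return [wi + c * signal * xi for wi, xi in zip(w, x)]

# Example learning signal: the Widrow-Hoff rule r = d - w^t x.
def widrow_hoff_signal(w, x, d):
    net = sum(wi * xi for wi, xi in zip(w, x))
    return d - net

w = [0.0, 0.0]
w = learning_step(w, [1.0, 2.0], d=1.0, r=widrow_hoff_signal, c=0.1)
```

Swapping in a different `r` (the delta signal, the perceptron signal, etc.) changes the rule without changing the loop, which is exactly the point of the general algorithm.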
[Figure: a neuron with inputs x_1 ... x_n, weights w_1 ... w_n, and output o = f(net); the learning signal r is formed from d, o, and f′(net), and is scaled by the learning constant c to produce the weight increment ΔW.]
Fig. (11) Neural Networks learning algorithm
6.2. Delta learning rule
The Delta learning rule is valid for continuous activation functions and
in the supervised training mode. The learning signal for this rule is called delta
and is defined as follows:

r = [d_i − f(w_i^t x)] f′(w_i^t x) ...................(21)
The term f′(w_i^t x) is the derivative of the activation function f(net)
computed for net = w_i^t x. The explanation of the delta learning rule is shown in
fig. (11). This learning rule can be readily derived from the condition of least
square error between o_i and d_i. Calculating the gradient vector with respect
to w_i of the square error defined as

E = 0.5 (d_i − o_i)^2 ..............................(22)
which is equivalent to

E = 0.5 [d_i − f(w_i^t x)]^2 ...........................(23)
we obtain the error gradient vector

∇E = −(d_i − o_i) f′(w_i^t x) x ......................(24)
The components of the gradient vector are

∂E/∂w_ij = −(d_i − o_i) f′(w_i^t x) x_j .......(25)
Since the minimization of the error requires the weight changes to be in the
negative gradient direction, we take

Δw_i = −c ∇E ........................................(26)

where c is a positive constant. We then obtain from equations (24) and (26)
Δw_i = c (d_i − o_i) f′(net_i) x .....................(27)
or, for the single weight, the adjustment becomes

Δw_ij = c (d_i − o_i) f′(net_i) x_j ....................(28)
Note that the weight adjustment as in equations (27) and (28) is computed based on
minimization of the squared error. Considering the use of the general learning
rule (17) and plugging in the learning signal as defined in equation (21), the weight
adjustment becomes

Δw_i = c (d_i − o_i) f′(net_i) x ...................(29)
Equations (27) and (29) are identical, since c has been assumed to be an
arbitrary constant. The weights are initialized at any values for this method of
training. The delta rule was introduced only recently for neural network training
(McClelland and Rumelhart 1986). This rule parallels the discrete perceptron
training rule; it can also be called the continuous perceptron training rule. The
delta learning rule can be generalized for multilayer networks.
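As a concrete illustration, the delta rule of equations (27)-(29) can be applied to a single neuron with a sigmoid activation. The following Python sketch is illustrative only (the input, target, and learning constant are invented); for the sigmoid, the derivative takes the convenient form f′(net) = o(1 − o), which the update uses directly.

```python
import math

def f(net):
    """Continuous (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-net))

def delta_step(w, x, d, c=0.5):
    """Delta rule: w <- w + c*(d - o)*f'(net)*x, with f'(net) = o*(1 - o)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = f(net)
    r = (d - o) * o * (1.0 - o)       # learning signal delta, equation (21)
    return [wi + c * r * xi for wi, xi in zip(w, x)]

# Train a single neuron to respond with d = 1 to the input [1, 1];
# weights are initialized at arbitrary values (here zeros).
w = [0.0, 0.0]
for _ in range(200):
    w = delta_step(w, [1.0, 1.0], d=1.0)
o = f(sum(w))                          # response after training
```

As training proceeds, the squared error ½(d − o)² decreases and the response o approaches the desired value 1, exactly as the negative-gradient argument above predicts.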
6.3. Widrow-Hoff learning rule
The Widrow-Hoff learning rule (Widrow 1962) is applicable for the
supervised training of neural networks. It is independent of the activation
function of the neurons used, since it minimizes the squared error between the
desired output value d_i and the neuron's activation value net_i = w_i^t x. The
learning signal for this rule is defined as follows:

r = d_i − w_i^t x .............................. (30)
The weight vector increment under this learning rule is

Δw_i = c (d_i − w_i^t x) x ..................... (31)
or, for the single weight, the adjustment is

Δw_ij = c (d_i − w_i^t x) x_j, for j = 1, 2, ..., n .............. (32)
This rule can be considered a special case of the Delta learning rule.
Indeed, assuming that f(net) = net, we obtain f′(net) = 1. This rule is sometimes
called the LMS (Least Mean Square) learning rule. Weights are initialized at
any values in this method.
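The update of equation (31) can be sketched in the same way as before (an illustrative Python example with an invented training set, not from the text): with f(net) = net, each step is a stochastic least-squares correction, which is why the rule is called LMS.

```python
def lms_step(w, x, d, c=0.05):
    """Widrow-Hoff (LMS) rule: w <- w + c*(d - w^t x)*x, equation (31)."""
    err = d - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + c * err * xi for wi, xi in zip(w, x)]

# Fit w so that w^t x ~ d on a tiny (hypothetical) training set.
# The two inputs are orthogonal, so the rule converges to w = [2, -1].
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
w = [0.0, 0.0]                         # weights start at arbitrary values
for _ in range(100):
    for x, d in samples:
        w = lms_step(w, x, d)
```

Because the activation is the identity, no derivative f′(net) appears in the update, which is exactly the sense in which the rule is independent of the activation function.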