14: Artificial Neural Networks

ARTIFICIAL NEURAL NETWORKS

Amrender Kumar

Indian Agricultural Statistics Research Institute, New Delhi-110012

akjha@iasri.res.in

1. Introduction

Neural networks, more accurately called Artificial Neural Networks (ANNs), are

computational models that consist of a number of simple processing units that communicate

by sending signals to one another over a large number of weighted connections. They were

originally developed from the inspiration of human brains. In human brains, a biological

neuron collects signals from other neurons through a host of fine structures called dendrites.

The neuron sends out spikes of electrical activity through a long, thin strand known as an

axon, which splits into thousands of branches. At the end of each branch, a structure called a

synapse converts the activity from the axon into electrical effects that inhibit or excite activity

in the connected neurons. When a neuron receives excitatory input that is sufficiently large

compared with its inhibitory input, it sends a spike of electrical activity down its axon.

Learning occurs by changing the effectiveness of the synapses so that the influence of one

neuron on another changes. Like human brains, neural networks also consist of processing

units (artificial neurons) and connections (weights) between them. The processing units

transport incoming information on their outgoing connections to other units. The "electrical"

information is simulated with specific values stored in those weights, which give these

networks the capacity to learn, memorize, and create relationships amongst data. A very

important feature of these networks is their adaptive nature where "learning by example"

replaces "programming" in solving problems. This feature makes such computational models

very appealing in application domains where one has little or incomplete understanding of the

problem to be solved but where training data is readily available. These networks are

“neural” in the sense that they may have been inspired by neuroscience but not necessarily

because they are faithful models of biological neural or cognitive phenomena. ANNs have

powerful pattern classification and pattern recognition capabilities; they learn and

generalize from experience. ANNs are a non-linear, data-driven, self-adaptive approach, as

opposed to the traditional model based methods. They are powerful tools for modelling,

especially when the underlying data relationship is unknown. ANNs can identify and learn

correlated patterns between input data sets and corresponding target values. After training,

ANNs can be used to predict the outcome of new independent input data. ANNs imitate the

learning process of the human brain and can process problems involving non-linear and

complex data even if the data are imprecise and noisy. These techniques are being

successfully applied across an extraordinary range of problem domains, in areas as diverse as

finance, medicine, engineering, geology, physics, biology and agriculture. There are many

different types of neural networks. Some of the most traditional applications include

classification, noise reduction and prediction.

2. Review

The genesis of ANN modeling and its applications appears to be a recent development.

However, this field was established before the advent of computers. It started with the

modeling of the functions of the human brain by McCulloch and Pitts in 1943, who proposed a model

of a “computing element”, called the McCulloch-Pitts neuron, which performs a weighted sum of

the inputs to the element followed by a threshold logic operation. Combinations of these

computing elements were used to realize several logical computations. The main drawback of

this model of computation is that the weights are fixed and hence the model could not learn

from examples. Hebb (1949) proposed a learning scheme for adjusting a connection weight

based on pre and post synaptic values of the variables. Hebb’s law became a fundamental

learning rule in the neural-network literature. Rosenblatt (1958) proposed the perceptron

models, which have weights adjustable by the perceptron learning law. Widrow and Hoff

(1960) proposed an ADALINE (Adaptive Linear Element) model for computing elements

and LMS (Least Mean Square) learning algorithm to adjust the weights of an ADALINE

model. Hopfield (1982) gave energy analysis of feed back neural networks. The analysis has

shown the existence of stable equilibrium states in a feed back network, provided the network

has symmetrical weights. Rumelhart et al. (1986) showed that it is possible to adjust the

weights of a multilayer feed forward neural network in a systematic way to learn the implicit

mapping in a set of input-output pattern pairs. The learning law is called the generalized delta

rule or error back propagation. Cheng and Titterington (1994) made a detailed study of ANN

models vis-a-vis traditional statistical models. They have shown that some statistical

procedures including regression, principal component analysis, density function and

statistical image analysis can be given neural network expressions. Warner and Misra (1996)

reviewed the relevant literature on neural networks, explained the learning algorithm and

made a comparison between regression and neural network models in terms of notations,

terminologies and implementation. Kaastra and Boyd (1996) developed a neural network

model for forecasting financial and economic time series. Dewolf and Francl (1997, 2000)

demonstrated the applicability of neural network technology for plant disease forecasting.

Zhang et al. (1998) provided a general summary of the work in ANN forecasting, including

guidelines for neural network modeling and the general paradigm of ANNs, especially those

used for forecasting. They have reviewed the relative performance of ANNs with the

traditional statistical methods, wherein in most of the studies ANNs were found to be better

than the latter. Sanzogni and Kerr (2001) developed models for predicting milk production

from farm inputs using standard feed forward ANN. Chakraborty et al. (2004) utilized the

ANN technique for predicting the severity of anthracnose disease in a legume crop. Gaudart et al.

(2004) compared the performance of MLP and that of linear regression for epidemiological

data with regard to quality of prediction and robustness to deviation from underlying

assumptions of normality, homoscedasticity and independence of errors and it was found that

MLP performed better than linear regression. Several general books on neural networks are

available, for example Hassoun (1995), Patterson (1996), Schalkoff (1997), Yegnanarayana (1999) and

Anderson (2003). Software for neural networks is also available, for example

Statistica and Matlab.

Commercial software: Statistica Neural Network, TNs2Server, DataEngine, Know Man Basic Suite, Partek, Saxon, ECANSE (Environment for Computer Aided Neural Software Engineering), Neuroshell, Neurogen, Matlab Neural Network Toolbox, Tarjan, FCM (Fuzzy Control Manager), etc.

Freeware: NetII, Spider Nets Neural Network Library, NeuDC, Binary Hopfield Net with free Java source, Neural shell, PlaNet, Valentino Computational Neuroscience Workbench, Neural Simulation Language (NSL), etc.

3. Characteristics of neural networks

The following are the basic characteristics of neural networks:

• They exhibit mapping capabilities, that is, they can map input patterns to their associated output patterns.

• They learn by examples. Thus, NN architectures can be ‘trained’ with known examples of a problem before they are tested for their ‘inference’ capability on unknown instances of the problem. They can, therefore, identify new objects on which they were not previously trained.

• They possess the capability to generalize. Thus, they can predict new outcomes from past trends.

• They are robust and fault tolerant. They can, therefore, recall full patterns from incomplete, partial or noisy patterns.

4. Basics of artificial neural networks

The terminology of artificial neural networks has developed from a biological model

of the brain. A neural network consists of a set of connected cells, the neurons. The neurons

receive impulses from either input cells or other neurons and perform some kind of

transformation of the input and transmit the outcome to other neurons or to output cells. The

neural networks are built from layers of neurons connected so that one layer receives input

from the preceding layer of neurons and passes the output on to the subsequent layer. A

neuron is a real function of the input vector $(y_1, \ldots, y_k)$. The output is obtained as

$$f(x_j) = f\left(\alpha_j + \sum_{i=1}^{k} w_{ij} \, y_i\right)$$

where f is a function, typically the sigmoid (logistic or hyperbolic tangent) function. A graphical presentation of a neuron is given in Figure 1. Mathematically, a Multi-Layer Perceptron network is a function consisting of compositions of weighted sums of the functions corresponding to the neurons.

Fig. 1: A single neuron
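
The single-neuron computation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of the logistic function for f, and the numerical values are assumptions made for the example, not part of the original text.

```python
import math

def logistic(x):
    """Logistic sigmoid, one common choice for the function f."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias, activation=logistic):
    """Return f(alpha_j + sum_i w_ij * y_i) for a single neuron j."""
    net = bias + sum(w * y for w, y in zip(weights, inputs))
    return activation(net)

# Illustrative values only (hypothetical, not taken from the text).
y = [0.5, -1.0, 2.0]      # input vector (y_1, ..., y_k)
w = [0.4, 0.3, -0.2]      # connection weights w_ij
alpha = 0.1               # bias term alpha_j
print(neuron_output(y, w, alpha))
```

Swapping `logistic` for `math.tanh` gives the hyperbolic tangent variant of the same neuron.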

5. Neural networks architectures

An ANN is defined as a data processing system consisting of a large number of

simple, highly interconnected processing elements (artificial neurons) in an architecture

inspired by the structure of the cerebral cortex of the brain. There are several types of

architecture of ANNs. However, the two most widely used ANNs are discussed below:

5.1 Feed forward networks

In a feed forward network, information flows in one direction along connecting

pathways, from the input layer via the hidden layers to the final output layer. There is no

feedback (loops) i.e., the output of any layer does not affect that same or preceding layer. A

graphical presentation of feed forward network is given in figure 2.

Fig. 2: A multi-layer feed forward neural network
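
The one-directional flow of information in a feed forward network can be sketched as follows. The layer sizes, weights and helper names are hypothetical and chosen only to make the example runnable; each layer simply receives the output of the preceding layer.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    return x

def layer_forward(inputs, weights, biases, activation):
    """weights[j][i] connects input i to unit j of this layer."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass the signal through each layer in turn; no layer feeds back."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = layer_forward(signal, weights, biases, activation)
    return signal

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output.
network = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -0.5, 0.2], logistic),
    ([[1.0, -1.0, 0.5]], [0.1], identity),
]
print(feed_forward([1.0, 2.0], network))
```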

5.2 Recurrent networks

These networks differ from feed forward network architectures in the sense that there

is at least one feedback loop. Thus, in these networks, for example, there could exist one

layer with feedback connections. There could also be neurons with

self-feedback links, i.e. the output of a neuron is fed back into itself as input. A graphical

presentation of a recurrent network is given in figure 3.

Fig. 3: Recurrent neural network (input layer, hidden layer, output layer)
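
A self-feedback link can be illustrated with a single unit whose previous output is fed back as an extra input. This is a minimal sketch under assumed choices (a tanh activation, scalar weights named `w_in` and `w_rec`); the text does not prescribe a particular recurrent model.

```python
import math

def run_recurrent_unit(inputs, w_in, w_rec, bias, h0=0.0):
    """Apply h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias) over a sequence."""
    h = h0
    history = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        history.append(h)
    return history

impulse = [1.0, 0.0, 0.0, 0.0]
# With feedback the unit keeps responding after the input has gone to zero.
print(run_recurrent_unit(impulse, w_in=1.0, w_rec=0.8, bias=0.0))
# Without feedback (w_rec = 0) the unit behaves like a feed forward neuron.
print(run_recurrent_unit(impulse, w_in=1.0, w_rec=0.0, bias=0.0))
```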

6. Types of neural networks

There is a wide variety of neural networks and their architectures. Types of neural

networks range from simple Boolean networks (perceptrons) to complex self-organizing

networks (Kohonen networks). There are also many other types of networks like Hopfield

networks, Pulse networks, Radial-Basis Function networks and Boltzmann machines. The most

important classes of neural networks for real-world problem solving include:

• Multilayer Perceptron

• Radial Basis Function Networks

• Kohonen Self Organizing Feature Maps

6.1 Multilayer Perceptron

The most popular form of neural network architecture is the multilayer perceptron

(MLP). A multilayer perceptron:

• has any number of inputs.

• has one or more hidden layers with any number of units.

• uses linear combination functions in the hidden and output layers.

• generally uses sigmoid activation functions in the hidden layers.

• has any number of outputs with any activation function.

• has connections between the input layer and the first hidden layer, between the hidden layers, and between the last hidden layer and the output layer.

Given enough data, enough hidden units, and enough training time, an MLP with just one

hidden layer can learn to approximate virtually any function to any degree of accuracy. (A

statistical analogy is approximating a function with nth order polynomials.) For this reason

MLPs are known as universal approximators and can be used when you have little prior

knowledge of the relationship between inputs and targets. Although one hidden layer is

always sufficient provided you have enough data, there are situations where a network with

two or more hidden layers may require fewer hidden units and weights than a network with

one hidden layer, so using extra hidden layers sometimes can improve generalization.

6.2 Radial Basis Function Networks

Radial basis function (RBF) networks are also feedforward, but have only one

hidden layer. An RBF network:

• has any number of inputs.

• typically has only one hidden layer with any number of units.

• uses radial combination functions in the hidden layer, based on the squared Euclidean distance between the input vector and the weight vector.

• typically uses exponential or softmax activation functions in the hidden layer, in which case the network is a Gaussian RBF network.

• has any number of outputs with any activation function.

• has connections between the input layer and the hidden layer, and between the hidden layer and the output layer.

MLPs are said to be distributed-processing networks because the effect of a hidden unit can

be distributed over the entire input space. On the other hand, Gaussian RBF networks are said

to be local-processing networks because the effect of a hidden unit is usually concentrated in

a local area centered at the weight vector.
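
The local behaviour of a Gaussian RBF hidden unit can be seen in a small sketch. The `width` parameter, the centres and the output weights below are assumptions introduced for the illustration; the hidden activation is the exponential of the negative squared Euclidean distance, scaled by the width.

```python
import math

def squared_distance(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def rbf_hidden(x, centers, width):
    """Gaussian activation exp(-d^2 / (2 * width^2)) for each hidden unit."""
    return [math.exp(-squared_distance(x, c) / (2.0 * width ** 2)) for c in centers]

def rbf_network(x, centers, width, out_weights, out_bias):
    """Linear output unit on top of the Gaussian hidden layer."""
    hidden = rbf_hidden(x, centers, width)
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

centers = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical weight vectors (centres)
width = 0.5                          # hypothetical common width
print(rbf_hidden([0.0, 0.0], centers, width))    # first unit fully active
print(rbf_hidden([10.0, 10.0], centers, width))  # both units nearly silent
print(rbf_network([1.0, 1.0], centers, width, out_weights=[2.0, -1.0], out_bias=0.5))
```

An input far from every centre leaves all hidden units close to zero, which is the sense in which the processing is local.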

6.3 Kohonen Neural Network

Self Organizing Feature Map (SOFM, or Kohonen) networks are used quite

differently to the other networks. Whereas all the other networks are designed for supervised

learning tasks, SOFM networks are designed primarily for unsupervised learning (Patterson,

1996). At first glance this may seem strange. Without outputs, what can the network learn?

The answer is that the SOFM network attempts to learn the structure of the data. One possible

use is therefore in exploratory data analysis. A second possible use is in novelty detection.

SOFM networks can learn to recognize clusters in the training data, and respond to them. If new

data, unlike previous cases, is encountered, the network fails to recognize it and this indicates

novelty. A SOFM network has only two layers: the input layer, and an output layer of radial

units (also known as the topological map layer). A schematic representation of a Kohonen

network is given in Fig. 4.

Fig. 4: A Kohonen neural network
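
The text does not spell out the SOFM training rule, so the following is a sketch of the commonly used Kohonen update on a one-dimensional map, under assumed settings (a shrinking learning rate and neighbourhood, hypothetical data). The `novelty` helper illustrates the novelty-detection use mentioned above: data unlike the training cases lie far from every map unit.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j], x))

def sofm_step(weights, x, rate, radius):
    """Move the winning unit and its map neighbours towards the input x."""
    winner = best_matching_unit(weights, x)
    for j in range(len(weights)):
        if abs(j - winner) <= radius:  # neighbourhood on a 1-D map
            weights[j] = [w + rate * (xi - w) for w, xi in zip(weights[j], x)]
    return winner

def novelty(weights, x):
    """Squared distance to the closest unit; large values suggest unfamiliar data."""
    return squared_distance(weights[best_matching_unit(weights, x)], x)

# Hypothetical data: two loose clusters in the plane, no target outputs.
data = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
weights = [[0.3, 0.3], [0.5, 0.5], [0.7, 0.7]]  # three map units in a row
for epoch in range(30):
    rate = 0.5 * (1.0 - epoch / 30)   # shrinking learning rate
    radius = 1 if epoch < 10 else 0   # shrinking neighbourhood
    for x in data:
        sofm_step(weights, x, rate, radius)

print(novelty(weights, [0.1, 0.1]))    # close to a training cluster
print(novelty(weights, [5.0, -4.0]))   # far from anything seen in training
```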

7. Learning of ANNs

The most significant property of a neural network is that it can learn from its

environment, and can improve its performance through learning. Learning is a process by

which the free parameters of a neural network i.e. synaptic weights and thresholds are

adapted through a continuous process of stimulation by the environment in which the

network is embedded. The network becomes more knowledgeable about its environment after

each iteration of the learning process. There are three types of learning paradigms, namely

supervised learning, reinforcement learning and self-organized or unsupervised learning.

7.1 Supervised learning

In this, every input pattern that is used to train the network is associated with an

output pattern, which is the target or the desired pattern. A teacher is assumed to be present

during the learning process, when a comparison is made between the network’s computed

output and the correct expected output, to determine the error. The error can then be used to

change network parameters, which result in an improvement in performance.

Learning law describes the weight vector for the i-th processing unit at time instant (t+1) in terms of the weight vector at time instant (t) as follows:

$$w_i(t+1) = w_i(t) + \Delta w_i(t),$$

where $\Delta w_i(t)$ is the change in the weight vector.

The network adapts as follows: change the weight by an amount proportional to the

difference between the desired output and the actual output. As an equation:

$$\Delta W_i = \eta \, (D - Y) \, I_i$$

where η is the learning rate, D is the desired output, Y is the actual output, and $I_i$ is the i-th

input. This is called the Perceptron Learning Rule. The weights in an ANN, similar to

coefficients in a regression model, are adjusted to solve the problem presented to ANN.

Learning or training is the term used to describe the process of finding the values of these weights.

Supervised learning incorporates an external teacher, so that each output unit is told

what its desired response to input signals ought to be. During the learning process global

information may be required. An important issue concerning supervised learning is the

problem of error convergence, i.e. the minimization of error between the desired and

computed unit values. The aim is to determine a set of weights which minimizes the error.
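
The Perceptron Learning Rule stated above can be sketched as follows. The threshold output unit, the treatment of the bias as a weight on a constant input of 1, and the AND-gate training data are assumptions made for the illustration.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 if the weighted sum is non-negative, else 0."""
    net = bias + sum(w * i for w, i in zip(weights, inputs))
    return 1 if net >= 0 else 0

def train_perceptron(samples, eta=0.5, epochs=50):
    """Apply delta W_i = eta * (D - Y) * I_i after every training pattern."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, desired in samples:
            actual = predict(weights, bias, inputs)
            delta = eta * (desired - actual)
            if delta != 0:
                errors += 1
            weights = [w + delta * i for w, i in zip(weights, inputs)]
            bias += delta  # bias treated as a weight on a constant input of 1
        if errors == 0:    # every pattern classified correctly: stop
            break
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_gate)
print(weights, bias)
print([predict(weights, bias, x) for x, _ in and_gate])
```

The rule only changes the weights when the computed output differs from the desired output, so training stops changing anything once the error has converged to zero.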

7.2 Unsupervised learning

With unsupervised learning, there is no feedback from the environment to indicate if

the outputs of the network are correct. The network must discover features, regularities,

correlations, or categories in the input data automatically. In fact, for most varieties of

unsupervised learning, the targets are the same as inputs. In other words, unsupervised

learning usually performs the same task as an auto-associative network, compressing

information from the inputs.

7.3 Reinforcement learning

In supervised learning there is a target output value for each input value. However, in

many situations, there is less detailed information available. In extreme situations, there is

only a single bit of information after a long sequence of inputs telling whether the output is

right or wrong. Reinforcement learning is one method developed to deal with such situations.

Reinforcement learning is a kind of learning in which some feedback from the environment is

given. However the feedback signal is only evaluative, not instructive. Reinforcement

learning is often called learning with a critic as opposed to learning with a teacher.

8. Development of an ANN model

The various steps in developing a neural network model are:

8.1 Variable selection

The input variables important for modeling/forecasting the variable(s) under study are

selected by suitable variable selection procedures.

8.2 Formation of training, testing and validation sets

The data set is divided into three distinct sets called training, testing and validation

sets. The training set is the largest set and is used by the neural network to learn the patterns present

in the data. The testing set is used to evaluate the generalization ability of a supposedly

trained network. A final check on the performance of the trained network is made using

the validation set.
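
A minimal sketch of such a three-way split is given below. The 70/20/10 proportions, the random shuffling and the function name are assumptions for the illustration; the text only requires that the training set be the largest.

```python
import random

def split_dataset(records, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle the records and cut them into training, testing and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]   # whatever remains
    return train, test, validation

records = list(range(100))  # stand-in for 100 observations
train, test, validation = split_dataset(records)
print(len(train), len(test), len(validation))
```

For time-ordered data such as forecasting series, a split by time rather than by random shuffling is usually more appropriate.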

8.3 Neural network structure

Neural network architecture defines its structure including number of hidden layers,

number of hidden nodes and number of output nodes etc.

(a) Number of hidden layers: The hidden layer(s) provide the network with its ability to generalize. In theory, a neural network with one hidden layer and a sufficient number of hidden neurons is capable of approximating any continuous function. In practice, neural networks with one and occasionally two hidden layers are widely used and have been found to perform very well.

(b) Number of hidden nodes: There is no magic formula for selecting the optimum number of hidden neurons. However, some rules of thumb are available for calculating the number of hidden neurons. A rough approximation can be obtained by the geometric pyramid rule proposed by Masters (1993). For a three layer network with n input and m output neurons, the hidden layer would have sqrt(n*m) neurons.

(c) Number of output nodes: Neural networks with multiple outputs, especially if these outputs are widely spaced, will produce inferior results as compared to a network with a single output.

(d) Activation function: Activation functions are mathematical formulae that determine the output of a processing node. Most units in a neural network transform their net inputs by using a scalar-to-scalar function called an activation function, yielding a value called the unit's activation. Except possibly for output units, the activation value is fed to one or more other units. Activation functions with a bounded range are often called ‘squashing functions’. An appropriate differentiable function is used as the activation function. Some of the most commonly used activation functions are:

(a) The sigmoid (logistic) function

$$f(x) = (1 + \exp(-x))^{-1}$$

(b) The hyperbolic tangent (tanh) function

$$f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$$

(c) The sine or cosine function

$$f(x) = \sin(x) \quad \text{or} \quad f(x) = \cos(x)$$

Activation functions for the hidden units are needed to introduce non-linearity into

the networks. The reason is that a composition of linear functions is again a linear

function. However, it is the non-linearity (i.e. the capability to represent nonlinear

functions) that makes multilayer networks so powerful. Almost any nonlinear

function does the job, although for back-propagation learning it must be

differentiable and it helps if the function is bounded. Therefore, the sigmoid

functions are the most common choices. There are some heuristic rules for selection

of the activation function. For example, Klimasauskas (1991) suggests logistic

activation functions for classification problems which involve learning about

average behaviour, and to use the hyperbolic tangent functions if the problem

involves learning about deviations from the average such as the forecasting

problem.
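
The geometric pyramid rule and the activation functions listed above can be written down directly. The function names are chosen for this illustration, and rounding the pyramid rule to the nearest whole neuron is an assumption, since the rule itself gives only a rough approximation.

```python
import math

def pyramid_hidden_nodes(n_inputs, n_outputs):
    """Geometric pyramid rule: about sqrt(n * m) hidden neurons (rounded here)."""
    return max(1, round(math.sqrt(n_inputs * n_outputs)))

def logistic(x):
    """Sigmoid (logistic) function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_activation(x):
    """Hyperbolic tangent written out as in the formula, bounded in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(pyramid_hidden_nodes(8, 2))
for x in (-2.0, 0.0, 2.0):
    print(x, logistic(x), tanh_activation(x), math.sin(x), math.cos(x))
```

The explicit tanh formula overflows for very large |x|; `math.tanh` is the safer choice in practice and returns the same values.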

8.4 Model building

The multilayer feed forward neural network, or multilayer perceptron (MLP), is very

popular and is used more than any other neural network type for a wide variety of tasks.

A multilayer feed forward neural network trained by the back propagation algorithm relies on a

supervised procedure, i.e., the network constructs a model based on examples of data with

known output. It has to build the model up solely from the examples presented, which are

together assumed to implicitly contain the information necessary to establish the relation. An

MLP is a powerful system, often capable of modeling complex relationships between

variables. It allows prediction of an output object for a given input object. The architecture of

MLP is a layered feedforward neural network in which the non-linear elements (neurons) are

arranged in successive layers, and the information flows unidirectionally from the input layer to the

output layer through hidden layer(s). An MLP with just one hidden layer can learn to

approximate virtually any function to any degree of accuracy. For this reason MLPs are

known as universal approximators and can be used when we have little prior knowledge of the

relationship between input and targets. One hidden layer is always sufficient provided we

have enough data. A schematic representation of the neural network is given in Fig. 5.

Fig. 5: Schematic representation of neural network (inputs, outputs)

Each interconnection in an ANN has a strength that is expressed by a number referred to as a

weight. Learning is accomplished by adjusting the weights of the interconnections according to

some learning algorithm. Learning methods in neural networks can be broadly classified into

three basic types (i) supervised learning (ii) unsupervised learning and (iii) reinforcement

learning. In an MLP, supervised learning is used for adjusting the weights. The graphic

representation of this learning is given in Fig. 6.

[Fig. 6 diagram labels: input vector, output vector, target vector, differences]

8.5 Neural network training

Training a neural network to learn patterns in the data involves iteratively presenting

it with examples of the correct known answers. The objective of training is to find the set of

weights between the neurons that determine the global minimum of the error function. This

involves decisions regarding the number of iterations, i.e., when to stop training a neural

network and the selection of learning rate (a constant of proportionality which determines the

size of the weight adjustments made at each iteration) and momentum values (how past

weight changes affect current weight changes). Backpropagation is the most commonly used

method for training multilayered feed-forward networks. It can be applied to any feed-

forward network with differentiable activation functions. For most networks, the learning

process is based on a suitable error function, which is then minimized with respect to the

weights and bias. If a network has differentiable activation functions, then the activations of the

output units become differentiable functions of input variables, the weights and bias. If we

also define a differentiable error function of the network outputs such as the sum-of-squares

error function, then the error function itself is a differentiable function of the weights.

Therefore, we can evaluate the derivative of the error with respect to weights, and these

derivatives can then be used to find the weights that minimize the error function by

using a suitable optimization method. The algorithm for evaluating the derivative of the error function

is known as backpropagation, because it propagates the errors backward through the network.

Fig. 6: A learning cycle in the ANN model

The multilayer feed forward neural network, or multilayer perceptron (MLP), is very

popular and is used more than any other neural network type for a wide variety of tasks. An MLP

trained by the backpropagation algorithm relies on a supervised procedure, i.e. the network

constructs a model based on examples of data with known output. The Backpropagation

Learning Algorithm is based on an error correction learning rule and specifically on the

minimization of the mean squared error that is a measure of the difference between the actual

and the desired output. Like all multilayer feedforward networks, multilayer perceptrons are

constructed of at least three layers (one input layer, one or more hidden layers and one output

layer), each layer consisting of elementary processing units (artificial neurons), which

incorporate a nonlinear activation function, commonly the logistic sigmoid function.

The algorithm calculates the difference between the actual response and the desired output of

each neuron of the output layer of the network. Assuming that $y_j(n)$ is the actual output of the j-th neuron of the output layer at iteration n and $d_j(n)$ is the corresponding desired output, the error signal $e_j(n)$ is defined as:

$$e_j(n) = d_j(n) - y_j(n)$$

The instantaneous value of the error for neuron j is defined as $e_j^2(n)/2$ and, correspondingly, the instantaneous total error E(n) is obtained by summing $e_j^2(n)/2$ over all neurons in the output layer. Thus,

$$E(n) = \frac{1}{2} \sum_j e_j^2(n)$$

In the above formula, j runs over all the neurons of the output layer. If we define N to be the total number of training patterns that constitute the training set applied to the neural network during the training process, then the average squared error $E_{av}$ is obtained by summing E(n) over all the training patterns and then normalizing with respect to the size N of the training set. Thus,

$$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n)$$

It is obvious that the instantaneous error E(n), as well as the average squared error $E_{av}$, is a function of all the free parameters of the network. The objective of the learning process is to modify these free parameters of the network in such a way that $E_{av}$ is minimized. To perform

this minimization, a simple training algorithm is utilized. The training algorithm updates the

synaptic weights on a pattern-by-pattern basis until one epoch, that is, one complete

presentation of the entire training set is completed. The correction (modification) $\Delta w_{ji}(n)$ that is applied to the synaptic weight $w_{ji}$ (indicating the synaptic strength of the synapse originating from neuron i and directed to neuron j), after the application of the n-th training pattern, is proportional to the partial derivative $\partial E(n) / \partial w_{ji}(n)$. Specifically, the correction applied is given by:

$$\Delta w_{ji}(n) = -\eta \, \frac{\partial E(n)}{\partial w_{ji}(n)}$$

In the above formula (also known as the delta rule), η is the learning-rate parameter of the

back-propagation algorithm. The use of the minus sign in the above equation accounts for

gradient descent in weight space, reflecting the search for a direction of weight change that

reduces the value of E(n). The exact value of the learning rate η is of great importance for the

convergence of the algorithm since it modulates the changes in the synaptic weights, from

iteration to iteration. The smaller the value of η, the smoother the trajectory in the weight

space and the slower the convergence of the algorithm. On the other hand, if the value of η is

too large, the resulting large changes in the synaptic weights may cause the network to exhibit unstable (oscillatory) behaviour. Therefore, a momentum term was introduced as a generalization of the above equation. Thus,

$$\Delta w_{ji}(n) = \alpha \, \Delta w_{ji}(n-1) - \eta \, \frac{\partial E(n)}{\partial w_{ji}(n)}$$

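In this equation, α is a positive number called the momentum constant. The rule is called the Generalized Delta Rule, and it includes the Delta Rule as a special case (α = 0). The weight update can be obtained as

$$\Delta w_{ji}(n) = \alpha \, \Delta w_{ji}(n-1) + \eta \, \delta_j(n) \, y_i(n)$$

where $\delta_j(n)$ denotes the local error gradient of neuron j and $y_i(n)$ is the output of neuron i.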

The weight adjustment $\Delta w_{ji}$ is made only after the entire training set has been presented to the network (Konstantinos, 2000).

With respect to the convergence rate the back-propagation algorithm is relatively slow. This

is related to the stochastic nature of the algorithm that provides an instantaneous estimation of

the gradient of the error surface in weight space. In the case that the error surface is fairly flat

along a weight dimension, the derivative of the error surface with respect to that weight is

small in magnitude, therefore the synaptic adjustment applied to the weight is small and

consequently many iterations of the algorithm may be required to produce a significant

reduction in the error performance of the network.
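
The training procedure of this section can be sketched for a small MLP with one hidden layer and a single logistic output. This is a minimal illustration, not a definitive implementation: the network size, the values of η (`eta`) and α (`alpha`), the weight initialization, the XOR data and all function names are assumptions, and the sketch applies the weight correction after every pattern.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_mlp(samples, n_hidden=3, eta=0.5, alpha=0.8, epochs=2000, seed=1):
    """Backpropagation with momentum; returns the average error E_av per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight row stores the bias as its last entry.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]   # previous corrections
    dw_o = [0.0] * (n_hidden + 1)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, d in samples:
            # Forward pass.
            xin = list(x) + [1.0]
            h = [logistic(sum(w * v for w, v in zip(row, xin))) for row in w_h]
            hin = h + [1.0]
            y = logistic(sum(w * v for w, v in zip(w_o, hin)))
            e = d - y                       # error signal e(n) = d(n) - y(n)
            total += 0.5 * e * e            # instantaneous error E(n)
            # Backward pass: local gradients of output and hidden neurons.
            delta_o = e * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Generalized delta rule: dw(n) = alpha * dw(n-1) + eta * delta * input.
            for j in range(n_hidden + 1):
                dw_o[j] = alpha * dw_o[j] + eta * delta_o * hin[j]
                w_o[j] += dw_o[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    dw_h[j][i] = alpha * dw_h[j][i] + eta * delta_h[j] * xin[i]
                    w_h[j][i] += dw_h[j][i]
        history.append(total / len(samples))
    return history

# Hypothetical training set: the XOR problem, which needs a hidden layer.
xor_samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
errors = train_mlp(xor_samples)
print(errors[0], errors[-1])   # average error in the first and last epoch
```

Setting `alpha=0.0` reduces the update to the plain delta rule, and lowering `eta` gives a smoother but slower descent, as discussed above.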

9. Evaluation criteria

The most common error function minimized in neural networks is the sum of squared

errors. Other error functions offered by different software include least absolute deviations,

least fourth powers, asymmetric least squares and percentage differences.
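
Three of the error functions mentioned above can be compared on the same set of residuals. The function names and the sample values are assumptions for the illustration; asymmetric least squares and percentage differences are not shown.

```python
def sum_squared_errors(targets, outputs):
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

def sum_absolute_deviations(targets, outputs):
    return sum(abs(t - o) for t, o in zip(targets, outputs))

def sum_fourth_powers(targets, outputs):
    return sum((t - o) ** 4 for t, o in zip(targets, outputs))

targets = [1, 2, 3]
outputs = [1, 2, 5]   # one prediction is off by 2
print(sum_squared_errors(targets, outputs))
print(sum_absolute_deviations(targets, outputs))
print(sum_fourth_powers(targets, outputs))
```

The higher the power, the more a single large deviation dominates the total, which is why least absolute deviations is less sensitive to outliers than the sum of squared errors.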

10. Conclusion

The ability of ANNs to learn by example makes them very flexible and powerful, which

makes them quite suitable for a variety of problem areas. Hence, to best utilize ANNs for

different problems, it is essential to understand the potential as well as limitations of neural

networks. For some tasks, neural networks will never replace conventional methods, but for a

growing list of applications, the neural architecture will provide either an alternative or a

complement to these existing techniques. ANNs have a huge potential for prediction and

classification when they are integrated with Artificial Intelligence, Fuzzy Logic and related

subjects.

References

Anderson, J. A. (2003). An Introduction to neural networks. Prentice Hall.

Chakraborty, S., Ghosh, R., Ghosh, M., Fernandes, C.D. and Charchar, M.J. (2004). Weather-based

prediction of anthracnose severity using artificial neural network models. Plant

Pathology, 53, 375-386.

Cheng, B. and Titterington, D. M. (1994). Neural networks: A review from a statistical

perspective. Statistical Science, 9, 2-54.

Dewolf, E.D. and Francl, L.J. (1997). Neural networks that distinguish infection periods of wheat tan

spot in an outdoor environment. Phytopathology, 87(1), 83-87.

Dewolf, E.D. and Francl, L.J. (2000). Neural network classification of tan spot and

Stagonospora blotch infection periods in a wheat field environment. Phytopathology,

90(2), 108-113.

Gaudart, J., Giusiano, B. and Huiart, L. (2004). Comparison of the performance of multi-layer

perceptron and linear regression for epidemiological data. Comput. Statist. & Data

Anal., 44, 547-70.

Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. Cambridge: MIT Press.

Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley,

New York.

Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective

computational abilities. Proceedings of the National Academy of Sciences (USA),

79, 2554-2558.

Kaastra, I. and Boyd, M. (1996). Designing a neural network for forecasting financial and

economic time series. Neurocomputing, 10(3), 215-236.

Klimasauskas, C.C. (1991). Applying neural networks. Part 3: Training a neural network, PC-

AI, May/ June, 20–24.

Konstantinos, A. (2000). Application of Back Propagation Learning Algorithms on

Multilayer Perceptrons, Project Report, Department of Computing, University of

Bradford, England.

McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous

activity. Bull. Math. Biophys., 5, 115-133.

Patterson, D. (1996). Artificial Neural Networks. Singapore: Prentice Hall.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and

organization in the brain. Psychological Review, 65, 386-408.

Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by

error propagation. In Parallel Distributed Processing: Explorations in the Microstructure

of Cognition, Vol. 1 (D.E. Rumelhart, J.L. McClelland and the PDP Research

Group, eds.). Cambridge, MA: MIT Press, 318-362.

Sanzogni, L. and Kerr, D. (2001). Milk production estimate using feed forward

artificial neural networks. Computers and Electronics in Agriculture, 32, 21-30.

Schalkoff, R. J. (1997). Artificial Neural Networks. McGraw-Hill.

Warner, B. and Misra, M. (1996). Understanding neural networks as statistical tools.

American Statistician, 50, 284-93.

Widrow, B. and Hoff, M.E. (1960). Adaptive switching circuits. IRE WESCON Convention

Record, 4, 96-104.

Yegnanarayana, B. (1999). Artificial Neural Networks. Prentice Hall.

Zhang, G., Patuwo, B. E. and Hu, M. Y. (1998). Forecasting with artificial neural networks:

The state of the art. International Journal of Forecasting, 14, 35-62.
