0925-2312/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.

PII: S0925-2312(99)00146-0

Neurocomputing 31 (2000) 1–13

Variable selection using neural-network models

Giovanna Castellano*, Anna Maria Fanelli

Dipartimento di Informatica, Università di Bari, Via E. Orabona 4, 70126 Bari, Italy

Received 3 April 1998; accepted 22 March 1999

Abstract

In this paper we propose an approach to variable selection that uses a neural-network model as the tool to determine which variables are to be discarded. The method performs a backward selection by successively removing input nodes in a network trained with the complete set of variables as inputs. Input nodes are removed, along with their connections, and the remaining weights are adjusted in such a way that the overall input–output behavior learnt by the network is kept approximately unchanged. A simple criterion to select the input nodes to be removed is developed. The proposed method is tested on a famous example of system identification. Experimental results show that the removal of input nodes from the neural-network model improves its generalization ability. In addition, the method compares favorably with other feature reduction methods. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Variable selection; Neural network pruning; Least-squares methods; Principal component analysis

1. Introduction

A crucial issue in many problems of pattern recognition or system identification is reducing the amount of data to process, which is often a key factor in determining the performance of the information processing system. The problem of data reduction, also termed "feature reduction", is defined as follows: given a set of available features, select a subset of features that retains most of the intrinsic information content of the data. There are two different approaches to achieving feature reduction: feature extraction and feature selection. Feature extraction transforms, linearly or nonlinearly, the original set of features into a reduced one. Well-known feature extraction methods are Principal Component Analysis (PCA) and Discriminant Analysis [9]. On the other hand, feature selection, also referred to as variable selection, selects a subset of features from the initial set of available features.

*Corresponding author. E-mail address: castellano@di.uniba.it (G. Castellano).

A number of different methods have been proposed to approach the optimal solution to feature selection [10]. Significant contributions have come from statisticians in the field of Pattern Recognition, ranging from techniques that find the optimal feature set (e.g. exhaustive search or the Branch and Bound algorithm [20]) to those that result in a sub-optimal feature set that is near to the optimal solution [13,22]. More recently, some variable selection methods for artificial neural networks have been developed [1,16,4]. However, no optimal and generally applicable solution to the feature selection problem exists: some methods are more suitable under certain conditions and some under others, depending on the degree of knowledge about the problem at hand. When the only source of available information is the training data, the feature selection task can be well performed using a neural approach. In fact, neural networks do not make any assumption about the probability distribution functions of the data, thus relieving the restrictive formal conditions of the statistical approach.

This paper is concerned with the problem of variable or feature selection using artificial neural networks. In this context, variable selection can be seen as a special case of network pruning: the pruning of input nodes is equivalent to removing the corresponding features from the original feature set. Several pruning procedures for neural networks have been proposed [23], but most of them focus on removing hidden nodes or connections, and they are not directly applicable to pruning irrelevant input nodes. Pruning procedures extended to the removal of input nodes were proposed in [8,12,14,17,18,24], where the variable selection process is typically based on a measure of the relevance of an input node, so that the less relevant features are removed. However, most of these techniques evaluate the relevance of input nodes during the training process, so they depend strictly on the adopted learning algorithm.

We propose a variable selection method based on an algorithm that we developed for pruning hidden nodes in neural networks [5–7]. The method performs a backward feature selection by successively removing input nodes (along with their connections) in a satisfactorily trained network and adjusting the remaining weights in such a way that the overall input–output behavior learnt by the network is kept approximately unchanged. This condition leads to the formulation of a linear system that is solved in the least-squares sense by means of a very efficient preconditioned conjugate gradient procedure. The criterion for choosing the features to be removed is derived from a property of the particular least-squares method employed. This procedure is repeated until the desired trade-off between accuracy and parsimony of the network is achieved.

Unlike most variable selection methods, which remove all useless features in one step, our algorithm removes features iteratively, thus enabling a systematic evaluation of the reduced network models produced during the progressive elimination of features. Therefore, the number of input nodes (i.e. the final number of features) is determined solely by the performance required of the network, without making a priori assumptions or evaluations about the importance of the input variables. This gives more flexibility to the variable selection algorithm, which can be iterated until either a predetermined number of features has been eliminated or the performance of the current reduced network falls below specified requirements. Moreover, the method does not depend on the learning procedure, since it removes input nodes after the training phase.

The paper is organized as follows: Section 2 introduces notations and definitions for the neural network; Section 3 describes the proposed variable selection algorithm; Section 4 reports experimental results. Finally, Section 5 draws some conclusions about the method.

2. The neural network

A neural network of arbitrary topology can be represented by a directed graph N = (V, E, w). V is the set of nodes, which is divided into the subset V_I of input nodes, the subset V_H of hidden nodes and the subset V_O of output nodes. E ⊆ V × V is the set of connections. Each connection (j, i) ∈ E is associated with a weight w_ij ∈ ℝ. For each unit i ∈ V, let us define its "projective" field P_i = { j ∈ V | (i, j) ∈ E } and its "receptive" field R_i = { j ∈ V | (j, i) ∈ E }. We denote by p_i and r_i the cardinality of P_i and R_i, respectively.

Every non-input node i ∈ V_H ∪ V_O receives from its receptive field R_i a net input given by

u_i = Σ_{j ∈ R_i} w_ij y_j,

where y_j represents the output value of node j, and sends to its projective field P_i an output equal to

y_i = f(u_i),

where f is an arbitrary activation function.

No computation is done by input nodes: they just transmit an n-dimensional input pattern x = (x_1, …, x_n). Thus, the output of input node h ∈ V_I is the hth feature x_h of the input pattern.

Fig. 1 shows an example of the network architecture and illustrates the notations introduced above.

Fig. 1. An example of a neural network architecture. Here, the set of input units is V_I = {1, 2, 3, 4}, the set of hidden units is V_H = {5, 6, 7}, and the set of output units is V_O = {8, 9, 10}. As an example, the receptive field of unit 7 is R_7 = {1, 3, 5} (light-gray units), whereas its projective field is P_7 = {8, 10} (dark-gray units).
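The definitions above can be made concrete with a short sketch. The following Python fragment is not from the paper: it encodes the Fig. 1 topology as a set of weighted edges (the weight values are arbitrary placeholders) and computes receptive fields, projective fields and a node's output under an assumed tanh activation:

```python
import math

# Directed edges (j, i): a connection from node j to node i, with weight w_ij.
# Topology follows Fig. 1; the weight values themselves are made up.
edges = {
    (1, 5): 0.3, (2, 5): -0.1, (2, 6): 0.4, (3, 6): 0.2, (4, 6): -0.5,
    (1, 7): 0.1, (3, 7): 0.6, (5, 7): -0.2,
    (5, 8): 0.7, (6, 8): 0.1, (6, 9): -0.3, (7, 8): 0.2, (7, 10): 0.5,
}

def receptive_field(i):
    """R_i = { j | (j, i) in E }: the nodes feeding into i."""
    return {j for (j, k) in edges if k == i}

def projective_field(i):
    """P_i = { j | (i, j) in E }: the nodes fed by i."""
    return {k for (j, k) in edges if j == i}

def node_output(i, y, f=math.tanh):
    """y_i = f(u_i), with net input u_i = sum_{j in R_i} w_ij * y_j."""
    u = sum(edges[(j, i)] * y[j] for j in receptive_field(i))
    return f(u)
```

For unit 7 this reproduces the fields given in the Fig. 1 caption: R_7 = {1, 3, 5} and P_7 = {8, 10}.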

3. The variable selection algorithm

In this section we describe our variable selection algorithm, which is an extension of an iterative pruning method that we previously developed for complexity reduction (i.e. hidden-layer size) in neural networks of arbitrary topology [7]. To select input variables, we perform an iterative backward selection search which begins with the set of the original n features and eliminates one feature at each step. Given a trained network with the complete set of features as its inputs, the elimination of a feature is equivalent to pruning off the corresponding input node along with its outgoing connections. It is assumed that training is carried out over a set of M training patterns x^m = (x^m_1, …, x^m_n), m = 1, …, M, by means of any learning procedure.

The process of feature elimination during one step of the algorithm is outlined below.

Suppose that an input node h ∈ V_I has been identified to be removed (the elimination criterion will be discussed below). The elimination of h involves removing all its outgoing connections (see Fig. 2) and updating the remaining weights incoming into h's projective field in such a way that the net input of every node i ∈ P_h remains approximately unchanged. This amounts to requiring that the following relation holds:

Σ_{j ∈ R_i} w_ij x^m_j = Σ_{j ∈ R_i − {h}} (w_ij + δ_ij) x^m_j    (1)

for each node i ∈ P_h and for each training pattern x^m, m = 1, …, M.


Fig. 2. Illustration of a step of the variable selection algorithm. Unit 3 has been selected to be removed. After the elimination all its outgoing connections (dashed lines) are excised and the weights of the connections incoming into its projective field (bold lines) are adjusted.

The quantities δ_ij are the adjusting factors for the weights w_ij. Simple algebraic manipulations yield the following linear system:

Σ_{j ∈ R_i − {h}} δ_ij x^m_j = w_ih x^m_h.    (2)

Typically system (2), which can be conveniently represented as Aδ = b, is overdetermined (it contains M·p_h linear equations in Σ_{i ∈ P_h} (r_i − 1) unknowns δ_ij), thus it can be solved by means of a standard least-squares method. We chose a preconditioned conjugate-gradient method called CGPCNE [2], because it is fast and provides good least-squares solutions.
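As an illustration, the sketch below sets up and solves system (2) for a toy one-hidden-layer network. All sizes, weights and patterns are invented, and NumPy's dense `lstsq` stands in for the CGPCNE solver of [2]; since the unknowns δ_ij for different nodes i ∈ P_h are decoupled, the system is solved block by block:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, H = 20, 4, 3              # patterns, input nodes, hidden nodes (toy sizes)
X = rng.normal(size=(M, n))     # training patterns x^m
W1 = rng.normal(size=(H, n))    # weights w_ij; fully connected, so P_h = all hidden nodes

h = 2                           # input node selected for removal
keep = [j for j in range(n) if j != h]

# For each i in P_h, solve (2): sum_{j in R_i - {h}} delta_ij x^m_j = w_ih x^m_h
delta = np.zeros((H, len(keep)))
for i in range(H):
    A = X[:, keep]                      # M x (r_i - 1) coefficient matrix
    b = W1[i, h] * X[:, h]              # known-term vector {w_ih x^m_h}
    delta[i], *_ = np.linalg.lstsq(A, b, rcond=None)

W1_new = W1[:, keep] + delta            # adjusted weights after pruning node h

# Change in each hidden node's net input over the training set; the
# least-squares solution makes it no larger than simply dropping h (delta = 0).
residual = X[:, keep] @ W1_new.T - X @ W1.T
```

In the overdetermined case the net inputs are preserved only approximately, exactly as stated for Eq. (1).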

3.1. The elimination criterion

The criterion for identifying the input node h to be removed at each step has been suggested by a property of the adopted least-squares method. The CGPCNE method solves a linear system by starting from an initial solution δ_0 and iteratively producing a sequence of solutions {δ_k}, k = 1, 2, …, so as to decrease the residuals ρ_k = ||Aδ_k − b||. If the initial residual ρ_0 is minimal, the convergence of the CGPCNE method is faster. Since the initial point δ_0 is usually chosen to be the null vector, minimizing the initial residual amounts to minimizing the norm of the vector b.

In system (2) the vector of known terms {w_ih x^m_h}, i ∈ P_h, m = 1, …, M, depends essentially on the node h being removed; therefore our idea is to select the node for which the norm of this vector is minimal. Precisely, we adopted the following criterion: remove, at each step, the input node h ∈ V_I such that the known-term vector {w_ih x^m_h}, i ∈ P_h, m = 1, …, M, is smallest, i.e. such that the quantity

Σ_{i ∈ P_h} Σ_{m=1}^{M} (w_ih x^m_h)²

is minimal.

Moreover, to prevent the algorithm from producing "useless" nodes, the selected node h should also satisfy the following condition: R_i − {h} ≠ ∅ for each i ∈ P_h.
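In a layered network where every hidden node receives all inputs, the double sum factorizes over weights and patterns, and the criterion becomes a one-liner. The sketch below uses invented weights and data (with input 3 deliberately disconnected so the selection is visible); the extra condition R_i − {h} ≠ ∅ is trivially satisfied here and is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
M, n, H = 50, 5, 3
X = rng.normal(size=(M, n))     # training patterns x^m
W1 = rng.normal(size=(H, n))    # input-to-hidden weights w_ih
W1[:, 3] = 0.0                  # make input 3 carry no signal, for the example

# sum_{i in P_h} sum_m (w_ih x^m_h)^2 = (sum_i w_ih^2) * (sum_m (x^m_h)^2)
crit = (W1 ** 2).sum(axis=0) * (X ** 2).sum(axis=0)
h_star = int(np.argmin(crit))   # input node with the smallest propagated signal
```

As expected, the disconnected input has a criterion value of exactly zero and is the one selected for removal.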

The proposed criterion may not provide the globally optimal choice, because there is no guarantee that starting from the smallest initial residual will result in the smallest final residual. Moreover, this criterion neglects possible correlations between input variables. To avoid this problem, combined criteria which take into account some correlation measure could well be employed, without altering the nature of the algorithm. Alternatively, we could first orthogonalize the inputs (for example by a KL transform) and then apply the variable selection algorithm to the transformed inputs. The pruning would then be performed on input eigennodes rather than on the actual input variables. A similar approach has been explored in [15].

We adopt the above-defined elimination criterion because it provides good results in practice without requiring a high computational cost. Additionally, the proposed selection criterion has an interesting interpretation. In [19], in an attempt to quantify the goodness of a hidden node, a quantity similar to Σ_{i ∈ P_h} Σ_{m=1}^{M} (w_ih x^m_h)² was independently defined to represent the total signal propagated forward by the node. Therefore, our criterion can also be interpreted as a criterion that removes the input nodes having the smallest total amount of feedforward-propagated signal.

3.2. Definition of the algorithm

Now we define the variable selection algorithm precisely. Starting from an initial trained network N^(0) = (V^(0), E^(0), w^(0)) having the complete set of features as its set of input nodes V_I^(0), the algorithm iteratively produces a sequence of networks {N^(t)}, t = 1, 2, …, with smaller and smaller sets of input nodes V_I^(t), by first identifying the node h ∈ V_I to be removed (according to the above-mentioned criterion), and then solving the corresponding system (2) to properly adjust the remaining weights. The process is iterated until a stopping condition (discussed below) is satisfied. Fig. 3 summarizes the algorithm.
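Putting the selection criterion and the weight-update step together, the loop of Fig. 3 can be sketched as follows for a toy fully connected layer. Everything here is invented data, NumPy's `lstsq` replaces the CGPCNE procedure, and the stopping condition is simply a target number of surviving features:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n, H = 40, 6, 3
X = rng.normal(size=(M, n))      # training patterns
W1 = rng.normal(size=(H, n))     # input-to-hidden weights

inputs = list(range(n))          # surviving input nodes
target_size = 3                  # assumed stopping condition: keep 3 features

while len(inputs) > target_size:
    Xs = X[:, inputs]
    # criterion: remove the input with smallest sum_i sum_m (w_ih x^m_h)^2
    crit = (W1 ** 2).sum(axis=0) * (Xs ** 2).sum(axis=0)
    k = int(np.argmin(crit))                 # position within `inputs`
    keep = [j for j in range(len(inputs)) if j != k]
    # solve system (2) for each i in P_h and fold the corrections in
    delta = np.zeros((H, len(keep)))
    for i in range(H):
        delta[i], *_ = np.linalg.lstsq(Xs[:, keep], W1[i, k] * Xs[:, k], rcond=None)
    W1 = W1[:, keep] + delta                 # adjusted weights, node k pruned
    inputs = [inputs[j] for j in keep]
```

Because the weight adjustment is folded into each step, no retraining is needed after the loop, mirroring the remark below about the final network.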

The iterative nature of our variable selection algorithm allows a systematic investigation of the performance of reduced network models with fewer input nodes. At each step the performance of the network with the reduced set of variables as inputs can be compared with that of the network with the whole set of variables as input, and it can be retained as a potential final model according to the desired performance. Hence, different stopping conditions can be defined according to the specific performance measure adopted. For example, if the application at hand requires selecting the relevant input variables while keeping the performance of the initial network over the training data, the algorithm will be stopped as soon as that performance worsens significantly. Likewise, if good generalization ability is required, a stopping condition that takes into account the performance of the network with the selected input variables over the test set can be used, regardless of the behavior over the training data (see the stopping condition defined in Section 4). Also, the algorithm can be stopped after a predetermined number of features have been eliminated.

Fig. 3. The variable selection algorithm.

The proposed algorithm is computationally cheap, as the overall computational cost of each iteration depends mainly on the solution of system (2), which requires a number of operations per CGPCNE step roughly proportional to M·e, where e is the total number of connections in the network. Besides, the number of cycles performed by the CGPCNE procedure is typically very low [3], so the overall computational complexity of each step of the algorithm is O(M·e). Moreover, after the variable selection process the final network need not be retrained, because the updating of the weights is embedded in the algorithm itself.

4. Experimental results

Our variable selection method was tested on a well-known problem of system identification given by Box and Jenkins [3] and used as a test problem by many authors [11,25]. The process to be modeled is a gas furnace with gas flow rate u(t) as its single input and CO2 concentration y(t) as its output. The original data set representing the dynamics of this system contains 296 pairs of the form [u(t); y(t)] (see [25]). In order to extract a dynamic process model to predict y(t), we use a feedforward neural network with the ten variables y(t−1), …, y(t−4), u(t−1), …, u(t−6) as candidate inputs. This requires a transformation of the original data set into a new data set containing 290 data points of the form [y(t−1), …, y(t−4), u(t−1), …, u(t−6); y(t)]. In the experiments, this data set is divided into a training set, composed of the first 145 data points, and a test set containing the remaining 145 points.
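The data transformation just described can be sketched as follows. Synthetic series stand in for the actual gas-furnace measurements (which are not reproduced here); the lag structure and the 145/145 split match the text:

```python
import numpy as np

T = 296
u = np.sin(np.arange(T) / 7.0)    # placeholder for the gas flow rate u(t)
y = np.cos(np.arange(T) / 11.0)   # placeholder for the CO2 concentration y(t)

rows = []
for t in range(6, T):             # the first usable index needs u(t-6)
    rows.append([y[t-1], y[t-2], y[t-3], y[t-4],
                 u[t-1], u[t-2], u[t-3], u[t-4], u[t-5], u[t-6],
                 y[t]])
data = np.array(rows)             # 290 x 11: ten candidate inputs + target y(t)

train, test = data[:145], data[145:]   # first 145 points / remaining 145 points
```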


Fig. 4. Performance of the trained networks with different numbers of hidden nodes.

Fig. 5. Performance of a 10-4-1 network (N2) during the removal of input nodes. For each step the removed feature is indicated at the top of the corresponding point.

Networks with 10 input nodes, one hidden layer and one output node were considered. In order to define the size of the hidden layer, we trained networks with 2, 3, 4, and 5 hidden nodes. A back-propagation training algorithm was used, starting from initial weights uniformly distributed in [−1.0, 1.0]. In all the trials, training was stopped when all the training points were learnt, that is, when for each training point x^m the following condition was met:

|o^m − t^m| ≤ ε,

where o^m represents the output of the network and t^m is the desired output. In our experiments we chose ε = 0.01. Fig. 4 shows the performance of the trained networks in terms of RMSE over both the training and the test set. As can be seen, the network with four hidden nodes produced slightly better results, hence we adopted a 10-4-1 network architecture.

In order to evaluate the effects of our variable selection method, the algorithm was applied to three 10-4-1 trained networks, corresponding to three different initial weight configurations (labeled N1, N2 and N3 in the following). As an illustrative example, Fig. 5 shows the RMSE of one of the three networks (N2) during the input node selection process. In this case the algorithm was iterated until all the input nodes but one were removed. It can be noted that the network performance over the training set is kept almost unchanged until three nodes are left in the input layer. More importantly, the RMSE over the test set decreases during the feature reduction process (even with the single feature u(t−6) as input, the RMSE decreases to 0.0804 from an initial value of 0.124 with all 10 inputs).

Table 1
Results of the variable selection algorithm stopped according to the performance over the test set

Network   RMSE on the test set                              Final input variables
          Complete set of inputs   Reduced set of inputs
N1        0.4516                   0.2365                   y(t−1), y(t−4), u(t−2), u(t−5)
N2        0.1240                   0.0687                   y(t−1), u(t−4), u(t−6)
N3        0.1224                   0.0787                   u(t−3), u(t−4), u(t−6)

Since we care about the generalization ability of the final reduced model, we defined a stopping condition for the variable selection algorithm aimed at improving the performance of the original network over the test set. Precisely, after the removal of an input node, we checked the RMSE of the network over the test set and, as soon as an increment of 0.1 or more was observed, the elimination process was stopped. The results of the feature selection algorithm with this stopping condition are presented in Table 1, which shows the generalization of both the networks with all input variables and the networks with reduced input variables, together with the final subsets of input variables. As can be seen, in each case our procedure is able to drastically reduce the number of input features while decreasing the RMSE over the test set. Thus, once an appropriate stopping condition is defined, our method is able to drastically reduce the number of input features and to improve generalization (by about 50% in all cases) without excessively worsening the behavior learnt by the network during training.

Finally, two other feature reduction methods were considered to evaluate the performance of the proposed algorithm. First, we compared our method with an empirical input selection method proposed in [11] that uses the ANFIS network model to find the significant input features for the Box and Jenkins gas furnace problem. That input selection process is based on the simplistic assumption that two inputs can be used, one from the set of historical inputs u(t−1), …, u(t−6) and one from the set of historical outputs y(t−1), …, y(t−4). The variable selection is therefore made empirically: 24 ANFIS networks, one for each such input combination, are built and trained by one epoch of a least-squares method, using the same training and test sets as ours. Then, the best-performing network is trained further and taken as the final model with reduced input features. The final ANFIS network

G.Castellano,A.M.Fanelli/Neurocomputing 31 (2000) 1}13 9

obtained in this way corresponds to the set of features (y(t−1), u(t−3)) and has an RMSE of 0.13 over the training set and of 0.57 over the test set. Hence it performs poorly in comparison to each of our three final networks (see Table 1), which exhibit only slightly larger subsets of input variables.

Table 2
Eigenvalues of the PCA method

Number of    Training set                       Testing set
features     Eigenvalue   Cumul. eigenvalue    Eigenvalue   Cumul. eigenvalue
 1           7.132918     7.13292              2.562015     2.56202
 2           0.733853     7.86677              1.398529     3.96054
 3           0.627533     8.49430              0.990468     4.95101
 4           0.564570     9.05887              0.893866     5.84488
 5           0.450442     9.50932              0.814355     6.65923
 6           0.170958     9.68027              0.793716     7.45295
 7           0.101631     9.78190              0.777535     8.23048
 8           0.076483     9.85839              0.665687     8.89617
 9           0.072200     9.93059              0.571943     9.46811
10           0.069413     10.00000             0.531886     10.00000

Then, we conducted additional tests using a popular feature reduction method, Principal Component Analysis (PCA). Indeed, a fair empirical comparison of our method with PCA is not directly possible, since we are not considering feature transformations like PCA, but only the selection of a subset of significant features from the original feature set. This leads to the need of using the same number of features selected by our algorithm when evaluating the PCA method on the problem at hand. Moreover, since we are facing a problem of system identification, the effect of reducing the number of features through PCA must be measured in terms of RMSE, which is computed as follows. Dimensionality reduction through PCA is performed by discarding the components of the transformed feature set that have small variances (i.e. small eigenvalues) and retaining only those terms that have large variances (i.e. large eigenvalues). This truncation causes an MSE equal to the sum of the variances of the discarded terms, which is computed as the sum of the smallest eigenvalues [21]. Specifically, to compare the results of the PCA method with those of our method, which selects three features in the outstanding case (see network N2), the original feature set was approximated by truncating the transformed feature set to the first three components. Then the MSE was computed as the sum of the remaining seven smallest eigenvalues. Table 2 shows the eigenvalues of the PCA method for both the training and testing data sets, along with the cumulative eigenvalues. Table 3 lists the RMSE over both the training and testing sets when the original features, the features selected by our method, and the features transformed by PCA are used. It can be seen that the performance of the proposed method compares very favorably with that of the PCA method.
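The PCA error computation described above can be checked numerically. In the sketch below, random correlated data replaces the actual feature matrix (this is not the paper's code); the mean squared reconstruction error after keeping k = 3 components equals the sum of the 10 − k smallest eigenvalues of the (biased) sample covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features
Xc = X - X.mean(axis=0)                                     # center the data

C = Xc.T @ Xc / Xc.shape[0]            # biased sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]          # sort descending

k = 3
V = eigvecs[:, :k]                     # first k principal directions
Xhat = (Xc @ V) @ V.T                  # reconstruction from k components
mse = ((Xc - Xhat) ** 2).sum(axis=1).mean()

discarded = eigvals[k:].sum()          # sum of the 10 - k smallest eigenvalues
```

Here `mse` and `discarded` agree to machine precision, which is the identity from [21] used to fill the PCA column of Table 3.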


Table 3
Comparison of our feature selection method and the PCA method in terms of RMSE

              Original features   Reduced features (3)
                                  Our method   PCA
Training set  0.040               0.051        1.227
Testing set   0.124               0.068        2.247

5. Conclusions

A method of variable selection using neural networks has been proposed. The key idea consists of iteratively removing input nodes, with their associated weights, from a trained network and then adjusting the remaining weights so as to preserve the overall network behavior. This leads to formulating the selection problem in terms of a system of linear equations, which we solve with a very efficient conjugate-gradient least-squares procedure. A simple and effective criterion for identifying the input nodes to be removed is also derived, which does not require a high computational cost and proves to work well in practice. However, alternative selection rules can be adopted as well, without altering the method as a whole. The iterative nature of the algorithm allows monitoring the performance of the network with the reduced input variables obtained at each stage of the input elimination process, in order to define an appropriate stopping condition. This makes the algorithm very flexible.

Moreover, unlike most variable selection procedures in the literature, no parameter needs to be set and no relevance measure of variables must be introduced. The validity of the method was examined through a system identification problem, and a comparison study was made with other feature reduction methods. Experimental results encourage the application of the proposed method to complex tasks that need to identify a core of significant input variables, such as pattern recognition problems.

Acknowledgements

The authors wish to thank the anonymous reviewers for their helpful suggestions and comments.

References

[1] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks 5 (4) (1994) 537–550.
[2] Å. Björck, T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations, BIT 19 (1979) 145–163.
[3] G. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, CA, 1970, pp. 532–533.
[4] F.Z. Brill, D.E. Brown, W.N. Martin, Fast genetic selection of features for neural-network classifiers, IEEE Trans. Neural Networks 3 (March 1992) 324–328.
[5] G. Castellano, A.M. Fanelli, M. Pelillo, Pruning in recurrent neural networks, Proceedings of the International Conference on Artificial Neural Networks, Sorrento, Italy, May 1994, pp. 451–454.
[6] G. Castellano, A.M. Fanelli, M. Pelillo, Iterative pruning in second-order recurrent neural networks, Neural Process. Lett. 2 (6) (1995) 5–8.
[7] G. Castellano, A.M. Fanelli, M. Pelillo, An iterative method for pruning feed-forward neural networks, IEEE Trans. Neural Networks 8 (3) (May 1997) 519–531.
[8] T. Cibas, F.F. Soulié, P. Gallinari, S. Raudys, Variable selection with Optimal Cell Damage, Proceedings of the International Conference on Artificial Neural Networks, Sorrento, Italy, May 1994, pp. 727–730.
[9] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA, 1990.
[10] A. Jain, D. Zongker, Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell. 19 (2) (1997) 153–158.
[11] J.R. Jang, Input selection for ANFIS learning, Proceedings of the 5th IEEE International Conference on Fuzzy Systems, New Orleans, September 1996, pp. 1493–1499.
[12] E.D. Karnin, A simple procedure for pruning back-propagation trained neural networks, IEEE Trans. Neural Networks 1 (1990) 239–242.
[13] J. Kittler, Feature selection and extraction, in: T.Y. Young, K.S. Fu (Eds.), Handbook of Pattern Recognition and Image Processing, Academic Press, New York, 1986, pp. 60–81.
[14] Y. Le Cun et al., Optimal brain damage, in: D.S. Touretzky (Ed.), Neural Information Processing Systems II, Morgan Kaufmann, San Mateo, CA, 1990, pp. 598–605.
[15] A.U. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: J. Cowan, G. Tesauro, J. Alspector (Eds.), Advances in Neural Information Processing Systems VI, Morgan Kaufmann, San Francisco, CA, 1994.
[16] J. Mao, A.K. Jain, Artificial neural networks for feature extraction and multivariate data projection, IEEE Trans. Neural Networks 6 (1995) 296–317.
[17] J. Mao, K. Mohiuddin, A.K. Jain, Parsimonious network design and feature selection through node pruning, Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, 1994, pp. 622–624.
[18] M.C. Mozer, P. Smolensky, Skeletonization: a technique for trimming the fat from a network via relevance assessment, in: D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems I, Morgan Kaufmann, San Mateo, CA, 1990.
[19] K. Murase, Y. Matsunaga, Y. Nakade, A backpropagation algorithm which automatically determines the number of association units, Proceedings of the International Joint Conference on Neural Networks, Singapore, 1991, pp. 783–788.
[20] P.M. Narendra, K. Fukunaga, A branch and bound algorithm for feature subset selection, IEEE Trans. Comput. 26 (9) (September 1977) 917–922.
[21] E. Oja, Subspace Methods of Pattern Recognition, Research Studies Press Ltd., Letchworth, England, 1983.
[22] P. Pudil, J. Novovičová, J. Kittler, Floating search methods in feature selection, Pattern Recognition Lett. 15 (1994) 1119–1125.
[23] R. Reed, Pruning algorithms – a survey, IEEE Trans. Neural Networks 5 (1993) 740–747.
[24] J.M. Steppe, K.W. Bauer, Improved feature screening in feedforward neural networks, Neurocomputing 13 (1996) 47–58.
[25] M. Sugeno, T. Yasukawa, A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Systems 1 (1) (1993) 7–31.


Giovanna Castellano received the "Laurea" degree in Computer Science from the University of Bari, Italy, in 1993. From 1993 to 1995 she was a fellow researcher at the Institute for Signal and Image Processing (CNR-Bari), with a scholarship under a grant from the Consiglio Nazionale delle Ricerche. Currently, she is a Ph.D. student in Computer Science at the Computer Science Department of the University of Bari. Her research interests include artificial neural networks, fuzzy systems, neuro-fuzzy modeling, and intelligent hybrid systems.

Anna Maria Fanelli received the "Laurea" degree in Physics from the University of Bari, Italy, in 1974. From 1975 to 1979, she was a full-time researcher at the Physics Department of the University of Bari, Italy, where she became an Assistant Professor in 1980. In 1985 she joined the Department of Computer Science at the University of Bari, Italy, as Professor of Computer Science. Currently, she is responsible for the courses "Computer Systems Architectures" and "Neural Networks" in the degree course in Computer Science. Her research activity has involved issues related to pattern recognition, image processing and computer vision. Her work in these areas has been published in several journals and conference proceedings. Her current research interests include artificial neural networks, genetic algorithms, fuzzy systems, neuro-fuzzy modeling and hybrid systems. Dr. Fanelli is a member of the IEEE Society, the International Neural Network Society and AI*IA (the Italian Association for Artificial Intelligence). She is on the editorial board of the International Journal of Knowledge-Based Intelligent Engineering Systems.

