Neurocomputing 31 (2000) 1-13
Variable selection using neural-network models
Giovanna Castellano*, Anna Maria Fanelli
Dipartimento di Informatica, Università di Bari, Via E. Orabona 4, 70126 Bari, Italy
Received 3 April 1998; accepted 22 March 1999
*Corresponding author. E-mail address: castellano@di.uniba.it (G. Castellano).
Abstract

In this paper we propose an approach to variable selection that uses a neural-network model as the tool to determine which variables are to be discarded. The method performs a backward selection by successively removing input nodes in a network trained with the complete set of variables as inputs. Input nodes are removed, along with their connections, and the remaining weights are adjusted in such a way that the overall input-output behavior learnt by the network is kept approximately unchanged. A simple criterion to select the input nodes to be removed is developed. The proposed method is tested on a famous example of system identification. Experimental results show that the removal of input nodes from the neural-network model improves its generalization ability. In addition, the method compares favorably with other feature reduction methods. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Variable selection; Neural network pruning; Least-squares methods; Principal component analysis
1. Introduction
A crucial issue in many problems of pattern recognition or system identification is reducing the amount of data to process, which is often a key factor in determining the performance of the information processing system. The problem of data reduction, also termed "feature reduction", is defined as follows: given a set of available features, select a subset of features that retains most of the intrinsic information content of the data. There are two different approaches to achieve feature reduction: feature extraction and feature selection. Feature extraction transforms, linearly or nonlinearly, the original set of features into a reduced one. Well-known feature extraction methods are Principal Component Analysis (PCA) and Discriminant Analysis [9]. On the other hand, feature selection, also referred to as variable selection, selects a subset of features from the initial set of available features.
A number of different methods have been proposed to approach the optimal solution to feature selection [10]. Significant contributions have come from statisticians in the field of pattern recognition, ranging from techniques that find the optimal feature set (e.g. exhaustive search or the branch and bound algorithm [20]) to those that result in a sub-optimal feature set that is close to the optimal solution [13,22]. More recently, some variable selection methods for artificial neural networks have been developed [1,16,4]. However, no optimal and generally applicable solution to the feature selection problem exists: some methods are more suitable under certain conditions and some under others, depending on the degree of knowledge about the problem at hand. When the only source of available information is the training data, the feature selection task can be performed well using a neural approach. In fact, neural networks do not make any assumption about the probability distribution functions of the data, thus avoiding the restrictive formal conditions of the statistical approach.
This paper is concerned with the problem of variable or feature selection using artificial neural networks. In this context, variable selection can be seen as a special case of network pruning: pruning an input node is equivalent to removing the corresponding feature from the original feature set. Several pruning procedures for neural networks have been proposed [23], but most of them focus on removing hidden nodes or connections, and they are not directly applicable to pruning irrelevant input nodes. Pruning procedures extended to the removal of input nodes were proposed in [8,12,14,17,18,24], where the variable selection process is typically based on a measure of the relevance of an input node, so that the less relevant features are removed. However, most of these techniques evaluate the relevance of input nodes during the training process, and thus they strictly depend on the adopted learning algorithm.
We propose a variable selection method based on an algorithm that we developed for pruning hidden nodes in neural networks [5-7]. The method performs a backward feature selection by successively removing input nodes (along with their connections) from a satisfactorily trained network and adjusting the remaining weights in such a way that the overall input-output behavior learnt by the network is kept approximately unchanged. This condition leads to the formulation of a linear system that is solved in the least-squares sense by means of a very efficient preconditioned conjugate gradient procedure. The criterion for choosing the features to be removed is derived from a property of the particular least-squares method employed. This procedure is repeated until the desired trade-off between accuracy and parsimony of the network is achieved.
Unlike most variable selection methods, which remove all useless features in one step, our algorithm removes features iteratively, thus enabling a systematic evaluation of the reduced network models produced during the progressive elimination of features. Therefore, the number of input nodes (i.e. the final number of features) is determined only according to the performance required of the network, without making a priori assumptions or evaluations about the importance of the input variables. This gives more flexibility to the variable selection algorithm, which can be iterated until either a predetermined number of features have been eliminated or the performance of the current reduced network falls below specified requirements. Moreover, the method does not depend on the learning procedure, since it removes input nodes after the training phase.
The paper is organized as follows: Section 2 introduces the notation and definitions for the neural network, Section 3 describes the proposed variable selection algorithm, and Section 4 presents experimental results. Finally, Section 5 draws some conclusions about the method.
2. The neural network
A neural network of arbitrary topology can be represented by a directed graph $N = (V, E, w)$. $V$ is the set of nodes, which is divided into the subset $V_I$ of input nodes, the subset $V_H$ of hidden nodes and the subset $V_O$ of output nodes. $E \subseteq V \times V$ is the set of connections. Each connection $(j,i) \in E$ is associated with a weight $w_{ij} \in \mathbb{R}$. For each unit $i \in V$, let us define its "projective" field $P_i = \{ j \in V \mid (i,j) \in E \}$ and its "receptive" field $R_i = \{ j \in V \mid (j,i) \in E \}$. We denote by $p_i$ and $r_i$ the cardinality of $P_i$ and $R_i$, respectively.

Fig. 1. An example of a neural network architecture. Here, the set of input units is $V_I = \{1,2,3,4\}$, the set of hidden units is $V_H = \{5,6,7\}$, and the set of output units is $V_O = \{8,9,10\}$. As an example, the receptive field of unit 7 is $R_7 = \{1,3,5\}$ (light-gray units) whereas its projective field is $P_7 = \{8,10\}$ (dark-gray units).
Every non-input node $i \in V_H \cup V_O$ receives from its receptive field $R_i$ a net input given by

$$ u_i = \sum_{j \in R_i} w_{ij} \, y_j , $$

where $y_j$ represents the output value of node $j$, and sends to its projective field $P_i$ an output equal to

$$ y_i = f(u_i) , $$

where $f$ is an arbitrary activation function.

No computation is done by input nodes: they just transmit an $n$-dimensional input pattern $x = (x_1, \ldots, x_n)$. Thus, the output of input node $h \in V_I$ is the $h$th feature $x_h$ of the input pattern.

Fig. 1 shows an example of the network architecture and illustrates the notation introduced above.
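To make the notation concrete, the following minimal Python sketch encodes such a network as a directed graph with receptive and projective fields derived from the weight dictionary and runs a forward pass. The class and attribute names (Network, R, P, forward) are illustrative choices made here, not part of the original paper, and the sketch assumes node numbering compatible with a feedforward ordering.

import math
from collections import defaultdict

class Network:
    """Directed-graph network N = (V, E, w) with input, hidden and output node sets."""
    def __init__(self, inputs, hidden, outputs, weights, f=math.tanh):
        self.V_I, self.V_H, self.V_O = set(inputs), set(hidden), set(outputs)
        self.w = dict(weights)          # (j, i) -> w_ij, the weight of connection (j, i)
        self.f = f                      # arbitrary activation function
        self.R = defaultdict(set)       # receptive fields  R_i = {j | (j, i) in E}
        self.P = defaultdict(set)       # projective fields P_i = {j | (i, j) in E}
        for (j, i) in self.w:
            self.R[i].add(j)
            self.P[j].add(i)

    def forward(self, x):
        """Propagate an input pattern x given as a dict {input node: feature value}."""
        y = dict(x)                                     # input nodes just transmit x_h
        for i in sorted(self.V_H) + sorted(self.V_O):   # assumes feedforward node ordering
            u_i = sum(self.w[(j, i)] * y[j] for j in self.R[i])
            y[i] = self.f(u_i)
        return {i: y[i] for i in self.V_O}

For the example of Fig. 1, inputs would be {1,2,3,4}, hidden nodes {5,6,7} and outputs {8,9,10}, with one weight entry per drawn connection.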
3. The variable selection algorithm

In this section we describe our variable selection algorithm, which is an extension of an iterative pruning method that we previously developed for complexity reduction (i.e. hidden layer size) in neural networks of arbitrary topology [7]. To select input variables, we perform an iterative backward selection search which begins with the set of the original $n$ features and eliminates one feature at each step. Given a trained network with the complete set of features as its inputs, the elimination of a feature is equivalent to pruning off the corresponding input node along with its outgoing connections. It is assumed that training is carried out over a set of $M$ training patterns $x^m = (x^m_1, \ldots, x^m_n)$, $m = 1, \ldots, M$, by means of any learning procedure.

The process of feature elimination during one step of the algorithm is outlined below.
Suppose that an input node $h \in V_I$ has been identified to be removed (the elimination criterion will be discussed below). The elimination of $h$ involves removing all its outgoing connections (see Fig. 2) and updating the remaining weights incoming into $h$'s projective field in such a way that the net input of every node $i \in P_h$ remains approximately unchanged. This amounts to requiring that the following relation holds:

$$ \sum_{j \in R_i} w_{ij} \, x^m_j = \sum_{j \in R_i \setminus \{h\}} (w_{ij} + \delta_{ij}) \, x^m_j \qquad (1) $$

for each node $i \in P_h$ and for each training pattern $x^m$, $m = 1, \ldots, M$.
Fig. 2. Illustration of a step of the variable selection algorithm. Unit 3 has been selected to be removed. After the elimination all its outgoing connections (dashed lines) are excised and the weights of connections incoming into its projective field (bold lines) are adjusted.
The quantities $\delta_{ij}$ are the adjusting factors for the weights $w_{ij}$. Simple algebraic manipulations yield the following linear system:

$$ \sum_{j \in R_i \setminus \{h\}} \delta_{ij} \, x^m_j = w_{ih} \, x^m_h . \qquad (2) $$

Typically system (2), which can be conveniently represented as $A\delta = b$, is overdetermined (it contains $M p_h$ linear equations in $\sum_{i \in P_h} (r_i - 1)$ unknowns $\delta_{ij}$), thus it can be solved by means of a standard least-squares method. We chose a preconditioned conjugate-gradient method called CGPCNE [2], because it is fast and provides good least-squares solutions.
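As an illustration, the sketch below assembles system (2) for a candidate input node h and solves it in the least-squares sense. It uses numpy.linalg.lstsq as a stand-in for the CGPCNE procedure of [2]; the helper name weight_adjustments, the row/column indexing scheme, and the assumption that input nodes are numbered 1..n (so that column j-1 of the pattern matrix holds feature x_j) are choices made here, not part of the original paper.

import numpy as np

def weight_adjustments(net, X, h):
    """Build A*delta = b of Eq. (2) for the removal of input node h and solve it by least squares.

    net : Network as sketched above (fields w, R, P); X : M x n array of training patterns.
    Returns {(j, i): delta_ij} for every i in P_h and j in R_i \ {h}.
    """
    M = X.shape[0]
    # one unknown delta_ij per remaining connection (j, i) into the projective field of h
    unknowns = [(j, i) for i in sorted(net.P[h]) for j in sorted(net.R[i]) if j != h]
    col = {ji: c for c, ji in enumerate(unknowns)}

    A = np.zeros((M * len(net.P[h]), len(unknowns)))
    b = np.zeros(M * len(net.P[h]))
    for r_block, i in enumerate(sorted(net.P[h])):
        for m in range(M):
            row = r_block * M + m
            for j in net.R[i]:
                if j != h:
                    A[row, col[(j, i)]] = X[m, j - 1]      # coefficient x^m_j
            b[row] = net.w[(h, i)] * X[m, h - 1]           # known term w_ih * x^m_h
    delta, *_ = np.linalg.lstsq(A, b, rcond=None)          # stand-in for CGPCNE
    return {ji: delta[col[ji]] for ji in unknowns}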
3.1. The elimination criterion

The criterion for identifying the input node $h$ to be removed at each step has been suggested by a property of the adopted least-squares method. The CGPCNE solves a linear system by starting from an initial solution $\delta_0$ and iteratively producing a sequence of solutions $\{\delta_k\}_{k=1,2,\ldots}$ so as to decrease the residuals $\rho_k = \| A\delta_k - b \|$. If the initial residual $\rho_0$ is minimum, the convergence of the CGPCNE method is faster. Since the initial point $\delta_0$ is usually chosen to be the null vector, minimizing the initial residual amounts to minimizing the norm of the vector $b$.
In system (2) the vector of known terms $\{ w_{ih} x^m_h \}_{i \in P_h,\, m=1,\ldots,M}$ depends essentially on the node $h$ being removed, therefore our idea is to select the node for which the norm of this vector is minimum. Precisely, we adopted the following criterion:

remove, at each step, the input node $h \in V_I$ such that the known-term vector $\{ w_{ih} x^m_h \}_{i \in P_h,\, m=1,\ldots,M}$ is smallest, i.e. such that the quantity $\sum_{i \in P_h,\, m=1,\ldots,M} (w_{ih} x^m_h)^2$ is minimum.

Moreover, to prevent the algorithm from producing "useless" nodes, the selected node $h$ should also satisfy the condition $R_i \setminus \{h\} \neq \emptyset$ for each $i \in P_h$.
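A minimal sketch of this selection rule, reusing the Network fields and the training-pattern layout assumed in the earlier sketches (the function name select_input_to_remove is illustrative):

def select_input_to_remove(net, X, candidates):
    """Pick the input node h minimizing the sum over i in P_h and m of (w_ih * x^m_h)^2,
    while ensuring no node in P_h would be left with an empty receptive field."""
    best_h, best_score = None, float("inf")
    for h in candidates:
        # skip h if its removal would leave some node in P_h with no incoming connection
        if any(len(net.R[i] - {h}) == 0 for i in net.P[h]):
            continue
        score = sum((net.w[(h, i)] * X[m, h - 1]) ** 2
                    for i in net.P[h] for m in range(X.shape[0]))
        if score < best_score:
            best_h, best_score = h, score
    return best_h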
The proposed criterion may not provide the globally optimal choice, because there is no guarantee that starting from the smallest initial residual will result in the smallest final residual. Moreover, this criterion neglects possible correlations between input variables. To address this limitation, combined criteria which take into account some correlation measure could well be employed, without altering the nature of the algorithm. Alternatively, we could first orthogonalize the inputs (for example by a Karhunen-Loève transform) and then apply the variable selection algorithm to the transformed inputs. The pruning would then be performed on input eigennodes rather than on the actual input variables. A similar approach has been explored in [15].
We adopt the above-defined elimination criterion because it provides good results in practice without requiring a high computational cost. Additionally, the proposed selection criterion has an interesting interpretation. In [19], in an attempt to quantify the goodness of a hidden node, a quantity similar to $\sum_{i \in P_h,\, m=1,\ldots,M} (w_{ih} x^m_h)^2$ was independently defined to represent the total signal propagated forward by the node. Therefore, our criterion can also be interpreted as a criterion that removes the input nodes having the smallest total amount of forward-propagated signal.
3.2. Definition of the algorithm

Now we define precisely the variable selection algorithm. Starting with an initial trained network $N^{(0)} = (V^{(0)}, E^{(0)}, w^{(0)})$ having the complete set of features as its set of input nodes $V_I^{(0)}$, the algorithm iteratively produces a sequence of networks $\{N^{(t)}\}_{t=1,2,\ldots}$ with smaller and smaller sets of input nodes $V_I^{(t)}$, by first identifying the node $h \in V_I$ to be removed (according to the above-mentioned criterion), and then solving the corresponding system (2) to properly adjust the remaining weights. The process is iterated until a stopping condition (as discussed below) is satisfied. Fig. 3 summarizes the algorithm.
Fig. 3. The variable selection algorithm.

The iterative nature of our variable selection algorithm allows a systematic investigation of the performance of reduced network models with fewer input nodes. At each step the performance of the network with the reduced set of variables as inputs can be compared with that of the network with the whole set of variables as inputs, and the reduced network can be retained as a potential final model according to the desired performance. Hence, different stopping conditions can be defined according to the specific performance measure adopted. For example, if the application at hand requires selecting the relevant input variables while keeping the performance of the initial network over the training data, the algorithm will be stopped as soon as that performance worsens significantly. Likewise, if good generalization ability is required, a stopping condition that takes into account the performance of the network with the selected input variables over the test set can be used, regardless of the behavior over the training data (see the stopping condition defined in Section 4). Also, the algorithm can be stopped after a predetermined number of features have been eliminated.
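Putting the pieces together, the overall backward-selection loop could be sketched as below. The driver name backward_variable_selection, the perf and stop callbacks, and the in-place weight update are assumptions layered on the earlier sketches, not the authors' code.

def backward_variable_selection(net, X, perf, stop):
    """Iteratively remove input nodes from a trained network.

    perf(net)     -> performance of the current reduced network (e.g. test-set RMSE)
    stop(history) -> True when the desired accuracy/parsimony trade-off is reached
    Returns the history of (removed_node, performance) pairs.
    """
    history = [(None, perf(net))]                       # performance of the full network
    candidates = set(net.V_I)
    while len(candidates) > 1:
        h = select_input_to_remove(net, X, candidates)
        if h is None:                                   # no removable node left
            break
        for (j, i), d in weight_adjustments(net, X, h).items():
            net.w[(j, i)] += d                          # adjust the remaining weights
        for i in list(net.P[h]):                        # excise h and its outgoing connections
            del net.w[(h, i)]
            net.R[i].discard(h)
        del net.P[h]
        net.V_I.discard(h)
        candidates.discard(h)
        history.append((h, perf(net)))
        if stop(history):
            break
    return history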
The proposed algorithm is computationally cheap, as the overall computational cost of each iteration depends mainly on the solution of system (2), which requires a number of operations per CGPCNE step roughly proportional to $M \cdot e$, $e$ being the total number of connections in the network. Besides, the number of cycles performed by the CGPCNE procedure is typically very low [3], so that the overall computational complexity of each step of the algorithm is $O(M \cdot e)$. Moreover, after the variable selection process the final network need not be retrained, because the updating of the weights is embedded in the algorithm itself.
4. Experimental results
Our variable selection method was tested on a well-known system identification problem given by Box and Jenkins [3] and used as a test problem by many authors [11,25]. The process to be modeled is a gas furnace with gas flow rate $u(t)$ as the single input and CO$_2$ concentration $y(t)$ as the output. The original data set representing the dynamics of this system contains 296 pairs of the form $[u(t); y(t)]$ (see [25]). In order to extract a dynamic process model to predict $y(t)$ we use a feedforward neural network with the ten variables $y(t-1), \ldots, y(t-4), u(t-1), \ldots, u(t-6)$ as candidate inputs. This requires a transformation of the original data set into a new data set containing 290 data points of the form $[y(t-1), \ldots, y(t-4), u(t-1), \ldots, u(t-6); y(t)]$. In the experiments, this data set is divided into a training set composed of the first 145 points and a test set containing the remaining 145 points.
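For concreteness, a small sketch of how the lagged data set and the train/test split described above could be assembled from the raw series; the variable names u, y and the function make_lagged_dataset are illustrative, and the exact lag convention is only one plausible reading of the text.

import numpy as np

def make_lagged_dataset(u, y):
    """Turn the 296-point Box-Jenkins series into 290 patterns
    [y(t-1),...,y(t-4), u(t-1),...,u(t-6); y(t)] and split them 145/145."""
    rows, targets = [], []
    for t in range(6, len(y)):                      # first usable index needs u(t-6)
        rows.append([y[t - k] for k in range(1, 5)] +
                    [u[t - k] for k in range(1, 7)])
        targets.append(y[t])
    X, d = np.array(rows), np.array(targets)        # 290 x 10 inputs, 290 targets
    return (X[:145], d[:145]), (X[145:], d[145:])   # training set, test set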
Fig. 4. Performance of the trained networks with different numbers of hidden nodes.

Fig. 5. Performance of a 10-4-1 network (N2) during the removal of input nodes. For each step the removed feature is indicated at the top of the corresponding point.
Networks with 10 input nodes, one hidden layer and one output node were considered. In order to define the size of the hidden layer, we trained networks with 2, 3, 4, and 5 hidden nodes. A back-propagation training algorithm was used, starting from initial weights uniformly distributed in $[-1.0, 1.0]$. In all the trials the training was stopped when all the training points were learnt, that is when, for each training point $x^m$, the following condition was met:

$$ | o^m - t^m | \leq \varepsilon , $$

where $o^m$ represents the output of the network and $t^m$ is the desired output. In our experiments we chose $\varepsilon = 0.01$. Fig. 4 shows the performance of the trained networks in terms of RMSE over both the training and the test set. As can be seen, the network with four hidden nodes produced slightly better results, hence we decided to adopt a 10-4-1 network architecture.
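The "all training points learnt" stopping test used during training is a one-liner; the function name all_learnt and the predict callback are illustrative only.

def all_learnt(predict, X, d, eps=0.01):
    """Training stop test: |o^m - t^m| <= eps for every training point."""
    return all(abs(predict(x) - t) <= eps for x, t in zip(X, d))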
In order to evaluate the effects of our variable selection method, the algorithm was applied to three 10-4-1 trained networks, corresponding to three different initial weight configurations (labeled N1, N2 and N3 in the following). As an illustrative example, Fig. 5 shows the RMSE of one of the three networks (N2) during the input node selection process. In this case the algorithm was iterated until all the input nodes but one were removed. It can be noted that the network performance over the training set is kept almost unchanged until three nodes are left in the input layer. More importantly, the RMSE over the test set decreases during the feature reduction process (even with the single feature u(t-6) as input, the RMSE decreases to 0.0804 from an initial value of 0.124 with all 10 inputs).

Table 1
Results of the variable selection algorithm stopped according to the performance over the test set

Network   RMSE on the test set                                    Final input variables
          Complete set of inputs   Reduced set of inputs
N1        0.4516                   0.2365                         y(t-1), y(t-4), u(t-2), u(t-5)
N2        0.1240                   0.0687                         y(t-1), u(t-4), u(t-6)
N3        0.1224                   0.0787                         u(t-3), u(t-4), u(t-6)
Since we care about the generalization ability of the final reduced model, we defined a stopping condition for the variable selection algorithm aimed at improving the performance of the original network over the test set. Precisely, after the removal of an input node, we checked the RMSE of the network over the test set and, as soon as an increment of 0.1 or more was observed, the elimination process was stopped. The results of the feature selection algorithm with this stopping condition are presented in Table 1, where the generalization of both the networks with all input variables and the networks with reduced input variables is shown, together with the final subsets of input variables. As can be seen, in each case our procedure is able to drastically reduce the number of input features while decreasing the RMSE over the test set. Thus, once an appropriate stopping condition is defined, our method is able to drastically reduce the number of input features and to improve generalization (by about 50% in all cases) without excessively worsening the behavior learnt by the network during training.
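Expressed as a stop predicate for the selection loop sketched in Section 3.2, this rule might read as follows; the threshold 0.1 is the one from the text, while comparing each step with the previous one is only one plausible reading of "an increment of 0.1 or more".

def stop_on_test_rmse(history, max_increase=0.1):
    """Stop as soon as the test RMSE grows by max_increase or more after a removal."""
    if len(history) < 2:
        return False
    return history[-1][1] - history[-2][1] >= max_increase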
Finally, two other feature reduction methods were considered to evaluate the performance of the proposed algorithm. First, we compared our method with an empirical input selection method proposed in [11] that uses the ANFIS network model to find the significant input features for the Box and Jenkins gas furnace problem. That input selection process is based on the simplifying assumption that two inputs suffice, one taken from the set of historical inputs u(t-1), ..., u(t-6) and one from the set of historical outputs y(t-1), ..., y(t-4). The variable selection is therefore made empirically: 24 ANFIS networks, one for each input combination, are built and trained by one epoch of a least-squares (LS) method, using the same training and test sets as ours. Then, the best performing network is trained further and taken as the final model with reduced input features. The final ANFIS network obtained in this way corresponds to the set of features (y(t-1), u(t-3)) and has an RMSE of 0.13 over the training set and of 0.57 over the test set. Hence it performs poorly in comparison to each of our three final networks (see Table 1), which exhibit only a slightly larger subset of input variables.

Table 2
Eigenvalues of the PCA method

Number of    Training set                       Testing set
features     Eigenvalue   Cumul. eigenvalue     Eigenvalue   Cumul. eigenvalue
1            7.132918     7.13292               2.562015     2.56202
2            0.733853     7.86677               1.398529     3.96054
3            0.627533     8.49430               0.990468     4.95101
4            0.564570     9.05887               0.893866     5.84488
5            0.450442     9.50932               0.814355     6.65923
6            0.170958     9.68027               0.793716     7.45295
7            0.101631     9.78190               0.777535     8.23048
8            0.076483     9.85839               0.665687     8.89617
9            0.072200     9.93059               0.571943     9.46811
10           0.069413     10.00000              0.531886     10.00000
Then, we conducted additional tests using a popular feature reduction method, Principal Component Analysis (PCA). A fully fair empirical comparison of our method with PCA is not directly possible, since we are not considering feature transformations like PCA does, but only the selection of a subset of significant features from the original feature set. This leads to the need of using the same number of features selected by our algorithm when evaluating the PCA method on the problem at hand. Moreover, since we are facing a system identification problem, the effect of reducing the number of features through PCA must be measured in terms of RMSE, which is computed as follows. Dimensionality reduction through PCA is performed by discarding the components of the transformed feature set that have small variances (i.e. small eigenvalues) and retaining only those terms that have large variances (i.e. large eigenvalues). This truncation causes an MSE equal to the sum of the variances of the discarded terms, which is computed as the sum of the smallest eigenvalues [21]. Specifically, to compare the results of the PCA method with our method, which selects three features in the best case (see network N2), the original feature set was approximated by truncating the transformed feature set to the first three components. Then the MSE was computed as the sum of the remaining seven smallest eigenvalues. Table 2 shows the eigenvalues of the PCA method for both the training and testing data sets, along with the cumulative eigenvalues. Table 3 lists the RMSE for both the training and testing sets when the original features, the features selected by our method and the features transformed by PCA are used. It can be seen that the performance of the proposed method compares very favorably with that of the PCA method.
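The PCA baseline described above can be sketched as follows: compute the eigenvalues of the data covariance, keep the three largest components, and take the truncation MSE as the sum of the discarded eigenvalues. The name pca_truncation_error and the n_keep parameter are illustrative, and the exact preprocessing used in the paper (e.g. whether the data were standardized) is not specified, so this is an assumption-laden sketch rather than a reproduction of the reported numbers.

import numpy as np

def pca_truncation_error(X, n_keep=3):
    """RMSE induced by keeping only the n_keep principal components:
    the MSE equals the sum of the discarded (smallest) eigenvalues [21]."""
    cov = np.cov(X, rowvar=False)                        # covariance of the 10 candidate features
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]     # eigenvalues, largest first
    mse = eigvals[n_keep:].sum()                         # variance lost by truncation
    return np.sqrt(mse)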
Table 3
Comparison of our feature selection method and the PCA method in terms of RMSE

               Original features   Reduced features (3)
                                   Our method   PCA
Training set   0.040               0.051        1.227
Testing set    0.124               0.068        2.247
5. Conclusions
A method of variable selection using neural networks has been proposed. The key idea consists of iteratively removing input nodes, with their associated weights, from a trained network and then adjusting the remaining weights so as to preserve the overall network behavior. This leads to formulating the selection problem in terms of a system of linear equations, which we solve with a very efficient conjugate-gradient least-squares procedure. A simple and effective criterion for identifying the input nodes to be removed is also derived, which does not require a high computational cost and proves to work well in practice. However, alternative selection rules can be adopted as well, without altering the method as a whole. The iterative nature of the algorithm allows monitoring the performance of the network with the reduced input variables obtained at each stage of the elimination process, in order to define an appropriate stopping condition. This makes the algorithm very flexible. Moreover, unlike most variable selection procedures existing in the literature, no parameter needs to be set and no relevance measure of the variables must be introduced. The validity of the method was examined on a system identification problem, and a comparison study was made with other feature reduction methods. Experimental results encourage the application of the proposed method to complex tasks that require identifying a core of significant input variables, such as pattern recognition problems.
Acknowledgements
The authors wish to thank the anonymous reviewers for their helpful suggestions
and comments.
References
[1] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks 5 (4) (1994) 537-550.
[2] Å. Björck, T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations, BIT 19 (1979) 145-163.
[3] G. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, CA, 1970, pp. 532-533.
[4] F.Z. Brill, D.E. Brown, W.N. Martin, Fast genetic selection of features for neural-network classifiers, IEEE Trans. Neural Networks 3 (March 1992) 324-328.
[5] G. Castellano, A.M. Fanelli, M. Pelillo, Pruning in recurrent neural networks, Proceedings of the International Conference on Artificial Neural Networks, Sorrento, Italy, May 1994, pp. 451-454.
[6] G. Castellano, A.M. Fanelli, M. Pelillo, Iterative pruning in second-order recurrent neural networks, Neural Process. Lett. 2 (6) (1995) 5-8.
[7] G. Castellano, A.M. Fanelli, M. Pelillo, An iterative method for pruning feed-forward neural networks, IEEE Trans. Neural Networks 8 (3) (May 1997) 519-531.
[8] T. Cibas, F.F. Soulié, P. Gallinari, S. Raudys, Variable selection with Optimal Cell Damage, Proceedings of the International Conference on Artificial Neural Networks, Sorrento, Italy, May 1994, pp. 727-730.
[9] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA, 1990.
[10] A. Jain, D. Zongker, Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell. 19 (2) (1997) 153-158.
[11] J.R. Jang, Input selection for ANFIS learning, Proceedings of the 5th IEEE International Conference on Fuzzy Systems, New Orleans, September 1996, pp. 1493-1499.
[12] E.D. Karnin, A simple procedure for pruning back-propagation trained neural networks, IEEE Trans. Neural Networks 1 (1990) 239-242.
[13] J. Kittler, Feature selection and extraction, in: T.Y. Young, K.S. Fu (Eds.), Handbook of Pattern Recognition and Image Processing, Academic Press, New York, 1986, pp. 60-81.
[14] Y. Le Cun et al., Optimal brain damage, in: D.S. Touretzky (Ed.), Neural Information Processing Systems II, Morgan Kaufmann, San Mateo, CA, 1990, pp. 598-605.
[15] A.U. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: J. Cowan, G. Tesauro, J. Alspector (Eds.), Advances in Neural Information Processing Systems VI, Morgan Kaufmann Publishers, San Francisco, CA, 1994.
[16] J. Mao, A.K. Jain, Artificial neural networks for feature extraction and multivariate data projection, IEEE Trans. Neural Networks 6 (1995) 296-317.
[17] J. Mao, K. Mohiuddin, A.K. Jain, Parsimonious network design and feature selection through node pruning, Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, 1994, pp. 622-624.
[18] M.C. Mozer, P. Smolensky, Skeletonization: a technique for trimming the fat from a network via relevance assessment, in: D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems I, Morgan Kaufmann, San Mateo, CA, 1990.
[19] K. Murase, Y. Matsunaga, Y. Nakade, A backpropagation algorithm which automatically determines the number of association units, Proceedings of the International Joint Conference on Neural Networks, Singapore, 1991, pp. 783-788.
[20] P.M. Narendra, K. Fukunaga, A branch and bound algorithm for feature subset selection, IEEE Trans. Comput. 26 (9) (September 1977) 917-922.
[21] E. Oja, Subspace Methods of Pattern Recognition, Research Studies Press Ltd., Letchworth, England, 1983.
[22] P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection, Pattern Recognition Lett. 15 (1994) 1119-1125.
[23] R. Reed, Pruning algorithms - a survey, IEEE Trans. Neural Networks 5 (1993) 740-747.
[24] J.M. Steppe, K.W. Bauer, Improved feature screening in feedforward neural networks, Neurocomputing 13 (1996) 47-58.
[25] M. Sugeno, T. Yasukawa, A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Systems 1 (1) (1993) 7-31.
Giovanna Castellano received the "Laurea" degree in Computer Science from the University of Bari, Italy, in 1993. From 1993 to 1995 she was a fellow researcher at the Institute for Signal and Image Processing (CNR-Bari) with a scholarship under a grant from the "Consiglio Nazionale delle Ricerche". Currently, she is attending the Ph.D. program in Computer Science at the Computer Science Department of the University of Bari. Her research interests include artificial neural networks, fuzzy systems, neuro-fuzzy modeling and intelligent hybrid systems.

Anna Maria Fanelli received the "Laurea" degree in Physics from the University of Bari, Italy, in 1974. From 1975 to 1979 she was a full-time researcher at the Physics Department of the University of Bari, Italy, where she became an Assistant Professor in 1980. In 1985 she joined the Department of Computer Science at the University of Bari, Italy, as Professor of Computer Science. Currently, she is responsible for the courses "Computer Systems Architectures" and "Neural Networks" in the degree course in Computer Science. Her research activity has involved issues related to pattern recognition, image processing and computer vision. Her work in these areas has been published in several journals and conference proceedings. Her current research interests include artificial neural networks, genetic algorithms, fuzzy systems, neuro-fuzzy modeling and hybrid systems. Dr. Fanelli is a member of the IEEE Society, the International Neural Network Society and AI*IA (the Italian Association for Artificial Intelligence). She is on the editorial board of the International Journal of Knowledge-Based Intelligent Engineering Systems.