V-Detector: An Efficient Negative Selection Algorithm with "Probably Adequate" Detector Coverage
Zhou Ji
Columbia University
New York, NY 10032
zhouji@acm.org

Dipankar Dasgupta
The University of Memphis
Memphis, TN 38152
dasgupta@memphis.edu
Abstract
This paper describes an enhanced negative selection algorithm (NSA) called V-detector. Several key characteristics make this method a state-of-the-art advance in the decade-old NSA. First, an individual-specific size (or matching threshold) for each detector is used to maximize anomaly coverage at little extra cost. Second, statistical estimation is integrated into the detector generation algorithm so that the target coverage can be achieved with a given probability. Furthermore, the algorithm is presented in a generic form based on the abstract concepts of data points and matching threshold; hence it can be extended from the current real-valued implementation to other problem spaces with different distance measures, data/detector representation schemes, etc. By using a one-shot process to generate the detector set, the algorithm is more efficient than strongly evolutionary approaches. It also includes the option to interpret the training data as a whole, so that the boundary between the self and nonself regions can be detected more distinctly. The discussion focuses on features attributable to negative selection algorithms rather than on combinations with other strategies.

Keywords: negative selection algorithms, artificial immune systems, anomaly detection, classification, computational intelligence, algorithm
1 Introduction
Negative selection algorithms are inspired by the natural immune system's self/nonself discrimination mechanism. They were designed by modeling the biological process in which T-cells mature in the thymus by being censored against self cells [14]. Negative selection is one of the earliest models of artificial immune systems (AIS) [9, 15, 16].
In a negative selection algorithm, a collection of detectors, usually called the detector set, is generated so as not to match any self samples (training data). The detectors are subsequently used to check whether incoming new data items are normal (self) or not (nonself). It is typically regarded as an anomaly detection or one-class classification method because the training data come from normal cases only. The common engineering goals in various negative selection algorithms are: (1) to limit the number of detectors that need to be generated; (2) to make the detector set cover as many of the anomalies as possible (ideally all of them); (3) to generate the detector set efficiently. To a large extent, all of these concerns revolve around the so-called detector coverage, namely the proportion of the nonself space that is covered or recognized by the detector set. Two intertwined questions arise at this point. First, how do we estimate the coverage? It is desirable to know how effective the detector set is before using it to detect anomalies. Second, how do we achieve enough coverage with a relatively small number of detectors? Because the number of detectors is the main factor that determines the performance of the algorithm, especially during the detection phase, it is desirable to use as few detectors as possible.
On both fronts, V-detector [23, 24], as introduced in this paper, handles these issues with innovative and efficient techniques†. Furthermore, this method is potentially very useful when near-perfect coverage is not necessary and alternative estimation is inaccurate. The rest of this paper is organized as follows. In section 2, we briefly review previous research efforts to address the issue of detector coverage and the related works that V-detector is based on. In section 3, we describe the algorithm in detail. Section 4 uses experimental results to illustrate the properties of V-detector and to discuss the applicability of this method. Lastly, the conclusion is summarized in section 5.
2 Related Work
2.1 Knowledge of Detection Coverage in Negative Selection Algorithms
Ever since the negative selection algorithm was first proposed, detector coverage has been a major concern. It is desirable but never trivial to determine the coverage quantitatively for a specific negative selection algorithm, or to decide the necessary number and distribution of detectors for a given coverage. Statistical methods have been used in several works. D'haeseleer et al. [10] carried out a thorough probabilistic analysis of the relation between the number of detectors and the probability that a random anomaly can be detected, using matching probability or failure probability to decide or evaluate the number of detectors. In some later works on negative selection algorithms, similar analysis was included as part of the theoretical support [1, 11, 12]. Other works focused on different aspects, e.g. a lower bound for the fault probability [31], but used statistics from the same point of view.
† Implementation and more information about V-detector can be found at http://vdetector.zhouji.net.
For binary or finite-alphabet string representations, such analysis is relatively easy to carry out. When the negative selection algorithm was extended to other data representations, the probability could no longer always be computed by straightforward combinatorics. Real-valued representation plays a unique role in many applications that cannot be represented effectively in binary form. For problems with a natural real-valued representation, it is easier to interpret the output, and the algorithm is usually more stable because affinity is maintained in the representation space. Coverage in such cases is relatively less explored because the search space is continuous and hard to analyze by enumerative combinatorics.
In real-valued negative selection algorithms, the detectors are usually represented as hyperspheres, hyper-rectangles, or hyper-ellipsoids. They are generated either by the original generation-elimination method or by various other methods. Gonzalez et al. [17] used a random process to generate and redistribute detectors so that the total overlap is broadly minimized. In another work by Dasgupta et al. [6], the detectors are represented as rules, in fact rectangular areas in a multi-dimensional real space, and are generated by a genetic algorithm. A multilevel learning framework (MILA) [8] consists mainly of negative selection in real-valued representation, using matching rules defined in lower-dimensional subspaces. Another multilevel AIS compared binary and real-valued representations [2]. Some algorithms involve resizing and redistributing the initial detectors [7].
In real-valued representation, Gonzalez et al. [17] successfully used a Monte Carlo method to estimate the volume of the self region and then decided the number of detectors based on the estimate. That method is more sophisticated than the analysis in binary representation in the sense that (1) the proportion of possible self samples is evaluated probabilistically instead of deterministically, and (2) the geometry and assumed distribution of the detectors have to be taken into consideration to carry out the analysis.
Nevertheless, these seemingly different analyses tackle the problem with a similar approach, namely, determining the number of detectors that is considered enough before the detector set is generated. They are oriented toward a general analysis of the relation between the number of detectors and the coverage; a given detector set is not in question. A drawback is that eventually we have no direct knowledge of the coverage of the actual detectors that are generated. Moreover, the link from the volume of the self or nonself region to the number of detectors is not only an intuitive estimate, but also depends on an assumed detector distribution that may be very different from the actual detector set. For binary representation, the distribution is usually assumed to be uniform, which is unrealistic considering the specific application and the detector generation algorithm. For the real-valued case [17], the distribution is assumed to follow a simple geometric pattern, so the overlap can barely be estimated, and then only roughly.
V-detector [24] approaches the issue from a different angle. It estimates the coverage of the actual set of detectors, which means that (1) whatever information we obtain reflects the actual coverage instead of an estimate based purely on the number of detectors, and (2) no assumption is needed about the distribution of the detectors. As will be shown in the algorithm details described in section 3, it has the further advantage that the geometry of the detectors does not matter, so there is no difficulty in using it with different detector representations. While the original experiments were done in real-valued space, it is possible to implement a similar mechanism in other representations, e.g. the popular binary or other finite-alphabet string representations. More generally, it can be used with any detection mechanism as long as an individual point can be verified to be recognized or not, even if there is no explicit algorithm to evaluate the rate of detection. The same methodology also applies to other algorithms in which a similar issue of proportion estimation exists.
2.2 Detectors with Individual Properties
In a negative selection algorithm, detectors are essentially artificial nonself samples with a matching threshold. In the original form, detectors differ from one another only in their location in the search space. The rationale is simply the assumption that a uniform matching criterion should be used throughout the data.
Later on, some works took up detectors with variable properties, especially variable size, for various reasons. Detectors of more than one size are allowed in Branco et al.'s work [3] as a secondary complement to improve detection performance. The rectangular detectors generated by a genetic algorithm [6] are naturally variable in size. The method by Dasgupta et al. [7] resizes and moves the detectors during the training stage to increase coverage.
V-detector extends the idea of variable-sized detectors to be more general and simpler. Although the 'V' in V-detector came from the word 'variable', 'individual-specific size' is a more accurate description. The scheme simply assigns each detector its own size or matching rule. The sizes are determined individually when the detectors are generated. An important fact is that this size property adds very little cost to the implementation but provides a significant advantage in coverage compared with most earlier NSAs reported. The purpose is clearly to maximize the detector coverage, or equivalently, to avoid any unnecessary detectors. It also eliminates the need to redistribute detectors for the purpose of minimal overlap. In the current version, only variable size is considered, but the strategy can be extended to other variable properties. Even hybrid matching rules can be introduced as a form of variable property to maximize the coverage. Generally, the additional cost of allowing a variable property is limited to the representation of each detector.
2.3 Statistical Inference
Although V-detector estimates a different parameter, namely the detector coverage, than Gonzalez et al.'s work [17], which estimates the volume of the self region, both are based on the same statistical principle: the percentage of a certain category in a sample can be used to estimate its proportion in the entire population. Considering a collection of random points in the nonself region, the V-detector algorithm uses the percentage of those points that are covered by the detectors to estimate the covered proportion of the entire nonself region. However, a point estimate of the proportion by itself does not tell us how much, or how likely, the estimate may differ from the real population proportion, in our case the actual detector coverage. To draw a more meaningful conclusion, we should construct a confidence interval based on the central limit theorem [21, 13, 20]. In other words, although we do not get the same percentage every time we repeat sampling at a fixed size, the distribution of these percentages is close to a normal distribution.
The central limit theorem justifies using a normal distribution as an approximation for the distribution of the sample mean x̄ when n is sufficiently large. There are two apparent sources of error in using the normal distribution as an approximation of the binomial distribution:

1. The normal distribution is always symmetric; the binomial distribution is symmetric only if the probability of one outcome, p, is 0.5.

2. The normal distribution is continuous; the binomial distribution is discrete.

A rule of thumb that takes into account both asymmetry and discreteness is to use the normal approximation only if np > 5, n(1 − p) > 5, and n > 10.
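As a small illustration (the function name and the numbers are ours, not from the paper), this rule of thumb is easy to encode, and it shows why a large target coverage p forces a large sample:

```python
def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb for approximating the binomial distribution by the
    normal distribution: np > 5, n(1 - p) > 5, and n > 10."""
    return n * p > 5 and n * (1 - p) > 5 and n > 10

# For a proportion near 0.5, modest samples suffice; for a target
# coverage of 99%, n(1 - p) > 5 already requires n > 500.
print(normal_approx_ok(100, 0.5))   # True
print(normal_approx_ok(100, 0.99))  # False: n(1 - p) = 1
print(normal_approx_ok(600, 0.99))  # True:  n(1 - p) = 6
```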
There are alternative distributions that can be used to deal with the asymmetry problem, and there are mathematically strict continuity corrections as well. Even though these are not the main concerns in the application in question, the issue of asymmetry is in fact not negligible. Because the proportion of covered nonself points, p, is the variable under consideration, it is very likely that we need to consider a large p, for example 90% or 99%. Fortunately, we can circumvent the issue with a proper strategy in our proposed algorithm, considering the fact that we care more about having enough coverage than about its exact value.
2.3.1 Confidence interval

The confidence level is the probability that the population parameter falls within a range, called the confidence interval, around the sample statistic [13, 21]. For the normal distribution, we have

p̂ − E < p < p̂ + E,   (1)

where p is the population parameter, p̂ is the sample statistic, and

E = z_{α/2} σ   (2)

is the margin of error. σ is the standard error of the sample statistic. For a proportion, σ = √(p(1 − p)/n), which can be estimated as

σ = √(p̂q̂/n),   (3)

where q̂ = 1 − p̂ and n is the sample size. In the case of estimating detector coverage, we are more interested in drawing a conclusion about the lower limit of the coverage, p > p_min, where p_min is the minimum coverage we can presume with some certainty. So we can use a one-sided confidence interval

p > p̂ − E,   (4)

where

E = z_α √(p̂q̂/n).   (5)

To ensure the validity of the assumption that the binomial random variable is approximately normally distributed, with mean np and standard deviation σ = √(npq), we should have np ≥ 5 and nq ≥ 5.
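As a sketch of Equations (4) and (5) (the function and the sample numbers are our illustration, not from the paper), the one-sided lower confidence limit can be computed with the standard normal quantile from the Python standard library:

```python
import math
from statistics import NormalDist

def coverage_lower_bound(x: int, n: int, confidence: float = 0.95) -> float:
    """One-sided lower confidence limit p > p_hat - E per Equations (4)-(5),
    where x of the n sampled nonself points are covered by detectors."""
    p_hat = x / n
    q_hat = 1.0 - p_hat
    z_alpha = NormalDist().inv_cdf(confidence)   # P(Z <= z_alpha) = 1 - alpha
    e = z_alpha * math.sqrt(p_hat * q_hat / n)   # margin of error, Eq. (5)
    return p_hat - e

# 570 of 600 sampled nonself points covered (p_hat = 0.95):
print(round(coverage_lower_bound(570, 600), 3))  # 0.935
```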
In Equation (2), z_{α/2} is the z score for a confidence level of 1 − α/2: the positive standard z value that separates an area of α/2 in the right tail of the standard normal distribution curve. For a standard normal variable x, P(−z_{α/2} ≤ x ≤ z_{α/2}) = 1 − α and P(x ≤ z_{α/2}) = 1 − α/2. Similarly, z_α in Equation (5) is defined by P(x ≤ z_α) = 1 − α.
2.3.2 Hypothesis Testing
Hypothesis testing is another form of statistical inference, also based on Equation (4). It fits our purpose better because the goal here is to decide when to stop generating or including more detectors. In conducting a statistical hypothesis test, we need to identify the null hypothesis. We assume that a Type I error (rejecting a true null hypothesis) is more costly than a Type II error (accepting a false null hypothesis).
The normal procedure of hypothesis testing involves the following steps:

1. State the null hypothesis and the alternative hypothesis. The null hypothesis is the statement that we would rather take as true unless there is strong enough evidence showing otherwise.

2. Determine the cost associated with the two types of decision-making errors.

3. Choose the significance level, α. That is the maximum probability we are willing to accept of making a Type I error. Typical values are 0.05 and 0.01.

4. Collect the data and compute the sample statistic. To test based on a proportion we can use the z score

z = (p̂ − p) / √(pq/n).

5. Reject or accept the null hypothesis. The traditional method is to check whether the test statistic falls in the critical region (z > z_α); if it does, we reject the null hypothesis. An alternative is to use a p-value test, which is easier [21].
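Steps 4 and 5 can be sketched as follows (a hedged illustration; the function name, the default z_alpha, and the numbers are ours). The test asks whether the observed proportion of covered sample points is significantly above the minimum coverage p_min:

```python
from math import sqrt

def reject_low_coverage(x: int, n: int, p_min: float,
                        z_alpha: float = 1.645) -> bool:
    """One-sided z test of H0: coverage <= p_min.
    x covered points out of n sampled nonself points; the default
    z_alpha = 1.645 corresponds to a significance level alpha = 0.05."""
    p_hat = x / n
    z = (p_hat - p_min) / sqrt(p_min * (1 - p_min) / n)
    return z > z_alpha   # True: reject H0, claim coverage above p_min

print(reject_low_coverage(195, 200, 0.90))  # True: 97.5% covered
print(reject_low_coverage(185, 200, 0.90))  # False: 92.5% is not significant
```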
2.4 Inspiration from Learning Theory
Statistical inference does not tell us the exact value of the detector coverage.As
we will see later,it tells the probability that the approximation of coverage is
good enough.This idea of “Probably Adequate” becomes more comprehensible
when we look into the similar concepts in machine learning.One of the major
models of computational learning theory is Probably Approximately Correct
learning (PAC learning) [5,19].In terms of PAC learning,successful learning
of an unknown target concept should entail obtaining,with high probability,
a hypothesis that is a good approximation of it.Accuracy,or how good the
approximation is,is described by ǫ:the hypothesis returned,h,should satisfy
error(h) ≤ ǫ.Conﬁdence,or the chance we can correctly obtain the hypothesis
h,is described by σ:the probability of returning h is at least 1−σ.h is parallel
to the target coverage.
Learning theory such as PAC learning could provide guidance for the development of negative selection algorithms. Although analyzing a specific negative selection algorithm, e.g. V-detector, may not be a straightforward task, it would be potentially helpful to analyze whether the problem solved by a negative selection algorithm, or more particularly by V-detector, is PAC-learnable. As a first step toward formalizing the problem, regardless of the algorithm that may be used to solve it, we should clarify the basic assumptions about the training data our algorithms take. Some previous works in this area, especially those that were not based on binary representation and did not assume that all self features are present in the training data, are hard to compare with one another side by side due to the lack of equivalent assumptions.
In the following analysis, we assume:

• Both self and nonself points appear in some bounded n-dimensional real space. For simplicity, let us assume it is [0, 1]^n.

• Some finite number of self samples are provided as input. They are randomly distributed over the self region.

• The training data are noise free, meaning all the self samples are real self points. This is not necessary in principle, but it simplifies the discussion.

• To evaluate the detection performance, the test data are a finite number of random points over the entire space described above. Each of those points can be verified to be self or nonself.
3 Algorithm
3.1 Coverage, Proportion, and Probability
Definition 1 The detector coverage of a given detector set is defined as the ratio of the volume of the nonself region that can be recognized by any detector in the detector set to the volume of the entire nonself region.
Generally, it can be written as

p = (∫_{x∈D} dx) / (∫_{x∈S̄} dx),

where S̄ is the set of nonself points and D is the set of nonself points that are recognized by the detectors. In the case of a 2-dimensional continuous space, it reduces to the ratio of the covered area to the area of the entire nonself region:

p = (∫∫_{(x,y)∈D} dx dy) / (∫∫_{(x,y)∈S̄} dx dy).

If the space in question is discrete and finite, it can be rewritten as

p = |D| / |S̄|,

where |A| denotes the cardinality of a set A.
Figure 1 illustrates the three regions in question in a 2-D diagram: the self region, the covered nonself region, and the uncovered nonself region. The area without the hatched shading on the right side of the diagram is S̄, and the dotted area covered by circular detectors is D.
Figure 1: Negative selection algorithm using real-valued representation: the self region, the covered nonself region, and the uncovered nonself region
In statistical terms, the points of the nonself region are our population. Generally speaking, the population size is infinite. Whether a given point is covered by the detectors is a binomial outcome. The detector coverage is the same as the proportion of covered points, which equals the probability that a random point in the nonself region is a covered point. Assuming all points of the entire nonself region are equally likely to be chosen in a random sampling process, the probability of a sample point being recognized by the detectors is thus equal to p. For a sample of fixed size, the proportion of covered points is

p̂ = |D̂| / |Ŝ|,

where Ŝ is the sample and D̂ is the set of sample points that are recognized by the detectors. |Ŝ| is thus the sample size. p̂ is the sample statistic that serves as the point estimate of the population proportion.
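The point estimate p̂ can be obtained by simple Monte Carlo sampling. The following sketch is our own structure, not the paper's code: the hypersphere detectors and the is_self oracle stand in for whatever representation and matching rule are in use.

```python
import math
import random

def estimate_coverage(detectors, is_self, dim: int, n: int, seed: int = 0) -> float:
    """Point estimate p_hat = |D_hat| / |S_hat|: draw n random nonself
    points from [0, 1]^dim and return the fraction recognized by any
    detector.  detectors: list of (center, radius) pairs; is_self: oracle
    telling whether a point lies in the self region."""
    rng = random.Random(seed)
    covered = sampled = 0
    while sampled < n:
        point = [rng.random() for _ in range(dim)]
        if is_self(point):
            continue                      # only nonself points form the sample
        sampled += 1
        if any(math.dist(point, c) <= r for c, r in detectors):
            covered += 1
    return covered / n

# A single detector of radius 0.5 at the center of the unit square, with an
# empty self region: p_hat approximates the circle's area, pi/4 ~ 0.785.
print(estimate_coverage([([0.5, 0.5], 0.5)], lambda p: False, 2, 4000))
```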
3.2 Integration of Hypothesis Testing and Detector Generation
The main idea of this method is to finish generating detectors when the coverage is close enough to the target value. This contrasts with other works that rely on the number of detectors to provide enough coverage.
The original V-detector [23] uses a simple estimate to stop the detector generation procedure. Random points are generated as detector candidates. If a candidate is a nonself point and is not covered, a new detector is generated on it. If it is a covered nonself point, it is discarded as a candidate, but the attempt is recorded in a counter that is used to estimate the coverage. If the count of consecutive attempts falling on covered points reaches a limit m, the generation stage finishes with the belief that the coverage is large enough. m is not preset; it is decided by the target coverage:

m = 1 / (1 − α),   (6)

where α is the target coverage, a control parameter. Equation (6) is explained as follows. If there is 1 uncovered point in a sample of size m′, the point estimate of the proportion of the uncovered region is 1/m′, and the estimate of the coverage is

α′ = 1 − 1/m′.   (7)

If in fact there are 0 uncovered points in a sample of size m′, we have a better-than-average chance that the actual coverage is larger than α′. Because m is decided by Equation (6), when we see m consecutive points that are all covered, we can estimate that the actual coverage is more likely than not to be at least α. As mentioned before, this is based on point estimation without a confidence interval. In contrast with the new algorithm, we call that method the "naïve estimate".
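The naïve stopping rule is trivial to state in code (a sketch; the function name and the rounding are ours, since Equation (6) can be fractional):

```python
def naive_stop_limit(target_coverage: float) -> int:
    """Equation (6): the number m of consecutive covered candidate points
    after which generation stops, for target coverage alpha."""
    return max(1, round(1.0 / (1.0 - target_coverage)))

# A target coverage of 90% stops after 10 consecutive covered points;
# 99% requires 100 consecutive covered points.
print(naive_stop_limit(0.90))  # 10
print(naive_stop_limit(0.99))  # 100
```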
To extend this to stricter statistical inference, estimating with a confidence interval directly does not fit the problem as well as hypothesis testing does, because our goal is to decide whether or not to add more detectors. What makes this paper's method different from traditional statistical inference is that the testing is done as part of the detector generation algorithm. Although it may be implemented as a relatively independent module, we still have to face a dilemma: the detector coverage, the very proportion to be estimated, changes during detector generation. So we need to design a process in which the hypothesis testing happens only while we temporarily stop adding new detectors; otherwise the testing would be meaningless. At the same time, we reuse the random sample points from the hypothesis test as candidate detectors. This doubles the advantage of integrating hypothesis testing into V-detector.
In the case of estimating coverage, the null hypothesis would be: "The coverage of the nonself region by all the existing detectors is below the percentage p_min." If we accept the null hypothesis, we include more detectors. If the null hypothesis is actually false, the cost of this Type II error is some unnecessary detectors. On the other hand, if we reject the null hypothesis by mistake, we end up with lower coverage than claimed. The latter, the so-called Type I error, is exactly our concern. The significance level α is the maximum acceptable probability of making a Type I error, i.e. ending up with fewer detectors than needed. We need a fixed sample size to do the hypothesis testing. If the conclusion is that we need more detectors, we take all the uncovered sample points to make new detectors, which largely reduces the cost of the entire algorithm.
Figure 2 shows the diagram of the modified V-detector that uses hypothesis testing to estimate the detector coverage.

To guarantee that the assumptions np ≥ 5 and nq ≡ n(1 − p) ≥ 5 are valid, we can choose the sample size by

n > max(5/p, 5/(1 − p)).

If x points are covered, so that p̂ = x/n with n the sample size, we have

z = (x − np) / √(npq).
During the procedure of testing more points, x either increases (when the point is covered) or stays unchanged (when the point is uncovered), and so does z. Before the procedure finishes for all n points, if z based on the points tested so far is larger than z_α, that is already enough to reject the null hypothesis and claim enough coverage; at that point, the test can be stopped. Because the ultimate conclusion of the procedure is either rejection or acceptance of the null hypothesis, not an estimate of p with a confidence interval, it is not necessary to finish the sample to get a "better" answer.
If the assumption nq > 5 is in fact invalid because the real p is larger than the p we used, then the actual coverage is more than what we want to test, and our confidence in the coverage is not compromised. If the assumption np > 5 is in fact invalid because p is very small, the hypothesis test will pass only when it could also pass a test using the actual non-normal distribution. Because the probability curve of such a distribution skews toward the left (origin) side, its critical value would be smaller than the z_α of the normal distribution. If z does not pass this skewed critical value, it will not pass the normal distribution's z_α either: z ≤ z_α|_{p<5/n} ≤ z_α.

[Figure 2 flowchart, summarized: choose p and α; choose a sample size n > max(5/p, 5/(1 − p)); set N = 0 and x = 0; repeatedly sample a random point, skipping self points; for each nonself point increment N; if the point is covered, increment x, update z = (x − np)/√(npq), and end with enough coverage if z > z_α; if it is uncovered, save it as a candidate; when N = n, accept all saved candidates as new detectors and start a new sample.]

Figure 2: V-detector generation algorithm with statistical estimate of coverage
3.3 Using Detectors of Variable Size
In V-detector, the sizes of the detectors are not predefined by a matching threshold as in most real-valued negative selection algorithms. Instead, the radius of each detector is decided individually, simply as the maximum value that does not match any self point. In effect, this exploits the key idea of the original negative selection algorithm and allows each detector to achieve its maximum coverage. Because each detector is implemented as a center point and its radius, there is no difference in computational cost between a large detector and a small one. Therefore, we are not concerned about the overlap of the detectors: as long as a detector contributes to the total coverage, namely through the part that is not overlapped, it is as useful as any other detector.
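With Euclidean distance and a point-wise self threshold, the individual radius rule amounts to one line (a sketch; the helper name and the numbers are ours):

```python
import math

def detector_radius(center, self_samples, self_radius):
    """The largest radius that still matches no self sample: the distance
    from the candidate center to its nearest self sample, minus the self
    threshold.  A non-positive result means the candidate is itself self."""
    return min(math.dist(center, s) for s in self_samples) - self_radius

# Nearest self sample to (0.9, 0.9) is (0.5, 0.5), at distance sqrt(0.32):
print(round(detector_radius((0.9, 0.9), [(0.2, 0.2), (0.5, 0.5)], 0.1), 3))  # 0.466
```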
Figure 3 illustrates the core idea of variable detectors in 2-dimensional space. The dark grey area represents the actual self region, which is usually given through the training data (self samples). The light grey circles are the possible detectors covering the nonself region. Figure 3(a) shows the case where the detectors are of constant size. In this case, a large number of detectors are needed to cover the large area of nonself space. The well-known issue of "holes" is illustrated in black. In figure 3(b), using variable-sized detectors, the larger area of nonself space can be covered by fewer detectors, and at the same time, smaller detectors can cover the holes. Since the total number of detectors is controlled by using large detectors, it becomes more feasible to use smaller detectors when necessary.

Figure 3: Main concept of detectors with variable properties: (a) constant-sized detectors; (b) variable-sized detectors
Another advantage of this new method is that it facilitates the use of the statistical inference described above.

It can be further extended to variable matching rules, or at least different distance measures. That would be an easy way to realize detectors of different geometric shapes, as pursued by many other works.
3.4 Boundary-Aware Algorithm
What exactly does a self sample, or a collection of self samples, mean in terms of presenting or defining the self region? This needs to be answered before we can judge any soft computing algorithm for anomaly detection. Regardless of the specific algorithm or scheme of negative selection, what a self sample means eventually comes down to the matching rules.
Fig. 4 shows how we may interpret a self sample in different ways. In Fig. 4(a), we assume that a circular area around the self (normal) sample is entirely normal. It is a straightforward way to achieve generalization, similar in principle to partial matching in binary representation. We call an interpretation that allows much variability a "conservative interpretation", referring to the "conservative" attitude toward claiming a new sample to be abnormal. In Fig. 4(b), we only consider the exact point of the self sample to be normal. In reality, we may still allow some small deviation because we have to compare floating-point numbers, but basically we only regard the samples we have already seen as normal. We call an interpretation that does not allow much variability an "aggressive interpretation".
Figure 4: Possible interpretations of a single self sample: (a) conservative (a "self radius" around the sample is considered normal); (b) aggressive (only the sample point itself is considered normal)
At first look, it seems that with an extremely aggressive interpretation like fig. 4(b), no generalization could happen. That does not have to be the case. Fig. 5 shows a group of three self sample points. Even if we do not take any circular area surrounding a single self sample as normal, we can still generalize to a self region by considering the neighboring self points together, as shown in fig. 5(c). Compared with fig. 5(a) or (b), this is more aggressive in detecting anomalies, but only outside the perceived "self region".
Figure 5: Possible interpretations of a group of self samples: (a) large threshold; (b) small threshold; (c) as a collection
Naturally, each self sample point can be interpreted as evidence that its vicinity is self region. On the other hand, we can fairly assume that self samples may be drawn from anywhere over the entire self region. There is no reason to exclude points close to the boundary between the self and nonself regions, no matter what kind of matching rule or distance measure is used.

Fig. 6 illustrates the "boundary dilemma": the scenario in which self samples close to the boundary inevitably extend the perceived self region beyond the actual one, due to the variability allowed by the algorithm. In this figure, the shaded area is the "real" self region; the dots are the self samples and the circles are their generalization. If the self threshold is too small, the space between self samples cannot be represented; in other words, more samples are needed to train the system properly. On the other hand, if the self threshold is large, the false self region contributed by the boundary samples may be too large to accept.

Figure 6: "Boundary Dilemma"

In the case that the over-covered area is large compared with the real nonself region, the error will be large. When the nonself region is a thin stripe between two self regions, it may not be representable at all. In those cases, the boundary dilemma becomes more significant.
The issue described above is tackled by an ingeniously simple strategy: using a negative selection algorithm to achieve the interpretation illustrated in fig. 5(c). The discussion above concerns the general interpretation of self samples. From the viewpoint of a negative selection algorithm, the difference in interpretation is shown in fig. 7. Fig. 7(a) shows the coverage of a detector set using a conservative interpretation; fig. 7(b) shows one using an extremely aggressive interpretation. As in the conceptual discussion, it is possible to generalize the self samples to a finite self region even if detection is extremely aggressive outside the self region. This is illustrated in fig. 7(c).
Figure 7: Detectors enclosing the perceived "self region": (a) conservative; (b) aggressive; (c) boundary-aware
Therefore, we end up with two versions of the V-detector algorithm. One treats each training data point (self sample) individually [23]; we call it the point-wise V-detector. The other brings out a new advantage of negative selection algorithms in that it is able to detect the boundary of the self region; we call it the boundary-aware V-detector [22].
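The distinction can be sketched in code. This is a minimal illustration with hypothetical helper names, not the paper's actual implementation: the point-wise version generalizes each sample to a ball of a fixed self radius, while the boundary-aware version lets a candidate detector grow until it touches the nearest training sample, so the detector edge, rather than each sample's ball, traces the boundary of the sample cloud.

```python
import math

def is_self_pointwise(x, self_samples, self_radius):
    """Point-wise interpretation: x is considered self if it lies
    within self_radius of any individual training sample."""
    return any(math.dist(x, s) <= self_radius for s in self_samples)

def detector_radius_boundary_aware(center, self_samples):
    """Boundary-aware interpretation: the training set is treated as a
    whole; a candidate detector centered at `center` grows until it
    reaches the nearest sample, so detectors stop at the boundary of
    the sample cloud rather than at each sample's ball."""
    return min(math.dist(center, s) for s in self_samples)
```

For example, with a single sample at (0.5, 0.5) and self radius 0.05, the point (0.52, 0.5) is self under the point-wise rule, while a candidate detector at (0.9, 0.5) grows to radius 0.4 under the boundary-aware rule.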
4 Experiments and Discussion
To understand the behavior of the algorithm described in the previous section, experiments were carried out using 2-dimensional synthetic data. Over the unit square [0, 1]^2, various shapes are used as the ‘real’ self regions in these experiments. They belong to one of the six types listed in Table 1, which also shows the geometric parameters that extend each type to different sizes or variations. Figure 8 shows the basic shapes of the six types of self region.
Table 1: Shapes of self area

Type of Shape   Geometric Parameters
Cross           thickness and location of the cross
Triangle        size (radius of circumscribed circle)
Ring            outer and inner radius
Stripe          width
Intersection    cross size and location, circle radius
Pentagram       size (radius of circumscribed circle)
Figure 8: Different types of shape
A fixed number of random points from the self region are used as the self samples to generate the detector set. Another set of random points, some self and some nonself, is used to test the detection performance of the detector set. Figure 9 shows examples of training data (self samples) and test data: 9(a) is a self sample of 100 points; 9(b) is a self sample of 1000 points; 9(c) is 1000 test data points including both self and nonself points. It can be predicted from this figure that the number of training data points will have an obvious influence on the detection results. Figure 10 shows the detector-covered area using these two different numbers of training points (boundary-aware algorithm, hypothesis testing, 99% target coverage): 10(a) the area trained with 100 points; 10(b) the area trained with 1000 points. When other control parameters are different, e.g. using the point-wise algorithm, the covered area will not be the same as in Figure 10, but the number of training points still plays an important role.

(a) 100 points of self sample (b) 1000 points of self sample (c) 1000 points of test data
Figure 9: Self samples and test data
The influence of the control parameters and the differences between strategies were explored with more experiments. From the data side, differences in results may come from the number of sample points or the different shapes (including their specific geometric parameters) of the self region. From the algorithm side, differences may come from: target coverage, significance level of hypothesis testing, method of estimation (naïve estimate or hypothesis testing), self threshold, and V-detector strategy (point-wise or boundary-aware). The performance measures we want to compare include detection rate, false alarm rate, and the number of detectors. The significance level α is set to 0.1 in the results reported in this paper.

(a) Trained with 100 points (b) Trained with 1000 points
Figure 10: Detector-covered area
Figure 11 compares detection-rate results using the naïve estimate and hypothesis testing for target coverages from 90% through 99%. The number of sample points is 1000 and the boundary-aware algorithm was used. The self region in figure 11(a) is an ‘intersection’ shape, which is basically four separate regions; the one in figure 11(b) is a pentagram whose circumscribed circle has radius 1/3. The plot shows the mean of 100 repeated tests; the standard deviation is shown as error bars. Results obtained with the naïve estimate and with hypothesis testing are plotted together for comparison. Hypothesis testing has a small but consistent advantage over the naïve method.
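The hypothesis-testing decision can be sketched as follows. This is a hedged illustration, not the paper's exact procedure: it applies a one-sided normal-approximation test of the null hypothesis “coverage below target” at significance level α = 0.1, and the names (`coverage_reached`, `covered_fn`) are hypothetical.

```python
import math
import random

def coverage_reached(covered_fn, target=0.99, alpha=0.1, n=1000, rng=random):
    """Sample n uniform points in the unit square, count how many the
    detector set covers, and accept 'coverage >= target' only if the
    observed proportion exceeds the target by z_alpha standard errors.
    covered_fn(point) -> bool is assumed to be supplied by the caller."""
    hits = sum(covered_fn((rng.random(), rng.random())) for _ in range(n))
    p_hat = hits / n
    se = math.sqrt(target * (1 - target) / n)   # SE under the null proportion
    z_alpha = 1.2816                            # upper 10% point of N(0, 1)
    return p_hat - target > z_alpha * se
```

The naïve estimate, by contrast, would simply compare `p_hat` to `target` with no allowance for sampling error, which is why it tends to terminate early and undercover.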
[Plots: detection rate vs. target coverage (90%–99%), hypothesis testing vs. naïve estimate]
(a) Intersection shape (b) Pentagram shape
Figure 11: Influence of target coverage
Table 2 again highlights the difference between the naïve estimate and hypothesis testing. The results were obtained with the following setting: boundary-aware strategy, 1000 self sample points, target coverage 90%, and self threshold 0.05. The numbers are the mean of 100 repeated tests, and the standard deviation σ is tabulated with the corresponding variables. Results for two different shapes of self region (‘intersection’ and pentagram) are shown.
Table 2: Performance difference between naïve estimate and hypothesis testing

                                      detection rate/σ   false alarm rate/σ   number of detectors/σ
naïve estimate (‘intersection’)       81.46%/6.99%       2.42%/1.64%          19.5/3.79
hypothesis testing (‘intersection’)   93.41%/2.56%       5.17%/2.21%          55.72/3.71
naïve estimate (pentagram)            86.19%/5.52%       0.2%/0.48%           12.53/2.39
hypothesis testing (pentagram)        96.39%/1.88%       0.53%/0.76%          51.31/1.19
Figure 12 shows the difference between the point-wise and boundary-aware V-detector algorithms when all other settings are the same. The target coverage is 99% and 1000 training points are used; the actual self region is ring-shaped. Figure 12(a) shows the detection rate; (b) the false alarm rate; (c) the number of detectors. The boundary-aware algorithm has an obviously better detection rate under this setting. Although its false alarm rate is higher than that of the point-wise interpretation, especially for very low self thresholds, this is generally not an issue. The difference in the two strategies’ performance is related to the fact that the concept of ‘self’ here is defined by the discrete self points. The improvement brought by the boundary-aware V-detector is more obvious when it is important to detect the boundary accurately. At least two characteristics are noteworthy in figure 12(c). First, the number of detectors is nearly constant as long as the self threshold is larger than 0.05; second, the boundary-aware algorithm resulted in slightly fewer detectors. The reason for the first tendency is that the detector candidates were processed in groups of the sample size required by hypothesis testing. That disadvantage is limited and does not scale with the number of training points or other parameters; it can be avoided by not using all the uncovered random points to make new detectors.
Figure 13 shows the detection rate for different shapes of self region over a range of self thresholds using the boundary-aware algorithm. In total, 10 different shapes are shown in this figure, including five types from Figure 8 plus their complementary shapes. The results are consistent, without major differences. 100 self points were used for training in those results; when 1000 points were used, the differences were even smaller.
Figure 14 compares the detection rate and false alarm rate, respectively, for self samples of 100 points and 1000 points on a pentagram region. The boundary-aware algorithm plus hypothesis testing was used. The advantage of more training points in detection rate seems small, but the false alarm rate using 100 points is significantly higher. On the other hand, if the point-wise algorithm is used, the false alarm rate can be controlled over a range of self thresholds, but the detection rate with 100 points will be much lower. It is not surprising that the number of self sample points has a major effect on detection performance. Improvements in detector generation and in the detection process can hardly eliminate false alarms that mainly come from a definition of self based entirely on the discrete samples.

As discussed in the previous sections, this method is not limited to a specific
[Plots: detection rate, false alarm rate, and number of detectors vs. self threshold, boundary-aware vs. point-wise]
(a) Detection rate (b) False alarm rate (c) Number of detectors
Figure 12: Comparison of two strategies in V-detector
problem space, detector representation, matching rule, etc. For example, the Euclidean distance, or 2-norm distance, widely used in real-valued representation and in the earlier experiments, can be generalized to the Minkowski distance of order $m$, or $L_m$ distance, for arbitrary $m$. For a point $(x_1, x_2, \ldots, x_n)$ and a point $(y_1, y_2, \ldots, y_n)$ in $n$-dimensional space, the 1-norm distance is the Manhattan distance
$$\sum_{i=1}^{n} |x_i - y_i|.$$
The $m$-norm distance is defined as
$$\left( \sum_{i=1}^{n} |x_i - y_i|^m \right)^{1/m}.$$
The infinity-norm distance is defined as
$$\lim_{m \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^m \right)^{1/m} = \max\{\, |x_i - y_i| : i = 1, 2, \ldots, n \,\}.$$
[Plot: detection rate vs. self threshold for the cross, ring, intersection, stripe, and pentagram shapes and their inverted counterparts]
Figure 13: Detection rate for various shapes of self region
[Plots: detection rate and false alarm rate vs. self threshold, 1000 vs. 100 training points]
(a) Detection rate (b) False alarm rate
Figure 14: Performance for different training sizes
For different norms, the detector (or recognition region) takes a different geometric shape and covers a different area. Fig. 15 illustrates the different shapes in 2-dimensional space, all shown with the same radius. If we use radius r to indicate the size, r can be interpreted as the radius of the circle in the case of the 2-norm distance. For the Manhattan distance, the detector is a square rotated by 45° whose edge is √2·r; for the infinity norm, the detector is a square whose edge is 2r; for any norm between 2 and ∞, the shape lies between the radius-r circle and the edge-2r square.
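The corresponding areas (2r² for the rotated square, πr² for the circle, 4r² for the square) can be checked with a quick Monte Carlo sketch; `ball_area_2d` is a hypothetical helper for illustration only:

```python
import random

def ball_area_2d(m, r=1.0, n=200_000, seed=1):
    """Monte Carlo estimate of the area covered by an m-norm ball of
    radius r in 2-D, sampling uniformly over the bounding square
    [-r, r]^2; pass float('inf') for the infinity norm."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if m == float('inf'):
            d = max(abs(x), abs(y))
        else:
            d = (abs(x) ** m + abs(y) ** m) ** (1 / m)
        inside += d <= r
    return (2 * r) ** 2 * inside / n  # area of square times hit fraction
```

With r = 1, the estimates come out near 2 for the 1-norm, near π for the 2-norm, and exactly 4 for the infinity norm, confirming the shape descriptions above.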
Tables 3 and 4 show the results obtained using different distance measures, for the “intersection” self region and the “5-circles” self region, respectively. There are two different implementations of Euclidean distance. One is the default setting of V-detector, in which the distance measure and matching process are actually implemented using the square of the Euclidean distance for better speed. The other is implemented as L_2 distance in the general way.

[Diagram: detector shapes for the 1-norm (Manhattan), 2-norm (Euclidean), 3-norm, and ∞-norm distances]
Figure 15: Various geometric shapes of detector (recognition region) corresponding to different m-norm distances

In terms of detection results, there seems to be little difference between the distance measures for these two examples, except that the Manhattan distance is slightly more aggressive in raising anomaly alarms. However, the running time of the algorithm differs noticeably across distance measures. The ∞-norm distance is the fastest; for the general L_m distance, the algorithm runs slower for higher m.
Although NSAs have been widely used in various applications and have developed many variations, there is still some skepticism [28], or in some cases confusion, about whether and how they can be used [25]. For example, detector coverage and detection rate are two terms that may lead to misunderstanding when we discuss how well the detector set works, and failure to make a clear distinction may muddle otherwise clear analysis. Coverage is the proportion of nonself space that is covered by detectors. For a given instance, we usually do not know its actual value because the nonself space is the very unknown we are seeking. If we
Table 3: Effects of different distance measures: ‘Intersection’ shape

Distance measure                              detection rate (%)  SD    false alarm rate (%)  SD
Euclidean (default efficient implementation)  96.36               1.49  9.77                  1.58
Manhattan                                     97.05               0.83  10.53                 2.12
Euclidean                                     96.23               1.57  9.59                  1.38
3-norm                                        94.69               1.79  9.55                  1.56
infinity norm                                 89.62               3.01  11                    1.5
Table 4: Effects of different distance measures: ‘Five circles’ shape

Distance measure                              detection rate (%)  SD    false alarm rate (%)  SD
Euclidean (default efficient implementation)  97.64               0.58  8.2                   0.85
Manhattan                                     99.38               0.62  9.84                  1.55
Euclidean                                     97.63               0.57  8.25                  0.8
3-norm                                        98.16               0.23  8.25                  0.6
infinity norm                                 98.68               0.1   9.28                  0.79
discuss coverage in terms of a number, we are making an assumption, at least conceptually, about how the nonself space (or self space) can be induced from the self sample points we have. Detection rate, on the other hand, refers to the percentage of nonself sample points that are detected by the detector set in a particular experiment. Thus, the difference is twofold:
• Coverage depends on how we interpret the training data set. Even for a defined set of detectors, the value of coverage must be based on some assumption that cannot be verified. For example, Stibor et al. [29] showed examples of the coverage provided by V-detector [23] at termination. Nine self points were used to train the system, and the discussion of coverage was based on the assumption that the real self region consists exactly of the perfect circles around the training points. V-detector’s termination is decided by estimated coverage. Detection rate is influenced by the coverage as well as by the validity of the assumption or interpretation we make about the training data.
• Coverage is the ratio of covered nonself space to the entire nonself space; the probability distribution is usually not considered when evaluating coverage. Detection rate, on the other hand, depends on the actual frequency distribution of the test data, which is usually reflected in the real data. This exposes a weakness of V-detector’s termination criterion: the statistical estimate of coverage using random sampling does not take into consideration the probability distribution of the data to be detected. Thus the conclusion that coverage is or is not sufficient is always biased, depending on how far the actual distribution is from uniform. Logically, this cannot be completely solved, because the self training data can at best provide only the distribution of the self space.
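The twofold distinction can be made concrete with a short sketch (hypothetical helper names): coverage is estimated against uniform random samples, while detection rate is measured against whatever distribution the test data actually follows.

```python
import random

def coverage_uniform(covered_fn, n=100_000, seed=0):
    """Coverage as V-detector estimates it: the fraction of uniformly
    sampled points in the unit square that the detectors cover."""
    rng = random.Random(seed)
    return sum(covered_fn((rng.random(), rng.random())) for _ in range(n)) / n

def detection_rate(covered_fn, nonself_test_points):
    """Detection rate: the fraction of actual nonself test points that
    the detectors cover; it depends on the test data's distribution."""
    return sum(covered_fn(p) for p in nonself_test_points) / len(nonself_test_points)
```

With detectors covering only the right half of the unit square, the uniform coverage estimate is about 50%, yet the detection rate is 0 if all actual anomalies happen to fall on the left, which is exactly the bias discussed above.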
In fact, both NSAs in general and specific flavors like V-detector have their limitations.
• Limitation of specific matching rules
Following the above discussion, we notice that the matching rule, which usually takes the form of a distance measure plus a matching threshold, plays a very important role. Visually, the same concept can be expressed as the geometric shape of the detectors. Hart [18] noted the importance of choosing the proper recognition region, which refers to the same idea as the shape of detectors. It should be pointed out that this is as important in other AIS systems, and in any learning paradigm, as it is in negative selection algorithms.
Sometimes the apparent limitation of a negative selection algorithm is in fact the limitation of a specific matching rule or detector shape. For real-valued negative selection algorithms, Euclidean distance and therefore hyperspherical detectors are commonly used, but they are not the only possibility, and their limitations are not peculiar to real-valued negative selection algorithms. In fact, other matching rules and detector shapes have been used in several works, for example rectangular detectors [6] and hyper-ellipsoid detectors [27].
Negative selection algorithms are methods of great flexibility. First, the concept of negative selection can be realized in very different flavors of so-called negative selection algorithms. Second, even for a specific negative selection algorithm, e.g., V-detector, many elements of the model are not as inherently limited as they appear. For example, for real-valued representation, which is not necessarily the only choice in the first place, we could use very different distance measures or matching rules. We have seen that Euclidean distance can easily be extended to the L_m distance, which results in different shapes of detectors.
• Limitation of one-class classification
Generally speaking, the performance of a classification algorithm or learning method depends on the probability distribution of the data, and no serious analysis can be done without taking that distribution into consideration. One-class classification, however, is an effort to learn when no information about the second class is available [30]. That means the probability distribution of the abnormal (nonself) data is, by the basic assumption, never known. That is the main reason that Freitas et al. [15] cast doubt on negative selection algorithms. On the other hand, one-class learning answers a valid need and has been studied from various angles and used in many applications [26]. In summary, the limitation does exist, but it is not specific to negative selection algorithms or V-detector. It is noteworthy that only the probability distribution of the self space can be taken into account in one-class classification, including negative selection algorithms.
Nevertheless, when used for suitable problems, V-detector can show its strength compared with more time-tested methods. The SVM (Support Vector Machine) is a popular statistical learning algorithm that has done very well in many experiments. However, such good results do not guarantee that it can replace alternative methods like V-detector under all conditions. As a simple example, let us consider a scenario in which V-detector is much easier to use than SVM. Two cases were designed so that the self region is disconnected. (1) Fig. 16(a) is a self region consisting of a circle partially cut by a cross, which we call “intersection”; this is one of the synthetic data sets tested in earlier work [24]. (2) Fig. 16(b) is a self region made of five small circles. Both are over the unit-square 2-dimensional search space.
[Diagram: the two disconnected self regions drawn as circles]
(a) ‘Intersection’-shaped (b) ‘5 circles’-shaped
Figure 16: Two shapes of disconnected self region
Tables 5 and 6 show clearly that SVM does not work as well as the negative selection algorithm when the default kernel function is used as in previous experiments. That means a proper kernel function must at least be chosen to make SVM work, and the correct choice depends on extra knowledge of the problem. V-detector obtained significantly better results without the need to refine the control parameters.
The choice of kernel function is a known major limitation of SVM [4]. In SVM, when the decision function is not a linear function of the data, the data need to be mapped to a higher-dimensional space in which a linear separation can be done; the kernel function plays the key role in this mapping, and the best choice is still a research issue even with prior knowledge. V-detector and other approaches that do not use a decision function have an obvious advantage in this respect. The disconnected self regions in the examples above were designed to make the necessary mapping complicated, so that a simple kernel function can hardly work. Nonlinear problems are very common in real-world applications, where V-detector can be used more easily. Furthermore, SVM also has difficulties with very large training data sets and with discrete data [4]; both are cases where V-detector shows its advantage.
Table 5: Results over ‘intersection’ self region

                        detection rate (%)  false alarm rate (%)
SVM ν = 0.05            77.67               6.25
V-detector r_s = 0.05   99.82               11.44
SVM ν = 0.1             81.84               54.69
V-detector r = 0.1      96.58               9.69
Table 6: Results over ‘5-circles’ self region

                        detection rate (%)  false alarm rate (%)
SVM ν = 0.05            47.51               4.38
V-detector r_s = 0.05   99.96               11.21
SVM ν = 0.1             65.58               49.64
V-detector r = 0.1      97.63               8.33
5 Conclusions
A novel negative selection algorithm called V-detector was introduced. Its unique features give negative selection algorithms more chances to be applied successfully in more applications.
• A statistical approach is integrated to analyze the detector coverage in a negative selection algorithm, making the algorithm more reliable. An effective strategy was developed for its implementation.
• Variable-sized detectors achieve maximum coverage with a limited number of detectors.
• The boundary-aware algorithm interprets the training points as a collection instead of independently, so the boundary of the group of training points can be detected.
• The simple generation process makes this method highly efficient.
The detector generation process in V-detector makes it a perfect platform into which to integrate hypothesis testing as a component. Furthermore, the testing can be implemented partly as a by-product of the generation process without adding much extra computational cost.
Another advantage of this method is that it applies to any detector scheme and detection mechanism as long as it is verifiable whether a sample point is covered or not. For example, extension to other representations would make this method applicable to a much larger variety of applications.
Many issues in the performance of negative selection algorithms are related to the properties of the training data. For the comparison and analysis of negative selection algorithms to be more meaningful, it is important to develop a framework concerning the fundamental assumptions and to categorize the types of data to be processed.
References
[1] M. Ayara, J. Timmis, R. de Lemos, L. de Castro, and R. Duncan. Negative selection: How to generate detectors. In J. Timmis and P. J. Bentley, editors, Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS), volume 1, pages 89–98, University of Kent at Canterbury, September 2002. University of Kent at Canterbury Printing Unit.
[2] M. Bereta and T. Burczyński. Comparing binary and real-valued coding in hybrid immune algorithm for feature selection and classification of ECG signals. Eng. Appl. Artif. Intell., 20(5):571–585, August 2007.
[3] P. J. C. Branco, J. A. Dente, and R. V. Mendes. Using immunology principle for fault detection. IEEE Transactions on Industrial Electronics, 50(2):362–373, April 2003.
[4] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
[5] F. Cucker and S. Smale. On the mathematical foundations of learning. Bulletin (New Series) of the American Mathematical Society, 39(1):1–49, October 2001.
[6] D. Dasgupta and F. Gonzalez. An immunity-based technique to characterize intrusion in computer networks. IEEE Transactions on Evolutionary Computation, 6(3):1081–1088, June 2002.
[7] D. Dasgupta, K. KrishnaKumar, D. Wong, and M. Berry. Negative selection algorithm for aircraft fault detection. In Proceedings of Third International Conference on Artificial Immune Systems (ICARIS 2004), pages 1–13, 2004.
[8] D. Dasgupta, S. Yu, and N. S. Majumdar. MILA - multilevel immune learning algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003), LNCS 2723, pages 183–194, Chicago, IL, July 12-16 2003. Springer.
[9] L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational Intelligence Approach. Springer, 2002.
[10] P. D'haeseleer, S. Forrest, and P. Helman. An immunological approach to change detection: Algorithms, analysis, and implications. In Proceedings of the 1996 IEEE Symposium on Computer Security and Privacy, pages 110–119, Washington, DC, USA, 1996. IEEE Computer Society.
[11] F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. Online negative databases. In G. N. et al., editor, Proceedings of Third International Conference on Artificial Immune Systems (ICARIS 2004), pages 175–188, September 2004.
[12] F. Esponda, S. Forrest, and P. Helman. A formal framework for positive and negative detection schemes. IEEE Transactions on Systems, Man, and Cybernetics, 34:357–373, February 2004.
[13] J. L. Fleiss. Statistical Methods for Rates and Proportions. John Wiley & Sons, 1981.
[14] S. Forrest, A. Perelson, L. Allen, and R. Cherukuri. Self-nonself discrimination in a computer. In Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, pages 202–212, Los Alamitos, CA, 1994. IEEE Computer Society Press.
[15] A. A. Freitas and J. Timmis. Revisiting the foundation of artificial immune systems: A problem-oriented perspective. In Proceedings of Second International Conference on Artificial Immune Systems (ICARIS 2003), pages 229–241, 2003.
[16] S. M. Garrett. How do we evaluate artificial immune systems? Evolutionary Computation, 13(2):145–178, 2005.
[17] F. Gonzalez, D. Dasgupta, and L. F. Nino. A randomized real-valued negative selection algorithm. In Proceedings of Second International Conference on Artificial Immune Systems (ICARIS 2003), pages 261–272, September 2003.
[18] E. Hart. Not all balls are round: An investigation of alternative recognition-region shapes. In ICARIS, pages 29–42, 2005.
[19] D. Haussler. Probably approximately correct learning. In National Conference on Artificial Intelligence, pages 1101–1108, citeseer.ist.psu.edu/haussler90probably.html, 1990.
[20] C. A. Hawkins and J. E. Weber. Statistical Analysis - Applications to Business and Economics. Harper & Row, New York, 1980.
[21] R. V. Hogg and E. A. Tanis. Probability and Statistical Inference. Prentice Hall, 6th edition, 2001.
[22] Z. Ji. A boundary-aware negative selection algorithm. In Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing (ASC 2005), pages 379–384, Spain, September 2005.
[23] Z. Ji and D. Dasgupta. Real-valued negative selection algorithm with variable-sized detectors. In LNCS 3102, Proceedings of GECCO, pages 287–298, 2004.
[24] Z. Ji and D. Dasgupta. Estimating the detector coverage in a negative selection algorithm. In H.-G. Beyer et al., editors, GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, volume 1, pages 281–288, Washington DC, USA, 25-29 June 2005. ACM Press.
[25] Z. Ji and D. Dasgupta. Applicability issues of the real-valued negative selection algorithms. In Genetic and Evolutionary Computation Conference (GECCO 2006), pages 111–118, Seattle, Washington, 8-12 July 2006.
[26] R. E. Sanchez-Yanez, E. V. Kurmyshev, and A. Fernandez. One-class texture classifier in the CCR feature space. Pattern Recognition Letters, 24:1503–1511, 2003.
[27] J. M. Shapiro, G. B. Lamont, and G. L. Peterson. An evolutionary algorithm to generate hyper-ellipsoid detectors for negative selection. In H.-G. Beyer et al., editors, GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, volume 1, pages 337–344, Washington DC, USA, 25-29 June 2005. ACM Press.
[28] T. Stibor, P. Mohr, J. Timmis, and C. Eckert. Is negative selection appropriate for anomaly detection? In H.-G. Beyer et al., editors, GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, volume 1, pages 321–328, Washington DC, USA, 25-29 June 2005. ACM Press.
[29] T. Stibor, J. Timmis, and C. Eckert. A comparative study of real-valued negative selection to statistical anomaly detection techniques. In ICARIS, pages 262–275, 2005.
[30] D. M. J. Tax. One-class classification. PhD thesis, Technische Universiteit Delft, 2001.
[31] S. Wierzchon. Discriminative power of the receptors activated by k-contiguous bits rule. Journal of Computer Science and Technology, 1(3):1–13, 2000.