Statistical Performance Evaluation of Biometric Authentication Systems using Random Effects Models

Sinjini Mitra*, Marios Savvides, Anthony Brockwell

Sinjini Mitra: Information Sciences Institute, University of Southern California, mitra@isi.edu
Marios Savvides: ECE Department, Carnegie Mellon University, msavvid@ece.cmu.edu
Anthony Brockwell: Statistics Department, Carnegie Mellon University, abrock@stat.cmu.edu
Abstract
As biometric authentication systems become more prevalent, it is becoming increasingly important to evaluate their performance. The current paper introduces a novel statistical method of performance evaluation for these systems. Given a database of authentication results from an existing system, the method uses a hierarchical random effects model, along with Bayesian inference techniques yielding posterior predictive distributions, to predict performance in terms of error rates using various explanatory variables. By incorporating explanatory variables as well as random effects, the method allows for prediction of error rates when the authentication system is applied to potentially larger and/or different groups of subjects than those originally documented in the database. We also extend the model to allow for prediction of the probability of a false alarm on a "watchlist" as a function of the list size. We consider application of our methodology to three different face authentication systems: a filter-based system, a Gaussian Mixture Model (GMM) based system, and a system based on a frequency domain representation of facial asymmetry.
Index Terms
biometrics, face, authentication, performance evaluation, random effects model, watchlist
I. INTRODUCTION
In the statistical literature, the terms "biometrics" and "biometry" have been used since the early 20th century to refer to the field of development of mathematical methods applicable to data analysis problems in the biological sciences. More recently, however, the term biometrics has been used to denote the unique biological traits (physical or behavioral) of individuals, such as face images, fingerprints, iris, voiceprint, etc., that can be used for identification. Since these traits cannot be stolen, lost or forgotten, they offer better inherent security and reliability in identifying people ([3]), and there is a concerted effort to replace traditional means of identification, such as the use of passwords or PINs, with biometric-based authentication systems. The recently introduced practice of recording biometric information (photo and fingerprint) of foreign passengers at U.S. airports and the proposed inclusion of digitized photos in passports demonstrate the growing importance of biometric authentication to the U.S. federal government.
With any biometric authentication system, it is important to be able to carry out performance evaluation, that is, to assess how well it serves its purpose of matching biometric samples obtained from people to stored templates synthesized from training data. To this end, several statistical methods have been proposed to date, exploiting the correspondence between the authentication problem and statistical decision theory. An overview of these methods can be found in [14] and [11]. [13] discusses certain decision landscapes that can be used to characterize several forms of biometric decision making. [8] and [9] suggest the use of binomial distributions, normal approximations, and bootstrapping for estimating error rate confidence intervals and developing tests of significance, while [15] proposes the use of the beta-binomial distribution. Computations are much simplified if the underlying score distributions conform to a Gaussian distribution, and this can be checked via various descriptive statistics including sample skewness and kurtosis coefficients ([8]). An effective statistical means of presenting the matching performance of any diagnostic device is the "receiver operating characteristic" (ROC) curve, used widely in clinical studies to measure the effectiveness of drugs and medical devices ([16]), and in many signal processing applications ([12]). ROC curves are in common use today for evaluating the performance of biometric systems ([1]).
In a number of cases, an authentication system may be applied in conjunction with a watchlist. A watchlist refers to a database of people who are of some interest. For example, the FBI may be watching criminals who are on a so-called "do not fly" list at airports. Watchlist systems like these, based only on the use of names, tend to produce many false alarms. For example, according to the Washington Post (August 20, 2004), U.S. Senator Edward M. "Ted" Kennedy was stopped and questioned at airports on the East Coast five times in March 2004 because his name appeared on the government's secret "no-fly" list ([2]). This kind of incident demonstrates the fragility of name-based systems, and highlights the potential usefulness of associating biometric identifiers, such as the face or fingerprints, with the name for better and more reliable outcomes. The Face Recognition Vendor Test (FRVT, 2002) conducted by NIST reported that the probability that a system correctly identifies an individual on a watchlist usually deteriorates as the watchlist size grows ([17]), and hence such lists should be kept as small as possible for effective results.
This paper addresses the issue of performance evaluation of a biometric system using a novel statistical framework that allows for prediction of misclassification rates on a population, along with false alarm probabilities for watchlist detection. The rest of the paper is organized as follows. Section II presents our statistical framework for performance evaluation, and Section III introduces our random effects model methodology. Section IV describes the application of our technique to three face authentication systems, and an extension of our method to the watchlist problem is included in Section V. We conclude with some additional discussion in Section VI.
II. THE STATISTICAL FRAMEWORK FOR PERFORMANCE EVALUATION
An authentication algorithm produces a match score Y between the test image of a person and a stored template, which is compared against a threshold τ. If Y > τ, the system returns a match and the person tested is called an "authentic"; otherwise, if Y ≤ τ, the system decides that a match has not been made and the person tested is an "impostor". Then the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are defined as:

FRR = P(Y ≤ τ | Y ∈ Authentic) = ∫_{−∞}^{τ} f_A(x) dx
FAR = P(Y > τ | Y ∈ Impostor) = ∫_{τ}^{∞} h_I(y) dy,   (1)

where f_A(·) and h_I(·) respectively denote the distributions of the match scores for the authentics and the impostors. These are analogous to Type I and Type II errors respectively in statistical inference, and form the quantities of interest in evaluating the matching performance of a biometric device. Generally they are unknown and must be estimated from the observed data.
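In practice, the two integrals in Equation 1 are estimated by their empirical counterparts over observed authentic and impostor scores. A minimal sketch (the score lists and threshold below are illustrative, not taken from the paper's data):

```python
def empirical_error_rates(authentic_scores, impostor_scores, tau):
    """Empirical FRR and FAR at threshold tau (Equation 1)."""
    # FRR: fraction of authentic scores at or below the threshold (false rejections)
    frr = sum(1 for y in authentic_scores if y <= tau) / len(authentic_scores)
    # FAR: fraction of impostor scores above the threshold (false acceptances)
    far = sum(1 for y in impostor_scores if y > tau) / len(impostor_scores)
    return frr, far

# Illustrative scores: authentics tend to score higher than impostors.
authentic = [62.0, 55.5, 48.2, 71.3, 44.9]
impostor = [12.1, 18.4, 9.7, 25.0, 15.2]
frr, far = empirical_error_rates(authentic, impostor, tau=30.0)
# Both rates are 0 here because the two illustrative samples are fully separated at tau = 30.
```

Sweeping tau over a grid of thresholds traces out the empirical analogue of the error curves discussed later.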
Many biometric authentication systems developed today are intended for use with databases containing information on millions of people, but are only tested on databases with hundreds of people. (For example, millions of people pass through airports every day. False alarms in no-fly watchlists lead to much inconvenience for travelers, [3].) The limited scope of testing makes it difficult to address questions about the expected performance of the system when used with the full database. For example, a system with a false alarm rate of 1% would yield around 10,000 false alarms in use on a database of size 1,000,000, but could well appear to be functioning perfectly on a test database of size 100. Moreover, it is known that systems based on the use of images, particularly faces, have difficulty when images are taken under different illumination levels, orientations, expression types, etc. In other words, false alarm rates can depend on a number of factors. In light of these issues, the following questions are pertinent.
• How do different image properties affect the match score of a biometric system in the general population, and what are the predicted score distributions for authentics and impostors, based on these image properties?
• What error rates (both FAR and FRR) can be expected when a certain biometric system is applied to a large unknown database?
• How does the performance of a system on a watchlist (measured in terms of false alarms) change with the list size?
The effect of image properties on the score can be studied with the help of a simple regression model ([18]) using the match score as the response variable. The estimated regression coefficients tell us to what extent the performance of the system varies with variations in the particular image properties. [24] proposed the use of ANOVA to study the statistical effects of demographic covariates such as age, sex, race, facial hair, etc. on face recognition algorithms, and [25] made use of logistic regression to relate subject covariates to rank-one recognition rates. Although effective, the regression/ANOVA models are fixed effects models, meaning that inference about the covariate effects is restricted to the database at hand and cannot easily be generalized to a larger (or different) database drawn from the same population. These analyses, although useful for any particular dataset, provide only a baseline for the more extensive studies required to make general inferences for a population. Such a framework can be provided by a random effects model ([21]) and is the focus of this paper.
A random effects model assumes that the particular subset of subjects in the present database is a random sample from a bigger population, so that inference extends easily to that bigger population. In other words, valid inference about covariate effects and predicted error rates (confidence intervals, say) can be made if another subset of individuals (different from the current sample) is drawn from this population. (Some work has been done on this kind of problem: [30] proposed a binomial model for comparing watchlist recognition performance with empirical identification and false match rates for large populations, but we aim to develop a more generally applicable framework.) Unlike fixed effects models, the random effects model takes into account heterogeneity across individuals in their regression coefficients with the help of a probability distribution, thus capturing more of the inherent variability in data that involve repeated measures or multilevel structures. [26] first made use of random effects in a Generalized Linear Mixed Model (GLMM) framework for predicting the probability of correct verification of the PCA algorithm at different false acceptance rates based on subject covariates including hairstyle, gender, age, etc. In this paper, we develop this random effects framework in the context of predicting performance of biometric systems in terms of various explanatory variables.
III. THE RANDOM EFFECTS MODEL
Consider a certain biometric authentication system whose performance we wish to evaluate. Suppose that there are k people in a certain database, and n_i test images for the ith person, i = 1, …, k (which gives a total of n = Σ_{i=1}^{k} n_i test images for the entire database). For each of these k individuals, there is a stored template developed in the training stage. Typically, in the authentication stage, each of the n images is matched to each of these k templates.

Let Y_ij denote the match score for the jth test image of the ith subject in the database, when tested against one of the templates stored in the database. Also let x_ij^(m), m = 1, 2, …, M, be a collection of explanatory variables associated with the image. Rather than explicitly indicating in the notation for Y_ij which particular template the image is matched against, we typically assume (for the sake of notational convenience) that one of the covariates is an indicator variable equal to one if the subject is matched against his/her own template, and equal to zero otherwise. In addition, any factor that is expected to affect the matching performance of a biometric device can be used as a covariate, including subject factors such as age, sex, and hairstyle. Image properties, such as the level of noise, levels of occlusion (if present), different expressions, and different illumination levels, can also be used, along with other concomitant variables like the number of training images used and system design parameters.
Our random effects model is given by

g(Y_ij) ~ind. N( α_i + Σ_{m=1}^{M} β_i^(m) x_ij^(m), σ² ),   i = 1, …, k,  j = 1, …, n_i,   (2)

θ_i = (α_i, β_i^(1), …, β_i^(M))^T ~ind. MVN(θ_0, Σ),   i = 1, …, k,   (3)

where g(·) is a monotonic function referred to as the link function, θ_0 = (α_0, β_0^(1), …, β_0^(M)) is an (M+1)-dimensional vector, and Σ is an (M+1) × (M+1) matrix. The link function is a transformation chosen to ensure conformity of the response variable to the underlying assumptions of the model, such as normality and homoscedasticity.
This model is a multivariate generalization of the hierarchical random effects model (used, for example, by [21] for the analysis of the weights of young laboratory rats). Apart from the possibly nonlinear link function, it supposes linear dependency with homogeneous errors, but allows for the possibility of different slopes and intercepts for each individual, thus accounting for the heterogeneity in the effects across individuals. Although not included in the form of the model given above, it is straightforward to construct variants including interaction effects between covariates.
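To make the two-level structure of Equations 2-3 concrete, the following sketch simulates transformed scores from the model under simplifying assumptions: an identity link, a single authenticity covariate, and a diagonal Σ. All parameter values and names here are illustrative, not estimates from the paper:

```python
import random

def simulate_scores(k, n_i, theta0, Sigma_diag, sigma, seed=0):
    """Simulate g(Y_ij) from the hierarchical model (2)-(3).

    For simplicity Sigma is taken to be diagonal, so each subject's
    coefficient vector theta_i is drawn componentwise around theta0.
    A single covariate is used: the authenticity indicator x
    (1 = image matched against its own subject's template).
    """
    rng = random.Random(seed)
    scores = []
    for i in range(k):
        # Subject-level draw: theta_i ~ MVN(theta0, Sigma), diagonal case
        alpha_i = rng.gauss(theta0[0], Sigma_diag[0] ** 0.5)
        beta_i = rng.gauss(theta0[1], Sigma_diag[1] ** 0.5)
        for j in range(n_i):
            x = j % 2  # alternate impostor / authentic trials
            # Observation-level draw: g(Y_ij) ~ N(alpha_i + beta_i * x, sigma^2)
            scores.append((i, x, rng.gauss(alpha_i + beta_i * x, sigma)))
    return scores

data = simulate_scores(k=5, n_i=10, theta0=(2.0, 1.5), Sigma_diag=(0.1, 0.05), sigma=0.3)
```

Because the authenticity coefficient is positive, simulated authentic trials score systematically higher than impostor trials, mimicking the separation the model is designed to capture.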
We adopt a Bayesian approach for estimating model parameters and making inference. First, let

y = (g(y_ij), i = 1, …, k, j = 1, …, n_i)

denote our observed match scores, after transforming using the link function g(·). We assign (conjugate) prior distributions to σ², θ_0 and Σ:

σ² ~ IG(a, b),   θ_0 ~ N(η, C),   Σ^{-1} ~ Wishart((ρR)^{-1}, ρ),   (4)

where R is a matrix and ρ ≥ 2 is a scalar degrees-of-freedom (df) parameter. Note that this notion of df is different from the one used in [13]: we use it as a parameter characterizing the Wishart distribution ([21]), while the latter used it as a means to assess the complexity of a biometric, essentially measured by the number of independent dimensions of variation, or the number of independent yes-no questions that a biometric decision is based upon. All the hyperparameters in the model (parameters of the prior distributions) a, b, η, C, ρ, R are assumed known, and the parameters to be estimated are θ_i, θ_0, Σ and σ². Owing to the use of conjugate priors, the posterior distributions of the unknown parameters have closed-form expressions. We use Gibbs sampling ([22]) to simulate from the conditional posterior distributions of each of the four unknown parameters given the remaining three, termed full conditionals, given by:
θ_i | y, θ_0, Σ^{-1}, σ² ~ N( D_i ( (1/σ²) X_i^T y_i + Σ^{-1} θ_0 ), D_i ),   i = 1, …, k,   (5)

where D_i^{-1} = (1/σ²) X_i^T X_i + Σ^{-1}, y_i = (y_i1, …, y_in_i)^T, and X_i is the n_i × (M+1) matrix whose jth row is (1, x_ij^(1), …, x_ij^(M)),

θ_0 | y, θ_i, Σ^{-1}, σ² ~ N( V (k Σ^{-1} θ̄ + C^{-1} η), V ),   where V = (k Σ^{-1} + C^{-1})^{-1} and θ̄ = (1/k) Σ_{i=1}^{k} θ_i,   (6)

Σ^{-1} | y, θ_i, θ_0, σ² ~ Wishart( ( Σ_{i=1}^{k} (θ_i − θ_0)(θ_i − θ_0)^T + ρR )^{-1}, k + ρ ),   (7)

σ² | y, θ_i, θ_0, Σ^{-1} ~ IG( n/2 + a, (1/2) Σ_{i=1}^{k} (y_i − X_i θ_i)^T (y_i − X_i θ_i) + b ),   where n = Σ_{i=1}^{k} n_i,   (8)
where IG denotes an inverse Gamma distribution (if Y ~ Gamma, then 1/Y ~ IG). The hyperprior values are chosen to yield very vague priors, namely C^{-1} = 0 (so that η disappears completely from the full conditionals) and a = b = 1/ε where ε = 0.001 (in accordance with [21]). To ensure proper mixing of our Markov chains, we use 10 different starting values for our chains and assess convergence and mixing properties with the help of trace plots, cumulative sums and autocorrelation plots ([27]).
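The update cycle of Equations (5)-(8) can be sketched for the simplest special case: an intercept-only model (M = 0) with the vague choice C^{-1} = 0, where the Wishart draw for Σ^{-1} reduces to a Gamma draw for a scalar precision. This is an illustrative sketch under those simplifying assumptions, not the authors' implementation:

```python
import random

def gibbs_random_intercept(y, n_iter=2000, a=0.001, b=0.001, rho=2.0, R=1.0, seed=1):
    """Gibbs sampler for the intercept-only (M = 0) special case.

    y is a list of lists: y[i] holds the transformed scores for subject i.
    The full conditionals (5)-(8) reduce to Normal, Normal, Gamma, Gamma.
    Returns the chain of (alpha0, sigma2, tau2) draws, where tau2 is the
    scalar between-subject variance playing the role of Sigma.
    """
    rng = random.Random(seed)
    k = len(y)
    n = sum(len(yi) for yi in y)
    alpha = [sum(yi) / len(yi) for yi in y]  # start subject effects at their means
    alpha0, sigma2, tau2 = 0.0, 1.0, 1.0
    chain = []
    for _ in range(n_iter):
        # (5) subject effects: precision-weighted compromise of data and alpha0
        for i, yi in enumerate(y):
            prec = len(yi) / sigma2 + 1.0 / tau2
            mean = (sum(yi) / sigma2 + alpha0 / tau2) / prec
            alpha[i] = rng.gauss(mean, prec ** -0.5)
        # (6) population mean with C^-1 = 0: alpha0 ~ N(mean(alpha), tau2/k)
        alpha0 = rng.gauss(sum(alpha) / k, (tau2 / k) ** 0.5)
        # (7) between-subject precision: Gamma((k+rho)/2, scale 2/(S + rho*R))
        S = sum((ai - alpha0) ** 2 for ai in alpha)
        tau2 = 1.0 / rng.gammavariate((k + rho) / 2.0, 2.0 / (S + rho * R))
        # (8) error precision: Gamma(n/2 + a, scale 1/(SSE/2 + b))
        sse = sum((yij - alpha[i]) ** 2 for i, yi in enumerate(y) for yij in yi)
        sigma2 = 1.0 / rng.gammavariate(n / 2.0 + a, 1.0 / (sse / 2.0 + b))
        chain.append((alpha0, sigma2, tau2))
    return chain

# Small synthetic data set: 8 subjects, 5 scores each, subject means near 3
y = [[3.0 + 0.1 * i + 0.05 * j for j in range(5)] for i in range(8)]
chain = gibbs_random_intercept(y, n_iter=500)
```

Because every block is drawn from its exact full conditional, no accept/reject step is needed; the general M > 0 case replaces the scalar draws with the multivariate Normal and Wishart draws of (5)-(7).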
Let Ψ ≡ {θ_i, θ_0, Σ, σ²} denote the collection of parameters to be estimated for the model. Then the Gibbs sampler yields a Markov chain {Ψ^(k), k = 1, 2, …} whose distribution converges to the true posterior distribution of the parameters. The parameters are estimated using the posterior mean formed by the ergodic average of the Markov chain. To reduce bias associated with the fact that the chain takes time to converge, we discard the first N_1 samples as burn-in; that is, our parameter estimates are

Ψ̂ = (1/(N − N_1)) Σ_{j=N_1+1}^{N} Ψ^(j) ≈ E{Ψ | y}.   (9)

(We choose N_1 by visual inspection, choosing it so that after iteration N_1 the Markov chain appears to have settled into its steady-state behavior.)
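The ergodic average of Equation 9, together with percentile-based credible intervals, can be computed from a stored scalar chain as follows (the chain below is synthetic, with an artificial transient standing in for the pre-convergence phase):

```python
def posterior_summary(chain, burn_in):
    """Ergodic average (Equation 9) and a 95% percentile credible
    interval from a scalar parameter chain, discarding the first
    burn_in draws."""
    post = sorted(chain[burn_in:])
    m = len(post)
    mean = sum(post) / m
    lo = post[int(0.025 * m)]
    hi = post[int(0.975 * m)]
    return mean, (lo, hi)

# Synthetic chain: a short transient, then fluctuation around 1.5
chain = [10.0, 5.0, 3.0] + [1.5 + 0.01 * ((-1) ** t) for t in range(1000)]
mean, (lo, hi) = posterior_summary(chain, burn_in=3)
# mean is 1.5 and the interval is tight around 1.5 once the transient is dropped
```

Without the burn-in discard, the transient draws would pull the average visibly away from the stationary value, which is exactly the bias Equation 9 avoids.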
A. Inference

We make inference from this model based on the marginal posteriors for the population parameters θ_0 = (α_0, β_0^(1), …, β_0^(M)) and the posterior predictive distributions of the match scores. The parameters θ_0 determine quantitatively the effects of the different covariates on the authentication score that is expected in a general population; more precisely, they determine by how much the score changes for a unit change in the corresponding covariate value while keeping the others fixed. Standard deviations of the posterior distributions as well as credible intervals (confidence intervals based on the posterior distributions) can be constructed to assess the reliability of the point estimates. The posterior predictive distributions are computed by generating new data from Equation 2 using the N − N_1 post-convergence values of the parameters (after burn-in). The predictive distribution of the link-function-transformed match score g(Y) for a score Y of a new individual from the population can be estimated using a density of the form

p(g(y) | y) = ∫ p(g(y) | θ_i, σ²) p(θ_i, σ² | y) dθ_i dσ².   (10)
We fit a mixture of Gaussian kernels to the posterior predictive distributions of g(y), since the authentic and impostor score distributions are expected to be well-separated. We also use Gaussian kernels to estimate the posterior distribution of θ_0 using the post-convergence values of the parameter estimates.
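A Gaussian kernel density estimate of this kind can be sketched as follows; the draws and bandwidth are illustrative stand-ins for the posterior predictive samples:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Density estimate built from one Gaussian kernel per draw:
    f(x) = (1/(n*h)) * sum_s phi((x - s)/h), phi the standard normal pdf."""
    n = len(samples)
    c = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return c * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

# Illustrative draws from a well-separated two-component mixture
draws = [1.9, 2.0, 2.1, 4.0, 4.1, 4.2]
f = gaussian_kde(draws, bandwidth=0.2)
# The estimated density is high near the two modes and nearly zero in the gap.
```

With well-separated authentic and impostor draws, the kernel estimate naturally recovers the bimodal mixture shape described above.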
The posterior predictive distributions of the authentication scores can be used to estimate the predicted FAR and FRR for the system. One way to do this is simply to simulate directly from the distribution of g(Y) as specified above, invert the link-function transformation g(·), and repeat many times (with appropriately chosen values of x_{·j}) to form empirical estimates of the FAR and FRR for specified threshold values τ. Another approach is based on the observation that the posterior predictive distribution of g(Y) can be well-approximated by a Gaussian distribution. Under this approximation, we can obtain exact closed-form expressions for the FAR and FRR as functions of the threshold τ. The two error rates are

FRR = P(g(Y) ≤ g(τ) | Y ∈ Authentic)
FAR = P(g(Y) > g(τ) | Y ∈ Impostor).   (11)
Approximating g(Y) ~ N(μ_A, σ_A²) when Y is the score of an authentic and g(Y) ~ N(ν_I, η_I²) when Y is the score of an impostor, the error rates can be written in closed form in terms of the cumulative distribution function Φ of a standard normal random variable as

FRR = Φ( (g(τ) − μ_A) / σ_A ),   FAR = 1 − Φ( (g(τ) − ν_I) / η_I ).   (12)

The parameters of the Gaussian distributions are estimated from the predictive posteriors, with appropriately chosen covariate values x_{·j}. One would typically then plot FRR and FAR for different values of τ; the point where FRR and FAR are equal gives the predicted Equal Error Rate (EER) for the system.
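Both Equation 12 and the equal-error threshold have simple closed forms: setting FRR = FAR forces the two standardized distances to the threshold to be equal and opposite. A sketch with illustrative (not estimated) parameter values:

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_rates(g_tau, mu_A, sigma_A, nu_I, eta_I):
    """FRR and FAR of Equation 12 at a link-transformed threshold g(tau)."""
    frr = Phi((g_tau - mu_A) / sigma_A)
    far = 1.0 - Phi((g_tau - nu_I) / eta_I)
    return frr, far

def equal_error_rate(mu_A, sigma_A, nu_I, eta_I):
    """Solve FRR = FAR: (g_tau - mu_A)/sigma_A = -(g_tau - nu_I)/eta_I."""
    g_tau = (eta_I * mu_A + sigma_A * nu_I) / (sigma_A + eta_I)
    return g_tau, Phi((g_tau - mu_A) / sigma_A)

# Illustrative values: authentic ~ N(4.0, 0.6^2), impostor ~ N(2.0, 0.15^2)
g_tau, eer = equal_error_rate(4.0, 0.6, 2.0, 0.15)
```

Evaluating `error_rates` over a grid of thresholds produces the theoretical error curves plotted in the applications below.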
IV. APPLICATIONS
In this section we describe the application of our proposed methodology to three existing face authentication systems: (i) the Minimum Average Correlation Energy (MACE) filter system ([1]); (ii) a system based on a Gaussian Mixture Model (GMM) and Fourier domain phase spectra ([23]); and (iii) a system based on the frequency domain representation of facial asymmetry ([28]). These systems are applied to a database at hand, and the match scores from those applications are then used, along with the covariate information available from the data, to develop the random effects models for making inference and prediction.
A. Data

For the first two face authentication systems, we use a subset of the "CMU-PIE Database" ([4]), which contains frontal images of 65 people under 21 different illumination conditions and neutral expressions. Images belonging to one person from the database are shown in Figure 1. All these images have been normalized using affine transformations based on the locations of the eyes and nose, as is common and necessary for most computer vision problems. The final cropped images are grayscale and of dimension 100 × 100 pixels.

Fig. 1. Sample images of a person from the CMU-PIE database.

For the third authentication system, we use a subset of the "Cohn-Kanade Facial Expression Database" ([29]), consisting of images of 55 individuals expressing three different kinds of emotions: joy, anger and disgust. Each person was asked to express one emotion at a time by starting with a neutral expression and gradually evolving into its peak form. The data thus consist of video clips of people showing an emotion, each clip being broken down into several frames. We work with 9 frames in all per person: 3 neutral images and 6 images with peak forms of the 3 emotions. The raw images are cropped and normalized like the PIE images, and the final images are grayscale and of dimension 128 × 128 pixels. Some normalized images from our database are shown in Figure 2. We use this database because it was used to develop the asymmetry-based face authentication system ([28]); the CMU-PIE images have illumination variations in them and hence are unsuitable for studying the role of facial asymmetry in authentication tasks.

Fig. 2. Sample images from the Cohn-Kanade database.
B. Application: The MACE Filter System

The MACE filter system was introduced as a face authentication system based on a linear filter constructed in the frequency domain ([1]). One filter is synthesized for each subject based on 3 training images chosen to represent the three different types of lighting variations in the PIE images: left shadows, balanced, and right shadows. We treat the remaining 18 images per person as test images and apply each of the 65 filters to each image via correlation. The resulting authentication criterion for this system is known as the Peak-to-Sidelobe Ratio (or PSR for short), and it measures the height of the correlation peak at the origin of the correlation plane relative to the neighboring area. More details about this method are available in [1]. The random effects model in this case can be written as:

log(Y_ij) ~iid. N( α_i + β_i x_ij^(0) + γ_i x_ij^(1), σ² ).   (13)

Here Y denotes the PSR value, and the logarithmic link function is used because log(PSR) conforms better to the normality and homoscedasticity assumptions of the random effects model than PSR (shown by Figure 3 and Table I). X^(0) represents the authenticity covariate (a binary variable assuming the value "1" for an authentic test image and "0" for an impostor test image) and X^(1) represents the covariate denoting the illumination condition of an image (values 1-21 for the 21 images of a person).
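The covariate encoding just described can be sketched as a small helper; the function name and argument layout are our own illustration, not part of the MACE system:

```python
def design_row(test_subject, template_subject, illumination):
    """Covariate row for model (13): intercept, authenticity indicator
    x0 (1 when an image is matched against its own subject's template),
    and illumination condition x1 (coded 1-21 in the PIE data)."""
    x0 = 1 if test_subject == template_subject else 0
    return [1, x0, illumination]

# An image of subject 7 under illumination condition 3, matched
# against subject 7's own template and against subject 12's template:
authentic_row = design_row(7, 7, 3)   # [1, 1, 3]
impostor_row = design_row(7, 12, 3)   # [1, 0, 3]
```

Stacking such rows over all test-image/template pairs for subject i yields the design matrix X_i used in the full conditionals (5)-(8).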
Fig. 3. Histograms of the PSR and log(PSR) values from the CMU-PIE database (all combined, and separately for authentics and impostors): (a) PSR, (b) log(PSR), (c) log(Authentic PSR), (d) log(Impostor PSR). The x-axis shows the PSR or log(PSR), whichever is applicable, and the y-axis shows the corresponding frequencies.
TABLE I
SOME DESCRIPTIVE STATISTICS FOR PSR AND LOG(PSR) DISTRIBUTIONS FROM APPLYING THE MACE FILTER TO THE TEST IMAGES.

Variable  | Measure            | Authentic | Impostor | All combined
PSR       | Skewness           | 1.2389    | 0.1254   | 6.9209
PSR       | Kurtosis           | 3.8564    | 3.0722   | 91.7597
PSR       | Standard deviation | 23.4618   | 3.2542   | 7.3048
log(PSR)  | Skewness           | 0.3437    | 0.7087   | 0.3954
log(PSR)  | Kurtosis           | 2.8386    | 1.5201   | 4.1944
log(PSR)  | Standard deviation | 5.4122    | 8.2985   | 8.2872
The choice of the starting values of the parameter chains did not affect the results in any way: similar convergence values are obtained from the multiple chains. We run the simulations from the posterior conditionals given in Equations (5)-(8) for 5000 iterations, and some diagnostic plots to assess convergence of the parameter chains are shown in Figure 4. Both sets of plots show satisfactory convergence and mixing: no significant correlation at different iteration lags, and sample paths apparently exploring the region of support after moving away quickly from the starting values (β_1, θ_i, σ² and Σ also converged, but we do not include those plots here for space considerations). No cross-correlation was observed among the different parameter chains.
The parameter estimates of θ_0 are formed using the posterior means after a burn-in of length 2000, and they appear in Table II along with the associated 95% credible intervals. These form the basis for making population inference. α_0 denotes the mean log(PSR) value over the entire population, β_0 denotes the difference in the mean log(PSR) values between an authentic and an impostor person in the population, and γ_0 denotes the change in the mean log(PSR) value when the illumination level changes by unity. The credible interval for γ_0 shows that the illumination level of an image does not significantly affect the PSR (since it contains zero), and hence the MACE filter system is expected to be robust to illumination variations. The authenticity variable, however, is seen to have a statistically significant effect on PSR (its interval does not contain zero); this is reasonable and shows that the MACE system is able to distinguish between the authentic and the impostor score distributions. The estimated posterior marginal distributions of θ_0 appear in Figure 5, giving an idea about the nature of its values for a general population of face images (we have omitted the distribution for γ_0 since it was not observed to be significant).

Fig. 4. Diagnostics to assess convergence of the parameter chains for θ_0: α_0 (left) and β_0 (right). The first row shows the trace plots, where the x-axis shows the iterations and the y-axis the values of the respective parameters. The second row shows the autocorrelation functions.
TABLE II
POINT ESTIMATES AND 95% CREDIBLE INTERVALS FOR THE POPULATION PARAMETERS θ_0 FOR THE MACE SYSTEM.

Parameter | Estimate | Lower Limit | Upper Limit
α_0       | 1.9737   | 0.7504      | 3.1970
β_0       | 1.4634   | 1.2874      | 1.6394
γ_0       | -0.0184  | -0.1965     | 0.1597
Fig. 5. Estimated marginal posterior distributions of θ_0, using Gaussian kernel densities: (a) α_0, (b) β_0. The x-axis shows the respective parameter values and the y-axis the probability values.

Next, we generate values of log(PSR) from the model using the post-convergence values of θ_i and σ² (iterations 2001-5000 of the Gibbs sampler), and estimate the density with the help of Gaussian kernels (Figure 6). As can be seen clearly, there exists a clear separation between the predicted log(PSR) values of authentic and impostor people; in fact, the distribution of log(PSR) appears to be a mixture of two distributions: one very nearly Gaussian (impostor) and the other slightly positively skewed (authentic). The mean of the authentic cases is also higher than that of the impostor cases. The little mass on the rightmost end of the tail of the authentic score distribution shows that the system predicts a few very high authentic log(PSR) values in the population (the bimodality here is not serious enough to cause concern for subsequent analysis). The amount of overlap in the tails of the two distributions in Figure 6(a) (the right tail of the impostor distribution and the left tail of the authentic distribution) determines the false alarm error rates and the false negative rates.

Fig. 6. Estimated predictive posterior distribution of log(PSR) using a Gaussian kernel density function: (a) all values, (b) authentic, (c) impostor. The x-axis shows the log(PSR) values, and the y-axis shows the associated probability values.

Based on the posterior predictive distributions, the parameters of the Gaussian densities for the authentic and the impostor log(PSR) are estimated as: μ̂_A = 4.1331, ν̂_I = 1.9265, σ̂_A = 0.6316 and η̂_I = 0.1471. The resulting FAR and FRR for different selected thresholds on the PSR values (exponentiating the log(PSR) values), computed according to Equation 12, are shown in Figure 7(a) (the theoretical curve), while the empirical error curve appears in Figure 7(b). Note that these curves are variants of the popular ROC curves; we use them instead to represent the performance of a biometric system because we use the EER as our single quantitative performance measure. Both curves yield a similar predicted EER value of around 1.2-1.5% at optimal threshold PSR values of 10-15. The threshold value is also close to the rule-of-thumb value of 20 which is conventional in its applications ([1]).
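Plugging the reported estimates into Equation 12 and solving FRR = FAR analytically gives a threshold consistent with the 10-15 band read off the curves; note that the analytic intersection need not coincide exactly with an EER read graphically from plotted curves:

```python
import math

# Reported estimates for the MACE system (from the text above)
mu_A, sigma_A = 4.1331, 0.6316   # authentic log(PSR) mean and sd
nu_I, eta_I = 1.9265, 0.1471     # impostor log(PSR) mean and sd

# Threshold on the log scale where the standardized distances coincide (FRR = FAR)
g_tau = (eta_I * mu_A + sigma_A * nu_I) / (sigma_A + eta_I)
tau = math.exp(g_tau)  # invert the log link back to the PSR scale

# EER = Phi((g_tau - mu_A)/sigma_A), using the erf form of the normal cdf
eer = 0.5 * (1.0 + math.erf((g_tau - mu_A) / (sigma_A * math.sqrt(2.0))))
# tau falls in the 10-15 threshold band quoted for the plotted curves
```

The computed threshold lands inside the optimal band quoted above, and the corresponding analytic error rate is small, as expected from the near-separation of the two predictive densities.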
Fig. 7. Predicted error curves for authentication using the MACE filter system: (a) theoretical curve, (b) empirical curve. Each panel plots error rates (FAR, FRR) against thresholds. The solid descending curve represents the FAR and the ascending dashed one the FRR. The point at which the FAR and FRR curves meet is the EER.
C. Application: GMM-based System

We next apply the random effects model technique to a GMM-based authentication system synthesized in the frequency domain based on the phase spectra of a face image ([23]). For this system, classification and verification are performed using a MAP estimate, and hence the authentication match score is the posterior log-likelihood of the test images. 10 images per person are used to build the model and the remaining 11 are used for testing. Using similar covariates as for the MACE filter system, the random effects model in this case is:

Y_ij ~iid N( α_i + β_i x_ij^(0) + γ_i x_ij^(1), σ² ),   (14)

where X^(0) denotes the explanatory variable representing the authenticity of an image, X^(1) represents the illumination level of an image, and Y is the posterior log-likelihood (the match score). The histograms of the posterior log-likelihood values for the PIE database test images are shown in Figure 8, and they indicate that the assumptions of normality and homoscedasticity hold sufficiently well (the impostor distribution is slightly skewed, but the departure is insignificant).
Fig. 8. Histograms of the posterior log-likelihood used for authenticating the PIE database using the GMM-based method: (a) authentic, (b) impostor.
The Gibbs sampler used to simulate from the posterior distribution stabilized around 500 iterations in this case, which we use as burn-in for a total run of 2000 iterations. The trace plots for the parameter of interest θ_0 in Figure 9 show satisfactory convergence. As with the MACE filter, we looked at the autocorrelation and cross-correlation of the parameter chains, and they all indicated good mixing (we omit those plots here for space reasons). Table III shows the point estimates of θ_0 and the associated credible intervals. α_0 denotes the mean log-likelihood value over the entire population and β_0 denotes the difference in the mean log-likelihood values between an authentic and an impostor person in the population. The estimated posterior marginals for θ_0 appear in Figure 10. Moreover, both α_0 and β_0 are significantly different from zero (their intervals do not include 0); hence the effects of the corresponding covariates on the log-likelihood values are statistically significant for any large population. As with the MACE system, the illumination variation does not have a significant effect on the match score, which is reasonable since the GMM method was also shown to be robust to illumination changes.

Fig. 9. Trace plots for α_0 and β_0. The x-axis shows the iterations and the y-axis the corresponding parameter values.
TABLE III
Point estimates and 95% credible intervals for the population parameters θ_0 for the GMM-based system.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         −1694.4    −1698.0       −1690.8
β_0         81.7       76.7          86.7
γ_0         2.8        −5.4          11.0
Next, we generate values of the log-likelihood from the model using the post-convergence values of θ_i and σ² (iterations 501-2000 of the Gibbs sampler), and estimate the density with the help of Gaussian kernels (Figure 11). The overlap in the tails of the authentic and impostor distributions is fairly negligible in this case, indicating a much lower risk of false alarms and false negatives than for the MACE filter system.
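The density estimation step used for Figure 11 can be sketched as a plain fixed-bandwidth Gaussian kernel density estimator over the posterior predictive draws (the bandwidth and toy draws below are arbitrary illustrations, not the values used in the paper):

```python
import math

def gaussian_kde(samples, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate built from a set
    of posterior predictive draws."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        # Average of Gaussian bumps centred at each draw.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

# Toy draws concentrated near -1700 (impostor-like log-likelihood values).
draws = [-1700.0 + 10.0 * math.sin(0.1 * k) for k in range(200)]
f = gaussian_kde(draws, bandwidth=5.0)
```

Evaluating the returned `density` on a grid of thresholds produces the smooth curves from which the tail overlap, and hence the error rates, can be read off.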
(a) α_0 (b) β_0
Fig. 10. Estimated marginal posterior distributions of θ_0, using Gaussian kernel densities.
(a) All values (b) Authentic (c) Impostor
Fig. 11. Predictive posterior distribution of the posterior log-likelihood, using Gaussian kernel density functions. The x-axis shows the log-likelihood values and the y-axis the associated probabilities.
The parameter estimates of the Gaussian posterior predictive distributions for the authentic and the impostor log-likelihoods are: μ̂_A = −1610.3, ν̂_I = −1701.2, σ̂_A = 19.83 and η̂_I = 14.7. The predicted FAR and FRR for the system appear in Figure 12. Both curves have a similar EER of around 0.8-1% at optimal threshold log-likelihood values between −1650 and −1700. These are the verification results one would expect when applying the GMM-based system to a general large database of face images.
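The theoretical error curves implied by these four Gaussian parameters can be computed directly. A minimal sketch in pure Python, assuming the convention that a probe is accepted when its match score is at or above the threshold (consistent with the descending FAR curves in Figure 12); the exact EER found this way depends on the convention and discretization, so it may differ slightly from the empirical estimate quoted above:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Posterior predictive parameters quoted above for the GMM-based system.
mu_A, sigma_A = -1610.3, 19.83   # authentic log-likelihoods
nu_I, eta_I = -1701.2, 14.7      # impostor log-likelihoods

def far(tau):
    # Impostor accepted: score at or above the threshold.
    return 1.0 - Phi((tau - nu_I) / eta_I)

def frr(tau):
    # Authentic rejected: score below the threshold.
    return Phi((tau - mu_A) / sigma_A)

# Scan thresholds and take the crossing point of the two curves as the EER.
taus = [-1750.0 + 0.1 * k for k in range(2001)]      # -1750 .. -1550
tau_star = min(taus, key=lambda t: abs(far(t) - frr(t)))
eer = 0.5 * (far(tau_star) + frr(tau_star))
```

The crossing threshold `tau_star` lands in the −1650 to −1700 neighbourhood quoted above, with a sub-percent EER.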
(a) theoretical (b) empirical
Fig. 12. Predicted error curves for authentication for the GMM-based system. The descending solid curve represents the FAR and the ascending dashed one the FRR.

D. Application: Facial Asymmetry-based System

Our next application pertains to a face authentication system based on a frequency-domain representation of facial asymmetry ([28]). One-bit codes, called the Facial Asymmetry Code (FAC for short), are used as features, along with a Hamming distance classifier that computes the number of matched bits between a test image and the trained templates to decide whether the image is authentic or not. The templates are synthesized from the 3 neutral images of each person, and the 6 images with expression variations are used as the test images in this case. The random effects model in this case is:
Y_ij ~ N(α_i + β_i x_{0,ij} + γ_i x_{1,ij}, σ²),   (15)
where X_0 denotes the explanatory variable representing authenticity, X_1 denotes the covariate representing the different expressions (0 for neutral, 1 for joy, 2 for anger and 3 for disgust), and Y_ij is the number of matched bits of the jth test image for the ith person when compared to a stored template. Figure 13 shows the distributions of the number of matched bits for the authentics and the impostors using the Cohn-Kanade database; as in the case of the two earlier systems, these do not exhibit any major departure from the assumptions of normality and equality of variance.

(a) Authentic (b) Impostor
Fig. 13. Histograms of the number of matched bits used for authenticating the Cohn-Kanade database using the FAC-based system.

A Gibbs sampler based on the posterior distributions is used to fit the model, and trace plots for θ_0 appear in Figure 14, which show satisfactory convergence in about 2000 iterations. The chains stabilized around 500 iterations, which was used as the burn-in. Again, autocorrelation plots are not included here, but they showed good mixing properties of the chains.

Fig. 14. Trace plots of the parameters θ_0: α_0 and β_0.

Table IV shows the parameter estimates based on the posterior means and the credible intervals for θ_0. α_0 denotes the mean number of matched bits over the entire population, and β_0 denotes the difference in the mean number of matched bits between an authentic and an impostor person in the population. The estimated posterior marginals for θ_0 appear in Figure 15. Moreover, both α_0 and β_0 are significantly different from zero; hence the effect of authenticity on the number of matched bits is significant for the general population. As with illumination variation in the other authentication methods, expression changes are found not to affect the match score significantly for this system. This is evidence that the asymmetry-based system can be expected to be sufficiently robust to expression changes in the test images for any database.
TABLE IV
Point estimates and 95% credible intervals of θ_0 for the asymmetry-based system.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         13.83      12.19         15.46
β_0         752.11     704.65        799.57
γ_0         8.56       −2.64         19.76

(a) α_0 (b) β_0
Fig. 15. Estimated marginal posterior distributions of θ_0, using Gaussian kernel densities.

The posterior predictive distribution p(y_ij | y) of the number of matched bits, estimated using Gaussian kernel density functions, is shown in Figure 16. It shows a clear separation between the predicted numbers of matched bits for authentic and impostor people; in fact, the overlap in the tails of the authentic and impostor distributions is negligible, which indicates a much lower risk of false alarms and false negatives than for the two earlier systems.

(a) All values (b) Authentic (c) Impostor
Fig. 16. Estimated predictive posterior distribution of the number of matched bits, using Gaussian kernel density functions.

One thing to note about these posterior predictive distributions is that they look far removed from Gaussianity, with occasional bumps in the density. So, unlike the two earlier cases, we do not fit a Gaussian kernel to these distributions but simply use the posterior predictive values to obtain empirical estimates of the predicted error rates. The resulting error curves appear in Figure 17, which shows an EER of around 0.01% at an optimal threshold of 1000 matched bits. This shows that the
predicted authentication performance of the asymmetry-based system for the general population is better than that of either the MACE system or the GMM-based system. We wish to explore options for fitting other robust kernels to the posterior distributions that would yield nice closed-form expressions for the error rates.
Fig. 17. Predicted error curves for authentication using the asymmetry-based system. The descending solid curve represents the FAR and the ascending dashed one the FRR.
E. Extensions: Accounting for Correlations

All three face authentication systems considered in this paper have been observed to satisfy the primary underlying assumptions of the random effects model framework: normality and homoscedasticity. However, our model also assumes independence among successive match-score responses, which is not reasonable, as multiple test images belonging to a particular individual are likely to be highly correlated. Note, however, that the proposed model is an initial study into the utility of such an approach to large-scale inference and was therefore tested under the simplifying condition of independence. The model can be extended in a straightforward manner to incorporate a correlation structure among the match scores of each individual using a longitudinal setup, while still assuming that different individuals are independent. Thus our hierarchical random effects model for correlated data, obtained by generalizing the independence model, is:
y_i ~ N(X_i θ_i, V_i),   i = 1, ..., k,   (16)
where V_i is the variance-covariance matrix for individual i, X_i are the covariates, and θ_i are the model parameters. The distribution of θ_i is the same as that specified in Equation ??, and the second-stage conjugate priors are defined in the same way as in Equation 4, with σ² replaced by V_i, whose prior is given by:
V_i^{-1} ~ Wishart(C_i, r_i),   (17)
where C_i and r_i for each i are additional hyperparameters supplied by the user. We then use the Gibbs sampler to simulate from the modified posterior distributions, which are as follows:
θ_i | y, θ_0, Σ^{-1}, V_i ~ N(D_i (X_i^T V_i^{-1} y_i + Σ^{-1} θ_0), D_i),   i = 1, ..., k,   where D_i^{-1} = X_i^T V_i^{-1} X_i + Σ^{-1},   (18)
θ_0 | y, θ_i, Σ^{-1}, V_i ~ N(V (k Σ^{-1} θ̄ + C^{-1} η), V),   where V = (k Σ^{-1} + C^{-1})^{-1} and θ̄ = (1/k) Σ_{i=1}^{k} θ_i,   (19)
Σ^{-1} | y, θ_i, θ_0, V_i ~ Wishart( Σ_{i=1}^{k} (θ_i − θ_0)(θ_i − θ_0)^T + ρ R^{-1},  k + ρ ),   (20)
Finally, the full conditional for V_i (i = 1, ..., k) is the updated Wishart distribution:
V_i^{-1} | y, θ_i, θ_0, Σ^{-1} ~ Wishart( (y_i − X_i θ_i)(y_i − X_i θ_i)^T + C_i^{-1},  n_i + r_i ).   (21)
A similar inference procedure based on θ_0 and the posterior predictive distributions is used, and we expect this model to provide more accurate estimates of the regression coefficients than the independence model. One drawback of this model is that the V_i introduce additional parameters, which may call for extra computing time in the form of longer-running Markov chains.
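Each sweep of this sampler requires Wishart draws for Σ^{-1} and the V_i^{-1}. A minimal way to generate them, assuming NumPy is available (an illustrative sketch, not the paper's implementation; `scipy.stats.wishart` would serve equally well), is the Bartlett decomposition:

```python
import numpy as np

def sample_wishart(scale, df, rng):
    """One draw from Wishart(scale, df), df >= dimension, via the Bartlett
    decomposition: W = (L A)(L A)^T with scale = L L^T, A lower triangular,
    chi-distributed diagonal entries and standard normal entries below."""
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            A[i, j] = rng.standard_normal()
    LA = L @ A
    return LA @ LA.T

# Sanity check: for W ~ Wishart(scale, df), E[W] = df * scale.
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.3], [0.3, 1.0]])
mean = sum(sample_wishart(C, 10, rng) for _ in range(2000)) / 2000
```

The Monte Carlo mean of the draws recovering df × scale confirms the parameterization matches the one used in the full conditionals above.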
So far we have applied this extended model only to the MACE filter-based system, but we hope to apply it to the other two systems as well in the near future. The results are very similar to those obtained from the independence model, and hence we include only some of them here. Table V shows the parameter estimates and the associated 95% credible intervals. None of the estimates differs significantly from those obtained under the independence assumption (Table II). The only noteworthy point is that the intervals for all the coefficients are narrower than before, which shows that the current model represents the variability in the data more efficiently; these values are thus more reliable for the purpose of inference than the earlier estimates (although the differences in the values of the estimates themselves are minor). These observations also imply that the posterior predictive distributions will not differ significantly from before, and so we do not include them here. Given that the posteriors are so similar, the predicted error curves are
TABLE V
Point estimates and 95% credible intervals for θ_0 under the model with correlations.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         2.2567     1.2664        3.2468
β_0         1.5763     1.3633        1.7893
γ_0         −0.0567    −0.1826       0.0692
also very close, and the same EER of 1.2-1.5% at optimal threshold values around 12 was obtained. The empirical error curve based on the posterior distribution of log(PSR) is shown in Figure 18.
F. Model Checking and Validation

We now carry out a crude model-checking procedure based on cross-validation in order to assess the usefulness of our proposed technique. To this end, we build the random effects model based on the authentication results from a part of each database consisting of a randomly selected
Fig. 18. The empirical error curves: the descending curve represents the FAR and the ascending one the FRR.
group of 30 individuals, and compute the predicted EERs. These are then compared to the actual authentication results obtained from the rest of the database, constituted by the remaining individuals (35 for CMU-PIE and 25 for Cohn-Kanade). We repeated this random selection 10 times, and the final prediction errors were averaged over these 10 repetitions. Some results are summarized in Table VI, which clearly show the close proximity of the predicted and the actual EERs for all the systems. We use the independence models here, but we expect very similar outcomes for the correlated model.
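The repeated random-split scheme just described can be sketched generically as follows (a holdout splitter over subject identifiers; the per-split model fitting and EER computation are system-specific and omitted here):

```python
import random

def repeated_splits(subject_ids, n_train, n_repeats, seed=0):
    """Generate (train, test) partitions of the subject pool: n_train
    randomly selected subjects to build the model, the rest held out."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train = set(rng.sample(subject_ids, n_train))
        yield sorted(train), [s for s in subject_ids if s not in train]

# PIE-style pool: 65 subjects, 30 for model building, 35 held out. For each
# repetition one would fit the random effects model on `train`, compute the
# predicted EER, compare it to the actual EER measured on `test`, and then
# average the prediction errors over the repetitions.
subjects = list(range(65))
splits = list(repeated_splits(subjects, n_train=30, n_repeats=10))
```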
TABLE VI
Cross-validation results for the 3 authentication systems.

System      Predicted EER ± Std. Dev. (10 repetitions)   Actual EER ± Std. Dev. (10 repetitions)
MACE        1.4% ± 0.1%                                  1.3% ± 0.1%
GMM         1.2% ± 0.1%                                  1.3% ± 0.2%
Asymmetry   0.2% ± 0.2%                                  0.8% ± 0.1%
Figure 19 shows the distributions of the actual and the predicted log(PSR) values, for both the authentics and the impostors, on a particular validation set. Note that it is not crucial how closely each individual prediction matches the original; what matters is the proximity of the distributions of the actual and the predicted values that will be used to estimate the EER (which we see here). Similar plots for the other two systems are not included owing to space constraints, but they show a similar correspondence between the actual and the predicted distributions.
Fig. 19. Distributions of the authentic and impostor log(PSR) values on a validation set of 35 individuals (PIE database).
V. THE “WATCH-LIST” PROBLEM
Suppose a watchlist contains N individuals; that is, there are N stored templates in the database, one for each person on the relevant watchlist. Now, if we need to determine whether a person randomly chosen from the general population belongs to that watchlist, his image will be tested against each of these N templates to see if a match occurs. The probabilities of the following two error-creating events are of interest:
• A false match: the image matches one of the stored templates when the person tested is not actually on the watchlist.
• A false non-match: the image does not match any of the stored templates when the person tested is actually a member of the watchlist.
Let p_0 denote the probability of an incorrect match (the FAR), and let p_1 denote the probability of a correct match (1 − FRR). Then the probabilities of the two events can be computed as:

p_FM = Probability that an image will produce a false match with the watchlist database
     = Probability that the image matches at least one of the N templates, given that the person is not on the list
     = 1 − Probability that the image matches none of the N templates, given that the person is not on the list
     = 1 − (1 − p_0)^N ≈ N p_0, if p_0 is small.

Hence this probability increases approximately linearly with N for small p_0. Similarly,
p_FNM = Probability that an image will produce a false non-match with the watchlist database
      = Probability that a watchlist person will not be identified
      ≈ Probability that the image does not match its own template
      = 1 − p_1 (= FRR).

So this probability does not depend on the watchlist size, which is reasonable: there is only one template for each person on the watchlist against which an image needs to be matched. Here we have made the approximation that the probability of a match between a watchlist individual and a different watchlist individual's template is negligible.
Thus, using values of the FAR and FRR from the predicted error curves for an authentication system (Figures 7, 12, 17), one can obtain predicted values of p_FM and p_FNM. Alternatively, one can use the theoretical values of the FAR and FRR from the Gaussian kernels in Equation 12 to see that these predicted false match rates for different watchlist sizes actually depend on the threshold τ, which helps determine the “optimal” watchlist false alarm rates. These probabilities of false match and non-match can therefore be written in terms of the modeled link function g(y) of the match scores as:
p_FM = 1 − [1 − Φ((g(τ) − ν_I)/η_I)]^N,
p_FNM = 1 − Φ((g(τ) − μ_A)/σ_A).   (22)
One choice for these probabilities is obtained by choosing matching thresholds so that the FAR and FRR are equal (the EER value), although other trade-offs might be desirable in certain situations. Interpreting these results in practical terms, they indicate that increasing the size of the watchlist leads to an increasing number of false positives, without significantly changing the chance of correctly identifying a particular watchlist target (which is intuitively obvious as well).
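As a quick numerical check of the approximation above, the exact expression 1 − (1 − p_0)^N can be evaluated for a range of list sizes. The sketch below uses the MACE system's predicted EER of roughly 1.4% as p_0:

```python
def p_false_match(p0, n):
    """Probability that an off-list probe falsely matches at least one of
    the n stored templates, each with false accept rate p0."""
    return 1.0 - (1.0 - p0) ** n

p0 = 0.014  # FAR at the MACE system's predicted EER (Section IV)
curve = [p_false_match(p0, n) for n in (1, 10, 100, 400, 1000)]
```

The curve grows almost linearly at N p_0 for small lists and saturates toward 1 as the watchlist grows, in line with the discussion above.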
A. Application to the Face Authentication Systems

Figures 20(a)-(c) show the variation in the probability of false match with the watchlist size according to Equation 22, for the three face authentication systems considered in this paper. We used the FAR values corresponding to the predicted EER of each system to construct these graphs. The figures show a roughly linear trend for all 3 systems, that for the asymmetry-based system being the most pronounced, especially as the watchlist gradually grows in size. In fact, of the 3 systems, the convergence of the probability of at least one false match to 1 is slowest for the asymmetry-based system, followed by the GMM-based system. According to these plots, the probability of detecting at least one false match in the general population reaches 1 for a watchlist size of around 400 (quickest) for the MACE filter system; the corresponding figures for the GMM and the asymmetry systems are around 700 and 1000, respectively. All of this is consistent with expectation, since the predicted EER is lowest for the asymmetry system and highest for the MACE filter system (p_FM being linear in the FAR for small values).
VI. DISCUSSION

This paper has presented a novel technique based on a random effects model to predict the performance of a biometric system on unknown large databases. The methodology provides an alternative means of performance evaluation to those based on empirical observational studies by providing model-based prediction. We have demonstrated how it can be applied to three face
(a) MACE system (b) GMM-based system (c) Asymmetry-based system
Fig. 20. Variation in the probability of false match as watchlist size increases for the three face authentication systems.
authentication systems, and we also considered watchlist performance, proposing a method for studying the variation in the probability of false match as a function of watchlist size. Although we concentrated on facial-image biometrics, the method can be adapted for use with almost any biometric, as long as the match scores from a database are available. For instance, in theory, one could use the techniques we have introduced to analyze systems based on fingerprint analysis, retinal scans, or any combination of various techniques.
It is also worth pointing out that the framework can be extended to include many potentially useful covariates. Beyond simply leading to better goodness-of-fit, the inclusion of covariates in the model can potentially provide valuable information to the designer of an authentication system. For example, if one knows in advance the expected effect of the number of training images on the authentication score of a certain system, he or she can roughly determine the amount of training data needed for the system to be effective. Furthermore, given an idea of the effects of the different system parameters on the authentication score, one can tune them for optimal results.
Although the models we considered contained many parameters to be estimated, the implementation of the Gibbs sampling approach was not overly burdensome in terms of computational load. One possible limitation is the use of Gaussian kernel functions to evaluate the EER from our Markov chain output, so it would be natural to consider more accurate approximations instead, for instance based on nonparametric kernel density estimation of the Markov chain output. We also anticipate increasing scope for application of the general method as more and more biometric data are recorded and made available over time, and also to other biometric-based systems, such as those based on fingerprints, iris and multi-biometrics.
ACKNOWLEDGMENT

The first author's research was supported in part by a contract from the Army Research Office (ARO) to CyLab, CMU. The authors would also like to thank Professors B.V.K. Vijaya Kumar and Stephen E. Fienberg for valuable comments on this work.
REFERENCES

[1] M. Savvides, B.V.K. Vijaya Kumar and P. Khosla, “Face Verification using Correlation Filters”, Proc. 3rd IEEE Automatic Identification Advanced Technologies (AutoID), pp. 56-61, 2002.
[2] S.K. Goo, “Sen. Kennedy Flagged by No-Fly List”, The Washington Post, p. A01, August 20, 2004.
[3] S. Prabhakar, S. Pankanti and A.K. Jain, “Biometric Recognition: Security and Privacy Concerns”, IEEE Magazine on Security and Privacy, pp. 33-42, March-April 2003.
[4] T. Sim, S. Baker and M. Bsat, “The CMU Pose, Illumination, and Expression (PIE) Database”, Proc. 5th International Conference on Automatic Face and Gesture Recognition, 2002.
[5] A.V. Oppenheim and R.W. Schafer, Discrete-time Signal Processing, Prentice-Hall, 1989.
[6] M.A. Turk and A.P. Pentland, “Face Recognition using Eigenfaces”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1991.
[7] M. Savvides and B.V.K. Vijaya Kumar, “Efficient Design of Advanced Correlation Filters for Robust Distortion-tolerant Face Identification”, Proc. IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 45-52, 2003.
[8] R.M. Bolle, S. Pankanti and N.K. Ratha, “Evaluation Techniques for Biometrics-based Authentication Systems (FRR)”, Proc. International Conference on Pattern Recognition (ICPR), pp. 2831-2837, 2000.
[9] R.M. Bolle, N.K. Ratha and S. Pankanti, “Evaluating Authentication Systems using Bootstrap Confidence Intervals”, Proc. AutoID'99, pp. 9-13, Summit, NJ, 1999.
[10] S. Weisberg, Applied Linear Regression, Wiley, 1985.
[11] J.L. Wayman, “A scientific approach to evaluating biometric systems using mathematical methodology”, Proc. CardTech/SecurTech, pp. 477-492, Orlando, 1997.
[12] J.P. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York, 1975.
[13] J. Daugman, “Biometric Decision Landscapes”, Tech. Report no. 482, University of Cambridge Computer Laboratory, 2000.
[14] W. Shen, M. Surette and R. Khanna, “Evaluation of automated biometrics-based identification and verification systems”, Proc. of the IEEE, vol. 85, no. 9, pp. 1464-1478, 1997.
[15] M.E. Schuckers, “Using the beta-binomial distribution to assess performance of a biometric identification device”, International Journal of Image and Graphics, vol. 3, no. 3, pp. 523-529, 2003.
[16] J.A. Hanley and B.J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve”, Radiology, vol. 143, pp. 29-36, 1982.
[17] NIST, “Face Recognition Vendor Test (FRVT)”, http://www.frvt.org, 2002.
[18] G. McLachlan and D. Peel, Finite Mixture Models, John Wiley and Sons, 2000.
[19] C.P. Robert, “Mixtures of distributions: inference and estimation”, in Markov Chain Monte Carlo in Practice, Eds. W.R. Gilks, S. Richardson and D.J. Spiegelhalter, pp. 441-464, 1996.
[20] H. Bensmail, G. Celeux and A. Raftery, “Inference in model-based cluster analysis”, Statistics and Computing, vol. 7, pp. 1-10, 1997.
[21] A.E. Gelfand, S.E. Hills, A. Racine-Poon and A.F.M. Smith, “Illustration of Bayesian Inference in Normal Data Models using Gibbs Sampling”, Journal of the American Statistical Association, vol. 85, no. 412, pp. 972-985, 1990.
[22] A. Gelman, J.B. Carlin, H.S. Stern and D.B. Rubin, Bayesian Data Analysis, Chapman and Hall, 1995.
[23] S. Mitra and M. Savvides, “Gaussian Mixture Models based on Phase Spectra for Human Identification and Illumination Classification”, Proc. 4th IEEE Workshop on Automatic Identification Advanced Technologies (AutoID), Buffalo, 2005.
[24] G.H. Givens, J.R. Beveridge, B.A. Draper and D. Bolme, “A Statistical Assessment of Subject Factors in the PCA Recognition of Human Faces”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[25] G.H. Givens, J.R. Beveridge, B.A. Draper, P. Grother and P.J. Phillips, “How Features of the Human Face Affect Recognition: A Statistical Comparison of Three Face Recognition Algorithms”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 381-388, 2004.
[26] G.H. Givens, J.R. Beveridge, B.A. Draper and P.J. Phillips, “Repeated Measures GLMM Estimation of Subject-related Factors and False Positive Threshold Effects on Human Face Verification Performance”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 40-47, 2005.
[27] A. Gelman and D.B. Rubin, “Inference from iterative simulation using multiple sequences (with discussion)”, Statistical Science, vol. 7, pp. 457-511, 1992.
[28] S. Mitra, M. Savvides and B.V.K. Vijaya Kumar, “Facial Asymmetry in the Frequency Domain: A New Robust Biometric”, Proc. International Conference on Image Analysis and Recognition (ICIAR), Lecture Notes in Computer Science, Springer-Verlag, vol. 3656, pp. 1065-1072, 2005.
[29] T. Kanade, J.F. Cohn and Y.L. Tian, “Comprehensive Database for Facial Expression Analysis”, Proc. 4th IEEE Conference on Automatic Face and Gesture Recognition, pp. 46-53, 2000.
[30] P. Grother and P.J. Phillips, “Models of Large Population Recognition”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 68-75, July 2004.