Statistical Performance Evaluation of Biometric
Authentication Systems using Random Effects
Models
Sinjini Mitra, Information Sciences Institute, University of Southern California, mitra@isi.edu
Marios Savvides, ECE Department, Carnegie Mellon University, msavvid@ece.cmu.edu
Anthony Brockwell, Statistics Department, Carnegie Mellon University, abrock@stat.cmu.edu
Abstract
As biometric authentication systems become more prevalent, it is becoming increasingly important to evaluate their performance. The current paper introduces a novel statistical method of performance evaluation for these systems. Given a database of authentication results from an existing system, the method uses a hierarchical random effects model, along with Bayesian inference techniques yielding posterior predictive distributions, to predict performance in terms of error rates using various explanatory variables. By incorporating explanatory variables as well as random effects, the method allows for prediction of error rates when the authentication system is applied to potentially larger and/or different groups of subjects than those originally documented in the database. We also extend the model to allow for prediction of the probability of a false alarm on a “watch-list” as a function of the list size. We consider application of our methodology to three different face authentication systems: a filter-based system, a Gaussian Mixture Model (GMM) based system, and a system based on frequency domain representation of facial asymmetry.
Index Terms
biometrics, face, authentication, performance evaluation, random effects model, watch-list
I. INTRODUCTION
In the statistical literature, the terms “biometrics” and “biometry” have been used since the early 20th century to refer to the field of development of mathematical methods applicable to data analysis problems in the biological sciences. More recently, however, the term biometrics has been used to denote the unique biological traits (physical or behavioral) of individuals, such as face images, fingerprints, iris, voice-print, etc., that can be used for identification. Since these traits cannot be stolen, lost or forgotten, they offer better inherent security and reliability in identifying people ([3]), and there is a concerted effort to replace traditional means of identification such as the use of passwords or PINs with biometric-based authentication systems. The recently introduced practice of recording biometric information (photo and fingerprint) of foreign passengers at U.S. airports and the proposed inclusion of digitized photos in passports demonstrate the growing importance of biometric authentication to the U.S. federal government.
With any biometric authentication system, it is important to be able to carry out performance evaluation, that is, to assess how well it serves its purpose of matching biometric samples obtained from people to stored templates synthesized from training data. To this end, several statistical methods have been proposed to date, exploiting the correspondence between the authentication problem and statistical decision theory. An overview of these methods can be found in [14] and [11]. [13] discusses certain decision landscapes that can be used to characterize several forms of biometric decision making. [8] and [9] suggest the use of binomial distributions, normal approximations, and bootstrapping for estimating error rate confidence intervals and developing tests of significance, while [15] proposes the use of the beta-binomial distribution. Computations are much simplified if the underlying score distributions conform to a Gaussian distribution, and this can be checked via various descriptive statistics including sample skewness and kurtosis coefficients ([8]). An effective statistical means of presenting the matching performance of any diagnostic device is the “receiver operating characteristic” (ROC) curve, used widely in clinical studies to measure the effectiveness of drugs and medical devices ([16]), and in many signal processing applications ([12]). ROC curves are in common use today for evaluating the performance of biometric systems ([1]).
In a number of cases, an authentication system may be applied in conjunction with a watch-list. A watch-list refers to a database of people who are of some interest. For example, the FBI may be watching criminals who are on a so-called “do not fly” list at airports. Watch-list systems like these, based only on the use of names, tend to produce many false alarms. For example, according to the Washington Post (August 20, 2004), U.S. Senator Edward M. “Ted” Kennedy was stopped and questioned at airports on the East Coast five times in March 2004 because his name appeared on the government’s secret “no-fly” list ([2]). This kind of incident demonstrates the fragility of name-based systems, and highlights the potential usefulness of biometric identifiers, such as face or fingerprints, being associated with the name for better and more reliable outcomes. The Face Recognition Vendor Test (FRVT, 2002) conducted by NIST reported that the probability that a system correctly identifies an individual on the watch-list usually deteriorates as the watch-list size grows ([17]), and hence such lists should be kept as small as possible for effective results.
This paper addresses the issue of performance evaluation of a biometric system using a novel statistical framework that allows for prediction of misclassification rates on a population, along with false alarm probabilities for watch-list detection. The rest of the paper is organized as follows. Section II presents our statistical framework of performance evaluation and Section III introduces our random effects model methodology. Section IV describes the application of our technique to three face authentication systems, and an extension of our method to the watch-list problem is included in Section V. We conclude with some additional discussion in Section VI.
II. THE STATISTICAL FRAMEWORK FOR PERFORMANCE EVALUATION
The result of an authentication algorithm is a match score Y between the test image of a person and a stored template, together with a threshold τ. If Y > τ, the system returns a match and the person tested is called an “authentic”; otherwise, if Y ≤ τ, the system decides that a match has not been made and the person tested is an “impostor”. Then the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are defined as:

FRR = P(Y \le \tau \mid Y \in \text{Authentic}) = \int_{-\infty}^{\tau} f_A(x)\,dx,
FAR = P(Y > \tau \mid Y \in \text{Impostor}) = \int_{\tau}^{\infty} h_I(y)\,dy,   (1)

where f_A(·) and h_I(·) respectively denote the distributions of the match scores for the authentics and the impostors. These are analogous to Type I and Type II errors respectively in statistical inference and form the quantities of interest in terms of evaluating the matching performance of a biometric device. Generally they are unknown and must be estimated from the observed data.
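As an illustration of how these two quantities can be estimated empirically, the sketch below computes the FRR and FAR from a set of labeled match scores at a given threshold. It is a minimal sketch of Equation 1 under assumptions of our own; the array names, the simulated scores and the example threshold are ours, not the paper's.

```python
import numpy as np

def empirical_error_rates(scores, is_authentic, tau):
    """Empirical FRR and FAR at threshold tau (cf. Equation 1).

    scores       : array of match scores Y
    is_authentic : boolean array, True where the score comes from an authentic comparison
    """
    scores = np.asarray(scores, dtype=float)
    is_authentic = np.asarray(is_authentic, dtype=bool)
    authentic = scores[is_authentic]
    impostor = scores[~is_authentic]
    frr = np.mean(authentic <= tau)   # authentics wrongly rejected
    far = np.mean(impostor > tau)     # impostors wrongly accepted
    return frr, far

# Toy usage with simulated scores (purely illustrative numbers)
rng = np.random.default_rng(0)
auth = rng.normal(4.0, 0.6, 500)      # hypothetical authentic scores
imp = rng.normal(2.0, 0.15, 5000)     # hypothetical impostor scores
scores = np.concatenate([auth, imp])
labels = np.concatenate([np.ones(500, bool), np.zeros(5000, bool)])
print(empirical_error_rates(scores, labels, tau=3.0))
```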
Many biometric authentication systems developed today are intended for use with databases containing information on millions of people, but are tested only on databases with hundreds of people. (For example, millions of people pass through airports every day. False alarms in no-fly watch-lists lead to much inconvenience for travelers, [3].) The limited scope of testing makes it difficult to address questions about the expected performance of the system when used with the full database. For example, a system with a false alarm rate of 1% would yield around 10,000 false alarms when used on a database of size 1,000,000, but could well appear to be functioning perfectly on a test database of size 100. Moreover, it is known that systems based on the use of images, particularly faces, have difficulty when images are taken under different illumination levels, orientations, expression types, etc. In other words, false alarm rates can be dependent on a number of factors. In light of these issues, the following questions are pertinent.

• How do different image properties affect the match score of a biometric system in the general population, and what are the predicted score distributions for authentics and impostors, based on these image properties?
• What error rates (both FAR and FRR) can be expected when a certain biometric system is applied to a large unknown database?
• How does the performance of a system on a watch-list (measured in terms of the false alarms) change with the list size?
The effect of image properties on the score can be studied with the help of a simple regression model ([18]) using the match score as the response variable. The estimated regression coefficients tell us to what extent the performance of the system varies with variations in the particular image properties. [24] proposed the use of ANOVA to study the statistical effects of demographic covariates such as age, sex, race, facial hair, etc. on face recognition algorithms, and [25] made use of logistic regression to relate subject covariates to rank-one recognition rates. Although effective, the regression/ANOVA models are fixed effects models, meaning that the inference about the covariate effects is restricted to the database at hand and cannot easily be generalized to a larger (or different) database drawn from the same population. These analyses, although useful for any particular dataset, provide only a baseline for the more extensive studies required to make general inference for a population. Such a framework can be provided by a random effects model ([21]) and is the focus of this paper.
A random effects model assumes that the particular subset of subjects in the present database is a random sample from a bigger population, so that the inference extends easily to that bigger population. In other words, valid inference about covariate effects and predicted error rates (confidence intervals, say) can be made if another subset of individuals (different from the current sample) is drawn from this population. (Some work has been done on this kind of problem: [30] proposed a binomial model for comparing watch-list recognition performance with empirical identification and false match rates for large populations, but we aim to develop a more generally applicable framework.) Unlike fixed effects models, the random effects model takes into account heterogeneity across individuals in their regression coefficients with the help of a probability distribution, thus capturing more of the inherent variability in data that involve repeated measures or multi-level structures. [26] first made use of random effects in a Generalized Linear Mixed Model (GLMM) framework for predicting the probability of correct verification of the PCA algorithm at different false acceptance rates based on subject covariates including hairstyle, gender, age, etc. In this paper, we develop this random effects framework in the context of predicting performance of biometric systems in terms of various explanatory variables.
III. THE RANDOM EFFECTS MODEL
Consider a certain biometric authentication system whose performance we wish to evaluate. Suppose that there are k people in a certain database, and n_i test images for the i-th person, i = 1, ..., k (which gives a total of n = \sum_{i=1}^{k} n_i test images for the entire database). For each of these k individuals, there is a stored template developed in the training stage. Typically, in the authentication stage, each of the n images is matched to each of these k templates.
Let Y_{ij} denote the match score for the j-th test image of the i-th subject in the database, when tested against one of the templates stored in the database. Also let x^{(m)}_{ij}, m = 1, 2, ..., M, be a collection of explanatory variables associated with the image. Rather than explicitly indicating in the notation for Y_{ij} which particular template the image is matched against, we typically assume (for the sake of notational convenience) that one of the covariates is an indicator variable equal to one if the subject is matched against his/her own template, and equal to zero otherwise. In addition, any factor that is expected to affect the matching performance of a biometric device can be used as a covariate, including subject factors such as age, sex and hairstyle. Image properties, such as level of noise, levels of occlusion (if present), different expressions and different illumination levels, can also be used, along with other concomitant variables like the number of training images used, and system design parameters.
Our random effects model is given by

g(Y_{ij}) \overset{\text{ind.}}{\sim} N\left( \alpha_i + \sum_{m=1}^{M} \beta^m_i x^{(m)}_{ij},\; \sigma^2 \right), \quad i = 1, \ldots, k, \; j = 1, \ldots, n_i,   (2)

\theta_i = (\alpha_i, \beta^1_i, \ldots, \beta^M_i)^T \overset{\text{ind.}}{\sim} MVN(\theta_0, \Sigma), \quad i = 1, \ldots, k,   (3)
where g(·) is a monotonic function referred to as the link function, θ_0 = (α_0, β^1_0, ..., β^M_0) is an (M+1)-dimensional vector, and Σ is an (M+1) × (M+1) matrix. The link function is a transformation, chosen so as to ensure conformity of the response variable to the underlying assumptions of the model, such as normality and homoscedasticity.
This model is a multivariate generalization of the hierarchical random effects model (used, for example, by [21] for the analysis of the weights of young laboratory rats). Apart from the possibly nonlinear link function, it supposes linear dependency with homogeneous errors, but allows for the possibility of different slopes and intercepts for each individual, thus accounting for the heterogeneity in the effects across individuals. Although not included in the form of the model given above, it is straightforward to construct variants including interaction effects between covariates.
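To make the two-stage structure of Equations 2 and 3 concrete, the sketch below simulates transformed match scores from the hierarchical model for a small synthetic database with a single authenticity covariate. It is an illustration under assumed values of θ_0, Σ and σ² (all numbers are ours, not estimates from the paper).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed population-level parameters (illustrative only)
theta0 = np.array([2.0, 1.5])          # (alpha_0, beta_0): intercept and authenticity effect
Sigma = np.array([[0.10, 0.01],
                  [0.01, 0.05]])       # between-subject covariance of (alpha_i, beta_i)
sigma2 = 0.04                          # within-subject error variance

k, n_i = 5, 20                         # subjects and test images per subject

scores, labels = [], []
for i in range(k):
    # Stage 2 (Eq. 3): subject-specific coefficients drawn around theta0
    theta_i = rng.multivariate_normal(theta0, Sigma)
    # Covariate x0 = 1 for an authentic comparison, 0 for an impostor comparison
    x0 = rng.integers(0, 2, size=n_i)
    X = np.column_stack([np.ones(n_i), x0])
    # Stage 1 (Eq. 2): transformed scores g(Y_ij) around the subject-specific mean
    g_y = X @ theta_i + rng.normal(0.0, np.sqrt(sigma2), size=n_i)
    scores.append(g_y)
    labels.append(x0)

g_y, x0 = np.concatenate(scores), np.concatenate(labels)
print(g_y[:5], x0[:5])
```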
We adopt a Bayesian approach for estimating model parameters and making inference. First, let

y = (g(y_{ij}),\; i = 1, \ldots, k,\; j = 1, \ldots, n_i)

denote our observed match scores, after transforming using the link function g(·). We assign (conjugate) prior distributions to σ², θ_0 and Σ:

\sigma^2 \sim IG(a, b), \quad \theta_0 \sim N(\eta, C), \quad \Sigma^{-1} \sim \text{Wishart}((\rho R)^{-1}, \rho),   (4)
where R is a matrix and ρ ≥ 2 is a scalar degrees-of-freedom (df) parameter. Note that this notion of df is different from the one used in [13]: we use it as a parameter characterizing the Wishart distribution ([21]), while the latter used it as a means of assessing the complexity of a biometric, essentially measured by the number of independent dimensions of variation, or the number of independent yes-no questions that a biometric decision is based upon. All the hyper-parameters in the model (parameters of the prior distributions) a, b, η, C, ρ, R are assumed known, and the parameters to be estimated are θ_i, θ_0, Σ and σ². Owing to the use of conjugate priors, the posterior distributions of the unknown parameters have closed-form expressions. We use Gibbs sampling ([22]) to simulate from the conditional posterior distributions of each of the four unknown parameters given the remaining three, termed the full conditionals, given by:
\theta_i \mid y, \theta_0, \Sigma^{-1}, \sigma^2 \sim N\left( D_i \left( \tfrac{1}{\sigma^2} X_i^T y_i + \Sigma^{-1} \theta_0 \right),\; D_i \right), \quad i = 1, \ldots, k,   (5)

where D_i^{-1} = \tfrac{1}{\sigma^2} X_i^T X_i + \Sigma^{-1}, y_i = (y_{i1}, \ldots, y_{i n_i})^T, and X_i is the n_i \times (M+1) design matrix whose j-th row is (1, x^{(1)}_{ij}, \ldots, x^{(M)}_{ij}),

\theta_0 \mid y, \theta_i, \Sigma^{-1}, \sigma^2 \sim N\left( V \left( k \Sigma^{-1} \bar{\theta} + C^{-1} \eta \right),\; V \right), \text{ where } V = (k \Sigma^{-1} + C^{-1})^{-1} \text{ and } \bar{\theta} = \tfrac{1}{k} \sum_{i=1}^{k} \theta_i,   (6)

\Sigma^{-1} \mid y, \theta_i, \theta_0, \sigma^2 \sim \text{Wishart}\left( \left( \sum_{i=1}^{k} (\theta_i - \theta_0)(\theta_i - \theta_0)^T + \rho R \right)^{-1},\; k + \rho \right),   (7)

\sigma^2 \mid y, \theta_i, \theta_0, \Sigma^{-1} \sim IG\left( \tfrac{n}{2} + a,\; \tfrac{1}{2} \sum_{i=1}^{k} (y_i - X_i \theta_i)^T (y_i - X_i \theta_i) + b \right), \text{ where } n = \sum_{i=1}^{k} n_i,   (8)
where IG denotes an inverse Gamma distribution (if Y ∼ Gamma, then 1/Y ∼ IG). The hyper-prior values are chosen so as to determine very vague priors, namely, C^{-1} = 0 (so that η disappears completely from the full conditionals) and a = b = 1/ε where ε = 0.001 (this is done in accordance with [21]). To ensure proper mixing of our Markov chains, we use 10 different starting values for our chains and assess convergence and mixing properties with the help of trace plots, cumulative sums and autocorrelation plots ([27]).
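For concreteness, the following sketch implements one possible version of the Gibbs sampler defined by the full conditionals (5)-(8), using numpy and scipy. It is our own illustrative code, not the authors' implementation: the data are simulated, the hyper-parameter values are placeholders, and C^{-1} = 0 is used as in the text so that η drops out of the update for θ_0.

```python
import numpy as np
from scipy.stats import invgamma, wishart

rng = np.random.default_rng(2)

# ---- Simulated data: k subjects, n_i images each, one authenticity covariate ----
k, n_i, M = 10, 20, 1
X = [np.column_stack([np.ones(n_i), rng.integers(0, 2, n_i)]) for _ in range(k)]
true_theta = [rng.multivariate_normal([2.0, 1.5], 0.05 * np.eye(M + 1)) for _ in range(k)]
y = [X[i] @ true_theta[i] + rng.normal(0, 0.2, n_i) for i in range(k)]
n = k * n_i

# ---- Hyper-parameters (vague placeholder choices) ----
a = b = 0.001                      # IG prior on sigma^2
rho, R = M + 2, np.eye(M + 1)      # Wishart prior on Sigma^{-1}
Cinv = np.zeros((M + 1, M + 1))    # C^{-1} = 0, so eta drops out of (6)

# ---- Initial values ----
theta = [np.zeros(M + 1) for _ in range(k)]
theta0 = np.zeros(M + 1)
Sigma_inv = np.eye(M + 1)
sigma2 = 1.0

N_iter, keep = 5000, []
for it in range(N_iter):
    # (5) theta_i | rest
    for i in range(k):
        D_inv = X[i].T @ X[i] / sigma2 + Sigma_inv
        D = np.linalg.inv(D_inv)
        mean = D @ (X[i].T @ y[i] / sigma2 + Sigma_inv @ theta0)
        theta[i] = rng.multivariate_normal(mean, D)

    # (6) theta_0 | rest
    V = np.linalg.inv(k * Sigma_inv + Cinv)
    theta_bar = np.mean(theta, axis=0)
    theta0 = rng.multivariate_normal(V @ (k * Sigma_inv @ theta_bar), V)

    # (7) Sigma^{-1} | rest
    S = sum(np.outer(t - theta0, t - theta0) for t in theta) + rho * R
    Sigma_inv = wishart.rvs(df=k + rho, scale=np.linalg.inv(S), random_state=rng)

    # (8) sigma^2 | rest
    ss = sum(float((y[i] - X[i] @ theta[i]) @ (y[i] - X[i] @ theta[i])) for i in range(k))
    sigma2 = invgamma.rvs(n / 2 + a, scale=ss / 2 + b, random_state=rng)

    keep.append(np.concatenate([theta0, [sigma2]]))

burn = 2000
print(np.mean(keep[burn:], axis=0))   # posterior means of (theta_0, sigma^2)
```

In practice one would monitor the chains with trace and autocorrelation plots, as described above, before forming the posterior-mean estimates of Equation 9.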
Let Ψ ≡ {θ_i, θ_0, Σ, σ²} denote the collection of the parameters to be estimated for the model. Then the Gibbs sampler yields a Markov chain {Ψ^{(k)}, k = 1, 2, ...} whose distribution converges to the true posterior distribution of the parameters. The parameters are estimated using the posterior mean formed by the ergodic average of the Markov chain. To reduce bias associated with the fact that the chain takes time to converge, we discard the first N_1 samples as burn-in; that is, our parameter estimates are

\hat{\Psi} = \frac{1}{N - N_1} \sum_{j = N_1 + 1}^{N} \Psi^{(j)} \approx E\{\Psi \mid y\}.   (9)

(We choose N_1 by visual inspection, choosing it so that after iteration N_1 the Markov chain appears to have settled into its steady-state behavior.)
A. Inference

We make inference from this model based on the marginal posteriors for the population parameters θ_0 = (α_0, β^1_0, ..., β^M_0) and the posterior predictive distributions of the match scores. The parameters θ_0 determine quantitatively the effects of the different covariates on the authentication score that is expected in a general population; more precisely, they determine by how much the score changes for unit changes in the corresponding covariate value while keeping the others fixed. Standard deviations of the posterior distributions as well as credible intervals (confidence intervals based on the posterior distributions) can be constructed to assess the reliability of the point estimates. The posterior predictive distributions are computed by generating new data from Equation 2 using the N − N_1 post-convergence values of the parameters (after burn-in). The predictive distribution of the link-function-transformed match score g(Y) for a score Y of a new individual from the population can be estimated using a kernel density of the form

p(g(y) \mid y) = \int p(g(y) \mid \theta_i, \sigma^2)\, p(\theta_i, \sigma^2 \mid y)\, d\theta_i\, d\sigma^2.   (10)

We fit a mixture of Gaussian kernels to the posterior predictive distributions of g(y) since the authentic and impostor score distributions are expected to be well separated. We also use Gaussian kernels to estimate the posterior distribution of θ_0 using the post-convergence values of the parameter estimates.
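A minimal way to carry out this step on Gibbs output is sketched below: post-burn-in draws of (θ_i, σ²) are pushed through Equation 2 to generate predictive scores, and a Gaussian kernel density (a mixture of Gaussian kernels centered at the draws) is fitted separately to the authentic and impostor samples. The variable names, the placeholder draws, and the use of scipy's gaussian_kde are our choices rather than the paper's.

```python
import numpy as np
from scipy.stats import gaussian_kde

def posterior_predictive_scores(theta_draws, sigma2_draws, x_row, rng):
    """Simulate g(Y) for a new image with covariate row x_row (cf. Equation 2).

    theta_draws  : (S, M+1) array of post-burn-in draws of a subject-level theta
    sigma2_draws : (S,) array of matching draws of sigma^2
    """
    means = theta_draws @ x_row
    return rng.normal(means, np.sqrt(sigma2_draws))

rng = np.random.default_rng(3)
# Placeholder Gibbs output (in practice, taken from the sampler sketched earlier)
S = 3000
theta_draws = rng.multivariate_normal([2.0, 1.5], 0.05 * np.eye(2), size=S)
sigma2_draws = 0.2 + 0.02 * rng.standard_normal(S) ** 2

g_auth = posterior_predictive_scores(theta_draws, sigma2_draws, np.array([1.0, 1.0]), rng)
g_imp = posterior_predictive_scores(theta_draws, sigma2_draws, np.array([1.0, 0.0]), rng)

# Gaussian kernel density estimates of the two predictive distributions
kde_auth, kde_imp = gaussian_kde(g_auth), gaussian_kde(g_imp)
grid = np.linspace(g_imp.min(), g_auth.max(), 200)
density_auth, density_imp = kde_auth(grid), kde_imp(grid)
print(grid[np.argmax(density_auth)], grid[np.argmax(density_imp)])
```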
The posterior predictive distributions of the authentication scores can be used to estimate the predicted FAR and FRR for the system. One way to do this is simply to simulate directly from the distribution of g(Y) as specified above, invert the link-function transformation g(·), and repeat many times (with appropriately chosen values of x_{·j}) to form empirical estimates of the FAR and FRR for specified threshold values τ. Another approach is based on the observation that the posterior predictive distribution of g(Y) can be well approximated by a Gaussian distribution. Under this approximation, we can obtain exact closed-form expressions for the FAR and FRR as functions of the threshold τ. The two error rates are

FRR = P(g(Y) \le g(\tau) \mid Y \in \text{Authentic}),
FAR = P(g(Y) > g(\tau) \mid Y \in \text{Impostor}).   (11)

Approximating g(Y) \sim N(\mu_A, \sigma^2_A) when Y is the score of an authentic and g(Y) \sim N(\nu_I, \eta^2_I) when Y is the score of an impostor, the error rates can be written in closed form in terms of the cumulative distribution function Φ of a standard normal random variable as

FRR = \Phi\left( \frac{g(\tau) - \mu_A}{\sigma_A} \right), \quad FAR = 1 - \Phi\left( \frac{g(\tau) - \nu_I}{\eta_I} \right).   (12)

The parameters of the Gaussian distributions are estimated from the predictive posteriors, with appropriately chosen covariate values x_{·j}. One would typically then plot the FRR and FAR for different values of τ; the point where the FRR and FAR are equal gives the predicted Equal Error Rate (EER) for the system.
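The sketch below evaluates Equation 12 over a grid of transformed thresholds and locates the EER as the crossing point of the two curves. It assumes the four Gaussian parameters have already been estimated from the posterior predictive samples; the grid resolution, the simple crossing search and the example parameter values are our choices.

```python
import numpy as np
from scipy.stats import norm

def error_curves(g_tau, mu_a, sigma_a, nu_i, eta_i):
    """Closed-form FRR and FAR of Equation 12 on a grid of transformed thresholds."""
    frr = norm.cdf((g_tau - mu_a) / sigma_a)
    far = 1.0 - norm.cdf((g_tau - nu_i) / eta_i)
    return frr, far

def predicted_eer(mu_a, sigma_a, nu_i, eta_i, n_grid=20000):
    g_tau = np.linspace(nu_i - 6 * eta_i, mu_a + 6 * sigma_a, n_grid)
    frr, far = error_curves(g_tau, mu_a, sigma_a, nu_i, eta_i)
    idx = np.argmin(np.abs(frr - far))          # threshold where the curves cross
    return g_tau[idx], 0.5 * (frr[idx] + far[idx])

# Illustrative values (not the paper's estimates)
tau_star, eer = predicted_eer(mu_a=4.0, sigma_a=0.6, nu_i=2.0, eta_i=0.15)
print(f"EER ~ {eer:.4f} at transformed threshold {tau_star:.3f}")
```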
IV. APPLICATIONS
In this section we describe the application of our proposed methodology to three existing face authentication systems: (i) the Minimum Average Correlation Energy (MACE) filter system ([1]), (ii) a system based on Gaussian Mixture Models (GMM) and Fourier domain phase spectra ([23]), and (iii) a system based on the frequency domain representation of facial asymmetry ([28]). These systems are applied to a database at hand, and then the match scores from those applications are used, along with the covariate information that is available from the data, to develop the random effects models for making inference and prediction.
A. Data

For the first two face authentication systems, we use a subset of the “CMU-PIE Database” ([4]), which contains frontal images of 65 people under 21 different illumination conditions and neutral expressions. Images belonging to one person from the database are shown in Figure 1. All these images have been normalized using affine transformations based on the locations of the eyes and nose, as is common and necessary for most computer vision problems. The final cropped images are gray-scale and of dimension 100 × 100 pixels.

Fig. 1. Sample images of a person from the CMU-PIE database.
For the third authentication system, we use a subset of the “Cohn-Kanade Facial Expression Database” ([29]), consisting of images of 55 individuals expressing three different kinds of emotions: joy, anger and disgust. Each person was asked to express one emotion at a time by starting with a neutral expression and gradually evolving into its peak form. The data thus consist of video clips of people showing an emotion, each clip being broken down into several frames. We work with 9 frames in all per person: 3 neutral images and 6 images with peak forms of the 3 emotions. The raw images are cropped and normalized like the PIE images, and the final images are gray-scale and of dimension 128 × 128 pixels. Some normalized images from our database are shown in Figure 2. We use this database because it was used to develop the asymmetry-based face authentication system ([28]); the CMU-PIE images have illumination variations in them and hence are unsuitable for studying the role of facial asymmetry in authentication tasks.

Fig. 2. Sample images from the Cohn-Kanade database.
B. Application: The MACE Filter System

The MACE filter system was introduced as a face authentication system based on a linear filter constructed in the frequency domain ([1]). One filter is synthesized for each subject based on 3 training images, chosen so as to represent the three different types of lighting variations in the PIE images: left shadows, balanced and right shadows. We treat the remaining 18 images per person as test images and apply each of the 65 filters to each image via correlation. The resulting authentication criterion for this system is known as the Peak-to-Sidelobe Ratio (or PSR for short), and it measures the height of the correlation peak at the origin of the correlation plane relative to the neighboring area. More details about this method are available in [1]. The random effects model in this case can be written as:
\log(Y_{ij}) \overset{\text{iid.}}{\sim} N(\alpha_i + \beta_i x^{(0)}_{ij} + \gamma_i x^{(1)}_{ij},\; \sigma^2).   (13)
Here Y denotes the PSR value, and the logarithmic link function is used because log(PSR) conforms better to the normality and homoscedasticity assumptions of the random effects model than PSR (shown by Figure 3 and Table I). X^{(0)} represents the authenticity covariate (a binary variable assuming the value “1” for an authentic test image and “0” for an impostor test image) and X^{(1)} represents the covariate denoting the illumination condition of an image (values 1-21 for the 21 images of a person).
Fig. 3. Histograms of the PSR and log(PSR) values from the CMU-PIE database (all combined, and separately for authentics and impostors): (a) PSR, (b) log(PSR), (c) log(Authentic PSR), (d) log(Impostor PSR). The x-axis shows the PSR or log(PSR), whichever is applicable, and the y-axis shows the corresponding frequencies.
TABLE I
SOME DESCRIPTIVE STATISTICS FOR PSR AND LOG(PSR) DISTRIBUTIONS FROM APPLYING THE MACE FILTER TO THE TEST IMAGES.

Variable    Measure              Authentic   Impostor   All combined
PSR         Skewness             1.2389      -0.1254    6.9209
            Kurtosis             3.8564      3.0722     91.7597
            Standard deviation   23.4618     3.2542     7.3048
log(PSR)    Skewness             0.3437      -0.7087    0.3954
            Kurtosis             2.8386      1.5201     4.1944
            Standard deviation   5.4122      8.2985     8.2872
The choice of the starting values of the parameter chains did not affect the results in any way; similar convergence values are obtained from the multiple chains. We run the simulations from the posterior conditionals given in Equations (5)-(8) for 5000 iterations, and some diagnostic plots to assess convergence of the parameter chains are shown in Figure 4. Both sets of plots show satisfactory convergence and mixing: no significant correlation at different iteration lags, and sample paths that appear to explore the region of support after moving away quickly from the starting values (γ_0, the θ_i, σ² and Σ also converged, but we do not include those plots here for space considerations). No cross-correlation was observed among the different parameter chains.
Fig. 4. Diagnostics to assess convergence of the parameter chains for θ_0: α_0 (left) and β_0 (right). The first row shows the trace plots, where the x-axis shows the iterations and the y-axis the values of the respective parameters. The second row shows the autocorrelation functions.

The parameter estimates of θ_0 are formed by using the posterior means after a burn-in of length 2000, and they appear in Table II along with the associated 95% credible intervals. These form the basis for making population inference. α_0 denotes the mean log(PSR) value over the entire population, β_0 denotes the difference in the mean log(PSR) values between an authentic and an impostor person in the population, and γ_0 denotes the change in the mean log(PSR) value when the illumination level changes by unity. The credible interval for γ_0 shows that the illumination level of an image does not significantly affect the PSR (since it contains zero), and hence the MACE filter system is expected to be robust to illumination variations. The authenticity variable, however, is seen to have a statistically significant effect on PSR (its interval does not contain zero); this is reasonable and shows that the MACE system is able to distinguish between the authentic and the impostor score distributions. The estimated posterior marginal distributions of θ_0 appear in Figure 5, giving an idea about the nature of its values for a general population of face images (we have omitted the distribution for γ_0 since it was not observed to be significant).
TABLE II
POINT ESTIMATES AND 95% CREDIBLE INTERVALS FOR THE POPULATION PARAMETERS θ_0 FOR THE MACE SYSTEM.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         1.9737     0.7504        3.1970
β_0         1.4634     1.2874        1.6394
γ_0         -0.0184    -0.1965       0.1597
Fig. 5. Estimated marginal posterior distributions of θ_0: (a) α_0, (b) β_0, using Gaussian kernel densities. The x-axis shows the respective parameter values and the y-axis the probability values.

Next, we generate values of log(PSR) from the model using the post-convergence values of θ_i and σ² (iterations 2001-5000 of the Gibbs sampler), and estimate the density with the help of Gaussian kernels (Figure 6). As can be seen clearly, there is a clear separation between the predicted log(PSR) values of authentic and impostor people; in fact, the distribution of log(PSR) appears to be a mixture of two distributions: one very nearly Gaussian (impostor) and the other slightly positively skewed (authentic). The mean of the authentic cases is also higher than that of the impostor cases. The small mass at the rightmost end of the tail of the authentic score distribution shows that the system predicts a few very high authentic log(PSR) values in the population (the bi-modality here is not serious enough to cause concern for subsequent analysis). The amount of overlap in the tails of the two distributions in Figure 6(a), namely the right tail of the impostor distribution and the left tail of the authentic distribution, determines the false alarm and false negative rates.
Fig. 6. Estimated predictive posterior distribution of log(PSR): (a) all values, (b) authentic, (c) impostor, using a Gaussian kernel density function. The x-axis shows the log(PSR) values, and the y-axis shows the associated probability values.

Based on the posterior predictive distributions, the parameters of the Gaussian densities for the authentic and the impostor log(PSR) are estimated as: μ̂_A = 4.1331, ν̂_I = 1.9265, σ̂_A = 0.6316 and η̂_I = 0.1471. The resulting FAR and FRR for different selected thresholds on the PSR values (exponentiating the log(PSR) values), computed according to Equation 12, are shown in Figure 7(a) (the theoretical curve), while the empirical error curve appears in Figure 7(b). Note that these curves are variants of the popular ROC curve; we use them instead to represent the performance of a biometric system because we use the EER as our single quantitative performance measure. Both curves yield a similar predicted EER of around 1.2-1.5% at optimal threshold PSR values of 10-15. This threshold is also close to the rule-of-thumb value of 20 which is conventional in applications of the MACE filter ([1]).
Fig. 7. Predicted error curves for authentication using the MACE filter system: (a) theoretical curve, (b) empirical curve. The solid descending curve represents the FAR and the ascending dashed one the FRR. The point at which the FAR and FRR curves meet is the EER.
C. Application: GMM-based System

We next apply the random effects model technique to a GMM-based authentication system synthesized in the frequency domain based on the phase spectra of a face image ([23]). For this system, classification and verification are performed using a MAP estimate, and hence the authentication match score is the posterior log-likelihood of the test image. 10 images per person are used to build the model and the remaining 11 are used for testing. Using similar covariates as for the MACE filter system, the random effects model in this case is:
Y_{ij} \overset{\text{iid.}}{\sim} N(\alpha_i + \beta_i x^{(0)}_{ij} + \gamma_i x^{(1)}_{ij},\; \sigma^2),   (14)
where X^{(0)} denotes the explanatory variable representing the authenticity of an image, X^{(1)} represents the illumination level of an image, and Y is the posterior log-likelihood (the match score). The histograms of the posterior log-likelihood values for the PIE database test images are shown in Figure 8, and they indicate that the assumptions of normality and homoscedasticity hold sufficiently well (the impostor distribution is slightly skewed, but the departure is insignificant).
1700
1680
1660
1640
1620
1600
1580
1560
1540
0
10
20
30
40
50
60
70
80
90
100
Authentic posterior loglikelihood
1950
1900
1850
1800
1750
1700
0
1000
2000
3000
4000
5000
6000
7000
8000
Impostor posterior loglikelihood
(a) Authentic (b) Impostor
Fig.8.Histograms of the posterior log-likelihood used for authenticating the PIE database using the GMM-based method.
The Gibbs sampler used to simulate from the posterior distribution stabilized in around 500 iterations in this case, which we use as burn-in for a total run of 2000 iterations. The trace plots for the parameter of interest θ_0 in Figure 9 show satisfactory convergence. As with the MACE filter, we looked at the autocorrelation and cross-correlation of the parameter chains, and they all indicated good mixing (we omit those plots here due to space reasons). Table III shows the point estimates of θ_0 and the associated credible intervals. α_0 denotes the mean log-likelihood value over the entire population and β_0 denotes the difference in the mean log-likelihood values between an authentic and an impostor person in the population. The estimated posterior marginals for θ_0 appear in Figure 10. Moreover, both α_0 and β_0 are significantly different from zero (their intervals do not include 0), hence the effects of the corresponding covariates on the log-likelihood values are statistically significant for any large population. As with the MACE system, the illumination variation does not have a significant effect on the match score, and this is reasonable since the GMM method was also shown to be robust to illumination changes.

Fig. 9. Trace plots for α_0 and β_0. The x-axis shows the iterations and the y-axis the corresponding parameter values.
TABLE III
POINT ESTIMATES AND 95% CREDIBLE INTERVALS FOR THE POPULATION PARAMETERS θ_0 FOR THE GMM-BASED SYSTEM.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         -1694.4    -1698.0       -1690.8
β_0         81.7       76.7          86.7
γ_0         2.8        -5.4          11.0
Next, we generate values of the log-likelihood from the model using the post-convergence values of θ_i and σ² (iterations 501-2000 of the Gibbs sampler), and estimate the density with the help of Gaussian kernels (Figure 11). The amount of overlap in the tails of the authentic and impostor distributions seems to be fairly negligible in this case, indicating a much lower risk of false alarms and false negatives than with the MACE filter system.
Fig. 10. Estimated marginal posterior distributions of θ_0: (a) α_0, (b) β_0, using Gaussian kernel densities.

Fig. 11. Predictive posterior distribution of the posterior log-likelihood: (a) all values, (b) authentic, (c) impostor, using Gaussian kernel density functions. The x-axis shows the log-likelihood values and the y-axis the associated probabilities.
The parameter estimates of the Gaussian posterior predictive distributions for the authentic and the impostor log-likelihoods are: μ̂_A = −1610.3, ν̂_I = −1701.2, σ̂_A = 19.83 and η̂_I = 14.7. The predicted FAR and FRR for the system appear in Figure 12. Both curves have a similar EER value, around 0.8-1%, at optimal threshold log-likelihood values between −1650 and −1700. These are the verification results one would expect when applying the GMM-based system to a general large database of face images.
Fig. 12. Predicted error curves for authentication for the GMM-based system: (a) theoretical, (b) empirical. The descending solid curve represents the FAR and the ascending dashed one the FRR.

D. Application: Facial Asymmetry-based System

Our next application pertains to a face authentication system based on a frequency domain representation of facial asymmetry ([28]). One-bit codes, called Facial Asymmetry Codes (or FAC for short), are used as features, along with a Hamming distance classifier that computes the number of matched bits between a test image and the trained templates to decide if the image is authentic or not. The templates are synthesized from the 3 neutral images of each person, and the 6 images with expression variations are used as the test images in this case. The random effects model in this case is:
Y_{ij} \overset{\text{iid.}}{\sim} N(\alpha_i + \beta_i x^{(0)}_{ij} + \gamma_i x^{(1)}_{ij},\; \sigma^2),   (15)
where X^{(0)} denotes the explanatory variable representing authenticity, X^{(1)} denotes the covariate representing the different expressions (0 for neutral, 1 for joy, 2 for anger and 3 for disgust), and Y_{ij} is the number of matched bits of the j-th test image for the i-th person when compared to a stored template. Figure 13 shows the distributions of the number of matched bits for the authentics and the impostors using the Cohn-Kanade database, which, as in the case of the two earlier systems, do not exhibit any major departure from the assumptions of normality and equality of variance.
Fig. 13. Histograms of the number of matched bits used for authenticating the Cohn-Kanade database using the FAC-based system: (a) authentic, (b) impostor.

A Gibbs sampler based on the posterior distributions is used to fit the model, and some trace plots for θ_0 appear in Figure 14, showing satisfactory convergence in about 2000 iterations. The chains stabilized in around 500 iterations, which was used as the burn-in. Again, autocorrelation plots are not included here, but they showed good mixing properties of the chains. Table IV shows the parameter estimates based on the posterior means and the credible intervals for θ_0. α_0 denotes the mean number of matched bits over the entire population and β_0 denotes the difference in the mean number of matched bits between an authentic and an impostor person in the population. The estimated posterior marginals for θ_0 appear in Figure 15. Moreover, both α_0 and β_0 are significantly different from zero, hence the effect of authenticity on the number of matched bits is significant for the general population. As with illumination variations in the other authentication methods, expression changes are found not to affect the match score significantly for this system. This bears evidence to the fact that the asymmetry-based system is expected to be sufficiently robust to expression changes in the test images for any database.

Fig. 14. Trace plots of the parameters θ_0: α_0 and β_0.
TABLE IV
THE POINT ESTIMATES AND THE 95% CREDIBLE INTERVALS OF θ_0 FOR THE ASYMMETRY-BASED SYSTEM.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         13.83      12.19         15.46
β_0         752.11     704.65        799.57
γ_0         8.56       -2.64         19.76

Fig. 15. Estimated marginal posterior distributions of θ_0: (a) α_0, (b) β_0, using Gaussian kernel densities.

The posterior predictive distribution p(y_{ij} | y) of the number of matched bits, estimated using Gaussian kernel density functions, is shown in Figure 16. It shows a clear separation between the predicted numbers of matched bits of authentic and impostor people; in fact, the amount of overlap in the tails of the authentic and impostor distributions is negligible, which indicates a much lower risk of false alarms and false negatives than for the two earlier systems. One thing to note for these posterior distributions is that they look far removed from Gaussianity, with occasional bumps in the density. So, unlike the two earlier cases, we do not fit a Gaussian kernel to these distributions but instead use the posterior predictive values to obtain empirical estimates of the predicted error rates. The resulting error curve appears in Figure 17, which shows an EER value of around 0.01% at an optimal threshold of 1000 matched bits. This shows that the predicted authentication performance of the asymmetry-based system for the general population is better than that of either the MACE system or the GMM-based system. We wish to explore options of fitting other robust kernels to the posterior distributions that will yield closed-form expressions for the error rates.

Fig. 16. Estimated predictive posterior distribution of the number of matched bits: (a) all values, (b) authentic, (c) impostor, using Gaussian kernel density functions.
Fig. 17. Predicted error curve for authentication using the asymmetry-based system. The descending solid curve represents the FAR and the ascending dashed one the FRR.
E. Extensions: Accounting for Correlations

All three face authentication systems considered in this paper have been observed to satisfy the primary underlying assumptions of the random effects model framework: normality and homoscedasticity. However, our model is based on the assumption of independence among successive match scores, which is not realistic, as multiple test images belonging to a particular individual are likely to be highly correlated. Note, however, that the proposed model is just an initial study into the utility of such an approach to large-scale inference and was therefore tested under the simplistic condition of independence. The model can be extended in quite a straightforward manner to incorporate a correlation structure among the match scores of each individual using a longitudinal setup, while still assuming that different individuals are independent. Thus our hierarchical random effects model for correlated data, obtained by generalizing the independence model, is:
y_i \sim N(X_i \theta_i,\; V_i), \quad i = 1, \ldots, k,   (16)
where V_i is the variance-covariance matrix for individual i, X_i are the covariates and θ_i the model parameters. The distribution of θ_i is the same as that specified in Equation 3, and the second-stage conjugate priors are defined in the same way as in Equation 4, with σ² replaced by V_i, whose prior is given by:
V_i^{-1} \sim \text{Wishart}(C_i,\; r_i),   (17)
where C_i and r_i for each i are additional hyper-parameters supplied by the user. We then use the Gibbs sampler to simulate from the modified posterior distributions, which are as follows:
\theta_i \mid y, \theta_0, \Sigma^{-1}, V_i \sim N\left( D_i \left( X_i^T V_i^{-1} y_i + \Sigma^{-1} \theta_0 \right),\; D_i \right), \quad i = 1, \ldots, k, \text{ where } D_i^{-1} = X_i^T V_i^{-1} X_i + \Sigma^{-1},   (18)

\theta_0 \mid y, \theta_i, \Sigma^{-1}, V_i \sim N\left( V \left( k \Sigma^{-1} \bar{\theta} + C^{-1} \eta \right),\; V \right), \text{ where } V = (k \Sigma^{-1} + C^{-1})^{-1} \text{ and } \bar{\theta} = \tfrac{1}{k} \sum_{i=1}^{k} \theta_i,   (19)

\Sigma^{-1} \mid y, \theta_i, \theta_0, V_i \sim \text{Wishart}\left( \left( \sum_{i=1}^{k} (\theta_i - \theta_0)(\theta_i - \theta_0)^T + \rho R \right)^{-1},\; k + \rho \right).   (20)

Finally, the full conditional for V_i (i = 1, \ldots, k) is the updated Wishart distribution:

V_i^{-1} \mid y, \theta_i, \theta_0, \Sigma^{-1} \sim \text{Wishart}\left( \left( (y_i - X_i \theta_i)(y_i - X_i \theta_i)^T + C_i \right)^{-1},\; n_i + r_i \right).   (21)
A similar inference procedure based on θ_0 and the posterior predictive distributions is used, and we expect this model to provide more accurate estimates of the regression coefficients than the independence model. One drawback of this model is that the V_i introduce additional parameters, which may call for extra computing time in the form of longer-running Markov chains.
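Relative to the independence sampler sketched earlier, the only genuinely new step is the per-subject update of V_i in Equation 21. A minimal sketch of that step is shown below; the dimensions and the hyper-parameters C_i and r_i are placeholder choices of ours.

```python
import numpy as np
from scipy.stats import wishart

def update_V_inv(y_i, X_i, theta_i, C_i, r_i, rng):
    """Draw V_i^{-1} from its full conditional (cf. Equation 21)."""
    resid = y_i - X_i @ theta_i                       # n_i-dimensional residual
    scale = np.linalg.inv(np.outer(resid, resid) + C_i)
    return wishart.rvs(df=len(y_i) + r_i, scale=scale, random_state=rng)

# Toy usage with assumed dimensions and hyper-parameters
rng = np.random.default_rng(4)
n_i = 6
X_i = np.column_stack([np.ones(n_i), rng.integers(0, 2, n_i)])
theta_i = np.array([2.0, 1.5])
y_i = X_i @ theta_i + rng.normal(0, 0.2, n_i)
C_i, r_i = np.eye(n_i), n_i + 2                       # placeholder prior inputs
V_inv_draw = update_V_inv(y_i, X_i, theta_i, C_i, r_i, rng)
print(V_inv_draw.shape)
```

With this draw in hand, the updates for θ_i, θ_0 and Σ^{-1} proceed as in Equations 18-20.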
So far we have only applied this extended model to the MACE filter-based system, but we hope to apply it to the other two systems as well in the near future. The results are very similar to the ones obtained from the independence model, and hence we only include some of them here. Table V shows the parameter estimates and the associated 95% credible intervals. None of the estimates differ significantly from those obtained under the independence assumption (Table II). The only noteworthy point is that the intervals for all the coefficients are narrower than before, which shows that the current model represents the variability in the data more efficiently; thus these values are more reliable for the purpose of inference than the earlier estimates (although there is only a minor difference in the values of the estimates themselves). These observations also imply that the posterior predictive distributions will not differ significantly from before, and so we do not include them here. Given that the posteriors are so similar, the predicted error curves are also very close, and the same EER of 1.2-1.5% at an optimal threshold value of 12 was obtained. The empirical error curve based on the posterior distribution of log(PSR) is shown in Figure 18.

TABLE V
THE POINT ESTIMATES AND THE 95% CREDIBLE INTERVALS FOR θ_0 UNDER THE MODEL WITH CORRELATIONS.

Parameter   Estimate   Lower Limit   Upper Limit
α_0         2.2567     1.2664        3.2468
β_0         1.5763     1.3633        1.7893
γ_0         -0.0567    -0.1826       0.0692
Fig. 18. The empirical error curve: the descending curve represents the FAR and the ascending one the FRR.

F. Model Checking and Validation

We now carry out a crude model checking procedure based on cross-validation in order to assess the usefulness of our proposed technique. To this end, we build the random effects model based on the authentication results from a part of each database consisting of a randomly selected group of 30 individuals and compute the predicted EERs. These are then compared to the actual authentication results obtained from the rest of the database, constituted by the remaining individuals (35 for CMU-PIE and 25 for Cohn-Kanade). We repeated this random selection 10 times, and the final prediction errors were averaged over these 10 repetitions. Some results are summarized in Table VI, which shows clearly the close proximity of the predicted and the actual EERs for all the systems. We use the independence models here, but we expect very similar outcomes in the case of the correlated model.
TABLE VI
CROSS-VALIDATION RESULTS FOR THE 3 AUTHENTICATION SYSTEMS.

System      Predicted EER ± Std. Dev. over 10 repetitions   Actual EER ± Std. Dev. over 10 repetitions
MACE        1.4% ± 0.1%                                     1.3% ± 0.1%
GMM         1.2% ± 0.1%                                     1.3% ± 0.2%
Asymmetry   0.2% ± 0.2%                                     0.8% ± 0.1%
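The repeated random-split logic behind Table VI can be sketched as follows. The "predicted" EER here is computed from a simple Gaussian fit to the training scores as a stand-in for the full random-effects prediction of Section III, and the scores themselves are synthetic; only the subject-level splitting, the 10 repetitions and the averaging mirror the procedure described in the text.

```python
import numpy as np
from scipy.stats import norm

def empirical_eer(scores, is_auth):
    """EER from labeled match scores by scanning thresholds."""
    taus = np.unique(scores)
    frr = np.array([np.mean(scores[is_auth] <= t) for t in taus])
    far = np.array([np.mean(scores[~is_auth] > t) for t in taus])
    i = np.argmin(np.abs(frr - far))
    return 0.5 * (frr[i] + far[i])

def gaussian_predicted_eer(scores, is_auth):
    """EER from Gaussian approximations to the two score distributions (cf. Eq. 12).
    A simplified stand-in for the posterior predictive computation of Section III."""
    mu_a, s_a = scores[is_auth].mean(), scores[is_auth].std()
    nu_i, e_i = scores[~is_auth].mean(), scores[~is_auth].std()
    taus = np.linspace(scores.min(), scores.max(), 5000)
    frr = norm.cdf((taus - mu_a) / s_a)
    far = 1.0 - norm.cdf((taus - nu_i) / e_i)
    i = np.argmin(np.abs(frr - far))
    return 0.5 * (frr[i] + far[i])

# Repeated random subject splits (synthetic scores; 65 subjects as in CMU-PIE)
rng = np.random.default_rng(5)
subj = np.repeat(np.arange(65), 18)
auth = rng.random(subj.size) < 0.1
scores = np.where(auth, rng.normal(4.0, 0.6, subj.size), rng.normal(2.0, 0.15, subj.size))

pred, act = [], []
for _ in range(10):
    train = np.isin(subj, rng.choice(65, size=30, replace=False))
    pred.append(gaussian_predicted_eer(scores[train], auth[train]))
    act.append(empirical_eer(scores[~train], auth[~train]))
print(np.mean(pred), np.std(pred), np.mean(act), np.std(act))
```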
Figure 19 shows the distributions of the actual and the predicted log(PSR) values, both for the authentics and the impostors, on a particular validation set. Note that it is not crucial how closely each individual prediction matches the original; what matters is the proximity of the distributions of the actual and the predicted values that are used to estimate the EER (which we do see here). Similar plots for the other two systems are not included because of space constraints, but they show a similar correspondence between the actual and the predicted distributions.
Fig. 19. Distributions of the authentic and impostor log(PSR) values on a validation set of 35 individuals (PIE database). Panels: actual authentic, predicted authentic, actual impostor and predicted impostor log(PSR); the y-axes show frequencies.
V. THE “WATCH-LIST” PROBLEM
Suppose a watch-list contains N individuals, that is, there are N stored templates in the database, one for each person on the relevant watch-list. Now, if we need to determine whether a person randomly chosen from the general population belongs to that watch-list, his image will be tested against each of these N templates to see if a match occurs. The probabilities of the following two events that create errors are of interest:

• A false match: the image matches one of the stored templates when the person tested is not actually on the watch-list.
• A false non-match: the image does not match any of the stored templates when the person tested is actually a member of the watch-list.
Let p_0 denote the probability of an incorrect match (the FAR), and let p_1 denote the probability of a correct match (1 − FRR). Then the probabilities of the two events can be computed as:

p_FM = Probability that an image will produce a false match with the watch-list database
     = Probability that the image matches at least one of the N templates, given that the person is not on the list
     = 1 − Probability that the image matches none of the N templates, given that the person is not on the list
     = 1 − (1 − p_0)^N ≈ N p_0, if p_0 is small.

Hence this probability increases approximately linearly with N for small p_0. Similarly,

p_FNM = Probability that an image will produce a false non-match with the watch-list database
      = Probability that a watch-list person will not be identified
      ≈ Probability that the image does not match the person's own template
      = 1 − p_1 (= FRR).

So this probability does not depend on the watch-list size, which is reasonable since there is only one template for each person on the watch-list against which the image needs to be matched. Here we have made the approximation that the probability of a match of a watch-list individual against a different watch-list individual's template is negligible.
Thus, using values of the FAR and FRR from the predicted error curves for an authentication system (Figures 7, 12, 17), one can obtain predicted values of p_FM and p_FNM. Alternatively, one can use the theoretical values of the FAR and FRR from the Gaussian kernels in Equation 12, which show that the predicted false match rates for different watch-list sizes depend on the threshold τ; this helps determine the “optimal” watch-list false alarm rates. These probabilities of false match and false non-match can therefore be written in terms of the modeled link function g(·) of the match scores as:

p_{FM} = 1 - \left[ \Phi\left( \frac{g(\tau) - \nu_I}{\eta_I} \right) \right]^N, \qquad p_{FNM} = \Phi\left( \frac{g(\tau) - \mu_A}{\sigma_A} \right).   (22)
One choice for these probabilities is obtained by choosing the matching threshold so that the FAR and FRR are equal (the EER value), although other trade-offs might be desirable in certain situations. Interpreting these results in practical terms, they indicate that increasing the size of the watch-list leads to an increasing number of false positives, without significantly changing the chance of correctly identifying a particular watch-list target (which is intuitively obvious as well).
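The dependence of the false match probability on the list size in Equation 22 is easy to tabulate once the FAR at the chosen operating threshold is known. The sketch below does this for a few list sizes; the FAR value used is an arbitrary illustration, not one of the system-specific estimates.

```python
import numpy as np

def watchlist_false_match_prob(far, list_sizes):
    """p_FM = 1 - (1 - FAR)^N for each watch-list size N (cf. Equation 22)."""
    list_sizes = np.asarray(list_sizes, dtype=float)
    return 1.0 - (1.0 - far) ** list_sizes

far = 0.012                                   # illustrative FAR at the operating threshold
sizes = np.array([10, 50, 100, 200, 400, 700, 1000])
p_fm = watchlist_false_match_prob(far, sizes)
for n, p in zip(sizes, p_fm):
    print(f"N = {n:4d}: p_FM = {p:.3f}  (linear approx. N*FAR = {min(n * far, 1.0):.3f})")
```

Applied with each system's FAR at its EER, this reproduces the qualitative behavior discussed in the next subsection: the false match probability approaches one fastest for the system with the largest FAR.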
A. Application to the Face Authentication Systems

Figures 20(a)-(c) show the variation in the probability of false match with the watch-list size according to Equation 22, for the three face authentication systems considered in this paper. We used the FAR values corresponding to the predicted EER of each system to construct these graphs. These figures show a roughly linear initial trend for all 3 systems, that for the asymmetry-based system being the most pronounced, especially as the watch-list gradually grows in size. In fact, of the 3 systems, the convergence of the probability of at least one false match to 1 is the slowest for the asymmetry-based system, followed by the GMM-based system. According to these plots, the probability of detecting at least one false match in the general population reaches 1 quickest, for a watch-list size of around 400, for the MACE filter system; the corresponding figures for the GMM and the asymmetry systems are around 700 and 1000, respectively. All of these are consistent with expectation, since the predicted EER is lowest for the asymmetry system and highest for the MACE filter system (p_FM being approximately linear in the FAR).
Fig. 20. Variation in the probability of false match as the watch-list size increases, for the three face authentication systems: (a) MACE system, (b) GMM-based system, (c) asymmetry-based system.

VI. DISCUSSION

This paper has presented a novel technique based on a random effects model to predict the performance of a biometric system on unknown large databases. The methodology provides an alternative means of performance evaluation to those based on empirical observational studies by providing model-based prediction. We have demonstrated how it can be applied to three face authentication systems, and we also considered watch-list performance and proposed a method of studying the variation in the probability of false match as a function of watch-list size. Although we concentrated on facial image biometrics, the method can be adapted for use with almost any biometric, as long as the match scores from a database are available. For instance, in theory, one could use the techniques we have introduced to analyze systems based on fingerprint analysis, retinal scans, or any combination of various techniques.
It is also worth pointing out that the framework can be extended to include many potentially useful covariates. Beyond simply leading to better goodness-of-fit, the inclusion of covariates in the model can potentially provide valuable information to the designer of an authentication system. For example, if one knows in advance the expected effect of the number of training images on the authentication score of a certain system, one can roughly determine the amount of training data needed for the system to be effective. Furthermore, given an idea of the effects of the different system parameters on the authentication score, one can tune them for optimal results.
Although the models we considered contained many parameters to be estimated, the implementation of the Gibbs sampling approach was not overly burdensome in terms of computational load. One possible limitation is the use of Gaussian kernel functions to evaluate the EER from our Markov chain output, so it would be natural to consider more accurate approximations instead, for instance based on non-parametric kernel density estimation applied to the Markov chain output. We also anticipate increasing scope for application of the general method as more and more biometric data are recorded and made available over time, and also to other biometric-based systems, such as those based on fingerprints, iris and multi-biometrics.
ACKNOWLEDGMENT

The first author's research was supported in part by a contract from the Army Research Office (ARO) to CyLab, CMU. The authors would also like to thank Professors B. V. K. Vijaya Kumar and Stephen E. Fienberg for valuable comments on this work.
REFERENCES

[1] M. Savvides, B. V. K. Vijaya Kumar and P. Khosla, “Face Verification using Correlation Filters”, Proc. 3rd IEEE Automatic Identification Advanced Technologies (AutoID), pp. 56-61, 2002.
[2] S. K. Goo, “Sen. Kennedy Flagged by No-Fly List”, The Washington Post, p. A01, August 20, 2004.
[3] S. Prabhakar, S. Pankanti and A. K. Jain, “Biometric Recognition: Security and Privacy Concerns”, IEEE Magazine on Security and Privacy, pp. 33-42, March-April 2003.
[4] T. Sim, S. Baker and M. Bsat, “The CMU Pose, Illumination, and Expression (PIE) Database”, Proc. 5th International Conference on Automatic Face and Gesture Recognition, 2002.
[5] A. V. Oppenheim and R. W. Schafer, Discrete-time Signal Processing, Prentice-Hall, 1989.
[6] M. A. Turk and A. P. Pentland, “Face Recognition using Eigenfaces”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1991.
[7] M. Savvides and B. V. K. Vijaya Kumar, “Efficient Design of Advanced Correlation Filters for Robust Distortion-tolerant Face Identification”, Proc. IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 45-52, 2003.
[8] R. M. Bolle, S. Pankanti and N. K. Ratha, “Evaluation Techniques for biometrics-based authentication systems (FRR)”, Proc. International Conference on Pattern Recognition (ICPR), pp. 2831-2837, 2000.
[9] R. M. Bolle, N. K. Ratha and S. Pankanti, “Evaluating Authentication Systems using Bootstrap confidence intervals”, Proc. AutoID'99, pp. 9-13, Summit, NJ, 1999.
[10] S. Weisberg, Applied Linear Regression, Wiley, 1985.
[11] J. L. Wayman, “A scientific approach to evaluating biometric systems using mathematical methodology”, Card Tech/Secure Tech, pp. 477-492, Orlando, 1997.
[12] J. P. Egan, Signal Detection Theory and ROC Analysis, Academic Press, New York, 1975.
[13] J. Daugman, “Biometric Decision Landscapes”, Tech. Report no. 482, University of Cambridge Computer Laboratory, 2000.
[14] W. Shen, M. Surette and R. Khanna, “Evaluation of automated biometrics-based identification and verification systems”, Proc. of the IEEE, vol. 85, no. 9, pp. 1464-1478, 1997.
[15] M. E. Schuckers, “Using the beta-binomial distribution to assess performance of a biometric identification device”, International Journal of Image and Graphics, vol. 3, no. 3, pp. 523-529, 2003.
[16] J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve”, Radiology, vol. 143, pp. 29-36, 1982.
[17] NIST, “Face Recognition Vendor Test (FRVT)”, http://www.frvt.org, 2002.
[18] G. McLachlan and D. Peel, Finite Mixture Models, John Wiley and Sons, 2000.
[19] C. P. Robert, “Mixtures of distributions: inference and estimation”, in Markov Chain Monte Carlo in Practice, Eds. W. R. Gilks, S. Richardson and D. J. Spiegelhalter, pp. 441-464, 1996.
[20] H. Bensmail, G. Celeux and A. Raftery, “Inference in model-based cluster analysis”, Statistics and Computing, vol. 7, pp. 1-10, 1997.
[21] A. E. Gelfand, S. E. Hills, A. Racine-Poon and A. F. M. Smith, “Illustration of Bayesian Inference in Normal Data Models using Gibbs Sampling”, Journal of the American Statistical Association, vol. 85, no. 412, pp. 972-985, 1990.
[22] A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin, Bayesian Data Analysis, Chapman and Hall, 1995.
[23] S. Mitra and M. Savvides, “Gaussian Mixture Models based on Phase Spectra for Human identification and illumination classification”, Proc. 4th IEEE Workshop on Automatic Identification Advanced Technologies (AutoID), Buffalo, 2005.
[24] G. H. Givens, J. R. Beveridge, B. A. Draper and D. Bolme, “A Statistical assessment of subject factors in the PCA recognition of human faces”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[25] G. H. Givens, J. R. Beveridge, B. A. Draper, P. Grother and P. J. Phillips, “How features of the human face affect recognition: A statistical comparison of three face recognition algorithms”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 381-388, 2004.
[26] G. H. Givens, J. R. Beveridge, B. A. Draper and P. J. Phillips, “Repeated Measures GLMM estimation of subject-related factors and false positive threshold effects on human face verification performance”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 40-47, 2005.
[27] A. Gelman and D. B. Rubin, “Inference from iterative simulation using multiple sequences (with discussion)”, Statistical Science, vol. 7, pp. 457-511, 1992.
[28] S. Mitra, M. Savvides and B. V. K. Vijaya Kumar, “Facial Asymmetry in the Frequency Domain: A New Robust Biometric”, Proc. International Conference on Image Analysis and Recognition (ICIAR), Lecture Notes in Computer Science, Springer-Verlag, vol. 3656, pp. 1065-1072, 2005.
[29] T. Kanade, J. F. Cohn and Y. L. Tian, “Comprehensive database for facial expression analysis”, Proc. 4th IEEE Conference on Automatic Face and Gesture Recognition, pp. 46-53, 2000.
[30] P. Grother and P. J. Phillips, “Models of large population recognition”, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 68-75, July 2004.