Face Recognition

Jens Fagertun
Kongens Lyngby 2005
Master Thesis IMM-Thesis-2005-74
Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 45882673
Abstract
This thesis presents a comprehensive overview of the problem of facial
recognition. It surveys available facial detection algorithms and presents
implementations and tests of different feature extraction, dimensionality
reduction and light normalization methods.
A new feature extraction and identity matching algorithm, the Multiple
Individual Discriminative Models (MIDM) algorithm, is proposed.
MIDM, together with AAM-API, a C++ open source implementation of Active
Appearance Models (AAM), is implemented in the “FaceRec” Delphi 7 application,
a real-time automatic facial recognition system. AAM is used for face
detection and MIDM for face recognition.
Extensive testing of the MIDM algorithm is presented and its performance is
evaluated by the Lausanne protocol, a precise and widely accepted protocol for
the testing of facial recognition algorithms. These test evaluations showed
that the MIDM algorithm is superior to all other algorithms reported by the
Lausanne protocol.
Finally, this thesis presents a description of 3D facial reconstruction from a
single 2D image. This is done by using prior knowledge in the form of a
statistical shape model of faces in 3D.
Keywords: Face Recognition, Face Detection, Lausanne Protocol, 3D Face
Reconstruction, Principal Component Analysis, Fisher Linear Discriminant
Analysis, Locality Preserving Projections, Kernel Fisher Discriminant Analysis.
Resumé
This thesis presents a comprehensive overview of the problem of face
recognition. A survey of the available algorithms for face detection is
presented, as well as implementation and tests of different methods for feature
extraction and dimensionality reduction and methods for light normalization.
A new algorithm for feature extraction and identity matching (Multiple
Individual Discriminative Models - MIDM) has been proposed.
MIDM, together with AAM-API, an open-source C++ implementation of Active
Appearance Models (AAM), has been implemented as the application “FaceRec” in
Delphi 7. This application is an automatic face recognition system that runs in
real time. AAM is used for face detection and MIDM is used for face recognition.
Extensive testing of the MIDM algorithm is presented and its performance is
evaluated using the Lausanne protocol. The Lausanne protocol is a precise and
widely accepted protocol for testing face recognition algorithms. These test
evaluations showed that the MIDM algorithm is superior to all other algorithms
reported using the Lausanne protocol.
Finally, this thesis presents a description of 3D face reconstruction from a
single 2D image. This is done by using a priori knowledge in the form of a
statistical model of the shape of faces in 3D.
Keywords: Face Recognition, Face Detection, the Lausanne Protocol, 3D Face
Reconstruction, Principal Component Analysis, Fisher Linear Discriminant
Analysis, Locality Preserving Projections, Kernel Fisher Discriminant Analysis.
Preface
This thesis was prepared at the Section for Image Analysis, in the Department
of Informatics and Mathematical Modelling, IMM, located at the Technical
University of Denmark, DTU, as a partial fulfillment of the requirements for
acquiring the degree Master of Science in Engineering, M.Sc.Eng.
The thesis deals with different aspects of face recognition using both the
geometrical and photometrical information of facial images. The main focus will
be on face recognition from 2D images, but 2D to 3D conversion of data will
also be considered.
The thesis consists of this report, a technical report and two papers, one
published in Proceedings of the 14th Danish Conference on Pattern Recognition
and Image Analysis and one submitted to IEEE Transactions on Pattern Analysis
and Machine Intelligence, written during the period January to September 2005.
It is assumed that the reader has a basic knowledge in the areas of statistics
and image analysis.
Lyngby, September 2005
Jens Fagertun
Publication list for this thesis
[20] Jens Fagertun, David Delgado Gomez, Bjarne K. Ersbøll and Rasmus
Larsen. A face recognition algorithm based on multiple individual
discriminative models. Proceedings of the 14th Danish Conference on Pattern
Recognition and Image Analysis, 2005.
[21] Jens Fagertun and Mikkel B. Stegmann. The IMM Frontal Face Database.
Technical Report, Informatics and Mathematical Modelling, Technical
University of Denmark, 2005.
[27] David Delgado Gomez, Jens Fagertun and Bjarne K. Ersbøll. A face
recognition algorithm based on multiple individual discriminative models.
IEEE Transactions on Pattern Analysis and Machine Intelligence. To
appear. Submitted in 2005, ID TPAMI-0474-0905.
Acknowledgements
I would like to thank the following people for their support and assistance in
my preparation of the work presented in this thesis:
First and foremost, I thank my supervisor Bjarne K. Ersbøll for his support
throughout this thesis. It has been an exciting experience and a great
opportunity to work with face recognition, a very interesting area in image
analysis and pattern recognition.
I thank my co-supervisor Mikkel B. Stegmann for his huge initial support and
for always having time to spare.
I thank my good friend David Delgado Gomez for all the productive
conversations on different issues of face recognition, and for an excellent
partnership during the writing of the two papers included in this thesis.
I thank Rasmus Larsen for his great patience when answering questions of a
statistical nature.
I thank my office-mates Mads Fogtmann Hansen, Rasmus Engholm and Steen
Lund Nielsen for productive conversations and for spending time answering my
questions, which has been a great help.
I thank Mette Christensen and Bo Langgaard Lind for proof-reading the
manuscript of this thesis.
I thank Lars Kai Hansen for encouraging me to write my thesis at the image
analysis section.
In general I thank the staff of the image analysis and computer graphics section
for providing a pleasant and inspiring atmosphere as well as for their
participation in the construction of the IMM Frontal Face Database.
Finally, I thank David Delgado Gomez and the Computational Imaging Lab at
the Department of Technology at Pompeu Fabra University, Barcelona for their
partnership in the participation in the ICBA2006 (International Conference on
Biometrics 2006) Face Verification Contest in Hong Kong in January 2006.
Contents
Abstract
Resumé
Preface
Publication list for this thesis
Acknowledgements
1 Introduction
1.1 Motivation and Objectives
1.2 Thesis Overview
1.3 Mathematical Notation
1.4 Nomenclature
1.5 Abbreviations
I Face Recognition in General
2 History of Face Recognition
3 Face Recognition Systems
3.1 Face Recognition Tasks
3.1.1 Verification
3.1.2 Identification
3.1.3 Watch List
3.2 Face Recognition Vendor Test 2002
3.3 Discussion
4 The Process of Face Recognition
4.1 Face Detection
4.2 Preprocessing
4.3 Feature Extraction
4.4 Feature Matching
4.5 Thesis Perspective
5 Face Recognition Considerations
5.1 Variation in Facial Appearance
5.2 Face Analysis in an Image Space
5.2.1 Exploration of Facial Submanifolds
5.3 Dealing with Non-linear Manifolds
5.3.1 Technical Solutions
5.4 High Input Space and Small Sample Size
6 Available Data
6.1 Face Databases
6.1.1 AR
6.1.2 BioID
6.1.3 BANCA
6.1.4 IMM Face Database
6.1.5 IMM Frontal Face Database
6.1.6 PIE
6.1.7 XM2VTS
6.2 Data Sets Used in this Work
6.2.1 Data Set I
6.2.2 Data Set II
6.2.3 Data Set III
6.2.4 Data Set IV
II Assessment
7 Face Detection: A Survey
7.1 Representative Work of Face Detection
7.2 Description of Selected Face Detection Methods
7.2.1 General Aspects of Face Detections Algorithms
7.2.2 Eigenfaces
7.2.3 Fisherfaces
7.2.4 Neural Networks
7.2.5 Active Appearance Models
8 Preprocessing of a Face Image
8.1 Light Correction
8.1.1 Histogram Equalization
8.1.2 Removal of Specific Light Sources based on 2D Face Models
8.2 Discussion
9 Face Feature Extraction: Dimensionality Reduction Methods
9.1 Principal Component Analysis
9.1.1 PCA Algorithm
9.1.2 Computational Issues of PCA
9.2 Fisher Linear Discriminant Analysis
9.2.1 FLDA in Face Recognition Problems
9.3 Locality Preserving Projections
9.3.1 LPP in Face Recognition Problems
9.4 Kernel Fisher Discriminant Analysis
9.4.1 Problems of KFDA
10 Experimental Results I
10.1 Illustration of the Feature Spaces
10.2 Face Identification Tests
10.2.1 50/50 Test
10.2.2 Ten-fold Cross-validation Test
10.3 Discussion
III Development
11 Multiple Individual Discriminative Models
11.1 Algorithm Description
11.1.1 Creations of the Individual Models
11.1.2 Classification
11.2 Discussion
12 Reconstruction of 3D Face from a 2D Image
12.1 Algorithm Description
12.2 Discussion
13 Experimental Results II
13.1 Overview
13.1.1 Initial Evaluation Tests
13.1.2 Lausanne Performance Tests
13.2 Initial Evaluation Tests
13.2.1 Identification Test
13.2.2 The Important Image Regions
13.2.3 Verification Test
13.2.4 Robustness Test
13.3 Lausanne Performance Tests
13.3.1 Participants in the Face Verification Contest, 2003
13.3.2 Results
13.4 Discussion
IV Implementation
14 Implementation
14.1 Overview
14.2 FaceRec
14.2.1 FaceRec Requirements
14.2.2 AAM-API DLL
14.2.3 Make MIDM Model File
14.3 Matlab Functions
14.4 Pitfalls
14.4.1 Passing Arrays
14.4.2 The Matlab Eig Function
V Discussion
15 Future Work
15.1 Light Normalization
15.2 Face Detection
15.3 3D Facial Reconstruction
16 Discussion
16.1 Summary of Main Contributions
16.1.1 IMM Frontal Face Database
16.1.2 The MIDM Algorithm
16.1.3 A Delphi Implementation
16.1.4 Matlab Functions
16.2 Conclusion
A The IMM Frontal Face Database
B A face recognition algorithm based on MIDM
C A face recognition algorithm based on MIDM
D FaceRec Quick User Guide
F CD-ROM Contents
Chapter 1
Introduction
Face recognition is a task so common to humans that the individual does not
even notice the extensive number of times it is performed every day. Although
research in automated face recognition has been conducted since the 1960s, it
has only recently caught the attention of the scientific community. Many face
analysis and face modeling techniques have progressed significantly in the last
decade [30]. However, the reliability of face recognition schemes still poses a
great challenge to the scientific community.
Falsification of identity cards and intrusion into physical and virtual areas by
cracking alphanumerical passwords appear frequently in the media. These problems
of modern society have triggered a real need for reliable, user-friendly and
widely acceptable control mechanisms for the identification and verification of
the individual.
Biometrics, which bases authentication on the intrinsic aspects of a specific
human being, appears as a viable alternative to more traditional approaches
(such as PIN codes or passwords). Among the oldest biometric techniques is
fingerprint recognition. This technique was used in China as early as 700 AD
for official certification of contracts. Later on, in the middle of the 19th
century, it was used for identification of persons in Europe [31]. A more
recently developed biometric technique is iris recognition [17]. This technique
is now used instead of passport identification for frequent flyers in some
airports in the United Kingdom, Canada and the Netherlands, as well as for
access control of employees to restricted areas in Canadian airports and in New
York's JFK airport. These techniques are inconvenient due to the necessity of
interaction with the individual who is to be identified or authenticated. Face
recognition, on the other hand, can be a non-intrusive technique. This is one
of the reasons why this technique has attracted increased interest from the
scientific community in recent years.
Facial recognition holds several advantages over other biometric techniques. It
is natural, non-intrusive and easy to use. In a study considering the
compatibility of six biometric techniques (face, finger, hand, voice, eye,
signature) with machine readable travel documents (MRTD) [32], facial features
scored the highest percentage of compatibility, see Figure 1.1. In this study
parameters like the enrollment, renewal, machine requirements and public
perception were considered. However, facial features should not be considered
the most reliable biometric.
Figure 1.1: Comparison of machine readable travel documents (MRTD)
compatibility with six biometric techniques: face, finger, hand, voice, eye,
signature. Courtesy of Hietmeyer [32].
The increased interest that automated face recognition systems have gained from
environments other than the scientific community is largely due to increasing
public concerns for security, especially due to the many events of terror around
the world after September 11th, 2001.
However, automated facial recognition can be used in many areas other than
security-oriented applications (access-control/verification systems,
surveillance systems), such as computer entertainment and customized
computer-human interaction. Customized computer-human interaction applications
will in the near future be found in products such as cars, aids for disabled
people, buildings, etc. The interest in automated facial recognition and the
number of applications will most likely increase even more in the future. This
could be due to increased penetration of technologies, such as digital cameras
and the internet, and due to a larger demand for different security schemes.
Even though humans are experts in facial recognition, it is not yet understood
how this recognition is performed. For many years psychophysicists and
neuroscientists have been researching whether face recognition is done
holistically or by local feature analysis, i.e. whether face recognition is
done by looking at the face as a whole or by looking at local facial features
independently [6, 25]. It is, however, clear that humans are only capable of
holding one face image in the mind at a given time. Figure 1.2 shows a classical
illusion called “The Wife and the Mother-in-Law”, which was introduced into the
psychological literature by Edwin G. Boring. What do you see? A witch or a
young lady?
Figure 1.2: “The Wife and the Mother-in-Law” by Edwin G. Boring.
What do you see? A witch or a young lady? Courtesy of Danial Chandler
1.1 Motivation and Objectives
Face recognition has recently received growing attention and interest from
the scientific community as well as from the general public. The interest from
the general public is mostly due to the recent events of terror around the
world, which have increased the demand for useful security systems. However, as
described above, facial recognition applications are far from limited to
security systems.
To construct these different applications, precise and robust automated facial
recognition methods and techniques are needed. However, these techniques
and methods are currently either not available or only available in highly
complex, expensive setups.
The aim of this thesis is to help solve the difficult task of robust face
recognition in a simple setup. Such a solution would be of great scientific
importance and would be useful to the public in general.
The objectives of this thesis will be:
• To discuss and summarize the process of facial recognition.
• To look at currently available facial recognition techniques.
• To design and develop a robust facial recognition algorithm. The algorithm
should be usable in a simple and easily adaptable setup. This implies a
single camera setup, preferably a webcam, and no use of specialized
equipment.
Besides these theoretical objectives, a proof-of-concept implementation of the
developed method will be carried out.
1.2 Thesis Overview
In fulfilment of the objectives, this thesis is naturally divided into five
parts, where each part requires knowledge from the preceding parts.
Part I Face Recognition in General. Presents a summary of the history of
face recognition. Discusses the different commercial face recognition
systems, the general face recognition process and the different considerations
regarding facial recognition.
Part II Assessment. Presents an assessment of the central tasks of face
recognition identified in Part I, which include face detection, preprocessing
of facial images and feature extraction.
Part III Development. Documents the design, development and testing of
the Multiple Individual Discriminative Models face recognition algorithm.
Furthermore, preliminary work on retrieval of depth information from one
2D image and a statistical shape model of 3D faces is presented.
Part IV Implementation. Documents the design and development of a face
recognition system using the algorithm devised in Part III.
Part V Discussion. Presents a discussion of possible ideas for future work and
concludes on the work done in this thesis.
1.3 Mathematical Notation
Throughout this thesis the following mathematical notations are used:
Scalar values are denoted with lower-case italic Latin or Greek letters:
$x$
Vectors are denoted with lower-case, non-italic bold Latin or Greek letters. In
this thesis only column vectors are used:
$\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$
Matrices are denoted with capital, non-italic bold Latin or Greek letters:
$\mathbf{M} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
Sets of objects such as scalars, vectors, images etc. are shown in vectors with
curly braces:
$\{x_1, x_2, \ldots, x_n\}$
Indexing into a matrix is displayed as row-column subscript of either scalars or
vectors:
$M_{xy} = M_{\mathbf{x}}, \quad \mathbf{x} = [x, y]$
The mean vector of a specific data set is denoted with lower-case, non-italic
bold Latin or Greek letters with a bar:
$\bar{\mathbf{x}}$
1.4 Nomenclature
Landmark set is a set of x and y coordinates that describes features (here
facial features) like eyes, ears, noses, and mouth corners.
Geometric information is the distinct information of an object’s shape,
usually extracted by annotating the object with landmarks.
Photometric information is the distinct information of the image, i.e. the
pixel intensities of the image.
Shape is, according to Kendall [33], all the geometrical information that
remains when location, scale and rotational effects are filtered out from an
object.
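The definition of shape above can be illustrated with a small sketch. It is not taken from the thesis: it assumes landmarks are stored as an (n, 2) NumPy array, removes location by centering, removes scale by normalizing to unit norm, and removes rotation by an orthogonal Procrustes fit to a reference shape. The function names `normalize_shape` and `align_rotation` are hypothetical.

```python
import numpy as np

def normalize_shape(landmarks):
    """Remove location and scale from an (n, 2) array of landmark coordinates."""
    pts = np.asarray(landmarks, dtype=float)
    centered = pts - pts.mean(axis=0)            # filter out location
    return centered / np.linalg.norm(centered)   # filter out scale

def align_rotation(shape, reference):
    """Rotate a normalized shape onto a normalized reference shape.

    Solves the orthogonal Procrustes problem: find the rotation R that
    minimizes ||shape @ R - reference|| in the least-squares sense.
    """
    u, _, vt = np.linalg.svd(shape.T @ reference)
    rotation = u @ vt
    if np.linalg.det(rotation) < 0:              # exclude reflections
        u[:, -1] *= -1
        rotation = u @ vt
    return shape @ rotation
```

Two landmark sets that differ only in position, size and orientation map to the same coordinates after both steps, so whatever differs afterwards is a difference in shape in Kendall's sense.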
Variables used throughout this thesis are listed below:
$\mathbf{x}$ A sample vector in the input space.
$\mathbf{y}$ A sample vector in the output space.
$\mathbf{\Phi}$ An eigenvector matrix.
$\boldsymbol{\phi}_i$ The i-th eigenvector.
$\mathbf{\Lambda}$ A diagonal matrix of eigenvalues.
$\lambda_i$ The eigenvalue corresponding to the i-th eigenvector.
$\mathbf{\Sigma}$ A covariance matrix.
$\mathbf{S}_B$ The between-class matrix of Fisher Linear Discriminant Analysis.
$\mathbf{S}_W$ The within-class matrix of Fisher Linear Discriminant Analysis.
$\mathbf{S}$ The adjacency graph of Locality Preserving Projections.
$\Psi$ A non-linear mapping from an input space to a high dimensional
implicit output space.
$K$ A Mercer kernel function.
$\mathbf{I}$ The identity matrix.
1.5 Abbreviations
A list of the abbreviations used in this thesis can be found below:
PCA Principal Component Analysis.
FLDA Fisher Linear Discriminant Analysis.
LPP Locality Preserving Projections.
KFDA Kernel Fisher Discriminant Analysis.
MIDM Multiple Individual Discriminative Models.
HE Histogram Equalization.
FAR False Acceptance Rate.
FRR False Rejection Rate.
EER Equal Error Rate.
TER Total Error Rate.
CIR Correct Identification Rate.
FIR False Identification Rate.
ROC Receiver Operating Characteristic (curve).
AAM Active Appearance Model.
ASM Active Shape Model.
PDM Point Distribution Model.
Part I
Face Recognition in General
Chapter 2
History of Face Recognition
The most intuitive way to carry out face recognition is to look at the major
features of the face and compare these to the same features on other faces. Some
of the earliest studies on face recognition were done by Darwin [15] and Galton
[24]. Darwin’s work includes analysis of the different facial expressions due to
different emotional states, whereas Galton studied facial profiles. However, the
first real attempts to develop semi-automated facial recognition systems began
in the late 1960s and early 1970s, and were based on geometrical information.
Here, landmarks were placed on photographs locating the major facial features,
such as eyes, ears, noses, and mouth corners. Relative distances and angles were
computed from these landmarks to a common reference point and compared to
reference data. In Goldstein et al. [26] (1971) a system was created using 21
subjective markers, such as hair color and lip thickness. These markers proved
very hard to automate due to the subjective nature of many of the measurements,
which were still made completely by hand.
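The geometric measurements described above can be sketched in a few lines. This is only an illustration of the general idea, not a reconstruction of any of the cited systems: the function name `geometric_features` is hypothetical, and dividing by the largest distance to make the distances relative is an assumption of this sketch.

```python
import numpy as np

def geometric_features(landmarks, reference_point):
    """Distances and angles from landmarks to a common reference point.

    landmarks: (n, 2) array of (x, y) coordinates, e.g. eye and mouth corners.
    Returns n relative distances followed by n angles in radians.
    """
    pts = np.asarray(landmarks, dtype=float)
    offsets = pts - np.asarray(reference_point, dtype=float)
    distances = np.hypot(offsets[:, 0], offsets[:, 1])
    angles = np.arctan2(offsets[:, 1], offsets[:, 0])
    # Assumption: normalize by the largest distance so the result does not
    # depend on the size of the face in the photograph.
    return np.concatenate([distances / distances.max(), angles])
```

A face could then be compared to reference data by, for example, the Euclidean distance between two such feature vectors.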
A more consistent approach to facial recognition was taken by Fischler et al.
[23] (1973) and later by Yuille et al. [61] (1992). This approach measured the
facial features using templates of single facial features and mapped these onto
a global template.
In summary, most of the techniques developed during the first stages of facial
recognition focused on the automatic detection of individual facial features.
The greatest advantages of these geometrical feature-based methods are the
insensitivity to illumination and the intuitive understanding of the extracted
features. However, even today facial feature detection and measurement
techniques are not reliable enough for the geometric feature-based recognition
of a face, and geometric properties alone are inadequate for face recognition
[12, 37].
Due to this drawback of geometric feature-based recognition, the technique has
gradually been abandoned and effort has gone into researching holistic
color-based techniques, which have provided better results. Holistic color-based
techniques align a set of different faces to obtain a correspondence between
pixel intensities. A nearest neighbor classifier [16] can then be used to
classify new faces, once the new image has been aligned to the set of already
aligned images. With the appearance of the Eigenfaces technique [55], a
statistical learning approach, this coarse method was notably enhanced. Instead
of directly comparing the pixel intensities of the different facial images, the
Eigenfaces technique first reduces the dimension of the input intensities by
Principal Component Analysis (PCA). Eigenfaces is a basic component of many of
the image-based facial recognition schemes used today. One of the current
techniques is Fisherfaces. This technique is widely used and cited [4, 9]. It
combines Eigenfaces with Fisher Linear Discriminant Analysis (FLDA) to obtain a
better separation of the individual faces. In Fisherfaces, the dimension of the
input intensity vectors is reduced by PCA and then FLDA is applied to obtain an
optimal projection for separation of the faces of different persons. PCA and
FLDA will be described in Chapter 9.
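As a rough illustration of the Eigenfaces idea described above, the following sketch reduces aligned, flattened face images with PCA and classifies a new image with a nearest neighbor rule in the reduced space. It is a minimal sketch under stated assumptions, not the implementation used in this thesis: the function names are hypothetical, PCA is computed through a singular value decomposition, and the FLDA step of Fisherfaces is not included.

```python
import numpy as np

def fit_eigenfaces(images, n_components):
    """PCA on aligned faces; images is an (n_samples, n_pixels) array."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    # The rows of vt are the principal axes ("eigenfaces") of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, components):
    """Map images to their low-dimensional PCA coordinates."""
    return (np.asarray(images, dtype=float) - mean) @ components.T

def nearest_neighbor(train_features, train_labels, query_features):
    """Label each query with the label of its closest training sample."""
    diffs = query_features[:, None, :] - train_features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return np.asarray(train_labels)[dists.argmin(axis=1)]
```

Comparing faces in the projected space instead of pixel space is what distinguishes this from the direct pixel comparison mentioned earlier in the paragraph.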
After development of the Fisherface technique, many related techniques have
been proposed. These new techniques aim at providing an even better projection
for separation of the faces of different persons. They try to strengthen
the robustness in coping with differences in illumination or image pose.
Techniques like Kernel Fisherfaces [59], Laplacianfaces [30] or discriminative
common vectors [7] can be found among these approaches. The techniques behind
Eigenfaces, Fisherfaces, Laplacianfaces and Kernel Fisherfaces will be discussed
further later in this thesis.
Chapter 3
Face Recognition Systems
This chapter deals with the tasks of face recognition and how to report
performance. The performance of some of the best commercial face recognition
systems is included as well.
3.1 Face Recognition Tasks
The three primary face recognition tasks are:
• Verification (authentication) - Am I who I say I am? (one to one search)
• Identification (recognition) - Who am I? (one to many search)
• Watch list - Are you looking for me? (one to few search)
Different schemes are to be applied to test the three tasks described above.
Which scheme to use depends on the nature of the application.
3.1.1 Verification
The verification task is aimed at applications requiring user interaction in the
form of an identity claim, e.g. access applications.
The verification test is conducted by dividing persons into two groups:
• Clients, people trying to gain access using their own identity.
• Imposters, people trying to gain access using a false identity, i.e. an
identity known to the system but not belonging to them.
The percentage of imposters gaining access is reported as the False Acceptance
Rate (FAR) and the percentage of clients denied access is reported as the
False Rejection Rate (FRR) for a given threshold. An illustration of this is
displayed in Figure 3.1.
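The two error rates can be computed directly from the scores of the two groups. The sketch below assumes that a higher score means a stronger match and that a claim is accepted when the score reaches the threshold; the function names are hypothetical, and the Equal Error Rate is estimated by scanning the observed scores as candidate thresholds, which is only one of several possible ways to do it.

```python
import numpy as np

def far_frr(client_scores, imposter_scores, threshold):
    """Return (FAR, FRR) as fractions for a given acceptance threshold."""
    clients = np.asarray(client_scores, dtype=float)
    imposters = np.asarray(imposter_scores, dtype=float)
    far = np.mean(imposters >= threshold)   # imposters wrongly accepted
    frr = np.mean(clients < threshold)      # clients wrongly rejected
    return far, frr

def equal_error_rate(client_scores, imposter_scores):
    """Return (threshold, EER estimate) where FAR and FRR are closest."""
    thresholds = np.unique(np.concatenate([client_scores, imposter_scores]))
    rates = [far_frr(client_scores, imposter_scores, t) for t in thresholds]
    gaps = [abs(far - frr) for far, frr in rates]
    best = int(np.argmin(gaps))
    far, frr = rates[best]
    return thresholds[best], (far + frr) / 2
```

Raising the threshold lowers the FAR at the cost of a higher FRR, which is why both rates are always reported for a stated threshold.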
3.1.2 Identification
The identification task is mostly aimed at applications not requiring user
interaction, e.g. surveillance applications.
The identification test works from the assumption that all faces in the test are
of known persons. The percentage of correct identifications is then reported as
the Correct Identification Rate (CIR), or the percentage of false
identifications is reported as the False Identification Rate (FIR).
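Since every test face belongs to a known person, the two rates are complementary. A minimal sketch, with a hypothetical function name and identities represented as plain labels:

```python
def identification_rates(predicted_ids, true_ids):
    """Return (CIR, FIR) as fractions for a closed-set identification test."""
    if len(predicted_ids) != len(true_ids) or len(true_ids) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == t for p, t in zip(predicted_ids, true_ids))
    cir = correct / len(true_ids)
    return cir, 1.0 - cir   # every identification is either correct or false
```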
3.1.3 Watch List
The watch list task is a generalization of the identification task which
includes unknown people.
The watch list test is, like the identification test, reported in CIR or FIR,
but can have FAR and FRR associated with it to describe the sensitivity of the
watch list, meaning how often an unknown person is classified as a person in the
watch list.
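One way to picture the watch list task is as identification followed by a threshold test, so that a face can also be reported as unknown. The sketch below is an assumption about how such a decision could be made, not a description of any tested system; the function name and the similarity scores are hypothetical.

```python
import numpy as np

def watch_list_decision(similarities, watch_list_ids, threshold):
    """Return the best matching watch-list identity, or None for 'unknown'.

    similarities[i] is the similarity between the probe face and the i-th
    person on the watch list (higher means more similar).
    """
    scores = np.asarray(similarities, dtype=float)
    best = int(scores.argmax())
    return watch_list_ids[best] if scores[best] >= threshold else None
```

An unknown person whose best score still reaches the threshold is a false acceptance, which is what the FAR of a watch list measures.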
Figure 3.1: Relation of the False Acceptance Rate (FAR) and False Rejection
Rate (FRR) to the distribution of clients and imposters in a verification
scheme. A) The imposter and client populations in terms of the score (a high
score meaning a high likelihood of belonging to the client population). B) The
associated FAR and FRR. The Equal Error Rate (EER) is where the FAR and FRR
curves meet, and gives the threshold value for the best separability of the
imposter and client classes.
3.2 Face Recognition Vendor Test 2002
In 2002 the Face Recognition Vendor Test 2002 [45] tested some of the best
commercial face recognition systems for their performance in the three primary
face recognition tasks described in Section 3.1. This test used 121589 facial
images of a group of 37437 different people. The different systems participating
in the test are listed in Table 3.1. The evaluation was performed in reasonably
controlled indoor lighting conditions. Face recognition tests performed outside
with unpredictable lighting conditions show a drastic drop in performance
compared with indoor experiments [45].
Table 3.1: Participants in the Face Recognition Vendor Test 2002.
Company
AcSys Biometrics Corp
Cognitec Systems GmbH
Dream Mirh Co., Ltd
Eyematic Interfaces Inc.
Imagis Technologies Inc.
Viisage Technology
VisionSphere Technologies Inc.
The systems providing the best results in the vendor test show the
characteristics listed in Table 3.2.
Table 3.2: The characteristics of the highest performing systems in the
Face Recognition Vendor Test 2002. The highest performing system for
the identification task and the watch list task was Cognitec. Cognitec and
Identix were both the highest performing systems for the verification task.
Watch list: 56% to 77% (corresponding to the use of watch lists of 3000 and 25
persons, respectively)
Selected conclusions from the Face Recognition Vendor Test 2002 are:
• The identification task yields better results for smaller databases than
for larger ones. Identification performance showed a linear decrease with
respect to the logarithm of the size of the database: for every doubling
of the size of the database, performance decreased by 2% to 3%. See
Figure 3.2.
• The face recognition systems showed a tendency to identify older people
more easily than younger people. The three best performing systems showed an
average increase in performance of approximately 5% for every ten-year
increase in the age of the test population. See Figure 3.3.
• The more time that elapses between the training of the system and the
presentation of a new “up-to-date” image of a person, the more recognition
performance decreases. For the three best performing systems there
was an average decrease of approximately 5% per year. See Figure 3.4.
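The first conclusion, a linear decrease in the logarithm of the database size, can be written as a small worked formula. The sketch below is illustrative only: the function name, the default drop of 2.5 percentage points per doubling (the midpoint of the reported 2% to 3%, read here as percentage points) and the example numbers are assumptions, not results from the vendor test.

```python
import math

def projected_cir(cir_at_reference, reference_size, gallery_size,
                  drop_per_doubling=0.025):
    """Extrapolate a Correct Identification Rate to another database size.

    Assumes the rate falls by a fixed amount for every doubling of the
    database, i.e. linearly in log2 of the database size.
    """
    doublings = math.log2(gallery_size / reference_size)
    cir = cir_at_reference - drop_per_doubling * doublings
    return min(1.0, max(0.0, cir))   # keep the result a valid rate
```

For example, a hypothetical system with a CIR of 0.85 on 1000 people would be projected to about 0.775 on 8000 people, since 8000 is three doublings of 1000.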
Figure 3.2: The Correct Identification Rates (CIR) plotted as a function
of gallery size. The colors of the curves indicate the different vendors used in
the test. Courtesy of Phillips et al. [45].
Figure 3.3: The average Correct Identification Rates (CIR) of the three
highest performing systems (Cognitec, Identix and Eyematic), broken into
age intervals. Courtesy of Phillips et al. [45].
Figure 3.4: The average Correct Identification Rates (CIR) of the three
highest performing systems (Cognitec, Identix and Eyematic), divided into
intervals of elapsed time from the time of the systems' construction to the
time a new image is introduced to the systems. Courtesy of Phillips et al. [45].
3.3 Discussion
Interestingly, the results from the Face Recognition Vendor Test 2002 indicate
a higher identification performance for older people compared to younger. In
addition, the results indicate that it gets harder to identify people as time
elapses, which is not surprising since the human face continually changes over
time. The results of the Face Recognition Vendor Test 2002, reported in Table
3.2, are hard to interpret and compare to other tests, since a change in the
test protocol or test data will yield different results. However, these results
provide an indication of the performance of commercial face recognition systems.
Chapter 4
The Process of Face Recognition
Facial recognition is a visual pattern recognition task. The three-dimensional
human face, which is subject to varying illumination, pose, expression etc., has
to be recognized. This recognition can be performed on a variety of input data
sources such as:
• A single 2D image.
• Stereo 2D images (two or more 2D images).
• 3D laser scans.
Also, Time Of Flight (TOF) 3D cameras will soon be accurate enough to be
used as well. The dimensionality of these sources can be increased by one by
the inclusion of a time dimension. A still image with a time dimension is a
video sequence. The advantage is that the identity of a person can be
determined more precisely from a video sequence than from a single picture, since the
identity of a person cannot change between two consecutive frames of a
video sequence.
This thesis is constrained to face recognition from single 2D images, even when
tracking of faces is done in video sequences. However, Chapter 12 deals with
3D reconstruction of faces from one or more 2D images using statistical models
of 3D laser scans.
Facial recognition systems usually consist of four steps, as shown in Figure 4.1:
face detection (localization), face preprocessing (face alignment/normalization,
light correction etc.), feature extraction and feature matching. These steps
are described in the following sections.
Figure 4.1:
The four general steps in facial recognition.
4.1 Face Detection
The aim of face detection is localization of the face in an image. In the case
of video input, it can be an advantage to track the face across multiple
frames, to reduce computational time and preserve the identity of a face (person)
between frames. Methods used for face detection include shape templates,
neural networks and Active Appearance Models (AAM).
4.2 Preprocessing
The aim of the face preprocessing step is to normalize the coarse face detection,
so that a robust feature extraction can be achieved. Depending on the application,
face preprocessing includes alignment (translation, rotation, scaling) and
light normalization/correction.
4.3 Feature Extraction
The aim of feature extraction is to extract a compact set of interpersonal
discriminating geometrical and/or photometrical features of the face. Methods for
feature extraction include PCA, FLDA and Locality Preserving Projections (LPP).
4.4 Feature Matching
Feature matching is the actual recognition process. The feature vector obtained
from the feature extraction is matched to classes (persons) of facial images
already enrolled in a database. The matching algorithms vary from the fairly
obvious Nearest Neighbor to advanced schemes like Neural Networks.
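As an illustration of the simplest matching scheme mentioned above, the following is a minimal sketch of Nearest Neighbor matching, assuming that feature vectors are plain NumPy arrays and that Euclidean distance is used. The function name, the toy gallery and its labels are hypothetical and are not taken from the thesis.

```python
import numpy as np

def nearest_neighbor_match(probe, gallery_features, gallery_labels):
    """Return the label of the enrolled feature vector closest to `probe`."""
    # Euclidean distance from the probe to every enrolled feature vector.
    distances = np.linalg.norm(gallery_features - probe, axis=1)
    best = int(np.argmin(distances))
    return gallery_labels[best], float(distances[best])

# Toy gallery: two enrolled persons with two feature vectors each (made-up values).
gallery_features = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]])
gallery_labels = ["person_a", "person_a", "person_b", "person_b"]

label, distance = nearest_neighbor_match(
    np.array([4.9, 5.2]), gallery_features, gallery_labels
)
print(label, round(distance, 3))
```

In a real system the feature vectors would come from the feature extraction step, and a distance threshold could be added to reject persons who are not enrolled.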
4.5 Thesis Perspective
This thesis will cover all four general areas in face recognition, though the
primary focus is on feature extraction and feature matching.
A survey of face detection algorithms is presented in Chapter 7. Preprocessing
of facial images is discussed in Chapter 8. A more in-depth description of feature
extraction methods is presented in Chapter 9. The performance of these feature
extraction methods is presented in Chapter 10, where the Nearest Neighbor
algorithm will be used for feature matching. A new face recognition algorithm
is developed in Chapter 11.
Chapter 5
Face Recognition Considerations
In this chapter general considerations of the process of face recognition are
discussed.These are:
• The variation of facial appearance of different individuals,which can be
very small.
• The non-linear manifold on which face images reside.
• The problem of having a high-dimensional input space and only a small
number of samples.
The scope of this thesis is further defined with respect to these considerations.
5.1 Variation in Facial Appearance
A facial image is subject to various factors like facial pose, illumination and
facial expression as well as lens aperture, exposure time and lens aberrations of
the camera. Due to these factors, large variations between facial images of the same
person can occur. On the other hand, sometimes only small interpersonal variations
occur. Here the extreme is identical twins, as can be seen in Figure 5.1. Different
constraints in the process of acquiring images can be used to filter out some of
these factors, as can preprocessing methods.
In a situation where the variation among images obtained from the same person
is larger than the variation among images of two different persons, more
comprehensive data than 2D images must be acquired to do computer-based facial
recognition. Here, accurate laser scans or infrared images (showing the blood
vessel distribution in the face) can be used. These methods are outside the scope
of this thesis and will not be discussed further. This thesis is mainly concerned
with 2D frontal face images.
Figure 5.1:
Small interpersonal variations illustrated by identical twins.
Courtesy of www.digitalwilly.com.
5.2 Face Analysis in an Image Space
When looking at the photometric information of a face, face recognition mostly
relies on analysis of a subspace, since faces in images reside in a submanifold of the
image space. This can be illustrated by an image consisting of 32 × 32 pixels.
This image contains a total of 1024 pixels, with the ability to display a wide
range of different scenes. Using only an 8-bit gray scale per pixel, this image
can show a huge number of different configurations, exactly $256^{1024} = 2^{8192}$. As
a comparison, the world population is only about $2^{33}$. It is clear that only a small
fraction of these image configurations will display faces. As a result, most of the
original image space representation is very redundant from a facial recognition
point of view. It must therefore be possible to reduce the input image space
to obtain a much smaller subspace, where the objective of the subspace is to
remove noise and redundancy while preserving the discriminative information
of the face.
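The size of the image space in the 32 × 32 example can be checked with exact integer arithmetic. This is only a small worked calculation; the variable names are chosen here for illustration.

```python
# A 32 x 32 image with 8 bits (256 gray levels) per pixel.
n_pixels = 32 * 32
n_configs = 256 ** n_pixels                  # exact big-integer arithmetic

# 256 = 2^8, so the count equals 2^(8 * 1024) = 2^8192.
same_as_power_of_two = n_configs == 2 ** (8 * n_pixels)

# Number of decimal digits needed to write the count down.
n_digits = len(str(n_configs))
print(n_pixels, 8 * n_pixels, same_as_power_of_two, n_digits)
```

The count has well over two thousand decimal digits, which makes clear how small the fraction of configurations that display faces must be.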
However, the manifolds where faces reside seem to be highly non-linear and
non-convex [5,53]. The following experiment explores this phenomenon in an
attempt to obtain a deeper understanding of the problem.
5.2.1 Exploration of Facial Submanifolds
The purpose of the experiment presented in this section is to visualize that
facial images reside in a submanifold which is highly non-linear and non-convex.
For this purpose, ten similar facial images were obtained from each of three persons in
the IMM Frontal Face Database, yielding a total of 30 images. All images were
converted to grayscale, cropped to contain only the facial region and scaled to
100 × 100 pixels. Then 33 new images were produced from each of the original
images by the following manipulations:
• Translation;Translation of the original image was done along the x-axis
using the set (in pixels):
• Rotation;Rotation of the original image was done around the center of
the image using the set (in degrees):
• Scaling;Scaling of the original image was done using the set (in %):
These manipulations resulted in the production of 30 × 33 = 990 images. An
example of 33 images produced from one original image is shown in Figure 5.2.
A Principal Component Analysis was conducted on the original 30 images (taken
from the IMM Frontal Face Database, which is further described in Chapter 6) to
produce a three-dimensional subspace spanned by the three largest principal
components. Then all 990 images were mapped into this subspace. These
mappings can be seen in Figure 5.3, where the images derived
from the same original image are connected for easier visual interpretation.
These mappings intuitively suggest that the manifold in which the facial images
reside is non-linear and non-convex. A similar but more comprehensive test is
performed by Li et al. [37].
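The structure of this experiment can be sketched in a few lines of NumPy. The sketch below rests on several assumptions: random arrays stand in for the 30 face images, only the translation manipulation is shown, the shift is a wrap-around shift, and the set of 11 shifts is hypothetical (the sets used in the thesis are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 30 random "images" of 100 x 100 pixels (the thesis uses real faces).
originals = rng.random((30, 100, 100))

def translated_versions(image, shifts):
    """Shift an image along the x-axis (wrap-around shift, a simplification)."""
    return [np.roll(image, shift, axis=1) for shift in shifts]

# Hypothetical set of 11 shifts in pixels.
shifts = range(-10, 11, 2)
derived = np.array([v for img in originals for v in translated_versions(img, shifts)])

# PCA on the 30 originals via SVD of the mean-centred data matrix.
X = originals.reshape(len(originals), -1)
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
basis = vt[:3]                      # the three leading principal components

# Map every derived image into the three-dimensional subspace.
coords = (derived.reshape(len(derived), -1) - mean) @ basis.T
print(coords.shape)
```

Plotting `coords` and connecting the points that stem from the same original image would give a picture of the same kind as Figure 5.3, though random stand-in data will of course not reproduce the curves obtained from real faces.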
Figure 5.2: A sample of 33 facial images produced from one original
image. The rows A, B and C are constructed by translation, rotation and
scaling of the original image, respectively.
5.3 Dealing with Non-linear Manifolds
As described above, the face manifold is highly non-linear and non-convex. The
linear methods discussed later in Chapter 9, such as Principal Component
Analysis (PCA) and Fisher Linear Discriminant Analysis (FLDA), are as a result
only partly capable of preserving these non-linear variations.
5.3.1 Technical Solutions
Figure 5.3: Results of the exploration of facial submanifolds. The 990
images derived from 30 original facial images are mapped into a
three-dimensional space spanned by the three leading eigenvectors of the original
images. Images derived from the same original image are connected. The
images of the three persons are plotted in different colors. The three sets
of 30 × 11 images derived by translation, rotation and scaling are displayed
in rows A, B and C, respectively.
To overcome the challenges of non-linear and non-convex face manifolds there
are two general approaches:
• The first approach is to construct a feature subspace where the face
manifolds become simpler, i.e. less non-linear and non-convex than in the input
space. This can be obtained by normalizing the face image both
geometrically and photometrically to reduce variation, followed by extraction
of features from the normalized image. For this purpose linear methods like
PCA and FLDA, or even non-linear methods such as Kernel Fisher Discriminant
Analysis (KFDA), can be used [1]. These methods will be described in
Chapter 9.
• The second approach is to construct classification engines capable of
solving the difficult non-linear classification problems of the image space.
Methods like Neural Networks, Support Vector Machines etc. can be used
for this purpose.
In addition, the two approaches can be combined.
This thesis pursues only the first approach, with the aim of statistically
understanding and simplifying the complex problem of facial recognition.
5.4 High Input Space and Small Sample Size
Another problem associated with face recognition is the high dimensionality of
the input space of an image and the usually small number of samples of an
individual. An image consisting
of 32 × 32 pixels resides in a 1024-dimensional space, whereas the number of
images of a specific person is typically much smaller. A small number of images
of a specific person may not be sufficient to make an appropriate approximation
of the manifold, which can cause a problem. An illustration of this problem
is displayed in Figure 5.4. Currently, no solution to this problem is known
other than capturing a sufficient number of samples to
approximate the manifold in a satisfactory way.
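One way to make the small sample size problem concrete is to look at the rank of the sample covariance matrix: with $n$ samples it can span at most $n - 1$ directions, however large the input space is. The sketch below uses random arrays as stand-ins for ten 32 × 32 images; the numbers are chosen for illustration only and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims = 10, 32 * 32          # ten images of 32 x 32 pixels (random stand-ins)
X = rng.random((n_samples, n_dims))

# Sample covariance of the mean-centred data: a 1024 x 1024 matrix.
centered = X - X.mean(axis=0)
cov = centered.T @ centered / (n_samples - 1)

# The rank tells how many directions of the input space the samples describe.
rank = int(np.linalg.matrix_rank(cov))
print(rank, "of", n_dims, "dimensions carry any observed variation")
```

All remaining directions of the 1024-dimensional space are unobserved, which is why a handful of images per person gives only a coarse approximation of that person's manifold.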
Figure 5.4: An illustration of the problem of being unable to
satisfactorily approximate the manifold when only a small number of
samples is available. The samples are denoted by circles.
Chapter 6
Available Data
This chapter presents a small survey of databases used for facial detection and
recognition.
These databases include the IMM Frontal Face Database [21], which has been
recorded and annotated with landmarks as a part of this thesis. The technical
report made in conjunction with the IMM Frontal Face Database is found in
Appendix A.
Finally, an in-depth description of the actual subsets of three databases used in
this thesis is presented. The three databases used are:
• IMM Frontal Face Database: Used for initial testing in Chapter 10.
• The AR database: Used for a comprehensive test of the MIDM face
recognition method (which is proposed in Chapter 11). The test results are
shown in Chapter 13.
• The XM2VTS database: Used for evaluating the performance of the
MIDM algorithm.
Work done using the XM2VTS database has been performed in collaboration
with Dr. David Delgado Gomez. The obtained results are to be used for
participation in the ICBA2006 Face Verification Contest in Hong Kong,
January 2006.
6.1 Face Databases
In order to build/train and reliably test face recognition algorithms, sizeable
databases of face images are needed. Many face databases for
non-commercial use are available on the internet, either free of charge or for
small fees.
These databases are recorded under various conditions and with various
applications in mind. The following sections briefly describe some of the available
databases which are widely known and used.
6.1.1 AR
The AR database was recorded in 1998 at the Computer Vision Center in
Barcelona. The database contains images of 116 people; 70 male and 56
female. Every person was recorded in two sessions each consisting of 13 images,
resulting in a total of 3016 images. The two sessions were recorded two weeks
apart. The 13 images of each session captured varying facial expressions,
illuminations and occlusions. All images of the AR database are color images with a
resolution of 768 × 576 pixels. Landmark annotations based on a 22-landmark
scheme are available for some of the images of the AR database.
6.1.2 BioID
The BioID database was recorded in 2001. BioID contains 1521 images of 23
persons, about 66 images per person. The database was recorded during an
unspecified number of sessions using a high variation of illumination, facial
expression and background. The degree of variation was not controlled, resulting
in “real” life image occurrences. All images of the BioID database are recorded
in grayscale with a resolution of 384 × 286 pixels. Landmark annotations based
on a 20-landmark scheme are available.
Footnotes:
Dr. David Delgado Gomez: Post-doctoral researcher at the Computational Imaging Lab,
Department of Technology, Pompeu Fabra University, Barcelona.
ICBA2006: International Conference on Biometrics 2006.
6.1.3 BANCA
The BANCA multi-modal database was collected as part of the European BANCA
project. BANCA contains images, video and audio samples, though only the
images are described here. BANCA contains images of 52 persons. Every person
was recorded in 12 sessions each consisting of 10 images, resulting in a total of
6240 images. The sessions were recorded during a three-month period. Three
different image qualities were used to acquire the images, where each image
quality was recorded during four sessions. All images are recorded in color with
a resolution of 720 × 576 pixels.
6.1.4 IMM Face Database
The IMM Face Database was recorded in 2001 at the Department of Informatics
and Mathematical Modelling - Technical University of Denmark. The database
contains images of 40 people; 33 male and 7 female. It was recorded during one
session and consists of 6 images per person, resulting in a total of 240 images.
The 6 images of each person were captured under varying facial expressions,
camera view points and illuminations. Most of the images are recorded in color
while the rest are recorded in grayscale, all with a resolution of 640 × 480 pixels.
Landmark annotations based on a 58-landmark scheme are available.
6.1.5 IMM Frontal Face Database
The IMM Frontal Face Database was recorded in 2005 at the Department of
Informatics and Mathematical Modelling - Technical University of Denmark. The
database contains images of 12 people; all males. The database was recorded
during one session and consists of 10 images of each person, resulting in a total
of 120 images. The 10 images of each person were captured under varying facial
expressions. All images are recorded in color with a resolution of 2560 × 1920
pixels. Landmark annotations based on a 73-landmark scheme are available.
6.1.6 PIE
The Pose, Illumination and Expression (PIE) database was recorded in 2000 at
Carnegie Mellon University in Pittsburgh. The database contains images of 68
persons, all recorded in one session. More than 600 images of each person were
included in the database, resulting in a total of 41368 images. The images were
captured under varying facial expressions, camera view points and illuminations.
All images are recorded in color with a resolution of 640 × 486 pixels.
6.1.7 XM2VTS
The XM2VTS multi-modal database was recorded at the University of Surrey. The
database contains images, video and audio samples, though only the images are
described here. XM2VTS contains images of 295 people. Every person was
recorded during 4 sessions each consisting of four images per person, resulting
in a total of 4720 images. The sessions were recorded during a four-month
period and captured both frontal and profile views of the face. All images
are recorded in color with a resolution of 720 × 576 pixels.
6.2 Data Sets Used in this Work
Three of the four data sets used in this thesis are collected from face databases
and consist of two parts: facial images and landmark annotations of the facial
images. The last data set used in this thesis consists of 3D laser scans of faces.
The next sections present the four data sets.
6.2.1 Data Set I
Data set I consists of the entire IMM Frontal Face Database [21]. In summary,
this database contains 120 images of 12 persons (10 images per person). The 10
images of a person display varying facial expressions, see Figure 6.1. The images
have been annotated in a 73-landmark scheme, see Figure 6.2. A technical report
on the construction of the database can be found in Appendix A.
Figure 6.1:
An example of ten images of one person from the IMM
Frontal Face Database.The facial expressions of the images are:1-6,
neutral expression;7-8,smiling (no teeth);9-10,thinking.
6.2.2 Data Set II
Data set II consists of a subset of images from the AR database [41], where 50
persons (25 male and 25 female) were randomly selected. Fourteen images per
person are included in data set II, which are obtained from the two recording
sessions (seven images per person per session). The selected images are all the
images in the AR database without occlusions. Data set II is as a result
composed of 700 images. Examples of the selected images of one male and one
female from the two recording sessions are displayed in Figure 6.3.
Since landmark annotations were not available for all the images of the AR
database, data set II required manual annotation using a 22-landmark scheme
previously used by the Face and Gesture Recognition Working group (FGNET)
to annotate parts of the AR database. The 22-landmark scheme is displayed
in Figure 6.4.
Figure 6.2: The 73-landmark annotation scheme used on the IMM
Frontal Face Database.
Figure 6.3: Examples of 14 images of one female and one male obtained
from the AR database. The rows of images (A,B) and (C,D) were captured
during two different sessions. The columns display: 1, neutral expression;
2, smile; 3, anger; 4, scream; 5, left light on; 6, right light on; 7, both side
lights on.
Figure 6.4: The 22-landmark annotation scheme used on the AR database.
Footnote: Of the 13 different image variations included in the AR database only 4 have been
annotated by FGNET.
6.2.3 Data Set III
Data set III consists of all the frontal images from the XM2VTS database
[43]. To summarize, 8 frontal images were captured of each of 295 individuals during 4
sessions, resulting in data set III consisting of a total of 2360 images.
Examples of the selected images of one male and one female from the four recording
sessions are displayed in Figure 6.5.
A 68-landmark annotation scheme is available for this data set, made in
collaboration between the EU FP5 projects UFACE and FGNET. However, this thesis
uses two non-public 64-landmark sets. The first set is obtained by manual
annotation, whereas the second is obtained automatically by an optimized ASM [52].
Both landmark sets were created by the Computational Imaging Lab,
Department of Technology, Pompeu Fabra University, Barcelona. The 64-landmark
scheme is displayed in Figure 6.6.
6.2.4 Data Set IV
Data set IV consists of the entire 3D Face Database constructed by Karl
Skoglund [49] at the Department of Informatics and Mathematical Modelling
- Technical University of Denmark.This database includes 24 3D laser scans
of 24 individuals (including one baby) and 24 texture images corresponding
to the laser scans.Examples of five samples from data set IV are shown in
Figure 6.7.
Figure 6.5: Examples of 8 images of one female and one male obtained
from the XM2VTS database. All images are captured in a neutral expression.
Figure 6.6: The 64-landmark annotation scheme used on the XM2VTS database.
Figure 6.7: Five samples from the 3D Face Database constructed in [49].
The columns show the 3D shape with texture, the 3D shape and the texture image,
respectively.
Part II
Chapter 7
Face Detection:A Survey
This chapter deals with the problem of face detection. Since the scope of this
thesis is face recognition, this chapter will serve as an introduction to already
developed algorithms for face detection.
As described earlier in Chapter 4, face detection is the necessary first step in a
face recognition system. The purpose of face detection is to localize and extract
the face region from the image background. However, since the human face is
a highly dynamic object displaying a large degree of variability in appearance,
automatic face detection remains a difficult task.
The problem is complicated further by continual changes over time in the
following parameters:
• The three-dimensional position of the face.
• Removable features, such as spectacles and beards.
• Facial expression.
• Partial occlusion of the face, e.g. by hair, scarves and sunglasses.
• Orientation of the face.
• Lighting conditions.
The following will distinguish between the two terms face detection and face
localization.
Definition 7.1 Face detection: the process of detecting all faces (if any) in
a given image.
Definition 7.2 Face localization: the process of localizing one face in a given
image, i.e. the image is assumed to contain one, and only one, face.
More than 150 methods for face detection have been developed, though only a
small subset is addressed here. In Yang et al. [60] face detection methods are
divided into four categories:
• Knowledge-based methods: The knowledge-based methods use a set
of rules that describe what to capture. The rules are constructed from
intuitive human knowledge of facial components and can be simple
relations among facial features.
• Feature invariant approaches: The aim of feature invariant approaches
is to search for structural features which are invariant to changes in pose
and lighting conditions.
• Template matching methods: Template matching methods construct
one or several templates (models) for describing facial features. The
correlation between an input image and the constructed model(s) enables the
method to discriminate between face and non-face.
• Appearance-based methods: Appearance-based methods use
statistical analysis and machine learning to extract the relevant features of a face
to be able to discriminate between face and non-face images. The features
are composed of both the geometrical information and the photometric
information.
The knowledge-based methods and the feature invariant approaches are mainly
used only for face localization, whereas template matching methods and
appearance-based methods can be used for face detection as well as face localization.
Approach                        Representative Work
Knowledge-based                 Multiresolution rule-based method [57]
Feature invariant
 - Facial Features              Grouping of edges [36]
 - Texture                      Space Gray-Level Dependence matrix of face pattern [14]
 - Skin Color                   Mixture of Gaussian [58]
 - Multiple Features            Integration of skin color, size and shape [34]
Template matching
 - Predefined face templates    Shape templates [13]
 - Deformable Templates         Active Shape Models [35]
Appearance-based method
 - Eigenfaces & Fisherfaces     Eigenvector decomposition and clustering [54]
 - Neural Network               Ensemble of neural networks and arbitration schemes [47]
 - Deformable Models            Active Appearance Models [10]
Table 7.1: Categorization of methods for face detection within a single image.
7.1 Representative Work of Face Detection
Representative methods of the four categories described above are summarized
in Table 7.1 as reported in Yang et al.[60].
Only appearance-based methods are further described in this thesis since supe-
rior results seem to have been reported using these methods compared to the
other three categories.
7.2 Description of Selected Face Detection Methods
In this section the methods of Eigenfaces, Fisherfaces, Neural Networks and
Active Appearance Models are described, though with special emphasis on the
Active Appearance Models. The Active Appearance Models show clear
advantages for facial recognition purposes, which will be described and used later in
this thesis.
Notice that neural networks are not restricted to appearance-based methods, but only
neural networks working on photometrical information (texture) are considered here.
7.2.1 General Aspects of Face Detection Algorithms
Most face detection algorithms work by systematically analyzing subregions of an
image. One way to extract these subregions is to capture
a subimage of 20 × 20 pixels in the top left corner of the original image and
continue to capture subimages in a predefined grid. All these subimages are
then evaluated using a face detection algorithm. Subsampling of the image in
a pyramid fashion enables the capture of faces of different sizes. This is illustrated in
Figure 7.1.
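The grid scan and the pyramid subsampling described above can be sketched as two small generators. This is a minimal illustration under stated assumptions: the grid step of 10 pixels and the scale factor of 1.2 between pyramid levels are chosen here for illustration, nearest-neighbour subsampling is used for simplicity, and the function names are hypothetical.

```python
import numpy as np

def pyramid(image, scale=1.2, min_size=20):
    """Yield (factor, subsampled image) pairs, from full size down to min_size."""
    factor = 1.0
    while True:
        h = int(image.shape[0] / factor)
        w = int(image.shape[1] / factor)
        if min(h, w) < min_size:
            break
        # Nearest-neighbour subsampling: pick every `factor`-th row and column.
        rows = (np.arange(h) * factor).astype(int)
        cols = (np.arange(w) * factor).astype(int)
        yield factor, image[np.ix_(rows, cols)]
        factor *= scale

def sliding_windows(image, size=20, step=10):
    """Yield (row, col, window) for every size x size subimage on a regular grid."""
    for r in range(0, image.shape[0] - size + 1, step):
        for c in range(0, image.shape[1] - size + 1, step):
            yield r, c, image[r:r + size, c:c + size]

image = np.zeros((60, 80))          # stand-in for a grayscale input image
n_windows = sum(1 for _, level in pyramid(image) for _ in sliding_windows(level))
print(n_windows)
```

Each window would then be passed to a face/non-face classifier. A window found at pyramid factor `f` corresponds to a face of roughly `20 * f` pixels in the original image.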
Figure 7.1: Illustration of the subsampling of an image in a pyramid
fashion, which enables the capture of faces of different sizes. In addition,
rotated faces can be captured by rotating the subwindow. Courtesy of
Rowley et al. [47].
7.2.2 Eigenfaces
The Eigenface method uses PCA to construct a set of Eigenface images.
Examples of Eigenface images are displayed in Figure 7.2. These Eigenfaces can
be linearly combined to reconstruct the images of the original training set. When
introducing a new image, an error (ξ) can be calculated between the new image and its best
reconstruction using the Eigenfaces. If the Eigenfaces are
constructed from a large face database, the size of the error ξ can be used to
determine whether or not a newly introduced image contains a face.
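The reconstruction error ξ can be sketched as follows. The sketch makes several assumptions that are not from the thesis: images are row vectors, ξ is taken to be the Euclidean distance between an image and its reconstruction, and synthetic vectors lying near a low-dimensional subspace stand in for a face database. The function names are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train, n_components):
    """PCA on row-vectorised training images; returns the mean and the Eigenfaces."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(image, mean, eigenfaces):
    """Distance between an image and its best reconstruction from the Eigenfaces."""
    weights = eigenfaces @ (image - mean)          # project onto the Eigenfaces
    reconstruction = mean + eigenfaces.T @ weights
    return float(np.linalg.norm(image - reconstruction))

rng = np.random.default_rng(0)
# Synthetic stand-in for a face set: vectors lying in a 5-dimensional subspace.
basis = rng.normal(size=(5, 400))
train = rng.normal(size=(50, 5)) @ basis
mean, eigenfaces = fit_eigenfaces(train, n_components=5)

face_like = rng.normal(size=5) @ basis             # lies in the same subspace
non_face = rng.normal(size=400)                    # arbitrary image vector
xi_face = reconstruction_error(face_like, mean, eigenfaces)
xi_non_face = reconstruction_error(non_face, mean, eigenfaces)
print(xi_face < xi_non_face)
```

A detector built this way would compare ξ against a threshold; how that threshold is chosen depends on the data and is left open here.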
Figure 7.2: Example of 10 Eigenfaces. Notice that Eigenface no. 10
contains much noise and that the Eigenfaces are constructed from the shape
free images described in Section 7.2.5.
Another, more robust way is to consider the subspace provided by the
Eigenfaces and cluster face images and non-face images in this subspace [54].
7.2.3 Fisherfaces
Much like Eigenfaces,Fisherfaces construct a subspace in which the algorithm
can discriminate between facial and non-facial images.A more in-depth descrip-
tion of FLDA,which is used by Fisherfaces,can be found in Chapter 9.
7.2.4 Neural Networks
In a neural network approach, features are extracted from an image and fed to
a neural network. One huge drawback of neural networks is that they have to be
extensively tuned, in terms of deciding on the learning method and on the number of
layers and nodes.
One of the most significant works in neural network face detection has been done
by Rowley et al. [47,48]. They used a neural network to classify images in a [−1;1]
range, where −1 and 1 denote a non-face image and a face image, respectively.
Every image window of 20 × 20 pixels was divided into four 10 × 10 pixel, sixteen 5 × 5
pixel and six overlapping 20 × 5 pixel sub windows. A hidden node in the
neural network was fed each of these sub windows, yielding a total of 26 hidden
nodes. A diagram of the neural network design by Rowley et al. [47] is shown
in Figure 7.3. The neural network can be improved by adding an extra neural
network to determine the rotation of an image window. This will enable the
system to capture faces not vertically aligned in the input image, see Figure 7.4.
Footnote: Principal Component Analysis can reduce the dimensionality of the data; it is
described further in Chapter 9.
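The count of 26 hidden nodes follows directly from the way the 20 × 20 window is partitioned. The sketch below enumerates the sub windows; the function name is hypothetical, and the start rows of the six overlapping stripes are an assumption made here for illustration, since the text above only states that the stripes overlap.

```python
def receptive_fields(size=20):
    """Return (row, col, height, width) of each sub window of a size x size window."""
    fields = []
    # Four 10 x 10 and sixteen 5 x 5 non-overlapping blocks.
    for block in (10, 5):
        for r in range(0, size, block):
            for c in range(0, size, block):
                fields.append((r, c, block, block))
    # Six overlapping stripes, 20 pixels wide and 5 pixels high. The start rows
    # are an assumption, chosen so that neighbouring stripes overlap.
    for r in (0, 3, 6, 9, 12, 15):
        fields.append((r, 0, 5, size))
    return fields

fields = receptive_fields()
print(len(fields))   # 4 + 16 + 6 sub windows, one hidden node each
```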
7.2.5 Active Appearance Models
Active Appearance Models (AAM) are a generalization of the widely used Active
Shape Models (ASM). Instead of only representing the information near edges,
an AAM statistically models all texture and shape information inside the boundary
of the target model (here faces).
To build an AAM a training set has to be provided, which contains images and
landmark annotations of facial features.
The first step in building an AAM is to align the landmarks using a Procrustes
analysis [28], as displayed in Figure 7.5. Next, the shape variation is modelled
by a PCA, so that any shape can be approximated by
$$\mathbf{s} = \bar{\mathbf{s}} + \Phi_s \mathbf{b}_s \qquad (7.1)$$
where $\bar{\mathbf{s}}$ is the mean shape, $\Phi_s$ is a matrix containing the $t_s$ most important
eigenvectors and $\mathbf{b}_s$ is a vector of length $t_s$, which contains a distinct set of
parameters describing the actual shape. The number $t_s$, which is both the number of
eigenvectors in $\Phi_s$ and the length of $\mathbf{b}_s$, is chosen so that the model represents a
user-defined proportion of the total variance in the data. To obtain a proportion of
$p$ percent of the variance, the value of $t_s$ can be chosen by
$$\sum_{i=1}^{t_s} \lambda_i \geq \frac{p}{100} \sum_{i=1}^{n} \lambda_i \qquad (7.2)$$
where $\lambda_i$ is the eigenvalue corresponding to the $i$-th eigenvector and $n$ is the total
number of non-zero eigenvalues.
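The rule for choosing the number of retained eigenvectors can be sketched as a small helper that returns the smallest number of leading eigenvalues whose sum reaches $p$ percent of the total variance. The function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def number_of_modes(eigenvalues, p):
    """Smallest t whose leading eigenvalues explain at least p percent of the variance."""
    # Sort in descending order so that the largest eigenvalues come first.
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index at which the cumulative proportion reaches p / 100.
    return int(np.searchsorted(cumulative, p / 100.0) + 1)

# Made-up eigenvalues: the cumulative proportions are 0.5, 0.8, 0.95 and 1.0.
t = number_of_modes([5.0, 3.0, 1.5, 0.5], p=90)
print(t)
```

With these example eigenvalues two modes explain only 80% of the variance, so three modes are needed to reach 90%.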
Figure 7.3: Diagram of the neural network developed by Rowley et al. [47].
Figure 7.4: Diagram displaying an improved version of the neural network
in Figure 7.3. Courtesy of Rowley et al. [48].
Figure 7.5: Full Procrustes analysis. (a) The original landmarks, (b)
translation of the center of gravity (COG) into the mean shape COG, (c)
result of full Procrustes analysis; here the mean shape is plotted in red.
The texture variation is modelled by first removing shape information by
warping all face images onto the mean shape. The result is called the set of shape free
images. Several methods can then be applied to eliminate global illumination
variation, see e.g. Cootes et al. [10]. Next, the texture variation can be modelled,
like the shape, by a PCA, so that any texture can be approximated by
$$\mathbf{t} = \bar{\mathbf{t}} + \Phi_t \mathbf{b}_t \qquad (7.3)$$
where $\bar{\mathbf{t}}$ is the mean texture, $\Phi_t$ is a matrix containing the $t_t$ most important
eigenvectors and $\mathbf{b}_t$ is a vector of length $t_t$, which contains a distinct set of
parameters describing the actual texture. $t_t$ can be chosen, like $t_s$, by Eq. 7.2.
The AAM is now built by concatenating shape and texture parameters
$$\mathbf{b} = \begin{bmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_t \end{bmatrix} = \begin{bmatrix} \mathbf{W}_s \Phi_s^T (\mathbf{s} - \bar{\mathbf{s}}) \\ \Phi_t^T (\mathbf{t} - \bar{\mathbf{t}}) \end{bmatrix} \qquad (7.4)$$
where $\mathbf{W}_s$ is a diagonal matrix of weights between shape and texture. To remove
the correlation between shape and texture a PCA is applied to obtain
$$\mathbf{b} = \Phi_c \mathbf{c} \qquad (7.5)$$
where $\mathbf{c}$ is the vector of AAM parameters. An arbitrary new shape and texture can be
generated by
$$\mathbf{s} = \bar{\mathbf{s}} + \Phi_s \mathbf{W}_s^{-1} \Phi_{c,s} \mathbf{c} \qquad (7.6)$$
$$\mathbf{t} = \bar{\mathbf{t}} + \Phi_t \Phi_{c,t} \mathbf{c}, \qquad \Phi_c = \begin{bmatrix} \Phi_{c,s} \\ \Phi_{c,t} \end{bmatrix} \qquad (7.7)$$
The process of placing the AAM mean shape and texture at a specific location
in an image and searching for a face near this location is shown in Figure 7.6.
This process will not be described further here. For a more detailed
description of AAM, the paper by Cootes et al. [10] or the master thesis by Mikkel Bille
Stegmann [50] are recommended.
One advantage of the AAM (and ASM) algorithm compared to other face
detection algorithms is that a localized face is described both by shape and texture.
Thus, a well defined shape of the face can be obtained by an AAM. This is
an improvement over other face detection algorithms, where the result is a
sub image containing a face without knowing exactly which pixels represent
background and which represent the face. An AAM is also desirable for tracking
in video sequences, assuming that changes are minimal from frame to frame.
Due to these advantages an AAM is used as the face detection algorithm in this
thesis when automatic detection is required.
Figure 7.6: Face detection (approximations) obtained by AAM, when
the model is initialized close to the face. The first column is the mean
shape and texture of the AAM. The last column is the converged shape
and texture of the AAM. Courtesy Cootes et al. [10].
Chapter 8
Preprocessing of a Face Image
The face preprocessing step aims at normalizing, i.e. reducing the variation
of, images obtained during the face detection step. Using AAM in the process of
face detection provides a well-defined framework to retrieve the photometric
information as a shape-free image as well as the geometric information as a
shape. Since this has already been described previously in this thesis, only
the subject of light correction will be described within this chapter.
8.1 Light Correction
As described in Section 3.2, unpredictable change in lighting conditions is a
problem in facial recognition. Therefore, it is desirable to normalize the
photometric information in terms of light correction to optimize the facial
recognition. Here, two light correction methods are described.
8.1.1 Histogram Equalization
Histogram equalization (HE) can be used as a simple but very robust way to
obtain light correction when applied to small regions such as faces. The aim of
HE is to maximize the contrast of an input image, resulting in a histogram of
the output image which is as close to a uniform histogram as possible. However,
this does not remove the effect of a strong light source but maximizes the
entropy of an image, thus reducing the effect of differences in illumination
within the same “setup” of light sources. By doing so, HE makes facial
recognition a somewhat simpler task. Two examples of HE of images can be seen
in Figure 8.1. The algorithm of HE is straightforward and will not be explained
here; the interested reader can find the algorithm in Finlayson et al. [22].
Figure 8.1: Examples of histogram equalization applied to two images to
obtain standardized images with maximum entropy. Each example shows the image
before and after equalization together with its histogram of pixel intensities.
Notice that only the facial region of an image is used in the histogram
equalization.
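Histogram equalization restricted to the facial region can be sketched as follows. This is the common cumulative-histogram formulation for 8-bit grayscale images, not necessarily the exact variant of Finlayson et al. [22]; the function name `equalize_histogram` and the boolean `mask` argument (marking the facial pixels) are assumptions introduced for illustration.

```python
import numpy as np

def equalize_histogram(image, mask=None, levels=256):
    """Histogram-equalize an 8-bit grayscale image.

    Only pixels where `mask` is True contribute to the histogram and are
    remapped; all other pixels are copied unchanged.
    """
    image = np.asarray(image, dtype=np.uint8)
    if mask is None:
        mask = np.ones(image.shape, dtype=bool)
    pixels = image[mask]

    # Cumulative histogram of the masked region.
    hist = np.bincount(pixels, minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]                 # count at the darkest present level
    denom = max(pixels.size - cdf_min, 1)     # guard against a constant region

    # Lookup table stretching the cumulative distribution to [0, levels - 1].
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)

    out = image.copy()
    out[mask] = lut[pixels]
    return out
```

Passing the AAM-derived face mask as `mask` keeps background pixels from influencing the mapping, in line with the note in Figure 8.1.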
8.1.2 Removal of Specific Light Sources based on 2D Face Models
The removal of specific light sources based on 2D face models [56] is another
method to obtain light correction of images. The method creates a pixelwise
correspondence of images (as already described in Section 7.2.5, the AAM shape
free image). By doing so, the effect of illumination upon each pixel
$\mathbf{x} = \{x, y\}$ of an image can be expressed by the equation
$$\tilde{F}_{\mathbf{x}} = a_{i,\mathbf{x}} F_{\mathbf{x}} + b_{i,\mathbf{x}} \tag{8.1}$$
where $F$ and $\tilde{F}$ are the images of the same scene recorded at normal
lighting condition (diffuse lighting) and under the influence of a specific
light source (illumination mode $i$), respectively. $a_{i,\mathbf{x}}$ is the
multiplicative compensation, and $b_{i,\mathbf{x}}$ is the additive
compensation of the illumination mode $i$ of pixel $\mathbf{x}$ in the image.
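Having $n$ sets of images in the normal illumination and the mode $i$
illumination, Eq. 8.1 can be rewritten as
$$\tilde{\mathbf{G}} = \mathbf{G} \begin{bmatrix} a_{i,\mathbf{x}} \\ b_{i,\mathbf{x}} \end{bmatrix} \tag{8.2}$$
where
$$\tilde{\mathbf{G}} = \begin{bmatrix} \tilde{F}_{\mathbf{x}}^{(1)} \\ \vdots \\ \tilde{F}_{\mathbf{x}}^{(n)} \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix} F_{\mathbf{x}}^{(1)} & 1 \\ \vdots & \vdots \\ F_{\mathbf{x}}^{(n)} & 1 \end{bmatrix} \tag{8.3}$$
If the $n$ sets of images are of different persons, then the rows of
$\mathbf{G}$ and $\tilde{\mathbf{G}}$ are independent and the least-squares
solution to $a_{i,\mathbf{x}}$ and $b_{i,\mathbf{x}}$ in Eq. 8.2 is
$$\begin{bmatrix} a_{i,\mathbf{x}} \\ b_{i,\mathbf{x}} \end{bmatrix} = (\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T \tilde{\mathbf{G}} \tag{8.4}$$
Using Eq. 8.4 upon every pixel in the shape free image, the illumination
compensation images $\mathbf{A}_i$ and $\mathbf{B}_i$ can be constructed. By
doing so it is possible to reconstruct a face image in normal lighting
conditions from a face image in lighting condition $i$ by
$$F_{\mathbf{x}} = \frac{\tilde{F}_{\mathbf{x}} - b_{i,\mathbf{x}}}{a_{i,\mathbf{x}}} \tag{8.5}$$
where $a_{i,\mathbf{x}}$ and $b_{i,\mathbf{x}}$ are the values of
$\mathbf{A}_i$ and $\mathbf{B}_i$ at pixel $\mathbf{x}$.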
Different schemes can be used to identify the lighting condition of a specific
face image; in Xie et al. [56] an FLDA is used.
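The per-pixel estimation of the compensation maps and the subsequent correction can be sketched as below, assuming the linear model $\tilde{F} = aF + b$ at each pixel of the shape free image. The function names and the array layout (`n` images stacked along the first axis) are assumptions for illustration and are not taken from Xie et al. [56].

```python
import numpy as np

def fit_compensation_maps(normal, lit):
    """Least-squares estimate of the maps A_i and B_i.

    normal, lit: arrays of shape (n, h, w) holding n shape-free images of
    different persons under normal lighting and under illumination mode i.
    Returns (A, B), each of shape (h, w).
    """
    n = normal.shape[0]
    f = normal.reshape(n, -1).astype(float)
    f_lit = lit.reshape(n, -1).astype(float)
    a = np.empty(f.shape[1])
    b = np.empty(f.shape[1])
    for p in range(f.shape[1]):
        # One row [F, 1] per person; solve for [a, b] at this pixel.
        g = np.column_stack([f[:, p], np.ones(n)])
        coeffs = np.linalg.lstsq(g, f_lit[:, p], rcond=None)[0]
        a[p], b[p] = coeffs
    return a.reshape(normal.shape[1:]), b.reshape(normal.shape[1:])

def remove_illumination(lit_image, a_map, b_map):
    """Invert the linear model to recover the normally lit image."""
    return (lit_image - b_map) / a_map
```

Pixels where the estimated multiplicative map is close to zero would make the inversion unstable, so a practical implementation would likely need to guard against that case.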
Removal of two specific illumination conditions is displayed in Figure 8.2.
This used the illumination compensation maps displayed in Figure 8.4. However,
this method sometimes creates artifacts in the faces, as can be seen in the
close-up of the illumination-corrected faces from Figure 8.2 shown in
Figure 8.3.
Figure 8.2: Removal of specific illumination conditions from facial
images. A) shows the facial images in normal diffuse lighting. B) Columns
1-4 and 5-8 show facial images captured under right and left illumination,
respectively. C) shows the compensated images.
Figure 8.3: A close-up of the faces reconstructed in Figure 8.2. Notice
that faces 1-4 are only slightly affected by artifacts, while faces 5-8 are
substantially affected.
8.2 Discussion
It is clear that HE is a good and robust way of normalizing images. The more
complex method of removing specific illumination conditions seems to yield
impressive results, but has the drawback of sometimes imposing artifacts onto
the images, as can be seen in Figure 8.3, where “shadows of spectacles” can
be seen on persons not wearing spectacles. It was decided to only preprocess
facial images with HE to ensure that the images are independent. No tests
were performed to see how facial recognition performs under the influence of
the artifacts introduced by the removal of specific light sources based on 2D
face models. This will be saved for future work.

Figure 8.4: Illumination compensation maps used for removal of specific
illumination conditions. Rows A) and B) display the illumination compensation
maps for facial images captured under left and right illumination,
respectively.
Chapter 9
Face Feature Extraction:
Dimensionality Reduction Methods
Table 9.1 lists the most promising dimensionality reduction methods (feature
extraction methods) used for face recognition. Out of these, Principal
Component Analysis, Fisher Linear Discriminant Analysis, Kernel Fisher Linear
Discriminant Analysis and Locality Preserving Projections will be described in
the following sections.

Preserved structure   Method
Global structure      Fisher Linear Discriminant Analysis
                      Principal Component Analysis
                      Kernel Fisher Linear Discriminant Analysis
                      Kernel Principal Component Analysis
Local structure       Locality Preserving Projections
                      Laplacian Eigenmap

Table 9.1: Dimensionality reduction methods.
9.1 Principal Component Analysis
Principal Component Analysis (PCA), also known as the Karhunen-Loève
transformation, is a linear transformation which captures the variance of the
input data. The coordinate system in which the data resides is rotated by PCA,
so that the first axis is parallel to the direction of highest variance in the
data (in a one-dimensional projection). The remaining axes can be explained one
at a time as being parallel to the direction of highest remaining variance,
while all axes are constrained to be orthogonal to all previously found axes.
To summarize, the first axis will contain the highest variance, the second axis
the second highest variance, etc. An example in two dimensions is shown in
Figure 9.1. PCA, which is an unsupervised method, is a powerful tool for data
analysis, especially if data resides in a space of more than three dimensions,
where graphical representations are hard. One of the main applications of PCA
is dimension reduction, with little or no loss of data variation. This is used
to remove redundancy and compress data.
Figure 9.1: An example of PCA in two dimensions, showing the PCA
axis that maximizes the variation in the first principal component: PCA 1.
9.1.1 PCA Algorithm
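Different methods can be used to calculate the PCA basis vectors. Here
eigenvalues and eigenvectors of the covariance matrix of the data are used.
Considering the data
$$\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \tag{9.1}$$
where $n$ is the number of data samples and $\mathbf{x}_i$ is the $i$th data
sample of dimension $d$. First, the mean of $\mathbf{X}$ is subtracted from the
data
$$\hat{\mathbf{X}} = [\mathbf{x}_1 - \bar{\mathbf{x}}, \mathbf{x}_2 - \bar{\mathbf{x}}, \ldots, \mathbf{x}_n - \bar{\mathbf{x}}] \tag{9.2}$$
The covariance matrix $\mathbf{\Sigma}_x$ is calculated by
$$\mathbf{\Sigma}_x = \frac{1}{n} \hat{\mathbf{X}} \hat{\mathbf{X}}^T \tag{9.3}$$
The principal axes are now given by the eigenvectors $\mathbf{\Phi}_x$ of the
covariance matrix
$$\mathbf{\Sigma}_x \mathbf{\Phi}_x = \mathbf{\Phi}_x \mathbf{\Lambda}_x \tag{9.4}$$
where
$$\mathbf{\Lambda}_x = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_d \end{bmatrix} \tag{9.5}$$
is the diagonal matrix of eigenvalues corresponding to the eigenvectors of
$$\mathbf{\Phi}_x = [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_d] \tag{9.6}$$
The eigenvector corresponding to the highest eigenvalue represents the basis
vector containing the most data variance, i.e. the first principal component.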
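The $i$th data sample, $\mathbf{x}_i$, can be transformed into the PCA space by
$$\mathbf{y}_i = \mathbf{\Phi}_x^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) = \mathbf{\Phi}_x^T (\mathbf{x}_i - \bar{\mathbf{x}}) \tag{9.7}$$
Notice that an orthogonal matrix such as $\mathbf{\Phi}_x$ has the property
$\mathbf{\Phi}_x^{-1} = \mathbf{\Phi}_x^T$. Data in the PCA space can be
transformed back into the original space by
$$\mathbf{x}_i = \mathbf{\Phi}_x \mathbf{y}_i + \bar{\mathbf{x}} \tag{9.8}$$
If only a subset of the eigenvectors in $\mathbf{\Phi}_x$ is selected, then
this will result in data being projected into a PCA subspace. This can be very
useful to reduce redundancy in the data, i.e. to remove all eigenvectors whose
eigenvalues are equal to zero. The above method is described in greater detail
in Ersbøll et al. [19].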
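The covariance-based PCA described above can be sketched in a few lines of NumPy. The data samples are stored as columns of `X`, as in the text; the function names and the $1/n$ normalization of the covariance matrix are assumptions of this sketch.

```python
import numpy as np

def pca(X):
    """PCA of the columns of X (shape d x n) via the covariance matrix.

    Returns the mean (d x 1), the eigenvalues in descending order and the
    matching eigenvectors as columns.
    """
    x_mean = X.mean(axis=1, keepdims=True)
    X_hat = X - x_mean                        # subtract the mean from the data
    cov = X_hat @ X_hat.T / X.shape[1]        # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return x_mean, eigvals[order], eigvecs[:, order]

def to_pca_space(x, x_mean, phi):
    """Project samples (columns of x) into the PCA space."""
    return phi.T @ (x - x_mean)

def from_pca_space(y, x_mean, phi):
    """Transform PCA coordinates back into the original space."""
    return phi @ y + x_mean
```

Passing only the first few columns of the eigenvector matrix to `to_pca_space` projects the data into a PCA subspace; the back-transformation is then an approximation of the original sample.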
9.1.2 Computational Issues of PCA
If one has $n$ data samples in a high-dimensional space of dimension $d$, where
$n \ll d$, then the computational time is quite large for retrieving
eigenvectors and eigenvalues from the $d \times d$ covariance matrix. The time
needed for eigenvector decomposition grows with the cube of the covariance
matrix size [10]. However, it is possible to calculate the eigenvectors of the
non-zero eigenvalues from a much smaller matrix of size $n \times n$, by use of
$$\mathbf{\Sigma}_n = \frac{1}{n} \hat{\mathbf{X}}^T \hat{\mathbf{X}} \tag{9.9}$$
where $\hat{\mathbf{X}}$ is calculated by Eq. 9.2. The non-zero eigenvalues of
the matrices in Eq. 9.4 and Eq. 9.9 are equal
$$\mathbf{\Lambda}_n = \mathbf{\Lambda}_x \tag{9.10}$$
The eigenvectors corresponding to non-zero eigenvalues can be expressed as
$$\mathbf{\Phi}_x = \hat{\mathbf{X}} \mathbf{\Phi}_n \tag{9.11}$$
This can be proved by the Eckart-Young Theorem [50]. Notice that these
eigenvectors are not normalized.
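A minimal sketch of this small-matrix approach is given below, again with samples as columns and a $1/n$ normalization assumed. The function name `pca_small_matrix` and the relative threshold used to decide which eigenvalues count as non-zero are choices made for this illustration only.

```python
import numpy as np

def pca_small_matrix(X, rel_tol=1e-10):
    """PCA for n samples of dimension d (columns of X) when n << d.

    Works on the n x n matrix X_hat^T X_hat / n instead of the d x d
    covariance matrix and returns the non-zero eigenvalues (descending)
    with unit-length eigenvectors of the covariance matrix as columns.
    """
    n = X.shape[1]
    X_hat = X - X.mean(axis=1, keepdims=True)
    small = X_hat.T @ X_hat / n                  # n x n matrix
    eigvals, phi_n = np.linalg.eigh(small)
    order = np.argsort(eigvals)[::-1]
    eigvals, phi_n = eigvals[order], phi_n[:, order]

    keep = eigvals > rel_tol * eigvals[0]        # drop (numerically) zero modes
    phi = X_hat @ phi_n[:, keep]                 # map back to d dimensions
    phi /= np.linalg.norm(phi, axis=0)           # these vectors are not unit length
    return eigvals[keep], phi
```

For face images with tens of thousands of pixels and a few hundred training samples, this replaces the decomposition of a very large matrix with that of a matrix whose side length equals the number of samples.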
9.2 Fisher Linear Discriminant Analysis
Fisher Linear Discriminant Analysis (FLDA), also known as Canonical
Discriminant Analysis, is, like PCA, a linear transformation. Unlike PCA, FLDA
is a supervised method, which implies that all training-data samples must be
associated (manually) with a class. FLDA maximizes the between-class variance
while minimizing the within-class variance. A graphic example of FLDA is
shown in Figure 9.2.
Figure 9.2: An example of FLDA in two dimensions, showing the FLDA
axis that maximizes the separation between the classes and minimizes the
variation inside the classes.
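The objective function for FLDA is as follows
$$\max_{\mathbf{w}} \frac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}} \tag{9.12}$$
where the between-matrix is defined as
$$\mathbf{S}_B = \sum_{i=1}^{c} n_i (\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})^T \tag{9.13}$$
and the within-matrix as
$$\mathbf{S}_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (\mathbf{x}_{i,j} - \bar{\mathbf{x}}_i)(\mathbf{x}_{i,j} - \bar{\mathbf{x}}_i)^T \tag{9.14}$$
where $\mathbf{x}_{i,j}$ is the $j$th sample in class $i$,
$\bar{\mathbf{x}}_i$ is the mean of class $i$, $\bar{\mathbf{x}}$ is the mean
of all samples, $c$ is the number of classes and $n_i$ is the number of samples
in class $i$.
The optimal projection that maximizes the between-class variance and
minimizes the within-class variance is given by the direction of the
eigenvector associated with the maximum eigenvalue of
$\mathbf{S}_W^{-1} \mathbf{S}_B$. Notice that the number of non-zero
eigenvalues is at most the number of classes minus one [4].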
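The scatter matrices and the resulting projection can be sketched as follows. Here the samples are stored as rows of `X`, the function name `flda` is hypothetical, and the within-matrix is assumed to be non-singular (which, as discussed for face images, often requires a prior PCA).

```python
import numpy as np

def flda(X, labels):
    """Fisher Linear Discriminant Analysis.

    X: (n, d) array with one sample per row; labels: (n,) class labels.
    Returns the eigenvalues of S_W^{-1} S_B in descending order and the
    matching projection vectors as columns.
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for cls in np.unique(labels):
        X_c = X[labels == cls]
        mean_c = X_c.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_B += len(X_c) * (diff @ diff.T)     # between-class scatter
        centered = X_c - mean_c
        S_W += centered.T @ centered          # within-class scatter

    # Eigen-decomposition of S_W^{-1} S_B; the matrix is not symmetric,
    # so the general solver is used and only the real parts are kept.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]
```

With two classes only the first returned eigenvalue is non-zero, so the first column alone defines the discriminant axis.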
9.2.1 FLDA in Face Recognition Problems
In face recognition problems $\mathbf{S}_W$ is nearly always singular. This is
due to the fact that the rank of $\mathbf{S}_W$ is at most $n - c$, where $n$
(the number of training samples) usually is much smaller than the number of
pixels in each image.
In order to overcome this problem a PCA, normally capturing between 95-99% of
the variance, is usually performed on the images prior to FLDA, which removes
redundancy and makes the data samples more compact. The within-matrix,
$\mathbf{S}_W$, is made non-singular by only considering the $f$ most important
principal components from the PCA, where $f$ is the number of non-zero
eigenvalues of the within-matrix $\mathbf{S}_W$.
9.3 Locality Preserving Projections
The Locality Preserving Projections (LPP) algorithm has recently been
developed [29]. When high-dimensional data lies on a low-dimensional manifold
embedded in the data space, then LPP approximates the eigenfunctions of the
Laplace-Beltrami operator of the manifold. LPP aims at preserving the local
structure of the data. This is unlike PCA and FLDA, which aim at preserving
the global structure of the data.
LPP is unsupervised and performs a linear transformation. It models the
manifold structure by constructing an adjacency graph, which is a graph
expressing local nearness of the data. This is highly desirable for face
recognition compared to the non-linear local structure preserving methods in
Table 9.1, since it is significantly less computationally expensive and, more
importantly, it is defined in all points and not just in the training points,
as is the case for Isomaps and Laplacian Eigenmaps.
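The objective function of LPP is
$$\min \sum_{ij} (y_i - y_j)^2 S_{ij} \tag{9.15}$$
where $y_i$ is a one-dimensional representation of the data sample
$\mathbf{x}_i$ and $S_{ij}$ is an entry in the similarity matrix $\mathbf{S}$
that represents the adjacency graph. The adjacency graph weight $\alpha$, used
if nodes $i$ and $j$ are connected, can be chosen by:
• A parameter function, $[t \in \mathbb{R}]$, e.g.
$$\alpha = e^{-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{t}} \tag{9.16}$$
• A constant, e.g.
$$\alpha = 1 \tag{9.17}$$
Two ways of constructing the adjacency graph are:
• $\epsilon$-neighborhood, $[\epsilon \in \mathbb{R}]$:
$$S_{ij} = \begin{cases} \alpha, & \text{if } \|\mathbf{x}_i - \mathbf{x}_j\|^2 < \epsilon \\ 0, & \text{otherwise} \end{cases} \tag{9.18}$$
• $k$ nearest neighbors, $[k \in \mathbb{N}]$:
$$S_{ij} = \begin{cases} \alpha, & \text{if } \mathbf{x}_i \text{ is among the } k \text{ nearest neighbors of } \mathbf{x}_j \text{ or } \mathbf{x}_j \text{ is among the } k \text{ nearest neighbors of } \mathbf{x}_i \\ 0, & \text{otherwise} \end{cases} \tag{9.19}$$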
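The two graph constructions and the two weight choices listed above can be sketched as below, together with a direct evaluation of the LPP objective for a given one-dimensional embedding. The function names are hypothetical, samples are stored as rows, the points are assumed to be distinct, and self-connections are set to zero (an assumption; they contribute nothing to the objective in any case).

```python
import numpy as np

def similarity_matrix(X, k=None, eps=None, t=None):
    """Similarity matrix S of the adjacency graph for the rows of X.

    Give either `eps` (epsilon-neighborhood on squared distances) or `k`
    (k nearest neighbors). If `t` is None the constant weight alpha = 1
    is used, otherwise alpha = exp(-||x_i - x_j||^2 / t).
    """
    n = len(X)
    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if eps is not None:
        connected = sq < eps
    else:
        # Column 0 of the sorted indices is the point itself, so skip it.
        nearest = np.argsort(sq, axis=1)[:, 1:k + 1]
        knn = np.zeros((n, n), dtype=bool)
        knn[np.arange(n)[:, None], nearest] = True
        connected = knn | knn.T               # i near j OR j near i
    np.fill_diagonal(connected, False)
    alpha = np.ones_like(sq) if t is None else np.exp(-sq / t)
    return np.where(connected, alpha, 0.0)

def lpp_objective(y, S):
    """Sum over all pairs of (y_i - y_j)^2 * S_ij for a 1-D embedding y."""
    return np.sum((y[:, None] - y[None, :]) ** 2 * S)
```

The pairwise distance array needs memory quadratic in the number of samples, which is acceptable for a sketch but would need a neighbor search structure for large training sets.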
The similarity matrix will inflict heavy penalties on the objective function in
Eq. 9.15 if neighboring points $\mathbf{x}_i$ and $\mathbf{x}_j$ are mapped far
apart in the output space. By minimizing the objective function LPP tries to
ensure that $y_i$ and $y_j$ are close in the output space if $\mathbf{x}_i$