Facial Component Extraction and Face Recognition with Support Vector Machines

Dihua Xi, Igor T. Podolak, and Seong-Whan Lee
Center for Artificial Vision Research, Korea University
Anam-dong, Seongbuk-ku, Seoul 136-701, Korea
{dhxi, uipodola, swlee}@image.korea.ac.kr
Abstract

A method for face recognition is proposed which uses a two-step approach: first a number of facial components are found, which are then glued together, and the resulting face vector is recognized as representing one of the possible persons. During the extraction step, a wavelet statistics subsystem provides the possible locations of the eyes and mouth, which are used by the Support Vector Machine (SVM) subsystem to extract facial components. The use of the wavelet statistics subsystem speeds up the recognition process markedly. Both the feature detection SVMs and the wavelet statistics are trained on a small number of actual images with features marked. Afterwards, a large number of face vectors are constructed, which are then classified with another set of SVM machines.
1. Introduction

Face recognition has emerged as an important component of pattern recognition with a vast number of possible applications. A number of different approaches are used to tackle this problem, including eigenfaces, PCA, neural networks, and support vector machines [6, 1, 7]. Recently, the SVM method has also been applied to face authentication [9].

Two basic approaches to face recognition with SVMs are possible: either the whole image is recognized (as in [1, 3]), or some selected features are extracted first and the recognition follows. We have decided to pursue this component-based approach [2],

(This research was supported by Creative Research Initiatives of the Ministry of Science and Technology, Korea. Current address of Dr. I. T. Podolak: Institute of Computer Science, Jagiellonian University, Krakow, Poland.)

since the whole-image method suffers greatly from any shifts of the image.
Basically, a set of selected features (like eyes, mouth, nose, etc.) is extracted first; then, by concatenating features, a face-vector is built which is eventually recognized. In our system all of the features are extracted using SVMs specially trained for each task. As feature extraction using only SVMs would require a sliding window moving across the whole image and would be very slow, we decided to use a number of other methods. First, a wavelet detection system is employed to give hints as to where both eyes and the mouth are. The area to check with SVMs is therefore much reduced. Additionally, a face geometry is defined in fuzzy logic terms, which makes it possible to predict the most probable locations of other features (e.g. nose, nose-bridge), again reducing the search area. Once a set of candidates is found, the same face geometry is used to construct face-vectors and select those with the best fuzzy membership values.

Another set of SVMs is trained to perform the actual recognition of persons, with face-vectors as examples. During recognition the same track is followed: feature extraction, then face-vector composition and recognition.
2. Component Location with Wavelet Statistics System

A fast and stable algorithm to search for proper candidates for the eyes and mouth in a face image is very important for a face recognition system using SVMs. In this section, a novel approach based on wavelets and statistics is described. The multiresolution wavelet is used to decompose a face image into sub-images. A facial model based on modified Bookstein coordinates is constructed which is scale independent.
Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR02)
0-7695-1602-5/02 $17.00 © 2002 IEEE
Experimental results proved this approach to be very fast and able to find correct candidates.
2.1. Facial coordinate system

The facial coordinate system, based on Bookstein's, is used to describe the geometric shape of a face. It contains two categories of coordinates (shown at the bottom right of Figure 1), used to indicate the location and the shape of each component respectively. The main coordinate system is used to describe the centers of the brows, eyes and mouth. Its origin is set to the center of the left eye, and the distance between the left and right eyes is set to unity. For a face that is almost frontal and upright, the distance between the two eyes can be approximated by their horizontal distance. Then the coordinate of the right eye is (1, y^B_re).
Figure 1. Main and componential coordinates used to define the facial geometry shape.
Suppose the screen coordinates of the centers of the left and right eyes are (x_le, y_le) and (x_re, y_re); then their corresponding facial coordinates are (0, 0) and (1, (y_re − y_le)/(x_re − x_le)) respectively. Therefore, the centers of all five components can be described by a 7-dimensional vector (y^B_re, x^B_lb, y^B_lb, x^B_rb, y^B_rb, x^B_m, y^B_m).
The shape of a facial component is defined by several feature points. For each component, we define a componential coordinate system whose origin is set to the componential center and whose unit is set equal to the unit of the main coordinate system. The componential coordinate system of the mouth is shown in Figure 1 by the axes x_m^s–y_m^s.
With this coordinate system, the facial shape coordinates remain unchanged even when the size of the image or of the face rectangle changes. The use of componential coordinates for each component makes the estimation and comparison of components from different images possible. The whole exact facial shape can be estimated from the x distance between the left and right eyes.
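The conversion from screen to facial coordinates described above can be sketched as follows (function and argument names are illustrative, not from the paper):

```python
def to_facial_coords(points, left_eye, right_eye):
    """Map screen coordinates into the facial coordinate system:
    origin at the left-eye centre, inter-eye distance as the unit.
    For a near-frontal, upright face the inter-eye distance is
    approximated by the horizontal distance x_re - x_le."""
    x_le, y_le = left_eye
    x_re, y_re = right_eye
    unit = x_re - x_le          # approximate inter-eye distance
    return [((x - x_le) / unit, (y - y_le) / unit) for x, y in points]

# The right-eye centre itself maps to (1, (y_re - y_le)/(x_re - x_le)):
print(to_facial_coords([(300, 210)], (200, 200), (300, 210)))  # [(1.0, 0.1)]
```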
2.2. Image decomposition

Using the multiresolution wavelet [5], an image can be decomposed into a sequence of sub-images which include different frequency information corresponding to different directions at different scales [4]. Suppose f(x, y) is an image; let f^0_LL(x, y) = f(x, y). Then

f^n_LL(x, y) = f^{n+1}_LL(x, y) ⊕ f^{n+1}_LH(x, y) ⊕ f^{n+1}_HL(x, y) ⊕ f^{n+1}_HH(x, y).   (1)
The LH and HL sub-images, which carry the horizontal and vertical information of an image, are used in our research. Figure 1 gives an example of the decomposition of an image. The route of the image decomposition is shown in Figure 2. Notice that the width and height of a sub-image at level n are half of those at level n − 1.
Figure 2. Route of wavelet decomposition: from level 0 to level N, the approximation chain is f(x, y) → f^1_LL(x, y) → f^2_LL(x, y) → … → f^N_LL(x, y), with the detail sub-images f^n_LH(x, y) and f^n_HL(x, y) split off at each level n.
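One decomposition step of Eq. (1) can be illustrated with a simple averaging Haar-like split (the paper itself uses Daubechies wavelets; this pure-Python sketch only shows how the LL, LH, HL and HH sub-images arise, each half the width and height of the input):

```python
def haar_step(img):
    """One level of a 2-D Haar-like decomposition: split img
    (even width and height) into LL (approximation) and LH, HL,
    HH detail sub-images, each half the size of the input."""
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 4.0   # average
            LH[i // 2][j // 2] = (a + b - c - d) / 4.0   # horizontal detail
            HL[i // 2][j // 2] = (a - b + c - d) / 4.0   # vertical detail
            HH[i // 2][j // 2] = (a - b - c + d) / 4.0   # diagonal detail
    return LL, LH, HL, HH

# Repeated application of haar_step to LL yields the decomposition
# route of Figure 2: f -> f^1_LL -> f^2_LL -> ...
```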
2.3. Feature vector construction

In the previous sections, we have shown that the feature of a point can be described by its facial coordinates and its responses in the LH and HL sub-images at all levels. Suppose the decomposition goes up to level N; then the vector (x^B, y^B, f^1_LH(x, y), f^1_HL(x, y), …, f^N_LH(x, y), f^N_HL(x, y)) indicates the feature of the point (x, y); it is named the feature vector. To reduce the feature vector dimension, let f^i(x, y) = f^i_LH(x, y) + f^i_HL(x, y); it is then possible to represent the vector as (x^B, y^B, f^1(x, y), …, f^N(x, y)). To describe the whole facial model, it is also required that the center of each component be described by its coordinate (x^c_i, y^c_i), i = 1, …, 5.
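A minimal sketch of the reduced feature vector construction just described (assuming the point (x, y) has already been rescaled to each level's resolution; all names are illustrative):

```python
def feature_vector(xB, yB, lh_subbands, hl_subbands, x, y):
    """Reduced feature vector (x^B, y^B, f^1, ..., f^N) of a point,
    with f^i(x, y) = f^i_LH(x, y) + f^i_HL(x, y).
    lh_subbands / hl_subbands are lists of 2-D arrays, one per
    decomposition level; (x, y) is assumed to be already rescaled
    to each level's resolution."""
    vec = [xB, yB]
    for lh, hl in zip(lh_subbands, hl_subbands):
        vec.append(lh[y][x] + hl[y][x])   # f^i = LH + HL response
    return vec
```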
In summary, a face model can be defined by x_i = (x^c_i, y^c_i) and v_i = (v^1_i, …, v^{N_i}_i), where x_i is the center of component i, v^j_i is the feature vector of the j-th feature point of component i, and N_i is the number of feature points of component i. We call the collection of {x_i} and {v^j_i}, (i = 1, …, 5; j = 1, …, N_i), the normalization of a facial image.
2.4. Statistical face model and training

For training the model, 400 facial images (512×512) with feature points marked (in screen coordinates) for all five facial components, including both brows, the eyes, and the mouth, are included in our database. First, all N facial images are initialized to estimate their corresponding facial shape models. The mean face is produced by calculating the mean of each componential center and of all feature points of each component. Let {x̄_i} and {v̄^j_i} be the means over all faces. Then we can estimate the variable rectangle for the center of each component. Actually, only the y coordinates of the right eye, the mouth, and both brows need to be processed. The statistical face model consists of the mean face and the rectangle of each component used to constrain its moving area. This model will be used to search for the locations of the eyes and mouth.
2.5. Searching for eye and mouth candidates

In this section we introduce the fast algorithm used to search for eye and mouth candidates in any input facial image.

Using the algorithm of Section 2.2, an input image is first decomposed into a sequence of sub-images at all levels (we constrain both the width and height of the smallest sub-image to be no less than 32; otherwise it would be too small to be recognized). The f_LH + f_HL of the sub-images at each level is used for searching with the model.
To match a facial component in a sub-image, the modified cross correlation (MCC) of two point sets, (x_i, y_i) and (x'_i, y'_i), i = 1, …, N, is used. Suppose a_i = f(x_i, y_i) and b_i = f'(x'_i, y'_i); the MCC is calculated by

Σ_{i=1}^{N} (a'_i − ā)(b'_i − b̄),   (2)

where

a'_i = a_i / Σ_{i=1}^{N} a_i,   b'_i = b_i / Σ_{i=1}^{N} b_i,   ā = (1/N) Σ_{i=1}^{N} a'_i,   b̄ = (1/N) Σ_{i=1}^{N} b'_i.
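Eq. (2) can be sketched directly; `a` and `b` below are the lists of responses a_i and b_i (the function name is mine):

```python
def mcc(a, b):
    """Modified cross correlation of Eq. (2): each response list is
    normalised by its own sum, then an unscaled covariance of the
    two normalised lists is returned."""
    sa, sb = float(sum(a)), float(sum(b))
    ap = [v / sa for v in a]           # a'_i = a_i / sum(a)
    bp = [v / sb for v in b]           # b'_i = b_i / sum(b)
    n = len(a)
    am = sum(ap) / n                   # mean of the normalised a
    bm = sum(bp) / n
    return sum((x - am) * (y - bm) for x, y in zip(ap, bp))
```

Matching response patterns give a positive MCC, opposed patterns a negative one, which is what the search below maximizes.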
Two main ideas, coarse-to-fine graining and proceeding from high- to low-level sub-images, are used to design a fast eye and mouth searching algorithm. The eyes and brows do not need to be distinguished in the highest-level sub-image, because that sub-image is too small. Since the eye is prone to be located erroneously near the brows, we use models of both eyes and brows to distinguish them at the next level (when the sub-image size is near 64×64). At the smallest level, it is not difficult to search for candidates over the whole image using little computing time. The candidates are adjusted for better matching by slightly moving the center of each component at the higher levels. Therefore, the system is very fast.

The algorithm for searching for eye and mouth candidates is listed below. Experimental results show no missed locations when 5 candidates are used, and over 80% of test images are correctly located at the first candidate.
Begin
  for level from the highest N down to 1
    • Set the moving rectangle RECT of the left eye: the whole
      or the top-left quarter of the sub-image at the highest
      level N, but moved slightly to the position decided by
      the previous level
    • Read the sub-image f_LH + f_HL at this level
    for the left eye moving in RECT
      for varying distances between the left and right eyes
          (large at level N, but very small at the others)
        • Produce all the feature points using the mean
          facial model
        • Estimate the matching MCCs of the models of the
          left and right eyes
        for varying the mouth position slightly in both x and y
          • Estimate the matching MCC of the mouth
          • Calculate the sum of the MCCs of all three
            components (eyes and mouth)
          • Compare the sum with the stored sums and keep
            only the 5 largest
        endfor
      endfor
    endfor
    if the distance between the left and right eyes is greater
        than that of the mean facial model, at level 3 (when the
        sub-image size is 64×64)
      for slight changes of the vertical distance between
          the eyes and brows
        • Distinguish the eyes and brows by moving the eyes
          vertically and maximizing the MCC of all brows
          and eyes
      endfor
    endif
  endfor
End
Output: five candidates for the positions of the eyes and mouth
Since the whole search consists only of a complete pass over the smallest sub-image, with only trivial adjustments needed at the other levels, the algorithm can run very fast. In fact, it needs less than 0.1 second for a 512×512 input image.
3. Face Recognition using SVM

The Support Vector Machine is an implementation of the structural risk minimization principle, developed by Vapnik et al. [10], whose object is to minimize the upper bound on the generalization error.

In the case of a linearly separable two-class problem, with examples {(x_i, y_i)}_{i=1}^{l}, x_i ∈ R^n, y_i ∈ {−1, +1}, the algorithm maximizes the margin, which is the perpendicular distance from the separating hyperplane x·w + b = 0 to the nearest positive example plus the distance to the nearest negative example. We expect better generalization capability from a hyperplane with the margin maximized. In order to achieve this goal, ||w||² is minimized subject to the set of constraints

y_i (x_i·w + b) − 1 ≥ 0,   i = 1, …, l,   (3)

i.e. each example is classified with a distance from the separating hyperplane of at least 1.
This is done by reformulating the problem in terms of positive Lagrange multipliers α_i: the following Lagrangian has to be minimized:

L_P = (1/2) ||w||² − Σ_{i=1}^{l} α_i y_i (x_i·w + b) + Σ_{i=1}^{l} α_i   (4)
subject to constraints (3). Minimization of L_P (the primal problem) can be expressed as the maximization of (the dual problem)

L_D = Σ_{i=1}^{l} α_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j x_i·x_j   (5)
subject to the constraints Σ_i α_i y_i = 0 and α_i ≥ 0. In the solution only a small number of the α_i coefficients differ from zero. As each α_i corresponds to one data point, these are the only data included in the solution. These will be the support vectors: points lying on the margin border.
For problems that are not linearly separable, a mapping of the input space into a high-dimensional space, x ∈ R^n ↦ Φ(x) ∈ R^h, is needed, which gives a much higher probability that the mapped points will be linearly separable. The important element of the dual problem specification is that the data points do not appear by themselves, but rather as dot products of all pairs. This makes it possible, instead of an explicit choice of the feature space and an explicit mapping, to use a kernel function K(a, b): a positive symmetric function, a scalar product in the feature space:

Φ^T(x^(1)) Φ(x^(2)) = Σ_i Φ_i(x^(1)) Φ_i(x^(2)) = K(x^(1), x^(2))   (6)
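The kernel identity (6) can be checked numerically for the homogeneous degree-2 polynomial kernel, whose explicit feature map for a 2-D input is (x_1², √2·x_1·x_2, x_2²) (a standard textbook example, not taken from the paper):

```python
import math

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: K(a, b) = (a.b)^2."""
    return sum(x * y for x, y in zip(a, b)) ** 2

def phi(x):
    """Explicit feature map for a 2-D input, chosen so that
    phi(a).phi(b) equals (a.b)^2."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

a, b = (1.0, 2.0), (3.0, 0.5)
lhs = poly2_kernel(a, b)                           # kernel in input space
rhs = sum(u * v for u, v in zip(phi(a), phi(b)))   # dot product in feature space
```

Both sides agree, so the dual problem never needs the mapping Φ explicitly.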
When the dot product in (5) is substituted with this kernel function, we obtain

L_D = Σ_{i=1}^{l} α_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)   (7)
The Gaussian RBF, polynomial, or hyperbolic tangent functions, among others, can be used as kernel functions. The solution to the classification problem is therefore given by the sign of the function

f(x) = Σ_{i=1}^{N_SV} α_i y_i K(s_i, x) + b   (8)

where the s_i are the support vectors (N_SV in total): data points with non-zero Lagrange multipliers. For an example x to be classified, it is only necessary to compute the sign of f(x).
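Eq. (8) translates almost directly into code; the support vectors, multipliers and kernel below are made-up toy values, not trained ones:

```python
def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Eq. (8): f(x) = sum_i alpha_i * y_i * K(s_i, x) + b;
    the predicted class is the sign of f(x)."""
    return sum(a * y * kernel(s, x)
               for a, y, s in zip(alphas, labels, support_vectors)) + b

def linear_kernel(u, v):
    return sum(p * q for p, q in zip(u, v))

# Toy machine with two support vectors and made-up multipliers:
f = svm_decision((2.0, 0.5), [(1.0, 1.0), (-1.0, -1.0)],
                 alphas=[0.5, 0.5], labels=[+1, -1],
                 b=0.0, kernel=linear_kernel)
label = 1 if f >= 0 else -1   # classify by the sign of f(x)
```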
3.1. Feature extraction

During the processing of the images, the features are extracted first. This is done with a set of specially trained SVMs which utilize the feature location information provided by the wavelet statistics subsystem. In order to train the SVMs, we selected 32 face images out of the database (we have used the AT&T face database) and hand-marked the features. For each of them an SVM was trained. A bootstrapping method was employed: first an SVM was trained on a small set of examples, then other examples were checked, with those badly recognized being added to the training set. The SVM was re-trained after every few hundred new examples were added. Training using all examples at once was prohibitively long. Actually, a method of training first with the window skipping 7 pixels both horizontally and vertically, and then re-learning with a skip of 4, proved to be the quickest, with good generalization results too.
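The bootstrapping procedure can be sketched as follows; `train_fn` stands in for actual SVM training (here a trivial threshold classifier), and all names and sizes are illustrative:

```python
def bootstrap_train(train_fn, pool, seed_size=4, batch=2, rounds=3):
    """Bootstrapped training: start from a small seed set, test the
    current model on the remaining pool, move misclassified examples
    into the training set, and re-train. train_fn(examples) must
    return a predict function."""
    training = list(pool[:seed_size])
    rest = list(pool[seed_size:])
    model = train_fn(training)
    for _ in range(rounds):
        wrong = [(x, y) for x, y in rest if model(x) != y][:batch]
        if not wrong:
            break                      # nothing misclassified any more
        training.extend(wrong)         # add the hard examples
        rest = [e for e in rest if e not in wrong]
        model = train_fn(training)     # re-train on the enlarged set
    return model, training

def threshold_train(examples):
    """Stand-in for SVM training: midpoint-threshold classifier."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == -1]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > t else -1

pool = [(0.0, -1), (1.0, 1), (0.1, -1), (0.9, 1), (0.5, 1), (0.45, -1)]
model, training = bootstrap_train(threshold_train, pool)
```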
We used a conglomerate of 2 SVMs for each feature detector, with the machines trained on the same set but employing different kernel functions. This enabled us to enhance the generalization capabilities of the detector greatly. The second SVM was used only when the first could not detect any candidates for the feature (or detected fewer than a set minimum). This made it possible to use the SVMs with the smaller number of support vectors (i.e. the faster ones) first. Actually, the secondary machines were used rarely, in less than 10% of cases. Both polynomial (of degrees 2 and 3) and linear kernels were used. The generalization capabilities of the SVM feature extractors reached 95–97.5% (for different features and kernel functions).
The features are extracted one by one, first both of the eyes and the mouth, starting at the locations predicted by the wavelet statistics system. To provide for a possible prediction error, the wavelet statistics system gives a ranked list of hints for each feature, which are checked in order, or composed together if lying nearby. The use of the wavelet statistics subsystem speeds up the whole recognition.

Other feature positions are predicted, based on the features already found, with the use of a face geometry defined in fuzzy terms: we defined fuzzy functions for the expected distance between the eyes, between the eyes and the nose, and between the nose and the mouth, and for the inclusion of the nose in the triangle formed by the eyes and the mouth. These conditions are defined as trapezoidal functions.
3.2. Face-vector construction and recognition

After the extraction of candidates for each of the features, the face-vectors are constructed. This is done again with the help of the face geometry: only those feature conglomerates are selected which fulfill all the criteria defined, i.e. all membership functions have positive values. Then a number of those with the highest membership value (the minimum of all the separate values: the fuzzy AND function) are selected; see Figure 3.

As the SVM is a binary classifier, one SVM machine was built for each person in the database, detecting that person among the others. The 300 best face-vectors were used for each image during training. The use of a high number of examples makes up for small variations during feature extraction, e.g. an eye can be detected together with an eyebrow, or without.

During recognition the same path is followed: first the features are extracted, then the resulting face-vectors are recognized using the trained SVMs.
Figure 3. Some faces and face-vectors extracted. In some, subtle differences among face-vectors extracted from the same images can be seen.
Again, a number of face-vectors with the best fuzzy geometry membership values are used. If a face-vector is positively recognized by more than one machine, then the one with the highest activation (margin) is selected; after all are tested, the person with the most detections is considered to be recognized in the image.
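The recognition rule just described, where positive machines compete by activation and persons then compete by detection count, can be sketched as follows (all machines and values are toy examples, not trained SVMs):

```python
def recognise(face_vectors, machines):
    """One-vs-rest recognition. machines maps person -> decision
    function; for each face-vector, the positively responding
    machine with the highest activation gets one detection, and
    the person with the most detections over all face-vectors
    is recognised."""
    votes = {}
    for fv in face_vectors:
        positive = {p: f(fv) for p, f in machines.items() if f(fv) > 0}
        if positive:
            winner = max(positive, key=positive.get)
            votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get) if votes else None

# Toy decision functions standing in for the trained per-person SVMs:
machines = {"A": lambda v: v[0] - 1.0, "B": lambda v: 0.5 - v[0]}
```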
4. Experimental Results and Conclusion

In our experiments we have used the AT&T (formerly ORL) face database [8], which consists of 400 frontal images of 40 persons (10 images each). Images are 92×112 pixels in 256 grey levels. The feature SVM detectors were trained on 32 randomly selected images with hand-marked features. Then the data set was divided into 2 equal components: 200 images in the training set (5 images of each person), and the rest in the test set.

Our database used for training the wavelet statistics model comprises 400 Asian face images (200 male and 200 female). All the images are 512×512, frontal and upright, with the feature points manually marked for all five components (eyes, brows and mouth). The Daubechies wavelet is selected for the image wavelet decomposition. Then the algorithm from Section 2.5 is applied to the AT&T database to search for the eye and mouth candidates. Of all 400 images, the numbers correctly located at the first to fifth candidates are 325, 42, 21, 9 and 3 respectively. In this experiment, our algorithm was applied to different people at various sizes and with changing poses. The algorithm is very fast (less than 0.1 second per image), with good results obtained.
Feature        linear   poly. deg. 2   poly. deg. 3
left eye       95.0%    95.6%          96.0%
right eye      91.8%    95.8%          96.0%
nose-bridge    97.0%    97.3%          97.5%
mouth          95.4%    95.9%          96.1%
nose           96.3%    95.7%          96.5%

Table 1. Feature extraction generalization rates for linear, polynomial degree 2, and polynomial degree 3 kernels.
The feature SVM extractors were trained with the bootstrapping method described before, using polynomial kernels of degrees 1, 2, and 3, which achieved generalization rates of 95–97.5%; see Table 1. As a second SVM was used when the first could not find a feature (when, judging from the detection of the other features, it should be there), and the SVM pairs were selected on the basis of minimal correlation of their classifications, the actual generalization rates are even higher. Features were extracted at 5 different scales. The combination of SVM extractors with the face geometry during extraction and face-vector construction resulted in perfect face-vectors for all the images.

For the actual recognition, 40 SVMs had to be trained, each to recognize one person against all the others; this is due to the fact that the SVM is a bi-class classifier. The results on the test set for a polynomial kernel of degree 2, a linear one, and a linear one with extended features are shown in Table 2, compared with the whole-image approach on the same data; the latter's results are lower and suffered greatly when the face in the image was shifted, which is not the case for the component-based approach. Using a complex SOM-convolutional network, Lawrence et al. achieved a 94.25% correct hit rate on the same data set [3]. "Extended" in Table 2 stands for a system where the feature sizes were extended: the same features as before were extracted, but larger areas were included in the face-vector. In that mode, parts of the cheeks were included too, without the need to extract them specially. This, and the addition of other features, should help in producing better recognition rates.

Different kernels were used, but the linear one (i.e. polynomial of degree 1) achieved the best generalization results, which hints that the face recognition problem with a component-based approach becomes linearly separable. The linear SVMs had the smallest number of support vectors, which is also important for speed.

The system is still slow. But the addition of the wavelet subsystem gives at least a twofold increase in speed, cutting down the time needed for face component extraction, as well as the time needed to build viable face-vectors when many feature combinations fulfill the face geometry constraints.
                correctly     badly         not
                recognized    recognized    recognized
polynomial      84.0%         3.0%          13.0%
linear          89.0%         1.5%          9.5%
extended        91.0%         2.5%          6.5%
whole image     83.5%         0.0%          16.5%

Table 2. Some recognition results on the test set. The linear kernel achieved better results than the polynomial one, and much better than the whole-image approach. "Not recognized" stands for images not recognized by any of the machines.
References

[1] R. Brunelli and T. Poggio. Face recognition: Features versus templates. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(10):1042–1052, October 1993.
[2] B. Heisele, M. Pontil, and T. Poggio. Component-based face detection in still grey images. Technical Report A.I. Memo 1687, MIT Artificial Intelligence Laboratory, 2000.
[3] S. Lawrence, C. L. Giles, A. T. Choi, and A. D. Black. Face recognition: A convolutional neural network approach. IEEE Trans. on Neural Networks, 8:98–113, January 1997.
[4] S. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(7):674–693, July 1989.
[5] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, New York, 1998.
[6] H. Moon and P. J. Phillips. Analysis of PCA-based face recognition algorithms. In K. W. Bowyer and P. J. Phillips, editors, Empirical Evaluation Techniques in Computer Vision, pages 57–71. IEEE Computer Society Press, Los Alamitos, CA, 1998.
[7] P. J. Phillips. Support vector machines applied to face recognition. Technical Report NISTIR 6241, National Institute of Standards and Technology, Gaithersburg, 1998.
[8] F. S. Samaria and A. C. Harter. Parameterisation of a stochastic model for human face identification. In Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, pages 138–142, Sarasota, FL, USA, 5–7 December 1994.
[9] A. Tefas, C. Kotropoulos, and I. Pitas. Using support vector machines to enhance the performance of elastic graph matching for frontal face authentication. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(7):735–746, July 2001.
[10] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.