Support Vector Machines
in Face Detection Systems
Mihaela ROMANCA
Group 245
Abstract:
Automatic facial feature localization has been a long-standing challenge in computer vision for several decades. This is explained by the large variation a face in an image can exhibit due to factors such as position, facial expression, pose, illumination, and background clutter. While the problem of face detection was first studied using geometry-based measurements, learning-based algorithms have become increasingly popular in the image processing domain. SVM approaches to facial feature detection typically involve extracting features from images and learning the SVM parameters. Different algorithms have been proposed in the existing literature, and in this paper three approaches are presented. The first is based on a multiclass SVM scheme that analyzes the image using multiple two-class SVMs. The second addresses an edge case of face detection in which the face is not entirely visible. The last describes a system that detects faces by combining the SVM algorithm with another popular image-processing technique: PCA.
1 Introduction
Face recognition technology can be used in a wide range of applications such as identity authentication, access control, and surveillance and security. Interest and research activity in face recognition have increased significantly over the past few years, and SVM is a widely used technique for approaching the face detection problem.
However, building a face detection system has been a challenge, irrespective of the method used, due to the many ways a face image can vary. Variation occurs not only when the face identity changes, but also when the same face changes position or when the lighting conditions in the analyzed images are not constant. Because of this, all face detection systems to date compromise between two main features: performance and robustness.
Two issues are central in face detection. The first is what features to use to represent a face: a face image is subject to changes in viewpoint, illumination, and expression, and an effective representation should be able to deal with these changes. The second is how to classify a new face image using the chosen representation. In geometric feature-based methods, facial features such as eyes, nose, mouth, and chin are detected, and properties and relations between them, such as areas, distances, and angles, are used as the descriptors of faces. Although these methods are economical and efficient in achieving data reduction, and are insensitive to variations in illumination and viewpoint, they are highly dependent on the extraction and measurement of the detected features, so such systems need to be continuously recalibrated for new faces or for varying distances between the subject and the point where the analyzed images are taken.
In contrast, template matching and neural methods generally operate directly on an image-based representation of faces: the pixel intensity array. Because the detection and measurement of geometric facial features are not required, these methods are more practical and easier to implement than geometric feature-based methods.
2 General Information
The main idea of the SVM algorithm [2] is the following: given a set of points, each belonging to one of two classes, find an optimal way to separate the two classes by a hyperplane, as seen in the figure below. This is done by:
- maximizing the distance from the closest points of either class to the separating hyperplane;
- minimizing the risk of misclassifying the training samples and the unseen test samples.
Optimal Separating Hyperplane
Depending on the way the given points are separated into the two available classes, SVMs can be:
- Linear SVM
- Non-linear SVM
Figure: linearly separable data; non-linearly separable data; linearly and non-linearly separable data.
Linear SVMs
Let $S = \{x_1, \dots, x_N\}$ be a set of points with $x_i \in \mathbb{R}^d$. Each point $x_i$ belongs to either of two classes, with label $y_i \in \{-1, +1\}$. The set $S$ is linearly separable if there exist $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that

$$y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \dots, N.$$

The pair $(w, b)$ defines the hyperplane equation $w \cdot x + b = 0$, named the separating hyperplane.
The signed distance $d_i$ of a point $x_i$ to the separating hyperplane $(w, b)$ is given by:

$$d_i = \frac{w \cdot x_i + b}{\|w\|}.$$

From the two relations above it follows that:

$$y_i d_i \geq \frac{1}{\|w\|},$$

therefore $\frac{1}{\|w\|}$ is a lower bound on the distance between the points $x_i$ and the separating hyperplane $(w, b)$.
Given a linearly separable set $S$, the optimal separating hyperplane is the separating hyperplane for which the distance to the closest (either positive or negative) points in $S$ is maximum; therefore it maximizes $\frac{1}{\|w\|}$.
Optimal separating hyperplane
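The derivation above can be illustrated with a minimal sketch, assuming scikit-learn as the implementation (the paper names no library) and two synthetic separable point clouds: a linear SVM with a very large C approximates the hard-margin formulation, and the recovered $(w, b)$ satisfies the constraint $y_i(w \cdot x_i + b) \geq 1$.

```python
# Minimal sketch (illustrative, not the paper's system): fit a near
# hard-margin linear SVM on two separable clouds and recover the
# hyperplane (w, b) and the geometric margin 1/||w||.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2, 2], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2, -2], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

# A very large C approximates the hard-margin SVM for separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)   # distance from hyperplane to closest points

# Every training point satisfies y_i (w . x_i + b) >= 1 (up to tolerance).
functional_margins = y * (X @ w + b)
print("margin:", margin, "min y_i d_i * ||w||:", functional_margins.min())
```

For the support vectors the constraint holds with equality, which is exactly why maximizing the margin is equivalent to minimizing $\|w\|$.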
Non-linear SVMs
The only way the data points appear in the dual form of the training problem is in the form of dot products $x_i \cdot x_j$. Even if in the given space the points are not linearly separable, in a higher-dimensional space it is very likely that a linear separator can be constructed.
So the solution is to map the data points from the input space $\mathbb{R}^d$ into some space of higher dimension $\mathbb{R}^n$ ($n > d$) using a function $\Phi : \mathbb{R}^d \to \mathbb{R}^n$. The training algorithm will then depend only on dot products of the form $\Phi(x_i) \cdot \Phi(x_j)$.
Constructing (via $\Phi$) a separating hyperplane with maximum margin in the higher-dimensional space yields a nonlinear decision boundary in the input space.
Because this dot product is computationally expensive, kernel functions are used. A kernel function $K$ such that

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

is used in the training algorithm.
All the previous derivations in the linear SVM model remain valid when the dot product is replaced by the kernel function, since a linear separation is still performed, only in a different space.
The classes of kernel functions used in SVMs are:
- Polynomial: $K(x, y) = (x \cdot y + 1)^p$
- RBF (Radial Basis Function): $K(x, y) = \exp\left(-\|x - y\|^2 / 2\sigma^2\right)$
- Sigmoid: $K(x, y) = \tanh(\kappa \, x \cdot y - \delta)$
The kernel functions require calculations only in the input space $\mathbb{R}^d$, therefore they are not difficult to compute. It remains to determine which kernel function $K$ can be associated with a given (redescription space) function $\Phi$.
Decision surface by a polynomial classifier
However, in practice one proceeds the other way around: candidate kernel functions, known to correspond to a dot product in some space (which acts as the redescription space and is never made explicit), are tested. The user therefore operates by “trial and error”. An advantage is that the only parameter needed when training an SVM is the kernel function K.
Decision surface by an RBF classifier
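The kernel trick can be demonstrated with a small sketch, assuming scikit-learn and a toy two-concentric-circles dataset (both are illustrative assumptions, not from the paper): a linear SVM fails on this data, while an RBF kernel, which implicitly maps into a higher-dimensional space, separates it almost perfectly.

```python
# Illustrative sketch of the kernel trick: concentric circles are not
# linearly separable in the input plane, but become separable under an
# RBF kernel K(x, y) = exp(-gamma * ||x - y||^2).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # gamma=2.0 is an assumed value

print("linear accuracy:", linear_clf.score(X, y))  # well below the RBF score
print("RBF accuracy:", rbf_clf.score(X, y))
```

As the text notes, kernel choice is trial and error in practice: gamma here was simply picked to a plausible scale for unit-radius data.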
3
SVM in Face Detection
SVM with a binary tree classification strategy
The first analyzed approach proposes a face detection system using linear support vector machines with a binary tree classification strategy. The results of this technique are very good, as the authors conclude: “The experimental results show that the SVMs are a better learning algorithm than the nearest center approach for face recognition.” [1]
The General Information section above describes the basic theory of SVM for two-class classification. A multi-class pattern recognition system can be obtained by combining two-class SVMs. Usually there are two schemes for this purpose. One is the one-against-all strategy, classifying between each class and all the remaining ones; the other is the one-against-one strategy, classifying between each pair. While the former often leads to ambiguous classification, the latter was used for the presented face recognition system.
A bottom-up binary tree for classification is proposed to be constructed as follows: suppose there are eight classes in the data set; the decision tree is shown in the figure below, where the numbers 1-8 encode the classes. By comparison between each pair, one class number is chosen, representing the “winner” of the current two classes. The selected classes (from the lowest level of the binary tree) come to the upper level for another round of tests. Finally, a unique class appears at the top of the tree.
The bottom-up binary tree used for classification
Denoting the number of classes as $c$, the SVMs learn $c(c-1)/2$ discrimination functions in the training stage, and carry out $c - 1$ comparisons under the fixed binary tree structure. If $c$ is not a power of 2, we can decompose $c$ as

$$c = 2^{n_1} + 2^{n_2} + \dots + 2^{n_I}, \quad n_1 \geq n_2 \geq \dots \geq n_I \geq 0,$$

because any natural number (even or odd) can be decomposed into a finite sum of powers of 2. If $c$ is odd, $n_I = 0$, and if $c$ is even, $n_I > 0$. The decomposition is not unique, but the number of comparisons in the test stage is always $c - 1$.
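The bottom-up tournament described above can be sketched as follows, assuming scikit-learn linear SVMs and four synthetic Gaussian classes (the data and class count are illustrative assumptions; the paper uses faces): one two-class SVM is trained per pair, and winners advance level by level until a single class remains.

```python
# Hedged sketch of the one-against-one bottom-up tree: c(c-1)/2 pairwise
# SVMs are trained; at test time, winners of each pairwise comparison
# advance to the next round until one class is left (c - 1 comparisons).
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_classes, per_class = 4, 30
# Each class is a Gaussian blob around a distinct 2-D centre (toy data).
centres = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([rng.normal(c, 0.4, (per_class, 2)) for c in centres])
y = np.repeat(np.arange(n_classes), per_class)

# Train one two-class SVM for every pair of classes.
pair_svms = {}
for a, b in combinations(range(n_classes), 2):
    mask = (y == a) | (y == b)
    pair_svms[(a, b)] = SVC(kernel="linear").fit(X[mask], y[mask])

def tournament(x, classes):
    """Bottom-up pass: compare classes pairwise, keep the winners."""
    while len(classes) > 1:
        nxt = []
        for i in range(0, len(classes) - 1, 2):
            a, b = sorted((classes[i], classes[i + 1]))
            nxt.append(int(pair_svms[(a, b)].predict([x])[0]))
        if len(classes) % 2:        # odd count: last class gets a bye
            nxt.append(classes[-1])
        classes = nxt
    return classes[0]

print(tournament(X[0], list(range(n_classes))))   # class of the first sample
```

With four classes this performs exactly c - 1 = 3 comparisons per test sample, matching the count stated above.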
Face Detection and Recognition with Occlusions
The next approach analyzes a harder edge case, where the face is not entirely present in the analyzed image. The goal is to derive a criterion for SVM that can be employed in the three cases defined in the figure below: not occluded, mixed, and occluded. The classical SVM criteria cannot be applied to any of the three cases, because SVM assumes all the features are visible. So a new algorithm is implemented, named by the authors Partial Support Vector Machines (PSVM) to distinguish it from the standard criteria used in SVM. [3]
Occlusion cases taken into account
The goal of PSVM is similar to that of the standard SVM: to look for a hyperplane that separates the samples of any two classes as well as possible. In contrast with traditional SVM, in PSVM the separating hyperplane is also constrained by the incomplete data. In the proposed PSVM, the set of all possible values for the missing entries of an incomplete training sample is treated as an affine space in the feature space, and a criterion is designed that minimizes the probability of overlap between this affine space and the separating hyperplane. To model this, the angle between the affine space and the hyperplane is incorporated into the formulation. The resulting objective function is shown to have a global optimal solution under mild conditions, which require that the convex region defined by the derived criterion is close to the origin. Experimental results demonstrate that the proposed PSVM approach provides superior classification performance compared to methods defined in the literature.
PCA and SVM for Face Detection
In this paper [5], a novel method is proposed for eliminating most of the non-face area in gray images, so that the detection time is shortened and the detection accuracy is improved. The face area has different pixel characteristics from most of the non-face area. Analysis of histogram distributions shows that face and non-face areas have different histogram distributions: the histogram of a face area has a Gaussian-like distribution, while a non-face area histogram has an irregular distribution. According to this histogram distribution feature, the potential face area can be chosen.
If the histogram is distributed over a small range, its mean value is high, and if the histogram distribution covers a wide range, it has a small mean value. The histogram of a face image has a Gaussian-like distribution, so its mean value is an intermediate value. Through a number of tests, the histogram mean value of a potential face area was fixed to a given range. So if the mean value of a sample area falls in that range, the sample area is selected as a potential face area; otherwise, it is filtered out as non-face area.
Furthermore, for face detection an algorithm combining PCA and SVM is used; it is a coarse-to-fine process to detect the face region.
The processing consists of three steps:
Step 1: the potential face area is selected using the histogram distribution feature. Face and non-face areas have different histogram distributions: the histogram of a face area has a Gaussian-like distribution, but a non-face area histogram has an irregular distribution.
Step 2: PCA is used to decrease the dimension of the face feature space. At this step, 1000 sample images of size 19×19 are trained.
Step 3: SVM is used as the classifier to verify face candidates. It is trained with face and non-face samples which are represented by PCA.
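Steps 2 and 3 can be sketched as a scikit-learn pipeline. The 19×19 window size and the 1000-sample count follow the paper, but the data here is synthetic ("faces" brighter than "non-faces" on average) and the number of principal components is an assumed value.

```python
# Hedged sketch of the coarse-to-fine back end: PCA compresses
# 19x19 = 361-pixel windows, then an SVM verifies face candidates.
# The synthetic data and the 40-component choice are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, side = 1000, 19           # 1000 samples of 19x19, as in the paper
labels = rng.integers(0, 2, n_samples)        # 1 = face, 0 = non-face (synthetic)
# Synthetic stand-in: "face" windows are brighter on average.
windows = rng.normal(loc=100 + 60 * labels[:, None], scale=20.0,
                     size=(n_samples, side * side))

model = make_pipeline(PCA(n_components=40), SVC(kernel="rbf"))
model.fit(windows, labels)

print("training accuracy:", model.score(windows, labels))
```

In the real system, only windows that pass the Step 1 histogram filter would be fed to this pipeline, which is what shortens the detection time.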
4 Conclusions
Although SVM has the disadvantage of requiring a choice of kernel type, since it can be difficult to justify quantitatively the choice of a specific kernel for a specific application, its advantages have made SVM a preferred method in face detection applications. Real-time face detection applications have been built on SVM in which the image is pre-processed to emphasize the features, and no special illumination or headgear is needed. According to SVM's structural risk minimization criterion, the errors between sample data and model data are minimized and the upper bound on the model's prediction error is decreased simultaneously. The simulation results of this method show that higher processing speed, a better correct recognition rate, and improved generalization are obtained.
5 References
[1] Guodong Guo, Stan Z. Li, Kapluk Chan, “Face Recognition by Support Vector Machines”, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
[2] Ming-Hsuan Yang, Antoine Cornuejols, “Introduction to Support Vector Machines”
[3] Hongjun Jia, Aleix M. Martinez, “Support Vector Machines in Face Recognition with Occlusions”, The Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA
[5] Jing Zhang, Xue-dong Zhang, Seok-wun Ha, “A Novel Approach Using PCA and SVM for Face Detection”, Fourth International Conference on Natural Computation, 2008, IEEE