Support Vector Machines in Face Detection Systems


Oct 16, 2013




Group 245


Automatic facial feature localization has been a long-standing challenge in the field of
computer vision for several decades. This can be explained by the large variation a face
in an image can have due to factors such as position, facial expression, pose, illumination,
and background clutter. While the problem of face detection was first studied using
geometry-based measurements, learning-based algorithms are increasingly popular in the
image processing domain. SVM approaches to facial feature detection typically involve
extracting features from images and learning the SVM parameters. Different algorithms have
been proposed in the existing literature, and in this paper three approaches are presented.
The first is based on a multiclass SVM algorithm that analyzes the image using multiple
two-class SVMs. The second addresses an edge case of face detection in which the face is
not entirely visible. The last describes a system for detecting the face using the SVM
algorithm combined with another popular technique in image processing: PCA.



Face recognition technology can be used in a wide range of applications such as identity
authentication, access control, and surveillance and security. Interest and research
activity in face recognition have increased significantly over the past few years, and
SVM is a widely used technique for approaching the face detection problem.

However, building a face detection system has been a challenge, irrespective of the
method used, due to the many ways a face image can vary. Variation occurs not only when
the face identity changes but also when the same face changes position or the lighting
conditions in the analyzed images are not constant. Because of all this, all face detection
systems to date compromise between two main features: performance and

For face detection two issues are central. The first is what features to use to represent a
face. A face image is subject to changes in viewpoint, illumination, and expression, and an
effective representation should be able to deal with these changes. The second is how
to classify a new face image using the chosen representation. In geometric feature-based
methods, facial features such as eyes, nose, mouth, and chin are detected. Properties and
relations between the features, such as areas, distances, and angles, are used as the
descriptors of faces. Although economical and efficient in achieving data reduction
and insensitive to variations in illumination and viewpoint, these methods are highly
dependent on the extraction and measurement of the detected face, so the systems need to
be continuously recalibrated for new faces or for varying distances between the person and
the point where the analyzed images are taken.

In contrast, template matching and neural methods generally operate directly on an
image-based representation of faces, i.e., a pixel intensity array. Because the detection and
measurement of geometric facial features are not required, these methods are more
practical and easier to implement than geometric feature-based methods.


General Information

The main idea of the SVM algorithm [2] is that, given a set of points which each belong to
one of two classes, an optimal way is needed to separate the two classes by a hyperplane,
as seen in the figure below. This is done by:

- maximizing the distance from the closest points of either class to the
separating hyperplane;
- minimizing the risk of misclassifying the training samples and the unseen
test samples.

Optimal Separating Hyperplane

Depending on the way the given points are separated into the two available classes, the
SVMs can be:

- Linear SVM
- Non-linear SVM

Linearly and non-linearly separable data (panels: linearly separable data; non-linearly
separable data)

Linear SVMs

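Let $S$ be a set of points $x_i \in \mathbb{R}^d$, $i = 1, \dots, m$. Each point $x_i$
belongs to either of two classes, with label $y_i \in \{-1, +1\}$. The set $S$ is linearly
separable if there are $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that

$$y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, m \qquad (4.23)$$

The pair $(w, b)$ defines the hyperplane equation $w \cdot x + b = 0$, named the separating
hyperplane.

The signed distance $d_i$ of a point $x_i$ to the separating hyperplane $(w, b)$ is given by:

$$d_i = \frac{w \cdot x_i + b}{\|w\|} \qquad (4.24)$$

From (4.23) and (4.24) it follows that:

$$y_i d_i \ge \frac{1}{\|w\|}$$

Therefore $1 / \|w\|$ is the lower bound on the distance between the points $x_i$ and the
separating hyperplane $(w, b)$.

Given a linearly separable set $S$, the optimal separating hyperplane is the separating
hyperplane for which the distance to the closest (either positive or negative)
points in $S$ is maximum; therefore it maximizes $1 / \|w\|$.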

Optimal separating hyperplane
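
The separability condition and the distance bound can be checked numerically. The following is a minimal sketch, assuming the hyperplane is written as w·x + b = 0; the toy points, the labels, the two candidate hyperplanes and the helper names are illustrative assumptions and do not come from [2].

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def signed_distance(w, b, x):
    """Signed distance of point x to the hyperplane w.x + b = 0."""
    return (dot(w, x) + b) / math.sqrt(dot(w, w))

def is_separating(w, b, points, labels):
    """Check the constraint y_i * (w.x_i + b) >= 1 for every sample."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(points, labels))

def margin(w, b, points, labels):
    """Smallest label-signed distance from any sample to the hyperplane."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Hypothetical toy data: two classes on either side of the line x1 = 0.
points = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]

# Two descriptions of the same geometric hyperplane with different scaling.
for w, b in (((1.0, 0.0), 0.0), ((2.0, 0.0), 0.0)):
    bound = 1.0 / math.sqrt(dot(w, w))
    print(w, is_separating(w, b, points, labels),
          "margin:", margin(w, b, points, labels), "lower bound:", bound)
```

Both candidates satisfy the constraint, but only the one with the smaller norm makes the lower bound 1/||w|| tight, which is why maximizing 1/||w|| singles out the optimal hyperplane.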

Non-linear SVMs

The only way the data points appear in the dual form of the training problem is in the
form of dot products $x_i \cdot x_j$. Even if the points are not linearly separable in the
given space, it is very likely that a linear separator can be found in a higher-dimensional
space.

So the solution is to map the data points from the input space $\mathbb{R}^d$ into some
space of higher dimension $\mathbb{R}^n$ ($n > d$) using a function
$\Phi: \mathbb{R}^d \to \mathbb{R}^n$. Then the training algorithm
will depend only on dot products of the form $\Phi(x_i) \cdot \Phi(x_j)$.

Constructing (via $\Phi$) a separating hyperplane with maximum margin in the
higher-dimensional space yields a nonlinear decision boundary in the input space.

Because the dot product in the higher-dimensional space is computationally expensive,
kernel functions are used. A kernel function $K$ such that

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

is used in the training algorithm.

All the previous derivations in the model of linear SVM remain valid when the dot product
is replaced with the kernel function, since a linear separation is still done, but in a
different space.
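
As a small worked example of a kernel standing in for a dot product in a higher-dimensional space, the sketch below assumes the quadratic kernel K(x, y) = (x·y)^2 on two-dimensional inputs, for which an explicit three-dimensional map is known. The map `phi`, the kernel choice and the sample vectors are illustrative assumptions, not taken from [2].

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit map from R^2 to R^3 whose dot product equals (x.y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, y):
    """The same quantity, computed without leaving the input space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))      # dot product in the mapped space
implicit = quadratic_kernel(x, y)   # kernel evaluated on the raw inputs
print(explicit, implicit)
```

The two values agree up to floating-point rounding, so a training algorithm that only needs dot products never has to evaluate `phi` itself.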

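The classes of kernel functions used in SVM are:

- Polynomial: $K(x, y) = (x \cdot y + 1)^p$
- RBF (Radial Basis Function): $K(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$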

The kernel functions require calculations only in the input space $\mathbb{R}^d$, therefore
they are not difficult to compute. It remains to determine which kernel function $K$ can be
associated with a given (redescription space) function $\Phi$.

Decision surface by a polynomial classifier
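
To give a feel for why computing in the input space matters, the sketch below counts the dimension of the implicit feature space, assuming a polynomial kernel of the form (x·y + 1)^p, whose feature space is spanned by all monomials of degree at most p. The function name is hypothetical; the 19×19 window size is the sample size quoted later in the PCA and SVM approach.

```python
from math import comb

def feature_space_dim(d, p):
    """Number of monomials of degree <= p in d variables, i.e. C(d + p, p)."""
    return comb(d + p, p)

d = 19 * 19  # one value per pixel of a 19x19 window
for p in (1, 2, 3):
    # Evaluating the kernel costs about d multiplications, whatever p is,
    # while an explicit map would need this many coordinates.
    print("degree", p, "-> explicit dimension", feature_space_dim(d, p))
```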

However, in practice, one proceeds the other way around: kernel functions are tried which
are known to correspond to the dot product in a certain space (which will work as the
redescription space, never made explicit). Therefore, the user operates by “trial and
error”. An advantage is that the only parameter needed when training an SVM is the
kernel function K.

Decision surface by a RBF classifier
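
The trial-and-error choice of a kernel can be illustrated on a tiny data set. To stay short and dependency-free, the sketch below swaps the SVM for a kernel perceptron, which is a different and simpler learner that also touches the data only through the kernel. The XOR data, the two candidate kernels, the value of `gamma` and all function names are assumptions made for illustration.

```python
import math

def linear_kernel(x, y):
    """Dot product plus a constant, so the separator may have an offset."""
    return sum(a * b for a, b in zip(x, y)) + 1.0

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision_value(x, points, labels, alphas, kernel):
    """Kernel expansion sum_j alpha_j * y_j * K(x_j, x)."""
    return sum(a * y_j * kernel(x_j, x)
               for a, x_j, y_j in zip(alphas, points, labels))

def train_kernel_perceptron(points, labels, kernel, max_epochs=100):
    """Increase alpha_i each time sample i is misclassified."""
    alphas = [0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision_value(x, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return alphas

def training_errors(points, labels, kernel):
    """Train with the given kernel and count misclassified training samples."""
    alphas = train_kernel_perceptron(points, labels, kernel)
    return sum(1 for x, y in zip(points, labels)
               if y * decision_value(x, points, labels, alphas, kernel) <= 0)

# XOR-like data: not linearly separable in the input space.
xor_points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
xor_labels = [-1, -1, 1, 1]

for name, kernel in (("linear", linear_kernel), ("rbf", rbf_kernel)):
    print(name, "training errors:",
          training_errors(xor_points, xor_labels, kernel))
```

On this data the linear kernel always leaves at least one training error, while the RBF kernel separates all four points, which is the kind of comparison a practitioner would run when picking K.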


SVM in Face Detection

SVM with a binary tree classification strategy

The first analyzed approach proposes a face detection system using linear support vector
machines with a binary tree classification strategy. The results of this technique are very
good, as the authors conclude: “The experimental results show that the SVMs are a better
learning algorithm than the nearest center approach for face recognition.” [1]

The General Information subsection describes the basic theory of SVM for two-class
classification. A multi-class pattern recognition system can be obtained by combining
two-class SVMs. Usually there are two schemes for this purpose. One is the one-against-all
strategy, which classifies between each class and all the remaining ones; the other is the
one-against-one strategy, which classifies between each pair of classes. While the former
often leads to ambiguous classification, the latter was used for the presented face
recognition system.

A bottom-up binary tree for classification is constructed as follows. Suppose there are
eight classes in the data set; the decision tree is shown in the figure below, where the
numbers 1-8 encode the classes. By comparison between each pair, one class number is chosen
to represent the “winner” of the current two classes. The selected classes (from the lowest
level of the binary tree) move to the upper level for another round of tests. Finally, a
unique class appears at the top of the tree.

The bottom-up binary tree used for classification
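
The bottom-up procedure can be sketched as a knockout tournament. In the sketch below the pairwise decision is a hypothetical stand-in for a trained two-class SVM, and letting an unpaired class advance unchanged is an assumption made here for class counts that are not a power of 2; neither detail is taken from [1].

```python
def tournament(classes, pairwise_winner):
    """Run the bottom-up binary tree; return (winning class, comparisons)."""
    level = list(classes)
    comparisons = 0
    while len(level) > 1:
        next_level = []
        # Compare neighbours pair by pair; each comparison is one SVM test.
        for k in range(0, len(level) - 1, 2):
            next_level.append(pairwise_winner(level[k], level[k + 1]))
            comparisons += 1
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired class moves up unchanged
        level = next_level
    return level[0], comparisons

def make_pairwise_winner(true_class):
    """Stand-in for a two-class SVM: prefers the label closer to true_class."""
    def pairwise_winner(a, b):
        return a if abs(a - true_class) <= abs(b - true_class) else b
    return pairwise_winner

winner, comparisons = tournament(range(1, 9), make_pairwise_winner(6))
print("winner:", winner, "comparisons:", comparisons)
```

With the eight classes of the example, three rounds and seven comparisons lead to the class at the top of the tree.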

Denote the number of classes as $c$. The SVMs learn $c(c-1)/2$ discrimination functions in
the training stage, and carry out $c - 1$ comparisons under the fixed binary tree
structure. If $c$ is not a power of 2, we can decompose $c$ as
$c = 2^{n_1} + 2^{n_2} + \dots + 2^{n_I}$, where $n_1 \ge n_2 \ge \dots \ge n_I$, because any
natural number (even or odd) can be decomposed into finitely many positive integers which
are powers of 2. If $c$ is odd, $n_I = 0$, and if $c$ is even, $n_I > 0$. It can be noticed
that the decomposition is not unique, but the number of comparisons in the test stage is
always $c - 1$.
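
The counting argument can be verified with a few lines of code. The sketch below uses the binary representation of c as one possible decomposition (the text notes it is not unique) and assumes that the winners of the individual power-of-2 subtrees are merged with one comparison each; the function names are hypothetical.

```python
def power_of_two_exponents(c):
    """Exponents n_1 > n_2 > ... > n_I with c = sum(2**n), from the bits of c."""
    return [n for n in range(c.bit_length() - 1, -1, -1) if (c >> n) & 1]

def count_discrimination_functions(c):
    """One two-class SVM per pair of classes."""
    return c * (c - 1) // 2

def count_comparisons(c):
    """Comparisons needed at test time under the decomposed tree."""
    exponents = power_of_two_exponents(c)
    # A complete binary tree with 2**n leaves needs 2**n - 1 comparisons.
    within_subtrees = sum(2 ** n - 1 for n in exponents)
    # The subtree winners are then merged with one comparison per merge.
    between_subtrees = len(exponents) - 1
    return within_subtrees + between_subtrees

for c in (8, 13, 40):
    print(c, power_of_two_exponents(c),
          count_discrimination_functions(c), count_comparisons(c))
```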

Face Detection and Recognition with Occlusions

The next approach analyzes a further edge case, where the face is not entirely visible in
the analyzed image. The aim is to derive a criterion for SVM that can be employed in
the three cases defined in the figure below: not occluded, mixed, and occluded. The
classical SVM criterion cannot be applied to any of the three cases, because SVM
assumes all the features are visible. A new algorithm is therefore introduced, named by the
authors Partial Support Vector Machines (PSVM) to distinguish it from the standard
criterion used in SVM. [3]

Occlusion cases taken into account

The goal of PSVM is similar to that of the standard SVM: to look for a hyperplane that
separates the samples of any two classes as much as possible. In contrast with traditional
SVM, in PSVM the separating hyperplane is also constrained by the incomplete
data. In the proposed PSVM, the set of all possible values for the missing entries of an
incomplete training sample is treated as an affine space in the feature space, and a
criterion is designed which minimizes the probability of overlap between this affine space
and the separating hyperplane. To model this, the angle between the affine space and
the hyperplane is incorporated in the formulation. The resulting objective function is
shown to have a global optimal solution under mild conditions, which require that the
convex region defined by the derived criterion is close to the origin. Experimental results
demonstrate that the proposed PSVM approach provides classification performance superior
to that of the methods defined in the literature.
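
The geometric picture behind the overlap criterion can be illustrated with a few lines of code. This is only a sketch of the geometry, not the PSVM optimization of [3]: it assumes a hyperplane w·x + b = 0 and a sample whose missing features can take any real value, and both function names are hypothetical.

```python
import math

def missing_direction_ratio(w, missing):
    """||w restricted to the missing coordinates|| / ||w||.

    0 means the affine space of possible completions is parallel to the
    hyperplane; values near 1 mean it cuts through the hyperplane steeply.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_missing = math.sqrt(sum(w[j] ** 2 for j in missing))
    return norm_missing / norm_w

def affine_space_meets_hyperplane(w, b, observed, missing):
    """observed maps feature index -> visible value; missing lists the rest."""
    if any(w[j] != 0.0 for j in missing):
        # Some completion of the missing entries lies exactly on the hyperplane.
        return True
    # Parallel case: they meet only if the visible part already lies on it.
    return sum(w[j] * v for j, v in observed.items()) + b == 0.0

w, b = (3.0, 0.0, 4.0), -1.0
print(affine_space_meets_hyperplane(w, b, {0: 1.0, 2: 2.0}, [1]),
      missing_direction_ratio(w, [1]))
print(affine_space_meets_hyperplane(w, b, {0: 1.0, 1: 5.0}, [2]),
      missing_direction_ratio(w, [2]))
```

In the first case the occluded feature has zero weight, so no completion of the sample can cross the hyperplane; in the second case some completion always does, which is the situation an angle-based criterion tries to make unlikely.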

PCA and SVM for Face Detection

In this paper [5], a novel method is proposed for eliminating most of the non-face area in
gray images, so that the detection time is shortened and the detection accuracy is
improved. The face area has different pixel characteristics from most of the non-face area.
Analysis of the histogram distributions shows that face and non-face areas have different
histogram distributions: the histogram of face areas has a Gaussian-like distribution,
while the histogram of non-face areas has an irregular distribution. According to this
histogram distribution feature, the potential face area can be chosen.

If the histogram is distributed in a small range, its mean value is high, and if the
histogram is distributed in a wide range, its mean value is small.

The histogram of a face image has a Gaussian-like distribution, and its mean value is an
intermediate value. Through a number of tests, the histogram mean value of potential face
areas is found to lie in a fixed range. So if the mean value of a sample area falls in
that range, the sample area is selected as a potential face area. Otherwise, it is filtered
out as non-face area.
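
The mean-value filter can be sketched in a few lines. The source only states that the range was fixed through tests, so the bounds `low` and `high` below are hypothetical placeholders, as are the function names and the synthetic windows.

```python
def mean_gray_level(window):
    """Mean of the gray-level histogram, i.e. the average pixel value."""
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels)

def is_face_candidate(window, low=80.0, high=170.0):
    """Keep a window only if its mean gray level falls in [low, high].

    low and high are illustrative values, not the range used in [5].
    """
    return low <= mean_gray_level(window) <= high

# Synthetic 19x19 windows with constant gray levels (0-255 scale).
dark = [[20] * 19 for _ in range(19)]
mid = [[120] * 19 for _ in range(19)]
bright = [[240] * 19 for _ in range(19)]

for name, window in (("dark", dark), ("mid", mid), ("bright", bright)):
    print(name, mean_gray_level(window), is_face_candidate(window))
```

Only windows that pass this cheap test would be handed to the more expensive PCA and SVM stage.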

Furthermore, for face detection, an algorithm combining PCA and SVM is used as a
fine process to detect the face region.

The processing consists of three steps:

Step 1: The potential face area is selected using the histogram distribution feature. Face
and non-face areas have different histogram distributions: the histogram of face areas
has a Gaussian-like distribution, while the histogram of non-face areas has an irregular
distribution.

Step 2: PCA is used to decrease the dimension of the face feature space. At this step,
1000 sample images of size 19×19 are used for training.

Step 3: SVM is used as the classifier to verify face candidates. It is trained on face and
non-face samples which are represented by PCA.
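
Step 2 can be sketched without external libraries. The code below estimates only the first principal component by power iteration on the covariance matrix and projects samples onto it; the use of power iteration, the restriction to one component, the function names and the two-pixel toy "images" are all assumptions made to keep the example short, not details from [5].

```python
import math

def mean_vector(samples):
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def covariance_matrix(samples):
    """Covariance of the mean-centered samples (divided by n)."""
    mu = mean_vector(samples)
    centered = [[v - m for v, m in zip(s, mu)] for s in samples]
    dim, n = len(mu), len(samples)
    return [[sum(row[i] * row[j] for row in centered) / n
             for j in range(dim)] for i in range(dim)]

def first_principal_component(samples, iterations=100):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    cov = covariance_matrix(samples)
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            break
        v = [x / norm for x in w]
    return v

def project(sample, mu, component):
    """Coordinate of a sample along the component, after centering."""
    return sum((s - m) * c for s, m, c in zip(sample, mu, component))

# Toy two-pixel "images" that vary along a single direction.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0),
           (-1.0, -2.0), (-2.0, -4.0), (-3.0, -6.0)]
mu = mean_vector(samples)
pc = first_principal_component(samples)
print(pc, [round(project(s, mu, pc), 3) for s in samples])
```

In a pipeline like the one described, the projected coordinates rather than the raw 361 pixel values would be the input on which the classifier of Step 3 is trained.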



Although the SVM has the disadvantage of requiring a choice of kernel type, since it can be
difficult to justify quantitatively the choice of a specific kernel for a specific
application, its advantages have made SVM a preferred method in face detection
applications. Real-time applications for face detection have been built on SVM, in
which the image is pre-processed to emphasize the features and no special illumination
or headgear is needed. According to the structural risk minimization criterion of SVM,
the errors between sample data and model data are minimized and the upper bound on the
prediction error of the model is decreased simultaneously. The simulation results of this
method show that higher processing speed, a better correct recognition rate, and improved
generalization are obtained.



[1] Guodong Guo, Stan Z. Li, Kapluk Chan, “Face Recognition by Support Vector
Machines”, School of Electrical and Electronic Engineering, Nanyang Technological
University, Singapore 639798

[2] Ming-Hsuan Yang, Antoine Cornuejols, “Introduction to Support Vector Machines”

[3] Hongjun Jia, Aleix M. Martinez, “Support Vector Machines in Face Recognition with
Occlusions”, The Department of Electrical and Computer Engineering, The Ohio State
University, Columbus, OH 43210, USA

[5] Jing Zhang, Xue-dong Zhang, Seok-wun Ha, “A Novel Approach Using PCA and
SVM for Face Detection”, Fourth International Conference on Natural Computation, 2008