
Oct 19, 2013

ADVANCED IMAGE PROCESSING / COMPUTER VISION
LECTURE 1: FACE DETECTION
WHAT? WHY? AND HOW?


Kamal Nasrollahi

Laboratory of Computer Vision and Media Technology (CVMT)

PLAN FOR THE COURSE









Mini Projects


WHAT IS FACE DETECTION?

YES

IMPORTANCE OF FACE DETECTION

The first step for any automatic face recognition system

First step in many Human Computer Interaction systems

Expression recognition

Cognitive state / emotional state recognition

First step in many surveillance systems

Tracking

Automatic Target Recognition (ATR)

Video coding ...

WHY IS THIS STILL CHALLENGING?

FACE DETECTION: CURRENT STATE

State of the art:

Front-view face detection can be done at >15 frames per second on 320x240 black-and-white images on a 700 MHz PC with ~95% accuracy.

Detection of faces is faster than detection of edges!

Side-view face detection remains difficult.

DIFFERENT APPROACHES 1

DIFFERENT APPROACHES 2

Knowledge-based methods:
  Encode what constitutes a typical face, e.g., the relationship between facial features

Feature invariant approaches:
  Aim to find structural features of a face that exist even when pose, viewpoint or lighting conditions vary

Template matching:
  Several standard patterns are stored to describe the face as a whole or the facial features separately

Appearance-based methods:
  Models are learned from a set of training images that capture the representative variability of faces


KNOWLEDGE-BASED METHODS

Top-down approach: Represent a face using a set of human-coded rules

Example:
  The center part of the face has uniform intensity values
  The difference between the average intensity values of the center part and the upper part is significant
  A face often appears with two eyes that are symmetric to each other, a nose and a mouth

Use these rules to guide the search process
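The first two example rules can be sketched as a test on a grayscale candidate window. This is only a minimal illustration, not the method of any particular paper: the split of the window into an upper third and a central block, and both thresholds (`max_center_std`, `min_mean_gap`), are assumptions chosen for the toy data.

```python
from statistics import mean, pstdev

def looks_like_face(window, max_center_std=12.0, min_mean_gap=30.0):
    """Apply two hand-coded rules to a grayscale window (list of rows).

    Rule 1: the center part has roughly uniform intensity.
    Rule 2: the center and upper parts differ clearly in average intensity.
    The region layout and both thresholds are illustrative assumptions.
    """
    h, w = len(window), len(window[0])
    # Upper part: top third of the rows.
    upper = [p for row in window[: h // 3] for p in row]
    # Center part: middle third of the rows, middle half of the columns.
    center = [p for row in window[h // 3 : 2 * h // 3]
              for p in row[w // 4 : 3 * w // 4]]
    uniform_center = pstdev(center) <= max_center_std
    clear_gap = abs(mean(center) - mean(upper)) >= min_mean_gap
    return uniform_center and clear_gap

# Toy 6x8 windows: a dark upper band over a bright uniform center, and a flat one.
face_like = [[40] * 8] * 2 + [[200] * 8] * 2 + [[120] * 8] * 2
flat = [[100] * 8] * 6
print(looks_like_face(face_like), looks_like_face(flat))
```

Tightening the thresholds makes the rules miss real faces, while loosening them admits more false positives, which is the trade-off such hand-written rules run into.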

Pros:
  Easy to come up with simple rules
  Based on the coded rules, facial features in an input image are extracted first, and face candidates are identified
  Work well for face localization in uncluttered background

Cons:
  Difficult to translate human knowledge into rules precisely: detailed rules fail to detect faces and general rules may find many false positives
  Difficult to extend this approach to detect faces in different poses: implausible to enumerate all the possible cases

FEATURE-BASED METHODS

Bottom-up approach: Detect facial features (eyes, nose, mouth, etc.) first

Facial features: edge, intensity, shape, texture, color, etc.

Aim to detect invariant features

Group features into candidates and verify them

From: Finding Faces in Cluttered Scenes Using Random Labeled Graph Matching (1995) by T. K. Leung, M. C. Burl, P. Perona

Pros: Features are invariant to pose and orientation change

Cons:
  Difficult to locate facial features due to several kinds of corruption (illumination, noise, occlusion)
  Difficult to detect features in complex backgrounds


TEMPLATE MATCHING METHODS

Store a template
  Predefined: based on edges or regions
  Deformable: based on facial contours (e.g., Snakes)

Templates are hand-coded (not learned)

Use correlation to locate faces
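Locating a face by correlation can be sketched as sliding a fixed template over the image and keeping the position with the highest normalized cross-correlation. The slides do not say which correlation measure is used, so the normalized form here is an assumption, and the tiny `template` and `image` are made-up toy data.

```python
from math import sqrt

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized grayscale patches."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no pattern

def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)  # NCC is never below -1
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            best = max(best, (ncc(patch, template), r, c))
    return best

template = [[10, 200],
            [200, 10]]
image = [[50, 50, 50, 50],
         [50, 50, 10, 200],
         [50, 50, 200, 10],
         [50, 50, 50, 50]]
score, row, col = best_match(image, template)
print(score, row, col)
```

The search only covers one scale and one orientation, which is why a separate template (or a rescaled image) is needed for every pose and size.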


From: Finding face features (1992) by Ian Craw, David Tock and Alan Bennett

TEMPLATE-BASED METHODS

Pros:
  Simple

Cons:
  Templates need to be initialized near the face images
  Difficult to enumerate templates for different poses (similar to knowledge-based methods)

APPEARANCE-BASED METHODS: CLASSIFIERS

Neural network

Multilayer perceptrons

Principal Component Analysis (PCA)

Support vector machine (SVM)

Distribution-based method

Hidden Markov model

AdaBoost

...


APPEARANCE-BASED METHODS: NEURAL NETWORK

From: Neural Network-Based Face Detection (1998), by H. Rowley, S. Baluja, and T. Kanade


APPEARANCE-BASED METHODS

Pros:
  Use powerful machine learning algorithms
  Have demonstrated good empirical results
  Fast and fairly robust
  Extended to detect faces in different poses and orientations

Cons:
  Usually need to search over space and scale
  Need lots of positive and negative examples
  Limited view-based approach

HOW TO


TWO METHODS

Color based

Viola & Jones

COLOR BASED FACE DETECTION

Skin color modeling


Probability image, segmentation

Feature extraction
  Number of holes inside the region
  Height to width ratio
  Cross correlation with a face template
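The first two region features can be sketched on a binary skin mask. This is a minimal sketch under assumptions the slides do not state: the mask holds a single segmented region, a hole is a group of 4-connected background pixels that does not reach the border of the region's bounding box, and the function name `region_features` is hypothetical.

```python
def region_features(mask):
    """Return (height/width ratio of the bounding box, number of holes).

    `mask` is a binary image (list of rows of 0/1) holding one skin region.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    box = [row[left:right + 1] for row in mask[top:bottom + 1]]
    h, w = len(box), len(box[0])

    seen = [[False] * w for _ in range(h)]
    holes = 0
    for r0 in range(h):
        for c0 in range(w):
            if box[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one background component; note whether it reaches the box border.
            stack, touches_border = [(r0, c0)], False
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                if r in (0, h - 1) or c in (0, w - 1):
                    touches_border = True
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and not box[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            if not touches_border:
                holes += 1
    return h / w, holes

# Toy region: a 6x5 blob with three enclosed gaps (two "eyes" and a "mouth").
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
ratio, holes = region_features(mask)
print(ratio, holes)
```

A skin region with a plausible aspect ratio and a few holes (eyes and mouth are usually not skin-colored) is a better face candidate than a solid blob such as a hand or an arm.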

Demo

COLOR-BASED FACE DETECTOR

Pros:
  Easy to implement
  Effective and efficient in constrained environments
  Insensitive to pose, expression, rotation variation

Cons:
  Sensitive to environment and lighting change
  Noisy detection results (body parts, skin-tone line regions)

THE VIOLA & JONES FACE DETECTOR

A real-time approach for object detection

Training is slow, but detection is very fast

Key ideas
  Integral images for fast feature evaluation
  Boosting for feature selection
  Attentional cascade for fast rejection of non-face windows

IMAGE FEATURES

“Rectangle filters”


EXAMPLE

[Figure: Source image and filter Result]

FAST COMPUTATION WITH INTEGRAL IMAGES

[Figure: integral image value at (x, y), computed in one pass]

COMPUTING THE INTEGRAL IMAGE

Cumulative row sum: $s(x, y) = s(x-1, y) + i(x, y)$

Integral image: $ii(x, y) = ii(x, y-1) + s(x, y)$

[Figure: positions of $ii(x, y-1)$, $s(x-1, y)$ and $i(x, y)$]

MATLAB: ii = cumsum(cumsum(double(i)), 2);
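The two recurrences translate directly into a single pass over the image. The sketch below is a plain-Python counterpart of the MATLAB one-liner, assuming the image is a list of rows so that pixel $i(x, y)$ is `img[y][x]`; the function name `integral_image` is chosen here for illustration.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over all rows <= y and columns <= x."""
    ii = []
    for y, row in enumerate(img):
        s = 0  # cumulative row sum s(x, y)
        out = []
        for x, pixel in enumerate(row):
            s += pixel                             # s(x, y) = s(x-1, y) + i(x, y)
            above = ii[y - 1][x] if y > 0 else 0   # ii(x, y-1), zero above the first row
            out.append(above + s)                  # ii(x, y) = ii(x, y-1) + s(x, y)
        ii.append(out)
    return ii

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii)  # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
```

The bottom-right entry equals the sum of the whole image, which is a quick sanity check for any implementation.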

COMPUTING SUM WITHIN A RECTANGLE

Let A, B, C, D be the values of the integral image at the corners of a rectangle

Then the sum of original image values within the rectangle can be computed as:

  $\text{sum} = A - B - C + D$

Only 3 additions are required for any size of rectangle!

[Figure: rectangle with corners labeled D, B, C, A]
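A short sketch of the four-corner lookup follows. It assumes a zero-padded integral image (one extra row and column of zeros) so that rectangles touching the image border need no special case, and it assumes A is the bottom-right corner and D the top-left one, with B and C the other two; the helper names are illustrative.

```python
def padded_integral(img):
    """Integral image with a leading zero row and column.

    P[y][x] is the sum of all pixels in rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    P = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            P[y + 1][x + 1] = img[y][x] + P[y][x + 1] + P[y + 1][x] - P[y][x]
    return P

def rect_sum(P, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive)."""
    A = P[bottom + 1][right + 1]  # bottom-right corner
    B = P[top][right + 1]         # top-right corner
    C = P[bottom + 1][left]       # bottom-left corner
    D = P[top][left]              # top-left corner
    return A - B - C + D

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
P = padded_integral(img)
print(rect_sum(P, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The cost of `rect_sum` is the same four lookups whether the rectangle covers four pixels or the whole image.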

EXAMPLE

[Figure: integral image with corner weights -1, +1, +2, -1, -2, +1 for a two-rectangle feature]

Value = ∑ (pixels in white area) − ∑ (pixels in black area)
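For two side-by-side rectangles, subtracting the two four-corner sums collapses into six lookups with weights ±1 and ±2, which is where the corner weights in the figure come from. The sketch assumes a white left half and a black right half of equal size and a zero-padded integral image; names are illustrative.

```python
def padded_integral(img):
    """P[y][x] = sum of all pixels in rows < y and columns < x."""
    h, w = len(img), len(img[0])
    P = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            P[y + 1][x + 1] = img[y][x] + P[y][x + 1] + P[y + 1][x] - P[y][x]
    return P

def two_rect_feature(P, top, left, height, width):
    """White (left) minus black (right) for two adjacent height x width rectangles."""
    t, b = top, top + height
    l, m, r = left, left + width, left + 2 * width
    # The shared middle corners are counted twice, hence the weights of 2.
    return (P[t][l] - 2 * P[t][m] + P[t][r]
            - P[b][l] + 2 * P[b][m] - P[b][r])

img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
P = padded_integral(img)
value = two_rect_feature(P, 0, 0, 2, 2)
print(value)  # white sum 36 minus black sum 4 = 32
```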


FEATURE SELECTION

For a 24x24 detection region, the number of possible rectangle features is ~160,000!



At test time, it is impractical to evaluate the entire feature set

Can we create a good classifier using just a small subset of all possible features?

How to select such a subset?

BOOSTING

Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier

A weak learner need only do better than chance

Training consists of multiple boosting rounds

During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners

“Hardness” is captured by weights attached to training examples

TRAINING PROCEDURE

Initially, weight each training example equally

In each boosting round:
  Find the weak learner that achieves the lowest weighted training error
  Raise the weights of training examples misclassified by the current weak learner

Compute the final classifier as a linear combination of all weak learners (the weight of each learner grows with its accuracy)

Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost)
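As one concrete instance of this procedure, here is a minimal sketch of discrete AdaBoost with threshold stumps as weak learners. The learner weight $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$ and the exponential re-weighting are the standard AdaBoost formulas; the stump form, the function names and the toy data are assumptions for illustration.

```python
from math import exp, log

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with threshold stumps. X: feature vectors, y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                 # initially, weight each example equally
    ensemble = []                     # (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    pred = [pol if x[f] >= thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, pred)
        err, f, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * log((1 - err) / err)      # lower error gives a larger weight
        ensemble.append((alpha, f, thr, pol))
        # Raise weights of misclassified examples, lower the others, renormalize.
        w = [wi * exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * (pol if x[f] >= thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# One feature, labels + + - - + +: no single threshold separates the classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

On this toy set one stump always misclassifies two examples, while the weighted vote of three stumps labels all six correctly.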

BOOSTING ILLUSTRATION

Initialize sample weights

For each cycle:
  Find a classifier that performs well on the weighted sample
  Increase weights of misclassified examples

Return a weighted list of classifiers

[Figure sequence: Weak Classifier 1; Weights Increased; Weak Classifier 2; Weights Increased; Weak Classifier 3]

Final classifier is a combination of weak classifiers


BOOSTING FOR FACE DETECTION

Define weak learners based on rectangle features

For each round of boosting:
  Evaluate each rectangle filter on each example
  Select best threshold for each filter
  Select best filter/threshold combination
  Reweight examples

Computational complexity of learning: $O(MNK)$
  $M$ rounds, $N$ examples, $K$ features
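Selecting the best threshold for a single filter can be done in one scan over the examples sorted by filter response, keeping running totals of the positive and negative weight seen so far. The sketch below is one common way to do it, not necessarily the exact procedure behind the slides; it assumes distinct filter values, labels in {+1, -1}, and a polarity convention defined in the docstring.

```python
def best_threshold(values, labels, weights):
    """Return (weighted error, threshold, polarity) for one rectangle filter.

    polarity +1 means "face if value <  threshold",
    polarity -1 means "face if value >= threshold".
    Assumes the filter values are distinct.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total_pos = sum(w for w, l in zip(weights, labels) if l == 1)
    total_neg = sum(w for w, l in zip(weights, labels) if l != 1)
    below_pos = below_neg = 0.0   # weight of faces / non-faces below the threshold
    best = (float("inf"), None, None)
    for i in order:
        thr = values[i]
        # Face below threshold: errors are non-faces below plus faces at or above.
        err_face_below = below_neg + (total_pos - below_pos)
        # Face at or above threshold: errors are faces below plus non-faces at or above.
        err_face_above = below_pos + (total_neg - below_neg)
        best = min(best, (err_face_below, thr, 1), (err_face_above, thr, -1))
        if labels[i] == 1:
            below_pos += weights[i]
        else:
            below_neg += weights[i]
    return best

values = [0.2, 0.9, 0.4, 0.7]      # filter responses on four examples
labels = [1, -1, 1, -1]            # +1 face, -1 non-face
weights = [0.25, 0.25, 0.25, 0.25]
print(best_threshold(values, labels, weights))  # (0.0, 0.7, 1)
```

Running this for each of the $K$ filters and keeping the lowest error gives the filter/threshold combination for the round.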

First two features selected by boosting: [Figure]

This feature combination can yield 100% detection rate and 50% false positive rate

A 200-feature classifier can yield 95% detection rate and a false positive rate of 1 in 14084

Not good enough!

[Figure: receiver operating characteristic (ROC) curve]

ATTENTIONAL CASCADE

We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows

Positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on

A negative outcome at any point leads to the immediate rejection of the sub-window

[Figure: IMAGE SUB-WINDOW -> Classifier 1 -T-> Classifier 2 -T-> Classifier 3 -T-> FACE; an F outcome at any classifier leads to NON-FACE]
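The control flow of the cascade is a loop with an early exit. In the sketch below each stage is a hypothetical (score function, threshold) pair standing in for a boosted classifier; the brightness and contrast scores are toy placeholders, not real face features.

```python
def cascade_classify(window, stages):
    """Return (is_face, stages_evaluated) for one sub-window.

    `stages` is a list of (score_function, threshold) pairs ordered from the
    cheapest classifier to the most complex one.
    """
    for n, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, n          # immediate rejection: NON-FACE
    return True, len(stages)         # passed every stage: FACE

# Toy stages on a window given as a flat list of pixel values.
stages = [
    (lambda w: sum(w) / len(w), 50),   # stage 1: mean brightness
    (lambda w: max(w) - min(w), 30),   # stage 2: contrast
]
print(cascade_classify([10, 12, 11, 9], stages))       # (False, 1)
print(cascade_classify([100, 100, 101, 100], stages))  # (False, 2)
print(cascade_classify([100, 60, 90, 130], stages))    # (True, 2)
```

Because most sub-windows in an image are background, most calls return after the first stage, which is what keeps the average cost per window low.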

Chain classifiers that are progressively more complex and have lower false positive rates:

[Figure: receiver operating characteristic; axes % False Pos and % Detection, tick labels 0, 50 and 0, 100; annotation "vs false neg determined by"]

A detection rate of 0.9 and a false positive rate on the order of $10^{-6}$ can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 ($0.99^{10} \approx 0.9$) and a false positive rate of about 0.30 ($0.3^{10} \approx 6 \times 10^{-6}$)
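The arithmetic behind these figures is a product of per-stage rates, under the simplifying assumption that every stage has the same rates and that stages act independently. The helper `stages_needed` is an illustrative addition that inverts the same relation.

```python
from math import ceil, log

def cascade_rates(stage_detection, stage_false_positive, stages):
    """Overall (detection rate, false positive rate) of a cascade of equal stages."""
    return stage_detection ** stages, stage_false_positive ** stages

def stages_needed(stage_false_positive, target_false_positive):
    """Smallest number of equal stages whose combined false positive rate meets the target."""
    return ceil(log(target_false_positive) / log(stage_false_positive))

detection, false_positive = cascade_rates(0.99, 0.30, 10)
print(round(detection, 4))        # 0.9044
print(f"{false_positive:.2e}")    # 5.90e-06
print(stages_needed(0.30, 1e-6))  # 12
```

Each stage may therefore be a fairly poor classifier on its own (30% false positives), as long as it almost never rejects a true face.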

TRAINING THE CASCADE

Set target detection and false positive rates for each stage

Keep adding features to the current stage until its target rates have been met
  Need to lower AdaBoost threshold to maximize detection (as opposed to minimizing total classification error)
  Test on a validation set

If the overall false positive rate is not low enough, then add another stage

Use false positives from current stage as the negative training examples for the next stage
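The threshold-lowering step can be sketched on validation scores from one boosted stage. Everything here is illustrative: the scores are made-up numbers, a window is assumed to pass when its score is at or above the threshold, and the function names are hypothetical.

```python
from math import ceil

def stage_threshold(face_scores, target_detection):
    """Largest threshold that still accepts at least `target_detection` of the faces."""
    ranked = sorted(face_scores, reverse=True)
    # Number of faces that must pass; the small epsilon guards against float round-up.
    keep = ceil(target_detection * len(ranked) - 1e-9)
    return ranked[keep - 1]

def pass_rate(scores, threshold):
    """Fraction of windows whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Validation scores of one stage (weighted vote of its weak learners).
face_scores = [2.1, 1.7, 0.4, 1.2, 0.9, 1.5, -0.3, 0.8, 1.1, 0.6]
nonface_scores = [-1.0, 0.5, -0.2, 0.7, -0.8, 0.1, -1.5, 1.0, -0.4, 0.3]

default = 0.0                                   # threshold minimizing total error
lowered = stage_threshold(face_scores, 1.0)     # threshold meeting the detection target
print(pass_rate(face_scores, default), pass_rate(nonface_scores, default))  # 0.9 0.5
print(pass_rate(face_scores, lowered), pass_rate(nonface_scores, lowered))  # 1.0 0.6

# Non-faces that still pass become the negative examples for the next stage.
hard_negatives = [s for s in nonface_scores if s >= lowered]
```

Lowering the threshold buys detection rate at the price of more false positives, and those surviving false positives are the examples the next stage has to learn to reject.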

THE IMPLEMENTED SYSTEM

Training Data
  5000 faces
    All frontal, rescaled to 24x24 pixels
  300 million non-faces
  Faces are normalized
    Scale, translation

Many variations
  Across individuals
  Illumination
  Pose


SYSTEM PERFORMANCE

Training time: “weeks” on 466 MHz Sun workstation

38 layers, total of 6061 features

Average of 10 features evaluated per window on test set

“On a 700 Mhz Pentium III processor, the face detector can process a 384 by 288 pixel image in about .067 seconds”
  15 Hz
  15 times faster than previous detector of comparable accuracy (Rowley et al., 1998)

Demo

REFERENCES

Ziyou Xiong (Dept. of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign)

P. Viola & M. Jones (Microsoft)

Jianfeng Ren (Signal and Image Processing Lab, University of Texas at Dallas)

Svetlana Lazebnik (Dept. of Computer Science, Univ. of North Carolina at Chapel Hill)