
School of Computer Science and Information Technology
University of Nottingham












Orthoface for Recognition and Pose-Invariant Face Recognition






Voon Piao SIANG


2001















A thesis in partial fulfilment of the requirements of the University of Nottingham for the degree of Master of Philosophy


This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author, and that no information derived from it may be published without the author’s prior written consent.





Abstract

This research project focuses primarily on the development of the orthoface method for recognition. The orthoface method is a novel face recognition algorithm that transforms faces from image space to face space, achieving a dimension reduction similar to that of the eigenface method. All classification methods applicable to the eigenface method can be applied to the orthoface method without any further modification. It maximises the inter-class scattering and minimises the intra-class scattering, which has been the weakness of the conventional eigenface method.

The projection matrix of the training faces from the orthoface method forms an upper-triangular matrix. Each training face has a different number of coefficients, allowing better discrimination and classification. A new classification technique is applied, in addition to support for the traditional classifiers. The new classification technique does not require comparison between test face coefficients and training face coefficients. This improves recognition speed significantly; a conservative estimate puts the improvement at no less than 10 times faster.

Using the basic orthoface method, the recognition rate is 87%. By increasing to 3 training faces per subject, the recognition rate improves to 92%. However, using hand-picked difficult test faces, a recognition rate of 83% is achieved. This is further improved using the orthospace histogram method, yielding a 97% recognition rate.

The second, less significant, part of this research project focuses on pose-invariant face recognition. This part concentrates on the use of a 3D head model and a texture mapping technique to derive new pose views from one or two existing views. The complete model is discussed in depth. The idea of using one generic 3D head model for mapping facial texture is realised through the use of a deformable 3D head model. Deformation is done via parameters extracted from various feature measurements. The facial texture is mapped onto the 3D surface using cylinder texture mapping. The mathematics to compute the vertex on the cylinder is derived and presented in detail.





Acknowledgement

I would like to thank my two supervisors, Dr. Bai Li and Professor Dave Elliman, for their continuous support, supervision, and inspiring ideas over the course of this research project. Without their guidance, this research project would not have been completed.

Dr. Tony Pridmore, for his critical evaluation of my research ideas.

My research associates, Mr. Jerry Ni and Miss Y. H. Liu, for their inspiring suggestions and the enormous amount of time spent discussing my research problems.

Miss P. S. Theng, for proofreading my thesis.

My family and girlfriend Miranda, for their continuous support and encouragement.

My friends Kenny Liew, David Leong, and K. H. Wong, for their moral support.




Table of Contents

Abstract .......... i
Acknowledgement .......... ii
List of Figures .......... v
List of Tables .......... vi
1 Introduction .......... 1
1.1 Face Recognition in Perspective .......... 1
1.1.1 What is Face Recognition .......... 1
1.1.2 Why Face Recognition is Important .......... 2
1.1.3 Application of Face Recognition .......... 3
1.1.4 Face Recognition from the Psychological Perspective .......... 7
1.1.5 Face Recognition as a Computer Science Problem .......... 10
1.1.6 Biometric - the Future of Security Authentication .......... 13
2 Previous Work .......... 16
2.1 Input Representation .......... 16
2.1.1 Geometrical Approach .......... 16
2.1.2 Graphical Approach .......... 17
2.2 Face Recognition Method .......... 18
2.2.1 Principal Component Analysis .......... 18
2.2.2 Dimension Reduction Besides PCA .......... 18
2.2.3 Connectionist Approach .......... 19
2.2.4 Hybrid Representation .......... 20
2.3 Invariance to Distortion .......... 20
2.4 Experimental Issues and Previous Results .......... 22
2.4.1 Experimental Issues .......... 22
2.4.2 Previous Results .......... 25
3 Pose-Invariant Face Recognition .......... 27
3.1 Pose Difference = Severe Distortion .......... 27
3.1.1 Transformation in 2D and 3D Domain .......... 29
3.1.2 Estimating the Difference .......... 30
3.2 Various Approaches .......... 32
3.2.1 Multiple View Approach .......... 32
3.2.2 Optical Flow Approach .......... 33
3.3 Ideas to Pose-Invariant Face Recognition .......... 34
3.3.1 Enhanced Optical Flow .......... 34
3.3.2 Pose-Invariant using Deformable 3D Head Model .......... 35
3.4 Problems and Solutions for 3D Head Pose-Invariant Face Recognition .......... 39
3.4.1 One 3D Model for All Subjects .......... 39
3.4.2 Pose Estimation using Multiple Views .......... 40
3.5 3D Texture Mapping Mathematics .......... 42
3.5.1 Point Transformation in 3D Space .......... 42
3.5.2 Polygon Mesh .......... 44
3.5.3 Cylinder Texture Mapping .......... 44
3.6 Conclusion .......... 47
4 Orthoface for Recognition .......... 50
4.1 Overview of Orthoface .......... 50
4.2 Dimension Reduction .......... 54
4.3 Data Compression .......... 56
4.4 Transformation to Orthospace .......... 57
4.4.1 Gram-Schmidt Orthogonalisation Process .......... 58
4.4.2 Computing the Orthoface .......... 58
4.5 Reconstruction of Training Face .......... 59
4.5.1 Training Face Reconstruction and Projection .......... 61
4.5.2 Computing K Matrix .......... 62
4.6 Projection of Test Face .......... 63
4.7 Recognition .......... 64
4.7.1 Simple Classification and Recognition .......... 68
4.8 Weakness of Eigenface or PCA .......... 68
4.9 Orthoface is Better than Eigenface .......... 70
5 Experimental Result and Conclusion .......... 72
5.1 Pre-Processing .......... 72
5.2 Preliminary Experimental Results .......... 76
5.2.1 Training and Test Faces .......... 77
5.2.2 Projection Graphs .......... 80
5.3 Enhancement to Basic Orthoface Method .......... 83
5.3.1 Robustness Against Difficult Image .......... 86
5.4 Orthospace Histogram .......... 91
5.5 Conclusion .......... 94
6 New Ideas for Future Research .......... 98
6.1 Overview of New Ideas .......... 98
6.2 Cross-Correlation Window Face Recognition .......... 100
6.3 Orthowindow for Face Recognition .......... 103
6.4 Weight Mapped Face Image: Answer to Lighting Problem? .......... 106
6.5 Half Face Technique: Technique to Counter Lighting Problem .......... 108
References .......... 109
Appendix A: Paper Published .......... A-1
Appendix B: Yale Face Database .......... B-1
Appendix C: ORL Face Database .......... C-1
Appendix D: Result of 3 Views Per Subject .......... D-1
Appendix E: Result of Orthospace Histogram .......... E-1





List of Figures

Figure 1: Typical usage of face recognition system .......... 4
Figure 2: Effect of rearranging horizontal face strips .......... 9
Figure 3: Removing the eyes changes the face dramatically .......... 9
Figure 4: Hierarchy of importance by human brain .......... 10
Figure 5: General face recognition process .......... 11
Figure 6: Excellent face detection from [93] .......... 12
Figure 7: Background removal .......... Error! Bookmark not defined.
Figure 8: Possible outcome of “No Rejection Test” .......... 23
Figure 9: Possible outcome of “Rejection Test Face Used”, not using imposter set .......... 24
Figure 10: Possible outcome of “Rejection Test Face Used”, using imposter set .......... 24
Figure 11: Pose variation generates very different images .......... 28
Figure 12: 15 poses of the same face .......... 29
Figure 13: Impressive result of optical flow .......... 33
Figure 14: Distortion from frontal view optical flow .......... 34
Figure 15: Enhancement to optical flow .......... 35
Figure 16: Subject enrolment block diagram .......... 36
Figure 17: Block diagram of pose-invariant face recognition .......... 38
Figure 18: Cyberware’s head and face 3D colour scanner .......... 39
Figure 19: 3D model of head statue .......... 40
Figure 20: Result from pose estimation .......... 40
Figure 21: Best-line fit for pose estimation .......... 41
Figure 22: Polygon mesh of human body .......... 44
Figure 23: Illustration of cylinder texture mapping technique .......... 45
Figure 24: Reconstruction using orthoface .......... 51
Figure 25: The upper-triangular property of the reconstruction matrix .......... 52
Figure 26: First 20 orthofaces .......... 52
Figure 27: Reconstruction using eigenfaces .......... 53
Figure 28: Reconstruction using eigenfaces .......... 53
Figure 29: First 20 eigenfaces .......... 54
Figure 30: 2 by 2 image is the linear combination of 4 bases .......... 54
Figure 31: Compression using orthoface or eigenface .......... 56
Figure 32: Transformation of 8 faces from Yale Face Database to orthoface .......... 57
Figure 33: Comparison between Train2 to Test 1, 2, 3, and 4 .......... 66
Figure 34: Inverse of Euclidean distance of test face projection .......... 67
Figure 35: Side of nose is more distinctive than centre of nose .......... 73
Figure 36: Face extraction that removes background and hairstyle .......... 74
Figure 37: Effect of intensity normalisation .......... 75
Figure 38: Difference between normalised and non-normalised faces .......... 76
Figure 39: Example of training faces and their orthofaces .......... 77
Figure 40: Negative of training faces .......... 78
Figure 41: The average training faces .......... 78
Figure 42: Test faces .......... 79
Figure 43: 3D view of the anticipated shape of the matrix .......... 99
Figure 44: Human face is not completely symmetrical .......... 100
Figure 45: Weight map for the eye region .......... 106
Figure 46: Weight-mapped faces are more similar .......... 107
Figure 47: Face comparison with and without half face .......... 108




List of Tables

Table 1: 6 degrees of freedom .......... 28
Table 2: Details of faces with different poses .......... 30
Table 3: Difference between frontal view and rotation about vertical axis .......... 31
Table 4: Details of difference image .......... 31
Table 5: Various 3D transformation matrices .......... 43
Table 6: Projection of Training Face .......... 65
Table 7: Projection of Test Face .......... 66
Table 8: Euclidean Distance .......... 66



Orthoface for Recognition and Pose-Invariant Face Recognition

Chapter 1

1 Introduction



1.1 Face Recognition in Perspective

1.1.1 What is Face Recognition

A face recognition system would be capable of identifying new instances of faces that the computer has been trained or programmed to recognise. Given prior knowledge of a set of known faces, and a digitised scene that contains one or more faces to analyse, the face recognition system attempts to locate the position and orientation of each face, then extract, enhance, and finally recognise the extracted face(s).

For example, consider a scene of a busy street where the subject of interest is standing in the middle, surrounded by cluttered objects such as cars, shops, etc. The face recognition system tries to locate and extract the subject. The extracted image is pre-processed to enhance image quality. The recognition system then tries to match the extracted face to a known face.
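The stages just described can be sketched as a short pipeline. This is a minimal illustration, not code from the thesis: the face detector is stubbed out as a list of given bounding boxes, enhancement is a simple intensity normalisation, and matching is a nearest-neighbour search by Euclidean distance between pixel intensities.

```python
import numpy as np

def enhance(face):
    # Intensity normalisation: zero mean, unit variance.
    face = np.asarray(face, dtype=float)
    return (face - face.mean()) / (face.std() + 1e-9)

def match(face, known_faces):
    # Nearest-neighbour match: the known face at the smallest
    # Euclidean distance from the probe wins.
    probe = enhance(face).ravel()
    distances = {name: np.linalg.norm(probe - enhance(known).ravel())
                 for name, known in known_faces.items()}
    return min(distances, key=distances.get)

def recognise(scene, face_boxes, known_faces):
    # Locate (boxes assumed to come from a face detector),
    # extract, enhance, and recognise each face in the scene.
    results = []
    for (r0, r1, c0, c1) in face_boxes:
        face = scene[r0:r1, c0:c1]   # extract the face region
        results.append(match(face, known_faces))
    return results
```

Because the normalisation removes any affine change in brightness, a brighter or dimmer copy of a known face still matches its subject.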

There are two important elements in a face recognition system: the training face set and the test face set. The training face set is a set of known faces used to teach the recognition system. Depending on the type of face recognition algorithm, the system tries to extract the features, or tries to memorise them. As with any recognition system, the quality of the training faces has a large effect on recognition accuracy and robustness. This effect exists in the human recognition system as well: if a human is not able to see an object clearly, the ability to recognise the object is seriously hindered. The training faces are usually a set of

photographs. To ensure quality, these photographs are usually taken in a controlled environment.

The second element is the test face set. These faces serve as the input to the face recognition system. The source may be a digital camera feeding the system in real time, or simply a set of photographs from an archived database. Test faces may vary from the training faces. The amount of difference, and the ability of the recognition system to tolerate these differences, dictates the recognition accuracy. A robust system will be able to generalise from a set of examples.

1.1.2 Why Face Recognition is Important

What makes face recognition an interesting problem? Consider the importance of the face in human culture. A human face is central to social interaction. It is the main source of information by which people identify each other, and the focus during a conversation. Since the ability to identify each other is one of humanity’s core abilities (like the ability to speak), a natural question to ask is whether a computer could replicate this ability. The urge to build machines that are as human as possible sparks the research on face recognition. Face recognition is one of the many basic abilities required in order to construct a man-like machine.

Most security authentication applications, where virtually any environment or situation requires a key, card, or password for access, can be replaced or further enhanced by face recognition. Using face recognition instead of the aforementioned access methods would greatly increase the ease of use, ease of implementation, and overall elegance of use. If face recognition is used in parallel with existing access methods, then the level of security could be greatly increased. For example, face recognition could be used at a building’s main entrance to replace the key or card entry system. This would prevent fraudulent access due to a stolen key or card. User authentication at ATM machines is highly susceptible to fraud due to the ease of card falsification. Enforcing face recognition would disallow such fraudulence. The widely installed base of ATM machines makes them strategically advantageous for locating wanted criminals across the nation. Potentially, even criminals will need to access an ATM machine.


In the area of human-computer interaction (HCI), automatic logon is made possible if a workstation is installed with a camera that detects its user. Upon positive identification, the user is automatically logged on and his/her environment is automatically loaded. The computer will then be seen as a much friendlier piece of hardware.

In law enforcement, face recognition is very useful for matching mug shots, taken from a line-up or acquired by other means, against the database of known criminals to detect their presence. This task is very labour intensive if performed by a person.

These applications are clearly well beyond the state-of-the-art for present-generation face recognition systems, except perhaps where only a small number of faces need be recognised. However, the astonishing ability of people to recognise faces is an existence proof that a sufficiently high level of performance is physically possible. Indeed, the upper limit of performance may well be greater than that achieved by human beings, who clearly have only a limited number of steps in the algorithm used, as neural switching speeds are rather slow!


1.1.3 Application of Face Recognition

Effective face recognition would no doubt be a very useful technology. It can be applied in a wide range of real-world applications. Its huge potential would have a major impact on security, including tighter security implementation and better ease of use on the user end. A few major areas where face recognition can be applied are discussed next.

Secure Access to Entrances, Protected Property

Currently, the most popular means of access are key access, magnetic/smart card, and/or PIN number authentication. The provider of the means of access trusts that only the authorised person holds those means of access. However, once the means of access falls into unwanted hands, there is no way to stop the security breach if the provider is still in the dark.

Performing face recognition in parallel with those means of access adds a higher level of trust. Unrecognised people will be denied access. This ensures that the person holding the conventional means of access IS the person with authorised access. This is why there is a security guard booth at most major site entrances: being able to actually see the person requesting access is a crucial factor in granting access. Face recognition is especially useful when stationing

security personnel on a 24-hour basis is not possible, e.g. at building/door entrances, ATM machines, safety deposit boxes, and vaults.

Figure 1 shows a typical card access system coupled with a face recognition system. Such a system is highly secure compared to either system operating alone.

Figure 1: Typical usage of face recognition system. (Block diagram: the card reader obtains the card ID while the camera captures the unknown face; the recognition system locates and extracts the face, searches the registered face database, and matches it against the registered face for that ID, reporting either a registered face or an unknown face.)
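The decision logic of the coupled system can be sketched as follows: the card supplies a claimed identity, and the live face must confirm it against the enrolled face for that identity. The function name, the raw pixel-distance comparison, and the threshold value are illustrative assumptions, not details taken from the figure.

```python
import numpy as np

def grant_access(card_id, live_face, registered_faces, threshold=10.0):
    # Two-factor check: the card claims an identity, and the live face
    # must match the registered face enrolled under that identity.
    registered = registered_faces.get(card_id)
    if registered is None:
        return False                       # card ID not enrolled
    live = np.asarray(live_face, dtype=float).ravel()
    enrolled = np.asarray(registered, dtype=float).ravel()
    distance = np.linalg.norm(live - enrolled)
    return bool(distance <= threshold)     # close enough -> registered face
```

A stolen card alone no longer opens the door: access requires both an enrolled ID and a face within the distance threshold, which is the extra layer of trust described above.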


Surve
illance Statistics/Audit

Currently, a surveillance system simply records snapshots from the surveillance camera at
fixed intervals. This system does not provide any statistical information, and is practically
useless unless one spends a great deal of time

looking at the video. Applying face recognition
technology to only those snapshots would unveil a whole range of statistical information and
give a potential audit trail. Even the simplest information such as the number of people
passing through the area
at certain times of the day is useful. A full employee movement
audit trail would provide information on movement patterns or habits, and could be used to

detect unauthorised personnel in a secured area. General surveillance, such as street surveillance, is useful in the event of a crime.

Another useful application is identifying criminals and suspects. Having such a system at transport hubs like airports and train stations to detect wanted criminals would help prevent the criminals from leaving the area or country.

Authenticating Users of Computer Networks

The most commonly used method of authenticating a computer user is via username and password. Many companies have discovered that passwords can be guessed, stolen, or forgotten. They can often be cracked using tools freely available on the internet. Multiple passwords per user are not feasible, as they are inconvenient for the user, difficult to remember, and time consuming for administrators. Face recognition can either act as an alternative to the password system, or add an extra layer of security to the existing system.

“Recent surveys show that the average corporate fraud event costs about $1 million. Corporate fraud is estimated to be $2.7 billion of losses per year in the US alone.”

(Computer Security Institute, San Francisco, March 2000)


If face recognition is performed in addition to password authentication, the security level is raised with a high level of confidence. Other biometrics might also be used, such as fingerprint or iris pattern recognition.

“The need for a more secure, yet easy-to-use means of authenticating users is growing at an explosive rate.”

(eTrue Inc., US, creator of TrueFace, Jan 2001)


Time and Attendance

Many companies enforce a punch-card policy on employees. This requires the employee to insert a card into a time-stamping machine (the traditional method). The aim is to record the time of getting into work, the time of leaving work, and the attendance of each employee. This method is cumbersome and slowly losing its popularity. A successful face recognition system

provides identical functionality without the hassle of a punch card, and the recorded date and time can be fed directly into the personnel management system. This is also useful for safety purposes during emergency situations such as a fire alarm: the list of people and/or the number of people inside the building can be determined at a glance.


1.1.4 Face Recognition from the Psychological Perspective

It is the nature of a human being to try to migrate and teach human capabilities to others; in the context of artificial intelligence, “others” might refer to a computer. The potential of such machines is one of the factors that generates interest in psychology and neurophysiology. Knowing how the brain works informs research into how to replicate these algorithms and mechanisms. Of course, viable mechanisms that are not anthropomorphic may be as good or better in practice.

Face recognition is known as an in-class object recognition problem, which means that it involves differentiation between objects that are very similar. All faces, in general, belong to the same class, as they all carry a standard set of features positioned according to a fixed set of rules. All faces are, in general, almost elliptical, and consist of hair, eyes, nose, and mouth. These features, in general, are positioned by the rules: hairline on the top; eyes aligned left and right on the upper half of the face, below the hairline; nose positioned below the pair of eyes and centred vertically and horizontally; mouth aligned horizontally in the middle, below the nose. In-class recognition is clearly a very difficult task, and yet most people perform it without any apparent effort.

Strong evidence shows that the human being’s strong capacity for object recognition in general, and face recognition in particular, is tied to specialised brain areas. Neurophysiological evidence for a “face recognition module” has been discovered in monkeys. As such, a human being’s proficiency with faces is believed to be “hardwired” ([32], [76], [109]). Damage to this section of the brain causes a disorder known as prosopagnosia: patients lose their ability to recognise a face, or to differentiate between individuals ([35], [42]).

In fact, a simple test and some reasoning confirm that face recognition is indeed a specialised skill, and that this skill does not extend to other objects. A human is not able to differentiate very well between animals: all birds look the same, all fish look the same, and so on for almost any animal family. Even man's best friend, the dog, looks the same if it is from the same breed. In contrast, humans of the same ethnic group do not look similar to each other, except for twins. This suggests that the human's in-class recognition capability works on the face only; in other words, the skill is "hardwired" (in computer science terminology, program code that is not generic or easily extendable is termed "hard coded"). Further evidence can be seen from the fact that faces are not at all easily differentiated when presented upside-down.

Over the years, exhaustive psychological experiments have been carried out to investigate how the human brain performs face recognition. Some observations described in [15], [16], [17], and [18] include:



- People are very good at recognising faces, but not at recognising other objects that require an in-class differentiation skill.

- People are sensitive to tiny adjustments in position or changes to features within the face: people easily notice if certain features of a familiar face change, for example a change in the position of the nose, or the substitution of the eyes with another subject's.

- People recognise badly if features are not in their normal configuration: experiments were done by rearranging the horizontal strips of a human face, and recognition accuracy deteriorated markedly. For example, Figure 2 (image source [107]) shows the difference in a person's looks after swapping around the eye and mouth strips. The Yale face database was used here.






Can you recognise whose face this is?




Figure 2: Effect of rearranging horizontal face strips

Recognition of Unfamiliar Faces Starts From Internal Features

A human does not treat all parts of the face equally. It is observed that the overall shape and outline carry the most weight. This is followed by the eyes. Very often, recognition fails immediately after the eyes are covered or deleted, as shown in Figure 3 (image source [107]). The mouth and nose are the least important of all.





Figure 3: Removing the eyes changes the face dramatically




Figure 4: Hierarchy of importance to the human brain: face outline and hair first, then eyes, then mouth, then nose (very unimportant)


1.1.5 Face Recognition as a Computer Science Problem

Face recognition as a psychological problem focuses on understanding how the human brain performs the task. As a computer science problem, the focus and aim switch from understanding to mechanising face recognition, i.e. performing face recognition through mathematical and computer algorithms. In general, the face recognition process is as illustrated in Figure 5.


Figure 5: General face recognition process. A face image from the camera passes through (1) locate face, (2) cut face (using the face location to extract the test face), (3) pre-processing, and (4) feature extraction; (5) the matching algorithm/classifier then compares the test face features against training face features from the face database and outputs a confidence score/matching result.

Steps 1 to 3 are known as the preparation stage. The image taken from the camera is known as the test image. The first task is to determine whether one or more faces exist in the test image. This is known as face detection. Sung and Poggio [93] describe a face detection method using example-based learning. This method is very successful and produces surprisingly reliable results. Figure 6 (image source [93]) shows some sample results from Sung and Poggio's work.











Figure 6: Excellent face detection from [93]

Face detection has spawned widespread research interest over the last five years, and recent results have been very promising. Rowley et al. [85] describe face detection using a neural network; the resulting face detection method is highly accurate. References [14], [48], [56], [73], and [94] describe face detection using various techniques, and methods for eliminating negative external factors.

Face detection is usually followed by feature detection. Upon knowing that a face exists in a certain image area, the next step is to find the positions of some specific and salient facial features. Usually, this means the search for the eyes, nose and mouth. The positions of these features are used to determine the pose and central point. These two pieces of information feed the algorithm that extracts the face.


Most face recognition algorithms require some form of pre-processing. For example, the test face is scaled to the same size as the training face, and the intensity is adjusted to some standard average value, for example using histogram equalisation. After pre-processing, the test face acquired from the external source is then used as the input to the matching algorithm.
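As an illustrative sketch, the histogram equalisation step mentioned above can be written in a few lines of numpy; the 112 x 92 image size and the synthetic under-exposed face below are assumptions for the demonstration, not data from this thesis:

```python
import numpy as np

def equalise_histogram(face, levels=256):
    """Remap grey levels so their cumulative distribution is roughly uniform."""
    hist, _ = np.histogram(face, bins=levels, range=(0, levels))
    cdf = hist.cumsum() / face.size                      # cumulative distribution
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)  # grey-level look-up table
    return lut[face]

# A synthetic under-exposed face: every pixel crowded into the darkest quarter.
rng = np.random.default_rng(0)
face = rng.integers(0, 64, size=(112, 92), dtype=np.uint8)
out = equalise_histogram(face)
print(face.max(), out.max())   # 63 255: the output spans the full grey range
```

The look-up table simply maps each grey level to its cumulative frequency, which stretches a crowded histogram across the whole intensity range.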

Working directly on the test face yields poor results. As such, most algorithms work on extracted features. The term "feature extraction" carries different meanings depending on the algorithm; broadly speaking, the features are strategically selected points, or the result of some kind of transformation. This forms the heart of face recognition. The ability to extract features that discriminate strongly among different subjects but weakly within the same subject dictates how effective the feature is as a discriminator.

1.1.6 Biometrics: the Future of Security Authentication

Traditional means of authentication are losing their effectiveness due to increasingly sophisticated "cracking" methods. Keys can be lost or stolen, and locks can be picked. Passwords or PIN numbers can be forgotten, cracked or stolen. The solution to these problems is to authenticate measurable biological characteristics or behaviours that only the authorised person is able to provide, uniquely identifying who they are. One way is to use a part of the physical human body that is unique to the person. This is known as biometric authentication.

Biometric authentication is divided into five main categories: face recognition, fingerprint recognition, iris authentication, voice recognition, and signature recognition. DNA fingerprinting would be the most powerful method of all, but the amplification of DNA sequences is too slow a process for authentication applications, and there are many ethical considerations.

i. Fingerprint Recognition

This technique acquires the user's fingerprint image via a fingerprint scanner. It is then compared to the fingerprints in the database scanned during the enrolment process. If the newly acquired fingerprint matches the claimed reference fingerprint, then a positive identity is returned. This is said to be the most cost-effective and flexible solution. Implementation is very robust because there is no special requirement for the acquisition environment.


ii. Face Recognition

A user's face is captured using a video camera or webcam and pre-processed. Usually, the user enters a password or swipes a card to provide an ID. This ID is used to retrieve the reference face from the enrolled face database. The reference face is then compared against the newly captured face. A positive identity is returned if the two faces match. Hardware cost is low, and current commercial systems claim very high accuracy. Implementation is restricted because the environment in which the face is captured must be stable, usually under controlled lighting and background conditions.

iii. Voice Recognition

A voice pattern is recorded using a microphone and converted to a digital wave signal via analogue-to-digital conversion. During enrolment, the user says some words. The same words must be used during authentication. When the digital patterns of the two samples match to within a threshold, a positive identity is returned. Implementation is simple and the cost is low, but the system might return a false identity when the user's voice changes due to illness.

iv. Iris Recognition

The iris pattern of the human eye is one of the most distinctive features of the human body. Its security reliability is higher than that of the fingerprint. The iris pattern is acquired using an iris scanner. However, iris scanning technology is not completely mature yet: only expensive high-end models are able to acquire distortion-free iris patterns. Although the implementation cost is high, high security reliability can be achieved provided distortion-free iris patterns can be obtained.


v. Signature Verification

Using a pressure-sensitive pen tablet, the user's signature and pressure dynamics (strokes) are captured. This is then compared to the signature database to authenticate the user. Writing on a pen tablet is not easy and takes some getting used to; from the user's perspective, this system can seem difficult to use. However, the implementation cost is low, and the effectiveness is reasonable. Nonetheless, a signature is not difficult to forge if one takes the time to practise the same signature. For an on-line system it may be necessary to replicate the speed of writing, and even the pressure used, as well as the visual appearance.

Source: Reference [68], Miros, "Biometrics", www.miros.com



Chapter 2

2 Previous Work



In the previous chapter, the problem of face recognition is defined and an overview of the
problems and major issues presented. This chapter focuses in detail on the previous wor
k and
existing research effort in face recognition.

2.1 Input Representation

The most significant part of face recognition is the input representation. This refers to the transformation of the intensity map into a form of input representation that allows easy and effective extraction of highly discriminative features. The next stage is classification. Although this is an important stage, the popular techniques and their strengths are fairly similar to each other. As such, the choice of classification algorithm does not affect recognition accuracy as much as the input representation.

Input representation is the major factor that differentiates face recognition algorithms. It can be approached in two manners: a geometrical approach that uses the spatial configuration of the facial features, and a more pictorial approach that uses image-based representation.

2.1.1 Geometrical Approach

There are numerous geometrical approaches: the seminal work of Kanade [52] and Kaya and Kobayashi [53], Craw and Cameron [28], Wong et al. [106], Brunelli and Poggio [20], and Chen and Huang [24]. These are feature-based systems that start by searching for facial features, such as the corners of the eyes, corners of the mouth, sides of the nose, nostrils, the contour along the chin, etc. Feature location algorithms are usually based on a heuristic procedure that relies on edges, horizontal and vertical projections of gradient and grey levels, and deformable templates (see Yuille, Hallinan and Cohen [110]).


The geometric details of a facial feature are captured by feature vectors that include measurements such as distance, angle, curvature, etc. The implementations mentioned above use feature vectors with dimensions ranging from 10 to 50. Craw and Cameron's system [28] uses a feature vector that represents feature geometry by displacement vectors from an "average" arrangement of features. This effectively measures how the particular training face differs from the norm. Once faces are represented as feature vectors, classification or similarity measurement is usually performed by computing the Euclidean distance or a weighted norm, where dimensions are usually weighted by some measure of variance.

To identify an unknown test face, geometry-based recognisers choose the model closest to the input image in feature space. This approach to face recognition has been limited to frontal views, as the geometrical measurements used in the feature vector change according to face rotations out of the image plane (x-y plane).
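This nearest-model classification step can be sketched as follows; the optional per-dimension weighting corresponds to the variance-weighted norm mentioned above, and the 4-D geometry vectors are invented purely for illustration:

```python
import numpy as np

def nearest_model(test_vec, model_vecs, weights=None):
    """Return the index of the model feature vector closest to the test vector.

    weights: optional per-dimension weights, e.g. inverse variances, so that
    noisy measurements count for less (the 'weighted norm' variant).
    """
    diff = model_vecs - test_vec            # shape (n_models, n_features)
    if weights is not None:
        diff = diff * np.sqrt(weights)      # weighted Euclidean distance
    dists = np.linalg.norm(diff, axis=1)
    return int(np.argmin(dists))

# Three hypothetical 4-D geometry vectors (e.g. eye distance, nose length, ...)
models = np.array([[10.0, 4.0, 6.0, 3.0],
                   [12.0, 5.0, 7.0, 3.5],
                   [ 9.0, 3.5, 5.5, 2.8]])
test = np.array([11.8, 5.1, 6.9, 3.4])
print(nearest_model(test, models))   # 1: closest to the second model
```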

2.1.2 Graphical Approach

The second major type of input representation is the graphical or pictorial approach. This represents faces by filtered images of model faces. In template-based systems, the simplest pictorial representation, faces are represented either by an intensity map of the entire face, or by part-images/sub-images of salient facial features such as the eyes, nose, and mouth. The input to a template-based system is usually, but not necessarily, face images. Some systems use the gradient magnitude or gradient vector to take advantage of its better immunity to lighting conditions.

Given a test face, this is compared to all model templates. The typical way to measure image distance is correlation. Baron [4] uses normalised correlation on grey-level templates. This system motivated the famous template-based approach by Brunelli and Poggio [20] that uses normalised correlation on gradient magnitudes. In fact, this system has gone so far that Gilbert and Yang [40] implemented it in a custom-built VLSI chip to perform real-time face recognition. Burt [21] uses a hierarchical coarse-to-fine structure to represent and match templates. Bichsel [10] uses a template that takes advantage of the x and y components of the gradient.
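Normalised correlation, the matching score at the heart of these template-based systems, can be illustrated with a minimal sketch; the patches are synthetic, and the point is that an additive or multiplicative lighting change leaves the score at 1.0:

```python
import numpy as np

def normalised_correlation(a, b):
    """Correlation coefficient between two equal-size image patches.

    Subtracting the mean and dividing by the norm makes the score invariant
    to additive brightness shifts and contrast scaling: 1.0 = perfect match.
    """
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
template = rng.integers(0, 256, size=(16, 16))
brighter = template * 0.8 + 40          # the same patch under different lighting
other    = rng.integers(0, 256, size=(16, 16))
print(round(normalised_correlation(template, brighter), 3))  # 1.0
print(normalised_correlation(template, other) < 0.5)         # True
```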


2.2 Face Recognition Methods

This section describes the various popular face recognition methods used in face recognition research, and provides a brief overview of each one. The analysis of the effectiveness of each method and its variants is discussed in the subsequent section.

2.2.1 Principal Component Analysis

Principal component analysis (PCA) is used for both recognition and face reconstruction. PCA can be described as an optimised pictorial approach. The dimensionality of the image space is reduced to form the face space. This dimension reduction is very significant, as only a small percentage of the original number of dimensions is used for classification. The original dimensionality is equal to the number of pixels of the face image; in the face space, the dimensionality is reduced to the number of eigenfaces (Turk and Pentland [99]). The eigenspace is the face representation framework. To apply PCA, it is assumed that the set of all face images is a linear subspace of all grey-level images. The eigenfaces are the basis that spans the face space, and are found by applying principal component analysis to a series of faces. In the face space, faces are represented by coefficients that mark the projection onto the corresponding basis.

Turk and Pentland [99] were the first to apply principal component analysis to face recognition. Akamatsu et al. [2] first preprocess the face image using Fourier transform magnitudes to remove the effect of translation. Craw and Cameron [28] applied PCA to shape-free faces, i.e. faces where the salient features have been moved to a standardised position. Serra and Brunelli [88] used PCA on templates of major facial features, achieving results comparable to those of correlation but at a fraction of the computational cost. Pentland, Moghaddam and Starner [75] applied PCA to a series of problems, including recognition of frontal views in a large database of over 3000 subjects, recognition under varying rotation out of the image plane, and detection of facial features using eigentemplates. Kirby and Sirovich [54] have demonstrated that faces can be accurately reconstructed from their face space representation.
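A minimal sketch of the eigenface construction described above, using random data in place of real training faces; the 112 x 92 image size and the choice of 10 eigenfaces are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_pixels = 20, 112 * 92          # e.g. 20 training faces
faces = rng.random((n_faces, n_pixels))   # each row: one flattened face image

mean_face = faces.mean(axis=0)
A = faces - mean_face                     # centred data matrix

# SVD of the centred data: the rows of Vt are the eigenfaces, an orthonormal
# basis of the face space, ordered by explained variance.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10                                    # keep the top-k eigenfaces
eigenfaces = Vt[:k]                       # shape (k, n_pixels)

def project(face):
    """Coefficients of a face in the k-dimensional face space."""
    return eigenfaces @ (face - mean_face)

coeffs = project(faces[0])                # 10 numbers replace 10304 pixels
reconstruction = mean_face + eigenfaces.T @ coeffs
print(coeffs.shape)                       # (10,)
```

Using the SVD of the data matrix avoids forming the huge pixel-by-pixel covariance matrix explicitly, but yields the same basis as the eigendecomposition the text describes.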

2.2.2 Dimension Reduction Besides PCA

In addition to PCA, other analysis techniques have been applied to face images to generate a new and more compact representation compared to the original image space. Kurita, Otsu and Sato [59] represented a face using autocorrelation on the original grey-level images. 25 autocorrelation kernels of up to 2nd order are used, and the resulting 25-D representation is passed through a traditional Linear Discriminant Analysis classifier. Cheng et al. [25] and Hong [49] have applied Singular Value Decomposition (SVD) to the face image, where the rows and columns of the image are interpreted as a matrix. Cheng et al. used SVD to define a basis set of images for each person, which is similar to the face space of PCA; the only difference is that each person's images have their own face space. Hong creates a low-dimensional coding for faces by running the singular values from SVD through linear discriminant analysis.
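The idea of treating the image itself as a matrix and summarising it by its singular values can be sketched as follows; the random image and the choice of keeping 20 values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
face = rng.random((112, 92))            # grey-level face image as a matrix

# Singular values of the image matrix itself, largest first: a compact
# summary of how the image's energy is distributed.
sigma = np.linalg.svd(face, compute_uv=False)
feature = sigma[:20]                    # e.g. keep the 20 largest as the feature vector
print(feature.shape)                    # (20,)
```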

Ramsay et al. [82] have used vector quantisation to represent faces after the faces are broken down into their important facial features. The face is represented by a combination of indices of best-matching templates from a codebook; the number one issue is how to choose the codebook of feature templates. Nakamura, Mathur and Minami [71] have used "isodensity maps" to represent faces. The original grey-level histogram of the face is divided up into eight buckets, defining grey-level thresholds for isodensity contours in the image. Faces are represented by a set of binary isodensity lines, and face matching is performed using correlation on these binary images.
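The eight-bucket isodensity construction can be sketched with numpy quantiles; the synthetic image is an assumption, and each bucket label below corresponds to one binary isodensity image:

```python
import numpy as np

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(64, 64))

# Seven thresholds at the octiles of the grey-level histogram divide the
# image into eight equally populated buckets; the bucket boundaries trace
# isodensity contours in the image.
thresholds = np.quantile(face, np.linspace(0, 1, 9)[1:-1])
iso = np.digitize(face, thresholds)        # per-pixel bucket label, 0..7
print(iso.min(), iso.max())                # 0 7
```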

2.2.3 Connectionist Approach

Connectionist approaches to face recognition also use pictorial representations for faces (Kohonen [55], Fleming and Cottrell [36], Edelman, Reisfeld and Yeshurun [34], Weng, Ahuja and Huang [104], Fuchs and Haken [37] [38], Stonham [90], Midorikawa [67]). Since the networks used in connectionist approaches consist only of classifiers, these approaches are similar to the ones described above. In multilayer networks where simple summating nodes are used, inputs such as grey-level images are applied at the input layer. The output layer is usually arranged as one node per object, and its activity determines the object reported by the network.

The input to the network is a set of training faces. They are fed into the network to train it, using a learning procedure that adjusts the network parameters, usually known as weights. Among connectionist approaches to face recognition, the two most important issues are the input representation at the input layer and the overall network architecture. As previously mentioned, the input representations are pixel-based, with [55], [36], [37], and [67] using the original grey-level images; [104] used directional edge maps, [90] used a thresholded binary image, and [34] used Gaussian units applied to the grey-level image. A variety of network architectures have been used. A plain multilayer network trained by back-propagation is the most common approach, and was used in [36] and [67]. Using a rather similar technique, [34] used a radial basis function network with gradient descent. [55] and [37] used a recurrent auto-associative memory, which recalls the pattern in memory closest to the applied input. A multilayer cresceptron, a derivative of Fukushima's Neocognitron [39], is used by [104]. The network by [90] works on binarised images, using a network architecture of the sum of a set of 4-tuple AND functions.

2.2.4 Hybrid Representation Methods

Hybrid representation methods combine both geometrical and pictorial approaches. Canon et al. [22] explored a 5-D feature vector that stores distances and intensities. This vector is used as a "first cut" filter on the face database; a least-squares fit of an eye template is used as the final match. In another hybrid approach, Lades et al. [60] represented faces as elastic graphs of local textural features. This technique is also used by Manjunath, Chellappa and Malsburg [64]. The graphs' edges capture the feature geometry, storing information such as the distance between two incident features. To represent pictorial information, the graph vertices store the results of Gabor filters applied to the image at feature locations. The recognition process begins by deforming the input face graph to match the model graphs. Then, by combining measures of the geometrical deformation and the similarity of the Gabor filter responses, a match confidence is calculated. Although this technique is commonly known as elastic graph matching, or in neural net terms as a dynamic link architecture, its mechanism is effectively representing and matching flexible templates.

2.3 Invariance to Distortion

Invariance to distortion refers to the ability of a face recognition algorithm to tolerate distortions such as changes in pose, lighting condition, expression, etc. Distortion refers to the content of the difference between the training face and the test face. Most systems described in the previous section provide some degree of flexibility by using invariant representations or by performing an explicit geometrical normalisation step.


An invariant representation is one that remains constant even if the input has changed in certain respects. A band-pass or high-pass filter is able to offer a limited degree of immunity to lighting conditions. For example, a Laplacian band-pass filter removes the low-frequency component. This assumes that the image content due to lighting conditions, such as shadows cast on a face, is mainly comprised of low-frequency information; the higher-frequency texture information is then preserved. This is true to a certain extent. However, the low-frequency components contain much information that is not due to the lighting condition, especially the face's original content. In other words, the majority of the information content comprises low-frequency components.
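A small demonstration of this point: convolving with a Laplacian kernel exactly cancels a smooth linear shading ramp added to a synthetic image. This is pure illustration under an idealised assumption; real shadows are not perfectly linear, which is why the immunity is only partial:

```python
import numpy as np

# 4-neighbour Laplacian kernel: responds to local intensity changes (high
# frequencies) and cancels smooth shading (low frequencies).
LAPLACIAN = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], float)

def convolve2d(img, kernel):
    """Plain 'valid' 2-D convolution, enough for this symmetric kernel."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(0)
face = rng.random((32, 32))
shaded = face + np.linspace(0, 3, 32)      # smooth left-to-right shading ramp

# The Laplacian responses are identical: the linear shading is removed.
print(np.allclose(convolve2d(face, LAPLACIAN), convolve2d(shaded, LAPLACIAN)))
```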

For translation invariance, some approaches transform the intensity map of the face image to the frequency domain. Working in the frequency domain, the position of the face does not alter the frequency pattern. A Fourier transform is most widely used for this. However, such a transformation does not provide invariance to scale and in-image plane rotation. Fuchs and Haken [37] [38] described a mechanism that is able to tolerate distortion in four degrees of freedom: translation (movement along the x-axis and y-axis = 2 degrees of freedom), scaling, and rotation within the image plane. The first step is to apply a Fourier transform to provide translation invariance. Next, the Cartesian image representation is transformed into a complex logarithmic map. This is a new representation in which scale and in-image plane rotation become translational parameters in the new space. Applying the Fourier transform again then automatically adds invariance to in-image plane rotation and scaling. The problem of out-of-image plane rotation (rotation about the x-axis and y-axis = 2 degrees of freedom) is commonly known as the pose problem. So far, nobody has discovered an input representation that is invariant to the pose problem; that is to say, there is no input representation that remains unchanged when the pose problem exists.
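The translation invariance of the Fourier magnitude can be demonstrated directly. Note that the demo below uses a circular shift, an assumption that makes the invariance exact; a real translation with cropped borders is only approximately invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 9), axis=(0, 1))   # circular translation

# A translation only adds a phase ramp to the spectrum, so taking the
# magnitude discards the shift: both images produce the same values.
mag       = np.abs(np.fft.fft2(img))
mag_shift = np.abs(np.fft.fft2(shifted))
print(np.allclose(mag, mag_shift))                  # True
```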

If the system is able to determine two points on the face for which it knows the standard positions and the standard distance between them, then the face can be normalised for translation, scale and in-image plane rotation. In the geometrical approach, distances in the feature vector are normalised for scale by dividing them by a given distance, such as the interocular distance or the length of the nose. In template-based systems, faces are often geometrically normalised by rotating and scaling the input image to place the eyes at fixed locations. Out-of-image plane rotation cannot be treated by this technique, which would be necessary if the system were to handle general pose. Most systems are able to tolerate 4 degrees of freedom out of the 6.
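The eye-based normalisation step can be sketched as follows: given two detected eye centres and their standard positions, recover the scale and in-image plane rotation of the aligning similarity transform. The standard coordinates and the detected positions below are invented for the illustration:

```python
import numpy as np

def eye_alignment(left_eye, right_eye, std_left=(30.0, 40.0), std_right=(62.0, 40.0)):
    """Scale and rotation mapping the detected eyes onto standard positions.

    All coordinates are (x, y); the standard positions are illustrative.
    """
    l, r = np.asarray(left_eye, float), np.asarray(right_eye, float)
    sl, sr = np.asarray(std_left), np.asarray(std_right)
    v, sv = r - l, sr - sl
    scale = np.linalg.norm(sv) / np.linalg.norm(v)             # interocular ratio
    angle = np.arctan2(v[1], v[0]) - np.arctan2(sv[1], sv[0])  # in-plane rotation
    return scale, np.degrees(angle)

# Eyes found 40 px apart and tilted 10 degrees: normalise to 32 px, level.
s, a = eye_alignment((50, 100),
                     (50 + 40 * np.cos(np.radians(10)),
                      100 + 40 * np.sin(np.radians(10))))
print(round(s, 2), round(a, 1))   # 0.8 10.0
```

Applying the inverse of this scale and rotation (plus the translation that moves the left eye to its standard spot) produces the geometrically normalised face.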


By treating the processing that provides invariance to the problems mentioned previously as a pre-processing stage, most face recognition systems' core recognition and matching algorithms accept a 2D image as input. As such, one could consider most face recognition systems to be 2D systems. A few systems have tried to obtain invariance to out-of-image plane rotation. These systems use the technique of storing multiple training views of different poses. Akamatsu [2] used four slightly rotated training faces (up, down, left, right) in addition to the frontal view. Kurita et al. [59] used a database of 5800 faces to construct a Linear Discriminant Analysis classifier; this database consists of 116 subjects with 50 faces per subject. The image source is a video tape session where each change between snapshots produces a slight pose difference. Manjunath [64] and Lades et al. [60] used an elastic graph matching technique to match the frontal view to test faces subjected to out-of-image plane rotation, and to test faces with facial expressions different to that of the training face.

All of the techniques and methods described up to this point offer some degree of invariance to pose, lighting condition and facial expression. However, the strength of the invariance differs. This thesis presents a new face recognition method that offers strong invariance to lighting condition and facial expression.

2.4 Experimental Issues and Previous Results

2.4.1 Experimental Issues

Although a face recognition algorithm can be very mathematical in nature, the experiment to test its accuracy and robustness is highly empirical, being based on statistical information. As with any experiment that relies on statistical results, the sample size is very important. Sample size refers to the number of training and test faces, both in terms of the number of subjects and the number of views per subject. The difference between the test face set and the training face set is very important as well. A system is considered weak if it performs well only when the test face closely resembles the training face, in other words when the magnitudes of the distorting parameters are minimal. The results are then meaningful only if the face recognition system is intended for low levels of distortion. The key property of a robust system is its ability to generalise from a set of training examples.


The building of a face recognition system, assuming that the algorithm has been chosen, begins by collecting face images. The set of face images collected is divided into training faces and test faces. The training faces are processed and transformed into the system's representation format, which is usually a geometric feature vector or a set of templates. In the neural network approach, this set of face images is used to train the network.

After processing the training faces, recognition begins by feeding the system a series of test faces. There are two levels of testing, depending on whether or not the system performs a rejection test on a labelled test face.

No Rejection Test

This is the simpler experiment because fewer test faces are needed. All the test faces used belong to subjects in the training database. The potential error is the substitution error, the mis-classification of one person as another. A useful statistic is the recognition rate, expressed as the percentage of test faces correctly recognised; currently published results claim no less than 90%. Another useful but rarely mentioned statistic is the substitution rate, the percentage of test faces mis-classified as another person. This is illustrated in Figure 7.

Figure 7: Possible outcomes of the "No Rejection Test": a test face from the training database is classified either correctly (correct recognition) or wrongly (substitution error)


Rejection Test Face Used

In this experiment, the test faces include subjects that do not belong to the training database. This subset of test faces is known as the imposter set. Now, even without using the imposter set, i.e. using a test face of a subject in the training database, the possible outcomes increase by one: false rejection. This happens when a test face is wrongfully rejected as a subject not belonging to the training database, as illustrated in Figure 8.
8

Correct Recognition
Test Face
(no imposer set)
Classified
Not Classified
Correct
Wrong
Substitution Error
Wrong
False Rejection


Figure
8
: Possible outcome of “
Rejection Test Face Used ”

not using imposer set


By applying the imposter set, there are two further statistics of interest. The false access rate is the percentage of imposters wrongfully recognised as one of the subjects in the training database. The remainder of the percentage is the true rejection rate, where the imposter is recognised as not belonging to any of the training database subjects, as shown in Figure 9.

Figure 9: Possible outcomes of "Rejection Test Face Used" with the imposter set: an imposter is either classified (wrongly, giving false access) or not classified (correctly, giving a true rejection)


In order to obtain a balanced mix of the various rates, the strictness of the face recognition system must be carefully tuned. In a strict system, the false rejection rate goes up because the system demands that a test face match the training face more closely; at the same time, the recognition rate, substitution rate, and false access rate decrease. The disadvantage of a strict system is thus the decrease in recognition rate, but the false access rate also decreases. The balance that one should aim for is highly dependent on the type of application. A high-security environment would prefer a stricter system, as granting access to an imposter who is not an employee (false access) is much more serious than the inconvenience caused by rejecting an employee who actually has access permission.
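This trade-off can be made concrete with a distance threshold acting as the "strictness" knob. The sketch below is a simplification that collapses classification correctness into a single accept/reject decision, and the distance scores are invented:

```python
import numpy as np

def rates(genuine_dists, imposter_dists, threshold):
    """Accept a face when its distance score is <= threshold.

    Returns (recognition rate, false rejection rate,
             false access rate, true rejection rate).
    """
    g = np.asarray(genuine_dists)    # enrolled subjects' test faces
    i = np.asarray(imposter_dists)   # the imposter set
    return (float(np.mean(g <= threshold)),   # genuine faces accepted
            float(np.mean(g > threshold)),    # genuine faces turned away
            float(np.mean(i <= threshold)),   # imposters let in
            float(np.mean(i > threshold)))    # imposters correctly rejected

genuine  = [0.2, 0.3, 0.4, 0.9]   # hypothetical match distances
imposter = [0.5, 0.8, 1.1, 1.4]
print(rates(genuine, imposter, 0.45))  # strict:  (0.75, 0.25, 0.0, 1.0)
print(rates(genuine, imposter, 1.0))   # lenient: (1.0, 0.0, 0.5, 0.5)
```

Tightening the threshold from 1.0 to 0.45 eliminates false access at the cost of rejecting one genuine face, which is exactly the tuning decision described above.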

2.4.2 Previous Results

The system implemented by Baron [4] achieved a 100% recognition rate using a database with 42 subjects; using 108 faces from outside the database, the resulting false access rate was 0%. Brunelli and Poggio's template-based system [20] achieved a 100% recognition rate on frontal views of 47 subjects. Cannon et al.'s system [22] used a database of 50 subjects and reached a 96% recognition rate. Turk and Pentland's eigenface system [99] correctly classified 96% using a database of 16 subjects under varying lighting conditions. Akamatsu et al. [2] produced a 100% recognition rate using a database of 11 subjects. Pentland, Moghaddam and Starner [75] achieved a 95% recognition rate with their system using over 3,000 people from the FERET database [78] [77], produced by the US Army Research Lab, which is by far the largest research-oriented face database ever produced. Otsu and Sato [59] plotted recognition rate versus false access rate; their system showed that to achieve a recognition rate well over 90%, the false access rate climbed to an unacceptable level.

However, the validity of comparing the results quoted so far is questionable, simply because none of them share the same database, and none used a sufficiently large database for a concrete statistical conclusion. The FERET database contains more than 7,500 faces from over 3,000 subjects. The database is constructed to mimic real-world applications as closely as possible: different views of the same subject are taken at different times, up to weeks apart. Unfortunately the FERET database is not publicly available. There is no rule of thumb on the appropriate database size; it depends on the type of application. For a typical building access application, a database in the order of a few tens of people is sufficient. A criminal recognition system should have a database in the order of thousands.



Part I

Pose-Invariant Face Recognition


Chapter 3  Pose-Invariant Face Recognition .................................... 27

3.1  Pose Difference = Severe Distortion ...................................... 27
3.2  Various Approaches ....................................................... 32
3.3  Ideas to Pose-Invariant Face Recognition ................................. 34
3.4  Problems and Solutions for 3D Head Pose-Invariant Face Recognition ....... 39
3.5  3D Texture Mapping Mathematics ........................................... 42
3.6  Conclusion ............................................................... 47



Chapter 3

Pose-Invariant Face Recognition



This chapter describes ideas to overcome the pose problem using 3D modelling techniques. The idea is discussed and outlined in detail. The first section discusses why pose difference is such a serious problem in face recognition. The next section briefly presents the optical flow method used to derive virtual views of greater rotation from ¾ views, and the multiple views method, which uses many views of different pose from the same person as training faces. An enhancement to the current optical flow method is proposed as part of the new idea. Next, pose-invariant face recognition using a 3D head model is presented in detail. This includes how a 3D head model could be used to provide pose-invariant face recognition, a block-diagram proposal of the pose-invariant face recognition model, the mathematics needed to manipulate the 3D head model in 3D space, and the mechanism and mathematics involved in texture mapping. Finally, the last section explains why the research has not been pursued further, the main constraint being the lack of resources.

3.1 Pose Difference = Severe Distortion

Pose is one of the major obstacles in face recognition. Depending on the application, the pose problem, i.e. the difference in angle of rotation between test face and training faces, may be the primary contributing factor for distortion. If test faces are captured from a video stream, in which the subject can move and turn freely, then pose becomes the primary cause of the difference between the test and training face. This affects recognition performance severely.

On the other hand, in applications where both the test and training faces are captured under a controlled environment, pose difference is minimal. In these applications, the lighting problem is a more significant factor. The pose problem then becomes secondary, while other problems such as facial expression and hairstyle changes may be limiting factors. For example, in an application where a subject has to sit upright and look straight into the camera, or where a test face is acquired from a passport photo or driving licence photo, pose is not so much of a problem.

The human head has 6 degrees of freedom, as described in Table 1.

Freedom          Description
2 Translations   Movement of the head on the viewing plane, along the x-axis and y-axis, i.e. moving up/down and left/right.
3 Rotations      Rotation about the x-axis (looking up/down), the y-axis (turning left/right), and the z-axis (tilting left/right).
1 Scaling        Moving back/forth.

Table 1: 6 degrees of freedom


The pose problem is due to the 2 rotations about the x-axis and y-axis. Figure 10 (image source [12]) shows a face that most people would recognise easily, even though the pose has changed dramatically. However, the same task of recognising this set of faces in a recognition system is very difficult, if not impossible.

Figure 10: Pose variation generates very different images

Shown in Figure 11 (image source [8]) is another set of 15 poses of the same face, extracted from Beymer and Poggio [8]. The pose parameter varies by ±45° about the y-axis, and ±15° about the x-axis. Again, to the human brain, recognising them is effortless. However, a face recognition system would probably fail on at least half of the images.

















Figure 11: 15 poses of the same face

3.1.1 Transformation in 2D and 3D Domains

The transformation of a 3D point p onto a 2D plane at p′, under the application of the 6 degrees of freedom, is defined as

p′ = s P R p + t

where R is a 3×3 matrix that rotates p about all three axes, matrix P projects the rotated 3D point onto the 2D plane by simply dropping the z-coordinate, s is the scaling factor, and t is the translation offset vector.
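The 3D transformation above can be sketched in code. This is an illustrative example, not from the thesis; it assumes the rotation is composed from per-axis rotations (rx, ry, rz in radians) applied in x, y, z order, which is one of several valid conventions.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """3x3 rotation R about the x-, y- and z-axes in turn (angles in radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# P projects the rotated 3D point onto the 2D plane by dropping z.
P = np.array([[1, 0, 0],
              [0, 1, 0]])

def project(p, s, rx, ry, rz, t):
    """Apply p' = s P R p + t to a 3D point p, giving a 2D point."""
    return s * (P @ rotation_matrix(rx, ry, rz) @ p) + t

p = np.array([1.0, 2.0, 3.0])
# With no rotation, R = I, so p' = s * (x, y) + t = (7, 3) here.
print(project(p, 2.0, 0.0, 0.0, 0.0, np.array([5.0, -1.0])))
```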

Transformation within the 2D domain is simpler. There are 4 degrees of freedom: scaling, rotation about a point, and translation (two components), from point p to point p′:

p′ = s R p + t

where R is a 2×2 rotation matrix that rotates a point on the plane about a point, and s and t are the scaling factor and translation offset vector respectively.
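The 2D case admits an equally short sketch, again illustrative rather than from the thesis, with the rotation taken about the origin (rotation about an arbitrary point can be obtained by translating first):

```python
import numpy as np

def transform_2d(p, scale, theta, t):
    """Apply p' = s R p + t to a 2D point p (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])          # 2x2 rotation about the origin
    return scale * (R @ p) + t

p = np.array([1.0, 0.0])
# 90-degree rotation maps (1, 0) to (0, 1); scaling by 2 and translating
# by (0, 1) then gives approximately (0, 3).
print(transform_2d(p, 2.0, np.pi / 2, np.array([0.0, 1.0])))
```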


3.1.2 Estimating the Difference

In order to investigate why pose difference is such a serious threat to face recognition, the pixel-level difference between different poses was measured. Presented in Table 2 (image source [8]) are 3 views: frontal, 30° rotation (commonly known as the ¾ view), and 45° rotation. These are intensity-normalised images; as such, the nearly identical mean intensity of 83.5 is expected (the difference is due to rounding error). The histogram shows that each face's pixel intensity is distributed over approximately the same range. The sharp, strong spike to the left is the near-black hair colour, while the spike in the middle represents the rather uniform background colour.

Table 3 shows the intensity map of pixel differences. Each pixel is calculated by computing the absolute difference between two corresponding pixels, i.e. c_n = |a_n − b_n|. The darker the intensity map, the more alike the two images are. The two intensity maps produced contain many white "spots" of various sizes, indicating significant differences between the images. It is also observed that C2 is brighter than C1, suggesting that greater rotation angles result in greater differences.
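The per-pixel computation c_n = |a_n − b_n| can be sketched as follows. This is an illustrative example; the tiny arrays stand in for real intensity-normalised face images, which the thesis does not reproduce here.

```python
import numpy as np

def difference_map(a, b):
    """Absolute per-pixel difference |a - b| of two equal-size uint8 images.

    The subtraction is done in a wider signed type to avoid uint8 wraparound;
    darker (smaller) values in the result mean the images are more alike."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)

a = np.array([[80, 90], [100, 60]], dtype=np.uint8)   # hypothetical "frontal" view
b = np.array([[82, 70], [100, 90]], dtype=np.uint8)   # hypothetical "rotated" view
c = difference_map(a, b)
print(c)               # per-pixel differences: 2, 20, 0, 30
print(c.mean())        # mean difference intensity: 13.0
```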

Training Face   Histogram   Rotation              Mean Intensity
(image)         (image)     0 degrees             84
(image)         (image)     30 degrees (approx)   83
(image)         (image)     45 degrees (approx)   83

Table 2: Details of faces with different pose


Face 1   Face 2   Difference, |Face 1 − Face 2|
A1       B1       C1
A2       B2       C2

Table 3: Difference between frontal view and rotation about vertical axis


From the histograms in Table 4, the area below each graph is directly proportional to the mean intensity. The difference between view 0° and view 45° (image C2) is more intense than the difference between view 0° and view 30° (image C1). In terms of mean intensity, the difference C1 constitutes 22.9% of the frontal view's mean intensity. Correspondingly, the mean intensity of C2 is 31.3% of the frontal view's.


Difference   Histogram   Mean Intensity   % Diff. from Frontal View
C1           (image)     19               22.9%
C2           (image)     26               31.3%

Table 4: Details of difference images
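The percentages in Table 4 follow directly from the mean intensities. A quick sketch of the arithmetic, taking the frontal view's mean as 83 (per Table 2):

```python
# Reproducing the last column of Table 4: each difference image's mean
# intensity expressed as a percentage of the frontal view's mean (83).

frontal_mean = 83
for name, diff_mean in [("C1", 19), ("C2", 26)]:
    pct = 100.0 * diff_mean / frontal_mean
    print(f"{name}: {pct:.1f}% of frontal mean")   # 22.9% and 31.3%
```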


A difference of this magnitude poses a huge challenge to any recognition algorithm. In fact, feeding the face images directly to the recognition algorithm will reduce the accuracy significantly. Most algorithms will fail, or the accuracy degrades to a point where the result is no longer reliable enough for any practical application.

A more efficient way to solve the pose problem is to transform the rotated head back to a pose similar to that of the training face. This takes advantage of prior knowledge of human face geometry and tries to artificially rotate the head. The aim is to reduce the distortion introduced by pose difference, so that the test face puts less stress on the recognition algorithm.

3.2 Various Approaches

3.2.1 Multiple View Approach

The multiple view approach is a brute force approach