Facial Expression Recognition in Static Images


Ting-Yen Wang and Ting-You Wang

Advised by Jiun-Hung Chen


Abstract

With the advent of the Viola-Jones face detection algorithm, the computer vision community has
a reliable method for detecting faces. However, there is currently very little published research
on detecting facial expressions in still images, as most publications focus on detecting facial
expressions in video. This paper describes our three-week research project on facial expression
detection in still images using different combinations of image processing methods and machine
learners. We processed images in two ways, as raw pixels and as eigenfaces; two machine
learners, K-Nearest Neighbors and Support Vector Machines, classified the processed images.
Our results indicate that detecting facial expressions in still images is feasible: raw images with
Support Vector Machines produced some very promising results that should lead to further
research and development.


Introduction

There have been many advances in face detection; however, the area of expression detection is
still in its early stages. A great deal of work has been done in this area, and there are even
applications of it. For example, Sony cameras have a "Smile Detection" feature that is supposed
to detect when a person in the image is smiling
(http://www.gadgetbb.com/2008/02/27/sony-dsc-t300-first-camera-with-smile-detection/).
Others who have done work in this field include CMU (http://www.pitt.edu/~emotion/research.html)
and BMW ("Bimodal Fusion of Emotional Data in an Automotive Environment", S. Hoch, F. Althoff,
G. McGlaun, G. Rigoll). Such research has focused on detecting when a face takes on a particular
expression: it records a video sequence of a face and measures changes from a "neutral" state to
determine whether the face has moved to another state. Note that there are generally seven
categories of expression: Anger, Disgust, Fear, Happy, Neutral, Sadness, and Surprise.


We do not always have the luxury of a sequence of images starting from a person's neutral state;
in this paper we discuss our research on the feasibility of detecting an expression from a single
still image. We combine various techniques for finding and describing a face, such as Viola-Jones,
machine learners, and eigenfaces. In the following sections, we discuss the process of creating
our classifiers, testing the classifiers along with results and conclusions, and finally some
future work.


Process

The first step was to detect the faces within an image that we hope to classify. To do this we
leverage the Viola-Jones face detection algorithm in OpenCV. The OpenCV repository contains
a Haar cascade classifier that finds frontal faces. Each of these faces is then saved to a file to
be processed by a classifier.
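
A minimal sketch of this step follows, using OpenCV's Python bindings and the stock
haarcascade_frontalface_default.xml that ships with OpenCV; the exact cascade file and output
naming scheme are assumptions, as they are not recorded in the text.

    import cv2

    # Load OpenCV's stock frontal-face Haar cascade (an assumption; the
    # original project may have used a different cascade file).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("input.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detectMultiScale returns one (x, y, w, h) box per candidate face.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Crop each detected face and save it for the classifier stage.
    for i, (x, y, w, h) in enumerate(faces):
        cv2.imwrite("face_%d.png" % i, gray[y:y + h, x:x + w])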


The next step was to create classifiers for the various expressions we wanted to recognize. We
began with a simple "Smile"/"No smile" binary classification. To see if it was even remotely
possible to classify images, we started with the small class images from CSE 576 project 3. This
image "database" contains 17 smiling images and 17 non-smiling images. We then moved to a much
larger database, provided by CMU, with over 8000 images covering 7 classifications: Happy, Sad,
Anger, Fear, Disgust, Surprise, and Neutral.

The classifiers we chose were K-Nearest Neighbors and Support Vector Machines. For each
classifier, we used two types of features. The first feature vector was simply the raw image,
grayscaled and resized to 25 by 25 (giving a feature vector of length 625); this provides a
baseline for all other classification methods. The second feature vector uses eigenfaces: the
coefficients obtained by projecting a face onto the eigenfaces form the feature vector. For this
method, we used about 70 images (10 from each expression) to create the eigenfaces and kept the
top 30.
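
A sketch of both representations follows. The PCA calls are OpenCV's generic
cv2.PCACompute/cv2.PCAProject, standing in for whatever eigenface code was actually used, so
treat the details as illustrative.

    import cv2
    import numpy as np

    def raw_feature(gray_face):
        # Grayscale face resized to 25x25 and flattened: 625 values.
        small = cv2.resize(gray_face, (25, 25))
        return small.astype(np.float32).reshape(-1)

    def build_eigenfaces(training_faces, n_components=30):
        # Rows are raw face vectors; keep the top principal components.
        data = np.vstack([raw_feature(f) for f in training_faces])
        mean, eigenvectors = cv2.PCACompute(data, mean=None,
                                            maxComponents=n_components)
        return mean, eigenvectors

    def eigen_feature(gray_face, mean, eigenvectors):
        # The projection coefficients form the 30-value feature vector.
        vec = raw_feature(gray_face).reshape(1, -1)
        return cv2.PCAProject(vec, mean, eigenvectors).reshape(-1)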


To keep images consistent for classification, we ran the Viola-Jones face finder on the training
images to crop out each face, so that training faces would generally have the same bounding box
as faces found in test images. One issue we ran into with this method is that Viola-Jones
produces many false positives, so we had to delete the false positives manually.


After creating all the training images, the next step was to actually train the classifiers. To
test the classifiers, we also wrote a script that, for every 10 images, removes 1 from the
training set and reserves it for cross-validation testing. The classifiers were built with
OpenCV libraries. The provided KNN library is fairly straightforward: we simply input each
feature vector with a classification number. The SVM library was more complicated, with many
configurations that had to be set up before it could be run; for the purposes of our project, we
used some basic defaults. This myriad of configurations leaves a lot of room to fine-tune the
SVM approach to improve the results (for example, we had to change the weights of the different
classes to account for classes that are underrepresented).
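
The sketch below illustrates this stage with OpenCV's current cv2.ml API (the project predates
this API, so the calls stand in for the originals). It assumes `features` is a float32 matrix
with one feature vector per row, e.g. stacked raw_feature outputs from the earlier sketch, and
`labels` an int32 vector.

    import cv2
    import numpy as np

    def split_every_tenth(features, labels):
        # Reserve 1 of every 10 images for cross-validation testing.
        idx = np.arange(len(labels))
        test = (idx % 10) == 9
        return (features[~test], labels[~test]), (features[test], labels[test])

    (train_x, train_y), (test_x, test_y) = split_every_tenth(features, labels)

    # KNN: simply feed in each feature vector with a classification number.
    knn = cv2.ml.KNearest_create()
    knn.train(train_x, cv2.ml.ROW_SAMPLE, train_y)
    _, knn_pred, _, _ = knn.findNearest(test_x, k=10)   # the project used K = 10

    # SVM: type and kernel must be configured before training.
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)    # a basic default, as in the text
    svm.train(train_x, cv2.ml.ROW_SAMPLE, train_y)
    _, svm_pred = svm.predict(test_x)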


After setting up the classifiers, the final step was to run tests. Our first test used the
images reserved for cross-validation, and our second test used images completely unrelated to
any images in the training set. We will go into further detail in the Results section, but we
noticed that certain classifications were much more likely than others; in particular, the
neutral face seemed to dominate. We therefore had to adjust certain weights to avoid
misclassifying the non-neutral classes.
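
One simple weighting scheme, shown here only as an illustration (our actual weights were tuned
by hand), is to penalize errors on each class in inverse proportion to its frequency:

    # Continues the training sketch above: de-emphasize overrepresented
    # classes (e.g., neutral) by weighting the SVM's per-class penalty.
    counts = np.bincount(train_y, minlength=7).astype(np.float64)
    weights = counts.sum() / (len(counts) * counts)   # inverse frequency
    svm.setClassWeights(weights)                      # set before train()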


The general workflow is:

    Viola-Jones Face Detector  ->  Classifier (trained with ~1500 images)  ->  Output: "Happy"

We eventually found that, with seven classes, raw images with SVM produced the fewest
misclassifications, while eigenfaces were less accurate. Since our best method still made some
misclassifications, we then focused on eliminating misclassifications by improving the
"Smile"/"No smile" classification in the time remaining. One problem, we believe, was that
neutral faces were overrepresented in the CMU database by an order of magnitude over any other
class. We therefore reduced the number of neutral training faces and combined them with angry,
fear, disgust, sad, and surprise to form the non-smiling class. This continued to exhibit the
same problem as before, classifying nearly everything as neutral. The issue was that certain
classes of faces are too similar to happy, particularly in showing an open mouth with teeth.
This led us to remove anger and fear from the non-smiling class, which provided better results.
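
The relabeling just described can be sketched as a simple mapping (names hypothetical):

    # 'Happy' becomes the smiling class; anger and fear are dropped because
    # their open-mouth faces look too much like smiles; the rest (with a
    # reduced number of neutral faces) form the non-smiling class.
    BINARY_LABELS = {"Happy": 1, "Disgust": 0, "Sadness": 0,
                     "Surprise": 0, "Neutral": 0}

    def relabel(expression_name):
        # Returns 1 (smile), 0 (non-smile), or None to exclude the image.
        return BINARY_LABELS.get(expression_name)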


Finally, we looked into the use of contours, since this would let us focus on the curve of the
mouth. However, we determined that it might not be very suitable: to find the mouth, we need to
lower the gradient threshold, but lowering the threshold lets many undesired edges show up in
the image, including teeth, creases around the face, shadows, and hair, making classification
fairly difficult. At the other extreme, a fairly high threshold misses most of the mouth while
still picking up noise from the image (creases around the mouth and shadows).




This is not to say that the method will not work, but it would require careful tuning and possibly
more image processing to get good results.
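
The threshold tradeoff can be reproduced with a standard edge detector; Canny stands in here for
whatever gradient-threshold step a contour pipeline would use, with purely illustrative values.

    import cv2

    gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

    # Low thresholds recover the mouth but also teeth, creases, shadows,
    # and hair; high thresholds lose most of the mouth yet keep some noise.
    edges_low = cv2.Canny(gray, 30, 90)
    edges_high = cv2.Canny(gray, 150, 300)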


Results


Two-classes


Initial results showed that this method of facial expression recognition is promising. For
"Smile" versus "No smile" (no smile being all images that were not 'Happy'), using KNN on plain
images as feature vectors got 11/18 of the smiles correct and 122/130 of the non-smiling correct
when feeding the cross-validated (remove one of every 10) data back in. SVM was even better,
with only one incorrect classification out of 148 tests. These results show that the faces in
the CMU database can be classified fairly well.


Though the accuracy was very good, the problem with these results was that they were somewhat
biased. After examining the training data from CMU, we discovered that there were actually many
"repeated" faces: for any face, there was a very similar face of the correct classification in
the database, making it easier for SVM and KNN to classify the images. External images were
almost always classified as non-smiling (more specifically, neutral), which seemed to become the
classifier's "average face" for faces it does not recognize. This made some sense: for any
person, their average face is the neutral face, and this expression, or lack thereof, was
overrepresented in the database.


To avoid the "repeated" faces, we created a new test set of faces that bore no similarity to the
images in our database. We then increased the weight for smiling ('Happy') to compensate for the
overrepresentation of non-smiling faces. We also removed some faces that looked too much like
smiling; specifically, we removed the 'Fear' and 'Angry' faces from the non-smiling class. We
then retried SVM on our new test data with the modified weights, and the results were promising:
it still classified 96/101 correctly. Nearly all of the errors were in the fear images, which
should be labeled as non-smiling; since they were no longer in the database, many moved to
smiling. We also retried KNN on the new test data (with no weight adjustment, as KNN does not
use weights) and it classified 92/101 correctly. KNN's errors were spread throughout the test
cases because we could not weight a certain classification more heavily than others to avoid
misclassifications of a particular expression.


Using eigenface coefficients as feature vectors did not produce better results; rather, they
were generally worse. The method does run much more quickly, however, since we have only 30
feature points per image. But probably for this same reason, 30 feature points were not enough
to capture an expression, and most expressions were absorbed into 'neutral'. To summarize: using
the cross-validation technique, we classified 17/18 smiling and 126/130 non-smiling correctly.
So, if a face is in our database, the eigenface method is very good at finding that person's
expression again. However, moving to people not represented in the database, nearly all results
went to non-smiling; only 2/20 smiling images were classified correctly on our new data set.
This showed that eigenfaces do not capture a general expression very well.


Three-classes


To make the system slightly more complex, we added surprise as a third class for plain images.
We did not do this for eigenfaces, since its results were not very good for just two classes.
Using raw images, SVM continued to do a fairly good job at classifying.


Raw Images with SVM using 3 Expressions (smile vs. surprise vs. non-smile)
Students do not exist in database; Classes Weighted; No Fear/Anger; Using half of the neutral images
(In all tables below, rows give the true expression and columns the classifier's output.)

             Smile        Surprise     Non-Smile
Smile        95% (19)                  5% (1)
Surprise                  94.7% (18)   5.3% (1)
Non-Smile    4.8% (3)                  95.2% (59)

KNN, on the other hand, began to have problems differentiating neutral from surprise.


Raw Images with KNN using 3 Expressions (smile vs. surprise vs. non-smile)
Students do not exist in database; No Fear/Anger; Using half of the neutral images

             Smile        Surprise     Non-Smile
Smile        75% (15)                  25% (5)
Surprise                  42.1% (8)    57.9% (11)
Non-Smile    8.1% (5)                  91.9% (57)






Seven-classes


Finally, we moved to classifying all seven categories, and we experienced a very similar
situation. When feeding the CMU database back into the system, KNN was not very good, hovering
around 50% for each category and classifying most expressions as neutral:


Raw Images with KNN using 7 Expressions
Students already exist in database

           Anger      Disgust    Fear     Happy       Neutral     Sadness   Surprise
Anger      42.9% (3)                                  57.1% (4)
Disgust               63.6% (7)                       36.4% (4)
Fear                             40% (4)  10% (1)     50% (5)
Happy                                     61.1% (11)  38.9% (7)
Neutral                                               100% (71)
Sadness                                               60% (9)     40% (6)
Surprise                                              62.5% (10)            37.5% (6)



The problem was that neutral was polluting all the other categories. Tuning K (currently 10)
might produce better results.
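
A sweep over K is a one-liner with the trained model from the earlier sketch (`knn`, `test_x`,
and `test_y` assumed from there):

    import numpy as np

    for k in (3, 5, 10, 15, 25):
        _, pred, _, _ = knn.findNearest(test_x, k)
        acc = np.mean(pred.reshape(-1) == test_y)
        print("K = %2d: accuracy = %.1f%%" % (k, 100 * acc))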


For SVM, at first the neutral faces dominated, classifying nearly everything as neutral. With a
few modifications to the weights of the non-neutral classes, we were able to get nearly 100%
accuracy.


Raw Images with SVM using 7 Expressions
Students already exist in database; Classes Weighted

           Anger     Disgust     Fear     Happy      Neutral    Sadness    Surprise
Anger      100% (7)
Disgust              100% (11)
Fear                             90% (9)  10% (1)
Happy                                     100% (18)
Neutral                                              100% (71)
Sadness                                                         100% (15)
Surprise                                                                   100% (16)


Keeping the same weight modifications, we then tested the SVM approach with the external faces
(those that had no relation to our training images). Surprisingly, the results were very
promising: we were able to classify the expression correctly about 90.1% of the time, with an
average accuracy across the expressions of 88.6%!


Raw Images with SVM using 7 Expressions
Students do not exist in database; Classes Weighted

           Anger       Disgust    Fear        Happy     Neutral     Sadness   Surprise
Anger      100% (3)
Disgust                71.4% (5)                        28.6% (2)
Fear                              90.9% (10)            9.1% (1)
Happy                             5% (1)     90% (18)   5% (1)
Neutral                3.45% (1)                        93.1% (27)  3.45% (1)
Sadness    16.67% (2)  8.33% (1)                                    75% (9)
Surprise                                                                      100% (19)
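
As a sanity check, both figures quoted above can be recomputed from the diagonal counts and row
totals of this table: overall accuracy weights each class by its number of test images, while
the average accuracy is the unweighted mean of the per-class rates.

    import numpy as np

    # Correctly classified counts and per-class totals from the table above.
    correct = np.array([3, 5, 10, 18, 27, 9, 19])
    totals = np.array([3, 7, 11, 20, 29, 12, 19])

    print(correct.sum() / totals.sum())   # 91/101 = 0.901 -> "90.1% of the time"
    print(np.mean(correct / totals))      # 0.886 -> "88.6% average accuracy"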


When using the new test data, KNN was unable to make good classifications and produced many
misclassifications: only 55.4% accuracy, with an average accuracy of only 35%.


Raw Images with KNN using 7 Expressions
Students do not exist in database

           Anger      Disgust    Fear   Happy      Neutral      Sadness     Surprise
Anger                                              100% (3)
Disgust               14.3% (1)         14.3% (1)  71.4% (5)
Fear                  9.1% (1)          27.3% (3)  63.6% (7)
Happy                                   65% (13)   35% (7)
Neutral                                            96.55% (28)  3.45% (1)
Sadness    8.33% (1)                               75% (9)      16.67% (2)
Surprise                                           21.1% (4)    26.3% (5)   52.6% (10)



We also tried eigenfaces here, and saw the same results as with the two classes. For test
subjects with a similar expression of their own in the database, the method performed pretty
well, but on new test subjects, nearly all became neutral.


Eigenface Images with SVM using 7 Expressions
Students do not exist in database; Classes Weighted

           Anger   Disgust    Fear   Happy    Neutral      Sadness   Surprise
Anger                                         100% (3)
Disgust            14.3% (1)                  85.7% (6)
Fear                                          100% (11)
Happy                                15% (3)  85% (17)
Neutral                                       100% (29)
Sadness                                       100% (12)
Surprise                                      84.2% (16)             15.8% (3)



Though the results were poor, it is possible that with some modifications to the weights,
altering some other parameters of SVM, and changing the number of eigenfaces, we may be able to
achieve an unbiased classifier using eigenfaces.


Experience

Although we were eventually able to obtain promising results with our facial expression
classification algorithms, reaching this point was not easy. As students completely new to the
field of Computer Vision, it was difficult to come up with a research idea that was based in
Computer Vision and could feasibly be done in three weeks. Being able to discuss our ideas with
the professor and the teaching assistant was very helpful during this portion of the project.
However, the opportunity to embark upon whatever we wanted had an enjoyable aspect to it; we
took this opportunity to let our imaginations mix what we had learned in class with what we
desired to create.


The next portion of the project, coming up with an implementation plan, magnified our Computer
Vision inexperience, which made our TA's (Jiun-Hung Chen) help and advice invaluable! We ended
up choosing a project with very little published research, so we found ourselves coming up with
solutions from scratch. Given our time limitations and lack of experience, the opportunity to go
over our ideas and their feasibility with Jiun-Hung was extremely helpful. Jiun-Hung helped us
focus our three weeks on a couple of promising paths instead of wasting time on a series of
inept algorithms.


With some ideas in hand, we were equipped to face the challenges of implementation, most
significantly implementing Viola-Jones and the machine learners and constructing the databases.
The hardest part of implementing Viola-Jones and the machine learners was learning how to
incorporate and use OpenCV effectively. For example, for the OpenCV machine learners, we had to
become competent with the many complicated functions needed for KNN and SVM despite the lack of
good examples and documentation; in fact, there were no examples of SVM that we could use. This
competency with the OpenCV functions would become especially important during the testing phase
of our project, when we had to tune our expression recognition system, especially the machine
learners. We also had to discover and work around the 'quirks' of OpenCV; for example, a number
of the machine learning functions declared in the .h header files were not actually implemented
in OpenCV. Despite these challenges, we really appreciated having OpenCV as a tool to implement
complex algorithms like Viola-Jones and SVM, as it allowed us to focus our project on actually
trying to solve a problem (classifying facial expressions) instead of just trying to get an
algorithm such as Viola-Jones to work.

Constructing the databases was one of the most time-consuming portions of this project. Perhaps
the most time-consuming part was sorting our sample images into the proper expression
classifications (i.e., Anger, Fear, Disgust, etc.): in our case, around 1500 images for training
and a few hundred more for testing. We then needed to format the images according to what our
facial recognition system expected; the raw image system used JPG images, whereas the eigenfaces
method used TGA images.


After completing the implementation of our system and creating our image databases, the testing
portion turned out to be both the most depressing and the most exhilarating part of the project.
Since we were embarking on a novel research project, we experienced a very large number of
failed attempts at classifying facial expressions before finally arriving at a point where we
were successfully classifying expressions! After connecting all of our project components
(databases, Viola-Jones, machine learners, image processors like eigenfaces), we ran a number of
test sets, and each time we discovered new ways both to break our system and to tune it. At
times, it seemed as if we would never get the system to work correctly on some of our more
complicated test sets. However, after many long hours of head scratching and intense labor,
finally seeing a working system transformed our former grief into amazing exhilaration!


Overall, this project provided a valuable experience in the field of Computer Vision. We were
challenged at every step, from idea formulation to implementation to testing. However, with the
advice of our TA, Jiun-Hung Chen, and many hours spent working through these challenges, we were
able to grow in many aspects of Computer Vision. In the end, finally developing a working
artifact has really encouraged us to continue looking into the field of Computer Vision, perhaps
even as a research area!


Future Work


Our work showed promise that it is possible to take a still image and determine the expression
on the face; however, there is still much work to be done. In the limited time we had, we
provided a couple of baseline examples that can be extended fairly simply given some time. Some
ideas we considered were contours, weighting parts of a face, and possibly a mixture of these.


Contours, as described above, were briefly examined, but we did not pursue them at this time.
Taking contours naively, the result is actually very noisy (for example, we get contour lines
for individual teeth and creases on the face), and it is very difficult to ascertain the
expression the face is making. Even if we decided to look only at the mouth, it is actually
quite hard to see where the mouth is even located! However, contours do hold promise if we can
identify the contour that represents the mouth area.


Another option that could provide better results is to weight certain parts of the image as more
informative than others. For example, much of an expression depends on what the mouth is doing,
and we know that the mouth is in the bottom half of the face (at least for our images). With
these two pieces of information, we can take each face and "enhance" the bottom half before
using it in the classifier, or, in the extreme case, simply cut off the top half of the image.
This would focus the classifier on just the differences in the mouth, and it would be our next
step if we had the time.
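
A minimal sketch of both variants (crop vs. enhance), with a purely illustrative enhancement
factor:

    import numpy as np

    def bottom_half(gray_face):
        # Extreme case: cut off the top half, keeping only the mouth region.
        h = gray_face.shape[0]
        return gray_face[h // 2:, :]

    def enhance_bottom(gray_face, factor=2.0):
        # Milder case: scale up the bottom half before building features.
        out = gray_face.astype(np.float32)
        h = out.shape[0]
        out[h // 2:, :] *= factor
        return out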


Lastly, we only looked at two classification methods, KNN and SVM, a few variations of a
database, and some tuning parameters (such as the number of eigenfaces, weights, K, etc.). There
may be other classifiers that can perform this kind of classification much better than either
KNN or SVM. On the other hand, SVM has many tunable parts that might enhance its ability to
classify the images more accurately. Furthermore, our facial database might not have been best
suited to identifying each expression clearly. By adjusting the images we used to train our
classifiers and tuning some weights, we were able to get much better results; this indicates
that with the right choice of images and parameters, the system would be much more robust to
wide variations of expressions.