Face Recognition in Video Using What-and-Where Fusion Neural Network


Nov 17, 2013


Université du Québec

École de technologie supérieure

Face Recognition in Video Using What-and-Where Fusion Neural Network

Mamoudou Barry and Eric Granger

Laboratoire d’imagerie, de vision et d’intelligence artificielle

École de technologie supérieure

Montreal, Canada


Overview

1. Introduction
2. What-and-Where fusion neural network
3. Experimental methodology
4. Results
5. Conclusion



1. Introduction

Challenges of video-based face recognition

- low quality and resolution of frames
- uncontrolled environments: variation in pose, orientation, expression, illumination, occlusion, etc.



1. Introduction

General system for face recognition in video


1. Introduction

State of the art

1. Methods based on static images: exploit a quality metric and recognize only high-quality ROIs
2. Spatiotemporal approaches: track faces in the environment and recognize individuals over several samples


1. Introduction

Objectives

- Observe the effectiveness of the What-and-Where fusion neural network in video-based face recognition
- Robust operation in uncontrolled environments




2. What-and-Where Fusion Neural Network (Granger et al., 2001)

Division of data streams

1. What data: intrinsic properties of a face (to classifier)
2. Where data: contextual information (to tracker)

[Figure: What-and-Where fusion network. Labels: WHAT data stream, WHERE data stream, Classifier, Tracker, track #, y^ab, Evidence accumulation fields F^e_1 ... F^e_h ... F^e_R (each with nodes 1 ... k ... L), y^e]

2. What-and-Where Fusion Neural Network

Tracker: bank of Kalman filters

- estimates the future position of faces in a scene

Classifier: fuzzy ARTMAP

- classifies faces detected in a scene
- neural network architecture capable of fast, stable, online, unsupervised or supervised, incremental learning, classification and prediction
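
To make the tracker's role concrete, here is a minimal sketch of how one Kalman filter could estimate the future position of a face. It assumes a constant-velocity motion model and tracks only the face centre; the class names (AxisKalman, FaceTrack), the noise values q and r, and the initial covariance are illustrative assumptions, not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (illustrative)."""
    pos: float
    vel: float = 0.0
    # 2x2 state covariance [[p00, p01], [p10, p11]]; initial value is assumed
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])
    q: float = 1e-2  # process-noise variance (assumed)
    r: float = 1.0   # measurement-noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Project the state ahead: P <- F P F^T + qI with F = [[1, dt], [0, 1]]."""
        (p00, p01), (p10, p11) = self.P
        self.pos += dt * self.vel
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain for position and velocity
        resid = z - self.pos
        self.pos += k0 * resid
        self.vel += k1 * resid
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


class FaceTrack:
    """One filter per axis; a bank would hold one FaceTrack per tracked face."""

    def __init__(self, x: float, y: float):
        self.fx, self.fy = AxisKalman(x), AxisKalman(y)

    def step(self, x: float, y: float) -> None:
        # Predict to the current frame, then correct with the detected centre.
        self.fx.predict()
        self.fy.predict()
        self.fx.update(x)
        self.fy.update(y)

    def next_position(self) -> tuple:
        # One-frame look-ahead that leaves the filters unchanged.
        return (self.fx.pos + self.fx.vel, self.fy.pos + self.fy.vel)


# Toy data: a face centre drifting right by 2 px per frame at constant height.
detections = [(10.0 + 2.0 * t, 50.0) for t in range(30)]
track = FaceTrack(*detections[0])
for x, y in detections[1:]:
    track.step(x, y)
predicted = track.next_position()  # estimate for the frame after the last one
```

In a bank of filters, each detected face would be assigned to the track whose predicted position is closest; that assignment step is left out of this sketch.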


2. What-and-Where fusion neural network

Evidence accumulation

[Figure: evidence accumulation fields F^e_1 ... F^e_h ... F^e_R, each with nodes 1, 2, ..., L-2, L-1, L (index k), producing the output y^e]

2. What-and-Where Fusion Neural Network

Sequential evidence accumulation

Fusion of responses from classifier and tracker

1. accumulation rule: $(\mathbf{T}^e_H)' = \mathbf{T}^e_H + \mathbf{y}^{ab}$
2. prediction of the recognition system: $K^e = \arg\max_{k^e} \{ T^e_{Hk} : k^e = 1, 2, \ldots, L^e \}$
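
The two rules above can be sketched in a few lines. This is a minimal illustration that assumes the classifier response y^ab is a vector with one entry per class and that the tracker supplies a track number for each detection; the names accumulate, predict, N_CLASSES and the toy responses are hypothetical. Class indices are 0-based here, whereas the slide counts from 1.

```python
from collections import defaultdict

N_CLASSES = 3  # plays the role of L^e; value chosen only for this example

# T[h] is the accumulation field of track h: one running total per class.
T = defaultdict(lambda: [0.0] * N_CLASSES)


def accumulate(track_id, y_ab):
    """Accumulation rule: add the classifier response to the track's field."""
    field = T[track_id]
    for k, response in enumerate(y_ab):
        field[k] += response
    return field


def predict(track_id):
    """Prediction rule: index of the class with the most accumulated evidence."""
    field = T[track_id]
    return max(range(len(field)), key=field.__getitem__)


# Toy classifier responses for four consecutive frames of track 0.
# The second frame is a misclassification that the accumulation outvotes.
frames = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]
for y_ab in frames:
    accumulate(0, y_ab)
decision = predict(0)
```

Because the decision is taken over the whole track rather than per frame, an isolated wrong response from the classifier does not change the predicted identity.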
 

3. Experimental methodology

Data set (D. Gorodnichy, CNRC, 2005)

Video-based framework for face recognition in video

- Task: recognize the user of a PC
- 11 individuals: 2 video sequences per individual, one dedicated to training and the other to testing




3. Experimental methodology

Data set

- different scenarios: pose, expression, orientation, motion, proximity, resolution and partial occlusion.


3. Experimental methodology

Protocol for experiments

- train: train fuzzy ARTMAP with What data, using two training strategies
  - Hold-Out Validation (HV)
  - Particle Swarm Optimization (PSO) to optimize hyper-parameters (Granger et al., 2007)
- test: classify What data with fuzzy ARTMAP and track Where data with Kalman filters
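
As a rough illustration of the PSO training strategy, the sketch below runs a basic particle swarm over a parameter vector in [0, 1]^d. It is a generic PSO and not the specific procedure of Granger et al. (2007): the inertia and acceleration constants, the swarm size, and the objective toy_validation_error (a stand-in for the validation error of a classifier trained with the given hyper-parameters) are all assumptions made for the example.

```python
import random


def pso(objective, dim, n_particles=20, n_iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective over [0, 1]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position per particle
    pbest_err = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[g][:], pbest_err[g]   # best position of the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each parameter inside its allowed range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            err = objective(pos[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err


def toy_validation_error(params):
    # Hypothetical objective with its minimum at (0.3, 0.7). In practice this
    # would train the classifier with `params` and return its validation error.
    target = (0.3, 0.7)
    return sum((p - t) ** 2 for p, t in zip(params, target))


best, best_err = pso(toy_validation_error, dim=2)
```

Each objective evaluation corresponds to training and validating one network, which is why a swarm-based search costs many more training epochs than a single hold-out run.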


3. Experimental methodology

Performance measures

- accuracy: average classification error (estimate of generalization error)
- resource requirements:
  - compression: average number of training patterns per category
  - convergence time: average number of epochs required to complete learning.
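
The three measures can be computed as below. This is a small sketch with made-up toy trials, not results from the experiments; the function names and the trial dictionary layout are assumptions for illustration.

```python
from statistics import mean


def classification_error(y_true, y_pred):
    """Fraction of test samples whose predicted label differs from the truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def compression(n_training_patterns, n_categories):
    """Training patterns per category created by the classifier."""
    return n_training_patterns / n_categories


# Two toy trials (illustrative values only).
trials = [
    {"y_true": [1, 2, 3, 1], "y_pred": [1, 2, 1, 1],
     "n_patterns": 200, "n_categories": 25, "epochs": 1},
    {"y_true": [1, 2, 3, 1], "y_pred": [1, 2, 3, 1],
     "n_patterns": 200, "n_categories": 20, "epochs": 3},
]

avg_error = mean(classification_error(t["y_true"], t["y_pred"]) for t in trials)
avg_compression = mean(compression(t["n_patterns"], t["n_categories"]) for t in trials)
avg_epochs = mean(t["epochs"] for t in trials)
```

Higher compression means fewer stored categories for the same amount of training data, and therefore lower memory and computation at recognition time.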


4. Results

Examples of Face Detections



4. Results

Average error and compression vs. ROI scaling size (with 100% of training data)


4. Results

Average error and compression vs. training subset size (with |ROI| = 10x10)


4. Results

Average convergence time

- fuzzy ARTMAP with HV: ~1 epoch
- fuzzy ARTMAP with PSO: ~543 epochs (60 particles x ~8.9 iterations x 1 epoch)


4. Results

Average confusion matrix


4. Results

Example of prediction errors over time



5. Conclusion

- The What-and-Where fusion neural network is effective at improving accuracy on complex video data (about 50% over fuzzy ARTMAP alone, and k-NN).
- The system is less sensitive to noise: poor fuzzy ARTMAP predictions are attenuated.
- Optimizing the network's internal parameters with the PSO learning strategy improves the accuracy of the system.
- Fuzzy ARTMAP yields higher compression than k-NN, making it suitable for real-time and resource-limited applications.


6. Future work

- Explore different ARTMAP models to improve the classification rate.
- Explore other representations (features) of faces based on biological visual perception.
- Investigate more robust tracking algorithms, such as the extended Kalman filter, particle filters, etc., for nonlinear tracking.