Where, Who and What?


Where, Who and What? @AIT


Intelligent Affective Interaction

ICANN 2006, Sept. 14, Athens, Greece

Aristodemos Pnevmatikakis, John Soldatos and Fotios Talantzis

Athens Information Technology, Autonomic & Grid Computing

Overview


CHIL


AIT SmartLab


Signal Processing for perceptual components


Video Processing


Audio Processing


Services


Middleware


Easing application assembly

Computers in the Human Interaction Loop


EU FP6 Integrated Project (IP 506909)


Coordinators: Universität Karlsruhe (TH), Fraunhofer Institute IITB


Duration: 36 months


Total project costs: over €24M



Goal: Create environments in which computers serve humans who focus on interacting with other humans, rather than having to attend to and be preoccupied with the machines themselves


Key Research Areas:


Perceptual Technologies


Software Infrastructure


Human-Centric Pervasive Services

AIT SmartLab Equipment


Five fixed cameras (one with a fish-eye lens)


PTZ camera


NIST 64-channel microphone array


4 inverted T-shaped clusters of 4 SHURE microphones each


4 tabletop microphones


6 dual-Xeon 3 GHz PCs with 2 GB RAM


Firewire cables & repeaters

AIT SmartLab


Perceptual Components

Detection and Identification System

[Block diagram] The detector comprises a head detector, an eye detector and a tracker. The recognizer comprises a face normalizer, a face recognizer, a frontal verifier and a confidence estimator. Classifier confidence and frontality confidence feed a weighted voting stage that outputs the ID.
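
The slides give the voting stage only as a block in the diagram; a minimal sketch of how such confidence-weighted voting across frames could work (the function name and the multiplicative weighting are assumptions for illustration, not the authors' stated rule):

```python
from collections import defaultdict

def weighted_vote(frame_results):
    """Fuse per-frame identification results into a single ID.

    frame_results: iterable of (candidate_id, classifier_conf,
    frontality_conf) tuples, one per processed frame, with both
    confidences assumed to lie in [0, 1].
    """
    scores = defaultdict(float)
    for candidate_id, classifier_conf, frontality_conf in frame_results:
        # Frames that are more frontal and more confidently classified
        # contribute more weight to their candidate identity.
        scores[candidate_id] += classifier_conf * frontality_conf
    return max(scores, key=scores.get)

# Example: two frames favor "alice", one frame weakly favors "bob".
print(weighted_vote([("alice", 0.9, 0.8),
                     ("bob", 0.95, 0.3),
                     ("alice", 0.6, 0.7)]))   # -> alice
```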

Unconstrained Video Difficulties

[Example video frames]

Where and Who are the World Cup Finalists? ...and European Champions?

Tracking

[Flowchart] Frames feed the Adaptive Background Module (adaptive background with parameters' adaptation), producing a PPM. In the Evidence Generation Module, edge detection and evidence extraction associate targets with the predicted tracks and decide whether a target split has occurred (split / no split; existing vs. new targets). The Kalman Module performs state prediction and, from the extracted evidence (edges), the measurement update that yields the new state. The Track Consistency Module handles track initialization, track memory and track consistency, and outputs the targets.
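
The Kalman Module's predict/update cycle is the standard one; a generic sketch (the constant-velocity state model in the usage example is an assumption, the slide does not specify it):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : state mean and covariance
    z    : measurement, e.g. a target position extracted by the
           Evidence Generation Module
    F, H : state-transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # State prediction
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Measurement update
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Usage with an assumed 1-D constant-velocity model: state = [pos, vel].
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
x, P = kalman_step(x, P, np.array([1.2]), F, H, Q, R)
```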
Tracking


Smart Spaces

Tracking


3D from Synchronized Cameras

Tracking


Outdoor Surveillance

AIT system placed 2nd in the VACE / NIST surveillance evaluations

Head Detection

[Pipeline diagram as above, with the head detector stage highlighted]

Detection of the head by processing the outline of the foreground belonging to the body
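
The slide states only that the head is found from the outline of the body foreground; one plausible reading, not necessarily the authors' exact method, is to take the highest point of the silhouette as a head-top candidate:

```python
import numpy as np

def head_top_from_foreground(mask):
    """Locate a head candidate from a binary foreground mask.

    mask: 2-D boolean array, True where the body foreground is.
    Returns (row, col) of the silhouette's highest point, a crude
    head-top estimate; a fuller detector would also fit the head
    width from the outline around this peak.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None                            # no foreground at all
    top = rows.min()                           # highest foreground row
    col = int(np.median(cols[rows == top]))    # centre of the topmost run
    return top, col
```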

Eye Detection

[Pipeline diagram as above, with the eye detector stage highlighted]

Vector quantization of colors in the head region

Detect candidate eye regions based on resemblance to skin, brightness, shape and size

Selection amongst candidates based on face geometry
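
A hedged sketch of the final selection step, assuming each candidate region already carries a combined skin/brightness/shape/size score and that geometric plausibility is checked per pair (the separation and levelness thresholds are illustrative, not taken from the slides):

```python
from itertools import combinations

def pick_eye_pair(candidates, head_width):
    """Choose the pair of eye candidates most consistent with face geometry.

    candidates: list of dicts with centroid 'x', 'y' and a 'score' that
    already combines skin resemblance, brightness, shape and size.
    """
    best, best_score = None, float("-inf")
    for a, b in combinations(candidates, 2):
        dx = abs(a["x"] - b["x"])
        dy = abs(a["y"] - b["y"])
        if not 0.3 * head_width < dx < 0.6 * head_width:
            continue                     # implausible eye separation
        if dy > 0.1 * head_width:
            continue                     # eyes should be roughly level
        if a["score"] + b["score"] > best_score:
            best, best_score = (a, b), a["score"] + b["score"]
    return best
```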

Face Recognition from Video

Effect of Eye Misalignment: LDA

[Plot: PMC (%) vs. number of training images per person, for LDA with ideal eyes, ideal eyes for training / detected for testing, and detected eyes for both training and testing]
Effect of Eye Misalignment

[Plot: PMC (%) vs. RMS eye perturbation (%, relative to eye distance) for PCA, PCA w/o 3, LDA, EBGM, Laplacianfaces, MACE and 2D-HMM]
Classifier Fusion

Illumination variations

Pose variations

[Bar charts: PMC (%) for edginess, no preprocessing, feature-vector fusion and post-decision fusion, under illumination and pose variations]

Classifier fusion addresses the fact that different classifiers are optimal for different recognition impairments


Fusion Across Time, Classifiers and Modalities

[Fusion diagram] Faces of an individual collected over 5 seconds are histogram-equalized and fed to PCA and LDA classifiers. Each yields N IDs and confidences (PMC of 60% and 58% respectively). Fusion across time reduces each stream to a single ID and confidence (PMC of 31% and 36%). Fusion across classifiers combines the two into a visual ID and confidence (PMC of 29%). Speech of the individual collected over the same 5 seconds yields an audio ID and confidence (PMC of 9.7%). Fusion across modalities then produces the audiovisual ID (PMC of 6.8%).
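
The slide reports only the PMC at each stage; a minimal sketch of the three fusion stages, assuming a simple accumulate-confidences rule at every level (the actual combination rules are not given, and the per-frame outputs below are hypothetical):

```python
from collections import defaultdict

def fuse(id_conf_pairs):
    """Accumulate confidence per ID; return the winner and its
    normalized confidence."""
    scores = defaultdict(float)
    for person_id, conf in id_conf_pairs:
        scores[person_id] += conf
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

# Hypothetical per-frame outputs over the 5-second window.
pca_frames = [("alice", 0.7), ("bob", 0.4), ("alice", 0.6)]
lda_frames = [("alice", 0.5), ("alice", 0.6), ("carol", 0.3)]
audio_id   = ("alice", 0.8)          # output of the audio classifier

pca_id = fuse(pca_frames)            # fusion across time (PCA stream)
lda_id = fuse(lda_frames)            # fusion across time (LDA stream)
visual_id = fuse([pca_id, lda_id])   # fusion across classifiers
av_id = fuse([visual_id, audio_id])  # fusion across modalities
print(av_id)
```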
Face Recognition @ CLEAR2006

System          15 sec training               30 sec training
Testing (sec)   1      5      10     20       1      5      10     20
AIT             50.57  29.68  23.18  20.22    47.31  31.14  26.64  24.72
UKA             46.82  33.58  28.03  23.03    40.13  23.11  20.42  16.29
UPC             79.77  78.59  77.51  76.40    80.42  77.13  74.39  73.03
New AIT         45.35  27.01  17.65  15.73    43.72  17.76  13.49   7.86

Speaker ID @ CLEAR2006

System          15 sec training               30 sec training
Testing (sec)   1      5      10     20       1      5      10     20
AIT             26.92   9.73   7.96   4.49    15.17   2.68   1.73   0.56
CMU             23.65   7.79   7.27   3.93    14.36   2.19   1.38   0.00
LIMSI           51.71  10.95   6.57   3.37    38.83   5.84   2.08   0.00
UPC             24.96  10.71  10.73  11.80    15.99   2.92   3.81   2.81
AIT IS2006      25.69   5.60   4.50   2.25    15.01   2.19   2.42   0.00

Audiovisual ID @ CLEAR2006

System          15 sec training               30 sec training
Testing (sec)   1      5      10     20       1      5      10     20
AIT             23.65   6.81   6.57   2.81    13.70   2.19   1.73   0.56
UIUC primary    17.61   2.68   1.73   0.56    13.21   2.43   1.38   0.56
UIUC contrast   20.55   5.60   3.81   2.25    15.99   3.41   2.42   1.12
UKA / CMU       43.07  29.20  23.88  20.22    35.73  19.71  16.61  12.36
UPC             23.16   8.03   5.88   3.93    13.38   2.92   2.08   1.12

Audiovisual Tracker


Information
-
theoretic
speaker localization
from mic. array


Accurate azimuth,
approximate depth, no
elevation


Moderate targeting of
speaker’s face using a
PTZ camera


Refine targeting by
visual face detection
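
The information-theoretic localizer itself is not detailed in the slides; for orientation, here is a classical cross-correlation alternative that maps a microphone pair's time difference of arrival to azimuth (a far-field sketch, not the authors' method):

```python
import numpy as np

def azimuth_from_pair(sig_a, sig_b, mic_distance, fs, c=343.0):
    """Estimate source azimuth (degrees from broadside) for a mic pair.

    Cross-correlates the two signals, takes the lag of the correlation
    peak as the time difference of arrival (TDOA), and maps it to an
    angle under the far-field assumption tdoa = mic_distance*sin(theta)/c.
    """
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)   # delay in samples
    tdoa = lag / fs
    sin_theta = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```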

Services

Memory Jog


Memory Jog:


Context-Aware, Human-Centric Assistant for meetings, lectures, presentations


Proactive, Reactive Assistance and Information Retrieval

Features & Functionalities

Sophisticated Situation Modeling / Tracking

Essentially Non-obtrusive Operation

Intelligent Meeting Recording Functionality

GUI also runs on a PDA

Full Compliance with the CHIL Architecture

Integration of actuating devices (Targeted Audio, Projectors)


Context as Network of Situations

Transition   Elements & Components
NIL          Table Watcher (people in table area), SAD
S1           White-Board Watcher (presenter in speaker area), Face ID, Speaker ID
S2           Speaker ID (speaker ID ≠ presenter ID), Speaker Tracking
S3           Face Detection (presenter in speaker area), Face ID, Speaker ID
S2           White-Board Watcher (no face in speaker area for N seconds), Table Watcher (all participants at meeting table)
S4           Table Watcher (nobody in table area)
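
The network of situations is essentially a finite-state machine driven by perceptual-component events; a minimal sketch, assuming the rows above chain NIL → S1 → S2 → S3/S4 and that S4 returns to NIL (the event names paraphrase the table):

```python
# Situation network as a transition table: (state, event) -> next state.
# The S4 -> NIL transition is an assumption; the table leaves it open.
TRANSITIONS = {
    ("NIL", "people_in_table_area"): "S1",
    ("S1", "presenter_in_speaker_area"): "S2",
    ("S2", "speaker_id_differs_from_presenter"): "S3",
    ("S3", "presenter_back_in_speaker_area"): "S2",
    ("S2", "speaker_area_empty_for_n_seconds"): "S4",
    ("S4", "table_area_empty"): "NIL",
}

def step(state, event):
    """Advance the meeting context; unknown events keep the state."""
    return TRANSITIONS.get((state, event), state)

state = "NIL"
for event in ("people_in_table_area", "presenter_in_speaker_area"):
    state = step(state, event)
print(state)   # -> S2
```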

What Happened While I was Away?

Middleware

Virtualized Sensor Access

CHIL-Compliant Perceptual Components


Several sites develop site-, room- and configuration-specific Perceptual Components for CHIL

Provide common abstractions for the input and output of each PC (black box)

Facilitate component exchange across sites & vendors

Standardization commenced with Body Trackers

Continues with Face ID components

Architecture for Body Tracker Exchange

[Architecture diagram] A non-CHIL-compliant body tracker sits behind a sensor abstraction giving transparent connection to the sensor output, and behind the common control API (CHILiX); services complying with the current API perform information retrieval against it.
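
A hedged sketch of the wrapping idea: a non-compliant vendor tracker is hidden behind a common control interface so that services only ever see the agreed API (all method names here are hypothetical; the slides do not spell out CHILiX's actual interface):

```python
from abc import ABC, abstractmethod

class BodyTracker(ABC):
    """Common control API that services program against
    (method names are hypothetical, for illustration only)."""

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def get_targets(self):
        """Return current targets as (target_id, x, y, z) tuples."""

class VendorTrackerAdapter(BodyTracker):
    """Wraps a non-compliant vendor tracker behind the common API."""

    def __init__(self, vendor_tracker):
        self._vendor = vendor_tracker      # hypothetical vendor object

    def start(self):
        self._vendor.begin_capture()       # vendor-specific call

    def stop(self):
        self._vendor.end_capture()

    def get_targets(self):
        # Translate the vendor's native output into the common format.
        return [(t.label, t.pos[0], t.pos[1], t.pos[2])
                for t in self._vendor.poll()]
```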

Thank you!

Questions?