Multi-modal Biometric Verification for Small and Very Small Devices


Secure contracts signed by mobile phone

IST-2002-506883

Jacques Koreman, NTNU

Andrew Morris, Spinvox

International Workshop on Verbal and Nonverbal Communication Behaviours
Vietri sul Mare, 29-31 March 2007




Overview

- Background and application: SecurePhone
- Multimodal biometric recognition
  - face, voice, signature: natural
- For small devices: PDA
  - Good performance, short verification time
  - Security problem
- For very small devices: SIM card
  - Global features to run on slow CPU
  - Short verification time, acceptable performance
- Conclusion
  - Further improvements by glottal feature fusion?
  - Relevance for COST2102


Background: SecurePhone project

- Duration: 01.01.2004 - 30.11.2006
- Aim: a mobile phone with biometric authentication and e-signature support for carrying out secure transactions on the fly
- SecurePhone consortium: Management, Research, Implementation, Exploitation
- Financing: EU 6th Framework Programme, IST


SecurePhone

[Block diagram. Labels: video camera, touch screen and microphone, each with a data capture stage; biometric preprocessor; biometric recogniser; PIN number; SIM card; e-signature manager; GPRS/UMTS.]


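Multimodal biometric recogniser

[Block diagram. Modalities: face, voice, signature. Feature labels: Haar LL4 wavelets, MFCCs, geometric features. Each modality has its own GMM. The “biometric recogniser” uses the user profile and the world model; its outcome is either "reject user" or "accept user", the latter followed by "release private key". A note reads "Later: HL4", next to the wavelet subband labels LL, HL, LH, HH.]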
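The slides do not state how the GMM scores lead to the accept/reject decision. A common approach in GMM-based verification is a log-likelihood ratio between the user model and the world model, compared against a threshold. The following is a minimal sketch under that assumption; the diagonal covariances, the function names, the toy models and the threshold of 0.0 are all hypothetical and not taken from SecurePhone.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def verify(frames, user_gmm, world_gmm, threshold=0.0):
    """Accept if the mean per-frame log-likelihood ratio exceeds the threshold."""
    llr = sum(
        gmm_log_likelihood(f, user_gmm) - gmm_log_likelihood(f, world_gmm)
        for f in frames
    ) / len(frames)
    return llr > threshold, llr

# Toy models (hypothetical): a one-component user model, a two-component world model.
user_gmm = [(1.0, [1.0, 1.0], [1.0, 1.0])]
world_gmm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]

accepted, score = verify([[1.1, 0.9], [0.8, 1.2]], user_gmm, world_gmm)
rejected, _ = verify([[5.0, 5.0]], user_gmm, world_gmm)
print(accepted, rejected)
```

Note that every frame is scored against every Gaussian of both models, which is what makes frame-level scoring expensive on a slow processor.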


PDA: fusion results for PDAtabase

DET curves/result table for 5-digit (left), 10-digit (middle) and phrase prompts (right). Values are EER in percent.

Modality        5-digit  10-digit   Phrase
Voice              7.21      3.24     5.54
Face              28.40     27.55    28.33
Signature          8.01  (single value given)
Fusion (mean)      2.39      1.54     2.30
Fusion (sd)        0.96      0.83     1.85

Marcos Faundez-Zanuy: Face recognition: an unsolved problem
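The figures in these result tables are equal error rates (EER): the operating point at which the false-acceptance rate equals the false-rejection rate. As a minimal sketch of how such a figure can be approximated from verification scores, the function below sweeps a threshold over the observed scores. The scores are invented for illustration and the function name is hypothetical; the slides do not say which tool produced the reported values.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores
    and returning the point where FAR and FRR are closest to each other."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # clients rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Invented scores: higher means "more likely the claimed user".
genuine_scores = [2.1, 1.7, 0.4, 2.5, 1.9]
impostor_scores = [-1.2, 0.6, -0.3, 0.1, -2.0]

eer = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {100 * eer:.2f}%")
```

With so few scores the estimate is coarse; a DET curve plots the same two error rates against each other over the whole threshold range.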


From small to very small devices: problem

- Biometric data cannot be stored or processed on the PDA, because impostors could steal them.
- Storage and processing must therefore take place on the SIM card, which self-destructs when physically tampered with.
- Instead of a few seconds on the PDA, verification on the SIM card takes one hour!
- Bottleneck: the large number of comparisons in voice and signature verification (for both the client model and the UBM)
  - for a large number of frames per prompt
  - for a large number of Gaussian components in each GMM
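The bottleneck is multiplicative: every frame is compared with every Gaussian of both the client model and the UBM. A small arithmetic sketch makes the scaling visible. The frame and component counts are hypothetical, chosen only for illustration; the slides do not give the actual sizes.

```python
def gaussian_evaluations(n_frames, n_components, n_models=2):
    """Number of Gaussian density evaluations needed to score one prompt.

    n_models=2 covers the client model and the UBM.
    """
    return n_frames * n_components * n_models

# Hypothetical sizes, for illustration only.
baseline = gaussian_evaluations(n_frames=500, n_components=32)
half_frame_rate = gaussian_evaluations(n_frames=250, n_components=32)
half_components = gaussian_evaluations(n_frames=500, n_components=16)

print(baseline, half_frame_rate, half_components)
print("speed-up from halving the frame rate:", baseline / half_frame_rate)
```

Halving either factor only halves the work, so modest reductions of this kind leave the cost in the same order of magnitude.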



From small to very small devices: solution

- Reducing the frame rate or the number of Gaussian components cannot reduce the processing time by a sufficient order of magnitude.
- Drastic solution: globalised features (an idea taken from static signature representations)
- Means (cf. Long-Term Average Spectrum for voice) and standard deviations per vector parameter across all frames; far fewer Gaussians are then needed to model the vectors.
- To counteract the effect of averaging, compute globalised features for subparts of the signal.

Marcos Faundez-Zanuy: "Open your mind: sometimes a simple solution can give a good result" (and sometimes you cannot get around it)
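The globalised features described above can be sketched as follows: each prompt, however many frames it contains, is collapsed into one fixed-length vector of per-parameter means and, optionally, standard deviations, computed either over the whole signal or over equal subparts. This is a minimal illustration, not the SecurePhone implementation; the function name, the use of the population standard deviation and the toy frames are assumptions.

```python
import statistics

def globalised_features(frames, n_subparts=1, use_sd=True):
    """Collapse a sequence of frame vectors into one fixed-length vector.

    For each of n_subparts equal-length segments, compute the mean (and
    optionally the standard deviation) of every vector parameter across
    the frames of that segment, then concatenate the results.
    """
    n = len(frames)
    features = []
    for k in range(n_subparts):
        segment = frames[k * n // n_subparts:(k + 1) * n // n_subparts]
        columns = list(zip(*segment))  # one tuple per vector parameter
        features.extend(statistics.fmean(c) for c in columns)
        if use_sd:
            # Population standard deviation: an assumption, not from the slides.
            features.extend(statistics.pstdev(c) for c in columns)
    return features

# Toy signal: four frames with two parameters each.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 14.0], [7.0, 18.0]]

whole = globalised_features(frames)                 # 2 means + 2 sds
halves = globalised_features(frames, n_subparts=2)  # the same, per half
print(len(whole), len(halves))
```

The model then has to score a single vector per prompt instead of one vector per frame, at the price of discarding the frame-level detail; splitting the signal into subparts retains a little of it.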


PDA results

EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9)

Global feat.   Means only                  Means + sd
#Gauss.            1      2      4      8      1      2      4      8
Voice          28.20  30.08  30.36  32.08  22.78  22.55  24.41  25.71
Face           32.26  31.78  29.06  29.19  32.26  31.78  29.06  29.19
Signature      37.26  29.28  27.15  26.25  28.34  26.60  21.27  19.21
Fused          17.95  17.16  14.83  15.01  13.68  12.35  10.05  10.31



SIM card results

EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9) for voice and signature divided into two equal subparts

Global feat.   Means only                  Means + sd
#Gauss.            1      2      4      8      1      2      4      8
Voice          22.13  21.09  20.87  21.86  20.88  19.72  17.68  18.49
Face           32.26  31.78  29.06  29.19  32.26  31.78  29.06  29.19
Signature      38.29  27.58  22.58  17.86  28.14  22.16  17.59  16.45
Fused          12.89  12.48  10.49   9.32  12.56  10.48   8.28   9.15


Improvement needed

- Performance drop:
  - PDA: EER 2.39% (meanwhile improved to 0.9%)
  - SIM: EER 10.05% (8.28% for two equal subparts)
- Performance can be improved if the GMMs are not constrained to be the same across all modalities.
- Otherwise: use of complementary features within a modality
  - Face: simple geometric face variables
  - Voice: parameter values of the LF model fitted to the glottal flow derivative, obtained by inverse filtering of the microphone signal


Relevance to this COST action

- Interest in the glottal flow derivative for speaker recognition stems from
  - its expected complementarity to the MFCC representation of the spectrum
  - its applicability in applications which use very little training data (as in SecurePhone, for user-friendliness)
- But it can also be useful for other classification problems, like “the recognition of emotional states, gesture, speech and facial expressions, in anticipation of the implementation of useful application such as intelligent avatars and interactive dialog systems” (quote from the aims page of this workshop's website)


Last night’s addendum: speech & gestures

- Source signal parameters can also be used, together with other spectral parameters as well as F0, duration and loudness measures, to signal prominence.
- In speech, these signals can be used differently across languages (syllable-timed vs. stress-timed) and speakers (German Research Council “rhythm project” led by Bill Barry, Saarland University, to which NTNU contributes Norwegian database recordings and analyses).
- Prominence is also signalled by the extent/size as well as the acceleration of gestures.
- To what extent do gestures and speech signal parameters correlate? When are they used as complementary/alternative strategies for signalling prominence?


Thank you for your attention.