Speaker Identification, Speaker Verification




2002

VIU Oct 2007 : Speaker Recognition

1

F. Schiel

Florian Schiel

Venice International University

Oct 2007

Speaker Recognition =

Speaker Identification, Speaker Verification




Agenda


- See the Context
- Speech Recognition vs. Speaker Recognition
- Speaker Identification vs. Speaker Verification
- Speaker Recognition: Basics
- Speaker Verification using HMM
- Discussion
- and then ...




General Approach to Authentication

Three general ways to perform authentication:
- proof of knowledge (e.g. password),
- proof of possession (e.g. chip card),
- proof of property (biometrics),
and their combinations.

Biometrics: physiologically based vs. behaviourally based

Biometric features:
fingerprint, iris scan, facial scan, hand geometry, signature, voice


from U. Türk 2007




Biometric Features: General Requirements

Requirement                                          Rating
universal: can be found in any user                  ++
unique: even for identical twins                     +
measurable: does not require human evaluation        ++
robust to short-term and long-term variability       +
low dimensionality                                   o
robust to changing environment                       oo
robust to impersonation                              +

from U. Türk 2007





Taxonomy

[Diagram: Speech Processing, divided into Natural Language Processing (NLP) and Spoken Language Processing (SLP). Topics shown: Lexica, Syntax, Parsing, Spellers, Search / Indexing, Semantics, Terminology, Thesaurus, Dialogue systems, Speech Identification, Speech Synthesis, Speaker recognition, Speech Recognition, Forensics]




Speech Recognition: "Decode the spoken content from the acoustic signal"

Speaker Recognition: "Determine the identity of a speaker from the acoustic signal"

[Diagram: ASR uses speech models and outputs the spoken text (example: "Sehr geehrter ..", German for "Dear ..."). SI/SV uses speaker characteristics and a claimed identity and outputs "Accepted" / "Rejected" or an ID.]




Speaker Verification
- Authentication against a claimed identity
- Result is binary: "accept" / "reject"
- Scaling: effort independent of the number of participants
- Accuracy: dependent on the size of the enrolment data

Speaker Identification
- Identification from a limited number of participants
- Result is a speaker identity
- Scaling: effort increases linearly with the number of participants
- Accuracy: dependent on
  + size of enrolment data
  + number of participants

Verification outcomes (claimed identity vs. decision):

                   accept            reject
identity ok        correct accept    false reject
identity wrong     false accept      correct reject

[Figure: correctness of speaker recognition (scale up to 100) plotted against the number of participants N]
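The difference in scaling between verification and identification can be sketched in Python. This is a minimal illustration under stated assumptions: `score` stands for a hypothetical function rating how well an utterance matches an enrolled speaker model (higher is better), and the toy "models" are single mean values, not real speaker models. Verification needs one comparison with the claimed identity; identification compares against all N models, so its effort grows linearly with N. The `outcome` helper maps a trial onto the four cells of the outcome table.

```python
from typing import Callable, Dict, Sequence

# Hypothetical score: how well the features match the model of speaker `sid` (higher = better).
ScoreFn = Callable[[Sequence[float], str], float]


def verify(features: Sequence[float], claimed_id: str, score: ScoreFn, threshold: float) -> bool:
    """Speaker verification: a single comparison with the claimed identity's model."""
    return score(features, claimed_id) > threshold


def identify(features: Sequence[float], speaker_ids: Sequence[str], score: ScoreFn) -> str:
    """Speaker identification: one comparison per enrolled speaker, so effort grows linearly with N."""
    return max(speaker_ids, key=lambda sid: score(features, sid))


def outcome(identity_ok: bool, accepted: bool) -> str:
    """Label a verification trial with one of the four cells of the outcome table."""
    if identity_ok:
        return "correct accept" if accepted else "false reject"
    return "false accept" if accepted else "correct reject"


# Toy enrolment (illustration only): each speaker is reduced to one mean feature value.
MODELS: Dict[str, float] = {"anna": 1.0, "ben": 5.0, "carl": 9.0}


def toy_score(features: Sequence[float], sid: str) -> float:
    mean = sum(features) / len(features)
    return -abs(mean - MODELS[sid])


if __name__ == "__main__":
    utterance = [4.8, 5.2]
    print(identify(utterance, list(MODELS), toy_score))          # ben
    print(verify(utterance, "anna", toy_score, threshold=-1.0))  # False
```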





Speaker Verification
Applications:
- Access control
- Verification of identity via the phone
- Automatic teller machines
- Password resetting
- Banking: identity for new accounts etc.
- Protection against theft (cars ...)

Speaker Identification
Applications:
- Forensics
- Police work
- Automatic user settings
- Speaker classification: advertising




Speaker Verification: Doddington's Zoo (1)

User = registered speaker, Impostor = non-registered speaker

- Goats: users that are often wrongly rejected (increasing 'false reject' errors)
- Lambs: users that are easily imitated (increasing 'false accept' errors)
- Sheep: users that 'behave' (neither goats nor lambs)
- Wolves: particularly successful impostors (increasing 'false accept' errors)


from Doddington 1998
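Doddington's categories are defined by per-speaker error rates, so they can be approximated from a list of verification trials. The sketch below is an illustration under assumptions: each trial is a hypothetical `(target, speaker, accepted)` tuple, and the 20 % limits `fr_limit` / `fa_limit` are arbitrary illustration values, not thresholds taken from Doddington 1998.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (target = claimed identity / model tested, speaker = who actually spoke, accepted?)
Trial = Tuple[str, str, bool]


def zoo(trials: Iterable[Trial], fr_limit: float = 0.2, fa_limit: float = 0.2) -> Dict[str, Set[str]]:
    """Sort speakers into goats, lambs, sheep and wolves by their error rates."""
    fr = defaultdict(lambda: [0, 0])       # user -> [false rejects, genuine trials]
    fa_user = defaultdict(lambda: [0, 0])  # user -> [false accepts, impostor trials against the user]
    fa_imp = defaultdict(lambda: [0, 0])   # impostor -> [false accepts, attempts made]
    for target, speaker, accepted in trials:
        if target == speaker:
            fr[target][0] += int(not accepted)
            fr[target][1] += 1
        else:
            fa_user[target][0] += int(accepted)
            fa_user[target][1] += 1
            fa_imp[speaker][0] += int(accepted)
            fa_imp[speaker][1] += 1

    def above(table, limit):
        return {name for name, (errors, n) in table.items() if n and errors / n > limit}

    goats = above(fr, fr_limit)        # often wrongly rejected
    lambs = above(fa_user, fa_limit)   # easily imitated
    wolves = above(fa_imp, fa_limit)   # particularly successful impostors
    users = set(fr) | set(fa_user)
    sheep = users - goats - lambs      # neither goats nor lambs
    return {"goats": goats, "lambs": lambs, "sheep": sheep, "wolves": wolves}
```

In practice such rates are only meaningful with many trials per speaker; with a handful of trials, one error moves a speaker across the limit.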




Speaker Verification: Doddington's Zoo (2)

Wolves may make zero-effort or active impostor attempts to break into an SV system.

Problem:
Speaker verification databases do not contain data from active impostor attempts by wolves
-> most technical evaluations are based on unrealistic data!




Technical Speech Processing

[Diagram: analog signal -> anti-aliasing filter -> A / D -> digital signal -> high pass -> feature detection -> feature vectors (m_1 ... m_N, computed at times 10, 20, ...) -> decoder -> symbols: text, action, semantics (examples shown: "Call Richard!", "Radio off!", "216")]




Speaker Verification: Basics (1)

[Diagram: anti-aliasing filter -> A / D -> high pass -> feature detection -> verification -> "Accept" / "Reject". The claimed identity (from a PIN, a fingerprint or ASR) selects the ID, i.e. which of the speaker models is used for verification.]




Speaker Verification: Basics (2)

Anti-aliasing filter + A / D:
- analog low pass filter with cut-off at f_sam/2 to avoid aliasing effects
- followed by the analog-digital converter

[Diagram: anti-aliasing filter -> A / D -> high pass -> feature detection -> verification -> "Accept" / "Reject"]
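Why the analog low pass must cut off at f_sam/2 can be checked numerically: after sampling, a pure tone above f_sam/2 is indistinguishable from a tone folded back below it. A minimal sketch, assuming an ideal sampler and no anti-aliasing filter; the 8 kHz sampling rate in the example is an illustrative choice.

```python
def alias_frequency(f: float, f_sam: float) -> float:
    """Apparent frequency (Hz) of a pure tone f after sampling at f_sam without anti-aliasing.

    Frequencies are folded into the range 0 ... f_sam/2 (the Nyquist frequency).
    """
    f_mod = f % f_sam
    return min(f_mod, f_sam - f_mod)


if __name__ == "__main__":
    for tone in (1000.0, 3000.0, 5000.0, 7000.0):
        print(tone, "->", alias_frequency(tone, 8000.0))
```

With f_sam = 8000 Hz, a 5000 Hz component would appear as 3000 Hz and corrupt the features; the anti-aliasing filter removes it before the A / D converter.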




Speaker Verification: Basics (3)

Features:
- speaker specific
- robust against noise
- partly long term

Extraction of speaker characteristics (feature computation): a 25 ms window is moved along the signal, and a feature vector m_1 ... m_N is computed at each position (10, 20, 30, 40, ...).

[Diagram: anti-aliasing filter -> A / D -> high pass -> feature detection -> verification -> "Accept" / "Reject"]
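The front end described above (high pass, then a 25 ms window moved along the signal) can be sketched as follows. Assumptions: the high pass is implemented as first-order pre-emphasis with the commonly used coefficient 0.97, and the window shift is taken to be 10 ms to match the vector positions 10, 20, 30, 40 on the slide; the feature computation per window itself is left out.

```python
from typing import List, Sequence


def pre_emphasis(x: Sequence[float], a: float = 0.97) -> List[float]:
    """First-order high pass y[n] = x[n] - a * x[n-1]; a = 0.97 is an assumed, commonly used value."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def frames(x: Sequence[float], f_sam: float, win_ms: float = 25.0,
           shift_ms: float = 10.0) -> List[Sequence[float]]:
    """Cut the signal into overlapping windows; one feature vector m_1 ... m_N is later computed per window."""
    win = int(round(f_sam * win_ms / 1000.0))
    shift = int(round(f_sam * shift_ms / 1000.0))
    if len(x) < win:
        return []
    return [x[start:start + win] for start in range(0, len(x) - win + 1, shift)]


if __name__ == "__main__":
    signal = [0.0] * 8000                      # 1 s of silence at 8 kHz (placeholder data)
    windows = frames(pre_emphasis(signal), 8000.0)
    print(len(windows), len(windows[0]))       # 98 200
```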




Speaker Verification: Basics (4)

[Diagram: anti-aliasing filter -> A / D -> high pass -> feature detection -> vector sequence S (m_1 ... m_N at 10, 20, ...) -> verification against the speaker model of the claimed ID -> decision]

Decision:
p(S | ID) > threshold  ->  "Accept"
p(S | ID) < threshold  ->  "Reject"
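The decision rule p(S | ID) > threshold can be illustrated with a deliberately simple speaker model. This is a sketch under assumptions: instead of an HMM, the hypothetical model of the claimed ID is a single Gaussian with diagonal covariance, frames are treated as independent, and the log-likelihood is averaged per frame so that one threshold can serve utterances of different lengths.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical speaker model: (mean vector, variance vector) of a diagonal Gaussian.
Model = Tuple[List[float], List[float]]


def log_gauss_diag(m: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of one feature vector m under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
               for x, mu, v in zip(m, mean, var))


def score(S: Sequence[Sequence[float]], model: Model) -> float:
    """Average per-frame log p(S | ID), assuming independent frames."""
    mean, var = model
    return sum(log_gauss_diag(m, mean, var) for m in S) / len(S)


def decide(S: Sequence[Sequence[float]], model: Model, threshold: float) -> str:
    return "Accept" if score(S, model) > threshold else "Reject"


if __name__ == "__main__":
    claimed = ([0.0, 0.0], [1.0, 1.0])
    print(decide([[0.1, -0.2], [0.0, 0.3]], claimed, threshold=-3.0))  # Accept
    print(decide([[3.0, 3.0], [2.5, 3.5]], claimed, threshold=-3.0))   # Reject
```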




Speaker Verification: Tuning

Error types are highly dependent on the threshold:
- high security -> false accept low, false reject high
- user friendly -> false reject low, false accept high

[Figure: false accept and false reject rates plotted against the threshold; the two curves cross at the Equal Error Rate]

Both errors increase with:
- channel disturbance
- crosstalk
- noise
- room acoustics

Solution:
- multiple enrolments
- adaptive learning
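The trade-off between false accepts and false rejects, and the Equal Error Rate where the two curves cross, can be computed from verification scores. A minimal sketch, assuming scores where higher means "more likely the claimed speaker" and acceptance when a score exceeds the threshold; the EER is approximated by sweeping the threshold over the observed scores, and the scores are toy data.

```python
from typing import Sequence, Tuple


def error_rates(targets: Sequence[float], impostors: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false reject rate, false accept rate); a trial is accepted if score > threshold."""
    fr = sum(s <= threshold for s in targets) / len(targets)
    fa = sum(s > threshold for s in impostors) / len(impostors)
    return fr, fa


def equal_error_rate(targets: Sequence[float], impostors: Sequence[float]) -> Tuple[float, float]:
    """Sweep the threshold over all observed scores and return (threshold, EER estimate)."""
    best = None
    for t in sorted(set(targets) | set(impostors)):
        fr, fa = error_rates(targets, impostors, t)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, t, (fr + fa) / 2.0)
    return best[1], best[2]


if __name__ == "__main__":
    genuine = [2.0, 3.0, 4.0, 5.0]    # scores of true speakers (toy data)
    impostor = [0.0, 1.0, 2.5, 6.0]   # scores of impostors (toy data)
    print(error_rates(genuine, impostor, 1.0))   # user friendly: (0.0, 0.5)
    print(error_rates(genuine, impostor, 5.0))   # high security: (1.0, 0.25)
    print(equal_error_rate(genuine, impostor))   # (2.5, 0.25)
```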




Speaker Verification: Score Normalisation (1)

Problem: How to set the optimal threshold?

HMMs yield the likelihood p(O | λ), with
O: observation = sequence of features
λ: speaker model

Bayes:

$$P(\lambda \mid O) = \frac{p(O \mid \lambda)\, P(\lambda)}{P(O)}$$

but p(O | λ) depends on various factors.



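Speaker Verification: Score Normalisation (2)

Solution: Bayesian decision rule. Accept if

$$C_{FR}\, P(\lambda \mid O) > C_{FA}\, P(\bar{\lambda} \mid O)$$

where λ̄ denotes the impostors (any speaker other than λ) and C_FR, C_FA are the cost functions of a false reject and a false accept.

With Bayes (P(O) cancels) and the log applied to both sides, this leads to:

$$\log p(O \mid \lambda) - \log p(O \mid \bar{\lambda}) > \log \frac{C_{FA}\, P(\bar{\lambda})}{C_{FR}\, P(\lambda)} = \text{threshold}$$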
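The threshold on the right-hand side can be evaluated directly from the costs and the prior. A minimal sketch, assuming P(λ̄) = 1 − P(λ) and a log-likelihood ratio `llr` that has already been computed; a larger C_FA (the "high security" setting) raises the threshold and therefore lowers false accepts.

```python
import math


def bayes_threshold(c_fa: float, c_fr: float, p_speaker: float) -> float:
    """log( C_FA * P(lambda_bar) / (C_FR * P(lambda)) ), assuming P(lambda_bar) = 1 - P(lambda)."""
    return math.log(c_fa * (1.0 - p_speaker) / (c_fr * p_speaker))


def accept(llr: float, c_fa: float, c_fr: float, p_speaker: float) -> bool:
    """llr = log p(O | lambda) - log p(O | lambda_bar); accept if it exceeds the Bayes threshold."""
    return llr > bayes_threshold(c_fa, c_fr, p_speaker)


if __name__ == "__main__":
    print(bayes_threshold(1.0, 1.0, 0.5))    # 0.0: equal costs, equal priors
    print(bayes_threshold(10.0, 1.0, 0.5))   # log 10: false accepts ten times as costly
```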




Speaker Verification: Score Normalisation (3)

Often assumed: costs are equal and speakers are equally distributed, i.e. C_FA = C_FR and P(λ) = 1/N, which gives

$$\log p(O \mid \lambda) - \log p(O \mid \bar{\lambda}) > \log (N - 1), \qquad N: \text{number of users}$$

p(O | λ̄) (the impostors) is estimated using a world or cohort model:
- world model: speaker model trained on all speakers
- cohort model: speaker model trained on a group of the most competing models (wolves)
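With equal costs and N equally likely users, verification reduces to comparing a normalised score with log(N − 1). The sketch below approximates p(O | λ̄) with a world model; both models are hypothetical one-dimensional Gaussians (mean, variance) standing in for real speaker models, and the feature values are toy data.

```python
import math
from typing import Sequence, Tuple

Gauss = Tuple[float, float]  # hypothetical 1-D model: (mean, variance)


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def normalised_score(O: Sequence[float], speaker: Gauss, world: Gauss) -> float:
    """log p(O | lambda) - log p(O | lambda_bar), with lambda_bar approximated by a world model."""
    return sum(log_gauss(x, *speaker) - log_gauss(x, *world) for x in O)


def verify(O: Sequence[float], speaker: Gauss, world: Gauss, n_users: int) -> bool:
    """Equal costs and equally distributed speakers: threshold = log(N - 1)."""
    if n_users < 2:
        raise ValueError("need at least two users")
    return normalised_score(O, speaker, world) > math.log(n_users - 1)


if __name__ == "__main__":
    speaker_model, world_model = (2.0, 1.0), (0.0, 4.0)
    print(verify([2.1, 1.8, 2.3], speaker_model, world_model, n_users=10))     # True
    print(verify([-1.5, -2.0, -0.5], speaker_model, world_model, n_users=10))  # False
```

A cohort model would be used the same way, with `world` replaced by a model trained only on the most competing speakers.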



Speaker Verification: Enrolment

Method: fixed, pre-specified sentence, e.g. "My voice is my password"
Enrolment: speak sentence 3-5 times
Remarks: sentence may be intercepted and played back

Method: fixed, selectable sentence, e.g. maiden name of grandmother
Enrolment: speak sentence 3-5 times
Remarks: additional security by content

Method: changing number triplets, e.g. fifteen, thirty-nine, seventy-three
Enrolment: speak each number 3-5 times
Remarks: high security by many possible combinations

Method: system generates a new sentence for each verification
Enrolment: speak each phoneme 3-5 times
Remarks: elaborate enrolment, high processing effort, very high security
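The remark "high security by many possible combinations" for changing number triplets can be quantified: enrolment covers each number, while every verification uses a new combination. A minimal sketch, assuming prompts are three numbers drawn uniformly from the two-digit range 10 to 99 (a hypothetical choice consistent with the example "fifteen, thirty-nine, seventy-three"); under that assumption a single intercepted recording matches a freshly drawn prompt with probability 1/729000.

```python
import random
from typing import List

# Hypothetical prompt vocabulary: the two-digit numbers 10 ... 99 (each enrolled 3-5 times).
NUMBERS = range(10, 100)


def new_prompt(rng: random.Random, k: int = 3) -> List[int]:
    """Draw a fresh number triplet for each verification attempt."""
    return [rng.choice(NUMBERS) for _ in range(k)]


def n_combinations(k: int = 3) -> int:
    """Number of distinct ordered prompts of k numbers."""
    return len(NUMBERS) ** k


if __name__ == "__main__":
    rng = random.Random(42)
    print(new_prompt(rng))
    print(n_combinations())   # 729000
```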




Speaker Verification: HMM types

Method                                                 Model
pre-specified sentence                                 linear
recombination of segments taken from enrolment data    piecewise linear
modeling without time structure                        ergodic

[Further columns on the slide: Security, Accuracy]
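The HMM types differ mainly in which state transitions are allowed. A minimal sketch of the two extreme topologies, assuming uniform placeholder probabilities (real values would be estimated from enrolment data): a linear (left-to-right) model enforces the time order of a pre-specified sentence, while an ergodic model lets every state follow every other state, i.e. it imposes no time structure.

```python
from typing import List

Matrix = List[List[float]]


def linear_hmm(n: int) -> Matrix:
    """Left-to-right topology: each state may repeat or advance to the next state only."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            A[i][i] = 0.5        # stay (placeholder probability)
            A[i][i + 1] = 0.5    # advance (placeholder probability)
        else:
            A[i][i] = 1.0        # final state
    return A


def ergodic_hmm(n: int) -> Matrix:
    """Fully connected topology: every state can follow every state, so no time order is imposed."""
    return [[1.0 / n] * n for _ in range(n)]


if __name__ == "__main__":
    for row in linear_hmm(3):
        print(row)
    for row in ergodic_hmm(3):
        print(row)
```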




Speaker Verification: Features (1)

Variable signal characteristics:
- often required: telephone band 300-3300 Hz (higher resonances cut off)
- changing channel characteristics, caused by transmission line, handset, distance to mouth
- static and intermittent noise
- user: health, intoxication, fatigue




Speaker Verification: Features (2)

Candidates determined by physiology:
- fundamental frequency, average
- wave form of vocal folds, shimmer, jitter, irregularities
- formants: average and dynamics
- places of articulation: fricatives, plosives
- nasal cavity resonance
- sub-glottal resonance
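Jitter and shimmer describe cycle-to-cycle irregularities of the vocal fold vibration. A minimal sketch, assuming the common "local" definitions (mean absolute difference between consecutive cycles divided by the mean, as used e.g. in Praat) and that glottal period durations and peak amplitudes have already been extracted; that extraction step is not shown.

```python
from typing import Sequence


def _local_variation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_ms: Sequence[float]) -> float:
    """Jitter (local) from consecutive glottal period durations."""
    return _local_variation(periods_ms)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Shimmer (local) from the peak amplitudes of consecutive periods."""
    return _local_variation(amplitudes)


if __name__ == "__main__":
    print(local_jitter([8.0, 8.0, 8.0, 8.0]))   # 0.0: perfectly regular voice
    print(local_jitter([8.0, 8.4, 7.8, 8.2]))
```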





Speaker Verification: Features (3)

Candidates determined by behaviour:
- voiced/unvoiced ratio
- fundamental frequency, dynamics
- syllable rate, pause/speech ratio
- dialectal features: vowel quality

Candidates determined by speech technology:
- Linear Predictor Coefficients (LPC)
- filter bank, Bark filter bank, Mel filter bank
- Cepstrum, Mel-Cepstrum
- (derivatives with respect to time)
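Two of the speech-technology features can be sketched briefly: the Mel scale that spaces the Mel filter bank, and the time derivatives ("deltas") appended to cepstral vectors. Assumptions: the Mel formula is the widely used variant mel = 2595 · log10(1 + f/700) (other variants exist), and deltas are computed by the common linear regression over ±2 neighbouring frames, with edge frames repeated.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    """Mel value of frequency f (Hz), using mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def deltas(frames: Sequence[Sequence[float]], n: int = 2) -> List[List[float]]:
    """Time derivative of each coefficient by regression over +-n frames (edge frames repeated)."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, T - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    print([round(hz_to_mel(f)) for f in (300.0, 1000.0, 3300.0)])
    ramp = [[float(t)] for t in range(7)]   # one coefficient rising by 1 per frame
    print(deltas(ramp)[3])                  # [1.0]
```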




Speaker Verification: Road Map

[Timeline: 1990, today, 2010, 2020]

- access control for security areas
- authentication over the telephone
- devices "recognise" their users
- speaker profile on chip cards
- access control for keyboardless PDAs
- authentication in the background
- public speaker profiles
- automatic alcohol test in the vehicle




Thank You!