
Biometric Authentication Revisited:
Understanding the Impact of Wolves
in Sheep's Clothing

Lucas Ballard, Fabian Monrose, Daniel Lopresti

USENIX Security Symposium, 2006

Presenter: Tao Li


To argue that the prevailing assumption, that forgers are minimally motivated and attacks can only be mounted by hand, is too optimistic and even dangerous

To show that the standard approach to evaluation significantly overestimates the security of a handwriting-based key generation system

What did the authors do?

In this paper, the authors describe their
initial steps toward developing evaluation
methodologies for behavioral biometrics
that take into account threat models
which have largely been ignored.

Presented a generative attack model
based on concatenative synthesis that can
provide a rapid indication of the security
afforded by the system.


Background Information

Experimental Design

Human Evaluation

Generative Evaluation


Background Information

Obtaining human input as a system
security measure

Assumed not to be reproducible by attackers

E.g., passwords

Online attacks

Limited to a small number of wrong guesses

Offline attacks

Limited only by the attacker's resources: time and memory

When passwords are used to derive cryptographic keys, they become susceptible to offline attacks
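The offline threat mentioned above can be sketched in a few lines: once an attacker holds the derived key (or anything that lets a guess be tested), nothing rate-limits the guessing. The PBKDF2 parameters, salt, and wordlist below are illustrative only, not from the paper.

```python
# Sketch: offline dictionary attack on a password-derived key.
# The attacker is limited only by time and memory, not by a lockout.
import hashlib

def derive_key(password, salt=b"salt", iterations=1000):
    # Derive a key from a password (parameters are illustrative)
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

# Defender: key derived from the real password
target_key = derive_key("sunshine")

def crack(target, wordlist):
    # Attacker: test candidate passwords offline, at full speed
    for word in wordlist:
        if derive_key(word) == target:
            return word
    return None

print(crack(target_key, ["letmein", "password", "sunshine"]))  # → "sunshine"
```

A biometric aims to add input that an attacker cannot simply enumerate this way.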

What is a biometric?

An alternative form of user input intended
to be difficult for attackers to reproduce

A technique for a user to authenticate
to a reference monitor based on a biometric

A means of generating user
cryptographic keys. Can it survive offline attacks?

Not certain

Password hardening: password + biometric
What makes a good biometric?

The traditional biometric authentication procedure

Sample an input from the user

Extract a proper set of features

Compare with previously stored templates

Confirm or deny the claimed identity

Good features exhibit

Large inter-class variability

Small intra-class variability

How to evaluate biometric

The standard model

Enroll some users by collecting training samples,
e.g., handwriting or speech

False Reject Rate (FRR): the rate at which
legitimate users' attempts to recreate the
biometric within a predetermined tolerance fail

False Accept Rate (FAR): the rate at which forgers fool the system

Equal Error Rate (EER): the operating point where FRR = FAR

The lower the EER, the higher the accuracy.
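As a small illustration of these metrics, the sketch below computes FRR and FAR over a threshold sweep and takes the EER as the point where the two rates are closest. The score lists are made up for the example; higher score means a better match.

```python
# Sketch: FRR, FAR, and EER from genuine and impostor match scores.

def frr_far(genuine, impostor, threshold):
    # FRR: fraction of genuine attempts rejected (score below threshold)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # FAR: fraction of impostor attempts accepted (score at/above threshold)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far

def eer(genuine, impostor):
    # Sweep all observed scores as thresholds; EER is where |FRR - FAR|
    # is minimal (report the midpoint of the two rates there)
    best = None
    for t in sorted(set(genuine + impostor)):
        frr, far = frr_far(genuine, impostor, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]

genuine = [0.9, 0.8, 0.75, 0.7, 0.6]   # illustrative scores only
impostor = [0.65, 0.5, 0.4, 0.3, 0.2]
print(eer(genuine, impostor))  # → 0.2
```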

How to evaluate biometric

Forgeries are commonly divided into naïve
forgeries and skilled forgeries

Generative models for creating synthetic
forgeries are missing

Evaluation under such weak attacker
assumptions is misleading: it underestimates the threat
Handwriting Biometrics

As a first step toward a strong
methodology for evaluating performance,
the authors developed a prototype
toolkit using handwriting dynamics as a
case in point.

Handwriting Biometrics

Offline handwriting

A 2-D bitmap, e.g., a scan of a paper document

Spatial information only

Features extracted include bounding boxes and aspect
ratios, stroke densities in particular regions, and curvature

Online handwriting

Sampling the position of a stylus tip over time on a digitizing
tablet or pen computer

Temporal and spatial information

Features include all offline features plus timing and
stroke-order information

Experimental Design

Collected data over 2 months, analyzing 6 different
forgery styles

Three standard evaluation metrics

Naïve forgeries: not really forgeries, just the forger's natural writing

Static forgeries: created after seeing a static rendering of the target user's writing

Dynamic forgeries: created using a real-time rendering

Three more realistic metrics

Naïve* forgeries: similar to naïve, except the attacker has a similar writing style

Trained forgeries: created after attackers are trained

Generative forgeries: exploit information to algorithmically generate forgeries

Data Collection

11,038 handwriting samples collected on
digitizing pen tablet computers from 50
users during 3 rounds

Data Collection

Round one: 1 hour, two data sets

First data set established a baseline of each user's writing

5 different phrases

Two-word oxymorons, each written ten times

Used to establish biometric templates for authentication

Provided samples for naïve and naïve* forgeries

Second data set: the generative corpus

Used to create the generative forgeries

Consists of a set of 65 oxymorons

Data Collection

Round 2: 90 min, 2 weeks later

The same users wrote the 5 phrases of round 1
ten times, then forged representative samples from
round 1 to create 2 sets of 17 forgeries

Static forgeries

Created after seeing only a static rendering

Dynamic forgeries

Created after seeing a real-time rendering of the phrase

Data Collection

Round 3: selected nine users and trained them

Chosen because they exhibited a natural tendency toward better forgery

3 skilled but untrained users per writing style

Training: forge 15 samples matching their own
writing styles, with real-time reproduction of the
target sample

Authentication System


s writing sample on the electronic tablet
represented by 3 signals over time

x(t), y(t) for location of the pen

p(t) for pen up or down at time t

Tablet computes a set of n statistical features

..fn) over the signals

Authentication System

Based on the variation of feature values across a
passphrase written m times, and on natural human
variation, generate an n×2 template matrix
{(l1, h1), ..., (ln, hn)}

Compare a user sample with feature values
f1, f2, ..., fn against the template; each fj < lj or
fj > hj counts as an error
Feature analysis

What matters is not only the entropy of each feature,
but how difficult the feature is to forge

For each feature f:

Rf: proportion of times that f was missed by
legitimate users

Af: proportion of times that f was missed by
forgers from round 2

These are combined into a quality score Q(f)

The closer Q(f) is to 1, the more desirable the feature

Feature analysis

Divide the feature set into temporal and
spatial groups and order each by Q(f);
choose the top 40 from each group and
discard any feature with an FRR greater than
10%, yielding 15 spatial and 21
temporal features.
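The selection step above can be sketched as below. These slides do not give the exact formula for Q(f), so the sketch assumes Q(f) = Af · (1 − Rf), which is close to 1 when forgers usually miss the feature (high Af) and legitimate users rarely do (low Rf); the feature names and rates are invented for the example.

```python
# Sketch: rank features by an assumed quality score and apply the
# top-k / max-FRR selection described in the slides.

def quality(Af, Rf):
    # Assumed form of Q(f): rewards features that are hard to forge
    # (Af near 1) and easy for the real user to reproduce (Rf near 0)
    return Af * (1 - Rf)

def select_features(group, top_k=40, max_frr=0.10):
    # group: list of (name, Af, Rf); rank by Q(f), keep the top_k,
    # then discard any feature legitimate users miss too often
    ranked = sorted(group, key=lambda f: quality(f[1], f[2]), reverse=True)
    return [name for name, Af, Rf in ranked[:top_k] if Rf <= max_frr]

temporal = [("pen_speed", 0.9, 0.05),   # hypothetical features
            ("stroke_time", 0.8, 0.20),
            ("pause_len", 0.6, 0.02)]
print(select_features(temporal))  # → ['pen_speed', 'pause_len']
```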

Human Evaluation


At seven errors, the
trained mixed, block,
and cursive forgers
improved their FAR
by 0.47, 0.34, and ...

These improvements
resulted from less
than 2 hours of training

Generative Evaluation

Finding and training skilled forgers is time-consuming

Explore the use of an automated approach
using generative models as a supplementary
technique for evaluating behavioral biometrics

Investigate whether an automated
approach, using a limited number of writing samples
from the target, could match the false accept
rates observed for the trained forgers

Generative Evaluation

The approach to synthesizing handwriting is to
assemble a collection of basic units (n-grams) that
can be combined in a concatenative fashion to
mimic authentic handwriting.

The basic units are obtained from

General population statistics

Statistics specific to a demographic of the targeted user

Data gathered from the targeted user
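The concatenative idea can be sketched as a greedy cover of the target phrase with the longest n-grams available in a unit corpus. The corpus contents here are placeholders; in the paper, each unit is a real captured pen trace.

```python
# Sketch: concatenative synthesis via greedy longest-n-gram matching.

def synthesize(target, corpus, max_n=4):
    # corpus: dict mapping n-gram text -> stored handwriting unit
    units, i = [], 0
    while i < len(target):
        # Try the longest n-gram first, fall back to shorter ones
        for n in range(min(max_n, len(target) - i), 0, -1):
            gram = target[i:i + n]
            if gram in corpus:
                units.append(corpus[gram])
                i += n
                break
        else:
            return None  # target cannot be covered by available units
    return units

corpus = {"th": "<unit:th>", "he": "<unit:he>", "e": "<unit:e>",
          "t": "<unit:t>", "re": "<unit:re>"}
print(synthesize("there", corpus))  # → ['<unit:th>', '<unit:e>', '<unit:re>']
```

The concatenated units would then be smoothed into a continuous trace; this sketch only shows the unit-selection step.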

Generative Evaluation

Generated signature, assembled from
basic units in the database, shown above

Original signature shown below

Generative Evaluation

Limited to 15 of the
65 samples from the target
user, plus 15 samples
from users with the same writing style

Result: a generative
attempt used, on average,
only 6.67 of the target user's
writing samples, and the average length
of an n-gram was
1.64 characters


The authors argued in detail that current
evaluation of the security of biometric systems is
not accurate, underestimating the threat

To prove this, they analyzed a handwriting-based
key generation system and showed that
the standard approach of evaluation
significantly overestimates its security


Presented a generative attack model based
on concatenative synthesis that
automatically produces generative forgeries

The generative approach matches or
exceeds the effectiveness of forgeries
rendered by trained humans



Where to improve

The handwriting-based key generation
system requires substantial human effort and data collection

It remains unclear to what extent
these forgeries would fool human judges,
especially forensic examiners

The generative algorithm needs improvement,
e.g., incorporating other parameters to
make it more accurate


Any Questions?