
Nov 29, 2013


Biometric Authentication Revisited:
Understanding the Impact of Wolves
in Sheep's Clothing

Lucas Ballard, Fabian Monrose, Daniel Lopresti



USENIX Security Symposium, 2006


Presenter: Tao Li


Motivation



To argue that the previous assumption, that forgers are minimally motivated and attacks can only be mounted by hand, is too optimistic and even dangerous


To show that the standard approach to evaluation significantly overestimates the security of a handwriting-based key-generation system

What did the authors do?


In this paper, the authors describe their initial steps toward developing evaluation methodologies for behavioral biometrics that take into account threat models that have largely been ignored.


They present a generative attack model based on concatenative synthesis that can provide a rapid indication of the security afforded by the system.

Outline



Background Information


Experimental Design


Human Evaluation


Generative Evaluation


Conclusion



Background Information


Obtaining human input as a system security measure


The input should not be reproducible by attackers


E.g., passwords


Online attacks: limited to a fixed number of wrong attempts


Offline attacks: limited only by the attacker's resources (time and memory)


When passwords are used to derive cryptographic keys, the keys are susceptible to offline attacks

What is a biometric?



An alternative form of user input intended to be difficult for attackers to reproduce


A technique for a user to authenticate himself to a reference monitor based on biometric characteristics


A means of generating user-specific cryptographic keys. Can it survive offline attacks? Not sure


Password hardening: password + biometric

What are good biometric features?


Traditional procedure for biometrics as an authentication paradigm


Sample an input from the user


Extract an appropriate set of features


Compare with previously stored templates


Confirm or deny the claimed identity


Good features exhibit


Large inter-class variability


Small intra-class variability

How to evaluate biometric
systems?


The standard model


Enroll some users by collecting training samples, e.g., handwriting or speech


False Reject Rate (FRR): the rate at which users' attempts to recreate the biometric within a predetermined tolerance fail


False Accept Rate (FAR): the rate at which forgers' attempts fool the system


Equal Error Rate (EER): the point where FRR = FAR


The lower the EER, the higher the accuracy.
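
A minimal Python sketch of these three metrics may help. It assumes that each attempt is summarized by an integer error count and is accepted when that count is at or below a threshold; the function names and the toy error counts are illustrative and do not come from the paper.

```python
def error_rates(genuine_errors, forgery_errors, threshold):
    """Return (FRR, FAR) when a sample is accepted if errors <= threshold."""
    # FRR: fraction of legitimate attempts that are rejected.
    frr = sum(e > threshold for e in genuine_errors) / len(genuine_errors)
    # FAR: fraction of forgery attempts that are accepted.
    far = sum(e <= threshold for e in forgery_errors) / len(forgery_errors)
    return frr, far


def equal_error_rate(genuine_errors, forgery_errors):
    """Scan integer thresholds and return (threshold, FRR, FAR)
    at the point where FRR and FAR are closest."""
    max_t = max(max(genuine_errors), max(forgery_errors))
    best = None
    for t in range(max_t + 1):
        frr, far = error_rates(genuine_errors, forgery_errors, t)
        if best is None or abs(frr - far) < abs(best[1] - best[2]):
            best = (t, frr, far)
    return best


# Toy data (made up): error counts for legitimate users and for forgers.
genuine = [0, 1, 1, 2, 3, 2, 0, 4, 1, 2]
forgery = [5, 7, 3, 6, 5, 8, 9, 4, 6, 7]
threshold, frr, far = equal_error_rate(genuine, forgery)
print(threshold, frr, far)
```

With real data the two curves rarely cross exactly at an integer threshold, so the sketch reports the closest point and not an interpolated EER.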

How to evaluate biometric
systems?


Forgeries are commonly divided into naïve forgeries and skilled forgeries


Generative models for creating synthetic forgeries are missing


Evaluation under such weak security assumptions is misleading because it underestimates the FAR.





Handwriting Biometrics


As a first step toward a strong methodology for evaluating performance, the authors developed a prototype toolkit using handwriting dynamics as a case in point.

Handwriting Biometrics


Offline handwriting


A 2-D bitmap, e.g., a scan of a paper


Only spatial information


Extracted features include bounding boxes and aspect ratios, stroke densities in a particular region, and curvature measurements


Online handwriting


Sampling the position of a stylus tip over time on a digitizing tablet or pen computer


Temporal and spatial information


Features include all of the offline features plus timing and stroke-order information


Experimental Design


Collect data over 2 months and analyze 6 different forgery styles


Three standard evaluation metrics


Naïve: not really forgeries, just other users' natural writing of the same passphrase


Static: created after seeing a static rendering of the target user's passphrase


Dynamic: created after seeing a real-time rendering


Three more realistic metrics


Naïve*: similar to naïve, except the attacker has a similar writing style


Trained: forgeries produced after the attackers are trained


Generative: exploits information to generate forgeries algorithmically

Data Collection



11,038 handwriting samples collected on digitizing pen tablet computers from 50 users over 3 rounds


Data Collection


Round one: 1 hour, two data sets


The first set established a baseline of "typical" user writing


5 different phrases (two-word oxymorons), each written ten times


Used to establish biometric templates for authentication


Provided samples for naïve and naïve* forgeries


The second set, the "generative corpus"


Used to create the generative forgeries


Consists of a set of 65 oxymorons

Data Collection


Round 2: 90 minutes, 2 weeks later


The same users wrote the 5 phrases of round 1 ten times each, then forged representative samples from round 1 to create 2 sets of 17 forgeries


Static forgeries: produced after seeing only a static representation


Dynamic forgeries: produced after seeing a real-time rendering of the phrase

Data Collection


Round 3: select nine users and train them


Chosen because they exhibited a natural tendency to produce better forgeries


3 skilled but untrained users for each writing style: cursive, mixed, block


Training: forge 15 samples in their own writing style, with a real-time reproduction of the target sample.

Authentication System



The user's writing sample on the electronic tablet is represented by 3 signals over time


x(t), y(t): location of the pen


p(t): whether the pen is up or down at time t


The tablet computes a set of n statistical features (f1, f2, ..., fn) over the signals
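
To make the signal representation concrete, here is a small Python sketch that turns sampled x(t), y(t), p(t) signals into a few statistical features. The four features, the function name, and the sampling interval `dt` are assumptions chosen for illustration; the slides do not list the features the system actually computes.

```python
def extract_features(x, y, p, dt=0.01):
    """Compute a few illustrative features from pen signals.

    x, y: pen coordinates, one value per sample.
    p:    1 if the pen is down at that sample, 0 if it is up.
    dt:   assumed time between samples, in seconds (hypothetical value).
    """
    width = max(x) - min(x)
    height = max(y) - min(y)
    # A stroke starts whenever the pen goes from up (0) to down (1).
    previous = [0] + list(p[:-1])
    strokes = sum(1 for prev, cur in zip(previous, p) if prev == 0 and cur == 1)
    return {
        "total_time": len(x) * dt,                           # temporal
        "pen_down_ratio": sum(p) / len(p),                   # temporal
        "aspect_ratio": width / height if height else 0.0,   # spatial
        "num_strokes": strokes,                              # spatial
    }


# Toy signals (made up): five samples with one pen lift in the middle.
features = extract_features(
    x=[0, 1, 2, 3, 4],
    y=[0, 1, 0, 1, 2],
    p=[1, 1, 0, 1, 1],
)
print(features)
```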




Authentication System


Based on the variation of the feature values across m writings of a passphrase, and allowing for natural human variation, generate an n×2 template matrix {{l1,h1}, ..., {ln,hn}}.


Compare a user sample with feature values f1, f2, ..., fn against the template. Each fj < lj or fj > hj counts as an error.
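
The template check can be sketched in Python as follows. The slides do not say how each interval [lj, hj] is derived, so this sketch assumes mean ± k standard deviations over the m enrollment samples; `k`, `max_errors`, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def build_template(enrollment, k=2.0):
    """Build an n x 2 template [(l1, h1), ..., (ln, hn)] from m samples.

    enrollment: list of m feature vectors, each of length n.
    Assumption: lj = mean - k*std and hj = mean + k*std for feature j.
    """
    n = len(enrollment[0])
    template = []
    for j in range(n):
        values = [sample[j] for sample in enrollment]
        mu, sigma = mean(values), pstdev(values)
        template.append((mu - k * sigma, mu + k * sigma))
    return template


def count_errors(template, features):
    """Count the features that fall outside their [lj, hj] interval."""
    return sum(1 for (lo, hi), f in zip(template, features) if f < lo or f > hi)


def authenticate(template, features, max_errors):
    """Accept the sample if it has at most max_errors out-of-range features."""
    return count_errors(template, features) <= max_errors


# Toy enrollment data (made up): m = 3 samples, n = 2 features.
template = build_template([[10.0, 1.0], [12.0, 1.0], [14.0, 1.0]])
print(count_errors(template, [12.0, 1.0]))   # sample inside both intervals
print(count_errors(template, [20.0, 1.5]))   # sample outside both intervals
```

Raising `max_errors` makes the system more tolerant of natural variation but also easier to fool, which is the FRR/FAR trade-off discussed earlier.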

Feature analysis


What matters is not only the entropy of each feature, but also how difficult the feature is to forge


For each feature f


Rf: proportion of times that f was missed by legitimate users


Af: proportion of times that f was missed by forgers from round 2


Q(f) = (Af - Rf + 1)/2


The closer Q(f) is to 1, the more desirable the feature
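
A short Python sketch of the quality score, with made-up miss records. Only the formula Q(f) = (Af - Rf + 1)/2 comes from the slide; the function names and the data are illustrative.

```python
def miss_rate(misses):
    """Proportion of attempts in which the feature was missed."""
    return sum(misses) / len(misses)


def quality(a_f, r_f):
    """Q(f) = (Af - Rf + 1) / 2, which always lies in [0, 1]."""
    return (a_f - r_f + 1) / 2


# Toy records (made up): True means the feature fell outside the template.
legit_misses = [False, False, True, False]    # Rf = 0.25
forger_misses = [True, True, True, False]     # Af = 0.75
q = quality(miss_rate(forger_misses), miss_rate(legit_misses))
print(q)
```

The extremes show why values near 1 are desirable: Q(f) = 1 means forgers always miss the feature and legitimate users never do, while Q(f) = 0 means the reverse.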

Feature analysis


Divide the feature set into temporal and spatial groups, order each group by Q(f), choose the top 40 from each group, and discard any feature with an FRR greater than 10%. This leaves 15 spatial and 21 temporal features.
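
The selection procedure can be sketched in Python as below. The record layout (`name`, `group`, `q`, `frr`), the function name, and the toy feature list are assumptions made for the example; only the rank, cut, and filter steps come from the slide.

```python
def select_features(features, top_k=40, max_frr=0.10):
    """Rank each group by Q(f), keep the top_k, then drop high-FRR features.

    features: list of dicts with keys "name", "group", "q", and "frr".
    Returns a dict mapping each group to its selected feature names.
    """
    selected = {}
    for group in ("spatial", "temporal"):
        members = [f for f in features if f["group"] == group]
        ranked = sorted(members, key=lambda f: f["q"], reverse=True)[:top_k]
        selected[group] = [f["name"] for f in ranked if f["frr"] <= max_frr]
    return selected


# Toy candidates (made up), using top_k=2 to keep the example small.
candidates = [
    {"name": "pen_up_time", "group": "temporal", "q": 0.90, "frr": 0.05},
    {"name": "total_time", "group": "temporal", "q": 0.80, "frr": 0.20},
    {"name": "aspect_ratio", "group": "spatial", "q": 0.70, "frr": 0.02},
    {"name": "stroke_density", "group": "spatial", "q": 0.60, "frr": 0.08},
    {"name": "curvature", "group": "spatial", "q": 0.95, "frr": 0.01},
]
chosen = select_features(candidates, top_k=2)
print(chosen)
```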

Human Evaluation




At seven errors, the trained mixed, block and cursive forgers improved their FAR by 0.47, 0.34 and 0.18, respectively.


These improvements result from less than 2 hours of training

Generative Evaluation


Finding and training skilled forgers is time-consuming


Explore an automated approach that uses generative models as a supplementary technique for evaluating behavioral biometrics


Investigate whether an automated approach, using a limited number of writing samples from the target, could match the false accept rates observed for the trained forgers

Generative Evaluation


The approach to synthesizing handwriting is to assemble a collection of basic units (n-grams) that can be combined in a concatenative fashion to mimic authentic handwriting.


The basic units are obtained from


General population statistics


Statistics specific to a demographic of the targeted user


Data gathered from the targeted user
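
The concatenative idea can be illustrated with a simplified Python sketch that works on text only. It plans which character n-grams to splice together to cover a target phrase, using greedy longest-match; this is an assumed strategy for illustration, not the authors' algorithm. A real forgery would splice the recorded pen signals for each unit, which this sketch omits. All names and sample phrases are hypothetical.

```python
def build_ngram_inventory(samples, max_n=3):
    """Map every character n-gram (length 1..max_n) found in the samples
    to the index of the first sample that contains it."""
    inventory = {}
    for idx, text in enumerate(samples):
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                inventory.setdefault(text[i:i + n], idx)
    return inventory


def plan_forgery(target, inventory, max_n=3):
    """Cover the target phrase with available n-grams, longest match first.
    Returns a list of (ngram, source_sample_index) pairs."""
    plan, i = [], 0
    while i < len(target):
        for n in range(min(max_n, len(target) - i), 0, -1):
            gram = target[i:i + n]
            if gram in inventory:
                plan.append((gram, inventory[gram]))
                i += n
                break
        else:
            raise ValueError(f"no unit covers {target[i]!r}")
    return plan


def average_ngram_length(plan):
    """Average number of characters per spliced unit."""
    return sum(len(gram) for gram, _ in plan) / len(plan)


# Toy corpus (made up): two phrases previously written by the target.
inventory = build_ngram_inventory(["open secret", "dark light"])
plan = plan_forgery("great", inventory)
print(plan, average_ngram_length(plan))
```

Longer units preserve more of the writer's natural pen movement within each unit, so the average n-gram length is a rough indicator of how much of the target's own style a forgery carries over.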

Generative Evaluation


Figure: a generative signature assembled from basic units in the database described above, with the original signature shown beneath it


Generative Evaluation



Limited to 15 of the target user's 65 samples and 15 samples from users with the same writing style


Result: on average, each generative attempt used only 6.67 of the target user's writing samples, and the average n-gram length was 1.64 characters

Conclusion



The authors argue in detail that current evaluations of biometric system security are not accurate and underestimate the threat


To demonstrate this, they analyzed a handwriting-based key-generation system and showed that the standard approach to evaluation significantly overestimates its security

Conclusion



They present a generative attack model based on concatenative synthesis that automatically produces generative forgeries


The generative approach matches or exceeds the effectiveness of forgeries rendered by trained humans


Weaknesses & Where to Improve


Evaluating the handwriting-based key-generation system needs a lot of people and work.


It remains unclear to what extent these forgeries would fool human judges, especially forensic examiners


The generative algorithm needs improvement, such as incorporating other parameters to make it more accurate.

Thanks!

Any Questions?