Biometric Authentication of Computer Users

TSIT01, TSIT02 Computer Security

Jonathan Fors

jonathan.fors@liu.se

December 2, 2013

In this assignment you will study the basic principles of biometric user authentication, using digitally sampled handwritten signatures as the example. The main goal of the assignment is to give you a feeling for the basic possibilities and problems of biometric authentication. At the end of the study you should therefore answer some questions, using your experiences from the lab assignment and your general knowledge as the basis for your answers.

1 Preparatory Work

Before you come to the computer lab room at your pre-booked time, you should read the sections about user authentication in the course literature, especially, of course, the subsection on biometrics. You should also study these instructions thoroughly, so that you have a plan for what you should do during the two booked hours with access to assistance in the lab room.

2 Background

Figure 1 depicts a general biometric user authentication system. When a user tries to be accepted under a given identity, the user must first present the right type of signal to the biometric reader, for example by pressing a fingertip against a fingerprint reader. The direct output signal from a biometric reader is normally several orders of magnitude larger than what is needed for the purpose and what is convenient and efficient to handle. It is therefore reduced to some crucial parameters, whose values should be unique for every prospective user. But a biometric signal always has some variance for a given user, even for measurements of such relatively static properties as a fingerprint. Even though the fingerprint itself may be exactly the same, the reader captures a picture, and picture contrast thresholds, dirt on the finger or reader, stretching of the skin against the reader etc. create a slightly different image every time. So a user is not characterised by exact biometric parameter values, but by allowed ranges for these values. The decision function fetches the stored values for the given identity from the database and compares these to the processed result for the currently captured signal. If they are similar enough, the user is accepted; otherwise the user is rejected.

Figure 1: General biometric system
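To make the idea of "allowed ranges" concrete, here is a minimal Python sketch of such a decision function. It is an illustration under stated assumptions, not the method used by the lab software: the names enroll and accept are hypothetical, each stored parameter gets a (low, high) range taken from a few enrolment samples and widened by an assumed relative margin, and a new sample is accepted only if every parameter falls inside its range.

```python
def enroll(samples, margin=0.1):
    """Build a hypothetical template: one (low, high) range per parameter.

    samples: list of equally long parameter vectors from one user's enrolment.
    margin:  assumed extra slack, as a fraction of the observed spread.
    """
    if not samples:
        raise ValueError("need at least one enrolment sample")
    bounds = []
    for values in zip(*samples):          # iterate parameter by parameter
        lo, hi = min(values), max(values)
        pad = margin * (hi - lo)          # widen the observed range a little
        bounds.append((lo - pad, hi + pad))
    return bounds


def accept(bounds, probe):
    """Accept the claimed identity only if every parameter lies in its range."""
    if len(probe) != len(bounds):
        raise ValueError("probe has the wrong number of parameters")
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, probe))


if __name__ == "__main__":
    template = enroll([[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]])
    print(accept(template, [1.05, 2.0]))   # True: inside both ranges
    print(accept(template, [1.05, 2.5]))   # False: second parameter out of range
```

The margin is the knob that trades false rejections against false acceptances: a wider range lets more of a user's natural variation through, but also more impostors.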

From this short summary it is clear that the quality of the user authentication depends crucially on the processing and parameter extraction being designed with great care. For fingerprints, the features to extract have long been agreed upon. The remaining problem is the image processing and how to handle differences between two readings of the same finger within limits such as the available processing capacity. While that is not an easy problem, the situation in this lab assignment is even more complicated.

In this assignment you are to study computer samples of handwritten signatures. There is no agreed-upon set of parameters for this kind of biometric signal. The software used in the assignment implements a very crude first step in testing such signals for similarity. Your task is to use it hands-on and to use this experience as the background for a short discussion of biometric principles.

2.1 System Principles

The system samples the x and y coordinates of a pen moved over a special surface. It also captures the pressure p, including whether the pen is lifted from the surface. These are then treated as three separate signals in the time domain.

The first step in the processing is a frequency analysis. This means that we try to extract how the signal was written, not what static pattern it left on a piece of paper. What counts is the speeds that were used in all three directions. This means that a careful tracing of a written signature will probably yield values that are totally different from those captured when the original signature was written. Conversely, we could also get very similar values for signatures that look different on paper. Of course we do not have cyclic frequency signals here, but the processing measures how much of the total energy of the signal is spent in different frequency regions. This measurement is called Power Spectral Density, or PSD for short.
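One common way to estimate the PSD of a sampled signal x[n] with N samples is the periodogram, written out below. The lab text does not state which estimator the course software uses, so take this as a reference definition only; f_s = 125 Hz is the tablet's sampling rate given in Section 3.2.

```latex
\hat{P}_x[k] = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,e^{-i 2\pi k n / N}\right|^2,
\qquad f_k = \frac{k\,f_s}{N}, \qquad k = 0, 1, \dots, N-1

\sum_{k=0}^{N-1} \hat{P}_x[k] = \sum_{n=0}^{N-1} \lvert x[n] \rvert^2
```

The second line (Parseval's theorem) is what justifies reading the PSD as a distribution of the signal's total energy over frequency.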

You use a command to create the PSD for all three directions. The command then reduces the amount of data to a set that is reasonable to handle and compare. For simplicity, this is done by dividing the total frequency range into 11 regions and then calculating the mean value for each region. Thus we now have 33 discrete values (3 signals × 11 regions) for each signature to compare and study. But we have to decide how they are to be compared, so that we capture what is really characteristic of each user.
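The reduction from three spectra to 33 numbers can be sketched in plain Python as below. This is a hypothetical reimplementation for illustration, not the code behind the course's features command: it assumes a plain one-sided periodogram computed with a naive DFT, 11 contiguous bands of (nearly) equal width, and a simple mean per band. The real command may differ in windowing, scaling or band edges.

```python
import cmath
import math


def periodogram(signal):
    """One-sided periodogram |X[k]|^2 / N for k = 0..N//2, via a naive O(N^2) DFT."""
    n = len(signal)
    psd = []
    for k in range(n // 2 + 1):
        xk = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(xk) ** 2 / n)
    return psd


def band_means(psd, n_bands=11):
    """Split the spectrum into n_bands contiguous regions and average each one."""
    if len(psd) < n_bands:
        raise ValueError("signal too short for the requested number of bands")
    # Integer band edges; floor division keeps every band non-empty.
    edges = [i * len(psd) // n_bands for i in range(n_bands + 1)]
    return [sum(psd[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def signature_features(x, y, p, n_bands=11):
    """Concatenate the band means for x(t), y(t) and p(t): 3 * 11 = 33 values."""
    return (band_means(periodogram(x), n_bands)
            + band_means(periodogram(y), n_bands)
            + band_means(periodogram(p), n_bands))


if __name__ == "__main__":
    n = 40
    x = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # 5 cycles in 40 samples
    y = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
    p = [1.0] * n                                               # constant pressure
    print(len(signature_features(x, y, p)))                     # 33
```

Averaging within bands throws away detail inside each region, which is exactly the kind of crude compression the text describes.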

The final step in the input processing is a Principal Component Analysis, PCA. This method is a very general one in signal processing, and there is a short description of it in Appendix A. For the current purpose (the lab assignment), it is enough to know that PCA takes a set of signals and seeks out the directions in the multidimensional space along which the signals vary most. In our case we have 33 values captured each time, which can be regarded as a point in a 33-dimensional space. The PCA step reduces these to three new values, which try to capture the "coordinate values" that show the greatest variation among users in a transformation of the original 33-dimensional space. As pointed out in the appendix, this is not necessarily ideal signal processing for the purpose of user authentication, but it does give us a reasonable set of values to use in the final study in the assignment.

3 The Computer Lab Session

The biometric signatures are written on writing tablets that will be provided by the lab assistant. These are connected to the lab computers via USB. On the computer, Matlab is used to perform the computations.

The tablets require that an older version of Matlab is used. This can be started by opening the Start Menu, All Programs, Server-Based Software, Matlab, Matlab 8.0. This starts a 32-bit version of Matlab. Note that the window title for Matlab will say R2012b; this is exactly the 8.0 version.

Starting Matlab does take some time, especially when everybody in the lab session does it at the same time. When the program has started and the console is active, initialize the course by

initcourse tsit01

or

initcourse tsit02

3.1 Capturing signatures

First, each pair should capture a set of digitized signatures with the help of the specific tablets and pens connected to the computer. Each person in each pair should write at least six signatures. These signatures are captured in the computer by calling the Matlab function acquire_signature, which returns the data for each signature as a matrix with the name you choose, here s1. (The first time you call this function, you get a warning that future versions of Matlab won't support it. Just ignore this.)

s1 = acquire_signature;

When acquire_signature has been called, a new window appears where you can see the pen movement when the pen is positioned on the signing pad. If you lift the pen, the recording is paused until you place the pen on the tablet again. When the whole signature has been written, end the data collection by typing Ctrl+S or by clicking on the File menu and choosing Save and return. The other option in the File menu is Clear, which does not work well with the current Matlab version: using it causes a crash and the window closes.

Unfortunately, some of the pens and tablets are better than others. For example, the tablet sometimes doesn't start recording a signal directly, or not until the second attempt to write. Always wait for the little black square dot on the tablet to appear before you start writing! Don't write too fast! And make sure you don't press the switch on the pen, because this turns the pen into a device controlling the cursor. Also, a section of the signature may be omitted when the slant of the pen was not right etc. Whenever you immediately see that the recorded signature is useless, don't use Clear (see above), but just type Ctrl+N to get a new window. You can also finish the attempt as if it had succeeded, and then start anew in the command window with the same variable as the recipient of the result.

It is important that you capture and save at least six different, fully captured signatures from each person. A good idea is to give them the same name for each person but different numbers (like s1, s2, ..., s6 and t1, t2, ..., t6). In order to check whether a signature that seemed all right really can be used, use the command plotsign (for example plotsign(s1);) to see what a saved signature looks like. If things get too bad, ask the lab assistant for a new pen and tablet. Have patience with this step!

3.2 Statistical Analysis

Now you should have two sets of at least six Matlab matrices, each containing the sample points from one signature. Each matrix has three columns: one for the x values, x(t), one for the y values, y(t), and one for the pressure, p(t). The number of rows depends on how you wrote. The tablet registers 125 values per second, and a typical signature takes 3 to 5 seconds. Thus you should have a few hundred rows, but probably not as many as a thousand. The first step in the next part is the compression of your data. You should use the commands

Fs = features(s1, s2, s3, s4, s5, s6);

Ft = features(t1, t2, t3, t4, t5, t6);

List every signature variable explicitly: in Matlab, three dots mean "continue on the next line", so they cannot be used as an abbreviation inside the call. This command calculates the three PSDs for each signature from one user and the mean value in each of their frequency bands, and stores the result in the matrix Fs or Ft, which contains 33 values for each signature, a total of 198 values for six signatures. You now have one compressed matrix for each user in the pair.
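The size estimates above can be checked with a little arithmetic, assuming the pen stays on the tablet for the whole 3 to 5 seconds (recording pauses while the pen is lifted, so lifts shorten the matrix):

```latex
N_{\text{rows}} \approx f_s\,T = 125\ \text{s}^{-1} \times (3 \text{ to } 5)\ \text{s} = 375 \text{ to } 625

3\ \text{signals} \times 11\ \text{bands} = 33\ \text{features per signature},
\qquad 6 \times 33 = 198\ \text{values in } F_s
```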

Then it is time for the principal component analysis. The goal is to calculate a minimal set of characteristic values that will enable the decision process to distinguish the signatures made by one person from signatures made by any other person.

If we did a PCA for signals from just one person, we would get an indication of what is consistent and what varies in that person's signatures. But that tells us nothing about how this compares to other persons' signatures. Thus we need an analysis of a larger user population. In this assignment you first compare your two sets of signatures to each other.

First you combine your respective signatures with the command

Fpair=[Fs;Ft];

Important: Note the semicolon! The next step is to perform the PCA, which is done with

R = pca(Fpair);

R is now a matrix, which you can use to transform your signatures with the commands

Fspca = Fs * R;

Ftpca = Ft * R;
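Why the semicolon in Fpair=[Fs;Ft] matters, and what the product Fs*R does, can be seen from the matrix shapes alone. The plain-Python helpers below (vstack, hstack, matmul, shape) are hypothetical stand-ins for Matlab's bracket concatenation and matrix product, applied to zero-filled placeholder matrices instead of real feature data. [A;B] stacks rows, so six plus six signatures give a 12 × 33 matrix with one signature per row, which is the n × 33 layout pca expects according to Appendix B; [A,B] would instead place the two users side by side as a 6 × 66 matrix. Multiplying by R then maps each 33-value feature row to 33 transformed coordinates, of which d3plot shows the first three.

```python
def vstack(a, b):
    """Matlab [A;B]: rows of B appended below the rows of A."""
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("vertical concatenation needs equal column counts")
    return [list(row) for row in a] + [list(row) for row in b]


def hstack(a, b):
    """Matlab [A,B]: columns of B placed to the right of the columns of A."""
    if len(a) != len(b):
        raise ValueError("horizontal concatenation needs equal row counts")
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def matmul(a, r):
    """Matlab A*R: each row of A is combined with every column of R."""
    inner = len(r)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must agree")
    cols = len(r[0])
    return [[sum(row[i] * r[i][j] for i in range(inner)) for j in range(cols)]
            for row in a]


def shape(m):
    return (len(m), len(m[0]) if m else 0)


if __name__ == "__main__":
    fs = [[0.0] * 33 for _ in range(6)]   # placeholder for six signatures of user s
    ft = [[0.0] * 33 for _ in range(6)]   # placeholder for six signatures of user t
    print(shape(vstack(fs, ft)))          # (12, 33): one row per signature
    print(shape(hstack(fs, ft)))          # (6, 66): two users mixed in each row
    identity = [[1.0 if i == j else 0.0 for j in range(33)] for i in range(33)]
    fspca = matmul(fs, identity)          # stand-in for Fs*R, with R = identity
    print(shape([row[:3] for row in fspca]))  # (6, 3): what d3plot would plot
```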


Specifier  Marker
o          Circle
+          Plus sign
*          Asterisk
.          Point
x          Cross
s          Square
d          Diamond
^          Upward-pointing triangle
v          Downward-pointing triangle
>          Right-pointing triangle
<          Left-pointing triangle
p          Pentagram
h          Hexagram

Table 1: Possible plot markers and their representation

Specifier  Color
y          yellow
m          magenta
c          cyan
r          red
g          green
b          blue
w          white
k          black

Table 2: Specifying plot colors

What have we achieved so far? We first found a transformation of our total signature data, which should transform the data into directions ordered by their variance. The result of the transformation of each of your signatures is stored in Fspca and Ftpca.

Comparison. It is now time to see whether these transformations actually create data sets that are specific to each user. To study the result in graphical form, display the three largest principal components of the signatures for each user in a plot, with a user-specific marker. Start with one user, using for example o to show its points in the graph:

d3plot(Fspca,'o');

The second argument controls the appearance of the plot markers. A list of available markers is given in Table 1. It is also possible to plot using different colors, as given in Table 2. For more information, see the Matlab command

help plot

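You get the best result if both sets of points are in the same graph, which is achieved by giving the following command after the first d3plot call and before plotting the second user's points with a different marker: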

hold on;


To turn this off and get separate presentations, just write

hold off;

If the graph gets difficult to interpret, you can use the view and axis commands in the Matlab window to change the viewing angle and the axis limits. If you need help, use the built-in documentation commands

help view

help axis

An alternative way of adjusting the view is to use the Rotate 3D tool, which can be found as a round arrow icon just above the plot window.

Save this plot and include it in your report. It is important that you save the figure in the correct file format, PNG. Do not use JPEG, as its lossy compression strongly distorts the image!

3.3 Working With More Users

The results so far have only tried to distinguish between signatures from two users. Now you should include more. The larger set of users is simulated by a prepared file, which contains registered signature matrices from five different persons. This file is called Ftot.

Your signature results are combined with these into a total result through the commands

load Ftot;

followed by

Fpopulation=[Ftot;Fpair];

The next step is to use PCA for the whole population,which is done with the command

R = pca(Fpopulation);

R is now a new matrix, which you can use to transform the stored signatures. You have five files, Fnw, Fka, Fmo, Ffc and Fks. For each of these you should now repeat the following:

load Fxx;

Fxxpca = Fxx * R;

where xx is of course one of the letter pairs in the five file names above. Don't forget to also transform your own signatures with this new R into new PCA variables!

Fspca = Fs * R;

Ftpca = Ft * R;

You should now have seven transformed “user characteristics” in total: the five Fxxpca matrices plus Fspca and Ftpca. Then compare the results visually by creating a 3D plot, as you did for just your two sets of signatures. Since the sensitivity of most of the tablets has drifted during the years since the stored signatures were taken, you may find that the stored signatures form one group while your current ones form another.

Save this combined plot and include it in your report! Again, use the PNG format.

3.4 Evaluation

It is now time to evaluate our system. Could it be used for biometric authentication of at least a small set of users? Could it be developed into a real-life system? If yes, how? If no, why not? Please note that your conclusions should be founded on what you have seen in this lab session, and they should be reported in written form and handed in to the lab assistant in order to get your final grade on the lab assignment part of the course.

There are three things that you can do to better evaluate the result of the system. The first one you should try is to study graphically the Euclidean distances between the seven original sets of signature features: the five stored ones and your own two. This is done with the help of the dmap function, which calculates the pairwise distances between vectors and returns the result in a square matrix, where element (x, y) is the Euclidean distance between vector x and vector y. You first calculate this matrix with the command

d = dmap(Fpopulation);

This matrix can now be studied with the commands

contourf(d);

colorbar;

Remember to use hold off before issuing these commands. Otherwise, the contour plot will be overlaid on the previous plots. The graph shows the distance between all signature feature vectors of the population (see also the dmap documentation in Appendix B). Both the x axis and the y axis index individual signatures. The distances are colour coded, and the translation of the colours is shown in the colour bar. What conclusions can you draw about the system from this graph? What should an ideal graph look like? Also include this graph in your report!

3.5 Forgeries and bad data

This last section is only for when there is time left in the lab session. Try to change your own signature: deliberately try to write as if you were in a hurry, tense etc. Compare this with your earlier signatures, using the tools from the earlier steps! Do not create a new R matrix. Finally, if there is still more time, you can try to forge a signature, preferably letting one person in the pair imitate the signature of the other. Again compare with the original signatures and draw conclusions!

4 The report

In the report you should very briefly summarize what you did and what you named the variables and files that you refer to in the text. There is no need to repeat these instructions! The main part of the report must consist of your answers to the following questions.

1. What conclusions could you draw from d3plot? Were your values clearly differentiated from the other users' values? In all dimensions? Were the other users' data divided into clearly separate groups?

2. What did dmap show? What would a dmap look like for an ideal system? Did your results agree with this ideal, or what were the differences? Did its results agree with your results from d3plot? Did you get any further information from this presentation of the data?

3. If you had time for the extra signatures, what did they show?

4. If you had time for the forgery attempt, what did it show?

5. What is your final verdict on this system? Can it be used for smaller populations? Can it be developed to better fit smaller populations and/or be used for larger populations? If so, how? Note that your suggestions can concern the type of equipment as well as the data processing. One important point is that individual differences between the tablets, such as differences in pressure sensitivity, may create artificial differences in the user templates used to create the transformation matrix R. Have you seen any indications of this? Can it be compensated for in a real-life system?

6. After this experiment, do you believe in digitized handwritten signatures as a possible method for biometric user authentication? Motivate your answer!

The report should be handed in to the lab assistant Jonathan Fors (via e-mail to jonathan.fors@liu.se) before the start of the following exam period at the latest, unless you have been given explicit permission to complete it later. The preferred format is .pdf. If you can fit the relevant information into a reasonably short e-mail, the e-mail body can also be used for the text, with the plots attached. The plots need to be saved as PNG images!

If you are very quick and well prepared, you may be able to complete the report during the actual lab session. Then you can of course print it and hand it to the assistant directly. There are no special requirements for the layout, as long as it is logical and clear.

Your e-mail subject line and the file name of the attachment containing the report must include your LiU-IDs! The lab assistant should not be required to sort, rename and distinguish between a hundred saved attachments with identical names. Incorrectly formatted e-mails and files will be ignored! The lab assistant will tell you whether your report was sufficient or needs revision.

A Principal Component Analysis

When analyzing data, there is often a risk that the data set becomes too large to handle. This of course degrades the efficiency of the analysis, and sometimes makes any meaningful analysis impossible. There are of course many established methods for compressing data and/or sorting out the relevant parts before any more sophisticated analysis. Principal Component Analysis, PCA, is one of these general solutions.

The goal of PCA is to reduce the dimensionality of the data set while losing as little information as possible about the variation within the original set. This method is also known under many other names and in slight variations, the best known of which is probably the “Karhunen-Loève transformation”.

So what reduction does PCA achieve? Suppose that we have a data set where each item is a point in some coordinate system. The set uses all the dimensions in the coordinate system in such a way that removing one dimension also removes important information. But can these coordinates be transformed into another coordinate system, where some dimensions are not very important? Let us rotate and tilt the system, so that we get a “better fit” for the current data set! A “better fit” is defined as a coordinate system where some axes point in the directions where most of the variation in the data set is, while others point in directions with almost no variation at all. These new coordinates are (as usual) linear combinations of the old coordinates. If we succeed, we get some axes along which the data set shows no or only very little variation. Then we can disregard these directions and get a system with fewer dimensions. It is of course up to the user of this method to decide which directions can be disregarded without hurting the goal of the analysis.
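A two-dimensional example shows the idea. Suppose every measured point lies on the line y = x, that is, the points are (t, t) for different values of t. Rotating the axes by 45 degrees gives new coordinates u and v, each a linear combination of x and y:

```latex
u = \frac{x + y}{\sqrt{2}}, \qquad v = \frac{y - x}{\sqrt{2}}
\qquad\Longrightarrow\qquad (x, y) = (t, t) \;\mapsto\; (u, v) = (\sqrt{2}\,t,\; 0)
```

All of the variation now lies along the u axis and none along v, so v can be dropped and each point is described by a single number without losing any information.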

Stated in more mathematical form, we do the following. First we create the covariance matrix, cov X, of our n-dimensional vector X, which represents the points from the measurement. Then we calculate the eigenvalues and eigenvectors of cov X. Using these, we can reorganize X into a set of orthogonal components. These components are linear combinations of the original components of X, and the eigenvalues are the variances of these new orthogonal components. The trick is that the sum of the variances is the same for the original set as for the new, orthogonal set, but we now have components that are uncorrelated with each other. Thus, if we pick out a subset of these new coordinates, choosing the ones with the greatest variance, we have reduced the dimensions but lost very little of the information about variance in the original set, as long as the discarded components have small variances.
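The procedure in the previous paragraph (covariance matrix, eigen-decomposition, sorting by variance) can be sketched in plain Python as below. It is a hypothetical illustration, not the implementation of the course's pca command: it centres the data, uses the sample covariance with divisor n − 1, and finds eigenvalues and eigenvectors with the classical cyclic Jacobi method for symmetric matrices. The name pca_matrix is made up for this example.

```python
import math


def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of the columns of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def jacobi_eigen(a, rel_tol=1e-24, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_matrix(rows):
    """Return (variances, R): the columns of R are the principal directions,
    ordered by decreasing variance; multiplying a row by R gives its PCA coordinates."""
    values, vectors = jacobi_eigen(covariance(rows))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [[vectors[k][i] for i in order] for k in range(len(values))]
    return [values[i] for i in order], r


if __name__ == "__main__":
    data = [[t, 2.0 * t] for t in range(5)]   # points on the line y = 2x
    variances, R = pca_matrix(data)
    print(variances)                           # approximately [12.5, 0.0]
    print([R[0][0], R[1][0]])                  # +/- (1, 2) / sqrt(5)
```

In the example all points lie on one line, so the whole variance (12.5, the trace of the covariance matrix) ends up in the first component, which illustrates the statement that the sum of the variances is preserved.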

If our original goal was to distinguish between different classes of data sets, it is not certain that this is best achieved by concentrating on the components with the greatest variance. It is rather obvious that coordinates that show no variance at all between sets can hardly be used to distinguish between these sets. It is, however, possible that certain patterns in only slightly varying components may be the best criterion for our class distinctions. But PCA does give a more efficient way to describe the data set as such, which can then be searched for the truly distinguishing properties.

For those who really want to know more, these methods are described, for example, in textbooks on control theory and signal processing. And of course they are described on the internet.

B Special Matlab commands used in this assignment

acquire_signature Data acquisition from tablet

s=acquire_signature;

collects data from the writing tablet and returns an n × 3 matrix with one sample per row: the x, y and pressure coordinates of the sample. n is the number of samples collected while the signature was written, and it is a function of the sampling rate, the time taken to sign, and how often and for how long the pen was lifted from the surface during the signature writing. The data collected with this command is saved in Matlab by using File menu->Save and return. The menu also offers File menu->Clear for erasing what has been sampled so far without saving it, but as noted in Section 3.1 this crashes with the current Matlab version; use Ctrl+N to get a new window instead.

features extracts features from the saved data from the tablet

f = features(s1,s2,...,sn);

extracts features from the stored signals as described in the main text of this document, and creates the n × 33 matrix f, where each row is the feature vector for the corresponding signature. These features are the average power densities in the frequency bands of the x, y and p coordinates, respectively. (In an actual call, list all signature variables explicitly.)

pca computes the principal components of the stored features for a set of sampled signatures

R = pca(Fx);

computes the 33 × 33 matrix R, which is used to transform signature feature coordinates into a coordinate system more suited for efficient analysis. Fx is an n × 33 matrix, where each of the n rows is a feature vector from the set that will later be transformed (or any other feature vector used as a basis for the choice of the transformation).

d3plot plots the first three calculated principal components for a sampled signature

d3plot(Fy,s);

plots a point for each row of Fy. Fy is a matrix where each row is the transformed feature vector of a single signature, and all rows represent signatures from the same user. The coordinates of the plotted points are the values of the first three principal components of the transformed vector. Thus, with six stored signatures, you get six different points in a three-dimensional coordinate system with d3plot. s is the symbol used by d3plot to mark the points in the graph shown on the screen. The standard Matlab plot command has a list of available symbol characters.

plotsign plots a signature

plotsign(sx);

plots the signature stored in sx as it would look on a piece of paper.

dmap computes a distance map between signatures

D = dmap(F);

computes a matrix D, where each element is the Euclidean distance between the feature vectors of two signatures. F is a matrix where each row contains the feature vector of one signature, for example Fpopulation as in Section 3.4.
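For reference, the distance map described above can be computed in a few lines of Python. This is a hypothetical equivalent of dmap written from the description in this appendix, not its actual source; it uses math.dist (Python 3.8 or later) for the Euclidean distance.

```python
import math


def distance_map(features):
    """D[i][j] = Euclidean distance between feature rows i and j."""
    return [[math.dist(a, b) for b in features] for a in features]


if __name__ == "__main__":
    d = distance_map([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
    print(d[0][1], d[0][2])   # 5.0 0.0
```

Row and column i of D both refer to the i-th row of the input, so with Fpopulation as input the signatures appear in the order in which they were stacked. D is symmetric with zeros on the diagonal.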
