Biometric Authentication of Computer Users

TSIT01, TSIT02 Computer Security
Jonathan Fors
jonathan.fors@liu.se
December 2, 2013
In this assignment you will study the basic principles of biometric user authentication.
This is done via the example of digitally sampled handwritten signatures. The main
goal of the assignment is to give you a feeling for the basic possibilities and problems of
biometric authentication. This means that at the end of the study you should answer some
questions, using your experiences during the lab assignment and your general knowledge
as a basis for your answers.
1 Preparatory Work
Before you come to the computer lab room at your pre-booked time, you should read
the sections about user authentication in the course literature, especially, of course, the
subsection on biometrics. You should also study these instructions thoroughly, so that you
have a plan for what you should do during the two booked hours with access to assistance
in the lab room.
2 Background
Figure 1 depicts a general biometric user authentication system. When a user tries to be
accepted under a given identity, the user must first present the right type of signal to the
biometric reader, for example by pressing a fingertip against a fingerprint reader. The size of
the direct output signal from a biometric reader normally exceeds by several orders of magnitude
what is needed for the purpose and what is convenient and efficient to handle. Thus it is
reduced to some crucial parameters, whose values should be unique for every prospective
user. But a biometric signal always has some variance for a given user, even for measurements
of such relatively static properties as a fingerprint. This is because even though the
fingerprint itself may be exactly the same, the reader captures a picture, where picture
contrast thresholds, dirt on the finger or reader, stretching of the skin against the reader,
etc. create slightly different images every time. So a user is not characterised by some
exact biometric parameter values, but by allowed ranges for these values. The decision
function fetches the stored values for the given identity from the database and compares
these to the processed result for the currently captured signal. If they are similar enough,
the user is accepted; otherwise the user is rejected.
Figure 1: General biometric system
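The idea of accepting a user whose parameters fall within allowed ranges can be sketched in a few lines of Matlab. This is only an illustration under stated assumptions: the template values, the candidate values and the tolerance of three standard deviations are all made up, and the variable names are hypothetical rather than part of the lab software.

```matlab
% Hypothetical stored template: mean and spread of three parameters.
template_mean = [0.42 1.10 -0.35];
template_std  = [0.05 0.20  0.10];
tolerance     = 3;   % assumption: accept within 3 standard deviations

% Parameters extracted from the currently captured signal (made-up values).
candidate = [0.45 0.95 -0.30];

% Deviation from the template, measured in standard deviations.
deviation = abs(candidate - template_mean) ./ template_std;

% Accept only if every parameter lies inside its allowed range.
accepted = all(deviation <= tolerance);
fprintf('accepted = %d\n', accepted);
```

A real system would have to choose the tolerance as a trade-off: a wider range rejects fewer genuine users but accepts more impostors.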
From this short summary it is clear that it is crucial for the quality of the user
authentication that the processing and parameter extraction are designed with great care.
For fingerprints, the features to extract have long been agreed upon. The remaining
problem is the image processing and how to handle differences between two readings of
the same finger within limits such as the available processing capacity. While that is not
an easy problem, the situation in this lab assignment is even more complicated.
In this assignment you are to study computer samples of handwritten signatures. There
is no agreed-upon set of parameters for this kind of biometric signal. The software used
in the assignment implements a very crude first step in testing such signals for similarity.
Your task is to use it hands-on and to have this experience as a background for a short discussion of
biometric principles.
2.1 System Principles
The system samples the x and y coordinate positions of a pen moved over a special
surface. It also captures the pressure p, including whether the pen is lifted from the surface.
These are then treated as three separate signals in the time domain.
The first step in the processing is a frequency analysis. This means that we try to
extract how the signature was written, not what static pattern it left on a piece of paper.
What counts is the speeds that were used in all three directions. This means that
careful tracing of a written signature will probably yield values that are totally different
from those captured when the original signature was written. But we could also get very
similar values for signatures that look different on paper. Of course we do not have cyclic
frequency signals here, but the processing measures how much of the total energy of the
signal is spent in different frequency regions. This measurement is called Power
Spectral Density, or PSD for short.
You use a command to create the PSD for all three directions. The command then
reduces the amount of data to a set that is reasonable to handle and compare. For
simplicity, this is done by dividing the total frequency range into 11 regions, and then
calculating the mean value for each region. Thus we now have 3 × 11 = 33 discrete values for each
signature to compare and study. But we have to decide how they are to be compared, so
that we capture what is really characteristic for each user.
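The reduction of one signal to 11 band averages can be sketched as follows. This is not the lab's own feature extraction, whose internals are not described here. It is a minimal illustration that assumes a plain FFT periodogram (up to a constant scale factor), equally wide bands, and a synthetic test signal in place of a real x(t). The sampling rate of 125 Hz is the one stated for the tablets in section 3.2, and all variable names are hypothetical.

```matlab
fs = 125;                        % tablet sampling rate in Hz (see section 3.2)
t  = (0:4*fs-1)' / fs;           % four seconds of sample times
x  = sin(2*pi*3*t) + 0.5*sin(2*pi*20*t);   % synthetic stand-in for x(t)

x    = x - mean(x);              % remove the constant offset
n    = numel(x);
X    = fft(x);
half = floor(n/2);
psd  = abs(X(2:half+1)).^2 / (fs*n);   % periodogram estimate, DC bin excluded

nbands = 11;
edges  = round(linspace(0, half, nbands+1));   % band limits as bin indices
band_mean = zeros(1, nbands);
for k = 1:nbands
    % Mean power of all frequency bins that belong to band k.
    band_mean(k) = mean(psd(edges(k)+1:edges(k+1)));
end
disp(band_mean);
```

With this synthetic input the 3 Hz component falls in the first band and the weaker 20 Hz component in the fourth. Doing the same for y(t) and p(t) and concatenating the three results would give a 33-element feature vector.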
The final step in the input processing is a Principal Component Analysis, PCA. This
method is a very general one in signal processing, and there is a short description of it
in appendix A. For the current purpose (the lab assignment), it is enough to know that
PCA takes a set of signals and seeks out the directions in
the multidimensional space along which the signals vary most. In our case we have 33
values captured each time, which can then be regarded as one point in a 33-dimensional space.
The PCA step reduces these to three new values, which try to capture the “coordinate
values” that show the greatest variation among users in a transformation of the original
33-dimensional space. As is pointed out in the appendix, this is not necessarily an ideal
form of signal processing for the purpose of user authentication, but it does give us a reasonable
set of values to use in the final study in the assignment.
3 The Computer Lab Session
The biometric signatures are written on writing tablets that will be provided by the lab
assistant. These are connected to the lab computers via USB. On the computer, Matlab
is used to perform the computations.
The tablets require that an older version of Matlab is used. This can be started by opening
the Start Menu, All Programs, Server-Based Software, Matlab, Matlab 8.0. This starts a
32-bit version of Matlab. Note that the window title for Matlab will say 2012b. This is
the same as version 8.0.
Starting Matlab does take some time, especially when everybody in the lab session
does it at the same time. When the program has started and the console is active, initialize
the course by
initcourse tsit01
or
initcourse tsit02
3.1 Capturing signatures
First each pair should capture a set of digitized signatures with the help of the
tablets and pens connected to the computer. Each person in each pair should write
at least six signatures. These signatures are captured in the computer by calling the
Matlab function acquire_signature, which will return the data for each signature as a
matrix with the name you choose, here s1. (The first time you call this function, you get
a warning that future versions of Matlab won’t support it. Just ignore this.)
s1 = acquire_signature;
When acquire_signature has been called, a new window appears, and there you see the
pen movement when the pen is positioned on the signing pad. If you lift the pen, the
recording is paused until you place the pen on the tablet again. When the whole signature
has been written, the data collection is ended by typing Ctrl+S or by clicking on the File
menu and choosing Save and return. The other option in the File menu is Clear, which
does not work well with the current Matlab version: Matlab crashes and the window
closes.
Unfortunately, some of the pens and tablets are better than others. For example,
sometimes the tablet doesn’t start recording a signal directly, or not until the second
attempt to write. Always wait for the little black, square dot on the tablet to appear
before you start writing! Don’t write too fast! And make sure you don’t press the switch
on the pen, because this turns the pen into a device controlling the cursor. Also, a section
of the signature may be omitted when the slant of the pen was not right, etc. Whenever
you see immediately that the recorded signature is useless, don’t use Clear (see above), but
just type Ctrl+N to get a new window. You can also finish the attempt as if it had
succeeded, and then start anew in the command window with the same variable as the
recipient of the result.
It is important that you capture and save at least six different fully captured signatures
from each person. A good idea is to give them the same name for each person but different
numbers (like s1, s2, ..., s6 and t1, t2, ..., t6). In order to check whether a signature that
seemed alright really can be used, use the command plotsign to see what a saved signature
looks like. If things get too bad, ask the lab assistant for a new pen and tablet. Have
patience with this step!
3.2 Statistical Analysis
Now you should have two sets of at least six Matlab matrices, each containing the sample
points from one signature. Each matrix has three columns, one for the x values, x(t), one
for the y values, y(t), and one for the pressure, p(t). The number of rows depends on
how you wrote. The tablet registers 125 values per second, and a typical signature takes
3-5 seconds. Thus you should have a few hundred rows (roughly 375 to 625), but probably not as many as a
thousand. The first step in the next part is the compression of your data. You should use
the commands
Fs=features(s1,...,s6);
Ft=features(t1,...,t6);
This command calculates the three PSDs for each signature from one user, computes the
band mean values, and stores them in the matrix Fs or Ft, which contains 33 values for each
signature, a total of 198 values for six signatures. You now have one compressed matrix for each user in the
pair.
Then it is time for the principal component analysis. The goal is to calculate a
minimal set of characteristic values, which will enable the decision process to distinguish
the signatures made by one person from signatures made by any other person.
If we did a PCA for signals from just one person, we would get an indication of what
is consistent and what varies in that person’s signatures. But that tells us nothing
about how this compares to other persons’ signatures. Thus we need an analysis for a
larger user population. In this assignment you first compare your two sets of signatures
to each other.
First you combine your respective signatures with the command
Fpair=[Fs;Ft];
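Important: Note the semicolon inside the brackets, which stacks the two matrices on top of each other so that each row is still one signature! The next step is to perform the PCA, which is done with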
R = pca(Fpair);
R is now a matrix, which you can use to transform your signatures with the commands
Fspca = Fs * R;
Ftpca = Ft * R;
Specifier   Marker
o           Circle
+           Plus sign
*           Asterisk
.           Point
x           Cross
s           Square
d           Diamond
^           Upward-pointing triangle
v           Downward-pointing triangle
>           Right-pointing triangle
<           Left-pointing triangle
p           Pentagram
h           Hexagram
Table 1: Possible plot markers and their representation
Specifier   Color
y           yellow
m           magenta
c           cyan
r           red
g           green
b           blue
w           white
k           black
Table 2: Specifying plot colors
What have we achieved so far? We first found a transformation of our total signature
data, which should transform the data into directions according to their variance. The
result of the transformation of each of your signatures is stored in Fspca and Ftpca.
Comparison. It is now time to see if these transformations actually create data sets that
are specific for each user. To study the result in graphical form, you should display the
three largest principal components of the signatures for each user with a user-specific marker
in a plot. Start with one user, using for example o to show its points in the graph
d3plot(Fspca,'o');
The second argument controls the appearance of the plot markers. A list of available
markers is given in table 1. It is also possible to plot using different colors, as given in
table 2. For more information, see the Matlab command
help plot
You get the best result if you get both plots in the same graph, which is achieved with
the command
hold on;
To turn this off to get separate presentations you just write
hold off;
If the graph gets difficult to interpret, you can use the
view
and
axis
commands in the Matlab window to get better viewing angles. If you need help, use the
built-in documentation commands
help view
help axis
An alternative way of adjusting the view is to use the Rotate 3D tool, which can be found
as a round arrow just above the plot window.
Save this plot and include it in your report. It is important that you save the figure
in the correct file format, PNG. Do not use JPEG, as this strongly distorts the data!
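The whole plotting sequence for two users, including saving as PNG, might look like the sketch below. Because d3plot is specific to this course, the sketch assumes the standard Matlab function plot3 as a stand-in, and it uses made-up matrices A and B in place of Fspca and Ftpca. The file name pair_plot.png is likewise only an example.

```matlab
% Made-up stand-ins for two users' transformed signatures (one row each).
A = [1.0 2.0 0.5; 1.1 1.9 0.6; 0.9 2.1 0.4];
B = [3.0 0.5 1.5; 3.2 0.4 1.6; 2.9 0.6 1.4];

figure;
plot3(A(:,1), A(:,2), A(:,3), 'ro');   % red circles for the first user
hold on;                               % keep the first set while adding the second
plot3(B(:,1), B(:,2), B(:,3), 'b+');   % blue plus signs for the second user
hold off;
grid on;
xlabel('PC 1'); ylabel('PC 2'); zlabel('PC 3');
legend('user s', 'user t');
print('-dpng', 'pair_plot.png');       % save the current figure as a PNG file
```

The strings 'ro' and 'b+' combine one color specifier from table 2 with one marker specifier from table 1.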
3.3 Working With More Users
The results so far have only tried to distinguish between signatures for two users. Now
you should include more. The larger set of users is simulated by a prepared file, which
contains registered signature matrices from five different persons. This file is called Ftot.
Your signature results are combined with these in a total result through the commands
load Ftot;
followed by
Fpopulation=[Ftot;Fpair];
The next step is to use PCA for the whole population, which is done with the command
R = pca(Fpopulation);
R is now a new matrix, which you can use to transform the stored signatures. You
have five files Fnw, Fka, Fmo, Ffc and Fks. For each of these you should now repeat the
following:
load Fxx;
Fxxpca = Fxx * R;
where xx of course is one of the letter pairs in the five file names above. Don’t forget to
also transform your own signatures with this new R into new PCA variables!
Fspca = Fs * R;
Ftpca = Ft * R;
You should have seven transformed “user characteristics” (Fspca, Ftpca and the five Fxxpca) in total at this stage.
Then you compare the results visually by creating a 3D plot as you did for just your
two sets of signatures. Since the sensitivity of most of the tablets has drifted during the years
since the stored signatures were taken, you may find that the stored signatures form one
group while your current ones form another.
Save this combined plot and include it in your report! Again, use the PNG format.
3.4 Evaluation
It is now time to evaluate our system. Could it be used for biometric authentication of at
least a small set of users? Could it be developed into a real-life system? If yes, how? If
no, why not? Please note that your conclusions should be founded on what you have seen
in this lab session, and they should be reported in written form and handed in to the lab
assistant in order to get your final grade on the lab assignment part of the course.
There are three things that you can do to better evaluate the result of the system.
The first one you should try is to study graphically the Euclidean distance between the
seven original sets of signature features, the five stored and your own. This is done with
the help of the dmap function, which calculates the pairwise distance between vectors
and returns the result in a square matrix, where element (x,y) is the Euclidean distance
between vector x and vector y. You first calculate this matrix with the command
d = dmap(Fpopulation);
This matrix can now be studied with the commands
contourf(d);
colorbar;
Remember to use hold off before issuing these commands. Otherwise, the contour plot
will be overlaid on the previous plots. The graph shows the distance between all signature
feature vectors of the population. Also see the dmap documentation in Appendix B! Each position
on the x axis and on the y axis is an individual signature. The distances are colour coded, and the
translation of the colours is shown in the colour bar. What conclusions can you draw
about the system from this graph? What should an ideal graph look like? Also include
this graph in your report!
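What such a distance matrix contains can be illustrated with a few lines of plain Matlab. This is not the course's dmap implementation, only a sketch of the same definition, applied to three made-up two-dimensional feature vectors with hypothetical variable names.

```matlab
F = [0 0; 3 4; 6 8];            % three made-up feature vectors, one per row
n = size(F, 1);
D = zeros(n);                   % n-by-n matrix of pairwise distances
for i = 1:n
    for j = 1:n
        % Euclidean distance between row i and row j.
        D(i, j) = sqrt(sum((F(i, :) - F(j, :)).^2));
    end
end
disp(D);
```

The result is symmetric with zeros on the diagonal, since every vector has distance zero to itself. Here the off-diagonal entries are 5, 10 and 5.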
3.5 Forgeries and bad data
This last section is for use if there is time left in the lab session. Try to change your own
signature. Deliberately try to write as if you were in a hurry, tense, etc. Compare this
with your earlier signatures, using the tools from the earlier steps! Do not create a new
R matrix. Finally, if there is still more time, you can try to forge a signature, preferably by
letting one person in the pair imitate the signatures of the other. Again compare with the
original signatures and draw conclusions!
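One possible way to put a number on such a comparison is to measure how far a new, transformed signature lies from the centre of the genuine ones, relative to their own spread. The sketch below assumes that the first three principal components have already been computed with the existing R. All values and variable names are made up for illustration, and this measure is an assumption, not part of the lab software.

```matlab
% Made-up first three components of four genuine transformed signatures.
genuine = [1.0 2.0 0.5; 1.1 1.9 0.6; 0.9 2.1 0.4; 1.0 2.0 0.5];
attempt = [1.6 2.8 0.5];        % a hurried or forged signature, transformed

centre = mean(genuine, 1);      % centre of the genuine signatures
spread = zeros(size(genuine, 1), 1);
for i = 1:size(genuine, 1)
    % Distance of each genuine signature from the centre.
    spread(i) = norm(genuine(i, :) - centre);
end
d_attempt = norm(attempt - centre);   % distance of the new attempt

fprintf('largest genuine distance: %.3f\n', max(spread));
fprintf('distance of attempt:      %.3f\n', d_attempt);
```

If the attempt lies much further from the centre than any genuine signature does, the system would have a chance of rejecting it.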
4 The report
In the report you should very briefly summarize what you did and what you called any
variables and files that you refer to in the text. There is no need to repeat these
instructions! The main part of the report must consist of your answers to the following
questions.
1. What conclusions could you draw from d3plot? Were your values clearly differentiated
from the other users’ values? In all dimensions? Were the other users’ data
divided into clearly separate groups?
2. What did dmap show? What would a dmap look like for an ideal system? Did your
results agree with this ideal, or what were the differences? Did its results agree with
your results from d3plot? Did you get any further information from this presentation
of the data?
3. If you had time for the extra signatures, what did they show?
4. If you had time for the forgery attempt, what did it show?
5. What is your final verdict for this system? Can it be used for smaller populations?
Can it be developed to better fit smaller populations and/or be used for larger
populations? If so, how? Note that your suggestions can concern the type of equipment
as well as the data processing. One important point is that individual differences
between the tablets, such as differences in pressure sensitivity, may create artificial
differences in the user templates used to create the transformation matrix R. Have
you seen any indications of this? Can it be compensated for in a real-life system?
6. After this experiment, do you believe in digitized handwritten signatures as a
possible method for biometric user authentication? Motivate your answer!
The report should be handed in to the lab assistant Jonathan Fors (via e-mail to
jonathan.fors@liu.se) no later than the start of the following exam period, unless
you have been given explicit permission to complete it later. The preferred format is
PDF. If you can get the relevant information into a reasonably short e-mail, that can also
be used for the text, with the plots attached. The plots need to be saved as PNG images!
If you are very quick and well-prepared, you may be able to complete the report during
the actual lab session. Then you can of course print it and hand it to the assistant directly.
There are no special requirements for the layout, as long as it is logical and clear.
Your e-mail subject line and the file name of the attachment containing the report
must include your LiU-IDs! The lab assistant should not be required to sort, rename
and distinguish between a hundred saved attachments with identical names. Incorrectly
formatted e-mails and files will be ignored! The lab assistant will tell you whether your report
was sufficient or needs revision.
A Principal Component Analysis
When analyzing data, there is often a risk that the data set becomes too large to handle.
This of course degrades the efficiency of the analysis, and sometimes makes any meaningful
analysis impossible. Of course science has many solutions for how to compress data and/or
sort out the relevant parts before any more sophisticated analysis. Principal Component
Analysis, PCA, is one of these general solutions.
The goal of PCA is to reduce the dimensionality of the data set without removing
information about the variation within the original set. This method is also known
under many other names and in slight variations, the best known of which is probably the
“Karhunen-Loève transformation”.
So what reduction does PCA achieve? Suppose that we have a data set, where each
item is a point in some coordinate system. The set uses all the dimensions in the coordinate
system in such a way that removing one dimension also removes important information.
But can these coordinates be transformed into another coordinate system, where some
dimensions are not very important? Let us turn and tilt the system, so that we get a
“better fit” for the current data set! A “better fit” is defined as a coordinate system where
some axes point in directions where most of the variation in the data set is, while others
point in directions with almost no variation at all. These new coordinates are (as usual)
linear combinations of the old coordinates. If we succeed, we get some axes along which
our data set shows no or only very little variation. Then we can disregard
these directions, and get a system with fewer dimensions. It is of course up to the user of
this method to decide which directions can be disregarded without hurting the goal
of the analysis.
Stated in more mathematical form, we do the following: First we create a covariance
matrix, covX, for our n-dimensional vector X, which represents the points from the
measurement. Then we calculate the eigenvalues and eigenvectors of covX. Using these,
we can reorganize X into a set of orthogonal components. These components are linear
combinations of the original components of X, and the eigenvalues are the variances
of these new orthogonal components. The trick is that the sum of the
variances is the same for the original set as for the new, orthogonal set, but we now have
components that are uncorrelated with each other. Thus, if we pick out a subset of these new
coordinates, and choose this subset as the ones with the greatest variance, we have reduced
the dimensions, but lost very little of the information about variance in the original set.
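The steps above can be sketched with the standard Matlab functions cov and eig. This is an illustration on a tiny made-up data set, not the course's pca function, whose internals may differ (for example in how the data is centred). The variable names are hypothetical.

```matlab
% Made-up data: five points in three dimensions, one point per row.
% The third coordinate is constant, so it carries no variation at all.
X = [2 0 1; 4 1 1; 6 1 1; 8 2 1; 10 3 1];

C = cov(X);                      % covariance matrix of the three coordinates
[V, L] = eig(C);                 % eigenvectors (columns of V), eigenvalues in L
[lambda, order] = sort(diag(L), 'descend');
V = V(:, order);                 % directions sorted by decreasing variance

Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % centre the data
Y  = Xc * V;                     % coordinates in the new orthogonal system

total_before = sum(var(X));      % total variance in the original coordinates
total_after  = sum(lambda);      % total variance in the new coordinates
Yreduced = Y(:, 1:2);            % keep only the two strongest directions
fprintf('total variance before: %.4f, after: %.4f\n', total_before, total_after);
```

The two totals agree, and the smallest eigenvalue is zero (up to rounding) because of the constant third coordinate, so dropping that direction loses no information about the variation.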
If our original goal was to distinguish between different classes of data sets, it is not
certain that this is best achieved by concentrating on the components with the greatest
variance. It is rather obvious that coordinates that show no variance at all between
sets can hardly be used to distinguish between these sets. It is, however, possible that
certain patterns in only slightly varying components may be the best criterion for our
class distinctions. But PCA does give a more efficient way to describe the data set as
such, which can then be searched for the truly distinguishing properties.
For those who really want to know more, these methods are described, for example, in
textbooks on control theory, signal processing, etc. And of course they are described
on the internet.
B Special Matlab commands used in this assignment
acquire_signature Data acquisition from tablet
s=acquire_signature;
collects data from the writing tablet and returns an n × 3 matrix with one sample
per row of the x, y and pressure coordinates. n is the number of
samples collected during the signature writing and it is a function of the sampling
rate, the time to sign and how often and for how long the pen has been lifted from
the surface during the signature writing. The data collected with this command is
saved in Matlab by using File menu->Save and return. File menu->Clear is meant to erase what
has so far been sampled without saving it, but as noted in section 3.1 it crashes with the current Matlab version.
features extracts features from the saved data from the tablet
f = features(s1,s2,...,sn);
extracts features from the stored signals as described in the main text of this
document, and creates the n × 33 matrix f, where each row is the
feature vector for the corresponding signature. These features are the average power
density for the frequency bands of the x, y and p coordinates, respectively.
pca computes the principal components of the stored features for a set of sampled
signatures
R = pca(Fx);
computes the 33 × 33 matrix R, which is used to transform signature feature coordinates
into a coordinate system more suited for efficient analysis. Fx is an n × 33
matrix, where each of the n rows is a feature vector from the set which shall later
be transformed (or any other feature vector used as the basis for the choice of the
transformation).
d3plot plots the first three calculated principal components for a sampled signature
d3plot(Fy,s);
plots a point for each row of Fy. Fy is a matrix, where each row is the transformed
feature vector of a single signature, and all rows represent signatures from the same
user. The coordinates of the plotted points are the values of the first three principal
components of the transformed vector. Thus, with six stored signatures, you get
six different points in a three-dimensional coordinate system with d3plot. s is the
symbol used by d3plot to mark the points in the graph shown on the screen. The
standard Matlab plot command has a list of available symbol characters.
plotsign plots a signature
plotsign(sx);
plots the signature stored in sx as it would look on a piece of paper.
dmap computes a distance map between transformed signatures
D = dmap(F)
computes a matrix D,where each element is the Euclidean distance between the
feature vectors of two signatures.F is a matrix where each row contains the feature
vector of one signature.