Keystroke Biometric System Test Taker Setup and Data Collection




Hassan Poorshatery, Geoffrey Garcia, Elizabeth Teracino, Xiaolu Zhao, Vinnie Monaco, John Stewart, and Charles Tappert

{hp47222n, gg30810n, et75813p, zx84933n, jm51645n, ctappert}@pace.edu

Seidenberg School of CSIS, Pace University, White Plains, NY, 10606, USA




Abstract




Pace University’s Seidenberg School of Computer Science and Information Systems (CSIS) has developed, over the past seven years, a robust Pace Keystroke Biometric System (PKBS) for both identifying and authenticating users via their typing rhythms and patterns, which can be used to uniquely differentiate between users. The PKBS consists of three components: the Keystroke Entry System (KES), which collects raw keystroke data over the Internet; the Keystroke Feature Extractor (KFE), which extracts feature vectors from the raw data; and the Keystroke Pattern Classifier (KPC), which is used in the authentication process. The work described in this paper focuses on enhancements to the keystroke entry system to support a real-world application: authenticating students taking online tests.


1. Introduction



Keystroke biometrics is one of the least studied of the biometrics; however, this has been changing in recent years due to the increase in online testing and email monitoring in corporate settings. It is a behavioural biometric that can be used for identification and authentication purposes. User authentication is a process that determines whether to confirm or deny a user’s claimed identity [5]. Passwords are a common form of authentication; however, user authentication can also be accomplished with biometric systems via what you are (e.g., fingerprints, iris) or how you behave (handwriting, signature, typing rhythm). An advantage of a keystroke behavioural biometric system over the alternatives is that the only hardware needed is a keyboard, making it an inexpensive tool.


Keystroke dynamics are the patterns of rhythm and timing created when a person types. They include overall speed, variations of speed when moving between specific keys, common errors, and the length of time that keys are depressed. The system records when each key is pressed, how long it is held (the duration, obtained by also recording when the key is released), and the latency between successive keystrokes. This rhythm is believed to be unique to an individual and is captured to build a unique biometric template for the future authentication of that same individual.
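As a minimal sketch, the timing quantities above can be computed from timestamped key events. The definitions follow the raw data file description in Section 2.2 (duration is the release time minus the press time; latency is the press time of one keystroke minus the release time of the previous one). The `KeyEvent` type and the sample timestamps are hypothetical and only illustrate the arithmetic; this is not the KES or KFE code.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """One keystroke (hypothetical type): key label plus press/release times in ms."""
    key: str
    press_ms: int
    release_ms: int


def durations(events):
    """Duration of each keystroke: release time minus press time."""
    return [e.release_ms - e.press_ms for e in events]


def latencies(events):
    """Press time of each keystroke minus the release time of the previous one.

    The first keystroke has no predecessor, so n events give n - 1 latencies.
    A negative latency means the next key was pressed before the previous
    key was released (overlapping keystrokes).
    """
    return [cur.press_ms - prev.release_ms for prev, cur in zip(events, events[1:])]


events = [
    KeyEvent("h", 0, 95),
    KeyEvent("i", 140, 230),
    KeyEvent("!", 210, 300),
]
print(durations(events))  # [95, 90, 90]
print(latencies(events))  # [45, -20]
```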



Pace University has been researching this method of identification and authentication via experimentation and the implementation of the PKBS software for more than seven years.


With the increase in enrolment in online education, there is a concern for evaluation security and academic integrity [4]. Ensuring that students are who they say they are during online examinations is extremely important. In a similar vein, this sort of authentication could be used for the same reasons when training and orientation examinations are administered in a business setting.



The Keystroke Biometric System process begins with an initial training period in which users register and log in to the system, where they are asked to answer a set of practice questions to gather initial sample data. Later, another set of tests is required to authenticate the users against these initial samples. The KES was revised to use JavaScript instead of the original Java applet configuration, in order to eliminate the user’s need to have Java installed and to simplify the data entry process on the user’s end.



Feature measurements are extracted from the raw data using the Keystroke Feature Extractor (KFE) program and are then processed by the Keystroke Pattern Classifier (KPC), which uses the k-Nearest Neighbour classifier [1]. The collected raw keystroke data samples (test) are processed and compared against the archived enrolment samples (train) to make an authentication decision: the test taker is either matched (accepted) or not matched (rejected).





The purpose of this paper is to explain how to turn the KES into a real-world application that can be used to authenticate students taking an online test, with an additional focus on improving the accuracy of the system.


2. Revised KES Interface




The Keystroke Entry System (KES) collects raw keystroke data over the Internet. It has been customized for use by students taking online tests.



2.1. Changes to the KES





Figure 1 shows the components of the new KES along with the other components of the PKBS.

Figure 1: PKBS revised keystroke entry system








The system is first initiated by the instructor, who enters the course name and questions into a text file called “Prompts.txt”. Next, the students log in to the KES website, which is the test-taking environment, and register as new users. During this registration phase, each student is asked to enter their first name, last name, whether they are right- or left-handed, and whether they are on a laptop or desktop.
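The text does not describe the internal layout of “Prompts.txt”. As a minimal sketch, assuming the course name sits on the first non-blank line and each following non-blank line holds one question, a loader could look like the following; `load_prompts` and the sample file content are hypothetical.

```python
import tempfile
from pathlib import Path


def load_prompts(path):
    """Return (course_name, questions) from a prompts file.

    ASSUMPTION: the course name is the first non-blank line and every
    following non-blank line is one question; the real KES layout may differ.
    """
    lines = [ln.strip() for ln in Path(path).read_text(encoding="utf-8").splitlines()]
    lines = [ln for ln in lines if ln]  # drop blank lines
    if not lines:
        raise ValueError("prompts file is empty")
    return lines[0], lines[1:]


# Usage with a throwaway file (sample content is illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    prompts_file = Path(tmp) / "Prompts.txt"
    prompts_file.write_text(
        "CS 101\nDescribe your favourite book.\n\nWhat did you do last summer?\n",
        encoding="utf-8",
    )
    course, questions = load_prompts(prompts_file)

print(course)     # CS 101
print(questions)  # ['Describe your favourite book.', 'What did you do last summer?']
```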


Registered students can then log in to the site to take the test, answering the questions from the questions file. The KES displays each question while JavaScript event handlers monitor the text input area where the student’s keystrokes are captured. After completing each question, the student is required to submit their answer, which the system saves as keystroke dynamics information in text files on the hosting server.





Unlike the previous KES, the updated KES saves new raw data text files that contain both the keystroke codes and the answer to each question from each student. After a data converter program has been run, the keystroke codes in these raw data files are ready to be analyzed and compared using the Biometric Authentication Feature Extractor and Feature (data) Classifier, to identify the keystroke patterns of each test taker (student) and, finally, to authenticate them. The actual answers as typed by the students in the raw data file are used by the instructor for grading purposes.




In the previous version of the KES, the four experimental categories were copy task on a desktop, copy task on a laptop, free-text entry on a desktop, and free-text entry on a laptop. At the completion of registration, or upon returning to the site, a user was redirected to the activity selection page. The six pieces of information sent to, and required by, the Java applet included data such as experiment style, sequence number, keyboard style, and awareness. Lastly, the user was required to use Internet Explorer [5].

The new Keystroke Entry System is a PHP-based web application that uses JavaScript compatible with the Mozilla Firefox browser (Figure 2). Unlike the previous version, it does not require Java to be installed on the user’s end.














Figure 2: KES home page






In line with the application requirements and recommendations from previous team research, unnecessary information has been removed. For example, students are not informed that their keystrokes are captured during test taking for the purpose of biometric authentication; instead, this data is acquired in the background (stealth mode). Test answers from the KES (Figure 3) are valid only if at least 200 keystrokes are collected (reduced from the 300 keystrokes used in previous research). Upon completion of the entry, the user is thanked and the data output is stored as a .txt file within the application structure, named in the format <First>_<Last>_<Course Name>_PROMPT-<Question #>. In addition to the raw data file, a file named in the format <First>_<Last>_PROMPTS_COMPLETED is created, which identifies all of the questions a user has already answered. For returning users, this file is appended after data entry to include the additional questions that have been answered.
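The naming and validity rules above can be sketched as follows. The 200-keystroke threshold and the two name formats come from the text; the function names, the append-one-number-per-line layout of the PROMPTS_COMPLETED file, and the sample user “Jane Doe” are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

MIN_KEYSTROKES = 200  # minimum stated in the text (previously 300)


def raw_data_filename(first, last, course, question_no):
    """<First>_<Last>_<Course Name>_PROMPT-<Question #>, saved as a .txt file."""
    return f"{first}_{last}_{course}_PROMPT-{question_no}.txt"


def is_valid_answer(keystroke_count):
    """An answer is kept only if at least MIN_KEYSTROKES keystrokes were captured."""
    return keystroke_count >= MIN_KEYSTROKES


def record_completed(folder, first, last, question_no):
    """Append a question number to <First>_<Last>_PROMPTS_COMPLETED.

    Opening in append mode creates the file for a new user and extends it
    for a returning user. One number per line is an ASSUMPTION.
    """
    path = Path(folder) / f"{first}_{last}_PROMPTS_COMPLETED"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"{question_no}\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    record_completed(tmp, "Jane", "Doe", 1)
    completed_file = record_completed(tmp, "Jane", "Doe", 2)
    completed = completed_file.read_text(encoding="utf-8").split()

print(raw_data_filename("Jane", "Doe", "CS101", 3))  # Jane_Doe_CS101_PROMPT-3.txt
print(is_valid_answer(199), is_valid_answer(200))     # False True
print(completed)                                      # ['1', '2']
```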













Figure 3: The data input screen


2.2. Data Format



As can be seen in Figure 4, apart from the file name, which displays the course/test name and question number, the raw data file generated from a user’s input within the Keystroke Entry System includes three sections of information:

1. Header area: student or user’s name
2. Data Entry area:
   a. #: sequence number for each keystroke
   b. Key: character displayed on-screen (a, b, c…)
   c. Key code: ASCII character code corresponding to the Key field
   d. Press: time of key press (ms)
   e. Release: time of key release (ms)
   f. Duration (ms): difference between the Press and Release times for a particular sequence
   g. Latency (ms): difference between the Press time for a sequence and the Release time of the previous sequence
3. Answer area:
   a. Unformatted copy of the user’s input

Notes:
- Key combinations, such as SHIFT + i, are recorded as individual sequences.
- Holding down a key results in a new sequence every 31 milliseconds following an initial 500 ms delay (depending on computer configuration); until the key is released, the release time is recorded as 0.

Figure 4: A raw data file of a student
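A minimal sketch of how the three sections could be written out, using the Duration and Latency definitions listed above. The exact separators and labels of the real file are not given in the text, so the tab-separated columns, the “User:” and “Answer:” labels, and writing the first keystroke’s latency as 0 are assumptions; the sketch handles only single printable characters and ignores the auto-repeat rows described in the notes.

```python
def format_raw_file(user_name, events, answer):
    """Render a raw data file with its three sections: header, data entry, answer.

    events: (key, press_ms, release_ms) tuples for single printable characters.
    ASSUMPTIONS: tab-separated columns, a "User:" header label, latency of the
    first keystroke written as 0, and no special handling of auto-repeat rows
    (whose release time is recorded as 0).
    """
    lines = [f"User: {user_name}", "",
             "#\tKey\tKeyCode\tPress\tRelease\tDuration\tLatency"]
    prev_release = None
    for seq, (key, press, release) in enumerate(events, start=1):
        duration = release - press
        latency = 0 if prev_release is None else press - prev_release
        # ord() gives the character code of a single-character key
        lines.append(f"{seq}\t{key}\t{ord(key)}\t{press}\t{release}\t{duration}\t{latency}")
        prev_release = release
    lines += ["", "Answer:", answer]
    return "\n".join(lines)


text = format_raw_file("Jane Doe", [("h", 0, 95), ("i", 140, 230)], "hi")
print(text)
```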


2.3. Alpha and Beta Versions


In an effort to gather user feedback and test the system, both alpha (initial) and beta (real) environments were configured. The difference between the two is that the alpha version is intended to test the new KES and PKBS as a general-purpose authentication system that uses randomly selected questions from a list of generic questions, while the beta version is intended to work as an online test-taking system, as a real-world application of PKBS. The beta version uses test questions presented in a particular order determined by the instructor, and all questions must be answered by the students.
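The two question-selection policies can be sketched as below: the alpha version draws questions at random from a generic list, while the beta version presents every question in the instructor’s order. The function `select_questions` and its `count` and `seed` parameters are hypothetical and do not come from the KES source.

```python
import random


def select_questions(questions, mode, count=None, seed=None):
    """Pick the questions a student will answer.

    mode "alpha": `count` distinct questions drawn at random from a generic list.
    mode "beta":  every question, in the order the instructor listed them.
    `count` and `seed` are hypothetical parameters for this sketch.
    """
    if mode == "alpha":
        if count is None:
            raise ValueError("alpha mode needs a question count")
        return random.Random(seed).sample(questions, count)
    if mode == "beta":
        return list(questions)  # preserve the instructor's order
    raise ValueError(f"unknown mode: {mode!r}")


generic = ["G1", "G2", "G3", "G4", "G5", "G6"]
alpha = select_questions(generic, "alpha", count=4, seed=7)
beta = select_questions(["E1", "E2", "E3"], "beta")
print(alpha)
print(beta)  # ['E1', 'E2', 'E3']
```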


For the alpha testing, four training data samples (free text) were collected from a group of fourteen users. The same group was asked for a second set of four testing data samples a week later. Although the questions are generic, this simulates taking an online test, and the data is marked as test data.

For the beta testing, a group of fourteen students from Lake Erie College, under the supervision of Professor John Stewart, was to be asked to submit sample training data and complete an actual test as their final exam using the system during the same sitting. However, due to time restrictions, we obtained both the training and testing data in a single exam session, later splitting the answered questions of each student in half, into training and testing sets.
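The per-student split of the beta data can be sketched as follows. The text does not say which half went to training, so this sketch assumes each student’s answers are ordered by question number, the first half becomes training data, and with an odd number of answers the extra one goes to testing; `split_per_student` and the sample records are hypothetical.

```python
from collections import defaultdict


def split_per_student(samples):
    """Split each student's answers into training and testing halves.

    samples: list of (student, question_no, data) tuples.
    ASSUMPTION: answers are ordered by question number and the first half
    goes to training; with an odd count, the extra answer goes to testing.
    """
    by_student = defaultdict(list)
    for student, question_no, data in samples:
        by_student[student].append((question_no, data))
    train, test = [], []
    for student, answers in by_student.items():
        answers.sort(key=lambda qa: qa[0])
        half = len(answers) // 2
        train += [(student, q, d) for q, d in answers[:half]]
        test += [(student, q, d) for q, d in answers[half:]]
    return train, test


samples = [("s1", q, f"s1-answer{q}") for q in range(1, 5)] + \
          [("s2", q, f"s2-answer{q}") for q in range(1, 4)]
train, test = split_per_student(samples)
print([(s, q) for s, q, _ in train])  # [('s1', 1), ('s1', 2), ('s2', 1)]
print([(s, q) for s, q, _ in test])   # [('s1', 3), ('s1', 4), ('s2', 2), ('s2', 3)]
```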

3. Authentication Experimental Method


As Figure 5 shows, to determine authentication accuracy the KBS uses a manually operated, multi-step approach, which begins with the capture of training and test data through the Alpha and Beta Keystroke Entry Systems. Following the collection of data, the raw files are input into the Keystroke Feature Extractor to determine the collective key features of the training and testing data simultaneously.


Figure 5: PKBS Experimental Procedure




A number of measurements, or features, are used to characterize a user’s typing pattern. These features are designed to describe an individual’s keystroke dynamics over writing samples of at least 200 characters. They characterize a user’s key-press duration times, transition times in going from one key to the next, the percentages of usage of the non-letter keys and mouse clicks, and the typing speed. The feature extractor program was designed and implemented to extract 239 features. The resulting output is a feature vector file, which is then manually split in half into two files corresponding to training and testing.
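To make the feature categories concrete, the sketch below computes five illustrative features for one sample: mean and (population) standard deviation of key-press duration, mean transition time, percentage of non-letter keys, and typing speed. These are not the KFE’s 239 features; the names and formulas are assumptions chosen for illustration, and mouse clicks are omitted. The sample needs at least two keystrokes.

```python
import statistics


def sample_features(events):
    """Compute a handful of keystroke features for one writing sample.

    events: list of (key, press_ms, release_ms), ordered by press time, len >= 2.
    These five features only illustrate the categories named in the text;
    they are not the KFE's actual feature set.
    """
    durations = [r - p for _, p, r in events]
    # transition: next key's press time minus current key's release time
    transitions = [p2 - r1 for (_, _, r1), (_, p2, _) in zip(events, events[1:])]
    non_letters = sum(1 for k, _, _ in events if not k.isalpha())
    elapsed_s = (events[-1][2] - events[0][1]) / 1000.0
    return {
        "mean_duration_ms": statistics.mean(durations),
        "stdev_duration_ms": statistics.pstdev(durations),
        "mean_transition_ms": statistics.mean(transitions),
        "pct_non_letter": 100.0 * non_letters / len(events),
        "keys_per_second": len(events) / elapsed_s,
    }


events = [("a", 0, 100), ("b", 150, 230), (" ", 300, 380), ("c", 420, 500)]
features = sample_features(events)
print(features)
```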



Finally, the split files are input into the Authentication Classifier to determine authentication accuracy. The BAS system uses the k-Nearest Neighbour classifier. As part of the processing, the multi-class input data is dichotomized into two classes, and the classifier decides whether each test taker’s sample is within-class or between-class. Within-class samples are decided by the classifier to be “you are authenticated”; between-class samples are decided to be “you are not authenticated” [3].
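The dichotomy idea can be sketched in a few lines: pairs of training vectors become difference vectors labelled within-class (same user) or between-class (different users), and a test sample’s differences from the claimed user’s enrolment samples are classified by k-nearest-neighbour vote. This is a minimal illustration assuming absolute-difference vectors, Euclidean distance and majority voting; the aggregation rule in `authenticate` is a hypothetical choice, and BAS details such as the cap on dichotomy data are not modelled.

```python
import math
from itertools import combinations


def diff(a, b):
    """Dichotomy transform: element-wise absolute difference of two feature vectors."""
    return [abs(x - y) for x, y in zip(a, b)]


def build_dichotomy_set(samples):
    """samples: list of (user, feature_vector).
    Returns (difference_vector, is_within_class) for every pair of samples."""
    return [(diff(va, vb), ua == ub) for (ua, va), (ub, vb) in combinations(samples, 2)]


def knn_within_class(dichotomy_set, query, k):
    """Majority vote of the k nearest difference vectors (Euclidean distance)."""
    nearest = sorted(dichotomy_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = sum(1 for _, within in nearest if within)
    return votes > k / 2


def authenticate(train, test_vector, claimed_user, k=3):
    """Hypothetical aggregation: accept if most comparisons with the claimed
    user's enrolment samples are classified within-class."""
    dichotomy_set = build_dichotomy_set(train)
    decisions = [knn_within_class(dichotomy_set, diff(test_vector, v), k)
                 for u, v in train if u == claimed_user]
    return sum(decisions) > len(decisions) / 2


train = [("u1", [0.10, 0.20]), ("u1", [0.12, 0.22]), ("u1", [0.11, 0.19]),
         ("u2", [0.80, 0.90]), ("u2", [0.82, 0.88]), ("u2", [0.79, 0.91])]
print(authenticate(train, [0.105, 0.21], "u1"))  # True  (genuine claim)
print(authenticate(train, [0.105, 0.21], "u2"))  # False (impostor claim)
```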






4. Efficient Authentication Process


The Keystroke Feature Extractor program was modified by the previous team to offer different processes depending on whether it is working on training or test data. By means of a switch in the interface, the training process can be chosen to output a file containing the standardization x-min and x-max values for each feature. For the testing process, the recorded x-min and x-max values from the training process are imported and used to perform the standardization [5].


As Figure 6 shows, the efficient experimental procedure is as follows:

1. Extract a feature file from the training raw data and output a file of xmin/xmax values
2. Extract a feature file from the testing raw data by reading in the xmin/xmax value file from step 1
3. Run the authentication classifier on the training and testing feature files
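Steps 1 and 2 can be sketched as below, assuming the usual min-max formula (x - xmin) / (xmax - xmin) computed only from training vectors and reused unchanged for test vectors. The JSON layout of the bounds file, the mapping of constant features to 0.0, and the function names are assumptions; the KFE’s actual file format and edge-case handling are not described in the text.

```python
import json
import tempfile
from pathlib import Path


def fit_min_max(train_vectors):
    """Step 1: per-feature minimum and maximum over the training vectors."""
    columns = list(zip(*train_vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def standardize(vectors, xmin, xmax):
    """Scale each feature to (x - xmin) / (xmax - xmin) using training bounds.
    Constant features map to 0.0; test values may fall outside [0, 1]."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(v, xmin, xmax)]
            for v in vectors]


def save_bounds(path, xmin, xmax):
    # ASSUMPTION: JSON layout; the KFE's real xmin/xmax file format is not described.
    Path(path).write_text(json.dumps({"xmin": xmin, "xmax": xmax}), encoding="utf-8")


def load_bounds(path):
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["xmin"], data["xmax"]


train = [[100.0, 10.0], [200.0, 30.0], [150.0, 20.0]]
test = [[125.0, 40.0]]
with tempfile.TemporaryDirectory() as tmp:
    bounds_file = Path(tmp) / "bounds.json"
    save_bounds(bounds_file, *fit_min_max(train))  # step 1
    xmin, xmax = load_bounds(bounds_file)          # step 2 reads step 1's file
train_std = standardize(train, xmin, xmax)
test_std = standardize(test, xmin, xmax)
print(train_std)  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(test_std)   # [[0.25, 1.5]]
```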



Figure 6: KBS efficient authentication process



5. Test Results Analysis


After a feature vector file has been generated using the KFE (Figure 7), the authentication component of this system uses the Biometric Authentication System (BAS), which compares test and training data to determine whether they match (Figure 8). The system asks for the maximum amount of dichotomy data to use, which equates to the maximum number of inter- or intra-class samples to create for experimentation, as well as the lowest N choices (which is used to optimize testing performance and stipulates the maximum nearest neighbour test) [2]. After the dichotomy model is applied, the data is saved and is then ready to be processed using the BAS: Accuracy Calculator (Figure 9). The calculator uses the output file from the Biometric Authentication System and applies nearest neighbour calculations to determine the false acceptance rate and the false rejection rate, as well as the overall performance of the test, for each of the nearest neighbour calculations.



Figure 7: KFE interface



Figure 8: BAS or KPC component interface






Figure 9: Accuracy Calculator





In the user authentication system, for a given attempt by a user, one of the following four cases can happen, where u1 is a registered user and u2 is an unregistered user unknown to the system:

True Positive: u1 claims to be u1 and is accepted
False Reject: u1 claims to be u1 but is rejected
False Accept: u2 claims to be u1 but is accepted
True Negative: u2 claims to be u1 and is rejected

The False Acceptance Rate (FAR) and False Rejection Rate (FRR) are the error rates used to evaluate the performance of the biometric classifiers [3]. FRR is the fraction of genuine attempts (u1 claiming to be u1) that are falsely rejected, and FAR is the fraction of impostor attempts (u2 claiming to be u1) that are falsely accepted.
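The four cases above translate directly into the error rates. A minimal sketch, assuming that “performance” means the percentage of all accept/reject decisions that were correct (the text does not give the BAS formula); the counts in the example are hypothetical and not taken from the experiments.

```python
def error_rates(true_pos, false_rej, false_acc, true_neg):
    """Return (FRR, FAR, performance) as percentages.

    FRR: share of genuine attempts (u1 claiming u1) that were rejected.
    FAR: share of impostor attempts (u2 claiming u1) that were accepted.
    performance: ASSUMED here to be the share of all decisions that were correct.
    """
    genuine = true_pos + false_rej
    impostor = false_acc + true_neg
    frr = 100.0 * false_rej / genuine
    far = 100.0 * false_acc / impostor
    perf = 100.0 * (true_pos + true_neg) / (genuine + impostor)
    return frr, far, perf


frr, far, perf = error_rates(true_pos=6, false_rej=4, false_acc=2, true_neg=88)
print(f"FRR={frr:.2f}%  FAR={far:.2f}%  performance={perf:.2f}%")
# FRR=40.00%  FAR=2.22%  performance=94.00%
```

Because impostor comparisons usually far outnumber genuine ones, this overall figure is dominated by FAR, which is consistent with the tables below, where a high FRR can coexist with a high performance value.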



Because the raw keystroke data samples were gathered in the two different scenarios explained previously, four different experiments were administered to test the performance of the new system.

Table 1 shows the results for the generic (Alpha) test using 56 samples of training data and 56 samples of testing data, with an average of 279 keystrokes per sample. As can be seen from this table, the best performance is 96.04% when kNN is 9, with the lowest FAR (0.89%) but a high FRR (57.14%). The average performance is 94.61%.






Table 1: BAS Results for Alpha version (generic)

kNN   FRR      FAR      Performance
1     36.90%   7.69%    90.71%
3     60.71%   1.85%    94.94%
5     57.14%   1.30%    95.65%
7     57.14%   1.17%    95.71%
9     57.14%   0.89%    96.04%
Avg   53.78%   2.58%    94.61%




Table 2 shows the results for the students’ real exam (Beta test) using 56 samples for training and 56 samples for testing the system, with an average of 435 keystrokes per sample.


Table 2: BAS Results for Beta version (real exam)

kNN   FRR      FAR      Performance
1     16.67%   16.83%   83.18%
3     39.29%   7.14%    91.10%
5     44.05%   7.42%    90.58%
7     44.05%   7.42%    90.58%
9     42.86%   6.87%    91.17%
Avg   37.38%   9.14%    89.32%



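Comparing Tables 1 and 2, we can see that the performance for the real-world application is lower than in the generic test (a best of 91.17% versus 96.04%, and an average of 89.32% versus 94.61%), although its FRR is lower at every kNN value.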


Table 3 shows the results of merging both the generic and the students’ real-exam data (Alpha and Beta), using 112 samples for training and 112 samples for testing the system, with an average of 385 keystrokes per sample. The performance, 97.41% when kNN is 9, is higher than that of both the Alpha test (Table 1) and the Beta test (Table 2). When kNN is 9, the FAR is 0.94%, lower than that of the Beta test and only slightly higher than that of the Alpha test (a difference of 0.05 percentage points). The trade-off is that the FRR in this case is considerably higher than in both the Alpha and Beta tests: 61.90%, as opposed to 57.14% for the Alpha test and 42.86% for the Beta test.




Table 3: BAS Results for mixing Alpha & Beta raw data

kNN   FRR      FAR      Performance
1     42.26%   5.90%    93.11%
3     67.26%   1.57%    96.65%
5     63.69%   1.37%    96.94%
7     66.07%   1.16%    97.09%
9     61.90%   0.94%    97.41%
Avg   60.24%   2.19%    96.24%



Table 4 shows the results of merging both exams (Alpha and Beta) using 168 samples for training and 56 samples for testing the system, with the same average of 385 keystrokes per sample. This mix increased the performance by almost a full percentage point over the previous mix (Table 3): when kNN is 9, the performance is 98.38%, the highest of the four experiments. Furthermore, the FAR is the lowest of the four experiments, at 0.33%. However, the FRR is also the highest, at 71.43%, when kNN is 9.

Table 4: BAS Results for mixing Alpha & Beta with more training samples

kNN   FRR      FAR      Performance
1     60.71%   1.85%    97.08%
3     78.57%   0.46%    98.12%
5     78.57%   0.46%    98.12%
7     71.43%   0.60%    98.12%
9     71.43%   0.33%    98.38%
Avg   72.14%   0.74%    97.96%



Together, Tables 3 and 4 indicate that the more the system is trained, the higher the performance percentages, although the FRR rises as well.


In the last case, illustrated by Table 4, we tested the new KBS with around 86,240 test patterns (merging the generic and students’ real-exam data, with more training samples) and obtained the best FAR of 0.33%, with an FRR of 71.43% and a performance of 98.38%. By contrast, the best performance in Table 2 (training and testing features from students only) is 91.17%.


As a result, to increase the accuracy of authenticating students taking online exams with the revised KES, it is necessary to merge more extra samples from prior samples in the data bank into the training data, so that the system is better trained. Furthermore, improving the feature extractor to include more features and decrease error rates will help to increase the BAS performance.

6. Future Work

The current Pace Keystroke Biometric System (PKBS) is the result of several evolutionary prototyping projects, each focusing on different components of the system. Because of this approach, the system lacks certain elements of cohesion, automation and process that would be necessary before it is released in a production environment. It is recommended that the following steps be taken to further the current work:



- Key hold times for the space character should not be considered, because users usually pause after pressing the space key to recollect what has to be typed next
- Adapt the system to the changing typing patterns of the users
- Implement a spell-check tool for users in the Internet browser, which will discourage them from copying/pasting from other programs that do offer spell checking
- Utilize a database for all keystroke biometric system data
- Utilize web services and stored procedures for all components of the keystroke biometric system rather than executable files
- Develop an administrator interface allowing instructors to create sample and actual tests, as well as to set timeframes for when they can be taken
- Following each course, merge individual keystroke biometric information into a master user table that tracks an individual’s academic career
- Develop the Keystroke Entry System to be cross-browser compatible
- Develop the system either to process results immediately or to run the processing as a nightly job
- Develop a system (email and through the admin interface) to alert an instructor of suspicious results
- Revise the Keystroke Feature Extractor (KFE) program to increase PKBS performance for real-world applications with a small number of samples
- Integrate the Keystroke Entry System with Blackboard for security and a seamless user experience
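The first recommendation above can be sketched as a simple filter applied before hold-time features are computed. The function name is hypothetical, and labelling the space key as " " in the raw data is an assumption.

```python
def hold_times_excluding_space(events):
    """Key hold (press-to-release) times with the space key left out.

    events: (key, press_ms, release_ms) tuples; the space key is assumed
    to be labelled " " (an assumption about the raw data).
    """
    return [release - press for key, press, release in events if key != " "]


events = [("a", 0, 90), (" ", 120, 400), ("b", 450, 530)]
print(hold_times_excluding_space(events))  # [90, 80]
```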



7. Conclusion




Keystroke biometrics is an inexpensive yet effective method of user identification and authentication. The Pace Keystroke Biometric System (PKBS), if developed further, could be particularly well suited to student online testing when embedded within a browser and customized for the institution using the system.



While the main concern currently surrounds academic integrity during online testing, this sort of authentication could also be used when training and orientation examinations are administered in a business setting.


The results of our four experiments indicate that, to increase the authentication accuracy for students taking online exams with the revised Keystroke Entry System (KES), we need to gather more training samples, which can be collected before the exams for the purpose of training the system. Moreover, revising the Keystroke Feature Extractor (KFE) program to generate more features and lower error rates would further prepare the system for real-world applications with a smaller number of samples, thus increasing the BAS performance rate.



8. Resources


[1] S. Janapala, S. Roy, J. John, L. Columbu, J. Carrozza, R. Zack, and C. Tappert, “Refactoring a Keystroke Biometric System,” paper b1, Proc. Student-Faculty Research Day, Seidenberg School of CSIS, Pace University, New York.

[2] S. Bharati, R. Haseem, R. Khan, M. Ritzmann, and A. Wong, “Biometric Authentication System using the Dichotomy Model,” paper c3, Proc. Student-Faculty Research Day, Seidenberg School of CSIS, Pace University, New York, May 2008.

[3] C. C. Tappert, S.-H. Cha, M. Villani, and R. S. Zack, “A Keystroke Biometric System for Long-Text Input,” Int. J. Info. Security and Privacy (IJISP), Vol. 4, No. 1, 2010, pp. 32-

[4] R. S. Zack, C. C. Tappert, and S.-H. Cha, “Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method,” Proc. IEEE 4th Int. Conf. Biometrics: Theory, Apps, and Systems (BTAS 2010), Washington, D.C., Sep. 2010.

[5] A. C. Caicedo, K. Chan, D. A. Germosen, S. Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner, R. S. Zack, and C. C. Tappert, “Keystroke Biometric: Data/Feature Experiments,” paper b5, Proc. Student-Faculty Research Day, Seidenberg School of CSIS, Pace University, New York, May 2010.