Keystroke Biometric Authentication System



Spring 2009







Team Members

Spring '09 Team Members


Alpha Amatya


James Aliperti


Thomas Mariutto


Ankoor Shah


Michael Warren









Spring 2009 Focus


Web Based Authentication


continued development of the test-taker authentication application


Development of New Concepts


weighted & unweighted top-n choices


strong vs. weak enrollment


Modify Existing & Write New Programs


to simulate various scoring procedures


Run Experiments


to produce various scenarios


Reasons for Study


Keystroke Biometrics is one of the least studied Biometrics Applications used
for user authentication



Most studies use short input: passwords & user names



This study focuses on long text input


Free/Copy



Typing characteristics are said to be:



1) Unique to an individual



2) Difficult to duplicate



Very important for online test taking systems



Important for overall Computer System Security



No special equipment is needed

Contents of System

A PHP Website registers the user.


A modified Java applet captures 300 keystrokes and produces
two files: a raw data file and a text file.


A Java program, BioFeature++, extracts 230 feature
measurements.


A Java program, Biometric Authentication System (BAS),
performs authentication tests.


4 Quadrant Data Collection

36 Subjects

4 Quadrants

5 samples per quadrant


Types of Data Collected



Copy Text



Free Text

Entry Modes



Desktop



Laptop

             Desktop         Laptop
Copy         Desktop Copy    Laptop Copy
Free Text    Desktop Free    Laptop Free

Keystroke Biometric Authentication System (Data Flow)


Raw Data Sample

Deliverables


Recreate authentication experiment from Keystroke Book Chapter


Rewrite user and technical manuals


Modify classifier program to produce top-n Within/Between choices and distances


Create 1st, 3rd and 5th Nearest Neighbor output tables


Create output file of top 3 choices from Classifier program and obtain FRR,
FAR and Performance


Create ROC curves for each of the 4 quadrant data samples


Run two small-training, strong-enrollment authentication experiments


Run big-training, strong-enrollment authentication experiments,
incrementally increasing training sizes


Write detailed descriptions of data formats


Investigate discrepancy between 230 and 239 Linguistic-model features



Hierarchical Fallback Models


Touch-type Model: based on keys struck by touch typists


254 distinctive measurements



Linguistic Model: based on language and most frequently used keys



230 distinctive measurements



Performance improved when using the Linguistic Model

Experimental Recreation

            Intra-Inter Class Sizes
Condition   Train       Test        FRR     FAR     Performance
DeskCopy    180-3825    180-3825    11.1%   6.0%    93.8%
LapCopy     180-3825    180-3825    7.8%    4.4%    95.5%
DeskFree    171-3570    176-3740    28.4%   1.4%    97.4%
LapFree     180-3825    180-3825    15.6%   3.7%    95.7%
Average                             15.7%   3.7%    95.6%

            Intra-Inter Class Sizes
Condition   Train       Test        FRR     FAR     Performance
DeskCopy    180-3825    180-3825    2.78%   2.1%    97.9%
LapCopy     180-3825    180-3825    3.3%    4.0%    96.0%
DeskFree    171-3570    165-3576    21.0%   1.1%    98.0%
LapFree     180-3825    180-3825    10.0%   3.3%    96.4%
Average                             9.3%    2.6%    97.1%

Experimental Recreation, limiting sample size to 500

            Intra-Inter Class Sizes
Condition   Train       Test        FRR     FAR     Performance
DeskCopy    180-500     180-500     2.78%   4.6%    95.9%
LapCopy     180-500     180-500     2.2%    7.2%    94.1%
DeskFree    176-500     165-500     10.2%   2.2%    95.7%
LapFree     180-500     180-500     6.7%    7.0%    93.1%
Average                             5.5%    2.6%    94.7%

Top n=10 W/B Choices And Distance


The implementation compared each sample from the
dichotomized test data with every sample from the
dichotomized train data.


The n=10 shortest Euclidean distances were taken as the
top choices.


Each distance and its choice class, Within (W) or
Between (B), was recorded.


This program was run for all four quadrants.


Each output contained 180 Within + 3825 Between =
4005 choice tables.
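
The top-n selection described above can be sketched roughly as follows. This is a minimal illustration, not the team's classifier program: the function name `top_n_choices`, the two-dimensional toy vectors, and the in-memory list layout are all assumptions made for the example.

```python
import math

def top_n_choices(test_vec, train_set, n=10):
    """Return the n nearest training samples as (distance, label) pairs.

    train_set is a list of (feature_vector, label) pairs, where label is
    'W' (Within) or 'B' (Between). The layout is hypothetical.
    """
    # Euclidean distance from the test sample to every training sample.
    scored = [(math.dist(test_vec, vec), label) for vec, label in train_set]
    # Sort by distance so the closest choices come first.
    scored.sort(key=lambda pair: pair[0])
    return scored[:n]

# Toy dichotomized training data (made-up numbers).
train = [((0.0, 0.0), "W"), ((3.0, 4.0), "B"), ((1.0, 0.0), "W"), ((6.0, 8.0), "B")]
choices = top_n_choices((0.0, 0.0), train, n=3)
for rank, (distance, label) in enumerate(choices, start=1):
    print(rank, label, round(distance, 2))
```

Each test sample yields one such choice table; the real data would use the full feature vectors rather than two-dimensional toy points.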


Cross Section of Output File

Overall Accuracy For n=10 Output File
using 1st, 3rd & 5th Nearest Neighbors


Implemented a program to check overall accuracy on the
outputs created in Deliverable 3.


Calculated FRR, FAR and performance for all the
experiments in the 4 quadrants.


Matched the Deliverable 1 outputs exactly using 1-Nearest
Neighbor, which indicates the experiments were recreated
consistently.


Using 3 & 5 nearest neighbors resulted in a slight
improvement, as expected.

1st and 3rd Nearest Neighbor Choice

Conditions     Within   Within  Within  Between  Between  Between  FRR      FAR     Performance
               Correct  Wrong   Total   Correct  Wrong    Total
Desktop Copy   175      5       180     3744     81       3825     2.78%    2.12%   97.85%
Laptop Copy    174      6       180     3671     154      3825     3.33%    4.03%   96.00%
Desktop Free   139      37      176     3699     41       3740     21.02%   1.10%   98.01%
Laptop Free    162      18      180     3699     126      3825     10.0%    3.29%   96.40%

Conditions     Within   Within  Within  Between  Between  Between  FRR      FAR     Performance
               Correct  Wrong   Total   Correct  Wrong    Total
Desktop Copy   173      7       180     3776     49       3825     3.89%    1.28%   98.60%
Laptop Copy    172      8       180     3720     105      3825     4.44%    2.75%   97.18%
Desktop Free   127      49      176     3731     9        3740     27.84%   0.24%   98.52%
Laptop Free    156      24      180     3735     90       3825     13.33%   2.35%   97.15%
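
The three rates in these tables are consistent with FRR = Within Wrong / Within Total, FAR = Between Wrong / Between Total, and Performance = all correct decisions / all decisions. The sketch below works through that arithmetic for the Desktop Copy row of the first table (175, 5, 3744, 81). The function name `rates` is made up for illustration, and the formulas are inferred from the tabulated numbers rather than taken from the team's code.

```python
def rates(within_correct, within_wrong, between_correct, between_wrong):
    """Return (FRR, FAR, performance) as percentages from the four decision counts."""
    within_total = within_correct + within_wrong
    between_total = between_correct + between_wrong
    # False Rejection Rate: valid (Within) pairs that were classified wrongly.
    frr = 100.0 * within_wrong / within_total
    # False Acceptance Rate: imposter (Between) pairs that were classified wrongly.
    far = 100.0 * between_wrong / between_total
    # Overall performance: share of all decisions that were correct.
    performance = 100.0 * (within_correct + between_correct) / (within_total + between_total)
    return frr, far, performance

frr, far, perf = rates(175, 5, 3744, 81)
print(f"FRR={frr:.2f}% FAR={far:.2f}% Performance={perf:.2f}%")
# prints: FRR=2.78% FAR=2.12% Performance=97.85%
```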

1st, 3rd and 5th Nearest Neighbor

Within and Between Choices derived using Euclidean distance

[Chart: Percent Accuracy (95 to 100) by Condition (DeskCopy, LapCopy, DeskFree, LapFree) for 1-NN, 3-NN and 5-NN]
Receiver Operating Characteristics (ROC curve)

Graphical representation of FAR and FRR

FAR - False Acceptance Rate: authenticating an imposter

FRR - False Rejection Rate: rejecting a valid user

Top-n Nearest Neighbor Responses


Unweighted



each output choice counted equally


Weighted



first output choice (more valuable) is scored higher


Receiver Operating Characteristics (ROC curve) Implementation: Weighted

Taking the n=10 W/B choice output file as input, authenticated a
user if 1 or more of the 10 choices is Within (W).

Each match was scored using the formula

    score += (10 - j + 1)

where score: 0 -> 55, j: 1 -> 10, and choice = W

Maximum score = 55

Minimum score = 0

FRR and FAR were calculated for i = 0 -> 55 and the ROC curve plotted.
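
As a small illustration of the scoring rule, the sketch below computes the weighted score, along with the unweighted count for comparison, for one ranked list of ten choices. The function names and the sample choice list are invented for the example; only the formula `score += (10 - j + 1)` for each Within choice at rank j comes from the slide.

```python
def weighted_score(choices):
    """Sum (n - j + 1) over every rank j (1-based) whose choice is 'W'."""
    n = len(choices)
    return sum(n - j + 1 for j, choice in enumerate(choices, start=1) if choice == "W")

def unweighted_score(choices):
    """Count the 'W' choices, each contributing 1 regardless of rank."""
    return sum(1 for choice in choices if choice == "W")

# Hypothetical ranked output: Within at ranks 1 and 3, Between elsewhere.
choices = list("WBWBBBBBBB")
print(weighted_score(choices))         # 10 + 8 = 18
print(unweighted_score(choices))       # 2
print(weighted_score(list("W" * 10)))  # 55, the maximum weighted score
```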






Receiver Operating Characteristics (ROC curve) Implementation: Unweighted

Taking the n=10 W/B choice output file as input, authenticated a
user if 1 or more of the 10 choices is Within (W).

Each match was scored using the formula

    score += 1

where score: 0 -> 10, j: 1 -> 10, and choice = W

Maximum score = 10

Minimum score = 0

FRR and FAR were calculated for i = 0 -> 10 and the ROC curve plotted.
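
One plausible reading of the threshold sweep is sketched below: for each threshold i, a sample is accepted when its score is at least i, and FRR and FAR are recomputed. The acceptance rule `score >= i`, the function name `roc_points`, and the toy score lists are assumptions. The slides give only the score ranges and state that FRR and FAR were calculated for each i.

```python
def roc_points(genuine_scores, impostor_scores, max_score):
    """Return (threshold, FRR %, FAR %) for every threshold 0..max_score.

    Assumes a sample is accepted when its score >= threshold.
    """
    points = []
    for threshold in range(max_score + 1):
        # Genuine (Within) samples scoring below the threshold are falsely rejected.
        false_rejects = sum(1 for s in genuine_scores if s < threshold)
        # Impostor (Between) samples reaching the threshold are falsely accepted.
        false_accepts = sum(1 for s in impostor_scores if s >= threshold)
        frr = 100.0 * false_rejects / len(genuine_scores)
        far = 100.0 * false_accepts / len(impostor_scores)
        points.append((threshold, frr, far))
    return points

# Made-up unweighted scores (0..10) for four genuine and four impostor samples.
genuine = [10, 9, 7, 4]
impostor = [0, 1, 3, 6]
points = roc_points(genuine, impostor, max_score=10)
for threshold, frr, far in points:
    print(threshold, frr, far)
```

Plotting FRR against FAR over all thresholds gives one curve; running the same sweep with `max_score=55` on weighted scores would give the other.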


Laptop Copy ROC Curve

[Plot: FRR (%) and FAR (%) axes, each 0 to 20; Unweighted and Weighted curves]

Desktop Copy ROC Curve

[Plot: FRR (%) and FAR (%) axes, each 0 to 20; Unweighted and Weighted curves]

Laptop Free ROC Curve

[Plot: FRR (%) and FAR (%) axes, each 0 to 20; Unweighted and Weighted curves]

Desktop Free ROC Curve

[Plot: FRR (%) and FAR (%) axes, each 0 to 20; Unweighted and Weighted curves]
2 big-training, strong-enrollment authentication experiments

Train on 36 subjects and test on 18 subjects

Test        Test Size   Train       Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy    180-3825     5.6%    15.0%   85.4%
Lap Free    180-3825    Lap Copy    180-3825     3.9%    30.4%   70.8%
Desk Free   176-3825    Desk Copy   180-3825     13.1%   1.4%    98.1%
Desk Free   165-3576    Desk Copy   180-3825     5.5%    3.0%    96.9%
Average                                          7.0%    9.4%    87.8%

Increased Sample Size Experiments

Test        Test Size   Train                 Train Size   FRR     FAR     Performance
Lap Free    180-2000    Lap Copy/Desk Free    1571-2000    1.1%    33.0%   69.6%
Desk Free   180-2000    Desk Copy/Lap Free    1620-2000    10.0%   5.1%    94.5%
Lap Free    180-2000    All 4 Data Sets       2000-2000    3.9%    78.4%   27.8%
Desk Free   180-2000    All 4 Data Sets       2000-2000    9.4%    45.6%   57.4%
Average                                                    6.1%    40.5%   62.3%

Increased Sample Size Experiments

Test        Test Size   Train                 Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy/Desk Free    1571-4000    0.0%    31.0%   70.4%
Desk Free   180-3825    Desk Copy/Lap Free    1620-4000    13.9%   2.8%    96.7%
Lap Free    180-3825    All 4 Data Sets       3330-4000    10.6%   77.3%   25.9%
Desk Free   180-3825    All 4 Data Sets       3330-4000    10.6%   50.0%   51.8%
Average                                                    7.4%    40.3%   61.2%

Increased Sample Size Experiments

Test        Test Size   Train                 Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy/Desk Free    1571-6000    0.0%    24.5%   76.6%
Desk Free   180-3825    Desk Copy/Lap Free    1620-6000    12.8%   2.2%    97.3%
Lap Free    180-3825    All 4 Data Sets       3330-6000    6.1%    63.7%   38.9%
Desk Free   180-3825    All 4 Data Sets       3330-6000    15.6%   38.6%   62.4%
Average                                                    8.6%    32.3%   68.8%

Increased Sample Size Experiments

Test        Test Size   Train                 Train Size   FRR     FAR     Performance
Lap Free    180-3825    Lap Copy/Desk Free    1571-8000    1.7%    26.2%   74.9%
Desk Free   180-3825    Desk Copy/Lap Free    1620-8000    15.6%   1.7%    97.7%
Lap Free    180-3825    All 4 Data Sets       3330-8000    7.2%    52.8%   49.2%
Desk Free   180-3825    All 4 Data Sets       3330-8000    14.4%   31.8%   98.9%
Average                                                    9.7%    28.2%   80.2%

Linguistic Features

Identify Discrepancies


Feature measurements

Duration - Calculates the average response time and the standard deviation

Transition - Divided into two types

  Type I - short transition is the time between a key release and the next key press

  Type II - long transition is the time between a key press and the next key press

Percentage - Expressed as a ratio of the total number of occurrences to the total number of keystrokes
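
To make the three measurement types concrete, the sketch below derives them from a short list of key events. It is not BioFeature++: the event layout `(key, press_ms, release_ms)`, the function names, and the timestamps are invented, and "duration" is assumed here to mean the time a key is held down (press to release).

```python
from statistics import mean, pstdev

def timing_features(events):
    """events: list of (key, press_ms, release_ms) tuples in typing order."""
    # Duration: assumed to be how long each key is held down.
    durations = [release - press for _, press, release in events]
    pairs = list(zip(events, events[1:]))
    # Type I (short) transition: release of one key to press of the next.
    short_transitions = [nxt[1] - cur[2] for cur, nxt in pairs]
    # Type II (long) transition: press of one key to press of the next.
    long_transitions = [nxt[1] - cur[1] for cur, nxt in pairs]
    return {
        "duration_mean": mean(durations),
        "duration_std": pstdev(durations),
        "type1_mean": mean(short_transitions),
        "type2_mean": mean(long_transitions),
    }

def percentage(events, keys):
    """Ratio of keystrokes that fall in `keys` to the total number of keystrokes."""
    return sum(1 for key, _, _ in events if key in keys) / len(events)

# Made-up timestamps (milliseconds) for typing "the".
events = [("t", 0, 80), ("h", 120, 210), ("e", 260, 330)]
features = timing_features(events)
print(features)
print(percentage(events, {"e"}))
```

A real extractor would compute these per key or per key group (for example, per letter or per digram) rather than over the whole sample.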


Discrepancy between 239 and 230 feature measurements

Addition of a feature group for the 6 other least frequent consonants (q, v, j, x, z, k)

Removal of a feature group of 15 long transitions (Type II): th, st, nd, an,
in, er, es, on, at, en, or, he, re, ti, ea












Small training, strong enrollment authentication experiments

BioMetric   Test        Test Size    Train       Train Size   FRR      FAR      Performance
Keystroke   Lap Free    360-15750    Lap Copy    360-15750    5.28%    14.56%   85.65%
Keystroke   Desk Free   341-15059    Desk Copy   360-15750    17.30%   1.28%    98.36%
AVERAGE                                                       11.3%    8.0%     92.1%
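
The intra-inter class sizes quoted in these tables (180-3825 and 360-15750) match what you get by pairing every sample with every other sample and splitting the pairs into same-subject (Within) and different-subject (Between). The sketch below checks that arithmetic. It assumes 5 samples per subject and that the dichotomized data is built from all unordered sample pairs, which the slides imply but do not spell out; the function name is hypothetical.

```python
from math import comb

def dichotomy_sizes(subjects, samples_per_subject):
    """Return (within_pairs, between_pairs) over all unordered sample pairs."""
    total_samples = subjects * samples_per_subject
    # Within: pairs of samples that come from the same subject.
    within = subjects * comb(samples_per_subject, 2)
    # Between: every remaining pair, which mixes two different subjects.
    between = comb(total_samples, 2) - within
    return within, between

print(dichotomy_sizes(18, 5))  # (180, 3825)
print(dichotomy_sizes(36, 5))  # (360, 15750)
```

Sizes such as 176-3740 or 341-15059 would then correspond to quadrants where some samples are missing.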

Big training, strong enrollment authentication experiments. 5000, 10000
and 20000 inter-class samples

BioMetric   Test        Test Size   Train       Train Size   FRR      FAR     Performance
Keystroke   Lap Free    180-3825    Lap Copy    180-3825     10.00%   3.29%   96.40%
Keystroke   Desk Free   176-3740    Desk Copy   165-3576     21.02%   1.10%   98.01%
Keystroke   Lap Free    180-3825    Lap Copy    180-3825     10.00%   3.29%   96.40%
Keystroke   Desk Free   176-3740    Desk Copy   165-3576     21.02%   1.10%   98.01%


Documentation Creation


User Manual



Technical Manual


http://utopia.csis.pace.edu/cs691/2008-2009/team4/

Future Work


Experiment to determine why Laptop input is less
consistent in comparison to Desktop input


Possible reasons


Different keyboard layouts


Different body positioning during typing


Desktops are more fixed and consistent


Data should be stored in a database as opposed to
individual files


Older code should be refactored in order to run more
efficiently


Combine last semester's and this semester's work into one project

Conclusion




Thank you for your time and attention