HW 2: Precision/Recall and ROC curves


Two ways to choose thresholds:

1. Using the spam.test and spam.predictions files, choose 20 evenly spaced thresholds.

2. Fill in the following table:

Recall    Precision
 .05
 .10
 .15
 .20
 .25
 .30
 .35
 .40
 .45
 .50
 .55
 .60
 .65
 .70
 .75
 .80
 .85
 .90
 .95
1.00
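The threshold-and-table computation can be sketched in Python. This assumes each test example yields a (score, class) pair with class in {-1, +1}, and that "evenly spaced" means evenly spaced over the range of the prediction scores — both assumptions, since the slides do not spell out the file format:

```python
# Sketch: precision and recall at 20 evenly spaced thresholds.
# ASSUMPTIONS: (score, class) pairs with class in {-1, +1};
# an example is predicted spam when score >= t.

def precision_recall(pairs, t):
    tp = sum(1 for s, y in pairs if s >= t and y == 1)
    fp = sum(1 for s, y in pairs if s >= t and y == -1)
    fn = sum(1 for s, y in pairs if s < t and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # convention: no positives in the data
    return precision, recall

# A few pairs taken from the Example slide stand in for the real file.
pairs = [(-1.1477009, 1), (-1.0531426, -1), (-0.30010269, 1),
         (0.24367837, 1), (5.9464105, 1), (29.879049, 1)]
lo, hi = min(s for s, _ in pairs), max(s for s, _ in pairs)
thresholds = [lo + i * (hi - lo) / 19 for i in range(20)]  # 20 evenly spaced values
curve = [precision_recall(pairs, t) for t in thresholds]
```

Each resulting (precision, recall) pair fills one row of the table above.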

Example

Prediction     Class
-1.1477009       1
-1.1443221       1
-1.129131        1
-1.1260133      -1
-1.1192881      -1
-1.1122983      -1
-1.1112734      -1
-1.0922439      -1
-1.0913565      -1
-1.0859089      -1
-1.0531426      -1
-1.0216998       1
-1.0155433      -1
-1.0115367      -1
-1.0057719      -1
-0.99646887     -1
-0.99339625      1
-0.98958456      1
-0.98940658      1
-0.98541375     -1
-0.98435981     -1
-0.97857992     -1
-0.97583079      1
-0.97455149     -1
-0.97320474     -1
-0.97155339      1
-0.96926511     -1
-0.93217364      1
-0.86903936      1
-0.75218565      1
-0.6965955      -1
-0.65030719     -1
-0.33646797      1
-0.30010269      1
 0.24367837      1
 0.24367837      1
 0.79731602      1
 5.9464105       1
 5.9464105       1
29.879049        1

Method 1

(same Prediction/Class list as the Example slide)

Method 2

(same Prediction/Class list as the Example slide)

Method 2

Predictions sorted in descending order:

Prediction     Class
29.879049        1
 5.9464105       1
 5.9464105       1
 0.79731602      1
 0.24367837      1
 0.24367837      1
-0.30010269      1
-0.33646797      1
-0.65030719     -1
-0.6965955      -1
-0.75218565      1
-0.86903936      1
-0.93217364      1
-0.96926511     -1
-0.97155339      1
-0.97320474     -1
-0.97455149     -1
-0.97583079      1
-0.97857992     -1
-0.98435981     -1
-0.98541375     -1
-0.98940658      1
-0.98958456      1
-0.99339625      1
-0.99646887     -1
-1.0057719      -1
-1.0115367      -1
-1.0155433      -1
-1.0216998       1
-1.0531426      -1
-1.0859089      -1
-1.0913565      -1
-1.0922439      -1
-1.1112734      -1
-1.1122983      -1
-1.1192881      -1
-1.1260133      -1
-1.129131        1
-1.1443221       1
-1.1477009       1

Recall    Precision
0.05
0.10
0.15
0.20
...
1
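One reading of the sorted list above is that Method 2 sweeps the threshold through the sorted prediction scores themselves, so every example contributes one candidate operating point. That reading is an assumption; a sketch under it:

```python
# Sketch of Method 2 as we read the slide (an assumption): use each
# sorted prediction score as a threshold; classify spam when score >= t.
# Assumes the data contains at least one positive (class +1) example.

def pr_curve(pairs):
    points = []
    for t, _ in sorted(pairs, reverse=True):  # descending, as on the slide
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == -1)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        points.append((tp / (tp + fn), tp / (tp + fp)))  # (recall, precision)
    return points
```

Precision at the target recall levels can then be read off (or interpolated) from these points.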






Other questions on HW 2?

Quiz 2 review sheet


More on Kernels

So far we've seen kernels that map instances in R^n to instances in R^z, where z > n.



One way to create a kernel: figure out an appropriate feature space Φ(x), and find a kernel function k which defines the inner product on that space.

More practically, we usually don't know the appropriate feature space Φ(x).

What people do in practice is either:

1. Use one of the "classic" kernels (e.g., polynomial), or

2. Define their own function that is appropriate for their task, and show that it qualifies as a kernel.




How to define your own kernel


Given training data (x_1, x_2, ..., x_n):

The algorithm for SVM learning uses the kernel matrix (also called the Gram matrix) K, whose entries are K_ij = k(x_i, x_j).

We can choose some function k, and compute the kernel matrix K using the training data.

We just have to guarantee that our kernel defines an inner product on some feature space.

Not as hard as it sounds.





What counts as a kernel?


Mercer's Theorem: if the kernel matrix K is symmetric positive semidefinite, it defines a valid kernel; that is, it defines an inner product in some feature space.

We don't even have to know what that feature space is!
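That check can be run numerically: build the Gram matrix K with K_ij = k(x_i, x_j) on the training data and confirm it is symmetric with non-negative eigenvalues. A sketch with NumPy, using an RBF kernel purely as an example of a k one might plug in:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel -- one example of a valid kernel function
    return np.exp(-gamma * np.sum((x - z) ** 2))

def gram_matrix(xs, k):
    # K[i, j] = k(x_i, x_j) over the training data
    n = len(xs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = k(xs[i], xs[j])
    return K

xs = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
K = gram_matrix(xs, rbf)
symmetric = np.allclose(K, K.T)
psd = np.all(np.linalg.eigvalsh(K) >= -1e-10)  # tolerate tiny numerical error
```

A matrix that passes on one sample is not a proof for all data, but a failure immediately disqualifies the candidate kernel.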



Kernel Functions as Similarity Measures


Note that the dot product gives a measure of similarity between two vectors:

x_i · x_j = |x_i| |x_j| cos θ, where θ is the angle between x_i and x_j.

So, often we can interpret a kernel function as measuring similarity between a test vector x and a training vector x_i in feature space.

In this view, SVM classification of x amounts to a weighted sum of similarities between x and support vectors in feature space.
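Concretely, this weighted sum is the SVM decision rule f(x) = sign(Σ_i α_i y_i k(x_i, x) + b), summed over the support vectors. A tiny sketch; the support vectors, α values, and b below are made-up illustration values (in practice they come from training), and the linear kernel is just the simplest choice of k:

```python
# SVM prediction as a weighted sum of kernel similarities to support vectors.

def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_decision(x, support_vectors, alphas, labels, b, k):
    # f(x) = sign( sum_i alpha_i * y_i * k(x_i, x) + b )
    score = sum(a * y * k(sv, x)
                for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if score >= 0 else -1

# Illustration only: two support vectors with made-up coefficients.
svs, alphas, labels, b = [(1.0, 0.0), (-1.0, 0.0)], [1.0, 1.0], [1, -1], 0.0
label = svm_decision((2.0, 0.0), svs, alphas, labels, b, linear_kernel)
```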

Structural Kernels

In domains such as natural language processing and bioinformatics, the similarities we want to capture are often structural (e.g., parse trees, formal grammars).

Compare with the "bag of words" method we're using in HW 2.

An important area of kernel methods is defining structural kernels that capture this similarity (e.g., sequence alignment kernels, tree kernels, graph kernels, etc.).
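For a concrete taste of the sequence-kernel family named above, the k-spectrum kernel counts shared length-k substrings between two strings. This is an illustrative sketch of that one simple kernel, not part of HW 2:

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    # k-spectrum kernel: k(s, t) = sum over k-mers m of count_s(m) * count_t(m)
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * ct[m] for m, c in cs.items())
```

Because it is an inner product of k-mer count vectors, its Gram matrix is positive semidefinite by construction.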

From www.cs.pitt.edu/~tomas/cs3750/kernels.ppt:

Design criteria: we want kernels to be

- valid: satisfy the Mercer condition of positive semidefiniteness
- good: embody the "true similarity" between objects
- appropriate: generalize well
- efficient: the computation of k(x, x') is feasible





Example: Watson used tree kernels and SVMs to classify question types for Jeopardy! questions (from Moschitti et al., 2011).