Support Vector Machines

The Interface to libsvm in package e1071

by David Meyer
Technische Universität Wien, Austria
David.Meyer@ci.tuwien.ac.at

September 12, 2012



“Hype or Hallelujah?” is the provocative title used by Bennett & Campbell (2000) in an overview of Support Vector Machines (SVM). SVMs are currently a hot topic in the machine learning community, creating a similar enthusiasm at the moment as Artificial Neural Networks did before. Far from being a panacea, SVMs nevertheless represent a powerful technique for general (nonlinear) classification, regression and outlier detection, with an intuitive model representation. The package e1071 offers an interface to the award-winning[1] C++ implementation by Chih-Chung Chang and Chih-Jen Lin, libsvm (current version: 2.6), featuring:




- C- and ν-classification
- one-class-classification (novelty detection)
- ε- and ν-regression

and includes:

- linear, polynomial, radial basis function, and sigmoidal kernels
- the formula interface
- k-fold cross validation

For further implementation details on libsvm, see Chang & Lin (2001).



Basic concept

SVMs were developed by Cortes & Vapnik (1995) for binary classification. Their approach may be roughly sketched as follows:


Class separation: basically, we are looking for the optimal separating hyperplane between the two classes by maximizing the margin between the classes' closest points (see Figure 1); the points lying on the boundaries are called support vectors, and the middle of the margin is our optimal separating hyperplane;


(A smaller version of this article appeared in R News, Vol. 1/3, 9/2001.)

[1] The library won the IJCNN 2001 Challenge by solving two of three problems: the Generalization Ability Challenge (GAC) and the Text Decoding Challenge (TDC). For more information, see http://www.csie.ntu.edu.tw/~cjlin/papers/ijcnn.ps.gz.










Overlapping classes: data points on the “wrong” side of the discriminant margin are weighted down to reduce their influence (“soft margin”);


Nonlinearity: when we cannot find a linear separator, data points are projected into a (usually) higher-dimensional space where the data points effectively become linearly separable (this projection is realised via kernel techniques);


Problem solution: the whole task can be formulated as a quadratic optimization problem which can be solved by known techniques (a standard formulation is sketched below).


A program able to perform all these tasks is called a Support Vector Machine.
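For concreteness, the first two steps correspond to the standard soft-margin primal problem from the SVM literature (a textbook formulation added here for reference; the article itself states only the dual problems, in the Model Formulations section below):

\[
\min_{w,b,\xi}\;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i
\qquad \text{s.t.}\;\; y_i\,(w^\top \phi(x_i) + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,l,
\]

where the slack variables $\xi_i$ implement the soft margin and the feature map $\phi$ the nonlinear projection; its dual is the C-classification problem (1) below.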


[Figure omitted; it showed the margin, the separating hyperplane, and the support vectors.]

Figure 1: Classification (linear separable case)



Several extensions have been developed; the ones currently included in libsvm are:


ν-classification: this model allows for more control over the number of support vectors (see Schölkopf et al., 2000) by specifying an additional parameter ν which approximates the fraction of support vectors;


One-class-classification: this model tries to find the support of a distribution and thus allows for outlier/novelty detection;


Multi-class classification: basically, SVMs can only solve binary classification problems. To allow for multi-class classification, libsvm uses the one-against-one technique by fitting all binary subclassifiers (k(k-1)/2 of them for k classes) and finding the correct class by a voting mechanism;


ε-regression: here, the data points lie in between the two borders of the margin, which is maximized under suitable conditions to avoid outlier inclusion;








ν-regression: with analogous modifications of the regression model as in the classification case. In e1071, each of these variants is selected via the type argument of svm(), as sketched below.
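A minimal selection sketch (the type strings are those accepted by e1071's svm(); x, y, and z stand for a hypothetical feature matrix, factor, and numeric response, and the nu values are merely illustrative):

> m1 <- svm(x, y, type = "C-classification")
> m2 <- svm(x, y, type = "nu-classification", nu = 0.2)  # nu ~ fraction of SVs
> m3 <- svm(x,    type = "one-classification")           # novelty detection
> m4 <- svm(x, z, type = "eps-regression")
> m5 <- svm(x, z, type = "nu-regression", nu = 0.2)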



Usage in R

The R interface to libsvm in package e1071, svm(), was designed to be as intuitive as possible. Models are fitted and new data are predicted as usual, and both the vector/matrix and the formula interface are implemented.


As expected for R's statistical functions, the engine tries to be smart about the mode to be chosen, using the type of the dependent variable y: if y is a factor, the engine switches to classification mode; otherwise, it behaves as a regression machine; if y is omitted, the engine assumes a novelty detection task.
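A minimal sketch of this dispatch and of the two interfaces, using R's built-in iris data (these calls are illustrative additions, not from the original text):

> ## formula interface: the mode is inferred from the response type
> m <- svm(Species ~ ., data = iris)  # Species is a factor -> classification
> ## vector/matrix interface
> x <- subset(iris, select = -Species)
> m <- svm(x, iris$Species)           # again classification
> m <- svm(x)                         # y omitted -> novelty detection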



Examples

In the following two examples, we demonstrate the practical use of svm() along with a comparison to classification and regression trees as implemented in rpart().


Classification

In this example, we use the glass data from the UCI Repository of Machine Learning Databases for classification. The task is to predict the type of a glass on the basis of its chemical analysis. We start by splitting the data into a train and test set:


> library(e1071)
> library(rpart)
> data(Glass, package = "mlbench")
> ## split data into a train and test set
> index     <- 1:nrow(Glass)
> testindex <- sample(index, trunc(length(index)/3))
> testset   <- Glass[testindex, ]
> trainset  <- Glass[-testindex, ]


Both for the SVM and the partitioning tree (via rpart()), we fit the model and try to predict the test set values:


> ## svm
> svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
> svm.pred  <- predict(svm.model, testset[, -10])


(The dependent variable, Type, has column number 10. cost is a general penalizing parameter for C-classification, and gamma is the radial basis function-specific kernel parameter.)


> ## rpart
> rpart.model <- rpart(Type ~ ., data = trainset)
> rpart.pred  <- predict(rpart.model, testset[, -10], type = "class")


A cross-tabulation of the true versus the predicted values yields:










> ## compute svm confusion matrix
> table(pred = svm.pred, true = testset[, 10])

    true
pred  1  2  3  5  6  7
   1 16  4  1  0  0  0
   2  8 20  1  4  3  2
   3  2  1  2  0  0  0
   5  0  0  0  1  0  0
   6  0  0  0  0  1  0
   7  0  0  0  0  0  5


> ## compute rpart confusion matrix
> table(pred = rpart.pred, true = testset[, 10])

    true
pred  1  2  3  5  6  7
   1 17  5  0  0  0  0
   2  7 17  1  0  2  1
   3  2  1  3  0  0  0
   5  0  2  0  5  2  0
   6  0  0  0  0  0  0
   7  0  0  0  0  0  6








method   Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
Accuracy
  svm    0.56  0.61     0.65    0.64  0.66     0.69
  rpart  0.36  0.45     0.50    0.48  0.52     0.54
Kappa
  svm    0.55  0.64     0.66    0.66  0.70     0.73
  rpart  0.40  0.50     0.53    0.54  0.59     0.63

Table 1: Performance of svm() and rpart() for classification (10 replications)



Finally, we compare the performance of the two methods by computing the respective accuracy rates and the kappa indices (as computed by classAgreement(), also contained in package e1071). In Table 1, we summarize the results of 10 replications: Support Vector Machines show better results.
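A minimal sketch of one such replication (classAgreement() takes a contingency table; among its components are diag, the accuracy rate, and kappa):

> cm <- table(pred = svm.pred, true = testset[, 10])
> ca <- classAgreement(cm)
> ca$diag    # accuracy rate
> ca$kappa   # kappa index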


Non-linear ε-Regression

The regression capabilities of SVMs are demonstrated on the ozone data. Again, we split the data into a train and test set.


> library(e1071)
> library(rpart)
> data(Ozone, package = "mlbench")
> ## split data into a train and test set
> index     <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset   <- na.omit(Ozone[testindex, -3])








> trainset  <- na.omit(Ozone[-testindex, -3])
> ## svm
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred  <- predict(svm.model, testset[, -3])
> crossprod(svm.pred - testset[, 3]) / length(testindex)
         [,1]
[1,] 12.02348

> ## rpart
> rpart.model <- rpart(V4 ~ ., data = trainset)
> rpart.pred  <- predict(rpart.model, testset[, -3])
> crossprod(rpart.pred - testset[, 3]) / length(testindex)
         [,1]
[1,] 21.03352


        Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
svm      8.08  10.87    11.39   11.61  11.99    15.61
rpart   14.28  17.41    19.68   20.59  21.11    30.22

Table 2: Performance of svm() and rpart() for regression (Mean Squared Error, 10 replications)



We compare the two methods by the mean squared error (MSE); see Table 2 for a summary of 10 replications. Again, as for classification, svm() does a better job than rpart(); in fact, much better.



Elements of the svm object

The function svm() returns an object of class "svm", which partly includes the following components:


SV: matrix of support vectors found;

labels: their labels in classification mode;

index: index of the support vectors in the input data (could be used, e.g., for their visualization as part of the data set).
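A minimal sketch of accessing these components on the classification model fitted above (the row selections are merely illustrative):

> svm.model$SV[1:3, ]     # first rows of the support vector matrix
> svm.model$labels        # labels (classification mode)
> svm.model$index[1:5]    # positions of the support vectors in the input data
> trainset[svm.model$index[1:3], ]  # e.g., inspect those training rows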


If the cross-classification feature is enabled, the svm object will contain some additional information, described below.



Other main features


Class Weighting: if one wishes to weight the classes differently (e.g., in case of asymmetric class sizes, to avoid a possibly overproportional influence of the bigger classes on the margin), weights may be specified in a vector with named components. In case of two classes A and B, we could use something like:

m <- svm(x, y, class.weights = c(A = 0.3, B = 0.7))








Cross-classification: to assess the quality of the training result, we can perform a k-fold cross-classification on the training data by setting the parameter cross to k (default: 0). The svm object will then contain some additional values, depending on whether classification or regression is performed (a short sketch follows the value lists below). Values for classification:


accuracies: vector of accuracy values for each of the k predictions;

tot.accuracy: total accuracy.


Values for regression:

MSE: vector of mean squared errors for each of the k predictions;

tot.MSE: total mean squared error;

scorrcoef: squared correlation coefficient (of the predicted and the true values of the dependent variable).
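A minimal sketch of 10-fold cross-classification on the glass training set from above (the component names are exactly those listed above):

> cv.model <- svm(Type ~ ., data = trainset, cross = 10)
> cv.model$accuracies    # one accuracy value per fold
> cv.model$tot.accuracy  # total accuracy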



Tips on practical use

- Note that SVMs may be very sensitive to the proper choice of parameters, so always check a range of parameter combinations, at least on a reasonable subset of your data.




- For classification tasks, you will most likely use C-classification with the RBF kernel (default), because of its good general performance and the small number of parameters (only two: C and γ). The authors of libsvm suggest trying small and large values for C, like 1 to 1000, first, then deciding which are better for the data by cross validation, and finally trying several γ's for the better C's.




- However, better results are obtained by using a grid search over all parameters. For this, we recommend the tune.svm() function in e1071 (see the sketch after this list).




- Be careful with large datasets, as training times may increase rather fast.




- Scaling of the data usually drastically improves the results. Therefore, svm() scales the data by default.
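A minimal grid-search sketch with tune.svm() on the glass training set from above (the parameter ranges are illustrative, not recommendations; best.parameters and best.performance are components of the returned tune object):

> tuned <- tune.svm(Type ~ ., data = trainset,
+                   gamma = 10^(-3:1), cost = 10^(0:3))
> tuned$best.parameters   # best (gamma, cost) combination found
> tuned$best.performance  # corresponding cross-validated error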



Model Formulations and Kernels

Dual representation of models implemented:




C-classification:

\[
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^\top Q\,\alpha - e^\top \alpha \\
\text{s.t.}\quad & 0 \le \alpha_i \le C,\quad i = 1,\dots,l, \\
& y^\top \alpha = 0,
\end{aligned}
\tag{1}
\]

where $e$ is the unity vector, $C$ is the upper bound, $Q$ is an $l \times l$ positive semidefinite matrix, $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) \equiv \phi(x_i)^\top \phi(x_j)$ is the kernel.

ν-classification:

\[
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^\top Q\,\alpha \\
\text{s.t.}\quad & 0 \le \alpha_i \le 1/l,\quad i = 1,\dots,l, \\
& e^\top \alpha \ge \nu,\quad y^\top \alpha = 0,
\end{aligned}
\tag{2}
\]

where $\nu \in (0, 1]$.

one-class classification:

\[
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^\top Q\,\alpha \\
\text{s.t.}\quad & 0 \le \alpha_i \le 1/(\nu l),\quad i = 1,\dots,l, \\
& e^\top \alpha = 1,
\end{aligned}
\tag{3}
\]

ε-regression:

\[
\begin{aligned}
\min_{\alpha,\alpha^*}\quad & \tfrac{1}{2}\,(\alpha - \alpha^*)^\top Q\,(\alpha - \alpha^*)
+ \epsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*)
+ \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \\
\text{s.t.}\quad & 0 \le \alpha_i, \alpha_i^* \le C,\quad i = 1,\dots,l, \\
& \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0,
\end{aligned}
\tag{4}
\]

ν-regression:

\[
\begin{aligned}
\min_{\alpha,\alpha^*}\quad & \tfrac{1}{2}\,(\alpha - \alpha^*)^\top Q\,(\alpha - \alpha^*)
+ z^\top (\alpha - \alpha^*) \\
\text{s.t.}\quad & 0 \le \alpha_i, \alpha_i^* \le C,\quad i = 1,\dots,l, \\
& e^\top (\alpha - \alpha^*) = 0,\quad e^\top (\alpha + \alpha^*) = C\nu.
\end{aligned}
\tag{5}
\]

Available kernels:

kernel               formula                             parameters
linear               $u^\top v$                          (none)
polynomial           $\gamma (u^\top v + c_0)^d$         $\gamma$, $d$, $c_0$
radial basis fct.    $\exp\{-\gamma |u - v|^2\}$         $\gamma$
sigmoid              $\tanh\{\gamma u^\top v + c_0\}$    $\gamma$, $c_0$
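A minimal sketch of selecting these kernels via svm()'s kernel argument (the kernel names and the degree, gamma, and coef0 arguments are those of e1071; the numeric values are illustrative):

> m <- svm(Type ~ ., data = trainset, kernel = "linear")
> m <- svm(Type ~ ., data = trainset, kernel = "polynomial",
+          degree = 3, gamma = 0.1, coef0 = 1)
> m <- svm(Type ~ ., data = trainset, kernel = "radial",  gamma = 0.1)
> m <- svm(Type ~ ., data = trainset, kernel = "sigmoid", gamma = 0.1, coef0 = 0)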







Conclusion

We hope that svm provides an easy-to-use interface to the world of SVMs, which nowadays have become a popular technique in flexible modelling. There are some drawbacks, though: SVMs scale rather badly with the data size due to the quadratic optimization algorithm and the kernel transformation. Furthermore, the correct choice of kernel parameters is crucial for obtaining good results, which practically means that an extensive search must be conducted on the parameter space before results can be trusted, and this often complicates the task (the authors of libsvm currently conduct some work on methods of efficient automatic parameter selection). Finally, the current implementation is optimized for the radial basis function kernel only, which clearly might be suboptimal for your data.



References

Bennett, K. P. & Campbell, C. (2000). Support vector machines: Hype or hallelujah? SIGKDD Explorations, 2(2). http://www.acm.org/sigs/sigkdd/explorations/issue2-2/bennett.pdf.

Chang, C.-C. & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm; detailed documentation (algorithms, formulae, ...) can be found at http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz.

Cortes, C. & Vapnik, V. (1995). Support-vector network. Machine Learning, 20, 1-25.

Schölkopf, B., Smola, A., Williamson, R. C., & Bartlett, P. (2000). New support vector algorithms. Neural Computation, 12, 1207-1245.

Vapnik, V. (1998). Statistical learning theory. New York: Wiley.