How to use the D2K Support Vector Machine (SVM) MATLAB toolbox



This MATLAB toolbox consists of several M-files designed to perform two tasks:

1. Multiclass classification
2. Regression

1 How to do an SVM multiclass classification

SVM multiclass classification is done in three steps:

1. Loading and normalizing data (function normsv)
2. Training of the SVM (function multiclass)
3. Obtaining the results (function classify)

1.1 Loading and normalizing data

After loading the data into MATLAB, it is necessary to normalize them to the form corresponding to the specified kernel function. Function normsv can be used for this purpose (Table 1-1).


Usage:
    [X, A, B] = normsv(X, kernel, isotropic)

Parameters:
    X           data to be normalized
    kernel      kernel type ('linear', 'poly', 'rbf')
    isotropic   isotropic (1) or anisotropic (0, default) scaling

Returned values:
    X           normalized data
    A, B        these matrices can be used for normalizing other data
                in the same way as X:

                    n = size(X1,2);
                    for i = 1:n
                        X1(:,i) = A(i)*X1(:,i) + B(i);
                    end

Table 1-1 Parameters and returned values of function normsv

X is a matrix; every row represents one training input (Table 1-2):

    12.4   34.5   ...   16.3     X1
    23.4   15.2   ...   13.1     X2
    ...    ...    ...   ...
    26.1   23.5   ...    9.1     Xn

Table 1-2 Format of the matrix X (input vectors)

Variable kernel represents the type of kernel function; the data will be normalized specifically for this kernel. Three types of kernel functions are available in this version: linear ('linear'), polynomial ('poly') and Gaussian RBF ('rbf'). A description of these kernel functions can be found in section 3.

Parameter isotropic determines the type of scaling that will be used. Data are normalized to the range [-1, 1] for the three mentioned kernels. If isotropic scaling is used, every column is normalized with the same normalizing coefficients. Anisotropic scaling, on the other hand, uses different coefficients for every column; this means that after normalization, each component of the input vector has the same importance.
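For example, a minimal usage sketch (the data file and the variable names Xtrain and Xtest are hypothetical):

    % Normalize training data for an RBF kernel with anisotropic scaling,
    % then apply the same transformation to the test data.
    load mydata.mat                             % hypothetical file with Xtrain, Xtest
    [Xtrain, A, B] = normsv(Xtrain, 'rbf', 0);  % anisotropic scaling

    % Re-use the returned coefficients A and B on the test set,
    % exactly as in Table 1-1:
    n = size(Xtest, 2);
    for i = 1:n
        Xtest(:,i) = A(i)*Xtest(:,i) + B(i);
    end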

1.2 Training of SVM for multiclass classification

Function multiclass performs training of the SVM for multiclass classification (Table 1-3):


Usage:
    [b, alpha, Ynew, cl] = multiclass(X, Y, kernel, p1, C, filename)

Parameters:
    X           training inputs (normalized) - matrix
    Y           training targets - vector
    kernel      type of kernel function, default 'linear', see section 3
    p1          parameter of the kernel function, default 1
    C           constant for the non-separable case, default Inf
    filename    name of the log file, default 'multiclasslog.txt'; the
                messages during the training are written here

Returned values:
    alpha       matrix of Lagrange multipliers
    b           vector of bias terms
    Ynew        modified training outputs
    cl          number of classes

Table 1-3 Parameters and returned values of function multiclass

Training inputs should be in the form described in section 1.1 and normalized. The vector of training targets consists of one column. Values Y1 to Yn are integers from the range [1, cl], where cl is the number of classes; Yi determines to which class vector Xi belongs.

Parameter C is a real number from the range (0, Inf]. If it is not possible to separate the classes using the specified kernel function, or if a particular error has to be accepted in order to improve generalization, the constraints have to be relaxed. Setting C < Inf relaxes the constraints, and a certain number of wrong classifications will be accepted.


It is possible to check the status of the training process: a short message about each important operation is written to the file specified by parameter filename. During the training process, only viewing of the file is possible; editing the file would cause a sharing violation and crash the training.


Function multiclass returns several values. alpha is a matrix of Lagrange multipliers and has the form described in Table 1-4.



    size n (number of input vectors), size m (number of classes):

    alpha_X1_cl1   alpha_X1_cl2   ...   alpha_X1_clm
    alpha_X2_cl1   alpha_X2_cl2   ...   alpha_X2_clm
    ...            ...            ...   ...
    alpha_Xn_cl1   alpha_Xn_cl2   ...   alpha_Xn_clm

Table 1-4 Format of the matrix alpha (Lagrange multipliers)

Each column of matrix alpha is a set of Lagrange multipliers and can be used to separate the points of class cli from all other points in the training set. If alpha_Xi_cl > 0, vector Xi is called a support vector for this particular classification. (Vectors X whose alphas are equal to zero in all columns could be removed, and the results would remain the same.) Please see [Burges 99].


Returned value b represents the bias terms. It is a column vector of length m; each element bi of b belongs to the corresponding set alpha_X_cli. In this version, the parameters bi are nonzero only if the linear kernel is used. With the polynomial and Gaussian RBF kernels, the separating planes in the transformed space always contain the origin ([Burges 99]); in that case, the bias term is always equal to zero.


Ynew is a transformed vector Y. The transformation is shown in an example (4 classes, Table 1-5).


    Y      Ynew
           cl. 1   cl. 2   cl. 3   cl. 4
    4       -1      -1      -1       1
    2       -1       1      -1      -1
    1        1      -1      -1      -1
    ...     ...     ...     ...     ...
    3       -1      -1       1      -1

Table 1-5 Format of the matrix Ynew (modified output)
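The transformation in Table 1-5 is a one-vs-rest encoding of the class labels. A minimal sketch of how it could be reproduced (illustrative only, not the toolbox's internal code):

    % Encode an n-by-1 label vector Y (values in [1, cl]) as in Table 1-5.
    n    = length(Y);
    cl   = max(Y);
    Ynew = -ones(n, cl);                    % -1 everywhere...
    Ynew(sub2ind([n cl], (1:n)', Y)) = 1;   % ...except +1 in each label's column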

Multiclass classification is done by decomposition of the problem into n two-class classifications, where n is the number of classes. multiclass calculates the Lagrange multipliers and bias terms for separating each class from all other classes. For each two-class classification, function svcm is called; this function is based on the algorithm described in [Burges 99].

If only a two-class classification is needed, function svcm can be called directly (Table 1-6).


Usage:
    [b, alpha, h] = svcm(X, Y, kernel, C, p1, filename, h)

Parameters and returned values:
    X, kernel, C,
    p1, alpha, b   meaning of these parameters is the same as for
                   function multiclass
    filename       default is 'classlog.txt'
    Y              the vector of targets has a slightly different form
                   than the one used for multiclass: it is a one-column
                   vector of length n (number of training inputs). Yi is
                   1 when Xi belongs to class 1, and -1 when Xi belongs
                   to class 2 (or rather, does not belong to class 1)
    h              if available, this raw Hessian (see section 3.1) is
                   used; if not, a new one is calculated. Passing h
                   avoids repeated calculation and is usually used for
                   calculations on the same training set with different
                   parameters - but not with different kernel
                   parameters!

Table 1-6 Parameters and returned values of function svcm
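Putting sections 1.1 and 1.2 together, a minimal training sketch (the variable names and parameter values are hypothetical):

    % Train a multiclass SVM with a polynomial kernel of degree 3 and a
    % relaxed margin (C = 10); Xtrain, Ytrain as described above.
    [b, alpha, Ynew, cl] = multiclass(Xtrain, Ytrain, 'poly', 3, 10, ...
                                      'multiclasslog.txt');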

1.3 Obtaining the results

After training the SVM, the results can be obtained using function classify (Table 1-7).


Usage:
    [res, res1, Rawres] = classify(Xtest, Xtrain, Ynew, alpha, b, kernel, p1)

Parameters:
    Xtest       testing inputs - matrix, normalized the same way as
                Xtrain (see section 1.1)
    Xtrain      training inputs - matrix
    Ynew        training outputs in modified form, output from
                multiclass - a matrix(!)
    alpha       matrix of Lagrange multipliers for each discrimination
                plane, every column is one set; output from multiclass
    b           vector of bias terms, output from multiclass
    kernel      type of kernel function (must be the same as for
                training!)
    p1          parameter of the kernel (see section 3), must be the
                same as for training

Returned values:
    Rawres      matrix of distances of the points of Xtest from the
                discrimination lines for each class (in the transformed
                space)
    res         transformed Rawres; the points of Xtest are classified
                with 1 or 0 for each class - matrix
    res1        another format of res, the same as Y for function
                multiclass

Table 1-7 Parameters and returned values of function classify

This function calculates the distance of each point Xtesti from the discrimination plane in the transformed space. Please note that this is not the perpendicular distance in the space of Xtest but the distance in the transformed multidimensional space. Depending on the kernel function, the dimension of the transformed space is usually much greater than the dimension of the input vectors. According to the theory of SVM, the discrimination surface in the transformed space is a plane. The matrix of distances Rawres has the form described in Table 1-8.



    size ntest (number of input vectors in Xtest), size m (number of
    classes):

    distance_Xtest1_cl1   distance_Xtest2_cl1   ...   distance_Xtestn_cl1
    distance_Xtest1_cl2   distance_Xtest2_cl2   ...   distance_Xtestn_cl2
    ...                   ...                   ...   ...
    distance_Xtest1_clm   distance_Xtest2_clm   ...   distance_Xtestn_clm

Table 1-8 Format of the matrix Rawres

The distances can be both positive and negative; this reflects the position of the point relative to the discrimination plane (class cli or not class cli). After this, the following procedure is applied:

1. If all distances for the point Xtesti are negative: Xtesti belongs to the class with the smallest distance (in absolute value).

2. If there are two or more positive distances: Xtesti belongs to the class with the largest distance (in absolute value).


Figure 2-1 illustrates the situation. If Xtesti is located in one of the problematic areas, this conflict must be resolved. If more than three classes are used, the problematic areas overlap in a more complicated way; however, conflicts are resolved in the same way.
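A minimal sketch of the classification step, with the decision rule above applied by hand to Rawres (illustrative; res1 already contains the resulting class labels):

    % Xtest is hypothetical and normalized with the same A, B as Xtrain.
    [res, res1, Rawres] = classify(Xtest, Xtrain, Ynew, alpha, b, 'poly', 3);

    % Both rules above amount to picking the class with the largest signed
    % distance: if all distances are negative, the maximum is the one
    % smallest in absolute value; if several are positive, it is the
    % largest positive one. Classes are assumed to lie along the rows of
    % Rawres, as in Table 1-8.
    [~, labels] = max(Rawres, [], 1);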

2 How to use SVM for regression

SVM regression, like SVM multiclass classification, is done in three steps:


1. Loading and normalizing data (function normsv)
2. Training of the SVM (function svrm)
3. Obtaining the results (function regress)

The process of SVM regression is very similar to SVM classification; however, there are some important differences.


2.1 Loading and normalizing data

This step is exactly the same as for SVM multiclass classification; please see section 1.1.

Figure 2-1 SVM three-class classification, resolving the conflicts

2.2 Training of SVM for regression

Training of SVM for regression is performed by function svrm (Table 2-1). This routine performs a similar function to svcm, but there are some differences. svrm is based on the algorithm described in [Smola 98].


The matrix of inputs X should be in the form described in section 1.1. Y is a vector of real numbers, where Yi = f(Xi) (Xi is an input vector). Information about the different available kernel functions and their parameters can be found in section 3.1.



Usage:
    [b, beta, H] = svrm(X, Y, kernel, p1, C, loss, e, filename, h)

Parameters:
    X           training inputs (normalized) - matrix
    Y           training targets - vector of real numbers(!)
    kernel      type of kernel function, available: 'linear', 'poly',
                'rbf'; default 'linear', see section 3
    p1          parameter of the kernel function, default 1
    C           penalty term, default Inf
    loss        type of loss function: 'ei' ε-insensitive, 'quad'
                quadratic; default 'ei'
    e           insensitivity, default 0.0
    filename    name of the log file, default 'regresslog.txt'; the
                messages during the training are written here
    h           if available, this raw Hessian matrix (see section 3.1)
                is used; if not, a new one is calculated. Passing h
                avoids repeated calculation and is usually used for
                calculations on the same training set with different
                parameters - but not with different kernel parameters!

Returned values:
    b           bias term - scalar
    beta        vector of differences of Lagrange multipliers
    H           raw (not normalized, not adjusted) Hessian matrix. If h
                was among the input parameters, H == h; if not, H was
                calculated during the run of svrm.

Table 2-1 Parameters and returned values of function svrm

Parameters e, C and loss are strongly interconnected. Parameter loss defines the loss function, which determines how the SVM's errors are penalized. Two types of loss functions are implemented (Figure 2-2).

















Figure 2-2 Implemented loss functions


If the ε-insensitive loss function is used, errors between -e and e are ignored. If C = Inf is set, the regression curve will follow the training data inside the margin determined by e (Figure 2-3).


C is a number from the range (0, Inf]. If C < Inf is set, the constraints are relaxed and the regression curve need not remain within the margin determined by e. In some cases (for example, in the case of defective data) this leads to an improvement in generalization. If a kernel whose VC dimension is not infinite is used, it may be necessary to relax the constraints, since it might not be possible to calculate a regression curve that follows the training data within the margin 2e.

Parameter C determines the angle λ of the loss function (Figure 2-2). For C = Inf, λ = 90° (no error outside [-e, e] is tolerated), and for C = 0, λ = 0° (every error is accepted).


Figure 2-3 SVM regression, 'ei' loss function, e = 0.2, C = Inf



The quadratic loss function penalizes every error. It is recommended to use this loss function: its memory requirements are four times smaller than those of the ε-insensitive loss function. Parameter e is not used with this function and can be set to an arbitrary value (0.0 is preferred).
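For reference, written out for a single training pair (x_i, y_i), the two loss functions have the standard forms used in [Smola 98] (a standard formulation, not copied from the toolbox code):

    L_{\mathrm{ei}}(y_i, f(x_i))   = \max\bigl(0,\ \lvert y_i - f(x_i)\rvert - e\bigr)
    L_{\mathrm{quad}}(y_i, f(x_i)) = \bigl(y_i - f(x_i)\bigr)^2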


Figure 2-4 SVM regression, 'quad' loss function, C = 10


Figure 2-5 SVM regression, 'quad' loss function, C = 0.5

Figures 2-4 and 2-5 illustrate the influence of parameter C. It is recommended to use C from the range (0, 100]; in this range, the SVM is most sensitive to changes in the value of C. Larger values of C influence the SVM in much the same way as C = Inf does.


It is possible to check the status of the training process: a short message about each performed operation is written to the file specified by parameter filename. During the training process, only viewing of the file is possible; editing the file would cause a sharing violation and crash the training.


Returned values b (scalar) and beta (column vector) represent the output of the SVM according to [Smola 98].
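A minimal regression sketch, assuming the data were normalized as in section 1.1 (the variable names and parameter values are hypothetical):

    % Train an RBF-kernel SVM regressor with the quadratic loss and C = 10.
    [b, beta, H] = svrm(X, Y, 'rbf', 0.5, 10, 'quad', 0.0, 'regresslog.txt');

    % Evaluate on a test set normalized the same way as X; the returned k
    % can be passed back in to later calls to avoid recomputation.
    [Ytest, k] = regress(X, Xtest, b, beta, 'rbf', 0.5);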

2.3 Obtaining results using regress

The results of SVM regression can be obtained using function regress (Table 2-2).



Usage:
    [Ytest, k] = regress(Xtrain, Xtest, b, beta, kernel, p1, k)

Parameters:
    Xtrain      training inputs - the matrix that was used for training
    Xtest       testing inputs - matrix (normalized the same way as
                Xtrain)
    kernel      kernel function (must be the same as for training), see
                section 3
    p1          parameter of the kernel, default 1, but must be the
                same as for training
    beta        differences of Lagrange multipliers, output from svrm
    b           bias term, output from svrm
    k           matrix of dot products of Xtrain and Xtest. This matrix
                should be supplied if it is available from a previous
                calculation; if not, a new one is calculated during the
                run of this function and returned on the output. For
                more details, see also section 3.2.

Returned values:
    Ytest       testing output, one-column vector
    k           if k was available as an input parameter, the same
                matrix is returned on the output; otherwise, k is the
                newly calculated matrix of dot products of Xtrain and
                Xtest

Table 2-2 Parameters and returned values of function regress

The meaning of the parameters was discussed in the previous sections. Ytest is a real-valued one-column vector of results. Parameter k is described in section 3.2.

3 Important SVM subfunctions

3.1 Function product

This function calculates different types of dot products of vectors, depending on the kernel function (Table 3-1).
).


Usage:
    h = product(X, kernel, p1)

Parameters:
    X           matrix of inputs
    kernel      type of kernel function:
                'linear' - usual dot product
                'poly'   - p1 is the degree of the polynomial
                'rbf'    - p1 is the width of the RBFs (sigma)

Returned value:
    h           matrix of dot products of the input vectors

Table 3-1 Parameters and returned values of function product

This function is fundamental for the SVM. It contains the different kernel functions (three in this version) used for calculating the dot product of vectors in the transformed space. The kernel function used determines the properties and performance of the SVM.

Matrix X has to be normalized for the used kernel function (see section 1.1). Every row of X is considered as one input vector (Table 3-2).


     0.4   -0.5   ...    0.3     X1
    -0.4    0.2   ...    0.1     X2
    ...     ...   ...    ...
     0.1    0.5   ...   -0.1     Xn

Table 3-2 Format of the matrix X

Matrix h has the form described in Table 3-3. Please note that matrix h is symmetric.


    size n × n (number of vectors X):

    1          X2 ⊗ X1    ...    Xn ⊗ X1
    X1 ⊗ X2    1          ...    Xn ⊗ X2
    ...        ...        ...    ...
    X1 ⊗ Xn    X2 ⊗ Xn    ...    1

Table 3-3 Format of the matrix h

Operation ⊗ in Table 3-3 represents the dot product of two vectors in the transformed space. Operation ⊗ is determined by the used kernel function. There are three different kernel functions implemented in this version (Table 3-4):



    Parameter    Kernel type     Description
    'linear'     linear          parameter p1 is not used
    'poly'       polynomial      p1 is the degree of the polynomial
    'rbf'        Gaussian RBF    p1 is the width of the RBFs

Table 3-4 Available kernel types
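As an illustration, a sketch of what product presumably computes for the three kernels; the exact formulas and scalings inside the toolbox are assumptions, so the standard textbook definitions are shown:

    % Illustrative Gram-matrix computation; not the toolbox's own code.
    % X is n-by-d with one input vector per row; p1 as in Table 3-4.
    function h = product_sketch(X, kernel, p1)
        dots = X * X';                       % plain dot products
        switch kernel
            case 'linear'
                h = dots;                    % K(x,y) = x'*y
            case 'poly'
                h = (dots + 1).^p1;          % K(x,y) = (x'*y + 1)^p1 (assumed form)
            case 'rbf'
                sq = sum(X.^2, 2);           % squared norms, n-by-1
                d2 = bsxfun(@plus, sq, sq') - 2*dots;  % pairwise squared distances
                h  = exp(-d2 / (2*p1^2));    % K(x,y) = exp(-||x-y||^2 / (2*sigma^2))
        end
    end

Note that for the RBF kernel the diagonal of h is exactly 1, which matches the ones on the diagonal in Table 3-3.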

3.2 Function product_res

Similar to function product, this function also calculates different types of dot products of vectors. It is used for the calculation of the results (Table 3-5).


Usage:
    k = product_res(Xtrain, Xtest, kernel, p1)

Parameters:
    Xtrain      training inputs (normalized)
    Xtest       testing inputs (normalized the same way as Xtrain)
    kernel      type of kernel function (see section 3)
    p1          parameter of the kernel (see section 3)

Returned value:
    k           matrix of dot products

Table 3-5 Parameters and returned values of function product_res

The format of parameters Xtrain and Xtest was described in the previous sections. Returned value k is a matrix and is described in Table 3-6.






    size ntr (number of vectors Xtrain), size nte (number of vectors
    Xtest):

    Xtrain1 ⊗ Xtest1     Xtrain1 ⊗ Xtest2     ...   Xtrain1 ⊗ Xtestnte
    Xtrain2 ⊗ Xtest1     Xtrain2 ⊗ Xtest2     ...   Xtrain2 ⊗ Xtestnte
    ...                  ...                  ...   ...
    Xtrainntr ⊗ Xtest1   Xtrainntr ⊗ Xtest2   ...   Xtrainntr ⊗ Xtestnte

Table 3-6 Format of the matrix k

The meaning of the symbol ⊗ is explained in section 3.


4 References

[Burges 99] Burges, C., 1999, A Tutorial on Support Vector Machines for Pattern Recognition. This paper can be downloaded at http://svm.research.bell-labs.com/SVMdoc.html

[Smola 98] Smola, A., Schölkopf, B., 1998, A Tutorial on Support Vector Regression, NeuroCOLT2 Technical Report Series, October 1998