How to use the D2K Support Vector Machine (SVM) MATLAB toolbox
This MATLAB toolbox consists of several m-files designed to perform two tasks:
1. Multiclass classification
2. Regression
1 How to do a SVM multiclass classification
SVM multiclass classification is done in three steps:
1. Loading and normalizing data (function normsv)
2. Training of SVM (function multiclass)
3. Obtaining the results (function classify)
1.1 Loading and normalizing data
After loading the data into MATLAB, it is necessary to normalize them to the form required by the chosen kernel function. Function normsv can be used for this purpose (Table 1-1).

Usage:
  [X A B] = normsv(X,kernel,isotropic)

Parameters:
  X          data to be normalized
  kernel     kernel type ('linear', 'poly', 'rbf')
  isotropic  isotropic (1) or anisotropic (0, default) scaling

Returned values:
  X          normalized data
  A, B       These matrices can be used for normalizing other data
             in the same way as X:
               n = size(X1,2);
               for i=1:n
                 X1(:,i) = A(i)*X1(:,i) + B(i);
               end

Table 1-1 Parameters and returned values of function normsv

X is a matrix. Every row represents one training input (Table 1-2):

  12.4  34.5  ...  16.3    X1
  23.4  15.2  ...  13.1    X2
  ...   ...   ...  ...     ...
  26.1  23.5  ...   9.1    Xn

Table 1-2 Format of the matrix X (input vectors)

Variable kernel represents the type of kernel function; the data will be normalized specifically for this kernel. Three types of kernel functions are available in this version: linear ('linear'), polynomial ('poly') and Gaussian RBF ('rbf'). A description of these kernel functions can be found in section 3.
Parameter isotropic determines the type of scaling that will be used. Data are normalized to the range <-1,1> for all three kernels mentioned above. If isotropic scaling is used, every column is normalized with the same normalizing coefficients. Anisotropic scaling, on the other hand, uses different coefficients for every column. This means that after anisotropic normalization each component of the input vector will have the same importance.
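
To make the difference between the two scaling types concrete, here is a minimal sketch of a scaling to <-1,1> that follows the X(:,i) = A(i)*X(:,i) + B(i) convention of Table 1-1. It is not the code of normsv: the min/max formulas and the names Xraw, Xn, lo and hi are assumptions made for illustration only.

```matlab
% Sketch only: min/max scaling to <-1,1>; not the actual normsv code.
Xraw = [12.4 34.5 16.3;
        23.4 15.2 13.1;
        26.1 23.5  9.1];             % every row is one input vector
isotropic = 0;                       % 0 = anisotropic, 1 = isotropic

n = size(Xraw, 2);
if isotropic
    lo = repmat(min(Xraw(:)), 1, n); % same coefficients for all columns
    hi = repmat(max(Xraw(:)), 1, n);
else
    lo = min(Xraw, [], 1);           % separate coefficients per column
    hi = max(Xraw, [], 1);
end

A = 2 ./ (hi - lo);                  % scale factors (assumed formula)
B = -(hi + lo) ./ (hi - lo);         % offsets (assumed formula)

Xn = zeros(size(Xraw));
for i = 1:n
    Xn(:, i) = A(i) * Xraw(:, i) + B(i);
end
```

With anisotropic scaling every column of Xn spans the full range from -1 to 1. With isotropic = 1 only the overall minimum and maximum of the matrix reach -1 and 1. A constant column would lead to a division by zero in this sketch.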
1.2 Training of SVM for multiclass classification
Function multiclass performs training of SVM for multiclass classification (Table 1-3):

Usage:
  [b,alpha,Ynew,cl] = multiclass(X,Y,kernel,p1,C,filename)

Parameters:
  X         training inputs (normalized), matrix
  Y         training targets, vector
  kernel    type of kernel function, default 'linear', see section 3
  p1        parameter of the kernel function, default 1
  C         non-separable case, default Inf
  filename  name of the log file, default 'multiclasslog.txt'.
            Messages are written here during the training.

Returned values:
  alpha     matrix of Lagrange multipliers
  b         vector of bias terms
  Ynew      modified training outputs
  cl        number of classes

Table 1-3 Parameters and returned values of function multiclass

Training inputs should be normalized and in the form described in section 1.1. The vector of training targets consists of one column. The values Y_1 to Y_n are integers from the range <1, cl>, where cl is the number of classes. Y_i determines to which class the vector X_i belongs.
Parameter C is a real number from the range (0, Inf>. If it is not possible to separate the classes using the specific kernel function, or if a certain error has to be accepted in order to improve the generalization, the constraints have to be relaxed. Setting C < Inf relaxes the constraints, and a certain number of wrong classifications will be accepted.
It is possible to check the status of the training process. A short message about each important operation is written to the file specified by parameter filename. During the training process the file may only be viewed; editing the file would cause a sharing violation and crash the training.
Function multiclass returns several values. alpha is a matrix of Lagrange multipliers and has the form described in Table 1-4.

Size: n rows (number of input vectors) by m columns (number of classes)

  alpha_X1_cl1  alpha_X1_cl2  ...  alpha_X1_clm
  alpha_X2_cl1  alpha_X2_cl2  ...  alpha_X2_clm
  ...           ...           ...  ...
  alpha_Xn_cl1  alpha_Xn_cl2  ...  alpha_Xn_clm

Table 1-4 Format of the matrix alpha (Lagrange multipliers)

Each column of matrix alpha is a set of Lagrange multipliers and can be used to separate the points of class cl_i from all other points in the training set. If alpha_Xi_cl > 0, vector X_i is called a support vector for this particular classification. (Vectors X for which the alphas are equal to zero in all columns could be removed and the results would remain the same.) Please see [Burges 99].
Returned value b represents the bias terms. It is a column vector of length m. Each element b_i of b belongs to the corresponding set alpha_X_cl_i. In this version the parameters b_i are nonzero only if the linear kernel is used. With the polynomial and Gaussian RBF kernels, the separation planes in the transformed space always contain the origin [Burges 99]. In that case the bias term is always equal to zero.
Ynew is a transformed vector Y. The transformation is shown on an example with 4 classes (Table 1-5).

  Y     Ynew
        cl. 1  cl. 2  cl. 3  cl. 4
  4      -1     -1     -1      1
  2      -1      1     -1     -1
  1       1     -1     -1     -1
  ...    ...    ...    ...    ...
  3      -1     -1      1     -1

Table 1-5 Format of the matrix Ynew (modified output)

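
The transformation in Table 1-5 can be reproduced with a few lines of MATLAB. This is only a sketch of the idea, not the code used inside multiclass. Taking the number of classes from the largest label is an assumption.

```matlab
% Sketch only: build the one-against-all target matrix from class labels.
Y  = [4; 2; 1; 3];               % class labels from the range <1, cl>
cl = max(Y);                     % number of classes (assumed: largest label)

Ynew = -ones(length(Y), cl);     % start with "not this class" everywhere
for i = 1:length(Y)
    Ynew(i, Y(i)) = 1;           % mark the column of the true class
end
disp(Ynew)
```

Each column of Ynew is then the target vector of one two-class problem (class cl_i against all other classes).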
Multiclass classification is done by decomposing the problem into n two-class classifications, where n is the number of classes. multiclass calculates the Lagrange multipliers and bias terms for separating each class from all other classes. For each two-class classification, function svcm is called. This function is based on the algorithm described in [Burges 99].
If only a two-class classification is needed, function svcm can be called directly (Table 1-6).

Usage:
  [b,alpha,h] = svcm(X,Y,kernel,C,p1,filename,h)

Parameters and returned values:
  X, kernel, C,  The meaning of these parameters is the same as for
  p1, alpha, b   function multiclass.
  filename       default is 'classlog.txt'
  Y              The vector of targets has a slightly different form than
                 the one used for multiclass. It is a one-column vector of
                 length n (number of training inputs). Y_i is 1 when X_i
                 belongs to class 1, or -1 when X_i belongs to class 2
                 (or, better, does not belong to class 1).
  h              If available, this raw Hessian (see section 3.1) should
                 be passed in. If not, a new one is calculated. Passing h
                 avoids repeated calculation of the Hessian. It is usually
                 used for calculations on the same training set with
                 different parameters, but not with different kernel
                 parameters!

Table 1-6 Parameters and returned values of function svcm

1.3 Obtaining the results
After training of the SVM, the results can be obtained using function classify (Table 1-7).

Usage:
  [res,res1,Rawres] = classify(Xtest,Xtrain,Ynew,alpha,b,kernel,p1)

Parameters:
  Xtest   testing inputs, matrix, normalized the same way as Xtrain
          (see section 1.1)
  Xtrain  training inputs, matrix
  Ynew    training outputs in modified form, output from multiclass
          (a matrix!)
  alpha   matrix of Lagrange multipliers for each discrimination plane,
          every column is one set; output from multiclass
  b       vector of bias terms, output from multiclass
  kernel  type of kernel function (must be the same as for training!)
  p1      parameter of the kernel (see section 3), must be the same as
          for training

Returned values:
  Rawres  matrix of distances of the points Xtest from the discrimination
          planes for each class (in the transformed space)
  res     transformed Rawres; points of Xtest are classified with 1 or 0
          for each class, matrix
  res1    another format of res, the same as Y for function multiclass

Table 1-7 Parameters and returned values of function classify

This function calculates the distance of each point Xtest_i from the discrimination plane in the transformed space. Please note that this is not a perpendicular distance in the space of Xtest but the distance in the transformed multidimensional space. Depending on the kernel function, the dimension of the transformed space is usually much greater than the dimension of the input vectors. According to the theory of SVM, the discrimination surface in the transformed space is a plane. The matrix of distances Rawres has the form described in Table 1-8.

Size: m rows (number of classes) by ntest columns (number of input vectors in Xtest)

  distance_Xtest1_cl1  distance_Xtest2_cl1  ...  distance_Xtestn_cl1
  distance_Xtest1_cl2  distance_Xtest2_cl2  ...  distance_Xtestn_cl2
  ...                  ...                  ...  ...
  distance_Xtest1_clm  distance_Xtest2_clm  ...  distance_Xtestn_clm

Table 1-8 Format of the matrix Rawres
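
As an illustration of how such a matrix can arise, the sketch below evaluates the standard SVM decision function for every class and test point, using a linear kernel. It is a textbook formulation, not the code of classify. The multiplier values are made up, and the values are not divided by the norm of the weight vector, so they are decision values rather than geometric distances.

```matlab
% Sketch only: standard SVM decision values with a linear kernel.
Xtrain = [-1 -1; -0.5 -1; 1 1; 0.5 1];  % n training vectors (rows)
Ynew   = [ 1 -1;  1 -1; -1 1; -1 1];    % n x m targets in {-1, 1}
alpha  = [0.5 0.5; 0 0; 0.5 0.5; 0 0];  % n x m multipliers (made-up values)
b      = [0; 0];                        % m bias terms
Xtest  = [-0.8 -0.9; 0.7 0.6];          % ntest test vectors (rows)

K = Xtrain * Xtest';                    % n x ntest linear dot products
m = size(alpha, 2);
Rawres = zeros(m, size(Xtest, 1));      % one row per class, as in Table 1-8
for c = 1:m
    w = alpha(:, c) .* Ynew(:, c);      % signed multipliers of class c
    Rawres(c, :) = w' * K + b(c);
end
```

Here the first test point gets a positive value for class 1 and a negative value for class 2, and the second test point the opposite.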
The distances can be positive or negative. The sign reflects the position of the point relative to the discrimination plane (class cl_i or not class cl_i). After this, the following procedure is applied:
1. If all distances for the point Xtest_i are negative: Xtest_i belongs to the class with the smallest distance (in absolute value).
2. If there are two or more positive distances: Xtest_i belongs to the class with the largest distance (in absolute value).
Figure 2-1 illustrates the situation. If Xtest_i is located in one of the problematic areas, the conflict must be solved. If more than three classes are used, the problematic areas overlap in a more complicated way. However, conflicts are solved in the same way.
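
Both rules above, together with the ordinary case of exactly one positive distance, amount to choosing the class with the largest signed distance. The following sketch shows this on made-up values. The layout of Rawres follows Table 1-8, while the layouts chosen for res and res1 are assumptions and may differ from what classify actually returns.

```matlab
% Sketch only: resolve conflicts by picking the largest signed distance.
% Rows are classes, columns are test points (layout of Table 1-8).
Rawres = [-0.2  0.9 -0.7;
          -0.5  0.3  0.4;
          -0.9 -0.6 -0.1];

[maxdist, res1] = max(Rawres, [], 1);   % winning class per test point
res1 = res1';                           % column vector of class numbers

res = zeros(size(Rawres));              % 1 for the winning class, else 0
for j = 1:length(res1)
    res(res1(j), j) = 1;
end
```

The first point has only negative distances and goes to class 1, which has the smallest absolute value. The second point has two positive distances and goes to class 1, which has the largest one. The third point has a single positive distance and goes to class 2.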
2 How to use SVM for regression
SVM regression is, like SVM multiclass classification, done in three steps:
1. Loading and normalizing data (function normsv)
2. Training of SVM (function svrm)
3. Obtaining the results (function regress)
The process of SVM regression is very similar to SVM classification. However, there are some important differences.
2.1 Loading and normalizing data
This step is exactly the same as for SVM multiclass classification. Please see section 1.1.

Figure 2-1 SVM three-class classification, solving of the conflicts (the problematic areas are marked in the figure)

2.2 Training of SVM for regression
Training of SVM for regression is performed by function svrm (Table 2-1). This routine performs a function similar to svcm, but there are some differences. svrm is based on the algorithm described in [Smola 98].
The matrix of inputs X should be in the form described in section 1.1. Y is a vector of real numbers, where Y_i = f(X_i) (X_i is an input vector). Information about the available kernel functions and their parameters can be found in section 3.1.

Usage:
  [b,beta,H] = svrm(X,Y,kernel,p1,C,loss,e,filename,h)

Parameters:
  X         training inputs (normalized), matrix
  Y         training targets, vector of real numbers (!)
  kernel    type of kernel function, available: 'linear', 'poly', 'rbf',
            default 'linear', see section 3
  p1        parameter of the kernel function, default 1
  C         penalty term, default Inf
  loss      type of loss function: 'ei' ε-insensitive, 'quad' quadratic,
            default 'ei'
  e         insensitivity, default 0.0
  filename  name of the log file, default 'regresslog.txt'. Messages are
            written here during the training.
  h         If available, this raw Hessian matrix (see section 3.1)
            should be passed in. If not, a new one is calculated.
            Passing h avoids repeated calculation of the Hessian. It is
            usually used for calculations on the same training set with
            different parameters, but not with different kernel
            parameters!

Returned values:
  b         bias term, scalar
  beta      vector of differences of Lagrange multipliers
  H         Not normalized and not adjusted Hessian matrix. If h was
            given in the input parameters, H == h. If not, H was
            calculated during the run of svrm.

Table 2-1 Parameters and returned values of function svrm

Parameters e, C and loss are strongly interconnected. Parameter loss defines the loss function, which determines how the SVM's error is penalized. Two types of loss functions are implemented (Figure 2-2).

Figure 2-2 Implemented loss functions

If the ε-insensitive loss function is used, errors between -e and e are ignored. If C = Inf is set, the regression curve will follow the training data inside the margin determined by e (Figure 2-3).
C is a number from the range (0, Inf>. If C < Inf is set, the constraints are relaxed and the regression curve need not remain within the margin determined by e. In some cases (for example, with defective data) this improves generalization. If a kernel with finite VC dimension is used, it may be necessary to relax the constraints, because it might not be possible to calculate a regression curve that follows the training data within the margin 2e.
Parameter C determines the angle λ of the loss function (Figure 2-2). For C = Inf, λ = 90° (no error outside the range from -e to +e is tolerated) and for C = 0, λ = 0° (every error is accepted).
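
The two loss functions can be written down in a few lines. The sketch below uses their common textbook forms and is not taken from the toolbox code. In particular, using C directly as the slope of the ε-insensitive loss and as the weight of the quadratic loss is an assumption that merely matches the qualitative description above (a larger C gives a steeper penalty).

```matlab
% Sketch only: textbook forms of the two loss functions.
e   = 0.2;                          % width of the insensitive zone
C   = 10;                           % penalty weight
err = [-0.5 -0.2 -0.1 0 0.1 0.3];   % example errors (prediction - target)

% e-insensitive loss: zero inside <-e, e>, linear with slope C outside
loss_ei = C * max(0, abs(err) - e);

% quadratic loss: every nonzero error is penalized
loss_quad = C * err .^ 2;
```

With these values only the errors -0.5 and 0.3 are penalized by the ε-insensitive loss, while the quadratic loss is nonzero for every error except 0.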

[Figure 2-2, panel labels: ε-insensitive loss function (axes: error, penalty; errors between -e and e are ignored; λ = f(C)) and quadratic loss function (axes: error, penalty; the shape of the curve is influenced by C)]

Figure 2-3 SVM regression, 'ei' loss function, e = 0.2, C = Inf

The quadratic loss function penalizes every error. It is recommended to use this loss function. With it, memory requirements are four times lower than with the ε-insensitive loss function. Parameter e is not used for this function and can be set to an arbitrary value (0.0 is preferred).

Figure 2-4 SVM regression, 'quad' loss function, C = 10

Figure 2-5 SVM regression, 'quad' loss function, C = 0.5

Figures 2-4 and 2-5 illustrate the influence of parameter C. It is recommended to use C from the range (0, 100>. In this range the SVM is most sensitive to changes of the value of C. Larger values of C influence the SVM in much the same way as C = Inf does.
It is possible to check the status of the training process. A short message about each performed operation is written to the file specified by parameter filename. During the training process the file may only be viewed; editing the file would cause a sharing violation and crash the training.
Returned values b (scalar) and beta (column vector) represent the output of the SVM according to [Smola 98].
2.3 Obtaining results using regress
Results of SVM regression can be obtained using function regress (Table 2-2).

Usage:
  [Ytest,k] = regress(Xtrain,Xtest,b,beta,kernel,p1,k)

Parameters:
  Xtrain  training inputs, matrix (the one that was used for training)
  Xtest   testing inputs, matrix (normalized the same way as Xtrain)
  kernel  kernel function (must be the same as for training), see
          section 3
  p1      parameter of the kernel, default 1, but must be the same as
          for training
  beta    difference of Lagrange multipliers, output from svrm
  b       bias term, output from svrm
  k       matrix of dot products of Xtrain and Xtest. This matrix should
          be passed in if it is available from a previous calculation.
          If it is not, a new one is calculated during the run of this
          function and returned on the output. For more details, see
          also section 3.2.

Returned values:
  Ytest   testing output, one-column vector
  k       If k was available as an input parameter, the same matrix is
          returned on the output. Otherwise, k is the newly calculated
          matrix of dot products of Xtrain and Xtest.

Table 2-2 Parameters and returned values of function regress

The meaning of the parameters was discussed in the previous sections. Ytest is a real-valued one-column vector of results. Parameter k is described in section 3.2.
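
For orientation, the sketch below shows how a regression output is usually computed from beta, b and the matrix of dot products in the standard SVM regression formulation. It uses a linear kernel and made-up values for beta and b. It is an assumption that regress combines its inputs in exactly this way.

```matlab
% Sketch only: textbook SVM regression output with a linear kernel.
Xtrain = [-1; 0; 1];              % ntr training inputs (rows)
Xtest  = [-0.5; 0.5];             % nte testing inputs (rows)
beta   = [-0.5; 0; 0.5];          % made-up multiplier differences
b      = 0.1;                     % made-up bias term

k = Xtrain * Xtest';              % ntr x nte matrix of dot products
Ytest = k' * beta + b;            % one-column vector of predictions
```

Because k depends only on Xtrain, Xtest and the kernel, it can be reused when only beta and b change, which is the reason for passing k in and out of the function.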
3 Important SVM subfunctions
3.1 Function product
This function calculates different types of dot products of vectors, depending on the kernel function (Table 3-1).

Usage:
  h = product(X,kernel,p1)

Parameters:
  X       matrix of inputs
  kernel  type of kernel function
            'linear' = usual dot product
            'poly'   = p1 is degree of polynomial
            'rbf'    = p1 is width of rbfs (sigma)

Returned value:
  h       matrix of dot products of the input vectors

Table 3-1 Parameters and returned values of function product

This function is fundamental to the SVM. It contains the different kernel functions (three in this version) used for calculating the dot product of vectors in the transformed space. The kernel function used determines the properties and performance of the SVM.
Matrix X has to be normalized for the kernel function used (see section 1.1). Every row of X is considered to be one input vector (Table 3-2).

   0.4  -0.5  ...   0.3    X1
  -0.4   0.2  ...   0.1    X2
   ...   ...  ...   ...    ...
   0.1   0.5  ...  -0.1    Xn

Table 3-2 Format of the matrix X

Matrix h has the form described in Table 3-3. Please note that matrix h is symmetric.

Size: n by n (n = number of vectors X)

  1      X2·X1  ...  Xn·X1
  X1·X2  1      ...  Xn·X2
  ...    ...    ...  ...
  X1·Xn  X2·Xn  ...  1

Table 3-3 Format of the matrix h

The operation between the two vectors in each entry of Table 3-3 represents the dot product of the two vectors in the transformed space. This operation is determined by the kernel function used. Three different kernel functions are implemented in this version (Table 3-4):

  Parameter  Kernel type   Description
  'linear'   linear        parameter p1 is not used
  'poly'     polynomial    p1 is the degree of the polynomial
  'rbf'      Gaussian RBF  p1 is the width of the RBFs

Table 3-4 Available kernel types
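
The sketch below computes a matrix of kernel values for the three kernel types, using common textbook forms of the kernels. The exact formulas implemented in product are not given in this document, so the polynomial form (dot product plus one, raised to the power p1) and the Gaussian form with 2*p1^2 in the denominator are assumptions.

```matlab
% Sketch only: common textbook forms of the three kernels.
X  = [0.4 -0.5; -0.4 0.2; 0.1 0.5];   % every row is one input vector
n  = size(X, 1);
p1 = 2;                                % degree for 'poly', sigma for 'rbf'

h_lin  = X * X';                       % 'linear': usual dot product
h_poly = (X * X' + 1) .^ p1;           % 'poly': one common polynomial form

sq = sum(X .^ 2, 2);                   % squared norms of the rows
d2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2 * (X * X');
h_rbf = exp(-d2 / (2 * p1 ^ 2));       % 'rbf': Gaussian with width p1
```

All three matrices are n by n and symmetric. For the Gaussian kernel the diagonal is equal to 1, because every vector has zero distance to itself.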
3.2 Function product_res
Similar to function product, this function also calculates different types of dot products of vectors. It is used for the calculation of the results (Table 3-5).

Usage:
  k = product_res(Xtrain,Xtest,kernel,p1)

Parameters:
  Xtrain  training inputs (normalized)
  Xtest   testing inputs (normalized the same way as Xtrain)
  kernel  type of kernel function (see section 3)
  p1      parameter of the kernel (see section 3)

Returned value:
  k       matrix of dot products

Table 3-5 Parameters and returned values of function product_res

The format of parameters Xtrain and Xtest was described in the previous sections. Returned value k is a matrix and is described in Table 3-6.

Size: ntr rows (number of vectors Xtrain) by nte columns (number of vectors Xtest)

  Xtrain1·Xtest1    Xtrain1·Xtest2    ...  Xtrain1·Xtestnte
  Xtrain2·Xtest1    Xtrain2·Xtest2    ...  Xtrain2·Xtestnte
  ...               ...               ...  ...
  Xtrainntr·Xtest1  Xtrainntr·Xtest2  ...  Xtrainntr·Xtestnte

Table 3-6 Format of the matrix k

The meaning of the dot-product symbol used in Table 3-6 is explained in section 3.
4 References
Burges, CH., 1999, Tutorial on Support Vector Machines for Pattern Recognition. This paper can be downloaded at http://svm.research.bell-labs.com/SVMdoc.html
Smola, A., Schölkopf, B., 1998, A Tutorial on Support Vector Regression, NeuroCOLT2 Technical Report Series, October 1998