CS267 TOPICS IN DATABASE SYSTEMS SUPPORT VECTOR MACHINE PARIN SHAH
PROF. T.Y. LIN
(SVM)
007332832
SUPPORT VECTOR MACHINE
INTRODUCTION ::
The number of documents on the World Wide Web is ever increasing and continues to grow rapidly. Classifying each of these documents by hand is not feasible, and neither is managing the structure of such a huge collection manually, so we discuss a few methods of organizing the data into a proper structure. We also look in detail at classifying new data into categories that already exist.
Support vector machines (SVMs) are a set of related supervised learning methods that can be used for:
• Text classification.
• Data analysis.
• Pattern recognition.
• Regression analysis.
• Bioinformatics.
• Signature/handwriting recognition.
• E-mail spam categorization.
Supervised learning is the machine learning task of deducing a category from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and then predicts the correct output category for a given input data set.
For example, a teacher teaches a student to identify apples and oranges by describing some features of each. The next time the student sees an apple or an orange, he can easily classify it based on what he learned from the teacher. This is called supervised learning. The student can identify the object only if it is an apple or an orange; if he is given grapes, he cannot identify them.
In my project I have used the Support Vector Machine (SVM) for text classification, in which a new set of input data is classified into the given categories. SVM is not used for clustering the data into new categories; it classifies data into categories that are already present.
UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR LINEARLY SEPARABLE DATA
Consider each document to be a single dot in the figure, with the color of a dot indicating its category. Here we have documents of two categories, and we have to find the boundary separating them.
The margin of a linear classifier is the width by which the boundary can be thickened before it hits a data point of either category. The safest line to pick is the one with the largest margin between the two data sets. The data points that lie on the margin are known as support vectors.
The next step is to find the hyperplane that best separates the two categories. SVM does this by taking a set of points and separating those points using mathematical formulas. From that we can find the positive and negative hyperplanes. The formulas for the hyperplanes are:
$(\mathbf{w} \cdot \mathbf{x}) + b = +1$ (positive labels)
$(\mathbf{w} \cdot \mathbf{x}) + b = -1$ (negative labels)
$(\mathbf{w} \cdot \mathbf{x}) + b = 0$ (hyperplane)
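From the equations above, using linear algebra, we can find the values of $\mathbf{w}$ and $b$. Thus we have a model that contains the solution for $\mathbf{w}$ and $b$, with margin $2/\sqrt{\mathbf{w} \cdot \mathbf{w}}$. The margin is calculated as follows:
$\text{Margin} = \dfrac{2}{\sqrt{\mathbf{w} \cdot \mathbf{w}}}$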
In SVM, this model is used to classify new data. With the solutions above and the calculated margin value, new data can be classified into a category.
The following figure demonstrates the margin and support vectors for linearly separable data.
The maximum margin and the support vectors for the given data sets are shown in the figure.
UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR NON-LINEARLY SEPARABLE DATA
In the non-linearly separable case, the data lie in an input space where they cannot be separated by a linear hyperplane. To separate the data linearly, we have to map the points to a feature space using a kernel method. After the data are separated in the feature space, we can map the points back to the input space, where the boundary appears as a curved hyperplane. The following figure demonstrates the data flow of SVM.
In reality, you will find that most data sets are not as simple and well behaved. There will be some points on the wrong side of the class boundary, points that are far away from their classes, or points that are mixed together in a spiral or checkered pattern. Researchers have looked into these problems. To handle the few points that fall in the wrong class, SVM minimizes an objective that trades the width of the margin against a penalty on the misplaced points, weighted by a constant C, to create what is called a soft-margin hyperplane. In the usual formulation, a higher value of C penalizes misplaced points more heavily and so narrows the margin, whereas a lower value of C tolerates more of them and widens the margin.
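For reference, the soft-margin objective is most often written in the textbook form below. This is a sketch of the standard formulation, not necessarily the exact equation used in the original report; the slack variables $\xi_i$, the labels $y_i \in \{+1, -1\}$ and the number of training points $n$ are notation assumed here.

```latex
\min_{\mathbf{w},\, b,\, \xi} \;\; \frac{1}{2}\,(\mathbf{w} \cdot \mathbf{w}) \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \big( (\mathbf{w} \cdot \mathbf{x}_i) + b \big) \;\ge\; 1 - \xi_i ,
\qquad \xi_i \ge 0 .
```

The first term favors a wide margin, since the margin is $2/\sqrt{\mathbf{w} \cdot \mathbf{w}}$. The second term charges each point for the amount $\xi_i$ by which it violates the margin, and C sets the exchange rate between the two.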
TYPES OF KERNEL ::
Computing with points in the feature space can be very costly because the feature space can be very high-dimensional, in some cases infinite-dimensional. The kernel function is used to reduce this cost. The reason is that the data points appear only in dot products, and the kernel function is able to compute the inner products of these points directly. So there is no need to map the points explicitly into the feature space. By using the kernel function we can work with the data points through their inner products and find the equivalent points on the hyperplane.
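The saving can be illustrated with a small sketch in C. It assumes two-dimensional inputs and the degree-2 kernel K(x, y) = (x · y)^2, whose explicit feature map is phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2); the kernel and the function names are chosen here for illustration only and are not taken from the project.

```c
#include <math.h>

/* Explicit degree-2 feature map for a 2-D point:
   phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). */
void phi2(const double x[2], double out[3])
{
    out[0] = x[0] * x[0];
    out[1] = sqrt(2.0) * x[0] * x[1];
    out[2] = x[1] * x[1];
}

/* Inner product computed the long way: map both points into the
   feature space, then take the dot product there. */
double explicit_inner(const double x[2], const double y[2])
{
    double px[3], py[3];
    phi2(x, px);
    phi2(y, py);
    return px[0] * py[0] + px[1] * py[1] + px[2] * py[2];
}

/* The same value from the kernel K(x, y) = (x . y)^2,
   computed in the input space with no mapping at all. */
double kernel_inner(const double x[2], const double y[2])
{
    double dot = x[0] * y[0] + x[1] * y[1];
    return dot * dot;
}
```

Both functions return the same number up to rounding, but `kernel_inner` never builds the feature-space vectors. For higher degrees or input dimensions the explicit map grows quickly, while the kernel still costs one dot product.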
The kernel functions developed for SVM are still a research topic. No single kernel has been found that is universal for all kinds of data. Anybody can develop their own kernel depending on their requirements.
The following are some basic types of kernel:
1.) Polynomial kernel with degree d.
2.) Radial basis function kernel with width σ. Closely related to the radial basis function of neural networks.
3.) Sigmoid kernel with parameters κ and θ.
4.) Linear kernel: K(x, y) = x' * y
STRENGTH OF KERNELS ::
Kernels are the trickiest and most important part of using SVM because the kernel creates the kernel matrix, which summarizes all the data. In practice, a low-degree polynomial kernel or a Radial Basis Function kernel with a reasonable width is a good initial try for most applications. The linear kernel is considered the most important choice for text classification because the feature dimension is already high enough. There is a lot of ongoing research on estimating the kernel matrix.
SPARSE MATRIX AND SPARSE DATA ::
A sparse matrix is a matrix in which many of the values are 0. Computing over the many zeros in the matrix is time consuming and uses a lot of resources without contributing to the result. So the matrix is compressed into sparse data, which contains only the non-zero values of the sparse matrix. Sparse data is usually a two-dimensional array that contains each non-zero value together with its position in the original matrix. In this form the data is easily compressed, and the compression almost always results in significantly less storage use.
In SVM the computation slows down when the training set is stored in full, because its linear computations run over every value in the training set and many of those values are term frequencies equal to zero. A lot of time is wasted computing through these values.
SVM algorithms speed up tremendously if the data is sparse, i.e. it contains many values that are 0. The reason is that SVM computes a lot of dot products, and on sparse data these iterate only over the non-zero values. SVM can therefore work with only the sparse data during its computation, so that less memory and data storage are used and the cost is reduced.
Storing a sparse matrix
The simplest data structure for a matrix is a two-dimensional array. Each entry in the array represents an element $a_{i,j}$ of the matrix and can be accessed by the two indices i and j. For an m × n matrix, enough memory to store at least (m × n) entries is needed to represent the matrix. Substantial reductions in memory requirements can be realized by storing only the non-zero entries. This can yield huge savings in memory compared to the simple approach. Different data structures can be used depending on the number and distribution of the non-zero entries.
Formats can be divided into two groups:
• Those supporting efficient modification.
• Those supporting efficient matrix operations.
The efficient modification group includes DOK, LIL, and COO and is typically used to construct the matrix. After the matrix is constructed, it is typically converted to a format, such as CSR or CSC, which is more efficient for matrix operations.
Dictionary of keys (DOK)
DOK represents non-zero values as a dictionary mapping (row, column) tuples to values. It is a good method for constructing a sparse array, but poor for iterating over non-zero values in sorted order.
List of lists (LIL)
LIL stores one list per row, where each entry stores a column index and value. Typically, these entries
are kept sorted by column index for faster lookup.
Coordinate list (COO)
COO stores a list of (row, column, value) tuples. The entries are sorted (by row index, then by column index) to improve random access times.
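As an illustration of the coordinate list idea, the C sketch below scans a dense matrix and emits one (row, column, value) entry per non-zero element. The `CooEntry` type and `dense_to_coo` function are hypothetical names, and storing the dense input as a flat row-major array is an assumption made for brevity.

```c
#include <stddef.h>

/* One coordinate-list entry: the position of a non-zero element and its value. */
typedef struct {
    size_t row;
    size_t col;
    double value;
} CooEntry;

/* Scans a dense rows x cols matrix stored row by row and writes one entry
   per non-zero element into out. Returns the number of entries written.
   out must have room for rows * cols entries in the worst case. */
size_t dense_to_coo(const double *dense, size_t rows, size_t cols,
                    CooEntry *out)
{
    size_t count = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            double v = dense[r * cols + c];
            if (v != 0.0) {
                out[count].row = r;
                out[count].col = c;
                out[count].value = v;
                count++;
            }
        }
    }
    return count;
}
```

Because the scan goes row by row and left to right within each row, the resulting list is already sorted by row index and then by column index.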
Yale format
The Yale Sparse Matrix Format stores an initial sparse m × n matrix, M, in row form using three one-dimensional arrays. NNZ denotes the number of nonzero entries of M.
• Array A has length NNZ and holds all nonzero entries of M, in left-to-right, top-to-bottom (row by row) order.
• Array IA has length m + 1. IA(i) contains the index in A of the first nonzero element of row i. Row i of the original matrix extends from A(IA(i)) to A(IA(i+1) - 1), i.e. from the start of one row to the last index before the start of the next.
• Array JA contains the column index of each element of A and also has length NNZ.
Take the following matrix as an example and compute the three arrays for it:
[ 1 2 0 0 ]
[ 0 3 9 0 ]
[ 0 1 4 0 ]
Computing them, we get A = [ 1 2 3 9 1 4 ], IA = [ 0 2 4 6 ] and JA = [ 0 1 1 2 1 2 ].
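The construction of the three arrays can be sketched in C as follows, together with a matrix-vector product that uses them. The function names `dense_to_csr` and `csr_matvec` are hypothetical, indices are 0-based as in the example, and the dense input is assumed to be a flat row-major array.

```c
#include <stddef.h>

/* Builds the Yale arrays from a dense rows x cols matrix stored row by row.
   A and JA need room for every non-zero entry; IA needs rows + 1 slots.
   Returns NNZ, the number of non-zero entries. */
size_t dense_to_csr(const double *dense, size_t rows, size_t cols,
                    double *A, size_t *IA, size_t *JA)
{
    size_t nnz = 0;
    for (size_t r = 0; r < rows; r++) {
        IA[r] = nnz;                  /* index in A where row r starts */
        for (size_t c = 0; c < cols; c++) {
            double v = dense[r * cols + c];
            if (v != 0.0) {
                A[nnz] = v;           /* the non-zero value */
                JA[nnz] = c;          /* its column index   */
                nnz++;
            }
        }
    }
    IA[rows] = nnz;                   /* one past the end of the last row */
    return nnz;
}

/* Computes y = M * x while touching only the stored non-zero entries.
   Row r occupies positions IA[r] .. IA[r+1] - 1 of A and JA. */
void csr_matvec(const double *A, const size_t *IA, const size_t *JA,
                size_t rows, const double *x, double *y)
{
    for (size_t r = 0; r < rows; r++) {
        double sum = 0.0;
        for (size_t k = IA[r]; k < IA[r + 1]; k++)
            sum += A[k] * x[JA[k]];
        y[r] = sum;
    }
}
```

Called on the 3 × 4 example matrix above, `dense_to_csr` reproduces the A, IA and JA arrays listed there. The product in `csr_matvec` shows why this layout suits matrix operations: each row is one contiguous run of A and JA.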
ADVANTAGES AND DISADVANTAGES OF SUPPORT VECTOR MACHINE (SVM)
ADVANTAGES:
• Support Vector Machines are very effective in high-dimensional spaces.
• They are found to be very effective even when the number of dimensions is greater than the number of samples.
• Memory efficient, because the decision function uses only a subset of the training points (the support vectors).
• Versatile: different kernels can be defined for the decision function as long as they provide correct results. Depending on our requirements we can define our own kernel.
DISADVANTAGES:
• If the number of features is much greater than the number of samples, the method is likely to give poor performance.
• It is useful mainly for small training samples.
• SVMs do not directly provide probability estimates, so these must be calculated using indirect techniques.
• Non-traditional data such as strings and trees can be given as input to SVM instead of feature vectors.
• An appropriate kernel must be selected for each project according to its requirements.
DESCRIPTION OF THE EXAMPLE ::
As shown in the figure, these points lie on a one-dimensional line and cannot be separated by a linear hyperplane.
The following steps are followed:
1.) Map the points into the feature space.
2.) Use the polynomial kernel $\Phi(X_1) = (X_1, X_1^2)$ to map the points onto the two-dimensional plane.
3.) Compute the positive, negative and zero hyperplanes.
4.) Obtain the support vectors and the margin value from them.
From the value of the margin we can classify a new input data set into different classes depending on its values.
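The steps above can be worked through by hand. The derivation below is a sketch that assumes the four points x = 0, 1, 2, 3 with class labels +1, -1, -1, +1, and solves the three hyperplane equations for the points x = 0, 1 and 3.

```latex
\begin{aligned}
&\Phi(0) = (0, 0), \quad \Phi(1) = (1, 1), \quad \Phi(2) = (2, 4), \quad \Phi(3) = (3, 9) \\[4pt]
&0\,w_1 + 0\,w_2 + b = +1 \\
&1\,w_1 + 1\,w_2 + b = -1 \\
&3\,w_1 + 9\,w_2 + b = +1 \\[4pt]
&\Rightarrow \quad w_1 = -3, \qquad w_2 = 1, \qquad b = 1 \\[4pt]
&\text{Margin} = \frac{2}{\sqrt{w_1^2 + w_2^2}} = \frac{2}{\sqrt{10}} \approx 0.632
\end{aligned}
```

The remaining point checks out as well: for $\Phi(2) = (2, 4)$ we get $-3 \cdot 2 + 1 \cdot 4 + 1 = -1$, so under these assumptions all four points lie exactly on the positive or negative hyperplane and are support vectors.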
SOURCE CODE FOR 1-DIMENSIONAL LINEAR CLASSIFIER OF DATA IN SVM USING POLYNOMIAL KERNEL
#include <stdio.h>
#include <math.h>

int main(void)
{
    //each row of the data set is: class label, value(x)
    int data_set[4][2]={{1,0},{-1,1},{-1,2},{1,3}};
    int data_set_after_kernel[4][3];
    int i,j,k,l;
    float d1,d2,D,w11,w12,w1,w21,w22,w2,b1,b2,b;
    //apply the polynomial map to the data set, so that each row of the new
    //data set is: class, value(x), value(pow(x,2))
    for(i=0;i<4;i++)
    {
        for(j=0;j<3;j++)
        {
            if(j==2)
                data_set_after_kernel[i][j]=data_set[i][j-1]*data_set[i][j-1];
            else
                data_set_after_kernel[i][j]=data_set[i][j];
        }
    }
    //print the mapped data set
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<3;l++)
        {
            printf("%d\t",data_set_after_kernel[k][l]);
        }
        printf("\n");
    }
    //plot these points in the feature space; to find the
    //hyperplane we use the equation (w.x)+b=label,
    //where the label is +1, 0 or -1.
    //the three equations solved below come from rows 0, 1 and 3:
    //w1x1+w2x2+b=+1
    //w1x1+w2x2+b=-1
    //w1x1+w2x2+b=+1
    //compute the determinant D needed to solve for the 3 variables w1, w2 and b.
    d1=((data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*1)
       +(data_set_after_kernel[0][2]*data_set_after_kernel[3][1]*1)
       +(1*data_set_after_kernel[1][1]*data_set_after_kernel[3][2]));
    d2=((data_set_after_kernel[0][1]*1*data_set_after_kernel[3][2])
       +(data_set_after_kernel[0][2]*data_set_after_kernel[1][1]*1)
       +(1*data_set_after_kernel[1][2]*data_set_after_kernel[3][1]));
    D=d1-d2;
    //calculate the value of variable w1
    w11=((data_set_after_kernel[0][2]*1*(-1*data_set_after_kernel[1][0]))
        +(1*data_set_after_kernel[1][2]*(-1*data_set_after_kernel[3][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][2]));
    w12=((data_set_after_kernel[0][2]*1*(-1*data_set_after_kernel[3][0]))
        +(1*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[1][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][2]));
    w1=(w11-w12)/D;
    //calculate the value of variable w2
    w21=((data_set_after_kernel[0][1]*1*(-1*data_set_after_kernel[3][0]))
        +(1*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[1][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][1]));
    w22=((data_set_after_kernel[0][1]*1*(-1*data_set_after_kernel[1][0]))
        +(1*data_set_after_kernel[1][1]*(-1*data_set_after_kernel[3][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][1]));
    w2=(w21-w22)/D;
    //calculate the variable b in the following steps
    b1=(data_set_after_kernel[0][1]*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[1][0]))
      +(data_set_after_kernel[0][2]*data_set_after_kernel[1][1]*(-1*data_set_after_kernel[3][0]))
      +(data_set_after_kernel[1][2]*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[0][0]));
    b2=(data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*(-1*data_set_after_kernel[3][0]))
      +(data_set_after_kernel[0][2]*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[1][0]))
      +(data_set_after_kernel[1][1]*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[0][0]));
    b=(b1-b2)/D;
    printf("The value of w1 is: %f\n",w1);
    printf("The value of w2 is: %f\n",w2);
    printf("The value of b is: %f\n",b);
    //Points of the positive plane can be calculated as follows::
    //w1x1+w2x2+b=+1
    float data_set_positive[4][2];
    for(int x=0;x<4;x++)
    {
        for(int y=0;y<2;y++)
        {
            if(y==0)
                data_set_positive[x][y]=data_set_after_kernel[x][1];
            else
                data_set_positive[x][y]=(1-b-(w1*data_set_after_kernel[x][1]))/w2;
        }
    }
    //Points of the negative plane can be calculated as follows::
    //w1x1+w2x2+b=-1
    float data_set_negative[4][2];
    for(int r=0;r<4;r++)
    {
        for(int t=0;t<2;t++)
        {
            if(t==0)
                data_set_negative[r][t]=data_set_after_kernel[r][1];
            else
                data_set_negative[r][t]=(-1-b-(w1*data_set_after_kernel[r][1]))/w2;
        }
    }
    //Points of the zero plane can be calculated as follows::
    //w1x1+w2x2+b=0
    float data_set_zero[4][2];
    for(int e=0;e<4;e++)
    {
        for(int f=0;f<2;f++)
        {
            if(f==0)
                data_set_zero[e][f]=data_set_after_kernel[e][1];
            else
                data_set_zero[e][f]=(-b-(w1*data_set_after_kernel[e][1]))/w2;
        }
    }
    //print the points of the positive, negative and zero hyperplanes in turn
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_positive[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_negative[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_zero[k][l]);
        }
        printf("\n");
    }
    //calculate the margin for this data set.
    //we use the following formula for calculating the margin:
    //2/SQRT(w.w)
    float margin;
    margin=2/sqrt((pow(w1,2)+pow(w2,2)));
    printf("\n The margin for the given dataset is : %f\n",margin);
}