SUPPORT VECTOR MACHINE

15 Oct 2013

CS267 TOPICS IN DATABASE SYSTEMS SUPPORT VECTOR MACHINE PARIN SHAH

PROF. T.Y. LIN

(SVM)

007332832


SUPPORT VECTOR MACHINE

INTRODUCTION ::

The number of documents on the World Wide Web is growing rapidly. Classifying each of these documents by hand is not feasible, and neither is manually maintaining a structure over such a huge collection, so we shall discuss a few methods of organizing the data into a proper structure. We shall also look into the details of classifying new data into the existing categories.

Support vector machines (SVM) are a set of related supervised learning methods that can be used for:

Text classification.
Analyzing data.
Recognizing patterns.
Regression analysis.
Bio-informatics.
Signature/handwriting recognition.
E-mail spam categorization.

Supervised learning is the machine learning task of deducing a category from supervised training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and then predicts the correct output category for a given input. For example, a teacher teaches a student to identify apples and oranges by describing some of their features. The next time the student sees an apple or an orange, he can easily classify the object based on what he learned from the teacher; this is called supervised learning. He can identify the object only if it is an apple or an orange; if the given object were a grape, the student could not identify it.

A sparse matrix is a matrix in which many of the values are 0. Computing over the many zeros in the matrix is time consuming and uses a lot of resources without contributing to the result. So the matrix is compressed into sparse data, which contains only the non-zero values of the sparse matrix. It is usually a 2-dimensional array that holds each non-zero value together with its position in the original matrix. Stored as sparse data, the matrix is easily compressed, and this compression almost always results in significantly less storage usage.

In my project I have used the Support Vector Machine (SVM) for text classification. Here a new set of input data is classified into the given categories. SVM is not used for clustering the data into new categories; it classifies data into existing ones.


UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR LINEARLY SEPARABLE DATA

Consider each document to be a single dot in the figure, with dots of different colors belonging to different categories. Here we have documents of two categories, and we have to find the boundary separating them.

The margin of a linear classifier is the width by which the boundary can be widened before it hits a data point of either category. The safest line to pick is the one with the largest margin between the two data sets. The data points that lie on the margin are known as support vectors.

The next step is to find the hyperplane that best separates the two categories. SVM does this by taking a set of points and separating them using mathematical formulas. From that we can find the positive and negative hyperplanes. The formulas for the hyperplanes are:

$(w \cdot x) + b = +1$ (positive labels)

$(w \cdot x) + b = -1$ (negative labels)

$(w \cdot x) + b = 0$ (hyperplane)

From the equations above, using linear algebra, we can find the values of $w$ and $b$. Thus we have a model that contains the solution for $w$ and $b$, with margin $2/\sqrt{w \cdot w}$. The margin is calculated as follows:

Margin $= 2 / \sqrt{w \cdot w}$

In SVM, this model is used to classify new data. With the above solution and the calculated margin value, new data can be classified into a category. The following figure demonstrates the margin and support vectors for linearly separable data.


The maximum margin and the support vectors for the given data sets are shown in the figure.

UNDERSTANDING SUPPORT VECTOR MACHINE (SVM) FOR NON-LINEARLY SEPARABLE DATA

In the non-linearly separable case, the data lie in an input space where they cannot be separated with a linear hyperplane. To separate the data linearly, we have to map the points to a feature space using a kernel method. After the data are separated in the feature space, we can map the points back to the input space, where the boundary appears as a curved hyperplane. The following figure demonstrates the data flow of SVM.


In reality, you will find that most data sets are not so simple and well behaved. There will be some points that are on the wrong side of the class boundary, points that are far off from their classes, or points that are mixed together in a spiral or checkered pattern. Researchers have looked into these problems. To handle the few points that fall in the wrong class, SVM minimizes the following equation to create what is called a soft-margin hyperplane.

A higher value of C penalizes misclassified points more heavily and so narrows the margin, whereas a lower value of C tolerates more of them and widens the margin.
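To make the role of C concrete, the sketch below evaluates the commonly used soft-margin objective, $\frac{1}{2}(w \cdot w) + C \sum_i \max(0,\ 1 - y_i((w \cdot x_i) + b))$, for 2-dimensional points. This standard form is an assumption here, and the function names `slack` and `soft_margin_objective` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Slack of one 2-D point: max(0, 1 - y * ((w.x) + b)).
   It is zero when the point is on the correct side, outside the margin. */
double slack(const double w[2], double b, const double x[2], int y)
{
    double m = 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b);
    return m > 0.0 ? m : 0.0;
}

/* Assumed soft-margin objective: 0.5 * (w.w) + C * (sum of slacks). */
double soft_margin_objective(const double w[2], double b, double C,
                             const double x[][2], const int *y, size_t n)
{
    assert(C >= 0.0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += slack(w, b, x[i], y[i]);
    return 0.5 * (w[0] * w[0] + w[1] * w[1]) + C * total;
}
```

With a large C the slack term dominates, so the minimizer prefers a boundary with few violations even if the margin shrinks; with a small C the first term dominates and a wider margin is preferred.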

TYPES OF KERNEL ::

Computing with points in the feature space can be very costly because the feature space can be very high-dimensional, in some cases infinite-dimensional. The kernel function is used to reduce this cost. The reason is that the data points appear only in dot products, and the kernel function is able to compute the inner products of these points directly. So there is no need to map the points explicitly into the feature space. By using the kernel function we can work with the data points through their inner products and find the equivalent points on the hyperplane.
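The claim that no explicit mapping is needed can be checked on a small case. The sketch below uses the kernel $K(x, y) = (x \cdot y)^2$ on 2-dimensional points, for which one explicit feature map is $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. Both the kernel and the map are chosen only for illustration and are not taken from the project; the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Kernel evaluated directly in the 2-D input space: K(x, y) = (x.y)^2. */
double kernel_quadratic(const double x[2], const double y[2])
{
    double d = x[0] * y[0] + x[1] * y[1];
    return d * d;
}

/* Explicit feature map into 3-D: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). */
void feature_map(const double x[2], double out[3])
{
    out[0] = x[0] * x[0];
    out[1] = sqrt(2.0) * x[0] * x[1];
    out[2] = x[1] * x[1];
}

/* The same inner product computed the long way: map both points first. */
double mapped_inner_product(const double x[2], const double y[2])
{
    double px[3], py[3];
    feature_map(x, px);
    feature_map(y, py);
    return px[0] * py[0] + px[1] * py[1] + px[2] * py[2];
}
```

For x = (1, 2) and y = (3, 4), both routes give 121, but the kernel needs one 2-D dot product while the explicit route builds two 3-D vectors first.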

Kernel functions for SVM are still a research topic. No kernel has been found that is universal for all kinds of data. Anybody can develop their own kernel depending upon their requirements.

The following are some basic types of kernel:

1.) Polynomial kernel with degree d.

2.) Radial basis function kernel with width s. Closely related to the radial basis function of neural networks.

3.) Sigmoid with parameters k and q.

4.) Linear kernel: K(x, y) = x' * y

STRENGTH OF KERNELS ::

Kernels are the trickiest and most important part of using SVM, because the kernel creates the kernel matrix, which summarizes all the data.

In practice, a low-degree polynomial kernel or a radial basis function kernel with a reasonable width is a good initial try for most applications.

The linear kernel is considered the most important choice for text classification because the feature dimension is already high enough.

There is much ongoing research on estimating the kernel matrix.

SPARSE MATRIX AND SPARSE DATA ::

A sparse matrix is a matrix in which many of the values are 0. Computing over the many zeros in the matrix is time consuming and uses a lot of resources without contributing to the result. So the matrix is compressed into sparse data, which contains only the non-zero values of the sparse matrix. It is usually a 2-dimensional array that holds each non-zero value together with its position in the original matrix. Stored as sparse data, the matrix is easily compressed, and this compression almost always results in significantly less storage usage.

In SVM, the speed of computation decreases because it involves linear regression over a training set that contains many entries whose term frequency value is zero. So a lot of time is wasted computing over these values.


SVM algorithms speed up tremendously if the data is sparse, i.e. it contains many values that are 0. The reason is that SVM computes a lot of dot products, and with sparse data these iterate only over the non-zero values. So SVM can use only the sparse data during its computation, which uses less memory and storage and therefore reduces the cost.
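A minimal sketch of such a dot product is shown below: the sparse vector is stored as (index, value) pairs, so the loop runs once per non-zero entry instead of once per dimension. The struct and function names are illustrative and are not part of the project code.

```c
#include <assert.h>
#include <stddef.h>

/* One stored entry of a sparse vector: its position and non-zero value. */
struct sparse_entry {
    size_t index;
    double value;
};

/* Dot product of a sparse vector (nnz stored entries) with a dense
   vector of length n. Only the nnz stored entries are visited. */
double sparse_dense_dot(const struct sparse_entry *sv, size_t nnz,
                        const double *dense, size_t n)
{
    double sum = 0.0;
    for (size_t k = 0; k < nnz; k++) {
        assert(sv[k].index < n);   /* stored index must be in range */
        sum += sv[k].value * dense[sv[k].index];
    }
    return sum;
}
```

For a document vector with thousands of term slots but only a few dozen non-zero term frequencies, the loop length drops from thousands to dozens.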

Storing a sparse matrix

The simplest data structure used for a matrix is a two-dimensional array. Each entry in the array represents an element $a_{i,j}$ of the matrix and can be accessed by the two indices $i$ and $j$. For an $m \times n$ matrix, enough memory to store at least $(m \times n)$ entries is needed to represent the matrix.

Substantial reductions in memory requirements can be realized by storing only the non-zero entries. This can yield huge savings in memory compared to the simple approach. Different data structures can be used depending on the number and distribution of the non-zero entries.

Formats can be divided into two groups:

Those supporting efficient modification.
Those supporting efficient matrix operations.

The efficient modification group includes DOK, LIL, and COO and is typically used to construct the matrix. After the matrix is constructed, it is typically converted to a format such as CSR or CSC, which is more efficient for matrix operations.

Dictionary of keys (DOK)

DOK represents non-zero values as a dictionary mapping (row, column) tuples to values. It is a good method for constructing a sparse array, but poor for iterating over non-zero values in sorted order.

List of lists (LIL)

LIL stores one list per row, where each entry stores a column index and value. Typically, these entries
are kept sorted by column index for faster lookup.

Coordinate list (COO)

COO stores a list of (row, column, value) tuples. The entries are kept sorted (by row index, then by column index) to improve random access times.
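The effect of sorting can be sketched with the C standard library: once the tuples are ordered by row and then column, a single entry can be found by binary search instead of a linear scan. The struct and function names below are illustrative assumptions; only `qsort` and `bsearch` are standard.

```c
#include <assert.h>
#include <stdlib.h>

/* One COO entry: (row, column, value). */
struct coo_entry {
    int row;
    int col;
    double value;
};

/* Orders entries by row index, then by column index. */
static int coo_compare(const void *a, const void *b)
{
    const struct coo_entry *p = a;
    const struct coo_entry *q = b;
    if (p->row != q->row)
        return (p->row > q->row) - (p->row < q->row);
    return (p->col > q->col) - (p->col < q->col);
}

/* Sorts the entry list in place. */
void coo_sort(struct coo_entry *entries, size_t count)
{
    qsort(entries, count, sizeof entries[0], coo_compare);
}

/* Looks up element (row, col) in a sorted list by binary search.
   Entries that are not stored are zero. */
double coo_get(const struct coo_entry *entries, size_t count, int row, int col)
{
    struct coo_entry key = { row, col, 0.0 };
    const struct coo_entry *hit =
        bsearch(&key, entries, count, sizeof entries[0], coo_compare);
    return hit ? hit->value : 0.0;
}
```

Appending new tuples stays cheap because the list only has to be sorted once, after construction is finished.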

Yale format

The Yale Sparse Matrix Format stores an initial sparse $m \times n$ matrix, $M$, in row form using three one-dimensional arrays. Let NNZ denote the number of non-zero entries of $M$.


Array A has length NNZ and holds all non-zero entries of M in left-to-right, top-to-bottom order.

Array IA has length m + 1. IA(i) contains the index in A of the first non-zero element of row i. Row i of the original matrix extends from A(IA(i)) to A(IA(i+1)-1), i.e. from the start of one row to the last index before the start of the next.

Array JA contains the column index of each element of A and has length NNZ.

Take the following matrix as an example and compute the corresponding arrays.

[ 1 2 0 0 ]

[ 0 3 9 0 ]

[ 0 1 4 0 ]

Computing it, we get the values A = [ 1 2 3 9 1 4 ], IA = [ 0 2 4 6 ] and JA = [ 0 1 1 2 1 2 ].
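The construction of A, IA and JA can be sketched in C as below, together with a matrix-vector product that walks each row from IA(i) to IA(i+1)-1. The function names `dense_to_csr` and `csr_multiply` are illustrative, and the sketch assumes an integer matrix stored row by row in a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Fills A, IA and JA from a dense row-major matrix and returns NNZ.
   A and JA need room for every non-zero entry; IA needs rows + 1 slots. */
size_t dense_to_csr(const int *dense, size_t rows, size_t cols,
                    int *A, size_t *IA, size_t *JA)
{
    assert(dense && A && IA && JA);
    size_t nnz = 0;
    for (size_t i = 0; i < rows; i++) {
        IA[i] = nnz;                    /* index in A where row i starts */
        for (size_t j = 0; j < cols; j++) {
            int v = dense[i * cols + j];
            if (v != 0) {
                A[nnz] = v;             /* the non-zero value */
                JA[nnz] = j;            /* its column index */
                nnz++;
            }
        }
    }
    IA[rows] = nnz;                     /* one past the last row */
    return nnz;
}

/* y = M * x using only the stored entries of M. */
void csr_multiply(const int *A, const size_t *IA, const size_t *JA,
                  size_t rows, const int *x, int *y)
{
    for (size_t i = 0; i < rows; i++) {
        y[i] = 0;
        for (size_t k = IA[i]; k < IA[i + 1]; k++)
            y[i] += A[k] * x[JA[k]];
    }
}
```

Applied to the 3 x 4 example matrix, `dense_to_csr` returns 6 and fills the three arrays with the values listed above.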

SUPPORT VECTOR MACHINE (SVM)

Support Vector Machines are very effective in high-dimensional spaces.

They remain effective even when the number of dimensions is greater than the number of samples.

Memory efficient, because they use a subset of the training points (the support vectors) as the deciding factors for classification.

Versatile: different kernels can be defined for different decision functions, as long as they provide correct results. Depending upon our requirements we can define our own kernel.

:

If the number of features is much greater than the number of samples, the method is likely to give poor performance.

It is useful for small training samples.

SVMs do not directly provide probability estimates, so these must be calculated using indirect techniques.

We can use non-traditional data such as strings and trees as input to SVM instead of feature vectors.

Users should select an appropriate kernel for their project according to its requirements.


DESCRIPTION OF THE EXAMPLE ::

As shown in the figure, these points lie on a 1-dimensional line and cannot be separated by a linear hyperplane. The following steps are followed:

1.) Map into feature space.

2.) Use the polynomial kernel $\Phi(X_1) = (X_1, X_1^2)$ to map the points onto the two-dimensional plane.

3.) Compute the positive, negative and zero hyperplanes.

4.) We get the support vectors and the margin value from it.

From the value of the margin we can classify a new input data set into different classes depending upon their values.
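As a worked sketch, take the four sample points used in the project's program: x = 0 and x = 3 with label +1, and x = 1 and x = 2 with label -1. Solving $w_1 x + w_2 x^2 + b = \text{label}$ by hand for the points x = 0, 1 and 3 gives $w_1 = -3$, $w_2 = 1$, $b = 1$, and the point x = 2 then also satisfies the -1 equation. The function names below are illustrative only.

```c
#include <assert.h>

/* Decision value for a 1-D input x after the mapping (x, x^2):
   f(x) = w1*x + w2*x^2 + b, with the constants solved by hand
   from the sample points x = 0 (+1), x = 1 (-1) and x = 3 (+1). */
double decision_value(double x)
{
    const double w1 = -3.0, w2 = 1.0, b = 1.0;
    return w1 * x + w2 * x * x + b;
}

/* Returns +1 or -1 for the predicted class, 0 exactly on the hyperplane. */
int classify(double x)
{
    double f = decision_value(x);
    return (f > 0.0) - (f < 0.0);
}
```

A new point such as x = 1.5 gives f = -1.25 and lands in the -1 class, while x = 4 gives f = 5 and lands in the +1 class. The margin in the feature space is $2/\sqrt{(-3)^2 + 1^2} = 2/\sqrt{10}$.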


SOURCE CODE FOR 1-DIMENSIONAL LINEAR CLASSIFIER OF DATA IN SVM USING POLYNOMIAL KERNEL

#include <stdio.h>
#include <math.h>

int main(void)
{
    //each row of the data set is {class label, x}
    int data_set[4][2]={{1,0},{-1,1},{-1,2},{1,3}};
    int data_set_after_kernel[4][3];
    int i,j,k,l;
    float d1,d2,D,w11,w12,w1,w21,w22,w2,b1,b2,b;

    //to calculate the dataset with the polynomial kernel so getting the new
    //data set as class value(x) value(pow(x,2))
    for(i=0;i<4;i++)
    {
        for(j=0;j<3;j++)
        {
            if(j==2)
                data_set_after_kernel[i][j]=data_set[i][j-1]*data_set[i][j-1];
            else
                data_set_after_kernel[i][j]=data_set[i][j];
        }
    }

    //print the mapped data set, one point per row
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<3;l++)
        {
            printf("%d\t",data_set_after_kernel[k][l]);
        }
        printf("\n");
    }

    //plot these points on the feature space and now finding the
    //hyperplane we will use the equation as (w.x)+b=labels
    //here we have labels +1,0,-1.
    //w1x1+w2x2+b=+1
    //w1x1+w2x2+b=+1
    //w1x1+w2x2+b=-1
    //compute the value of D to compute the value of the 3 variables w1,w2 and b.
    d1=((data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*1)
       +(data_set_after_kernel[0][2]*data_set_after_kernel[3][1]*1)
       +(1*data_set_after_kernel[1][1]*data_set_after_kernel[3][2]));
    d2=((data_set_after_kernel[0][1]*1*data_set_after_kernel[3][2])
       +(data_set_after_kernel[0][2]*data_set_after_kernel[1][1]*1)
       +(1*data_set_after_kernel[1][2]*data_set_after_kernel[3][1]));
    D=d1-d2;

    //calculate the value of variable w1
    w11=((data_set_after_kernel[0][2]*1*(-1*data_set_after_kernel[1][0]))
        +(1*data_set_after_kernel[1][2]*(-1*data_set_after_kernel[3][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][2]));
    w12=((data_set_after_kernel[0][2]*1*(-1*data_set_after_kernel[3][0]))
        +(1*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[1][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][2]));
    w1=(w11-w12)/D;

    //calculate the value of variable w2
    w21=((data_set_after_kernel[0][1]*1*(-1*data_set_after_kernel[3][0]))
        +(1*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[1][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[1][1]));
    w22=((data_set_after_kernel[0][1]*1*(-1*data_set_after_kernel[1][0]))
        +(1*data_set_after_kernel[1][1]*(-1*data_set_after_kernel[3][0]))
        +((-1*data_set_after_kernel[0][0])*1*data_set_after_kernel[3][1]));
    w2=(w21-w22)/D;

    //calculate the variable b in the following steps
    b1=(data_set_after_kernel[0][1]*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[1][0]))
      +(data_set_after_kernel[0][2]*data_set_after_kernel[1][1]*(-1*data_set_after_kernel[3][0]))
      +(data_set_after_kernel[1][2]*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[0][0]));
    b2=(data_set_after_kernel[0][1]*data_set_after_kernel[1][2]*(-1*data_set_after_kernel[3][0]))
      +(data_set_after_kernel[0][2]*data_set_after_kernel[3][1]*(-1*data_set_after_kernel[1][0]))
      +(data_set_after_kernel[1][1]*data_set_after_kernel[3][2]*(-1*data_set_after_kernel[0][0]));
    b=(b1-b2)/D;

    printf("The value of w1 is: %f\n",w1);
    printf("The value of w2 is: %f\n",w2);
    printf("The value of b is: %f\n",b);

    //Points of the positive plane can be calculated as follows::
    //w1x1+w2x2+b=+1
    float data_set_positive[4][2];
    for(int x=0;x<4;x++)
    {
        for(int y=0;y<2;y++)
        {
            if(y==0)
                data_set_positive[x][y]=data_set_after_kernel[x][1];
            else
                data_set_positive[x][y]=(1-b-(w1*data_set_after_kernel[x][1]))/w2;
        }
    }

    //Points of the negative plane can be calculated as follows::
    //w1x1+w2x2+b=-1
    float data_set_negative[4][2];
    for(int r=0;r<4;r++)
    {
        for(int t=0;t<2;t++)
        {
            if(t==0)
                data_set_negative[r][t]=data_set_after_kernel[r][1];
            else
                data_set_negative[r][t]=(-1-b-(w1*data_set_after_kernel[r][1]))/w2;
        }
    }

    //Points of the zero plane can be calculated as follows::
    //w1x1+w2x2+b=0
    float data_set_zero[4][2];
    for(int e=0;e<4;e++)
    {
        for(int f=0;f<2;f++)
        {
            if(f==0)
                data_set_zero[e][f]=data_set_after_kernel[e][1];
            else
                data_set_zero[e][f]=(-b-(w1*data_set_after_kernel[e][1]))/w2;
        }
    }

    //printing the hyperplane points as follows::
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_positive[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_negative[k][l]);
        }
        printf("\n");
    }
    printf("\n");
    for(k=0;k<4;k++)
    {
        for(l=0;l<2;l++)
        {
            printf("%f\t",data_set_zero[k][l]);
        }
        printf("\n");
    }

    //calculating the margin for this dataset we get the following.
    //we will use the following formula for calculating the margin.
    //2/SQRT(w.w)
    float margin;
    margin=2/sqrt((pow(w1,2)+pow(w2,2)));
    printf("\n The margin for the given dataset is : %f",margin);
}
