Machine Learning for Data Mining
Week 11: Support Vector Machines
Christof Monz

Overview
- Classification and generalization
- Support vector machines
- Lagrange multipliers
- Dual reformulation

Support Vector Machines (SVMs)
- SVMs are used for classification.
- Like perceptrons, they aim to find a hyperplane that linearly separates data points belonging to different classes.
- In addition, SVMs aim to find the hyperplane that is least likely to overfit the training data (a fitting sketch follows below).

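As a concrete illustration, here is a minimal sketch of fitting a linear SVM with scikit-learn; the library choice, the toy data, and the large C value (used to approximate the hard-margin setting of these slides) are assumptions, not part of the original material.

```python
# A minimal sketch: fitting a (near) hard-margin linear SVM with scikit-learn.
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data with classes labeled +1 and -1
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C penalizes margin violations heavily, approximating
# the hard-margin SVM these slides describe.
clf = SVC(kernel='linear', C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]  # separating hyperplane: w . x + b = 0
print('w =', w, 'b =', b)
print('support vectors:', clf.support_vectors_)
```
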
Separating Hyperplanes
- Which one is better: $B_1$ or $B_2$? Why?
- Many other separating hyperplanes are possible.

Boundaries
- Boundaries go through the points closest to the separating hyperplane.
- Bigger margins are better!

SVM Hyperplanes
- Hyperplanes with a larger margin are less likely to overfit the training data.
- Problem: find the hyperplane that
  - separates the classes
  - has the largest margin
- This is a constrained optimization problem!
- First we need to define the hyperplanes and boundaries mathematically.

Decision Boundary
- $\vec{x}_i$ is the vector of attribute values representing instance $i$ in the training data.
- $\vec{w}$ is the vector of weights, one for each attribute.
- $b$ is a real number representing the intercept.
- The decision boundary is the set of points $\vec{x}$ such that $\vec{w} \cdot \vec{x} + b = 0$.

Decision Boundary
- If two points $\vec{x}_a$ and $\vec{x}_b$ lie on the decision boundary, then $\vec{w} \cdot \vec{x}_a + b = \vec{w} \cdot \vec{x}_b + b = 0$.
- Consequently $\vec{w} \cdot (\vec{x}_b - \vec{x}_a) = 0$, where $\vec{x}_b - \vec{x}_a$ is a vector parallel to the decision boundary.
- Since $\vec{w} \cdot (\vec{x}_b - \vec{x}_a) = 0$, $\vec{w}$ must be perpendicular to the decision boundary.
- For any point $\vec{x}_s$ above the decision boundary (the circles in the example), it must hold that $\vec{w} \cdot \vec{x}_s + b > 0$, and for any point below the boundary that $\vec{w} \cdot \vec{x}_s + b < 0$; both claims are checked numerically below.

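A quick numeric check of the two claims above; the hyperplane $\vec{w} = (1, 1)$, $b = -3$ and the test points are assumed toy values.

```python
# Toy check: w is perpendicular to the boundary, and the sign of
# w . x + b tells us on which side of the boundary a point lies.
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0
x_a, x_b = np.array([0.0, 3.0]), np.array([3.0, 0.0])  # both satisfy w . x + b = 0

print(w @ (x_b - x_a))        # 0.0: w is orthogonal to the boundary direction
print(w @ [2.0, 2.0] + b)     # 1.0 > 0: this point lies above the boundary
print(w @ [0.5, 0.5] + b)     # -2.0 < 0: this point lies below the boundary
```
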
Decision Boundary
- It is convenient to represent the classes by $+1$ and $-1$ using
  $$y = \begin{cases} +1 & \text{if } \vec{w} \cdot \vec{x} + b > 0 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b < 0 \end{cases}$$
- $\vec{w}$ can be rescaled such that for all points $\vec{x}$ lying on the respective boundaries it holds that $\vec{w} \cdot \vec{x} + b = 1$ or $\vec{w} \cdot \vec{x} + b = -1$. These points are called the support vectors.
- $d$ is the distance between the two margin boundaries (the margin width), and it is defined as $d = \frac{2}{\|\vec{w}\|}$ (computed in the sketch below).

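For example (toy value assumed), the margin width follows directly from the norm of $\vec{w}$:

```python
# Margin width d = 2 / ||w|| for an assumed toy weight vector.
import numpy as np

w = np.array([1.0, 1.0])
print(2 / np.linalg.norm(w))  # 2 / sqrt(2), roughly 1.414
```
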
Learning a Linear SVM
- The task of learning a linear SVM consists of estimating the parameters $\vec{w}$ and $b$.
- The first criterion is that all points in the training data must be classified correctly:
  $$\vec{w} \cdot \vec{x}_i + b \ge 1 \text{ if } y_i = 1$$
  $$\vec{w} \cdot \vec{x}_i + b \le -1 \text{ if } y_i = -1$$
  This can be rewritten as $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for $1 \le i \le N$ (verified on toy data below).

Learning a Linear SVM
- The second SVM learning criterion is that the margin should be as large as possible.
- Since $d = 2 / \|\vec{w}\|$, maximizing the margin amounts to minimizing $f(\vec{w}) = \frac{1}{2} \|\vec{w}\|^2$, i.e. $\min_{\vec{w}} \frac{1}{2} \|\vec{w}\|^2$.
- This minimization is subject to the constraints $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for $1 \le i \le N$.
- Constrained minimization (optimization) can be solved with Lagrange multipliers; a generic constrained solver can also handle it directly, as sketched below.

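Before introducing Lagrange multipliers, it is worth noting that a general-purpose constrained optimizer can already solve this problem numerically. A sketch with scipy.optimize; the library choice and toy data are assumptions:

```python
# Solving min (1/2)||w||^2 subject to y_i (w . x_i + b) >= 1 with SLSQP.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

def objective(p):               # p packs the parameters: p = (w_1, w_2, b)
    return 0.5 * p[:2] @ p[:2]

constraints = [{'type': 'ineq',  # y_i (w . x_i + b) - 1 >= 0
                'fun': lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = res.x[:2], res.x[2]
print('w =', w, 'b =', b, 'margin =', 2 / np.linalg.norm(w))
```
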
Lagrange Multipliers
- Lagrange multipliers are used to solve optimization problems with equality and inequality constraints.
- The function of a Lagrange multiplier is to incorporate the constraints into the expression one wants to minimize or maximize.
- Given a function $f(\vec{x})$ and a number of constraints $g_i(\vec{x}) = 0$, the Lagrangian is defined as:
  $$L(\vec{x}, \lambda) = f(\vec{x}) + \sum_{i=1}^{m} \lambda_i g_i(\vec{x})$$

Lagrange Multipliers
- The Lagrangian $L(\vec{x}, \lambda) = f(\vec{x}) + \sum_{i=1}^{m} \lambda_i g_i(\vec{x})$ is solved in two steps:
  $$\frac{\partial L}{\partial x_i} = 0 \text{ for } 1 \le i \le n$$
  $$\frac{\partial L}{\partial \lambda_i} = 0 \text{ for } 1 \le i \le m$$
- Example: $f(x, y) = x + 2y$ with constraint $g(x, y) = x^2 + y^2 - 4 = 0$
  $$L(x, y, \lambda) = x + 2y + \lambda (x^2 + y^2 - 4)$$

Lagrange Multipliers
- $L(x, y, \lambda) = x + 2y + \lambda (x^2 + y^2 - 4)$
- Setting the partial derivatives to zero:
  $$\frac{\partial L}{\partial x} = 1 + 2\lambda x = 0$$
  $$\frac{\partial L}{\partial y} = 2 + 2\lambda y = 0$$
  $$\frac{\partial L}{\partial \lambda} = x^2 + y^2 - 4 = 0$$
- Solving for the respective variables we can compute the minimum of $f(x, y)$: here $\lambda = \sqrt{5}/4$ gives $x = -2/\sqrt{5}$, $y = -4/\sqrt{5}$, and the minimum $f(x, y) = -2\sqrt{5}$ (verified symbolically below).

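The same system can be solved by hand or symbolically; a sketch with sympy, whose use here is an assumption:

```python
# Solving the three stationarity equations of L(x, y, lambda) with sympy.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
L = x + 2*y + lam*(x**2 + y**2 - 4)

# Set all partial derivatives to zero and solve the resulting system.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, '  f =', sp.simplify((x + 2*y).subs(s)))
# One stationary point yields the minimum f = -2*sqrt(5),
# the other the maximum f = 2*sqrt(5).
```
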
Learning a Linear SVM
- $\min_{\vec{w}} \frac{1}{2} \|\vec{w}\|^2$ with constraints $y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1$ for $1 \le i \le N$
- The Lagrangian form of this problem is
  $$L_P = \frac{1}{2} \|\vec{w}\|^2 - \sum_{i=1}^{N} \lambda_i \left( y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right)$$
- Setting the derivatives with respect to $\vec{w}$ and $b$ to zero:
  $$\frac{\partial L_P}{\partial \vec{w}} = 0 \;\Rightarrow\; \vec{w} = \sum_{i=1}^{N} \lambda_i y_i \vec{x}_i$$
  $$\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0$$

Learning a Linear SVM
- We still need to incorporate the Lagrange multiplier constraints:
  $$\lambda_i \ge 0$$
  $$\lambda_i \left( y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right) = 0$$
- For all instances where $y_i (\vec{w} \cdot \vec{x}_i + b) \ne 1$ it must be the case that $\lambda_i = 0$.
- Solving for $\vec{w}$, $b$, and all $\lambda_i$ at once is still a complicated task.
- Solution: first solve for the Lagrange multipliers only (the dual problem).

The Dual Problem
- $$L_P = \frac{1}{2} \|\vec{w}\|^2 - \sum_{i=1}^{N} \lambda_i \left( y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right)$$
- Insert $\vec{w} = \sum_{i=1}^{N} \lambda_i y_i \vec{x}_i$ and $\sum_{i=1}^{N} \lambda_i y_i = 0$ into the equation above.
- Note that $\|\vec{w}\|^2 = \vec{w}^T \vec{w}$.

The Dual Problem
$$\frac{1}{2} \left( \sum_{i=1}^{N} \lambda_i y_i \vec{x}_i \right)^T \left( \sum_{j=1}^{N} \lambda_j y_j \vec{x}_j \right) - \sum_{i=1}^{N} \lambda_i \left( y_i \left( \sum_{j=1}^{N} \lambda_j y_j \vec{x}_j \cdot \vec{x}_i + b \right) - 1 \right)$$

$$= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_i \cdot \vec{x}_j - \sum_{i=1}^{N} \lambda_i y_i \left( \sum_{j=1}^{N} \lambda_j y_j \vec{x}_j \cdot \vec{x}_i + b \right) + \sum_{i=1}^{N} \lambda_i$$

$$= \frac{1}{2} \sum_{i,j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_i \cdot \vec{x}_j - \left( \sum_{i,j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_j \cdot \vec{x}_i + b \sum_{i=1}^{N} \lambda_i y_i \right) + \sum_{i=1}^{N} \lambda_i$$

$$= \frac{1}{2} \sum_{i,j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_i \cdot \vec{x}_j - \sum_{i,j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_j \cdot \vec{x}_i + \sum_{i=1}^{N} \lambda_i \quad \text{(since } \sum_{i=1}^{N} \lambda_i y_i = 0\text{)}$$

$$= -\frac{1}{2} \sum_{i,j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_i \cdot \vec{x}_j + \sum_{i=1}^{N} \lambda_i$$

The Dual Problem
- $$\vec{\lambda}' = \arg\max_{\vec{\lambda}} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j \vec{x}_i \cdot \vec{x}_j + \sum_{i=1}^{N} \lambda_i$$
  with constraints $\lambda_i \ge 0$ and $\sum_{i=1}^{N} \lambda_i y_i = 0$
- This problem can be solved with quadratic programming (see the sketch below).
- Once the $\lambda_i$ values are determined we can compute $\vec{w}$ and $b$:
  - $\vec{w} = \sum_{i=1}^{N} \lambda_i y_i \vec{x}_i$
  - $b$ is obtained by inserting a support vector into $\lambda_i \left( y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right) = 0$ and solving for $b$

Recap
- SVMs aim to strike a balance between correctness and generalization.
- Decision boundaries
- Margins
- Support vectors
- Constrained optimization
- Lagrange multipliers
- Dual reformulation