Lecture 16: Multiclass Support Vector Machines
Hao Helen Zhang
Spring, 2013
Outline
- Traditional Methods for Multiclass Problems
  - One-vs-rest approaches
  - Pairwise approaches
- Recent Developments for Multiclass Problems
  - Simultaneous classification
  - Various loss functions
- Extensions of SVM
Traditional Methods for Multiclass Problems

Multiclass Classification Setup
- Label: the binary label set {-1, +1} becomes {1, 2, ..., K}.
- Classification decision rule:
      f: R^d → {1, 2, ..., K}.
- Classification accuracy is measured by
  - equal costs: the Generalization Error (GE),
        Err(f) = P(Y ≠ f(X));
  - unequal costs: the risk,
        R(f) = E_{Y,X} C(Y, f(X)).
Traditional Methods
Main ideas:
(i) Decompose the multiclass classification problem into multiple binary classification problems.
(ii) Use the majority voting principle (a combined decision from the committee) to predict the label.
Common approaches, simple but effective:
- One-vs-rest (one-vs-all) approaches
- Pairwise (one-vs-one, all-vs-all) approaches
One-vs-rest Approach
One of the simplest multiclass classifiers; commonly used with SVMs; also known as the one-vs-all (OVA) approach.
(i) Solve K different binary problems: classify "class k" versus "the rest" for k = 1, ..., K.
(ii) Assign a test sample to the class giving the largest (most positive) value f_k(x), where f_k(x) is the solution of the kth problem.
Properties:
- Very simple to implement; performs well in practice.
- Not optimal (asymptotically): the decision rule is not Fisher consistent if there is no dominating class (i.e., max_k p_k(x) < 1/2).
Read: Rifkin and Klautau (2004), "In Defense of One-vs-all Classification".
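Steps (i) and (ii) above can be sketched in a few lines of numpy. Neither the solver nor the data come from the lecture: a subgradient-descent hinge-loss trainer stands in for a real binary SVM solver, and the blobs are made up.

```python
import numpy as np

def train_binary_hinge(X, y, lam=0.1, lr=0.01, epochs=200):
    # Stand-in for a binary SVM solver: subgradient descent on the
    # averaged hinge loss plus an L2 penalty; labels y must be +1/-1.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                     # margin violators
        gw = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def ova_fit(X, y, K):
    # Step (i): K binary problems, "class k" versus "the rest"
    return [train_binary_hinge(X, np.where(y == k, 1.0, -1.0))
            for k in range(K)]

def ova_predict(models, X):
    # Step (ii): pick the class with the largest (most positive) f_k(x)
    scores = np.column_stack([X @ w + b for w, b in models])
    return scores.argmax(axis=1)

# Toy data: three well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)
acc = (ova_predict(ova_fit(X, y, K=3), X) == y).mean()
```

Any binary classifier that outputs a real-valued score can be dropped in place of `train_binary_hinge`.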
Pairwise Approach
Also known as the all-vs-all (AVA) approach.
(i) Solve K(K-1)/2 different binary problems: classify "class k" versus "class j" for all j ≠ k. Each classifier is called g_{jk}.
(ii) For prediction at a point, each classifier is queried once and issues a vote. The class with the maximum number of (weighted) votes is the winner.
Properties:
- The training process is efficient, since it deals with small binary problems.
- If K is big, there are many problems to solve: for K = 10 we already need to train 45 binary classifiers.
- Simple to implement; performs competitively in practice.
Read: Park and Fürnkranz (2007), "Efficient Pairwise Classification".
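The voting scheme can be written down directly. In this sketch (not from the lecture), a nearest-centroid rule stands in for the per-pair binary SVM, and the data are hypothetical Gaussian blobs:

```python
import numpy as np
from itertools import combinations

def centroid_trainer(X, y):
    # Stand-in for a binary SVM: midpoint hyperplane between class means
    m_pos, m_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    w = m_pos - m_neg
    return w, -w @ (m_pos + m_neg) / 2

def pairwise_fit(X, y, K, train_binary):
    # (i) one binary problem per unordered pair {j, k}: K(K-1)/2 in total
    models = {}
    for j, k in combinations(range(K), 2):
        mask = (y == j) | (y == k)
        models[(j, k)] = train_binary(X[mask], np.where(y[mask] == j, 1, -1))
    return models

def pairwise_predict(models, X, K):
    # (ii) every classifier votes once; the class with most votes wins
    votes = np.zeros((X.shape[0], K))
    for (j, k), (w, b) in models.items():
        says_j = (X @ w + b) > 0
        votes[says_j, j] += 1
        votes[~says_j, k] += 1
    return votes.argmax(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)
models = pairwise_fit(X, y, K=3, train_binary=centroid_trainer)
acc = (pairwise_predict(models, X, K=3) == y).mean()
```

With K = 3 the loop builds 3 models; with K = 10 it would build the 45 classifiers mentioned above.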
One Single SVM Approach: Simultaneous Classification
- Label: the binary label set {-1, +1} becomes {1, 2, ..., K}.
- Use one single SVM to construct a decision function vector
      f = (f_1, ..., f_K).
- Classifier (decision rule):
      f(x) = argmax_{k=1,...,K} f_k(x).
- If K = 2, there is one f_k and the decision rule is sign(f_k).
- In some sense, multiple logistic regression is a simultaneous classification procedure.
SVM for Multiclass Problems
Multiclass SVMs solve one single regularization problem by imposing a penalty on the values of f_y(x) − f_l(x):
- Weston and Watkins (1999)
- Crammer and Singer (2002)
- Lee et al. (2004)
- Liu and Shen (2006); multiclass ψ-learning: Shen et al. (2003)
Various Multiclass SVMs
Weston and Watkins (1999): a penalty is imposed only if f_y(x) < f_k(x) + 2 for some k ≠ y.
- Even if f_y(x) < 1, no penalty is imposed as long as f_k(x) is sufficiently small for k ≠ y;
- similarly, if f_k(x) > 1 for some k ≠ y, we pay no penalty as long as f_y(x) is sufficiently large.
      L(y, f(x)) = Σ_{k≠y} [2 − (f_y(x) − f_k(x))]_+.
Lee et al. (2004):
      L(y, f(x)) = Σ_{k≠y} [f_k(x) + 1]_+.
Crammer and Singer (2002); Liu and Shen (2006):
      L(y, f(x)) = [1 − min_{k≠y} {f_y(x) − f_k(x)}]_+.
To avoid redundancy, the sum-to-zero constraint Σ_{k=1}^K f_k = 0 is sometimes enforced.
Linear Multiclass SVMs
For linear classification problems, we have
      f_k(x) = β_k^T x + β_{0k},  k = 1, ..., K.
The sum-to-zero constraint can be replaced by
      Σ_{k=1}^K β_{0k} = 0,  Σ_{k=1}^K β_k = 0.
The optimization problem becomes
      min_f Σ_{i=1}^n L(y_i, f(x_i)) + λ Σ_{k=1}^K ||β_k||²,
subject to the sum-to-zero constraint.
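A minimal numpy sketch of this optimization (illustrative, not code from the lecture): the Lee et al. (2004) loss, a plain projected subgradient method, and centering after each step to maintain the sum-to-zero constraint. The step size, λ, and data are ad hoc choices.

```python
import numpy as np

def linear_msvm_fit(X, y, K, lam=0.01, lr=0.05, epochs=300):
    # Minimize sum_i L(y_i, f(x_i)) + lam * sum_k ||beta_k||^2 with the
    # Lee et al. loss L(y, f(x)) = sum_{k != y} [f_k(x) + 1]_+, projecting
    # (beta_k, beta_0k) onto the sum-to-zero constraint after each step.
    n, d = X.shape
    B, b0 = np.zeros((K, d)), np.zeros(K)
    for _ in range(epochs):
        F = X @ B.T + b0                    # n x K matrix, F[i, k] = f_k(x_i)
        active = F + 1 > 0                  # hinge terms that are "on"
        active[np.arange(n), y] = False     # the true class is not penalized
        B -= lr * (2 * lam * B + active.T.astype(float) @ X / n)
        b0 -= lr * active.sum(axis=0) / n
        B -= B.mean(axis=0)                 # sum-to-zero projection
        b0 -= b0.mean()
    return B, b0

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)
B, b0 = linear_msvm_fit(X, y, K=3)
acc = ((X @ B.T + b0).argmax(axis=1) == y).mean()
```

The centering step is an exact Euclidean projection onto the linear constraint set, so every iterate satisfies Σ_k β_k = 0 and Σ_k β_{0k} = 0.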
Nonlinear Multiclass SVMs
To achieve nonlinear classification, we assume
      f_k(x) = β_k^T Φ(x) + β_{k0},  k = 1, ..., K,
where Φ(x) represents the basis functions of the feature space F.
As in binary classification, the nonlinear MSVM can be conveniently solved using a kernel function.
Regularization Problems for Nonlinear MSVMs
We can represent the MSVM as the solution to a regularization problem in an RKHS. Assume that
      f(x) = (f_1(x), ..., f_K(x)) ∈ Π_{k=1}^K ({1} + H_k)
under the sum-to-zero constraint. Then an MSVM classifier can be derived by solving
      min_f Σ_{i=1}^n L(y_i, f(x_i)) + λ Σ_{k=1}^K ||g_k||²_{H_k},
where f_k(x) = g_k(x) + β_{0k}, with g_k ∈ H_k and β_{0k} ∈ ℝ.
Generalized Functional Margin
Given (x, y), a reasonable decision vector f(x) should
- encourage a large value of f_y(x);
- have small values of f_k(x), k ≠ y.
Define the (K−1)-vector of relative differences as
      g = (f_y(x) − f_1(x), ..., f_y(x) − f_{y−1}(x), f_y(x) − f_{y+1}(x), ..., f_y(x) − f_K(x)).
Liu et al. (2004) called the vector g the generalized functional margin of f.
- g characterizes the correctness and strength of the classification of x by f.
- f indicates a correct classification of (x, y) if g(f(x), y) > 0_{K−1}.
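For a concrete (made-up) K = 3 example, g is just the vector of differences f_y − f_k for k ≠ y:

```python
import numpy as np

def functional_margin(f, y):
    # g = (f_y - f_k for all k != y), keeping the class order
    f = np.asarray(f, dtype=float)
    return np.delete(f[y] - f, y)

f = np.array([0.2, 1.5, -0.8])     # (f_1, f_2, f_3), 0-indexed below
g = functional_margin(f, y=1)      # the true class is the second one
correct = bool(np.all(g > 0))      # g > 0_{K-1}: correct classification
# g = (1.5 - 0.2, 1.5 - (-0.8)) = (1.3, 2.3), so the classification is correct
```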
0-1 Loss with the Functional Margin
A point (x, y) is misclassified if y ≠ argmax_k f_k(x).
Define the multivariate sign function for u = (u_1, ..., u_m) as
      sign(u) = 1 if u_min = min(u_1, ..., u_m) > 0;  −1 if u_min ≤ 0.
Using the functional margin:
- the 0-1 loss becomes
      I(min g(f(x), y) < 0) = (1/2)[1 − sign(g(f(x), y))];
- the GE becomes
      R(f) = (1/2) E[1 − sign(g(f(X), Y))].
Generalized Loss Functions Using the Functional Margin
A natural way to generalize the binary loss is
      Σ_{i=1}^n ℓ(min g(f(x_i), y_i)).
In particular, the loss function L(y, f(x)) can be expressed as V(g(f(x), y)) with
- Weston and Watkins (1999): V(u) = Σ_{j=1}^{K−1} [2 − u_j]_+;
- Lee et al. (2004): V(u) = Σ_{j=1}^{K−1} [(Σ_{c=1}^{K−1} u_c)/K − u_j + 1]_+;
- Liu and Shen (2006): V(u) = [1 − min_j u_j]_+.
All of these loss functions are upper bounds of the 0-1 loss.
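The upper-bound claim is easy to spot-check numerically. In this sketch, random decision vectors f are drawn under the sum-to-zero constraint (which the Lee et al. form relies on), and each V(u) is compared with the 0-1 loss I(min u < 0):

```python
import numpy as np

def V_ww(u):                                   # Weston and Watkins (1999)
    return np.maximum(2 - u, 0).sum()

def V_lee(u):                                  # Lee et al. (2004)
    K = len(u) + 1
    return np.maximum(u.sum() / K - u + 1, 0).sum()

def V_ls(u):                                   # Liu and Shen (2006)
    return max(1 - u.min(), 0)

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    f = rng.normal(size=4)
    f -= f.mean()                              # enforce sum_k f_k = 0
    y = int(rng.integers(4))
    u = np.delete(f[y] - f, y)                 # generalized functional margin
    loss01 = float(u.min() < 0)
    ok = ok and V_ww(u) >= loss01 and V_lee(u) >= loss01 and V_ls(u) >= loss01
```

Under the sum-to-zero constraint, (Σ_c u_c)/K = f_y, so each Lee term is [f_k + 1]_+, matching the slide on the individual losses.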
Characteristics of Support Vector Machines
- High accuracy, high flexibility
- Naturally handle high-dimensional data
- Sparse representation of the solution (via support vectors): fast future predictions
- No probability estimates (hard classifiers)
Other Active Problems in SVM
- Variable/feature selection
  - Linear SVM: Bradley and Mangasarian (1998), Guyon et al. (2000), Rakotomamonjy (2003), Jebara and Jaakkola (2000)
  - Nonlinear SVM: Weston et al. (2002), Grandvalet (2003), basis pursuit (Zhang 2003), COSSO selection (Lin and Zhang 2003)
- Proximal SVM: faster computation
- Robust SVM: reducing the influence of outliers
- Choice of kernels
The L1 SVM
- Replace the L2 penalty with the L1 penalty.
- The L1 penalty tends to give sparse solutions.
- For f(x) = h(x)^T β + β_0, the L1 SVM solves
      min_{β_0, β} Σ_{i=1}^n [1 − y_i f(x_i)]_+ + λ Σ_{j=1}^d |β_j|.   (1)
- The solution has at most n nonzero coefficients β_j.
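A minimal sketch (not the lecture's algorithm) of solving a problem like (1): a hinge-loss subgradient step followed by soft-thresholding, the proximal operator of the L1 penalty, which sets small coefficients exactly to zero. The data are synthetic, with only the first feature informative:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_svm(X, y, lam=0.05, lr=0.01, epochs=500):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                       # hinge violators
        gw = -(y[active, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        w = soft_threshold(w - lr * gw, lr * lam)          # L1 step -> sparsity
        b -= lr * gb
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0])               # only feature 0 carries signal
w, b = l1_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()
```

The informative coefficient dominates the fit, while the soft-thresholding keeps the noise coefficients near zero, illustrating the sparsity the L1 penalty encourages.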
L1 Penalty versus L2 Penalty
Robust Support Vector Machines
- The hinge loss is unbounded, hence sensitive to outliers (e.g., wrong labels).
- Support vectors: points with y_i f(x_i) ≤ 1.
- Truncated hinge loss: T_s(u) = H_1(u) − H_s(u), where H_s(u) = [s − u]_+.
- It removes some "bad" SVs (Wu and Liu, 2006).
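The truncation is easy to check numerically. In this sketch, s = −1 is an arbitrary illustrative choice: T_s agrees with the hinge loss for u ≥ s but is capped at 1 − s for u < s, so one wildly misclassified point has bounded influence.

```python
import numpy as np

def H(u, s):
    # H_s(u) = [s - u]_+
    return np.maximum(s - u, 0.0)

def T(u, s=-1.0):
    # Truncated hinge loss T_s(u) = H_1(u) - H_s(u)
    return H(u, 1.0) - H(u, s)

u = np.array([-10.0, -1.0, 0.0, 0.5, 2.0])
hinge_vals = H(u, 1.0)   # grows without bound: 11 at u = -10
trunc_vals = T(u)        # capped at 1 - s = 2 for badly misclassified points
```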
Decomposition: Difference of Convex Functions
Key: the D.C. decomposition (difference of convex functions),
      T_s(u) = H_1(u) − H_s(u).
D.C. Algorithm
The Difference Convex Algorithm for minimizing J(Θ) = J_vex(Θ) + J_cav(Θ):
1. Initialize Θ^0.
2. Repeat
       Θ^{t+1} = argmin_Θ ( J_vex(Θ) + ⟨J'_cav(Θ^t), Θ − Θ^t⟩ )
   until convergence of Θ^t.
- The algorithm converges in finitely many steps (Liu et al. (2005)).
- Choice of initial values: use the SVM solution.
- RSVM: the set of SVs is only a SUBSET of the original one!
- Nonlinear learning can be achieved via the kernel trick.
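The linearize-and-minimize step can be seen on a toy scalar D.C. program (illustrative only; in the robust SVM the convex subproblem is itself an SVM-type problem). Take J(θ) = θ⁴ − 2θ², split as J_vex(θ) = θ⁴ and J_cav(θ) = −2θ². Each step minimizes θ⁴ + J'_cav(θ_t)·θ, which here has the closed form θ_{t+1} = cbrt(θ_t):

```python
import numpy as np

def dca(theta0, iters=50):
    # J_vex(theta) = theta**4, J_cav(theta) = -2 * theta**2.
    # DCA step: minimize theta**4 + J_cav'(theta_t) * theta, i.e. solve
    # 4 * theta**3 = 4 * theta_t, giving theta = cbrt(theta_t).
    theta = theta0
    for _ in range(iters):
        theta = np.cbrt(theta)    # closed-form convex subproblem
    return theta

theta_hat = dca(0.5)   # iterates increase toward the local minimizer theta = 1
```

Each step solves a convex problem exactly and decreases J, mirroring the finite-step convergence cited above; starting from a negative value, the iterates converge to the other local minimizer, θ = −1.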