Lecture 16: Multiclass Support Vector Machines

Hao Helen Zhang

Spring, 2013


Outline

Traditional Methods for Multiclass Problems

One-vs-rest approaches

Pairwise approaches

Recent Developments for Multiclass Problems

Simultaneous Classification

Various loss functions

Extensions of SVM


Multiclass Classification Setup

Label: $\{-1,+1\} \rightarrow \{1, 2, \ldots, K\}$.

Classification decision rule:
$$f: \mathbb{R}^d \Longrightarrow \{1, 2, \ldots, K\}.$$

Classification accuracy is measured by:

Equal-cost: the Generalization Error (GE)
$$\mathrm{Err}(f) = P(Y \neq f(X)).$$

Unequal-cost: the risk
$$R(f) = E_{Y,X}\, C(Y, f(X)).$$


Traditional Methods

Main ideas:

(i) Decompose the multiclass classification problem into multiple binary classification problems.

(ii) Use the majority voting principle (a combined decision from the committee) to predict the label.

Common approaches: simple but effective

One-vs-rest (one-vs-all) approaches

Pairwise (one-vs-one, all-vs-all) approaches


One-vs-rest Approach

One of the simplest multiclass classifiers; commonly used with SVMs; also known as the one-vs-all (OVA) approach.

(i) Solve $K$ different binary problems: classify "class k" versus "the rest" for $k = 1, \ldots, K$.

(ii) Assign a test sample to the class giving the largest (most positive) value $f_k(x)$, where $f_k(x)$ is the solution from the $k$th problem.

Properties:

Very simple to implement; performs well in practice.

Not optimal (asymptotically): the decision rule is not Fisher consistent if there is no dominating class (i.e., $\max_k p_k(x) < \frac{1}{2}$).

Read: Rifkin and Klautau (2004), "In Defense of One-vs-all Classification."
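The OVA recipe above can be sketched in a few lines of scikit-learn (the library and the iris dataset are my illustration, not part of the lecture): train $K$ one-vs-rest classifiers, then predict by the largest decision value $f_k(x)$.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)          # K = 3 classes

# (i) K binary problems: "class k" versus "the rest"
ova = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000)).fit(X, y)

# (ii) assign to the class with the largest (most positive) f_k(x)
scores = ova.decision_function(X)          # shape (n, K)
pred = scores.argmax(axis=1)
```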


Pairwise Approach

Also known as the all-vs-all (AVA) approach.

(i) Solve $\binom{K}{2}$ different binary problems: classify "class k" versus "class j" for all $j \neq k$. Each classifier is called $g_{jk}$.

(ii) For prediction at a point, each classifier is queried once and issues a vote. The class with the maximum number of (weighted) votes is the winner.

Properties:

Training is efficient, since each binary problem is small.

If $K$ is big, there are too many problems to solve. If $K = 10$, we need to train 45 binary classifiers.

Simple to implement; performs competitively in practice.

Read: Park and Fürnkranz (2007), "Efficient Pairwise Classification."
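The voting step can be sketched in pure Python/NumPy (the toy one-dimensional "classifiers" are my own illustration): each of the $\binom{K}{2}$ classifiers casts one vote, and the class with the most votes wins.

```python
import numpy as np
from itertools import combinations

def vote_predict(x, classifiers, K):
    """classifiers[(j, k)](x) > 0 votes for class j, otherwise class k."""
    votes = np.zeros(K, dtype=int)
    for (j, k), g in classifiers.items():
        votes[j if g(x) > 0 else k] += 1
    return int(votes.argmax())

# Toy example: K = 3 classes centered at 0, 1, 2 on the real line;
# g_jk prefers class j when x is closer to center j than to center k.
K = 3
centers = [0.0, 1.0, 2.0]
classifiers = {(j, k): (lambda x, j=j, k=k: abs(x - centers[k]) - abs(x - centers[j]))
               for j, k in combinations(range(K), 2)}

print(len(classifiers))                    # K(K-1)/2 = 3; K = 10 would give 45
print(vote_predict(0.1, classifiers, K))   # class 0
print(vote_predict(1.9, classifiers, K))   # class 2
```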


One Single SVM Approach: Simultaneous Classification

Label: $\{-1,+1\} \rightarrow \{1, 2, \ldots, K\}$.

Use one single SVM to construct a decision function vector
$$\mathbf{f} = (f_1, \ldots, f_K).$$

Classifier (decision rule):
$$f(x) = \operatorname*{argmax}_{k=1,\ldots,K} f_k(x).$$

If $K = 2$, there is one $f_k$ and the decision rule is $\mathrm{sign}(f_k)$.

In some sense, multiple logistic regression is a simultaneous classification procedure.


SVM for Multiclass Problems

Multiclass SVM: solve one single regularization problem by imposing a penalty on the values of $f_y(x) - f_l(x)$.

Weston and Watkins (1999)

Crammer and Singer (2002)

Lee et al. (2004)

Liu and Shen (2006); multiclass $\psi$-learning: Shen et al. (2003)


Various Multiclass SVMs

Weston and Watkins (1999): a penalty is imposed only if $f_y(x) < f_k(x) + 2$ for $k \neq y$.

Even if $f_y(x) < 1$, a penalty is not imposed as long as $f_k(x)$ is sufficiently small for $k \neq y$; similarly, if $f_k(x) > 1$ for $k \neq y$, we do not pay a penalty if $f_y(x)$ is sufficiently large.
$$L(y, \mathbf{f}(x)) = \sum_{k \neq y} \left[2 - (f_y(x) - f_k(x))\right]_+.$$

Lee et al. (2004):
$$L(y, \mathbf{f}(x)) = \sum_{k \neq y} \left[f_k(x) + 1\right]_+.$$

Crammer and Singer (2002), Liu and Shen (2006):
$$L(y, \mathbf{f}(x)) = \left[1 - \min_{k \neq y} \{f_y(x) - f_k(x)\}\right]_+.$$

To avoid redundancy, a sum-to-zero constraint $\sum_{k=1}^K f_k = 0$ is sometimes enforced.
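The three losses can be written down directly (a NumPy sketch of the displayed formulas; for the Crammer-Singer/Liu-Shen loss the minimum is taken over $k \neq y$, matching the functional-margin form $[1 - \min_j u_j]_+$):

```python
import numpy as np

def ww_loss(y, f):
    """Weston & Watkins: sum_{k != y} [2 - (f_y - f_k)]_+."""
    f = np.asarray(f, dtype=float)
    diffs = np.delete(f[y] - f, y)          # f_y - f_k for k != y
    return np.maximum(2.0 - diffs, 0.0).sum()

def lee_loss(y, f):
    """Lee et al.: sum_{k != y} [f_k + 1]_+."""
    f = np.asarray(f, dtype=float)
    return np.maximum(np.delete(f, y) + 1.0, 0.0).sum()

def cs_loss(y, f):
    """Crammer & Singer / Liu & Shen: [1 - min_{k != y} (f_y - f_k)]_+."""
    f = np.asarray(f, dtype=float)
    diffs = np.delete(f[y] - f, y)
    return max(1.0 - diffs.min(), 0.0)

f = [2.0, 0.0, -1.0]
print(ww_loss(0, f), lee_loss(0, f), cs_loss(0, f))   # 0.0 1.0 0.0
print(ww_loss(1, f), lee_loss(1, f), cs_loss(1, f))   # 5.0 3.0 3.0
```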


Linear Multiclass SVMs

For linear classification problems, we have
$$f_k(x) = \beta_k^\top x + \beta_{0k}, \qquad k = 1, \ldots, K.$$

The sum-to-zero constraint can be replaced by
$$\sum_{k=1}^K \beta_{0k} = 0, \qquad \sum_{k=1}^K \beta_k = 0.$$

The optimization problem becomes
$$\min_{\mathbf{f}} \sum_{i=1}^n L(y_i, \mathbf{f}(x_i)) + \lambda \sum_{k=1}^K \|\beta_k\|^2$$
subject to the sum-to-zero constraint.


Nonlinear Multiclass SVMs

To achieve nonlinear classification, we assume
$$f_k(x) = \beta_k^\top \Phi(x) + \beta_{k0}, \qquad k = 1, \ldots, K,$$
where $\Phi(x)$ represents the basis functions in the feature space $\mathcal{F}$.

Similar to binary classification, the nonlinear MSVM can be conveniently solved using a kernel function.


Regularization Problems for Nonlinear MSVMs

We can represent the MSVM as the solution to a regularization problem in the RKHS.

Assume that
$$\mathbf{f}(x) = (f_1(x), \ldots, f_K(x)) \in \prod_{k=1}^K \left(\{1\} + \mathcal{H}_k\right)$$
under the sum-to-zero constraint.

Then an MSVM classifier can be derived by solving
$$\min_{\mathbf{f}} \sum_{i=1}^n L(y_i, \mathbf{f}(x_i)) + \lambda \sum_{k=1}^K \|g_k\|^2_{\mathcal{H}_k},$$
where $f_k(x) = g_k(x) + \beta_{0k}$, $g_k \in \mathcal{H}_k$, $\beta_{0k} \in \mathbb{R}$.


Generalized Functional Margin

Given $(x, y)$, a reasonable decision vector $\mathbf{f}(x)$ should

encourage a large value for $f_y(x)$;

have small values for $f_k(x)$, $k \neq y$.

Define the $(K-1)$-vector of relative differences as
$$\mathbf{g} = \left(f_y(x) - f_1(x), \ldots, f_y(x) - f_{y-1}(x),\; f_y(x) - f_{y+1}(x), \ldots, f_y(x) - f_K(x)\right).$$

Liu et al. (2004) called the vector $\mathbf{g}$ the generalized functional margin of $\mathbf{f}$.

$\mathbf{g}$ characterizes the correctness and strength of classification of $x$ by $\mathbf{f}$.

$\mathbf{f}$ indicates a correct classification of $(x, y)$ if $\mathbf{g}(\mathbf{f}(x), y) > \mathbf{0}_{K-1}$.


0-1 Loss with Functional Margin

A point $(x, y)$ is misclassified if $y \neq \operatorname*{argmax}_k f_k(x)$.

Define the multivariate sign function, for $\mathbf{u} = (u_1, \ldots, u_m)$, as
$$\mathrm{sign}(\mathbf{u}) = \begin{cases} 1 & \text{if } u_{\min} = \min(u_1, \ldots, u_m) > 0, \\ -1 & \text{if } u_{\min} \leq 0. \end{cases}$$

Using the functional margin, the 0-1 loss becomes
$$I\left(\min \mathbf{g}(\mathbf{f}(x), y) < 0\right) = \frac{1}{2}\left[1 - \mathrm{sign}(\mathbf{g}(\mathbf{f}(x), y))\right].$$

The GE becomes
$$R[\mathbf{f}] = \frac{1}{2} E\left[1 - \mathrm{sign}(\mathbf{g}(\mathbf{f}(x), y))\right].$$
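A quick NumPy check of the identity above (my own sketch): the 0-1 loss computed via the multivariate sign of the functional margin agrees with direct misclassification counting. Under this sign convention, ties ($\min \mathbf{g} = 0$) count as errors; they occur with probability zero for the continuous random scores below.

```python
import numpy as np

def margin(y, f):
    """Generalized functional margin g = (f_y - f_k) over k != y."""
    f = np.asarray(f, dtype=float)
    return np.delete(f[y] - f, y)

def msign(u):
    """Multivariate sign: +1 if min(u) > 0, else -1."""
    return 1.0 if np.min(u) > 0 else -1.0

def zero_one(y, f):
    """0-1 loss via (1/2)[1 - sign(g)]."""
    return 0.5 * (1.0 - msign(margin(y, f)))

rng = np.random.default_rng(0)
for _ in range(1000):
    f = rng.normal(size=4)
    y = int(rng.integers(4))
    direct = float(y != int(np.argmax(f)))   # misclassified?
    assert zero_one(y, f) == direct
```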


Generalized Loss Functions Using the Functional Margin

A natural way to generalize the binary loss is
$$\sum_{i=1}^n \ell\left(\min \mathbf{g}(\mathbf{f}(x_i), y_i)\right).$$

In particular, the loss function $L(y, \mathbf{f}(x))$ can be expressed as $V(\mathbf{g}(\mathbf{f}(x), y))$ with:

Weston and Watkins (1999): $V(\mathbf{u}) = \sum_{j=1}^{K-1} [2 - u_j]_+.$

Lee et al. (2004): $V(\mathbf{u}) = \sum_{j=1}^{K-1} \left[\frac{\sum_{c=1}^{K-1} u_c}{K} - u_j + 1\right]_+.$

Liu and Shen (2006): $V(\mathbf{u}) = [1 - \min_j u_j]_+.$

All of these loss functions are upper bounds of the 0-1 loss.
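A numerical sanity check (my own sketch) that each $V(\mathbf{u})$ dominates the 0-1 loss $I(\min_j u_j \leq 0)$. For Lee et al.'s form the bound relies on the sum-to-zero constraint, so the random score vectors below are centered.

```python
import numpy as np

def V_ww(u):
    """Weston & Watkins margin loss."""
    return np.maximum(2.0 - u, 0.0).sum()

def V_lee(u):
    """Lee et al. margin loss; K = len(u) + 1."""
    K = u.size + 1
    return np.maximum(u.sum() / K - u + 1.0, 0.0).sum()

def V_ls(u):
    """Liu & Shen margin loss."""
    return max(1.0 - u.min(), 0.0)

rng = np.random.default_rng(1)
K = 5
for _ in range(2000):
    f = rng.normal(size=K)
    f -= f.mean()                        # enforce sum-to-zero
    y = int(rng.integers(K))
    u = np.delete(f[y] - f, y)           # functional margin
    zo = 1.0 if u.min() <= 0 else 0.0    # 0-1 loss
    assert V_ww(u) >= zo - 1e-9
    assert V_lee(u) >= zo - 1e-9
    assert V_ls(u) >= zo - 1e-9
```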


Characteristics of Support Vector Machines

High accuracy, high flexibility

Naturally handle high-dimensional data

Sparse representation of the solutions (via support vectors): fast for making future predictions

No probability estimates (hard classifiers)


Other Active Problems in SVM

Variable/Feature Selection

Linear SVM: Bradley and Mangasarian (1998), Guyon et al. (2000), Rakotomamonjy (2003), Jebara and Jaakkola (2000)

Nonlinear SVM: Weston et al. (2002), Grandvalet (2003), basis pursuit (Zhang 2003), COSSO selection (Lin and Zhang 2003)

Proximal SVM: faster computation

Robust SVM: get rid of outliers

Choice of kernels


The $L_1$ SVM

Replace the $L_2$ penalty by the $L_1$ penalty.

The $L_1$ penalty tends to give sparse solutions.

For $f(x) = h(x)^\top \beta + \beta_0$, the $L_1$ SVM solves
$$\min_{\beta_0, \beta} \sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda \sum_{j=1}^d |\beta_j|. \qquad (1)$$

The solution will have at most $n$ nonzero coefficients $\beta_j$.
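A hedged scikit-learn sketch of the sparsity effect in (1): note that scikit-learn's $L_1$-penalized `LinearSVC` uses the squared hinge loss rather than the hinge in (1), so this illustrates the behavior of the $L_1$ penalty rather than reproducing (1) exactly; the synthetic data are my own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# 50 features, only 5 informative: the L1 penalty should zero out many betas
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                C=0.05, max_iter=20000).fit(X, y)

n_nonzero = int(np.sum(clf.coef_ != 0))
print(n_nonzero, clf.score(X, y))          # sparse coefficient vector
```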


$L_1$ Penalty versus $L_2$ Penalty (figure)


Robust Support Vector Machines

The hinge loss is unbounded; sensitive to outliers (e.g., wrong labels).

Support vectors: $y_i f(x_i) \leq 1$.

Truncated hinge loss: $T_s(u) = H_1(u) - H_s(u)$, where
$$H_s(u) = [s - u]_+.$$

Remove some "bad" SVs (Wu and Liu, 2006).
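The truncated hinge is easy to check numerically (a small sketch, with $s = 0$ chosen for illustration): unlike the hinge, $T_s$ is bounded, so a far-off outlier contributes at most a constant to the loss.

```python
import numpy as np

def H(u, s):
    """Hinge with margin parameter s: H_s(u) = [s - u]_+."""
    return np.maximum(s - u, 0.0)

def T(u, s=0.0):
    """Truncated hinge: T_s(u) = H_1(u) - H_s(u)."""
    return H(u, 1.0) - H(u, s)

print(float(T(-5.0)))       # 1.0: flat (bounded) for u <= s
print(float(H(-5.0, 1.0)))  # 6.0: the ordinary hinge keeps growing
```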


Decomposition: Difference of Convex Functions

Key: d.c. decomposition (difference of convex functions):
$$T_s(u) = H_1(u) - H_s(u).$$


D.C. Algorithm

D.C. Algorithm: the Difference Convex Algorithm for minimizing
$$J(\Theta) = J_{\mathrm{vex}}(\Theta) + J_{\mathrm{cav}}(\Theta)$$

1. Initialize $\Theta^0$.

2. Repeat
$$\Theta^{t+1} = \operatorname*{argmin}_{\Theta} \left( J_{\mathrm{vex}}(\Theta) + \left\langle J'_{\mathrm{cav}}(\Theta^t),\, \Theta - \Theta^t \right\rangle \right)$$
until convergence of $\Theta^t$.

The algorithm converges in finite steps (Liu et al. (2005)).

Choice of initial values: use the SVM solution.

RSVM: the set of SVs is only a SUBSET of the original one!

Nonlinear learning can be achieved by the kernel trick.
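The iteration can be illustrated on a toy one-dimensional d.c. objective (entirely my own example, not from the lecture): take $J_{\mathrm{vex}}(\theta) = (\theta - 2)^2$ and $J_{\mathrm{cav}}(\theta) = -\frac{1}{2}\theta^2$, so $J(\theta) = \frac{1}{2}\theta^2 - 4\theta + 4$ is minimized at $\theta = 4$. Each step linearizes $J_{\mathrm{cav}}$ at $\theta^t$ and minimizes the resulting convex surrogate in closed form.

```python
# DCA on J(theta) = (theta - 2)^2 - 0.5 * theta^2, global minimum at theta = 4
theta = 0.0
for _ in range(60):
    grad_cav = -theta              # derivative of -0.5 * theta^2 at theta^t
    # argmin_t (t - 2)^2 + grad_cav * t  =>  2(t - 2) + grad_cav = 0
    theta = 2.0 - grad_cav / 2.0   # closed-form minimizer of the surrogate
print(theta)                       # converges to 4.0
```

Each iterate satisfies $\theta^{t+1} = 2 + \theta^t/2$, a contraction toward the fixed point 4, which illustrates the finite-precision convergence claimed above.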
