# Hw-3-12-LDA_SVMx

AI and Robotics

Oct 17, 2013


Exercise: Linear Classifiers

1.

Show that LDA can be seen as a least squares method for classification. In particular, prove the following lemma:

Lemma 1. The projection direction obtained by maximizing the Fisher criterion is proportional to the weight vector obtained by minimizing the least squares loss with the affine function [...]. Then [...]
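The claim can be checked numerically before attempting a proof. The sketch below is an illustration under stated assumptions, not the assignment's own setup: it assumes two classes in 2-D, labels coded as +1 and -1, and an affine model w·x + b fitted by least squares. The dataset and the helper names (`mean`, `scatter`, `solve2`) are made up for this example.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, center):
    """2x2 scatter matrix: sum of (x - center)(x - center)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - center[0], p[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def solve2(a, b):
    """Solve the 2x2 linear system a w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

# Hypothetical toy data (assumption): class1 is labeled +1, class2 is labeled -1.
class1 = [(2.0, 1.0), (3.0, 2.5), (2.5, 0.5), (4.0, 2.0)]
class2 = [(0.0, 0.0), (-1.0, 1.0), (0.5, -1.0)]

# Fisher direction: w is proportional to S_W^{-1} (m1 - m2).
m1, m2 = mean(class1), mean(class2)
s1, s2 = scatter(class1, m1), scatter(class2, m2)
s_w = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
w_fisher = solve2(s_w, [m1[0] - m2[0], m1[1] - m2[1]])

# Least squares with an affine function: eliminating the bias b leaves
# the centered normal equations S_T w = sum_n (x_n - x_bar)(y_n - y_bar).
points = class1 + class2
labels = [1.0] * len(class1) + [-1.0] * len(class2)
x_bar = mean(points)
y_bar = sum(labels) / len(labels)
s_t = scatter(points, x_bar)
rhs = [sum((p[k] - x_bar[k]) * (y - y_bar) for p, y in zip(points, labels))
       for k in range(2)]
w_ls = solve2(s_t, rhs)

# Parallel vectors have zero 2-D cross product and a positive dot product.
cross = w_fisher[0] * w_ls[1] - w_fisher[1] * w_ls[0]
dot = w_fisher[0] * w_ls[0] + w_fisher[1] * w_ls[1]
print("cross:", cross, "dot:", dot)
```

A cross product of (numerically) zero means the two directions coincide up to scale. This is only a sanity check on one dataset; the lemma still has to be proved in general.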

2.

Sketch two multimodal distributions (i.e., each class should have multiple areas of concentration) for which a linear discriminant could give excellent (or even optimal) classification accuracy. Sketch two unimodal distributions (i.e., each class is concentrated in a single area) for which even the best linear discriminant would give poor classification accuracy. You may need to look up the ideas "multimodal distributions" and "unimodal distributions" to complete this problem.
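One way to test a candidate sketch is to sample from it and search for the best linear rule by brute force. The code below is a rough illustration under assumptions: the two layouts (four Gaussian blobs; two concentric Gaussians with different spreads) are only example choices, and `best_linear_accuracy` is a hypothetical helper that approximates the best training accuracy over a grid of directions, not an exact optimizer.

```python
import math
import random

def best_linear_accuracy(points, labels, n_angles=36):
    """Approximate the best training accuracy of a rule sign(w.x - t).

    Tries n_angles directions w and, for each, sweeps the threshold t
    across the sorted projections. Both orientations are considered.
    """
    n = len(points)
    best = 0
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        w = (math.cos(theta), math.sin(theta))
        proj = sorted((w[0] * x + w[1] * y, lab)
                      for (x, y), lab in zip(points, labels))
        # Threshold below all points: everything is predicted +1.
        correct = sum(1 for _, lab in proj if lab == 1)
        best = max(best, correct, n - correct)
        for _, lab in proj:
            # Moving the threshold past this point flips its prediction to -1.
            correct += 1 if lab == -1 else -1
            best = max(best, correct, n - correct)
    return best / n

rng = random.Random(0)

def blob(cx, cy, sigma, n):
    """n samples from an isotropic 2-D Gaussian centered at (cx, cy)."""
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]

# Multimodal classes: two blobs each, but all of class +1 lies to the left.
multi_pts = (blob(-3, 2, 0.5, 50) + blob(-3, -2, 0.5, 50)
             + blob(3, 2, 0.5, 50) + blob(3, -2, 0.5, 50))
multi_lab = [1] * 100 + [-1] * 100

# Unimodal classes: same center, different spread.
uni_pts = blob(0, 0, 0.5, 300) + blob(0, 0, 3.0, 300)
uni_lab = [1] * 300 + [-1] * 300

acc_multi = best_linear_accuracy(multi_pts, multi_lab)
acc_uni = best_linear_accuracy(uni_pts, uni_lab)
print("multimodal:", acc_multi, "unimodal:", acc_uni)
```

The point of the comparison is that the number of modes does not decide linear separability; the relative position of the class regions does.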

3.

For an SVM, if we remove one of the support vectors from the training set, does the size of the maximum margin decrease, stay the same, or increase for that dataset? Why? Also justify your answer by providing a simple dataset (no more than 2-D) in which you identify the support vectors, draw the location of the maximum margin hyperplane, remove one of the support vectors, and draw the location of the resulting maximum margin hyperplane.
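A one-dimensional toy case can make the effect of removing a support vector concrete. The sketch below assumes a hard-margin, linearly separable setting in which every negative point lies to the left of every positive point. The function name `max_margin_1d` and the dataset are illustrative and do not come from the assignment.

```python
def max_margin_1d(points):
    """Return (threshold, margin, support_vectors) for separable 1-D data.

    points: list of (x, label) pairs with label in {-1, +1}.
    Assumes every negative point lies to the left of every positive point.
    """
    lo = max(x for x, label in points if label == -1)  # rightmost negative
    hi = min(x for x, label in points if label == +1)  # leftmost positive
    if lo >= hi:
        raise ValueError("data are not separable in the assumed orientation")
    threshold = (lo + hi) / 2.0  # the maximum margin boundary sits midway
    margin = (hi - lo) / 2.0     # distance from the boundary to the nearest point
    return threshold, margin, (lo, hi)

data = [(-3.0, -1), (-1.0, -1), (1.0, +1), (4.0, +1)]

full = max_margin_1d(data)
# Remove the support vector at x = -1.
without_sv = max_margin_1d([p for p in data if p != (-1.0, -1)])
# Remove a point that is not a support vector (x = 4).
without_non_sv = max_margin_1d([p for p in data if p != (4.0, +1)])

print(full)            # boundary, margin, and support vectors on the full data
print(without_sv)      # after deleting a support vector
print(without_non_sv)  # after deleting a non-support vector
```

In this example the margin grows when the support vector is deleted and is unchanged when a non-support vector is deleted. More generally, deleting a point only removes a constraint, so the margin cannot shrink, although it can stay the same when other support vectors still pin the hyperplane.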

4.

[...] is equivalent to mapping each x into a higher-dimensional space where [...] for the case where [...]. Now consider the cubic kernel [...]. Note that this kernel adds 1 to the dot product. What is the corresponding function, again for the case where [...]?
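Read literally, a cubic kernel that adds 1 to the dot product would be K(x, z) = (x·z + 1)^3. That formula, the restriction to 2-D inputs, and the names `cubic_kernel` and `phi` are assumptions, since the original equations are missing. Under those assumptions, the sketch below builds a candidate feature map from multinomial coefficients and checks numerically that its inner product matches the kernel.

```python
import math

def cubic_kernel(x, z):
    """Assumed kernel: (x . z + 1)^3 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 3

def phi(x):
    """Candidate explicit feature map for the assumed cubic kernel.

    Expanding (a + b + 1)^3 with a = x1*z1 and b = x2*z2 gives one term per
    exponent triple (i, j, k) with i + j + k = 3 and coefficient 3!/(i! j! k!).
    Each feature is sqrt(coefficient) * x1^i * x2^j.
    """
    feats = []
    for i in range(4):
        for j in range(4 - i):
            k = 3 - i - j  # exponent on the constant 1
            coeff = math.factorial(3) / (
                math.factorial(i) * math.factorial(j) * math.factorial(k))
            feats.append(math.sqrt(coeff) * x[0] ** i * x[1] ** j)
    return feats

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = (0.5, -1.2)
z = (2.0, 0.3)
k_direct = cubic_kernel(x, z)
k_mapped = dot(phi(x), phi(z))
print(len(phi(x)), k_direct, k_mapped)
```

Agreement at a few test points does not prove the identity, but a mismatch would immediately expose a wrong coefficient in a hand-derived map.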

5.

Suppose that we believe some training points are more important than others. That is, as usual, we have data with corresponding labels; however, we also have importance weights. There are two ways we can try to incorporate these weights into the SVM formulation: (1) by rescaling the margin; (2) by rescaling the loss. We will look at both in this exercise.

a)

By "rescaling the margin", we mean that instead of forcing each data point n to achieve a margin of one, we force each data point to have a margin of [...]. (For simplicity, you may leave off the bias term if you wish.) Write down the corresponding primal optimization problem. How does this compare to the standard SVM formulation?

b)

By "rescaling the loss", we mean that each data point gets a separate slack control. In other words, our soft-margin classifier will have the form [...]. Repeat the previous sequence, eventually getting down to the dual formulation. How does this compare to the standard SVM?

Finally, discuss (in a few sentences) what the difference between rescaling the margin and rescaling the loss is.
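To get a feel for the two variants, it can help to evaluate both objectives on a tiny example. Because the original formulas are missing, everything symbolic here is an assumption: importance weights are written c_n, "rescaling the margin" is taken to mean the constraint y_n (w·x_n) >= c_n - xi_n, "rescaling the loss" is taken to mean the penalty C * sum_n c_n * xi_n, and the bias term is left off. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge(value):
    """Slack needed to satisfy a margin constraint: max(0, value)."""
    return max(0.0, value)

def objective_rescaled_margin(w, data, weights, C=1.0):
    """Assumed form: 0.5*||w||^2 + C * sum_n max(0, c_n - y_n * w.x_n)."""
    reg = 0.5 * dot(w, w)
    loss = sum(hinge(c - y * dot(w, x)) for (x, y), c in zip(data, weights))
    return reg + C * loss

def objective_rescaled_loss(w, data, weights, C=1.0):
    """Assumed form: 0.5*||w||^2 + C * sum_n c_n * max(0, 1 - y_n * w.x_n)."""
    reg = 0.5 * dot(w, w)
    loss = sum(c * hinge(1.0 - y * dot(w, x)) for (x, y), c in zip(data, weights))
    return reg + C * loss

# Two points; the first is treated as twice as important (assumed weights).
data = [((1.0, 0.0), +1), ((-1.0, 0.0), -1)]
weights = [2.0, 1.0]
w = (1.5, 0.0)

obj_margin = objective_rescaled_margin(w, data, weights)
obj_loss = objective_rescaled_loss(w, data, weights)
print(obj_margin, obj_loss)
```

With this w, both points clear a margin of one, so the loss-rescaled objective pays no slack at all, while the margin-rescaled objective still penalizes the first point for falling short of its larger required margin. Under these assumptions, the weight changes where the hinge starts in one variant and how steeply violations are charged in the other.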