EE734, Spring 2013 Midterm Exam



1. (5 pts) Given the prior information in the table,

Name         Sex
Young-Hyun   Male
Chansu       Female
Young-Hyun   Female
Young-Hyun   Female
Andy         Male
Karin        Female
Nina         Female
Sunju        Male

find whether an officer named Young-hyun is more likely to be male or female. Use Bayes rule to justify your answer.

Sol) p(male | Young-hyun) = (1/3 × 3/8) / (3/8) = 0.125 / 0.375 = 0.33

p(female | Young-hyun) = (2/5 × 5/8) / (3/8) = 0.250 / 0.375 = 0.67

So, Officer Young-hyun is more likely to be a female.
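For reference, the counts behind these numbers: of the 8 officers, 3 are male and 5 are female, and the name Young-hyun occurs once among the males and twice among the females, so P(Young-hyun | male) = 1/3, P(male) = 3/8, P(Young-hyun | female) = 2/5, P(female) = 5/8, and P(Young-hyun) = 3/8. These are exactly the factors used in Bayes rule above, P(class | name) = P(name | class) P(class) / P(name).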


2. (10 pts)

(a) Give a general expression for the quadratic approximation to a twice-differentiable function f(x) at x = k.

(b) Use your answer from part (a) to give an approximate value for ln(1.1), where ln(x) is the natural log function.

Sol)
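A sketch of the standard answer, expanding f around x = k and taking k = 1 for part (b):

(a)  f(x) \approx f(k) + f'(k)(x - k) + \tfrac{1}{2} f''(k)(x - k)^2

(b)  \ln(1.1) \approx \ln 1 + 1 \cdot (0.1) - \tfrac{1}{2} \cdot (0.1)^2 = 0.1 - 0.005 = 0.095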




3. (10 pts) Gaussian Naive Bayes

Suppose you are training Gaussian Naive Bayes (GNB) on the training set shown below. The dataset satisfies the Gaussian Naive Bayes assumptions. Assume that the variance is independent of instances but dependent on classes, i.e. σ_ik = σ_k, where i indexes instances X^(i) and k ∈ {1, 2} indexes classes. Draw the decision boundaries when you train GNB

a. using the same variance for both classes, σ_1 = σ_2

b. using a separate variance for each class, σ_1 ≠ σ_2


Sol) The decision boundary for part (a) will be linear, and for part (b) it will be quadratic.
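To see why (a sketch, with per-class variances σ_1, σ_2 shared across features, class priors π_1, π_2, and per-class feature means μ_kj), the GNB log posterior ratio is

\log \frac{P(y=1 \mid x)}{P(y=2 \mid x)} = \log \frac{\pi_1}{\pi_2} + \sum_j \left[ \log \frac{\sigma_2}{\sigma_1} - \frac{(x_j - \mu_{1j})^2}{2\sigma_1^2} + \frac{(x_j - \mu_{2j})^2}{2\sigma_2^2} \right].

When σ_1 = σ_2 the quadratic x_j^2 terms cancel, so the boundary (where the log ratio is 0) is linear in x; when σ_1 ≠ σ_2 they do not cancel and the boundary is quadratic.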






4. (5 pts) Consider the dataset toydata1 in the figure below.

In the dataset there are two classes, '+' and 'o'.

Each class has the same number of points.

Each data point has two real-valued features, the X and Y coordinates.

Draw the decision boundary that a Gaussian Naive Bayes classifier will learn.

(Remember that a very important piece of information was that all the classes had the same number of points, and so we don't have to worry about the prior.)





Sol) For toydata1, GNB learns two Gaussians: one for the inner circle with small variance, and one for the outer circle with a much larger variance. The decision surface is roughly as shown in the figure.
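As a rough illustration (a sketch only: the synthetic concentric data below and the use of scikit-learn's GaussianNB are assumptions, not the original toydata1), fitting GNB to such data produces a roughly circular decision region around the inner class:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for toydata1: a tight inner cluster ('+') and a surrounding ring ('o').
rng = np.random.default_rng(0)
n = 200
inner = rng.normal(scale=0.5, size=(n, 2))
theta = rng.uniform(0.0, 2.0 * np.pi, n)
outer = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)] + rng.normal(scale=0.3, size=(n, 2))

X = np.vstack([inner, outer])
y = np.r_[np.zeros(n), np.ones(n)]

# GaussianNB fits one axis-aligned Gaussian per class: small variance for the inner
# class, much larger variance for the ring, so the learned boundary is roughly a circle.
clf = GaussianNB().fit(X, y)

xx, yy = np.meshgrid(np.linspace(-5, 5, 300), np.linspace(-5, 5, 300))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(inner[:, 0], inner[:, 1], marker='+')
plt.scatter(outer[:, 0], outer[:, 1], marker='o', facecolors='none', edgecolors='k')
plt.show()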



5. (21 pts) (True or False)

(a) (SVM) When the data is not completely linearly separable, the linear SVM without slack variables returns w = 0. (w = weight vector)

(sol) False, there is no solution.

(b) (SVM) After training an SVM, we can discard all examples which are not support vectors and can still classify new examples. (T)

(c) Overfitting is more likely when the set of training data is small. (T)

Solution: True. With a small training dataset, it is easier to find a hypothesis that fits the training data exactly, i.e., to overfit.

(d) Logistic regression learns a non-linear decision boundary because it assumes that P(Y = 1 | x) = 1 / (1 + exp(-(w_0 + w·x))), which is a nonlinear function of x. (F)

(The posterior is a nonlinear function of x, but the decision boundary P(Y = 1 | x) = 0.5, i.e. w_0 + w·x = 0, is still linear.)

(e)

When learning a linear decision boundary with the perceptron algorithm, it is

guaranteed
to converge within a

fi
nite number of steps

as far as the data is linear separable
.

(
T
)
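(A standard bound, added here for reference: if all training points satisfy ||x|| ≤ R and the data is separable with margin γ, the perceptron makes at most (R/γ)^2 mistakes before converging.)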


(f) Maximizing the log likelihood of the logistic regression model may yield multiple local optima.

(F) See the 2nd paragraph, p. 137, of Prince. That is, the log-likelihood for logistic regression has a special property: it is a concave function of the parameters φ. Also refer to slides 22 and 25 of the lecture notes on Classification.
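Concretely (writing the model with weights w; this notation is an assumption, not Prince's), the log likelihood

L(w) = \sum_i \left[ y^{(i)} \log \sigma(w^\top x^{(i)}) + (1 - y^{(i)}) \log\big(1 - \sigma(w^\top x^{(i)})\big) \right], \qquad \sigma(a) = \frac{1}{1 + e^{-a}},

has Hessian -\sum_i \sigma(w^\top x^{(i)})\big(1 - \sigma(w^\top x^{(i)})\big)\, x^{(i)} x^{(i)\top}, which is negative semidefinite, so L is concave and any local optimum is global.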

(g) The Newton method used in optimization only finds local extrema. (T)

6. (4 pts) Encircle every one that belongs to nonparametric supervised learning. (c, e)

(a) Naïve Bayes (b) logistic regression (c) SVM (d) k-means clustering (e) k-nearest neighbor


7. (15 pts) (Regression)

a) Explain three problems related to linear regression and how to overcome each problem in detail.




8. (20 pts) Lagrange method

We want to make a rectangular box without a lid from 12 m² of cardboard. To find the maximum volume of such a box, our goal is to maximize the function f(x, y, z) = xyz, s.t. g(x, y, z) = xy + 2yz + 2zx − 12 = 0, where x, y, and z are the length, width, and height of the box, respectively. In other words, find x, y, and z that maximize the function f(x, y, z).

Sol) Build the Lagrangian.

Calculate the partial derivatives with respect to x, y, z, and λ, and set them equal to 0.

x = y = 2, z = 1. f(2, 2, 1) = 4.
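Filling in the steps (a sketch of the standard derivation):

L(x, y, z, \lambda) = xyz - \lambda\,(xy + 2yz + 2zx - 12)

\frac{\partial L}{\partial x} = yz - \lambda(y + 2z) = 0, \quad
\frac{\partial L}{\partial y} = xz - \lambda(x + 2z) = 0, \quad
\frac{\partial L}{\partial z} = xy - \lambda(2x + 2y) = 0, \quad
xy + 2yz + 2zx = 12.

The first two equations give x = y, the second and third give y = 2z, and substituting x = y = 2z into the constraint gives 12z² = 12, so z = 1, x = y = 2, and the maximum volume is f(2, 2, 1) = 4 m³.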



9. (10 pts) SVM and the slack penalty C

The goal of this problem is to correctly classify test data points, given a training data set. You have been warned, however, that the training data comes from sensors which can be error-prone, so you should avoid trusting any specific point too much.

For this problem, assume that we are training an SVM with a quadratic kernel, that is, our kernel function is a polynomial kernel of degree 2. You are given the data set presented in Figure 1. The slack penalty C will determine the location of the separating hyperplane. Please answer the following questions qualitatively. Give a one-sentence answer/justification for each and draw your solution in the appropriate part of the figure at the end of the problem.




Figure 1: Dataset for the SVM slack penalty selection task.



a) (3 pts) Where would the decision boundary be for very large values of C (i.e., C → ∞)? (Remember that we are using an SVM with a quadratic kernel.) Draw on the figure below. Justify your answer.




SOLUTION: For large values of C, the penalty for misclassifying points is very high, so the decision boundary will perfectly separate the data if possible. See below for the boundary learned using libSVM and C = 100000.

COMMON MISTAKE 1: Some students drew straight lines, which would not be the result with a quadratic kernel.

COMMON MISTAKE 2: Some students confused the effect of C and thought that a large C meant that the algorithm would be more tolerant of misclassifications.


b) (3 pts) For C ≈ 0, indicate in the figure below where you would expect the decision boundary to be. Justify your answer.

SOLUTION: The classifier can maximize the margin between most of the points, while misclassifying a few points, because the penalty is so low. See below for the boundary learned by libSVM with C = 0.00005.
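A rough way to reproduce the contrast between the two boundaries (a sketch only: the original solution used libSVM on the Figure 1 data, whereas the synthetic clouds, the outlier, and scikit-learn's SVC below are assumptions):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Two Gaussian clouds plus one "sensor error" outlier placed in the wrong cloud.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 1.0, (50, 2)),   # class 0 cloud
               rng.normal([+2.0, 0.0], 1.0, (50, 2))])  # class 1 cloud
y = np.r_[np.zeros(50), np.ones(50)]
X[0] = [2.5, 0.5]                                        # class-0 outlier inside class 1

xx, yy = np.meshgrid(np.linspace(-5, 5, 300), np.linspace(-3, 3, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

# Compare a very large C (tries hard to fit every point, including the outlier)
# with a very small C (maximizes the margin between the dominant clouds).
for C in (1e5, 5e-5):
    clf = SVC(kernel="poly", degree=2, coef0=1.0, C=C).fit(X, y)  # inhomogeneous quadratic kernel
    zz = clf.decision_function(grid).reshape(xx.shape)
    plt.contour(xx, yy, zz, levels=[0.0])                          # decision boundary f(x) = 0
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()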


c) (2 pts) Which of the two cases above would you expect to work better in the classification task? Why?

SOLUTION: We were warned not to trust any specific data point too much, so we prefer the solution where C ≈ 0, because it maximizes the margin between the dominant clouds of points.


d) (1 pt) Draw a data point which will not change the decision boundary learned for very large values of C. Justify your answer.

SOLUTION: We add the point circled below, which is correctly classified by the original classifier and will not be a support vector.


e) (1 pt) Draw a data point which will significantly change the decision boundary learned for very large values of C. Justify your answer.

SOLUTION: Since C is very large, adding a point that would be incorrectly classified by the original boundary will force the boundary to move.



Figure 2: Solutions for Problem 9.


Figure for problem 3.

Figure for problem 4.

Figures for problem 9: (a), (b), (c), (d).