CS 760 - Machine Learning


University of Wisconsin - Madison
Computer Sciences Department

CS 760 - Machine Learning
Spring 2010

Exam
11am-12:30pm, Monday, April 26, 2010
Room 1240 CS

CLOSED BOOK
(one sheet of notes and a calculator allowed)



Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work. If you use the back for any of your final answers, be sure to clearly mark that on the front side of the sheets.

Neatly write your name on this and all other pages of this exam.





Name  ________________________________________________________________




Problem    Score     Max Score

1          ______    20
2          ______    20
3          ______    20
4          ______    20
5          ______    20

TOTAL      ______    100

Name: ______________________                                Page 2 of 10

Problem 1 - Learning from Labeled Examples (20 points)

You have a dataset that involves three features. Feature C’s values are in [0, 1000]. The other two features are Boolean-valued.




       A    B    C      Category
Ex1    F    T    115    false
Ex2    T    F    890    false
Ex3    T    T    257    true
Ex4    F    F    509    true
Ex5    T    T    753    true



a) How much information about the category is gained by knowing whether or not the value of feature C is less than 333?
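[Editor’s note: not part of the exam. For readers checking their work, the sketch below computes an information gain of this kind directly from the five examples above, using the usual entropy definition.]

```python
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / len(labels)
        h -= p * log2(p)
    return h

# The five examples: (value of C, category)
examples = [(115, 'false'), (890, 'false'), (257, 'true'),
            (509, 'true'), (753, 'true')]

labels = [cat for _, cat in examples]
below = [cat for c, cat in examples if c < 333]
above = [cat for c, cat in examples if c >= 333]

# Information gain = entropy before the split minus the
# weighted average entropy of the two branches.
gain = entropy(labels) - (len(below) / len(labels) * entropy(below)
                          + len(above) / len(labels) * entropy(above))
print(round(gain, 3))
```

The same `entropy` helper also answers Part b, applied to the indicator “A and B have the same value.”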













b) How much information is there in knowing whether or not features A and B have the same value?














c) A knowledgeable reviewer says that the above data set was not very well pre-processed for nearest-neighbor algorithms. Briefly explain why a reviewer might say that.





d) Assume a one-norm SVM puts weight = -3 on feature A, weight = 2 on feature B, and weight = 0 on feature C. What would the cost of this solution be, based on this question’s five training examples? If you need to make any additional assumptions, be sure to state and briefly justify them.


































The training examples repeated for your convenience:




       A    B    C      Category
Ex1    F    T    115    false
Ex2    T    F    890    false
Ex3    T    T    257    true
Ex4    F    F    509    true
Ex5    T    T    753    true




Problem 2 - Aspects of Supervised Learning (20 points)


a) Explain what active learning means. Also briefly describe how you might use Bagging to address the task of active learning.
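[Editor’s note: not part of the exam. One standard way Bagging can serve active learning is query-by-committee: train a committee on bootstrap samples and query the unlabeled example the committee disagrees on most. The toy learner and data below are made up purely for illustration.]

```python
import random
from collections import Counter

def bagged_committee(train, learn, k=10):
    """Train k models, each on a bootstrap sample of the labeled data."""
    n = len(train)
    return [learn([random.choice(train) for _ in range(n)]) for _ in range(k)]

def most_informative(pool, committee):
    """Pick the unlabeled example the committee disagrees on most."""
    def disagreement(x):
        votes = Counter(model(x) for model in committee)
        return -max(votes.values())  # smaller majority = more disagreement
    return max(pool, key=disagreement)

# Toy learner (hypothetical): predicts the majority label of its
# bootstrap sample, ignoring the input entirely.
def majority_learner(data):
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: label

random.seed(0)
train = [(1, 'pos'), (2, 'pos'), (3, 'neg')]
committee = bagged_committee(train, majority_learner, k=5)
query = most_informative([4, 5], committee)
```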

















b) Assume we have a supervised-learning task where the examples are represented by 26 Boolean features, A-Z. We guess that the true concept is of the form:

   Literal1 ∧ Literal2 ∧ Literal3

where Literali is one of the features A-Z or its negation, and where a given feature can appear at most once in the concept (so “C ∧ ¬M ∧ A” is a valid concept, but “C ∧ ¬M ∧ M” is not).

If 90% of the time we want to learn a concept whose accuracy is at least 95%, how many training examples should we collect?
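[Editor’s note: not part of the exam. For reference, the standard PAC bound for a finite hypothesis space is m ≥ (1/ε)(ln|H| + ln(1/δ)). The sketch below applies it under the assumption that the conjunction is unordered, so |H| = C(26,3) · 2³; other counting conventions change the answer.]

```python
from math import comb, log, ceil

# Choose 3 distinct features out of 26; each may be negated or not.
num_concepts = comb(26, 3) * 2**3   # 2600 * 8 = 20800 concepts
epsilon = 0.05                      # accuracy at least 95%
delta = 0.10                        # succeed 90% of the time

m = ceil((log(num_concepts) + log(1 / delta)) / epsilon)
print(num_concepts, m)
```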





c) Assume that our learning algorithm is to simply (and stupidly) learn the model

   f(x) = maximum output value seen in the training set.

We want to estimate the error due to bias (in the bias-variance sense) of this algorithm, so we collect a number of possible training sets, where the notation N → M means for input N the output is M (i.e., there is one input feature and the output is a single number).



   { 1 → 3, 2 → 2 }   { 4 → 5, 3 → 0 }   { 2 → 2, 4 → 5 }   { 3 → 0, 3 → 0 }   { 2 → 2, 1 → 3 }




Based on this sample of possible training sets, what is the estimated error, due to this algorithm’s bias, for the input value of 2? Be sure to show your work and explain your answer.
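[Editor’s note: not part of the exam. As a check on the arithmetic, the sketch below codes the “max output seen” learner over the five training sets, and estimates squared bias at x = 2 as (mean prediction − true value)², assuming the true output for input 2 is the value 2 that the data consistently assigns it.]

```python
training_sets = [
    [(1, 3), (2, 2)],
    [(4, 5), (3, 0)],
    [(2, 2), (4, 5)],
    [(3, 0), (3, 0)],
    [(2, 2), (1, 3)],
]

# The (stupidly simple) learner: f(x) = max output value in the training set,
# regardless of x, so its prediction at x = 2 is just that maximum.
predictions = [max(y for _, y in ts) for ts in training_sets]

mean_prediction = sum(predictions) / len(predictions)
true_value = 2            # every set containing input 2 maps it to output 2
bias_squared = (mean_prediction - true_value) ** 2
```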



Problem 3 - Reinforcement Learning (20 points)

Consider the deterministic reinforcement environment drawn below (let γ = 0.5). The numbers on the arcs indicate the immediate rewards. Once the agent reaches the ‘end’ state the current episode ends and the agent is magically transported to the ‘start’ state. The probability of an exploration step is 0.02.












a) A one-step, Q-table learner follows the path start → b → end.

On the graph below, show the Q values that have changed, and show your work to the right of the graph. Assume that for all legal actions, the initial values in the Q table are 6.
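[Editor’s note: not part of the exam. The figure’s exact arc rewards did not survive extraction, so the rewards below are hypothetical placeholders; the sketch only illustrates the deterministic one-step update Q(s,a) ← r + γ · maxₐ′ Q(s′,a′) applied in order along start → b → end, with every legal action initialized to 6 and the terminal ‘end’ state contributing no future value.]

```python
GAMMA = 0.5

# Hypothetical immediate rewards for the two arcs on the path
# (read the real values off the graph).
r_start_b = 2
r_b_end = 7

# Initial Q values: 6 for every legal action; 'end' is terminal (no actions).
Q = {('start', 'b'): 6, ('b', 'end'): 6, ('b', 'a'): 6}

def max_q(state):
    """Best Q value over actions available in `state`; 0 if terminal."""
    vals = [v for (s, _), v in Q.items() if s == state]
    return max(vals) if vals else 0

# Updates happen in the order the agent moves, so the first update
# still sees the initial value of 6 at state b.
Q[('start', 'b')] = r_start_b + GAMMA * max_q('b')
Q[('b', 'end')] = r_b_end + GAMMA * max_q('end')   # terminal: no future value
```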

















b) Starting with the Q table you produced in Part a, again follow the path start → b → end and show the Q values below that have changed. Show your work to the right.













[Figure: the RL graph, with states start, a, b, c, and end; the arc rewards shown are -3, 9, 7, 2, 4, -5, and -1000. Two blank copies of the graph follow for the answers to Parts a and b.]



c) State and informally explain the optimal path from start to end that a Q-table learner will learn after a large number of trials in this environment. (You do not need to show the score of every possible path. The original RL graph appears below for convenience.)



[Blank copy of the RL graph for your answer.]

d) Repeat Part c, but this time assume the SARSA algorithm is being used.



[Blank copy of the RL graph for your answer.]

e) In class and in the text, a convergence proof for Q learning was presented. If we use a function approximator, this proof no longer applies. Briefly explain why.










Here again is the version of the RL graph with the immediate rewards shown.

[Figure: the RL graph, with states start, a, b, c, and end; arc rewards -3, 9, 7, 2, 4, -5, and -1000.]


Problem 4 - Experimental Methodology (20 points)



a) Assume on some Boolean-prediction task, you train a perceptron on 1000 examples and get 850 correct, then test your learned model on a fresh set of 100 examples and find it predicts 80 correctly. Give an estimate, including the 95% confidence interval, for the expected accuracy on the next 100 randomly drawn examples.
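[Editor’s note: not part of the exam. A common way to form such an interval is the normal approximation to the binomial, p ± 1.96·√(p(1−p)/n), computed on the fresh test set only, since that is what estimates future accuracy. A sketch:]

```python
from math import sqrt

correct, n = 80, 100
p = correct / n                          # test-set accuracy estimate
half_width = 1.96 * sqrt(p * (1 - p) / n)  # 95% normal-approximation half-width

print(f"{p:.2f} +/- {half_width:.3f}")
```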









b) Sketch a pair of learning curves that might result from an experiment where one evaluated whether or not a given feature-selection algorithm helped. Be sure to label the axes and informally explain what your curves show.









Why would a learning curve even be used for an experiment like this?








c) Assume you have trained a Bayesian network for a Boolean-valued task. For each of the test-set examples below, the second column reports the probability the trained Bayesian network computed for this example, while the third column lists the correct category.


Example    Probability (Output is True)    Correct Category

1          0.99                            positive
3          0.81                            negative
2          0.53                            positive
4          0.26                            negative
5          0.04                            negative


Draw to the right of this table the ROC curve for this model (it is fine to simply ‘connect the dots,’ that is, to make your curve piece-wise linear). Be sure to label your axes.
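[Editor’s note: not part of the exam. As a cross-check, ROC points can be generated by sweeping the decision threshold down through the predicted probabilities: each positive passed moves the curve up by 1/(number of positives), each negative moves it right by 1/(number of negatives). A sketch using the table above:]

```python
# (probability, correct category), already sorted by descending probability
preds = [(0.99, 'positive'), (0.81, 'negative'), (0.53, 'positive'),
         (0.26, 'negative'), (0.04, 'negative')]

n_pos = sum(1 for _, c in preds if c == 'positive')
n_neg = len(preds) - n_pos

points = [(0.0, 0.0)]        # (false-positive rate, true-positive rate)
tp = fp = 0
for _, category in preds:    # lower the threshold past one example at a time
    if category == 'positive':
        tp += 1
    else:
        fp += 1
    points.append((fp / n_neg, tp / n_pos))
```

Plot the points with FPR on the x-axis and TPR on the y-axis, connecting them piece-wise linearly.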


Problem 5 - Miscellaneous Short Answers (20 points)


Briefly define and discuss the importance in machine learning of each of the following:



weight decay



definition:





importance:






kernels that compute the distance between graph-based examples
[‘graph’ here is in the sense of arcs and nodes, as opposed to plots of x vs. f(x)]



definition:





importance:






structure search



definition:





importance:





State and briefly explain two ways that the Random Forest algorithm reduces the chances of overfitting a training set.


i)





ii)



Feel free to tear off this page and use it for ‘scratch’ paper.