# CS 760 - Machine Learning


University of Wisconsin
Computer Sciences Department

CS 760 - Machine Learning, Spring 2010

Exam: 11am-12:30pm, Monday, April 26, 2010
Room 1240 CS

CLOSED BOOK
(one sheet of notes and a calculator allowed)

If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work. If you use the back of a sheet, be sure to clearly mark that on the front side. Neatly write your name on this and all other pages of this exam.

Name ________________________________________________________________

| Problem | Score  | Max Score |
|---------|--------|-----------|
| 1       | ______ | 20        |
| 2       | ______ | 20        |
| 3       | ______ | 20        |
| 4       | ______ | 20        |
| 5       | ______ | 20        |
| TOTAL   | ______ | 100       |

Name: ______________________          Page 2 of 10

Problem 1 - Learning from Labeled Examples (20 points)

You have a dataset that involves three features. Feature C's values are in [0, 1000]. The other two features are Boolean-valued.

|     | A | B | C   | Category |
|-----|---|---|-----|----------|
| Ex1 | F | T | 115 | false    |
| Ex2 | T | F | 890 | false    |
| Ex3 | T | T | 257 | true     |
| Ex4 | F | F | 509 | true     |
| Ex5 | T | T | 753 | true     |

a) How much information about the category is gained by knowing whether or not the value of feature C is less than 333?

b) How much information is there in knowing whether or not features A and B have the same value?
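Parts a and b reduce to standard entropy and information-gain calculations over the five examples. The sketch below (the tuple encoding of the table and the split functions are mine, not part of the exam) checks the arithmetic:

```python
from math import log2

def entropy(ys):
    """Entropy (in bits) of a list of class labels."""
    n = len(ys)
    return -sum((c / n) * log2(c / n) for c in (ys.count(v) for v in set(ys)))

# The five training examples: (A, B, C, category).
data = [('F', 'T', 115, 'false'),
        ('T', 'F', 890, 'false'),
        ('T', 'T', 257, 'true'),
        ('F', 'F', 509, 'true'),
        ('T', 'T', 753, 'true')]
labels = [cat for _, _, _, cat in data]

def info_gain(split):
    """Information gain of a Boolean split function over the dataset."""
    yes = [cat for ex, cat in zip(data, labels) if split(ex)]
    no  = [cat for ex, cat in zip(data, labels) if not split(ex)]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(data)
    return entropy(labels) - weighted

gain_a = info_gain(lambda ex: ex[2] < 333)     # Part a: is C < 333?
gain_b = info_gain(lambda ex: ex[0] == ex[1])  # Part b: is A == B?
print(f"gain(C < 333) = {gain_a:.3f} bits")    # ≈ 0.020
print(f"gain(A == B)  = {gain_b:.3f} bits")    # ≈ 0.971
```

With this data, the C < 333 split gains only about 0.02 bits, while the A == B indicator separates the two categories perfectly and so gains the full class entropy, about 0.971 bits.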

c) A knowledgeable reviewer says that the above data set was not very well pre-processed for nearest-neighbor algorithms. Briefly explain why a reviewer might say that.


d) Assume a one-norm SVM puts weight = -3 on feature A, weight = 2 on feature B, and weight = 0 on feature C. What would the cost of this solution be, based on this question's five training examples? If you need to make any additional assumptions, be sure to state and briefly justify them.
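One hedged way to check Part d is to score the given weights under a standard one-norm (L1-regularized) SVM objective. The +1/-1 encoding of T/F and of the class labels, the zero bias term, and the trade-off constant C = 1 are exactly the kind of assumptions the question asks you to state; they are mine, not facts from the problem:

```python
def one_norm_svm_cost(w, b, examples, C=1.0):
    """cost = sum(|w_i|) + C * sum over examples of hinge loss max(0, 1 - y*(w.x + b))."""
    l1 = sum(abs(wi) for wi in w)
    hinge = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, y in examples)
    return l1 + C * hinge

# Assumed encoding: T -> +1, F -> -1; category true -> +1, false -> -1.
enc = {'T': 1.0, 'F': -1.0}
examples = [((enc['F'], enc['T'], 115.0), -1.0),   # Ex1
            ((enc['T'], enc['F'], 890.0), -1.0),   # Ex2
            ((enc['T'], enc['T'], 257.0), +1.0),   # Ex3
            ((enc['F'], enc['F'], 509.0), +1.0),   # Ex4
            ((enc['T'], enc['T'], 753.0), +1.0)]   # Ex5

w = (-3.0, 2.0, 0.0)   # weights from Part d; feature C is ignored (w3 = 0)
print(one_norm_svm_cost(w, b=0.0, examples=examples))   # 15.0 under these assumptions
```

Under these assumptions the hinge losses are 6, 0, 2, 0, and 2, and the L1 penalty is 5, so the cost is 15; a different encoding or bias assumption would change the number, which is why stating assumptions matters here.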

The training examples again, for convenience:

|     | A | B | C   | Category |
|-----|---|---|-----|----------|
| Ex1 | F | T | 115 | false    |
| Ex2 | T | F | 890 | false    |
| Ex3 | T | T | 257 | true     |
| Ex4 | F | F | 509 | true     |
| Ex5 | T | T | 753 | true     |


Problem 2 - Aspects of Supervised Learning (20 points)

a) Explain what active learning means. Also briefly describe how you might apply Bagging to active learning.

b) Assume we have a supervised-learning task where the examples are represented by 26 Boolean features, A-Z. We guess that the true concept is of the form:

Literal1 ∧ Literal2 ∧ Literal3

where Literali is one of the features A-Z or its negation, and where a given feature can appear at most once in the concept (so "C ∧ ¬M ∧ A" is a valid concept, but "C ∧ ¬M ∧ M" is not).

If 90% of the time we want to learn a concept whose accuracy is at least 95%, how many training examples should we collect?
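This is the standard PAC sample-complexity calculation, m ≥ (1/ε)(ln|H| + ln(1/δ)), applied to the counted hypothesis space. The sketch below assumes the three literals are unordered (so each conjunction is counted once) and reads "90% of the time, accuracy at least 95%" as δ = 0.1 and ε = 0.05:

```python
from math import ceil, comb, log

# |H|: choose 3 distinct features out of 26, then negate each or not.
H = comb(26, 3) * 2**3          # 2600 * 8 = 20800 hypotheses
eps, delta = 0.05, 0.10
m = ceil((log(H) + log(1 / delta)) / eps)
print(H, m)                     # 20800 hypotheses, m = 245 examples
```

If instead the ordering of the literals were counted as distinct, |H| would be larger, but the logarithm keeps the required m in the same ballpark.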


c) Assume that our learning algorithm is to simply (and stupidly) learn the model

f(x) = maximum output value seen in the training set.

We want to estimate the error due to bias (in the bias-variance sense) of this algorithm, so we collect a number of possible training sets, where the notation N → M means for input N the output is M (i.e., there is one input feature and the output is a single number).

{ 1 → 3, 2 → 2 }   { 4 → 5, 3 → 0 }   { 2 → 2, 4 → 5 }
{ 3 → 0, 3 → 0 }   { 2 → 2, 1 → 3 }

Based on this sample of possible training sets, what is the estimated error, due to this algorithm's bias, for the input value of 2? Be sure to show your work and explain your answer.
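A sketch of the mechanics: the learner's prediction for any input is the maximum output in its training set, so the "average model" for x = 2 is the mean of those maxima over the five sampled sets, compared against the target for input 2 (every occurrence of input 2 has output 2). Whether the bias error is reported as the difference or its square is a convention choice; the squared form is used below:

```python
training_sets = [
    [(1, 3), (2, 2)],
    [(4, 5), (3, 0)],
    [(2, 2), (4, 5)],
    [(3, 0), (3, 0)],
    [(2, 2), (1, 3)],
]

# f(x) = max output seen in the training set, regardless of x.
predictions = [max(y for _, y in ts) for ts in training_sets]   # [3, 5, 5, 0, 3]
mean_prediction = sum(predictions) / len(predictions)           # 3.2
true_value = 2                                                  # input 2 always maps to 2
bias_squared = (mean_prediction - true_value) ** 2              # ≈ 1.44
print(mean_prediction, bias_squared)
```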


Problem 3 - Reinforcement Learning (20 points)

Consider the deterministic reinforcement environment drawn below (let γ = 0.5). The numbers on the arcs indicate the immediate rewards. Once the agent reaches the 'end' state the current episode ends and the agent is magically transported to the 'start' state. The probability of an exploration step is 0.02.

a) A one-step, Q-table learner follows the path start → b → end. On the graph below, show the Q values that have changed, and show your work to the right of the graph. Assume that for all legal actions, the initial values in the Q table are 6.

b) Starting with the Q table you produced in Part a, again follow the path start → b → end and show the Q values below that have changed. Show your work to the right.

[Figure: the deterministic RL graph (states start, a, b, c, and end; arc rewards -3, 9, 7, 2, 4, -5, and -1000), followed by two unlabeled copies of the graph for recording the answers to Parts a and b.]
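For reference, Parts a and b apply the deterministic one-step Q-learning backup, Q(s, a) ← r + γ · max over a' of Q(s', a'), with learning rate 1. The sketch below uses placeholder arc rewards (2 for start → b, 4 for b → end), not the actual labels from the figure; only the update rule itself is the point:

```python
GAMMA = 0.5

# Q[state][action] -> value; every legal action starts at 6 (as in Part a).
# The action set here is a minimal stand-in for the figure's arcs.
Q = {
    'start': {'to_b': 6.0, 'to_c': 6.0},
    'b':     {'to_end': 6.0},
    'end':   {},   # terminal: no actions, so the max future value is 0
}

def q_update(state, action, reward, next_state):
    """Deterministic one-step backup: Q(s,a) <- r + gamma * max_a' Q(s',a')."""
    future = max(Q[next_state].values(), default=0.0)
    Q[state][action] = reward + GAMMA * future

# One episode along start -> b -> end with the placeholder rewards:
q_update('start', 'to_b', 2.0, 'b')    # uses b's initial value 6: 2 + 0.5*6 = 5
q_update('b', 'to_end', 4.0, 'end')    # terminal successor: 4 + 0.5*0 = 4
print(Q['start']['to_b'], Q['b']['to_end'])
```

Note the update order matters: the start → b backup sees b's old value (6), which is why repeating the same path in Part b changes the answer.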


c) State and informally explain the optimal path from start to end that a Q-table learner will learn after a large number of trials in this environment. (You do not need to show the score of every possible path. The original RL graph appears below for convenience.)

[Figure: unlabeled copy of the RL graph.]

d) Repeat Part c, but this time assume the SARSA algorithm is being used.

[Figure: unlabeled copy of the RL graph.]

e) In class and in the text, a convergence proof for Q-learning was presented. If we use a function approximator, this proof no longer applies. Briefly explain why.

Here again is the version of the RL graph with the immediate rewards shown.

[Figure: the RL graph with immediate rewards, repeated from Page 6.]


Problem 4 - Experimental Methodology (20 points)

a) Assume that on some Boolean-valued task you train a perceptron on 1000 examples and get 850 correct, then test your learned model on a fresh set of 100 examples and find it predicts 80 correctly. Give an estimate, including the 95% confidence interval, for the expected accuracy on the next 100 randomly drawn examples.
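A sketch of the usual calculation for Part a: the fresh 100-example test set gives the unbiased estimate (the 850/1000 training accuracy is optimistically biased), and the normal approximation gives the 95% interval as p̂ ± 1.96·sqrt(p̂(1 − p̂)/n):

```python
from math import sqrt

n, correct = 100, 80
p_hat = correct / n                                 # 0.80
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # 1.96 * 0.04 = 0.0784
print(f"{p_hat:.2f} +/- {half_width:.4f}")          # 0.80 +/- 0.0784
```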

b) Sketch a pair of learning curves that might result from an experiment where one evaluated whether or not a given feature-selection algorithm helped. Be sure to label the axes and informally explain your sketch. Why would a learning curve even be used for an experiment like this?

c) Assume you have trained a Bayesian network for a Boolean-valued task. For each of the test-set examples below, the second column reports the probability the trained Bayesian network computed for this example, while the third column lists the correct category.

| Example | Probability (Output is True) | Correct Category |
|---------|------------------------------|------------------|
| 1       | 0.99                         | positive         |
| 3       | 0.81                         | negative         |
| 2       | 0.53                         | positive         |
| 4       | 0.26                         | negative         |
| 5       | 0.04                         | negative         |

Draw to the right of this table the ROC curve for this network (it is fine to simply 'connect the dots,' that is, make your curve piece-wise linear). Be sure to label your axes.
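The ROC points for Part c can be generated by sweeping a threshold down through the sorted probabilities; the sketch below computes the (false-positive rate, true-positive rate) pairs you would connect:

```python
# (probability the network assigned to "true", correct category)
examples = [(0.99, 'positive'), (0.81, 'negative'), (0.53, 'positive'),
            (0.26, 'negative'), (0.04, 'negative')]

pos = sum(1 for _, c in examples if c == 'positive')   # 2 positives
neg = len(examples) - pos                              # 3 negatives

# Lower the threshold past one example at a time, highest probability first,
# recording (FPR, TPR) after each example is classified as positive.
points = [(0.0, 0.0)]
tp = fp = 0
for prob, cat in sorted(examples, reverse=True):
    if cat == 'positive':
        tp += 1
    else:
        fp += 1
    points.append((fp / neg, tp / pos))

print(points)
```

The resulting points are (0, 0), (0, 1/2), (1/3, 1/2), (1/3, 1), (2/3, 1), and (1, 1), with the false-positive rate on the x-axis and the true-positive rate on the y-axis.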


Problem 5 (20 points)

Briefly define and discuss the importance in machine learning of each of the following:

weight decay

definition:

importance:

kernels that compute the distance between graph-based examples ['graph' here is in the sense of arcs and nodes, as opposed to plots of x vs. f(x)]

definition:

importance:

structure search

definition:

importance:

State and briefly explain two ways that the Random Forest algorithm reduces the chances of overfitting a training set.

i)

ii)


Feel free to tear off this page and use it for ‘scratch’ paper.