Machine Learning Final Exam
Student ID: ___________    Name: ___________    Score: ____ / 100
I. True/False (14%)

(    ) 1. In semiparametric estimation, the density is written as a disjunction of a small number of parametric models.
(    ) 2. A decision tree is a hierarchical model using a divide-and-conquer strategy.
(    ) 3. To remove subtrees in a decision tree, postpruning is faster and prepruning is more accurate.
(    ) 4. The impurity measure of a classification tree should satisfy the following properties: (1) ___, (2) ___, and (3) ___.
(    ) 5. Rule induction works similarly to tree induction except that rule induction does a breadth-first search, whereas tree induction goes depth-first.
(    ) 6. When classes are Gaussian with a shared covariance matrix, the optimal discriminant is linear.
(    ) 7. Support vector machines are likelihood-based methods.
(    ) 8. In SIMD machines, all processors execute the same instruction but on different pieces of data.
(    ) 9. Hints can be used to create virtual examples.
(    ) 10. A real-valued function f defined on an interval is called convex if, for any two points x and y in its domain C and any t in [0,1], we have f(tx + (1−t)y) ≤ t·f(x) + (1−t)·f(y).
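For reference, the convexity inequality in statement 10 can be checked numerically on a grid of points. This is only an illustrative sketch (the function names and the test grid are our own choices, not part of the exam): f(x) = x² satisfies the inequality everywhere, while f(x) = x³ violates it on an interval containing negative numbers.

```python
# Numeric check of the convexity inequality
#   f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
# over all pairs of grid points and all t in [0, 1].
def is_convex_on_pairs(f, points, ts):
    for x in points:
        for y in points:
            for t in ts:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + 1e-12:   # small tolerance for float error
                    return False
    return True

pts = [p / 10 for p in range(-20, 21)]   # grid on [-2, 2]
ts = [t / 10 for t in range(11)]         # t = 0.0, 0.1, ..., 1.0
print(is_convex_on_pairs(lambda x: x * x, pts, ts))    # True  (x^2 is convex)
print(is_convex_on_pairs(lambda x: x ** 3, pts, ts))   # False (x^3 is not convex on [-2, 2])
```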
(    ) 11. Adaptive resonance theory (ART) neural networks perform unsupervised learning.
(    ) 12. In a multilayer perceptron, if the number of hidden units is less than the number of inputs, the first layer performs a dimensionality reduction.
(    ) 13. The self-organizing map (SOM) is a winner-take-all neural network. It is as if one neuron wins and gets updated, and the others are not updated at all.
(    ) 14. In a local representation, the input is encoded by the simultaneous activation of many hidden units, such as radial basis functions.
II. Short Answer

1. (4%) What are the advantages and disadvantages of nonparametric density estimation?
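As a study aid for question 1, here is a minimal Gaussian-kernel density estimator sketch (the data and bandwidth are illustrative assumptions). It shows the typical trade-off: no parametric form is assumed, but every training sample must be stored and a bandwidth h must still be chosen.

```python
import math

def kde(x, samples, h):
    """Kernel density estimate: p(x) = (1/(N*h)) * sum_t K((x - x_t) / h),
    with a Gaussian kernel K."""
    n = len(samples)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xt) / h) for xt in samples) / (n * h)

# Two clusters of samples; the estimate is high near the clusters
# and low in the empty region between them.
samples = [0.0, 0.1, 0.2, 1.9, 2.0, 2.1]
print(kde(0.1, samples, h=0.3))   # high: inside the first cluster
print(kde(1.0, samples, h=0.3))   # low: between the clusters
print(kde(2.0, samples, h=0.3))   # high: inside the second cluster
```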
2. (1) (4%) What are the meanings of a leaf node and an internal node in a decision tree?
(2) (4%) How do we decide to split a node in a decision tree? What are the split criteria?
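Two standard split criteria asked about in question 2(2) are entropy and the Gini index; a node's split is chosen to maximize the resulting drop in impurity. A minimal sketch of both measures on class counts (the example counts are our own):

```python
import math

def entropy(counts):
    """Entropy impurity: -sum_i p_i * log2(p_i), ignoring empty classes."""
    n = sum(counts)
    ps = [c / n for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in ps)

def gini(counts):
    """Gini index: 1 - sum_i p_i^2."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

# A pure node has zero impurity; a 50/50 two-class node is maximally impure.
print(entropy([5, 0]), entropy([5, 5]))   # 0.0 1.0
print(gini([5, 0]), gini([5, 5]))         # 0.0 0.5
```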
3. (4%) In a neural network, can we have more than one hidden layer? Why or why not?
4. (4%) Why does a neural network overtrain (or overfit)?
5. (4%) (1) What are support vectors in a support vector machine?
(2) Given the example shown below, please mark the support vectors.
6. (4%) What is the difference between a likelihood-based method and a discriminant-based method?
7. (10%) Shown below are the batch k-means algorithm and the online k-means algorithm, respectively.
(1) (4%) What are the differences between these two methods?
(2) (6%) What are their advantages and disadvantages?
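As a study aid for question 7, here is a 1-D sketch of both variants (data, initial centers, and step size are illustrative assumptions, not the exam's figure). Batch k-means recomputes every center from all assigned samples per pass; online k-means moves only the winning center a small step toward each sample as it arrives.

```python
def batch_kmeans(X, m, iters=20):
    """Batch k-means: assign all samples to their nearest center, then
    recompute each center as the mean of its assigned samples."""
    for _ in range(iters):
        clusters = [[] for _ in m]
        for x in X:
            i = min(range(len(m)), key=lambda j: (x - m[j]) ** 2)
            clusters[i].append(x)
        m = [sum(c) / len(c) if c else m[i] for i, c in enumerate(clusters)]
    return m

def online_kmeans(X, m, eta=0.1, epochs=20):
    """Online k-means: for each arriving sample, move only the winning
    center a step of size eta toward it (no stored assignments)."""
    m = list(m)
    for _ in range(epochs):
        for x in X:
            i = min(range(len(m)), key=lambda j: (x - m[j]) ** 2)
            m[i] += eta * (x - m[i])
    return m

X = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]       # two well-separated clusters
print(batch_kmeans(X, [0.0, 1.0]))         # centers settle near 0.2 and 9.8
print(online_kmeans(X, [0.0, 1.0]))        # similar centers, reached incrementally
```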
8. (4%) The Condensed Nearest Neighbor algorithm is used to find a subset Z of X that is small and is accurate in classifying X. Please finish the following Condensed Nearest Neighbor algorithm.
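The algorithm listing for question 8 did not survive in this copy. For reference, a common formulation (Hart's condensed nearest neighbor) is sketched below, 1-D for simplicity; the data set is an illustrative assumption. Z grows only when the current Z misclassifies a sample under 1-NN, and the loop repeats until a full pass adds nothing.

```python
def nearest(Z, x):
    """Index of the stored example in Z closest to x (1-NN, squared distance)."""
    return min(range(len(Z)), key=lambda i: (Z[i][0] - x) ** 2)

def condensed_nn(data):
    """Condensed Nearest Neighbor: keep a small subset Z of (x, label)
    pairs that still 1-NN-classifies all of the data correctly."""
    Z = [data[0]]                       # seed Z with one stored example
    changed = True
    while changed:                      # repeat until a pass adds nothing
        changed = False
        for x, label in data:
            i = nearest(Z, x)
            if Z[i][1] != label:        # misclassified by the current Z
                Z.append((x, label))    # so this example must be stored
                changed = True
    return Z

data = [(0.0, 'a'), (0.5, 'a'), (1.0, 'a'), (5.0, 'b'), (5.5, 'b'), (6.0, 'b')]
print(condensed_nn(data))   # [(0.0, 'a'), (5.0, 'b')] -- one example per cluster suffices
```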
9. (4%) In nonparametric regression, given a running mean smoother as follows, please finish the graph with h = 1, where ___.
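The smoother's formula for question 9 is missing from this copy. As a study aid, a running mean smoother averages the responses r_t of all training points within a window of half-width h around the query x; the sketch below assumes the window includes points with |x − x_t| ≤ h, and the data set is illustrative.

```python
def running_mean(x, xs, rs, h):
    """Running mean smoother: average the r_t of all training x_t
    falling inside the window |x - x_t| <= h around the query x."""
    vals = [r for xt, r in zip(xs, rs) if abs(x - xt) <= h]
    return sum(vals) / len(vals) if vals else None

xs = [0, 1, 2, 3, 4]
rs = [0.0, 1.0, 4.0, 9.0, 16.0]
# With h = 1: g(0) = (0 + 1)/2 = 0.5 and g(4) = (9 + 16)/2 = 12.5.
print([running_mean(x, xs, rs, h=1) for x in xs])
```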
10. (12%) Let ___ be the distance to the k-th nearest sample, N the total sample number, and K a kernel function. The following shows some density estimators. Can you
(1) (4%) link the formulas to their corresponding graphs, and
(2) (8%) calculate the values of k or h?
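One standard estimator among those question 10 refers to is the 1-D k-nearest-neighbor density estimate, p(x) = k / (2·N·d_k(x)), where d_k(x) is the distance to the k-th nearest sample. A minimal sketch with illustrative data:

```python
def knn_density(x, samples, k):
    """k-NN density estimate: p(x) = k / (2 * N * d_k(x)),
    where d_k(x) is the distance from x to its k-th nearest sample."""
    d = sorted(abs(x - xt) for xt in samples)
    dk = d[k - 1]                     # distance to the k-th nearest sample
    return k / (2 * len(samples) * dk)

samples = [0.0, 0.5, 1.0, 1.5, 2.0]
# d_2(1.0) = 0.5, so p(1.0) = 2 / (2 * 5 * 0.5) = 0.4
print(knn_density(1.0, samples, k=2))
```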
11. (4%) Given the regression tree shown below, please draw its corresponding regression result.
12. (6%) In the pairwise separation example shown below, Hij indicates the hyperplane separating the examples of Ci from the examples of Cj. Please decide which class each region belongs to.
13. Given a perceptron as follows:
(1) (4%) What are the values of the weights if we use this perceptron to implement the AND gate?
(2) (4%) Why can't a perceptron learn the Boolean function XOR?
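As a study aid for question 13(1), here is one valid weight choice (an assumption, not the unique answer: any w1, w2 > 0 with max(w1, w2) < −w0 < w1 + w2 implements AND with a threshold at 0):

```python
def perceptron(x1, x2, w1, w2, w0):
    """Threshold unit: output 1 if w1*x1 + w2*x2 + w0 > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + w0 > 0 else 0

# One illustrative weight choice for AND: fires only when both inputs are 1.
AND = lambda x1, x2: perceptron(x1, x2, w1=1.0, w2=1.0, w0=-1.5)
print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 0, 0, 1]
```

No single line can separate XOR's outputs in the same way, which is why part (2) has no such weight choice.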
III. Calculation and Proof

1. (10%) Given a backpropagation neural network, where ___ and ___. If the learning factor is ___ and the error function is defined as ___, please find the weight update rules ___ and ___, where ___.
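The network's specifics (weights, learning factor, error function) are blanked in this copy. As one concrete study example, assume a single sigmoid unit y = sigmoid(w·x) with squared error E = ½(r − y)²; then ∂E/∂w_j = −(r − y)·y·(1 − y)·x_j, and the update rule is Δw_j = −η·∂E/∂w_j. The sketch below verifies that derivative against a central-difference numeric gradient (all values are illustrative assumptions):

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def analytic_grad(w, x, r):
    """dE/dw_j for y = sigmoid(w . x), E = 0.5 * (r - y)**2:
    dE/dw_j = -(r - y) * y * (1 - y) * x_j  (chain rule)."""
    y = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return [-(r - y) * y * (1 - y) * xj for xj in x]

def numeric_grad(w, x, r, eps=1e-6):
    """Central-difference estimate of dE/dw_j, for checking the chain rule."""
    def E(w):
        y = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        return 0.5 * (r - y) ** 2
    g = []
    for j in range(len(w)):
        wp = list(w); wp[j] += eps
        wm = list(w); wm[j] -= eps
        g.append((E(wp) - E(wm)) / (2 * eps))
    return g

w, x, r = [0.3, -0.2], [1.0, 2.0], 1.0
print(analytic_grad(w, x, r))
print(numeric_grad(w, x, r))   # should match the analytic gradient closely
```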
ANS: