Data Mining Homework 5 (47 Points)
1. (8 points) Chapter 5, exercise 13a on page 320.
2. (10 points) Chapter 6, exercises 2a-d, on page 404.
3. (19 points) A database has 4 transactions, shown below.

   TID     Date       items_bought
   T100    10/15/04   {K, A, D, B}
   T200    10/15/04   {D, A, C, E, B}
   T300    10/19/04   {C, A, B, E}
   T400    10/22/04   {B, A, D}

Assuming a minimum level of support min_sup = 60% and a minimum level of confidence min_conf = 80%:
(a) Find all frequent itemsets (not just the ones with the maximum width/length) using the Apriori algorithm. Show your work; just showing the final answer is not acceptable. For each iteration show the candidate and the accepted frequent itemsets. You should show your work similar to the way the example was done in the PowerPoint slides. (15 points)
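For checking a hand-worked answer, the level-wise Apriori search can be sketched in Python. This is only a verification aid, not a substitute for showing your work; the function and variable names here are my own, not from the textbook or slides.

```python
from itertools import combinations

# The four transactions from the problem statement.
transactions = [
    {"K", "A", "D", "B"},       # T100
    {"D", "A", "C", "E", "B"},  # T200
    {"C", "A", "B", "E"},       # T300
    {"B", "A", "D"},            # T400
]
min_sup = 0.6  # minimum support from the problem statement

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori(transactions, min_sup):
    """Return a dict mapping every frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    # L1: frequent 1-itemsets
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_sup]
    frequent = {s: support(s, transactions) for s in level}
    k = 2
    while level:
        # Candidate generation: join L(k-1) with itself, keep size-k unions.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count support and keep the candidates that clear min_sup.
        level = [c for c in candidates
                 if support(c, transactions) >= min_sup]
        for c in level:
            frequent[c] = support(c, transactions)
        k += 1
    return frequent

freq = apriori(transactions, min_sup)
for s in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), f"support = {freq[s]:.2f}")
```

With min_sup = 60% of 4 transactions, an itemset must appear in at least 3 of them; the printout lets you compare each level against your hand-derived candidate and frequent sets.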
(b) List all of the strong association rules, along with their support and confidence values, which match the following metarule, where X is a variable representing customers and item_i denotes variables representing items (e.g., "A", "B", etc.):

   ∀x ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3)

Hint: don't worry about the fact that the statement above uses relations. The point of the metarule is to tell you to only worry about association rules of the form X ∧ Y ⇒ Z (or {X, Y} ⇒ Z, if you prefer that notation). That is, you don't need to worry about rules of the form X ⇒ Z. (4 points)
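To double-check rules found by hand, a small brute-force sketch over the metarule's shape (two items in the antecedent implying a third) could look like this, using the thresholds from the problem statement; the helper names are my own.

```python
from itertools import combinations

# The four transactions from the problem statement.
transactions = [
    {"K", "A", "D", "B"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E"},
    {"B", "A", "D"},
]
n = len(transactions)
min_sup, min_conf = 0.6, 0.8  # thresholds from the problem statement

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

items = sorted(set().union(*transactions))
rules = []
for triple in combinations(items, 3):
    sup_xyz = count(triple) / n
    if sup_xyz < min_sup:        # rule support = support of {X, Y, Z}
        continue
    for z in triple:             # try each item as the consequent
        xy = tuple(i for i in triple if i != z)
        conf = count(triple) / count(xy)
        if conf >= min_conf:
            rules.append((xy, z, sup_xyz, conf))

for xy, z, sup, conf in rules:
    print(f"{set(xy)} => {z}  support={sup:.0%}  confidence={conf:.0%}")
```

Note that a rule {X, Y} ⇒ Z can only be strong if {X, Y, Z} is itself frequent, which is why the sketch filters on the triple's support before computing confidence.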
4. (10 points) Here are several short questions on clustering.
   a. List one significant commonality between clustering algorithms and instance-based learning algorithms like nearest-neighbor.
   b. A decision tree can be used to generate a partitional clustering. How?
   c. Will outliers have a big impact on the K-means algorithm? Why or why not?
   d. Will outliers have a big impact on the DBSCAN algorithm?
   e. We had talked about Manhattan distances earlier in the course. If this is used in a clustering algorithm, what shape will the clusters take on?
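As a quick numeric illustration relevant to question (c), using made-up one-dimensional data (not from the assignment): K-means places each centroid at the mean of its cluster's points, so a single extreme value can drag a centroid far from the rest of the cluster.

```python
# A tight cluster of made-up points (hypothetical data for illustration only).
points = [1.0, 2.0, 3.0]
centroid = sum(points) / len(points)
print(centroid)        # the mean sits in the middle of the cluster: 2.0

# Add one distant outlier to the same cluster.
points_with_outlier = points + [100.0]
centroid2 = sum(points_with_outlier) / len(points_with_outlier)
print(centroid2)       # the mean is pulled toward the outlier: 26.5
```

The arithmetic mean minimizes squared error, which is exactly the objective K-means optimizes; that is the mechanism behind whatever answer you give for (c).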