Data Mining and Knowledge Discovery (KSE525)


Assignment #3 (April 19, 2012)


1. [10 points] Build the decision tree for the following relational table. The last attribute is the class label. Use the information gain for attribute selection. Let's assume that a multi-way split is always the best. You need to explain in detail how you calculated the information gain.
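
For reference, the information gain asked for in Question 1 is usually defined as follows. The notation is an assumption (it follows common textbook usage, not necessarily the lecture slides): D is the set of tuples, p_i is the fraction of tuples in D that belong to class i, and attribute A splits D into partitions D_1, ..., D_v, one per distinct value.

```latex
\[
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i
\]
\[
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)
\]
\[
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)
\]
```

Under these definitions, the attribute with the largest Gain(A) is chosen at each node, and the computation is repeated on each partition.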




















2. [6 points] Classification can be used for automatic speech recognition, which is one of the main features of Apple Siri. Discuss what the class label is in this type of application. Then, briefly explain what classification techniques can be used for developing the application.


3. [6 points] Discuss the advantages and disadvantages of lazy classification (e.g., k-nearest neighbor classification) in comparison with eager classification.
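
To make the term "lazy" in Question 3 concrete, here is a minimal sketch of k-nearest neighbor classification. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical and only serve as an illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs. Nothing is
    learned in advance: the stored tuples are scanned at prediction time.
    """
    neighbors = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
training = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.3), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]

if __name__ == "__main__":
    print(knn_predict(training, (1.1, 1.0)))
```

Note that the sketch has no training function at all. An eager classifier such as a decision tree would instead build its model once, before any query is seen.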


4. [8 points] A notable problem of the information gain is that it prefers attributes with a large number of distinct values. Explain why the information gain suffers from this problem and why the gain ratio or Gini index does not.
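
For reference, the two alternative measures named in Question 4 are commonly defined as below. The notation is an assumption based on common textbook usage: Gain(A) is the information gain of attribute A, D_1, ..., D_v are the partitions of D induced by A, and p_i is the fraction of tuples in D that belong to class i. The Gini index is shown for a binary split into D_1 and D_2, as used in CART.

```latex
\[
\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}
\qquad
\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}
\]
\[
\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^{2}
\qquad
\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)
\]
```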


5. [20 points] Download and install Weka (explained in class). Then, build the decision tree using J48 (C4.5) for the Wine data set in the UCI machine learning repository. Notice that you need to modify the format of the original data file as required by Weka. Copy and paste the text representation of the decision tree.
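
One possible way to do the format conversion in Question 5 is sketched below. It assumes the original file is comma-separated with the class label in the first column followed by 13 numeric attributes, and it writes Weka's ARFF format with the class moved to the last column and declared as nominal. The function name `wine_csv_to_arff`, the attribute names, and the file names in the comments are illustrative assumptions; check them against the data set's description file.

```python
import csv
import io

# Assumed attribute names (underscored to avoid quoting in ARFF).
WINE_ATTRIBUTES = [
    "alcohol", "malic_acid", "ash", "alcalinity_of_ash", "magnesium",
    "total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins",
    "color_intensity", "hue", "od280_od315", "proline",
]


def wine_csv_to_arff(csv_text, relation="wine"):
    """Convert comma-separated Wine data (class label first) to ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    classes = sorted({row[0] for row in rows})

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in WINE_ATTRIBUTES]
    # Declare the class as nominal so that it can be used as a class label.
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    # Move the class label to the last column of every data row.
    lines += [",".join(row[1:] + [row[0]]) for row in rows]
    return "\n".join(lines) + "\n"


# Example usage (file names are hypothetical):
#   with open("wine.data") as src, open("wine.arff", "w") as dst:
#       dst.write(wine_csv_to_arff(src.read()))
```

The resulting file can then be opened in the Weka Explorer, where the last attribute is treated as the class by default.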

Table for Question 1:

ID code  Outlook   Temperature  Humidity  Windy  Play
a        Sunny     Hot          High      False  No
b        Sunny     Hot          High      True   No
c        Overcast  Hot          High      False  Yes
d        Rainy     Mild         High      False  Yes
e        Rainy     Cool         Normal    False  Yes
f        Rainy     Cool         Normal    True   No
g        Overcast  Cool         Normal    True   Yes
h        Sunny     Mild         High      False  No
i        Sunny     Cool         Normal    False  Yes
j        Rainy     Mild         Normal    False  Yes
k        Sunny     Mild         Normal    True   Yes
l        Overcast  Mild         High      True   Yes
m        Overcast  Hot          Normal    False  Yes
n        Rainy     Mild         High      True   No
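
The information gain calculation for the table of Question 1 can be cross-checked with a short script. The following is a minimal sketch, assuming entropy in bits and a multi-way split on every distinct attribute value; the names `entropy` and `information_gain` are hypothetical, and the rows are transcribed from the table. It is meant for checking hand calculations, not as a substitute for the detailed explanation the question asks for.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["ID code", "Outlook", "Temperature", "Humidity", "Windy"]

# Each row: (ID code, Outlook, Temperature, Humidity, Windy, Play).
ROWS = [
    ("a", "Sunny", "Hot", "High", "False", "No"),
    ("b", "Sunny", "Hot", "High", "True", "No"),
    ("c", "Overcast", "Hot", "High", "False", "Yes"),
    ("d", "Rainy", "Mild", "High", "False", "Yes"),
    ("e", "Rainy", "Cool", "Normal", "False", "Yes"),
    ("f", "Rainy", "Cool", "Normal", "True", "No"),
    ("g", "Overcast", "Cool", "Normal", "True", "Yes"),
    ("h", "Sunny", "Mild", "High", "False", "No"),
    ("i", "Sunny", "Cool", "Normal", "False", "Yes"),
    ("j", "Rainy", "Mild", "Normal", "False", "Yes"),
    ("k", "Sunny", "Mild", "Normal", "True", "Yes"),
    ("l", "Overcast", "Mild", "High", "True", "Yes"),
    ("m", "Overcast", "Hot", "Normal", "False", "Yes"),
    ("n", "Rainy", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Class entropy minus the weighted entropy after a multi-way split."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        # Group the class labels by the value of the chosen attribute.
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    for index, name in enumerate(ATTRIBUTES):
        print(f"{name}: {information_gain(ROWS, index):.3f}")
```

To continue down the tree, the same function can be applied to the subset of rows that falls into each branch of the chosen split.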