CHE1147 Sample Questions

desertcockatoo, Data Management

20 Nov 2013




For the following dataset:

Table 1: Sample dataset

Obs.  Area   Diameter_mean  Density_min  Has_Particles
1     Small  High           High         No
2     Large  Low            Low          No
3     Small  High           High         No
4     Small  Low            High         No
5     Small  Low            High         No
6     Small  Low            Low          No
7     Large  Low            High         No
8     Small  High           High         No
9     Small  Low            Low          No
10    Large  Low            High         No
11    Large  High           Low          Yes
12    Large  High           High         Yes
13    Small  High           Low          Yes
14    Large  High           Low          Yes
15    Large  High           Low          Yes
16    Large  High           Low          Yes
17    Large  High           Low          Yes
18    Large  High           High         Yes
19    Small  Low            Low          Yes
20    Large  High           Low          Yes



Target attribute: Has_Particles

Descriptors: Area, Diameter_mean, Density_min

Source: Data Mining Research Group, UofT

Numeric precision: 0.0000



1. Calculate support and confidence for the following association rules. (2.5 Marks)

Rule                                                              Support    Confidence
(Area = Large) => (Has_Particles = Yes)
(Has_Particles = Yes) => (Area = Large)
(Area = Small) & (Diameter_mean = High) => (Has_Particles = No)
(Density_min = High) => (Has_Particles = No)
(Has_Particles = Yes)
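As a way to check hand calculations for Question 1, the sketch below counts matching records in Table 1. It assumes the common definitions: support is the fraction of all records that satisfy both sides of the rule, and confidence is that count divided by the number of records satisfying the left-hand side. Some textbooks define support differently, so adjust if the course notes do. The names `ROWS`, `matches`, and `support_confidence` are illustrative and not part of the original handout.

```python
# Table 1 encoded as (Area, Diameter_mean, Density_min, Has_Particles).
ROWS = [
    ("Small", "High", "High", "No"),   # 1
    ("Large", "Low",  "Low",  "No"),   # 2
    ("Small", "High", "High", "No"),   # 3
    ("Small", "Low",  "High", "No"),   # 4
    ("Small", "Low",  "High", "No"),   # 5
    ("Small", "Low",  "Low",  "No"),   # 6
    ("Large", "Low",  "High", "No"),   # 7
    ("Small", "High", "High", "No"),   # 8
    ("Small", "Low",  "Low",  "No"),   # 9
    ("Large", "Low",  "High", "No"),   # 10
    ("Large", "High", "Low",  "Yes"),  # 11
    ("Large", "High", "High", "Yes"),  # 12
    ("Small", "High", "Low",  "Yes"),  # 13
    ("Large", "High", "Low",  "Yes"),  # 14
    ("Large", "High", "Low",  "Yes"),  # 15
    ("Large", "High", "Low",  "Yes"),  # 16
    ("Large", "High", "Low",  "Yes"),  # 17
    ("Large", "High", "High", "Yes"),  # 18
    ("Small", "Low",  "Low",  "Yes"),  # 19
    ("Large", "High", "Low",  "Yes"),  # 20
]
COLS = ("Area", "Diameter_mean", "Density_min", "Has_Particles")
DATA = [dict(zip(COLS, row)) for row in ROWS]


def matches(record, conditions):
    """True if the record satisfies every attribute = value condition."""
    return all(record[attr] == value for attr, value in conditions.items())


def support_confidence(antecedent, consequent, data=DATA):
    """Return (support, confidence) for the rule antecedent => consequent.

    Assumed definitions: support = |both sides| / |all records|,
    confidence = |both sides| / |antecedent|.
    """
    n_ante = sum(matches(r, antecedent) for r in data)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in data)
    confidence = n_both / n_ante if n_ante else float("nan")
    return n_both / len(data), confidence


s, c = support_confidence({"Area": "Large"}, {"Has_Particles": "Yes"})
print(f"support = {s:.4f}, confidence = {c:.4f}")
```

The other rules can be checked the same way by passing different condition dictionaries, for example `{"Area": "Small", "Diameter_mean": "High"}` as the antecedent.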











2. Fill in the following tables where you see “……..” and determine the best rule set using the OneR algorithm. (2.5 Marks)




                   Has_Particles
Area               Yes    No     # of errors
Large              8      3      ……..
Small              2      7      ……..












                   Has_Particles
Diameter_mean      Yes    No     # of errors
Low                1      7      ……..
High               9      3      ……..












                   Has_Particles
Density_min        Yes    No     # of errors
Low                8      3      ……..
High               2      7      ……..











Write the best OneR rule set below:
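For checking a hand-worked answer, the OneR procedure can be sketched from the Yes/No counts alone: for each attribute value predict the majority class, count the minority records as errors, and keep the attribute with the fewest total errors. The names `COUNTS` and `one_r` are illustrative, and the tie-breaking choices (predict Yes on a tie, keep the first attribute on equal error totals) are assumptions rather than something the question specifies.

```python
# (Yes, No) counts per attribute value, copied from the three tables in Question 2.
COUNTS = {
    "Area":          {"Large": (8, 3), "Small": (2, 7)},
    "Diameter_mean": {"Low":   (1, 7), "High":  (9, 3)},
    "Density_min":   {"Low":   (8, 3), "High":  (2, 7)},
}


def one_r(counts):
    """Return (attribute, rules, total_errors) for the lowest-error attribute."""
    best = None
    for attr, table in counts.items():
        rules, errors = {}, 0
        for value, (yes, no) in table.items():
            rules[value] = "Yes" if yes >= no else "No"  # majority class
            errors += min(yes, no)                       # minority records are errors
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


attr, rules, errors = one_r(COUNTS)
for value, label in rules.items():
    print(f"IF {attr} = {value} THEN Has_Particles = {label}")
print(f"total errors: {errors} of 20")
```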



3. Calculate the probability of (Has_Particles = No | Area = Large) and (Has_Particles = Yes | Area = Large) using Bayes’ theorem with the information below by filling in the spaces marked “……….”. (7 Marks)







$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

3.1

                   Yes    No     Total
Has_Particles      10     10     20

P(Has_Particles = No) = ……….

P(Has_Particles = Yes) = ……….




3.2

                   Large  Small  Total
Area               11     9      20

P(Area = Large) = ……….




3.3

                   Has_Particles
Area               Yes    No
Large              8      3
Small              2      7
Total              10     10     20

P(Area = Large | Has_Particles = No) = ……….

P(Area = Large | Has_Particles = Yes) = ……….




3.4

P(Has_Particles = No | Area = Large) = ……….

P(Has_Particles = Yes | Area = Large) = ……….
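The chain of steps 3.1 to 3.4 can be sketched in a few lines as a check on the arithmetic: priors from the class totals, the evidence from the Area totals, likelihoods from the cross-table, then Bayes' theorem. The function name `posterior` and the variable names are illustrative only.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(C | X) = P(X | C) * P(C) / P(X)."""
    return likelihood * prior / evidence


n = 20
p_no, p_yes = 10 / n, 10 / n     # 3.1: class priors
p_large = 11 / n                 # 3.2: evidence P(Area = Large)
p_large_given_no = 3 / 10        # 3.3: likelihoods from the cross-table
p_large_given_yes = 8 / 10

# 3.4: posteriors
p_no_given_large = posterior(p_large_given_no, p_no, p_large)
p_yes_given_large = posterior(p_large_given_yes, p_yes, p_large)

print(f"P(No  | Large) = {p_no_given_large:.4f}")
print(f"P(Yes | Large) = {p_yes_given_large:.4f}")
```

A quick sanity check is that the two posteriors sum to 1, since Yes and No are the only classes.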



4. Find the root node for a classification tree using the information below. Fill in the spaces marked “...”. (13 Marks)







4.1

                   Yes    No     Total
Has_Particles      10     10     20

                   Yes    No     Total
Entropy            ...    ...    ...

$$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

$$Gain(S, A) = E(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\, E(S_v)$$
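The entropy of a class distribution, E(S) = -Σ p_i log2 p_i, can be sketched as a small helper that takes raw class counts. The function name `entropy` is illustrative; empty classes are skipped on the usual convention that 0 · log2(0) is treated as 0.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing (0 * log2(0) is taken as 0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


# Has_Particles has 10 Yes and 10 No records.
print(f"E(S) = {entropy([10, 10]):.4f}")
```

The same helper applies to any row of the frequency tables in 4.2 to 4.4, for example `entropy([8, 3])` for Area = Large.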









4.2

Frequency

                   Has_Particles
Area               Yes    No     Total
Large              8      3      11
Small              2      7      9
                                 20

Entropy

                   Has_Particles
Area               Yes    No     Total
Large              ...    ...    ...
Small              ...    ...    ...

Gain (Has_Particles, Area) = ...



4.3

Frequency

                   Has_Particles
Diameter_mean      Yes    No     Total
Low                1      7      8
High               9      3      12
                                 20

Entropy

                   Has_Particles
Diameter_mean      Yes    No     Total
Low                ...    ...    ...
High               ...    ...    ...

Gain (Has_Particles, Diameter_mean) = ...



4.4

Frequency

                   Has_Particles
Density_min        Yes    No     Total
Low                8      3      11
High               2      7      9
                                 20

Entropy

                   Has_Particles
Density_min        Yes    No     Total
Low                ...    ...    ...
High               ...    ...    ...

Gain (Has_Particles, Density_min) = ...




4.5

The root node is ... .
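To check the gains in 4.2 to 4.4 and the choice of root node, the sketch below computes Gain(S, A) = E(S) - Σ (|S_v| / |S|) E(S_v) from the frequency tables and picks the attribute with the largest gain. The names `TABLES`, `entropy`, and `gain` are illustrative, and choosing the root by maximum information gain is the ID3-style criterion assumed here.

```python
from math import log2

# (Yes, No) counts per attribute value, from the frequency tables in 4.2 to 4.4.
TABLES = {
    "Area":          {"Large": (8, 3), "Small": (2, 7)},
    "Diameter_mean": {"Low":   (1, 7), "High":  (9, 3)},
    "Density_min":   {"Low":   (8, 3), "High":  (2, 7)},
}


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gain(table):
    """Information gain of splitting on an attribute with the given counts."""
    n = sum(yes + no for yes, no in table.values())
    total_yes = sum(yes for yes, _ in table.values())
    total_no = sum(no for _, no in table.values())
    parent = entropy([total_yes, total_no])
    # Weighted average entropy of the subsets created by the split.
    remainder = sum((yes + no) / n * entropy([yes, no]) for yes, no in table.values())
    return parent - remainder


gains = {attr: gain(table) for attr, table in TABLES.items()}
for attr, g in gains.items():
    print(f"Gain(Has_Particles, {attr}) = {g:.4f}")
print("root node:", max(gains, key=gains.get))
```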












5. Find the nearest Y for the case (X1=3, X2=5, X3=1) using the 1-Nearest Neighbor method (use Euclidean distance with normalization). (5 Marks)

Y    X1   X2   X3
2    2    5    1
1    2    4    0
1    1    5    0
1    1    3    1
5    3    6    1
4    4    4    0
7    5    6    0
6    5    4    1
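A minimal sketch of the 1-Nearest Neighbor lookup for Question 5 follows. The question does not say which normalization to use, so this assumes min-max scaling of each X column to [0, 1] based on the eight training rows; a different scheme (for example z-scores) would change the distances. The names `TRAIN`, `QUERY`, and `nearest_neighbor` are illustrative.

```python
from math import sqrt

# Training rows as (Y, X1, X2, X3), copied from the table in Question 5.
TRAIN = [
    (2, 2, 5, 1), (1, 2, 4, 0), (1, 1, 5, 0), (1, 1, 3, 1),
    (5, 3, 6, 1), (4, 4, 4, 0), (7, 5, 6, 0), (6, 5, 4, 1),
]
QUERY = (3, 5, 1)


def nearest_neighbor(train, query):
    """Return (Y, distance) of the closest training row after min-max scaling."""
    features = [row[1:] for row in train]
    lows = [min(col) for col in zip(*features)]
    highs = [max(col) for col in zip(*features)]

    def scale(x):
        # Assumed normalization: (value - min) / (max - min) per column.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(x, lows, highs)]

    q = scale(query)
    best_y, best_d = None, float("inf")
    for row in train:
        d = sqrt(sum((a - b) ** 2 for a, b in zip(scale(row[1:]), q)))
        if d < best_d:
            best_y, best_d = row[0], d
    return best_y, best_d


y, d = nearest_neighbor(TRAIN, QUERY)
print(f"nearest Y = {y}, normalized distance = {d:.4f}")
```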