# CHE1147 Sample Questions

Data Management

20 Nov 2013



For the following dataset:

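Table 1: Sample dataset

| Obs. | Area | Diameter_mean | Density_min | Has_Particles |
|---|---|---|---|---|
| 1 | Small | High | High | No |
| 2 | Large | Low | Low | No |
| 3 | Small | High | High | No |
| 4 | Small | Low | High | No |
| 5 | Small | Low | High | No |
| 6 | Small | Low | Low | No |
| 7 | Large | Low | High | No |
| 8 | Small | High | High | No |
| 9 | Small | Low | Low | No |
| 10 | Large | Low | High | No |
| 11 | Large | High | Low | Yes |
| 12 | Large | High | High | Yes |
| 13 | Small | High | Low | Yes |
| 14 | Large | High | Low | Yes |
| 15 | Large | High | Low | Yes |
| 16 | Large | High | Low | Yes |
| 17 | Large | High | Low | Yes |
| 18 | Large | High | High | Yes |
| 19 | Small | Low | Low | Yes |
| 20 | Large | High | Low | Yes |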

Target attribute: Has_Particles

Descriptors: Area, Diameter_mean, Density_min

Source: Data Mining Research Group, UofT

Numeric precision: 0.0000

1. Calculate support and confidence for the following association rules. (2.5 Marks)

| Rule | Support | Confidence |
|---|---|---|
| (Area = Large) => (Has_Particles = Yes) | | |
| (Has_Particles = Yes) => (Area = Large) | | |
| (Area = Small) & (Diameter_mean = High) => (Has_Particles = No) | | |
| (Density_min = High) => (Has_Particles = No) | | |
| (Has_Particles = Yes) | | |
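The following is a minimal Python sketch for checking Question 1 against Table 1. It assumes the usual definitions:

- support(X => Y) is the fraction of all 20 observations that contain both X and Y.
- confidence(X => Y) is the number of observations containing X and Y divided by the number containing X.
- For the single itemset (Has_Particles = Yes), only support applies.

The helper names `matches`, `count`, `support`, and `confidence` are hypothetical.

```python
# Table 1 as (Area, Diameter_mean, Density_min, Has_Particles) tuples, Obs. 1-20 in order
COLUMNS = ("Area", "Diameter_mean", "Density_min", "Has_Particles")
ROWS = [
    ("Small", "High", "High", "No"),   # 1
    ("Large", "Low", "Low", "No"),     # 2
    ("Small", "High", "High", "No"),   # 3
    ("Small", "Low", "High", "No"),    # 4
    ("Small", "Low", "High", "No"),    # 5
    ("Small", "Low", "Low", "No"),     # 6
    ("Large", "Low", "High", "No"),    # 7
    ("Small", "High", "High", "No"),   # 8
    ("Small", "Low", "Low", "No"),     # 9
    ("Large", "Low", "High", "No"),    # 10
    ("Large", "High", "Low", "Yes"),   # 11
    ("Large", "High", "High", "Yes"),  # 12
    ("Small", "High", "Low", "Yes"),   # 13
    ("Large", "High", "Low", "Yes"),   # 14
    ("Large", "High", "Low", "Yes"),   # 15
    ("Large", "High", "Low", "Yes"),   # 16
    ("Large", "High", "Low", "Yes"),   # 17
    ("Large", "High", "High", "Yes"),  # 18
    ("Small", "Low", "Low", "Yes"),    # 19
    ("Large", "High", "Low", "Yes"),   # 20
]


def matches(row, conditions):
    """True if the row satisfies every attribute = value pair in `conditions`."""
    return all(row[COLUMNS.index(attr)] == value for attr, value in conditions.items())


def count(conditions):
    """Number of observations that satisfy all the given conditions."""
    return sum(matches(row, conditions) for row in ROWS)


def support(antecedent, consequent=None):
    """Fraction of all observations containing the antecedent and consequent together."""
    return count({**antecedent, **(consequent or {})}) / len(ROWS)


def confidence(antecedent, consequent):
    """Share of the observations matching the antecedent that also match the consequent."""
    return count({**antecedent, **consequent}) / count(antecedent)


RULES = [
    ({"Area": "Large"}, {"Has_Particles": "Yes"}),
    ({"Has_Particles": "Yes"}, {"Area": "Large"}),
    ({"Area": "Small", "Diameter_mean": "High"}, {"Has_Particles": "No"}),
    ({"Density_min": "High"}, {"Has_Particles": "No"}),
]

for lhs, rhs in RULES:
    print(f"{lhs} => {rhs}: support = {support(lhs, rhs):.4f}, "
          f"confidence = {confidence(lhs, rhs):.4f}")

yes = {"Has_Particles": "Yes"}
print(f"{yes}: support = {support(yes):.4f}")
```

Confidence is computed from raw counts rather than as a ratio of two rounded supports. This keeps the printed values exact to the requested four decimals.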


2. Fill in the following tables where you see “……” and determine the best rule set using the OneR algorithm. (2.5 Marks)

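| Area | Has_Particles = Yes | Has_Particles = No | # of errors |
|---|---|---|---|
| Large | 8 | 3 | …… |
| Small | 2 | 7 | …… |

| Diameter_mean | Has_Particles = Yes | Has_Particles = No | # of errors |
|---|---|---|---|
| Low | 1 | 7 | …… |
| High | 9 | 3 | …… |

| Density_min | Has_Particles = Yes | Has_Particles = No | # of errors |
|---|---|---|---|
| Low | 8 | 3 | …… |
| High | 2 | 7 | …… |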

Write the best OneR rule set below:
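OneR works one attribute at a time:

1. For each value of the attribute, make a rule that predicts the majority class for that value.
2. Count that value's errors as the minority count.
3. Keep the attribute whose rules make the fewest errors in total.

The Python sketch below works directly from the counts in the three tables above. The names `COUNTS` and `rules_for` are hypothetical. Two tie-breaking choices are assumptions: predicting "Yes" when a value's counts are equal, and keeping the first attribute in table order when totals are equal.

```python
# Class counts from the Question 2 tables, as {value: (yes, no)} per attribute
COUNTS = {
    "Area": {"Large": (8, 3), "Small": (2, 7)},
    "Diameter_mean": {"Low": (1, 7), "High": (9, 3)},
    "Density_min": {"Low": (8, 3), "High": (2, 7)},
}


def rules_for(table):
    """Return (rule set, total errors) for one attribute's {value: (yes, no)} table."""
    # Each value predicts its majority class (ties go to "Yes", an assumption)
    rules = {value: ("Yes" if yes >= no else "No") for value, (yes, no) in table.items()}
    # The minority count is the number of misclassified observations for that value
    errors = sum(min(yes, no) for yes, no in table.values())
    return rules, errors


for attr, table in COUNTS.items():
    rules, errors = rules_for(table)
    print(f"{attr}: {rules}, errors = {errors}")

# min() keeps the first attribute encountered when totals tie
best = min(COUNTS, key=lambda attr: rules_for(COUNTS[attr])[1])
print("Best OneR rule set:", best, rules_for(COUNTS[best])[0])
```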

3. Using Bayes’ theorem and the information below, calculate P(Has_Particles = No | Area = Large) and P(Has_Particles = Yes | Area = Large) by filling in the spaces marked “……”. (7 Marks)

3.1

| | Yes | No | Total |
|---|---|---|---|
| Has_Particles | 10 | 10 | 20 |

P(Has_Particles = No) = ……

P(Has_Particles = Yes) = ……

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$
3.2

| | Large | Small | Total |
|---|---|---|---|
| Area | 11 | 9 | 20 |

P(Area = Large) = ……

3.3

| Area | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Large | 8 | 3 | |
| Small | 2 | 7 | |
| Total | 10 | 10 | 20 |

P(Area = Large | Has_Particles = No) = ……

P(Area = Large | Has_Particles = Yes) = ……

3.4

P(Has_Particles = No | Area = Large) = ……

P(Has_Particles = Yes | Area = Large) = ……
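Steps 3.1 to 3.3 supply the three ingredients of Bayes’ theorem:

- the prior P(C),
- the likelihood P(Area = Large | C),
- the evidence P(Area = Large).

Step 3.4 combines them. Here is a short Python sketch using only the counts given in those tables; the constant and function names are hypothetical.

```python
N = 20                                 # total observations
CLASS_COUNTS = {"Yes": 10, "No": 10}   # 3.1: Has_Particles counts
N_LARGE = 11                           # 3.2: Area = Large count
LARGE_COUNTS = {"Yes": 8, "No": 3}     # 3.3: Area = Large within each class


def p_class_given_large(cls):
    """P(Has_Particles = cls | Area = Large) via Bayes' theorem."""
    prior = CLASS_COUNTS[cls] / N                          # P(C)
    likelihood = LARGE_COUNTS[cls] / CLASS_COUNTS[cls]     # P(Area = Large | C)
    evidence = N_LARGE / N                                 # P(Area = Large)
    return likelihood * prior / evidence


for cls in ("No", "Yes"):
    print(f"P(Has_Particles = {cls} | Area = Large) = {p_class_given_large(cls):.4f}")
```

As a sanity check, the two posteriors should sum to 1. They reduce to 3/11 and 8/11, the same values as reading the Area = Large row of 3.3 directly.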

4. Find the root node for a classification tree using the information below. Fill in the spaces marked “……”. (13 Marks)

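4.1

| | Yes | No | Total |
|---|---|---|---|
| Has_Particles | 10 | 10 | 20 |

| | Yes | No | Total |
|---|---|---|---|
| Entropy | …… | …… | …… |

$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

$$\mathrm{Gain}(S, A) = E(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\, E(S_v)$$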

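4.2

Frequency

| Area | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Large | 8 | 3 | 11 |
| Small | 2 | 7 | 9 |
| Total | | | 20 |

Entropy

| Area | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Large | …… | …… | …… |
| Small | …… | …… | …… |

Gain(Has_Particles, Area) = ……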

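4.3

Frequency

| Diameter_mean | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Low | 1 | 7 | 8 |
| High | 9 | 3 | 12 |
| Total | | | 20 |

Entropy

| Diameter_mean | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Low | …… | …… | …… |
| High | …… | …… | …… |

Gain(Has_Particles, Diameter_mean) = ……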

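4.4

Frequency

| Density_min | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Low | 8 | 3 | 11 |
| High | 2 | 7 | 9 |
| Total | | | 20 |

Entropy

| Density_min | Has_Particles = Yes | Has_Particles = No | Total |
|---|---|---|---|
| Low | …… | …… | …… |
| High | …… | …… | …… |

Gain(Has_Particles, Density_min) = ……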

4.5

The root node is …… .
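The two formulas in 4.1 can be checked numerically with a short Python sketch: entropy with a base-2 logarithm, and gain as parent entropy minus the size-weighted entropy of each branch. It takes the counts from the frequency tables in 4.2 to 4.4 and assumes the usual ID3 criterion, which picks the attribute with the largest gain as the root. `entropy`, `gain`, and `SPLITS` are hypothetical names.

```python
from math import log2


def entropy(*counts):
    """E(S) = -sum_i p_i log2 p_i, skipping empty classes (0 log 0 is taken as 0)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain(parent, splits):
    """Gain(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v); splits holds one (yes, no) per value v."""
    n = sum(parent)
    return entropy(*parent) - sum(sum(s) / n * entropy(*s) for s in splits)


PARENT = (10, 10)  # Has_Particles: Yes, No
SPLITS = {
    "Area": [(8, 3), (2, 7)],            # Large, Small
    "Diameter_mean": [(1, 7), (9, 3)],   # Low, High
    "Density_min": [(8, 3), (2, 7)],     # Low, High
}

gains = {attr: gain(PARENT, splits) for attr, splits in SPLITS.items()}
for attr, g in gains.items():
    print(f"Gain(Has_Particles, {attr}) = {g:.4f}")

# ID3 chooses the attribute with the highest information gain as the root
root = max(gains, key=gains.get)
print("Root node:", root)
```

Area and Density_min have identical frequency tables here, so they necessarily produce the same gain.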

5. Find Y for the case (X1 = 3, X2 = 5, X3 = 1), i.e. the Y of its nearest neighbor, using the 1-Nearest Neighbor method (use Euclidean distance with normalization). (5 Marks)

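| Y | X1 | X2 | X3 |
|---|---|---|---|
| 2 | 2 | 5 | 1 |
| 1 | 2 | 4 | 0 |
| 1 | 1 | 5 | 0 |
| 1 | 1 | 3 | 1 |
| 5 | 3 | 6 | 1 |
| 4 | 4 | 4 | 0 |
| 7 | 5 | 6 | 0 |
| 6 | 5 | 4 | 1 |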

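The question does not say which normalization to use. The Python sketch below assumes min-max scaling of each X column to [0, 1], using the minimum and maximum of the eight training rows; the query values fall inside those ranges. It then returns the Y of the single closest row by Euclidean distance. `math.dist` requires Python 3.8 or later, and `min_max`, `normalize`, and `nearest_y` are hypothetical names.

```python
from math import dist

# Training rows from the Question 5 table, as (Y, X1, X2, X3)
ROWS = [
    (2, 2, 5, 1),
    (1, 2, 4, 0),
    (1, 1, 5, 0),
    (1, 1, 3, 1),
    (5, 3, 6, 1),
    (4, 4, 4, 0),
    (7, 5, 6, 0),
    (6, 5, 4, 1),
]
QUERY = (3, 5, 1)  # (X1, X2, X3)


def min_max(features):
    """Per-column (minimum, range) over the training feature vectors."""
    return [(min(col), max(col) - min(col)) for col in zip(*features)]


def normalize(x, stats):
    """Min-max scale one feature vector; a constant column maps to 0."""
    return tuple((v - lo) / span if span else 0.0 for v, (lo, span) in zip(x, stats))


def nearest_y(rows, query):
    """Return (Y, distance) of the 1-nearest neighbour in normalized space."""
    stats = min_max([row[1:] for row in rows])
    q = normalize(query, stats)
    best = min(rows, key=lambda row: dist(normalize(row[1:], stats), q))
    return best[0], dist(normalize(best[1:], stats), q)


y, d = nearest_y(ROWS, QUERY)
print(f"Nearest Y = {y} (normalized distance {d:.4f})")
```

Without normalization, the first and fifth rows are both at distance 1 from the query, so the scaling step is what decides the answer here.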