Behemoth - Yrd. Doç. Dr. Mete EMİNAĞAOĞLU

STAT480 Data Mining for Statistics


Lecturer: Assist. Prof. Dr. Mete Eminağaoğlu

Quiz No. 3

Date: 31st May, 2013

Duration: 60 minutes

Name & Surname: __________________________________________

Signature: __________________________________________


General Instructions:

Open your web browser and go to http://meminagaoglu.yasar.edu.tr/

Then select the menu “STAT480”, and then select the sub-menu “Quiz 3”. To open this page, you must enter the correct password. Password: Behemoth

Q1. (30 points)

Click on Question1 and download new-sales.zip (zipped .arff file).

You will only use these two regression algorithms in Weka: Linear Regression, Simple Linear Regression.

1.1. (15 points)

For each of these two algorithms: first train the model with “new-sales-train.arff”, then test the model with “new-sales-test.arff”.

Analyze the train and test performance results that you have obtained from both regression algorithms.

Which algorithm is better (more accurate / reliable)? _____________________________________

Why? (You must answer by giving the necessary and relevant results that you have found in Weka.)









1.2. (15 points)

Write down the regression equation that you have found for the best algorithm in part 1.1.

Using this equation, calculate the total-no-sales prediction for:

others-price = 9.5
our-price = 10
our-cost = 9.18
inflation-rate = 3.47
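
As an illustration of how such an equation is evaluated, the minimal Python sketch below plugs the four given attribute values into a linear equation. The coefficients and the intercept are placeholders chosen only for illustration (they are assumptions, not Weka output) and would be replaced by the numbers Weka reports for the chosen model. The function name `predict_total_sales` is hypothetical.

```python
# Attribute values given in part 1.2.
inputs = {
    "others-price": 9.5,
    "our-price": 10.0,
    "our-cost": 9.18,
    "inflation-rate": 3.47,
}

# PLACEHOLDER coefficients and intercept: assumptions for illustration only.
# Replace them with the numbers from Weka's regression output.
coefficients = {
    "others-price": 1.0,
    "our-price": 1.0,
    "our-cost": 1.0,
    "inflation-rate": 1.0,
}
intercept = 0.0


def predict_total_sales(values, coefs, bias):
    """Evaluate bias + sum(coef * value) over the attributes listed in coefs."""
    return bias + sum(coefs[name] * values[name] for name in coefs)


prediction = predict_total_sales(inputs, coefficients, intercept)
print(f"total-no-sales = {prediction:.2f}")
```

If the chosen model is Simple Linear Regression, the equation has a single attribute, so `coefficients` would contain only that one entry.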



_____________________________________________________________________

Q2. (20 points)

outlook   temperature   humidity   day-of-week   play
rainy     mild          high       Saturday      no
sunny     hot           high       other         no
sunny     hot           high       Sunday        no
rainy     mild          high       other         no
rainy     cool          normal     Sunday        no
rainy     cool          normal     Saturday      no
sunny     mild          low        other         yes
sunny     hot           normal     other         yes
sunny     mild          high       Saturday      yes
sunny     hot           normal     Sunday        yes
rainy     mild          normal     Sunday        yes
sunny     mild          low        Saturday      yes


Suppose that you use the Tertius algorithm for the above data set and you find the rule denoted as below:

IF (humidity=normal) AND (play=yes) THEN (temperature=hot) OR (day-of-week=Sunday)

According to the Tertius algorithm, find the values that could be obtained by this rule. You must show all the necessary calculations.

Expected = ?

Observed = ?

Confirmation = ?

True Positive rate = ?

False Positive rate = ?
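
The measures above are all built from counts of which instances satisfy the body and the head of the rule. The Python sketch below only tallies those counts for the twelve instances in the table. It also derives the observed and expected values under one assumption: that both refer to the relative frequency of counter-instances (body true, head false), with the expected value computed under independence of body and head. That convention is an assumption here; the course definitions take precedence. The confirmation and the TP/FP rates are deliberately not computed.

```python
# The twelve instances from the Q2 table:
# (outlook, temperature, humidity, day-of-week, play)
rows = [
    ("rainy", "mild", "high", "Saturday", "no"),
    ("sunny", "hot", "high", "other", "no"),
    ("sunny", "hot", "high", "Sunday", "no"),
    ("rainy", "mild", "high", "other", "no"),
    ("rainy", "cool", "normal", "Sunday", "no"),
    ("rainy", "cool", "normal", "Saturday", "no"),
    ("sunny", "mild", "low", "other", "yes"),
    ("sunny", "hot", "normal", "other", "yes"),
    ("sunny", "mild", "high", "Saturday", "yes"),
    ("sunny", "hot", "normal", "Sunday", "yes"),
    ("rainy", "mild", "normal", "Sunday", "yes"),
    ("sunny", "mild", "low", "Saturday", "yes"),
]


def body(r):
    """(humidity=normal) AND (play=yes)"""
    return r[2] == "normal" and r[4] == "yes"


def head(r):
    """(temperature=hot) OR (day-of-week=Sunday)"""
    return r[1] == "hot" or r[3] == "Sunday"


n = len(rows)
n_body = sum(body(r) for r in rows)
n_head = sum(head(r) for r in rows)
n_body_and_head = sum(body(r) and head(r) for r in rows)
n_counter = sum(body(r) and not head(r) for r in rows)

# ASSUMPTION: observed/expected are relative frequencies of counter-instances.
observed = n_counter / n
expected = (n_body / n) * ((n - n_head) / n)

print(n, n_body, n_head, n_body_and_head, n_counter)
print(observed, expected)
```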


















Q3. (30 points)

hair     height    weight    burned
blonde   average   light     yes
blonde   tall      average   no
brown    short     average   no
blonde   short     average   yes
red      average   heavy     yes
brown    tall      heavy     no
blonde   short     light     no


Suppose that you have the above original data set. You will use this for machine learning classification. “burned” is the class attribute.

Q3.1.

If you use the k-NN classifier algorithm (taking k = 3 and using Manhattan distance for the distance function), what will be predicted as the class of this record? yes or no? You must show all the necessary calculations.

hair     height   weight
blonde   short    average
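
The calculation for this question can be sketched in Python as follows. The sketch assumes the usual treatment of nominal attributes, where the per-attribute difference is 0 when the two values are equal and 1 otherwise, so the Manhattan distance is simply the number of mismatching attributes. The function names `manhattan` and `knn_predict` are hypothetical.

```python
from collections import Counter

# Training instances from the Q3 table: (hair, height, weight, burned)
train = [
    ("blonde", "average", "light", "yes"),
    ("blonde", "tall", "average", "no"),
    ("brown", "short", "average", "no"),
    ("blonde", "short", "average", "yes"),
    ("red", "average", "heavy", "yes"),
    ("brown", "tall", "heavy", "no"),
    ("blonde", "short", "light", "no"),
]


def manhattan(a, b):
    # ASSUMPTION: nominal attributes differ by 0 if equal, 1 otherwise.
    return sum(1 if x != y else 0 for x, y in zip(a, b))


def knn_predict(query, data, k):
    """Majority vote among the k training rows closest to query."""
    ranked = sorted(data, key=lambda row: manhattan(query, row[:3]))
    votes = Counter(row[3] for row in ranked[:k])
    return votes.most_common(1)[0][0]


query = ("blonde", "short", "average")
distances = [manhattan(query, row[:3]) for row in train]
prediction = knn_predict(query, train, k=3)
print(distances, prediction)
```

Three training rows tie at distance 1, but they all carry the same class, so the vote does not depend on how the tie is broken.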















Q3.2.

If you use the NN simple instance-based algorithm (using Euclidean distance for the distance function), what will be predicted as the class of this record? yes or no? You must show all the necessary calculations.

hair    height   weight
brown   tall     heavy
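
A minimal Python sketch of the nearest-neighbour lookup for this record is given below. As an assumption, each nominal attribute contributes a difference of 0 when the values match and 1 otherwise, so the Euclidean distance is the square root of the number of mismatching attributes. The function name `euclidean` is hypothetical.

```python
import math

# Training instances from the Q3 table: (hair, height, weight, burned)
train = [
    ("blonde", "average", "light", "yes"),
    ("blonde", "tall", "average", "no"),
    ("brown", "short", "average", "no"),
    ("blonde", "short", "average", "yes"),
    ("red", "average", "heavy", "yes"),
    ("brown", "tall", "heavy", "no"),
    ("blonde", "short", "light", "no"),
]


def euclidean(a, b):
    # ASSUMPTION: nominal attributes differ by 0 if equal, 1 otherwise.
    return math.sqrt(sum((1 if x != y else 0) ** 2 for x, y in zip(a, b)))


query = ("brown", "tall", "heavy")

# 1-NN: take the single training row with the smallest distance.
nearest = min(train, key=lambda row: euclidean(query, row[:3]))
nearest_distance = euclidean(query, nearest[:3])
prediction = nearest[3]
print(nearest, nearest_distance, prediction)
```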













Q4. (20 points)

Team Name   no-of-wins (this season)   no-of-wins (predicted-next season)
TS          20                         22
BNK         14                         13
AU          0                          2
SSN         7                          7
VK          12                         10
RM          20                         5
PO          2                          0
RYL         10                         11


You have a report for eight different football teams’ next season performance prediction given as above. This report is derived by any machine learning numeric prediction method. According to this report, calculate the evaluation values given below. You must show all the necessary calculations.

MSE (mean squared error) = ?

Mean absolute error = ?

Relative squared error = ?
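
The three evaluation values can be computed along the lines of the Python sketch below. It assumes that the "this season" column plays the role of the actual values and the "predicted-next season" column the role of the predictions. It also assumes that the relative squared error is the total squared error divided by the total squared deviation of the actual values from their own mean. Both are assumptions about the intended setup.

```python
# Columns of the Q4 table, in row order: TS, BNK, AU, SSN, VK, RM, PO, RYL
actual = [20, 14, 0, 7, 12, 20, 2, 10]     # ASSUMPTION: treated as actual values
predicted = [22, 13, 2, 7, 10, 5, 0, 11]

n = len(actual)
errors = [p - a for p, a in zip(predicted, actual)]

# Mean squared error and mean absolute error.
sum_sq_error = sum(e * e for e in errors)
mse = sum_sq_error / n
mae = sum(abs(e) for e in errors) / n

# Relative squared error: squared error relative to always predicting the mean.
# ASSUMPTION: the mean is taken over the actual values in this table.
mean_actual = sum(actual) / n
rse = sum_sq_error / sum((a - mean_actual) ** 2 for a in actual)

print(f"MSE = {mse:.3f}")
print(f"MAE = {mae:.3f}")
print(f"RSE = {rse:.4f}")
```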

















According to these evaluation values, what can you say about this prediction performance and data sample?

What could you do to improve the prediction performance?