STAT480 Data Mining for Statistics
Lecturer: Assist. Prof. Dr. Mete Eminağaoğlu
Quiz No. 3
Date: 31st May, 2013
Duration: 60 minutes
Name & Surname: ______________________________
Signature: ______________________________
General Instructions:
Open your web browser and go to http://meminagaoglu.yasar.edu.tr/
Then select the menu “STAT480”, and then select the sub-menu “Quiz 3”.
To open this page, you must enter the correct password. Password: Behemoth
Q1. (30 points)
Click on Question1 and download new-sales.zip (zipped .arff file). You will use only these two regression algorithms in Weka: Linear Regression and Simple Linear Regression.
1.1. (15 points)
For each of these two algorithms: first train the model with “new-sales-train.arff”, then test the model with “new-sales-test.arff”. Analyze the train and test performance results that you obtain from both regression algorithms.
Which algorithm is better (more accurate / reliable)? _____________________________________
Why? (You must justify your answer with the necessary and relevant results that you found in Weka.)
1.2. (15 points)
Write down the regression equation that you found for the best algorithm in part 1.1. Using this equation, calculate the “total-no-sales” prediction for:
others-price = 9.5
our-price = 10
our-cost = 9.18
inflation-rate = 3.47
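Once the equation has been read from the Weka output, the prediction in part 1.2 is a weighted sum of the four given values plus an intercept. The sketch below only illustrates that arithmetic. The helper `predict` is hypothetical, the attribute names are spelled with hyphens as an assumption, and the intercept and coefficients are placeholders (0.0 and 1.0), not results from Weka; they must be replaced by the values of the fitted model.

```python
# Record given in part 1.2 (values copied from the question).
record = {
    "others-price": 9.5,
    "our-price": 10.0,
    "our-cost": 9.18,
    "inflation-rate": 3.47,
}

# PLACEHOLDERS ONLY: these are not Weka results. Replace them with the
# intercept and coefficients of the equation found in part 1.1.
placeholder_intercept = 0.0
placeholder_coefficients = {name: 1.0 for name in record}


def predict(intercept, coefficients, values):
    """Evaluate intercept + sum(coefficient * value).

    Attributes without a coefficient contribute nothing, which covers an
    equation that uses only one attribute (as Simple Linear Regression does).
    """
    return intercept + sum(
        coefficients.get(name, 0.0) * value for name, value in values.items()
    )


print(predict(placeholder_intercept, placeholder_coefficients, record))
```

With the placeholder values the function simply adds the four inputs, which is a quick way to check the arithmetic before substituting the real coefficients.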
_____________________________________________________________________
Q2. (20 points)
outlook   temperature   humidity   day-of-week   play
rainy     mild          high       Saturday      no
sunny     hot           high       other         no
sunny     hot           high       Sunday        no
rainy     mild          high       other         no
rainy     cool          normal     Sunday        no
rainy     cool          normal     Saturday      no
sunny     mild          low        other         yes
sunny     hot           normal     other         yes
sunny     mild          high       Saturday      yes
sunny     hot           normal     Sunday        yes
rainy     mild          normal     Sunday        yes
sunny     mild          low        Saturday      yes
Suppose that you use the Tertius algorithm on the above data set and you find the following rule:
If (humidity=normal) AND (play=yes) THEN (temperature=hot) OR (day-of-week=Sunday)
According to the Tertius algorithm, find the values that would be obtained for this rule. You must show all the necessary calculations.
Expected = ?
Observed = ?
Confirmation = ?
True Positive rate = ?
False Positive rate = ?
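The counting behind the first three values can be cross-checked with a short script. This is a sketch under stated assumptions, not the course's official solution: it takes "Observed" to be the relative frequency of counter-instances (body true, head false), "Expected" to be P(body) × P(not head), and "Confirmation" to be (Expected − Observed) / (√Expected − Expected), the normalised measure usually associated with Tertius. If the lecture notes define these quantities differently, the formulas must be adjusted. The true and false positive rates are left out because their definition is not given in the question.

```python
from math import sqrt

# Rows copied from the Q2 table:
# (outlook, temperature, humidity, day_of_week, play)
ROWS = [
    ("rainy", "mild", "high", "Saturday", "no"),
    ("sunny", "hot", "high", "other", "no"),
    ("sunny", "hot", "high", "Sunday", "no"),
    ("rainy", "mild", "high", "other", "no"),
    ("rainy", "cool", "normal", "Sunday", "no"),
    ("rainy", "cool", "normal", "Saturday", "no"),
    ("sunny", "mild", "low", "other", "yes"),
    ("sunny", "hot", "normal", "other", "yes"),
    ("sunny", "mild", "high", "Saturday", "yes"),
    ("sunny", "hot", "normal", "Sunday", "yes"),
    ("rainy", "mild", "normal", "Sunday", "yes"),
    ("sunny", "mild", "low", "Saturday", "yes"),
]


def body(row):
    """Rule body: humidity=normal AND play=yes."""
    return row[2] == "normal" and row[4] == "yes"


def head(row):
    """Rule head: temperature=hot OR day-of-week=Sunday."""
    return row[1] == "hot" or row[3] == "Sunday"


n = len(ROWS)
p_body = sum(body(r) for r in ROWS) / n
p_not_head = sum(not head(r) for r in ROWS) / n

# Assumed definitions (see the paragraph above the code).
expected = p_body * p_not_head
observed = sum(body(r) and not head(r) for r in ROWS) / n
confirmation = (expected - observed) / (sqrt(expected) - expected)

print("P(body) =", p_body, " P(not head) =", p_not_head)
print("Expected =", expected)
print("Observed =", observed)
print("Confirmation =", round(confirmation, 4))
```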
Q3. (30 points)
hair     height    weight    burned
blonde   average   light     yes
blonde   tall      average   no
brown    short     average   no
blonde   short     average   yes
red      average   heavy     yes
brown    tall      heavy     no
blonde   short     light     no
Suppose that you have the above original data set. You will use it for machine learning classification. “burned” is the class attribute.
Q3.1. If you use the k-NN classifier algorithm (taking k = 3 and using Manhattan distance as the distance function), what will be predicted as the class of this record: yes or no? You must show all the necessary calculations.
hair     height    weight
blonde   short     average
Q3.2. If you use the NN simple instance-based algorithm (using Euclidean distance as the distance function), what will be predicted as the class of this record: yes or no? You must show all the necessary calculations.
hair     height    weight
brown    tall      heavy
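Both parts of Q3 can be cross-checked with a small nearest-neighbour script. The sketch assumes the usual convention for nominal attributes: the per-attribute difference is 0 when two values match and 1 when they differ. If the course treats height or weight as ordinal, the distance functions would have to change. The function names are illustrative, and ties in distance are broken by the order of the rows in the table.

```python
from collections import Counter

# Training rows copied from the Q3 table: ((hair, height, weight), burned)
TRAIN = [
    (("blonde", "average", "light"), "yes"),
    (("blonde", "tall", "average"), "no"),
    (("brown", "short", "average"), "no"),
    (("blonde", "short", "average"), "yes"),
    (("red", "average", "heavy"), "yes"),
    (("brown", "tall", "heavy"), "no"),
    (("blonde", "short", "light"), "no"),
]


def differences(x, y):
    """Assumed nominal convention: 0 if the values match, 1 otherwise."""
    return [0 if a == b else 1 for a, b in zip(x, y)]


def manhattan(x, y):
    return sum(differences(x, y))


def euclidean(x, y):
    return sum(d ** 2 for d in differences(x, y)) ** 0.5


def knn_predict(query, k, distance):
    """Majority vote among the k training rows closest to the query."""
    ranked = sorted(TRAIN, key=lambda row: distance(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Q3.1: k = 3 with Manhattan distance
print(knn_predict(("blonde", "short", "average"), 3, manhattan))
# Q3.2: single nearest neighbour with Euclidean distance
print(knn_predict(("brown", "tall", "heavy"), 1, euclidean))
```

Printing the distance of every training row to the query, rather than only the final vote, gives the intermediate values that the question asks to be shown.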
Q4. (20 points)
Team Name   no-of-wins (this season)   no-of-wins (predicted-next season)
TS          20                         22
BNK         14                         13
AU          0                          2
SSN         7                          7
VK          12                         10
RM          20                         5
PO          2                          0
RYL         10                         11
You have a report of the next-season performance predictions for eight different football teams, given above. This report was produced by some machine learning numeric prediction method. According to this report, calculate the evaluation values given below. You must show all the necessary calculations.
MSE (mean squared error) = ?
Mean absolute error = ?
Relative squared error = ?
According to these evaluation values, what can you say about this prediction performance and data sample? What could you do to improve the prediction performance?
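The three evaluation values can be checked numerically with the sketch below. It assumes that the first numeric column of the table holds the actual values and the second column the predicted values, and that relative squared error is the sum of squared errors divided by the sum of squared deviations of the actual values from their mean. The function names are illustrative.

```python
# Values copied from the Q4 table, in team order:
# TS, BNK, AU, SSN, VK, RM, PO, RYL
actual = [20, 14, 0, 7, 12, 20, 2, 10]      # assumed: first column = actual
predicted = [22, 13, 2, 7, 10, 5, 0, 11]    # assumed: second column = predicted


def mean_squared_error(a, p):
    return sum((pi - ai) ** 2 for ai, pi in zip(a, p)) / len(a)


def mean_absolute_error(a, p):
    return sum(abs(pi - ai) for ai, pi in zip(a, p)) / len(a)


def relative_squared_error(a, p):
    """Squared error relative to always predicting the mean of the actuals."""
    mean_a = sum(a) / len(a)
    squared_error = sum((pi - ai) ** 2 for ai, pi in zip(a, p))
    baseline = sum((ai - mean_a) ** 2 for ai in a)
    return squared_error / baseline


print("MSE =", mean_squared_error(actual, predicted))
print("MAE =", mean_absolute_error(actual, predicted))
print("RSE =", round(relative_squared_error(actual, predicted), 4))
```

Listing the per-team squared errors next to these totals shows how much of the error comes from a single team, which is relevant to the discussion part of the question.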