ASSIGNMENT 1 (SAK5609)

Nov 20, 2013


You are to perform data mining on

i. the IRIS dataset, and
ii. another chosen dataset (please refer to the data_info files).

These datasets can be found in the UCI Machine Learning Repository.



Provide a report on your assignment that answers the following questions:



1. Data Description

a. Name of dataset
b. Brief description
c. Source of data
d. List of attributes with their types (categorical, nominal, continuous).
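
As a starting point for item 1(d), the attribute listing can be drafted programmatically. The sketch below is only an illustration: the sample rows are a small subset written in the layout of the UCI iris.data file, the column names are assumed from the dataset's documentation, and the rule "numeric means continuous, anything else means nominal" is a simplifying heuristic rather than a prescribed method.

```python
import csv
import io

# A small illustrative subset in the layout of the UCI iris.data file.
# Load the full file for the actual assignment.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Assumed column names; the raw data file has no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def describe_attributes(rows, columns):
    """Label a column 'continuous' if every value is numeric, else 'nominal'."""
    types = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        types[name] = "continuous" if all(is_number(v) for v in values) else "nominal"
    return types


# Skip empty lines, which csv.reader yields as empty lists.
rows = [row for row in csv.reader(io.StringIO(SAMPLE)) if row]
attribute_types = describe_attributes(rows, COLUMNS)
for name, kind in attribute_types.items():
    print(f"{name}: {kind}")
```

The heuristic would mislabel categories that are coded as integers, so the result should be checked against the dataset's own documentation before it goes into the report.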


2. Objective(s) of applying data mining techniques to this data

a. In general, what do you expect to discover from mining this data?
b. What types of relationships are being mined (Han & Kamber Ch 1.4)?


3. Preliminary examination of data

a. Are there any “obvious” patterns in the data which might be helpful?
b. Are there any experts who understand the data well and with whom you can talk?
c. Are values missing for some attributes?
d. Examine various plots of the data. In n-space with n > 3, plot values in subsets of the dimensions. [This is likely to give you insight into what needs to be done next.]
e. Perform correlation analysis to see if linear relationships exist between numeric attribute pairs.
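
The missing-value check and the pairwise correlation analysis from the preliminary examination can be sketched as follows. This is a minimal illustration using only the Python standard library. The table `data`, its attribute names, and its values are hypothetical, and the Pearson coefficient is one common choice of correlation measure, not necessarily the one the assignment intends.

```python
import math
from itertools import combinations

# Hypothetical toy table; None marks a missing value.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, None, 3.0, 2.0, 1.0],
}


def count_missing(table):
    """Count the missing (None) entries in each attribute."""
    return {name: sum(v is None for v in values) for name, values in table.items()}


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    denominator = math.sqrt(var_x * var_y)
    # A constant attribute has zero variance, so the coefficient is undefined.
    return cov / denominator if denominator else float("nan")


missing = count_missing(data)
correlations = {
    (p, q): pearson(data[p], data[q]) for p, q in combinations(data, 2)
}

for name, count in missing.items():
    print(f"{name}: {count} missing")
for (p, q), r in correlations.items():
    print(f"corr({p}, {q}) = {r:.3f}")
```

The same `combinations(data, 2)` enumeration lists the two-attribute subsets that could each be drawn as a scatter plot when the data has more than three dimensions.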



4. Preprocessing (Indicate techniques or strategy for each procedure.)

i. Cleaning
ii. Integration
iii. Transformation(s)
iv. Reduction
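
To make the four preprocessing procedures concrete, here is a small sketch with one possible technique for each: mean imputation for cleaning, a key-based join for integration, min-max scaling for transformation, and dropping constant attributes for reduction. The techniques, function names, toy records, and the order in which the steps run are all assumptions chosen for illustration. The assignment leaves the choice of strategy to you.

```python
def impute_mean(values):
    """Cleaning: replace missing entries (None) with the mean of the observed ones."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def merge_on_key(left, right):
    """Integration: join two {key: record} sources on the keys present in both."""
    return {k: {**left[k], **right[k]} for k in left if k in right}


def min_max_scale(values):
    """Transformation: rescale to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_constant_columns(table):
    """Reduction: remove attributes that take only a single value."""
    return {name: vals for name, vals in table.items() if len(set(vals)) > 1}


# Hypothetical toy sources keyed by record id.
source_a = {1: {"height": 2.0}, 2: {"height": None}, 3: {"height": 4.0}}
source_b = {
    1: {"width": 10.0, "unit": "cm"},
    2: {"width": 20.0, "unit": "cm"},
    3: {"width": 30.0, "unit": "cm"},
}

# Integrate the sources, then pivot the records into per-attribute columns.
merged = merge_on_key(source_a, source_b)
table = {
    name: [merged[k][name] for k in sorted(merged)]
    for name in ("height", "width", "unit")
}

table["height"] = impute_mean(table["height"])   # cleaning
table = drop_constant_columns(table)             # reduction (drops "unit")
scaled = {name: min_max_scale(vals) for name, vals in table.items()}  # transformation

for name, vals in scaled.items():
    print(name, vals)
```

In this sketch reduction runs before scaling so that the non-numeric constant column is gone before any arithmetic is applied. A different dataset may call for a different order or different techniques.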