Assignment 3 - Steven Graham - B00444855x - Sgraham745.net

Oct 15, 2013






Assignment 3

Health Informatics


Steven Graham

B0044855







Table of Contents

Introduction
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
    Random Tree
    Decision Table
    Classifier Analysis
Task 7
    SimpleKmeans
Task 8
Works Cited







Introduction

For this assignment I will investigate the mining of clinical and medical data. Data mining is the
process of extracting useful information from raw data. The objective of the assignment is to
investigate the medical datasets and classify the data using well-known classification algorithms.
For the classification I will be using a piece of open source data mining software called WEKA.


(Weka, 2010)

Task 1


The term attribute simply means a feature or a variable within the dataset. For example, within the
dataset I will be using for this assignment, diagnosis is an attribute.

Definition taken from Wikipedia: In computing, an attribute is a specification that defines a property
of an object, element, or file. It may also refer to or set the specific value for a given instance of
such.

(Attribute, 2010)

Task 2


For this assignment I will download two files from the Machine Learning Repository. The first is the
data set description and the second is the actual data. Below is a screenshot of the data set
description file, which I will use later to convert the file to a suitable format for WEKA.


(Breast Cancer Wisconsin (Diagnostic) Data Set, 1995)



Task 3


The dataset contains data on breast cancer diagnosis. The data contains information on both benign
and malignant tumours. The data set also gives further characteristics of the benign and malignant
tumours.

I believe that the dataset is intended for research into ways to improve tumour diagnosis through
the introduction of computerised diagnosis. The dataset contains the information on which a system
could be tested to see how accurate it is.

The data set could also be used to see if there are any similarities or trends within the data.

Task 4


1. The dataset contains 569 instances.

2. The dataset contains 32 attributes; below are the details.


3. There are two classes within the dataset: Malignant and Benign.



Task 5


To enable me to use WEKA to classify the raw data, I must first place it into a format that WEKA
will understand. To convert the text file to WEKA format, I simply list the attributes that are
contained within the data file and then save it as an ARFF file. Below is a screenshot of the
attributes placed at the beginning of the file.
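
The conversion can also be scripted. The following is a minimal Python sketch, not the method used for this assignment. It assumes the raw file is comma-separated with an ID, a diagnosis (M or B) and 30 numeric features ordered as mean, standard error and worst values of ten measurements, as described in the data set description file. The attribute names, the relation name and the function names are my own illustrative choices; only the @relation, @attribute and @data keywords come from the ARFF format.

```python
# Sketch: build an ARFF file from rows in the comma-separated wdbc.data layout.
# Attribute names below are illustrative assumptions, not names required by WEKA.

MEASURES = ["radius", "texture", "perimeter", "area", "smoothness",
            "compactness", "concavity", "concave_points", "symmetry",
            "fractal_dimension"]
GROUPS = ["mean", "se", "worst"]


def attribute_names():
    """Return 32 names: id, diagnosis, then 30 numeric features."""
    features = [f"{m}_{g}" for g in GROUPS for m in MEASURES]
    return ["id", "diagnosis"] + features


def arff_header(relation="wdbc"):
    """List every attribute at the top of the file, then open the data section."""
    lines = [f"@relation {relation}", ""]
    for name in attribute_names():
        if name == "diagnosis":
            # Nominal attribute: the two class labels are enumerated.
            lines.append("@attribute diagnosis {M,B}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    return "\n".join(lines)


def to_arff(data_text):
    """Prepend the header to the raw comma-separated rows."""
    return arff_header() + "\n" + data_text.strip() + "\n"
```

Calling to_arff on the text of the data file and saving the result with an .arff extension should give a file laid out as described above, with the attribute list at the beginning and the original rows after @data.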







Task 6


Supervised learning is where the machine learns a task by inferring a function from a supervised
training dataset. The training data set will contain a number of training examples. Each example
consists of an input object and a desired output. A supervised learning algorithm analyses the
training data and produces a classifier, which should predict the correct output for any valid input.

(Supervised learning, 2010)
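
To make the idea of training examples and a resulting classifier concrete, here is a small Python sketch. It uses 1-nearest-neighbour, which is not one of the WEKA algorithms used in this assignment; it was chosen only because it is short. The toy data, the labels and the function names are hypothetical.

```python
import math


def train(examples):
    """'Training' for 1-nearest-neighbour just stores the labelled examples."""
    return list(examples)


def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    nearest = min(model, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]


# Hypothetical toy examples: (input object, desired output).
training_data = [
    ((1.0, 1.0), "B"),
    ((1.5, 2.0), "B"),
    ((8.0, 9.0), "M"),
    ((9.0, 8.5), "M"),
]

classifier = train(training_data)
print(predict(classifier, (1.2, 1.4)))  # nearest stored example is labelled "B"
print(predict(classifier, (8.5, 9.0)))  # nearest stored example is labelled "M"
```

Each training example pairs an input with its desired output, and the classifier then produces an output for an input it has not seen before.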

Random Tree


Below are the results from the random tree supervised classification.



The random tree classification algorithm classified 92.091% of the data set correctly, with an
average TP rate of 0.921.




Decision Table


Below are the results from the decision table supervised classification.



The decision table classification algorithm classified 94.024% of the data set correctly, with an
average TP rate of 0.94.


Classifier Analysis


Out of the two supervised classification methods I selected, the decision table performed the best.
The decision table classified 94.024% of the instances correctly, compared to 92.091% for the
random tree. The decision table also had a better weighted TP rate, with 0.94 compared to 0.921 for
the random tree.
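
The percentage correct and the weighted TP rate move together for a reason. Assuming the average TP rate is weighted by class size, it works out to the same value as the overall accuracy. The Python sketch below shows this on a hypothetical confusion matrix (the counts are invented for illustration and are not the WEKA output), and then converts the two reported percentages back into counts out of 569 instances.

```python
def accuracy(confusion):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def weighted_tp_rate(confusion):
    """Per-class TP rate (recall), averaged with class sizes as weights."""
    total = sum(sum(row) for row in confusion)
    result = 0.0
    for i, row in enumerate(confusion):
        class_size = sum(row)  # assumes every class has at least one instance
        tp_rate = row[i] / class_size
        result += (class_size / total) * tp_rate
    return result


# Hypothetical confusion matrix (rows = actual class, columns = predicted).
# These counts are NOT taken from the WEKA results.
example = [[90, 10],
           [5, 45]]

print(round(accuracy(example), 3))
print(round(weighted_tp_rate(example), 3))

# Correct-instance counts implied by the reported percentages of 569 instances.
random_tree_correct = round(0.92091 * 569)
decision_table_correct = round(0.94024 * 569)
print(random_tree_correct, decision_table_correct)
```

On these figures the gap between the two classifiers is 11 instances out of 569.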






Task 7


Unsupervised learning is a set of problems in which one seeks to determine how the data is
organised. Many of the methods employed here are based on methods from data mining used to
pre-process data. It differs from supervised learning in that the learner is given only unlabelled
examples.

(Unsupervised learning, 2010)

SimpleKmeans


Below are the results from the SimpleKmeans unsupervised classification.


From the results of the SimpleKmeans classification I can see that the algorithm ran for 4
iterations.

The 569 instances were split between the two clusters, with Benign having 63% (358 instances) and
Malignant having 37% (211 instances).

In comparison to the supervised algorithms, SimpleKmeans classified more of the instances as
Benign by around 13, and more instances as Malignant by around 21.
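
To show what an iteration means here, below is a minimal Python sketch of the standard k-means procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes). It is not WEKA's SimpleKmeans implementation, and WEKA may count iterations differently. The toy points, starting centroids and function name are hypothetical.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Basic k-means; returns (centroids, assignments, iterations used)."""
    assignments = None
    for iteration in range(1, max_iterations + 1):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            # No point changed cluster, so the algorithm has converged.
            return centroids, assignments, iteration
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, assignments, max_iterations


# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
centroids, assignments, iterations = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignments, iterations)

# Share of instances per cluster, using the counts reported above.
print(round(100 * 358 / 569), round(100 * 211 / 569))
```

Note that k-means only numbers the clusters; matching a cluster to Benign or Malignant has to be done afterwards by comparing against the known diagnosis.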



Task 8


Unsupervised methods of classification, although useful, only work well when the user has an idea
of the expected result. If the user does not have an idea of what the results should look like, then
they should use a supervised method, as this is more accurate.


If the dataset has missing data, this can affect the performance of the classifier. The data
may not be placed into the right class, or the classifier may not display how many values
were missing and the user may not notice this. One method which has been used to treat missing
values is deleting instances containing at least one missing value of a feature.

(The treatment of missing values and its effect, 2005)
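
The deletion approach can be sketched in a few lines of Python. This assumes missing values are written as "?", which is the marker the ARFF format uses; the rows and the function name are hypothetical and are not taken from the dataset used in this assignment.

```python
def drop_incomplete(rows, missing_marker="?"):
    """Keep only the rows in which no field equals the missing-value marker."""
    return [row for row in rows if missing_marker not in row]


# Hypothetical rows: (id, diagnosis, feature_1, feature_2).
rows = [
    ["1", "M", "17.9", "10.3"],
    ["2", "B", "?", "14.2"],
    ["3", "B", "12.4", "15.7"],
    ["4", "M", "20.1", "?"],
]

complete = drop_incomplete(rows)
print(len(rows) - len(complete), "instances removed")
print([row[0] for row in complete])
```

Printing how many instances were removed addresses the risk mentioned above that the user may not notice the missing values, although deleting instances also shrinks the data available for training.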








Works Cited

Breast Cancer Wisconsin (Diagnostic) Data Set. (1995, 11 01). Retrieved 10 2010, from Machine
Learning Repository: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.names

The treatment of missing values and its effect. (2005). Retrieved 12 2010, from University of Puerto
Rico: http://academic.uprm.edu/~eacuna/IFCS04r.pdf

Attribute. (2010, 11 17). Retrieved 10 2010, from Wikipedia:
http://en.wikipedia.org/wiki/Attribute_(computing)

Supervised learning. (2010, 12 02). Retrieved 12 2010, from Wikipedia:
http://en.wikipedia.org/wiki/Supervised_learning

Unsupervised learning. (2010, 10 04). Retrieved 12 2010, from Wikipedia:
http://en.wikipedia.org/wiki/Unsupervised_learning

Weka. (2010). Retrieved 10 2010, from Machine Learning Group at University of Waikato:
http://www.cs.waikato.ac.nz/~ml/weka/