
Oct 15, 2013

Data Mining workflow with Triana (Demo)

Presented by: Ali Shaikh Ali

Welsh e-Science Centre

Agenda..

Software used (we are here)

Classifiers Demo

Clusterers Demo

Where?

END


Software used..1

Triana



An open source problem solving environment developed at
Cardiff University that combines an intuitive visual interface
with powerful data analysis tools.



Triana includes a large library of pre-written analysis tools
and the ability for users to easily integrate their own tools.



Software used..2

Weka machine learning software..



Weka is a collection of machine learning algorithms for data
mining tasks.


Weka contains tools for data pre-processing, classification,
regression, clustering, association rules, and visualization.


Weka reads datasets in its native ARFF file format.


An ARFF (Attribute-Relation File Format) file is an ASCII text
file that describes a list of instances sharing a set of
attributes. ARFF files were developed by the Machine
Learning Project at the Department of Computer Science of
The University of Waikato for use with the Weka machine
learning software.



ARFF Overview

An ARFF file has two parts:

Header
    Name of the relation
    List of the attributes
    Attribute types

Data

Example:

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
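To make the header/data split concrete, here is a minimal sketch of reading such a file in plain Python. It is not Weka's loader. The function name `parse_arff` is an assumption made for illustration, and the sketch handles only the constructs shown in the iris example (NUMERIC and nominal attributes, comma-separated rows); quoted values, missing values and sparse rows are not covered.

```python
IRIS_ARFF = """@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""


def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                # NUMERIC columns become floats; nominal values stay strings
                row[name] = float(value) if kind == "NUMERIC" else value
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        if keyword == "@RELATION":
            relation = rest.strip()
        elif keyword == "@ATTRIBUTE":
            name, _, kind = rest.strip().partition(" ")
            kind = kind.strip()
            if kind.startswith("{"):
                # nominal attribute: keep the list of allowed values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.upper()
            attributes.append((name, kind))
        elif keyword == "@DATA":
            in_data = True  # everything after this line is an instance
    return relation, attributes, rows


relation, attributes, rows = parse_arff(IRIS_ARFF)
print(relation, len(attributes), len(rows))
```

Each row comes back as a dictionary keyed by attribute name, so `rows[0]["class"]` gives the class label of the first instance.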


Inside the Dipso Toolbox

Deals with dataset files, e.g. loading datasets and converting dataset formats

Deals with data that needs manipulation

Visualizes data


Classifiers Demo..


J48 Classifier


Class for generating an unpruned or a pruned C4.5 decision
tree. For more information, see
Ross Quinlan (1993). C4.5:
Programs for Machine Learning, Morgan Kaufmann Publishers,
San Mateo, CA.



Operations

classify( )
Input: DataHandler dataset, String attributeName
Output: DataHandler decitionTree
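C4.5 chooses the test at each tree node by gain ratio: the information gain of a split divided by the entropy of the split itself. The sketch below illustrates that criterion for a binary threshold split on one numeric attribute. It is not Weka's J48 code; the function names are assumptions, and the three Iris-versicolor petal lengths are illustrative additions that do not appear in the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    # Weighted entropy remaining after the split
    remainder = sum(len(part) / total * entropy(part) for part in (left, right))
    gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes themselves
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9]
species = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3

clean_split = gain_ratio(petal_length, species, 2.0)   # separates the classes
mixed_split = gain_ratio(petal_length, species, 4.6)   # leaves one branch mixed
print(clean_split, mixed_split)
```

A threshold that separates the two species completely scores higher than one that leaves a mixed branch, which is why a tree learner would prefer it.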





Classifiers Demo..2


Classifier Template


This Web service exposes a complete list of classifiers
(trees, rules, functions, etc.).


Operations

classifyInstance()
Input: DataHandler dataset, String classifierName, String options, String attributeName
Output: String result

classifyRemoteInstance()
Input: String datasetURL, String classifierName, String options, String attributeName
Output: String result

getClassifiers( )
Input: null
Output: String listOfClassifiers

getOptions()
Input: String classifierName
Output: String listOfApplicableOptions
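A typical call sequence is to discover the available classifiers, look up a classifier's options, and then classify a dataset. The sketch below mimics that sequence with a purely local, hypothetical stand-in. The class `ClassifierTemplateStub`, the `majority` classifier, the comma-separated return strings and the plain list-of-dictionaries dataset are all assumptions made for illustration; the real service is a Web service taking a DataHandler, and its return formats are not given in the slides. The remote-URL operation is left out because it would need network access.

```python
from collections import Counter


def majority_class(rows, attribute_name):
    """Toy baseline: predict the most frequent value of the class attribute."""
    counts = Counter(row[attribute_name] for row in rows)
    return counts.most_common(1)[0][0]


class ClassifierTemplateStub:
    """Hypothetical local stand-in mirroring the operation names on the slide."""

    def __init__(self):
        # name -> (options description, callable); entries are made up here
        self._registry = {"majority": ("(no options)", majority_class)}

    def get_classifiers(self):
        # Assumed format: comma-separated classifier names
        return ",".join(sorted(self._registry))

    def get_options(self, classifier_name):
        return self._registry[classifier_name][0]

    def classify_instance(self, dataset, classifier_name, options, attribute_name):
        # `options` is accepted for signature parity but unused by the toy classifier
        _, classify = self._registry[classifier_name]
        return classify(dataset, attribute_name)


service = ClassifierTemplateStub()
dataset = [
    {"petallength": 1.4, "class": "Iris-setosa"},
    {"petallength": 1.3, "class": "Iris-setosa"},
    {"petallength": 4.7, "class": "Iris-versicolor"},  # illustrative row
]
name = service.get_classifiers().split(",")[0]
options = service.get_options(name)
result = service.classify_instance(dataset, name, "", "class")
print(name, options, result)
```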











Classifiers Demo..3

1. Build your classifier. It must implement the 4 required methods.

2. Place it in the classifier’s lib.

3. Done!



Clusterers Demo


Cobweb Clusterer



Overview


Similar to K-means, with one difference:

K-Means iterates over the whole dataset until convergence in the
clusters is reached.

Cobweb works incrementally, updating the clustering instance by
instance.
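The batch versus incremental contrast can be sketched in a few lines. The first function is a one-dimensional K-means that revisits every point on each pass. The second is a simple distance-threshold scheme that updates its clusters one instance at a time. It is not Cobweb, which builds a cluster hierarchy guided by category utility; it is swapped in here only to show the incremental style. Function names, the `radius` parameter, the starting centres and the last three petal lengths are assumptions for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Batch K-means: each pass revisits the whole dataset until centres settle."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Recompute each centre as the mean of its group (keep it if the group is empty)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:  # converged: no centre moved
            break
        centers = updated
    return centers


def incremental_1d(points, radius):
    """Single pass: each instance updates the clustering as soon as it arrives."""
    clusters = []  # each cluster is [mean, count]
    for p in points:
        nearest = min(clusters, key=lambda c: abs(p - c[0]), default=None)
        if nearest is not None and abs(p - nearest[0]) <= radius:
            nearest[1] += 1
            nearest[0] += (p - nearest[0]) / nearest[1]  # running-mean update
        else:
            clusters.append([p, 1])  # too far from everything: start a new cluster
    return [c[0] for c in clusters]


petal_length = [1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9]
batch_centers = kmeans_1d(petal_length, centers=[1.0, 5.0])
online_centers = incremental_1d(petal_length, radius=1.0)
print(batch_centers, online_centers)
```

On this small sample both approaches end with two clusters, but only the incremental one could absorb a newly arriving instance without another pass over the data.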



Clusterers Demo

Cobweb Web Service


Operations

cluster( )
Input: DataHandler dataset
Output: String result

clusterRemoteInstance( )
Input: String datasetURL
Output: String result

clusterByPercentage( )
Input: DataHandler dataset, int percentage
Output: String result



Where can you find us?

UDDI Browser

An open-source project that provides a
friendly user interface allowing users to
browse and manipulate content in UDDI
registries.

It is written in Java using the
Swing libraries.

Currently the browser
only supports version 2.0 UDDI registries.

Cardiff UDDI:

Inquiry: http://agents-comsc.grid.cf.ac.uk:8334/juddi/inquiry

Publish: http://agents-comsc.grid.cf.ac.uk:8334/juddi/inquiry