Weka Demo

overratedbelt, Artificial Intelligence and Robotics

25 Nov 2013


WEKA SUMMARY


The following is a very brief overview of the functions available in Weka 3.5. For a more detailed description see my presentation.


EXPLORER

• Preprocess
    o Open file (.arff) *
    o Open URL - select a URL as the source of a data file
    o Open DB - download data from a database
    o Generate - generates data for
        ▪ Classification
            • Random RBF - generated by using RBFs with random centres
        ▪ Regression
        ▪ Clustering
    o Filter data:
        ▪ Supervised
            • Attribute
                o Attribute selector (selects the most informative attributes)
                o Class order (switches the class order indicated in the header)
            • Instance
                o SpreadSubsample (produces a random subsample of the dataset without replacement)
        ▪ Unsupervised
            • Attribute
                o Add noise
                o Discretize
                o Normalize
                o Remove
                o Remove useless
            • Instance
                o Randomize (shuffles the order of instances)
                o Resample (random subsample of dataset without replacement)
                o Remove misclassified (removes misclassified samples; select a classifier to determine errors)
    o Select attribute - gives the min, max, mean and std *
    o Visualize all - displays the density functions of all attributes *
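As a rough illustration of two of the Preprocess items above, the sketch below computes the min, max, mean and std shown for a selected attribute, and rescales a numeric attribute to the range [0, 1] as a min-max normalization would. It is plain Python, not Weka's own API. The function names and sample values are made up for illustration, and the sample (n - 1) form of the standard deviation is an assumption, since the source does not say which form is reported.

```python
import statistics


def summarize(values):
    """Return (min, max, mean, std) for one numeric attribute."""
    return (min(values), max(values),
            statistics.mean(values), statistics.stdev(values))


def normalize(values):
    """Rescale values linearly to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values, not taken from any Weka dataset.
petal_length = [1.4, 4.7, 5.1, 1.3, 6.0]
print(summarize(petal_length))
print(normalize(petal_length))
```

After normalization the smallest value maps to 0.0 and the largest to 1.0, which keeps attributes with large raw ranges from dominating distance-based methods such as nearest neighbour classifiers.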



• Classify
    o Choose
        ▪ Bayes
            • BayesNet - Bayes network classifier
            • Naïve Bayes * (without noise)
            • Add noise - Preprocess/Filter/unsupervised/attribute
            • Naïve Bayes * (with noise)
        ▪ Functions
            • libSVM (wrapper class for libSVM tools) *
            • MultilayerPerceptron (backpropagation MLP) *
            • RBFNetwork (normalized Gaussian RBF network) *
            • SMO (SVM with Sequential Minimal Optimization)
        ▪ Lazy
            • IB1 - nearest neighbour classifier
            • IBk - k-nearest neighbour classifier *
            • LBR (Lazy Bayesian Rules classifier) - Bayes classifier; cannot handle numerical values
        ▪ Meta
            • AdaBoostM1 - boosts nominal class classifiers
            • AttributeSelectedClassifier - selects the most relevant attributes, then classifies the data; can select
                o Attribute selection procedure
                    ▪ Principal Components
                    ▪ SVMAttributeEval
                o Search method
                    ▪ Exhaustive search
                    ▪ Ranker
            • Bagging - bags a classifier to reduce variance
        ▪ Misc
            • FLR (Fuzzy Lattice Reasoning Classifier) v5 *
        ▪ Trees
            • ADTree (alternating decision tree)
            • ID3 - constructs an unpruned decision tree based on the ID3 algorithm
            • J48 - generates a pruned/unpruned decision tree based on the C4.5 algorithm (right-click on results list to visualize tree) *
            • NBTree - generates a decision tree with Naïve Bayes classifiers at the leaves
    o After classification, you can right-click on the results list
        ▪ Save model
        ▪ Load model
        ▪ Visualize classifier errors (select two variables at a time) * - ‘x’ are correctly classified samples and squares are errors
        ▪ Visualize Threshold Curve *
            o For a ROC curve plot, set X to the false positive rate and Y to the true positive rate
            o Also calculates the area under the ROC curve
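To make the ROC items above concrete, here is a minimal sketch of how ROC points and the area under the curve can be computed from classifier scores. It is plain Python and not Weka's implementation. The function names and the example scores are hypothetical, and the sketch assumes binary labels with at least one positive and one negative sample.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate).

    scores: classifier confidence that each sample is positive.
    labels: 1 for positive samples, 0 for negative samples.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so a tie yields one segment.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores and true labels for six samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

An area of 1.0 means every positive sample is ranked above every negative one, while an area near 0.5 is what random scoring would give.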





• Cluster
    o EM - uses the Expectation Maximization algorithm for clustering (don’t have to specify the number of clusters) *
    o SimpleKMeans - uses the k-means algorithm for clustering data (must give the number of clusters) *
    o After clustering you can right-click on results list * (do this for 2 and 3 clusters)
        ▪ Save model
        ▪ Load model
        ▪ Visualize cluster assignments *
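The k-means procedure mentioned above can be sketched in a few lines. This is a generic textbook version in plain Python, not the SimpleKMeans code. The function names, the random initialisation from the data points and the sample points are assumptions made for illustration. Note that the number of clusters `k` has to be supplied, as the outline says.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centres, assignment)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # pick k data points as initial centres
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centres, assignment


# Hypothetical 2-D points forming two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, assignment = k_means(data, k=2)
print(centres)
print(assignment)
```

Running the same data with `k=3` forces one of the natural groups to be split, which is the kind of difference the 2-cluster versus 3-cluster comparison above would show.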


• Associate
    o Algorithms to extract association rules from non-numerical data
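As a small illustration of what an association rule over non-numerical data looks like, the sketch below scores a rule by support and confidence. These two measures are a common choice but are an assumption here, since the source does not say which algorithm or measures are used. The function names and the weather-style records are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical nominal records, each written as attribute=value items.
records = [
    {"outlook=sunny", "windy=false", "play=yes"},
    {"outlook=sunny", "windy=true", "play=no"},
    {"outlook=rainy", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=true", "play=no"},
]
# Rule under test: windy=false => play=yes
print(support(records, {"windy=false", "play=yes"}))
print(confidence(records, {"windy=false"}, {"play=yes"}))
```

Numeric attributes would first have to be turned into nominal values, for example with the Discretize filter, before rules of this form can be counted.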



• Select Attributes *
    o Same functionality as discussed in AttributeSelectedClassifier * (PCA and Ranker)
    o The PCA algorithm can transform the attributes back to the original variable space * (click on PrincipalComponents and set transformBackToOriginal to true)



• Visualize
    o Scatter plots of all the attributes
    o Show functionality of *
        ▪ Plot size
        ▪ Point size
        ▪ Jitter/noise
        ▪ Click on a scatter plot - show that the image can be saved


EXPERIMENTER

• Provides an environment for testing and comparing various classifiers and datasets
• Setup
    o New experiment
    o Select a results destination (*.csv file)
    o Number of repetitions
    o Add datasets (Iris, Diabetes and Heart)
    o Add algorithms (Naïve Bayes, libSVM, J48)
• Run
    o Start (make sure the *.csv file is closed)
• Analyse
    o Analyse the results to determine if one of the algorithms is statistically better than the other algorithms
    o Load the experiment - Experiment
    o Perform test - ‘v’ indicates a statistically better and ‘*’ a statistically worse result than the baseline classifier
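The comparison against a baseline classifier can be illustrated with a plain paired t statistic over per-run accuracies. This is only a sketch of the general idea: the exact test the Experimenter applies is not described in the source, and the function name and the accuracy figures below are hypothetical.

```python
import math
import statistics


def paired_t_statistic(baseline_scores, other_scores):
    """t statistic for paired differences (other minus baseline)."""
    diffs = [o - b for b, o in zip(baseline_scores, other_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    if std_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (std_diff / math.sqrt(n))


# Hypothetical accuracies (%) from five repetitions on the same splits.
baseline = [94.0, 95.3, 93.3, 96.0, 94.7]
candidate = [95.3, 96.0, 94.0, 96.7, 96.0]
print(paired_t_statistic(baseline, candidate))
```

A large positive value suggests the candidate is better than the baseline on these runs; the value would still need to be compared against a t distribution with n - 1 degrees of freedom at the chosen significance level.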


CLI



• Weka command line, useful if, for example, you want to set up experiments using a *.bat file.


KNOWLEDGEFLOW *

• Alternative to Explorer with a graphical front end to the algorithms
• The Knowledge Flow can process large datasets incrementally, while the other modes can only process small to medium-sized datasets
• Load demo.kf
• Right-click ‘Arff’
    o ‘Configure’ - select dataset
    o ‘Start loading’ - runs simulation
• Right-click ‘TextViewer’ for results
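To show what processing a dataset incrementally means in practice, the sketch below reads the data rows of an ARFF-style file one at a time and keeps running statistics, so the whole dataset never has to be held in memory. It is plain Python and not how the Knowledge Flow is implemented. The class and function names are hypothetical, and the reader handles only simple comma-separated numeric rows.

```python
import io


class RunningStats:
    """Mean and variance updated one value at a time (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def stream_column(lines, column):
    """Yield one numeric field per data row, skipping ARFF header lines."""
    in_data = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if not in_data:
            in_data = line.lower() == "@data"
            continue
        yield float(line.split(",")[column])


# Hypothetical in-memory stand-in for a large .arff file.
sample = io.StringIO(
    "@relation demo\n@attribute x numeric\n@data\n1.0\n2.0\n3.0\n"
)
stats = RunningStats()
for value in stream_column(sample, 0):
    stats.update(value)
print(stats.count, stats.mean, stats.variance)
```

Because each row is discarded after it updates the statistics, memory use stays constant regardless of file size, which is the property that makes incremental processing suitable for large datasets.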


ARFFVIEWER

• * Used to view and edit *.arff files
• * Click on ‘ArffViewer’
    o Edit
        ▪ Rename attributes
        ▪ Delete attributes
        ▪ Delete instances


LOG



• The log file helps you keep track of what you did