24 Nov 2013

ClusteringToolkit 1.1
for clustering, validation, and programming of
new algorithms
1 TERMS OF USE
FOR NON-COMMERCIAL USE ONLY.
The program is a research tool and is still in the development stage.
It is supplied "as is" without any warranty.
2 SYSTEM REQUIREMENTS
A PC running Windows. Mac, Linux, and Unix versions are not yet available.
Java runtime (1.4.1 or higher recommended).
3 USAGE
Double-click run.cmd.
4 MAIN WINDOW
Use the parameter fields to adjust the parameters of the clustering algorithms. The first parameter is
always the distance parameter (the exponent) of the Minkowski metric; a value of 2 gives the Euclidean
distance function. K is the number of clusters wanted.
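
For readers unfamiliar with the Minkowski metric, the following is a minimal sketch of how the distance parameter affects the result. It is an illustration only and not the toolkit's own code (the engine source is not available); the class name MinkowskiSketch and the method name distance are made up for this example.

```java
/** Minimal sketch of the Minkowski distance; not the toolkit's own code. */
final class MinkowskiSketch {

    /**
     * Returns (sum over i of |a[i] - b[i]|^p)^(1/p).
     * p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
     */
    static double distance(double[] a, double[] b, double p) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        if (p < 1) {
            throw new IllegalArgumentException("p must be at least 1");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        // Two of the Iris rows from the input data example in section 7.
        double[] x = {5.1, 3.5, 1.4, 0.2};
        double[] y = {4.9, 3.0, 1.4, 0.2};
        System.out.println("p = 1: " + distance(x, y, 1));
        System.out.println("p = 2: " + distance(x, y, 2));
    }
}
```

Larger values of the parameter give more weight to the attribute with the largest difference.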
[Figure: main window, with the parameter fields, function selectors, function buttons, and info fields labeled.]
Use the function buttons to:
- Start clustering: Cluster
- Start validating: Validate
- Add a clustering or validation job to the batch queue
- Visualize the results of clustering: Visualize
Use the function selectors to:
- Select the clustering method: topmost selector
- Select the validation method: next selector
- Select the normalization method
- Select the type (Validation or Clustering) of a batch job item
5 FILE MENU
Use the File menu to:
- Select the input file for clustering
- Select the output file for clustering results
- Save the current settings
- Load saved settings
- Load clustering results for visualization
- Start a batch job
- Quit the program
6 VISUALIZATION
1. Load a result file.
2. Resize the main window so that the visualization can be shown.
3. Press the Visualize button.
7 DATA FORMAT
Format of input data.
All tokens are separated by a tab character. The first row is a header row, and the first column
contains the names of the instances. Example below (columns are aligned with spaces here for readability).

IrisName       s_len  s_wdt  p_len  p_wdt
Iris-setosa    5.1    3.5    1.4    0.2
Iris-setosa    4.9    3      1.4    0.2
Iris-versiclr  7      3.2    4.7    1.4
Iris-versiclr  6.4    3.2    4.5    1.5
...
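
The input format described above can be read with a few lines of Java. The following is a minimal sketch under stated assumptions, not part of the toolkit: the class InputFormatSketch and its members are hypothetical names, blank lines are assumed to be ignorable, and the code uses generics, so it needs a newer Java than the 1.4.1 runtime the toolkit itself requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the tab-separated input format; not part of the toolkit. */
final class InputFormatSketch {

    /** Header tokens plus one name and one numeric row per instance. */
    static final class Table {
        final String[] header;
        final List<String> names = new ArrayList<>();
        final List<double[]> values = new ArrayList<>();

        Table(String[] header) {
            this.header = header;
        }
    }

    /** Parses the lines of an input file; the first line must be the header row. */
    static Table parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("missing header row");
        }
        Table table = new Table(lines.get(0).split("\t"));
        for (int n = 1; n < lines.size(); n++) {
            String line = lines.get(n);
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] tokens = line.split("\t");
            if (tokens.length != table.header.length) {
                throw new IllegalArgumentException("line " + (n + 1) + ": expected "
                        + table.header.length + " tokens, found " + tokens.length);
            }
            double[] row = new double[tokens.length - 1];
            for (int i = 1; i < tokens.length; i++) {
                // Throws NumberFormatException if a value is not numeric.
                row[i - 1] = Double.parseDouble(tokens[i].trim());
            }
            table.names.add(tokens[0]); // first column: instance name
            table.values.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        // In practice the lines could come from java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList(
                "IrisName\ts_len\ts_wdt\tp_len\tp_wdt",
                "Iris-setosa\t5.1\t3.5\t1.4\t0.2",
                "Iris-versiclr\t7\t3.2\t4.7\t1.4");
        Table table = parse(lines);
        for (int i = 0; i < table.names.size(); i++) {
            System.out.println(table.names.get(i) + " " + Arrays.toString(table.values.get(i)));
        }
    }
}
```

Checking the token count of every row against the header is a cheap way to catch files that use spaces instead of tabs before they are passed to the program.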
Format of output data.
The output is like the input data, but there is no header row and the last column contains the
cluster numbers of the instances. At the end of the output file there are also the centroids of the
clusters and the instance nearest to each cluster centroid. Example below (columns are aligned with
spaces here for readability).

Iris-setosa     1.227   0.325  -0.475  -1.07   0
Iris-setosa     1.208   0.338  -0.435  -1.11   0
Iris-setosa     1.165   0.459  -0.576  -1.04   1
Iris-setosa     1.136   0.486  -0.533  -1.089  1
Iris-virginica  1.057  -0.730   0.640  -0.968  2
...
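
The instance rows of an output file can be post-processed outside the program. The sketch below splits a row into name, values, and cluster number, and recomputes a centroid per cluster. Everything in it is an assumption made for illustration: the names OutputFormatSketch, Row, parseRow, and centroids are hypothetical, the centroid is taken to be the arithmetic mean of the cluster members (which may not match every method in the toolkit), and the centroid section at the end of the file is not parsed because its layout is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for the instance rows of an output file; not part of the toolkit. */
final class OutputFormatSketch {

    /** One instance row: name, attribute values, cluster number. */
    static final class Row {
        final String name;
        final double[] values;
        final int cluster;

        Row(String name, double[] values, int cluster) {
            this.name = name;
            this.values = values;
            this.cluster = cluster;
        }
    }

    /** Splits a tab-separated row; the last token is the cluster number. */
    static Row parseRow(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length < 3) {
            throw new IllegalArgumentException("expected name, values, and cluster number");
        }
        double[] values = new double[tokens.length - 2];
        for (int i = 1; i < tokens.length - 1; i++) {
            values[i - 1] = Double.parseDouble(tokens[i].trim());
        }
        int cluster = Integer.parseInt(tokens[tokens.length - 1].trim());
        return new Row(tokens[0], values, cluster);
    }

    /** Mean vector per cluster (assumption: centroid = arithmetic mean of the members). */
    static Map<Integer, double[]> centroids(List<Row> rows) {
        Map<Integer, double[]> sums = new TreeMap<>();
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Row row : rows) {
            double[] sum = sums.get(row.cluster);
            if (sum == null) {
                sum = new double[row.values.length];
                sums.put(row.cluster, sum);
            }
            if (sum.length != row.values.length) {
                throw new IllegalArgumentException("rows have different numbers of values");
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] += row.values[i];
            }
            counts.merge(row.cluster, 1, Integer::sum);
        }
        // Turn the per-cluster sums into means in place.
        for (Map.Entry<Integer, double[]> entry : sums.entrySet()) {
            int count = counts.get(entry.getKey());
            double[] sum = entry.getValue();
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= count;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(parseRow("Iris-setosa\t1.227\t0.325\t-0.475\t-1.07\t0"));
        rows.add(parseRow("Iris-setosa\t1.208\t0.338\t-0.435\t-1.11\t0"));
        rows.add(parseRow("Iris-virginica\t1.057\t-0.730\t0.640\t-0.968\t2"));
        for (Map.Entry<Integer, double[]> entry : centroids(rows).entrySet()) {
            System.out.println(entry.getKey() + " " + Arrays.toString(entry.getValue()));
        }
    }
}
```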
8 ABOUT CLUSTERING & VALIDATION METHODS
More information about clustering and validation methods:
- TTSAS & k-means: Theodoridis, S. & Koutroumbas, K. 2003. Pattern Recognition. San Diego:
  Academic Press.
- Diana: Datta, S. & Datta, S. 2003. Comparison and validation of statistical clustering techniques
  for microarray gene expression data. Bioinformatics 19 (4), 459-466.
- SOM: Kohonen, T. 1997. Self-Organizing Maps. Berlin: Springer-Verlag.
- FOM: Yeung, K. Y., Haynor, D. R. & Ruzzo, W. L. 2001. Validating clustering for gene expression
  data. Bioinformatics 17 (4), 309-318.

9 FOR DEVELOPERS
Feel free to modify the ClusterToolkitView.java file. The source code of the engine (JNICluster.dll) is
not yet available. Questions, bug reports, etc.: jotatu@it.utu.fi