

Nov 18, 2013


Debellor

Data Mining Platform with Stream Architecture

Marcin Wojnarski

Warsaw University, Poland


Outline

Debellor: data mining platform

Motivation

Main features

Architecture:
  Cell
  data streaming
  multi-threading

Available in version 0.6

Future releases

Summary


Language: Java

Licence: open source (GPL)

Download: www.debellor.org

Debello: to conquer (Latin).

Debellor: conqueror of data

Debellor


Debellor: data mining platform

Weka

TA-Lib

Debellor


Motivation

Demand for more complex algorithms.

Necessity to combine elementary algorithms.


Motivation

1. Data Processing Network (DPN)

[Diagram: an example DPN composed of the nodes Load, Preprocess, Predict, Preprocess, Save, Load and Visualize]


Motivation

2. Committee of algorithms

[Diagram: Classifier A, Classifier B and Classifier C combined by Voting]
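A committee can be sketched as a wrapper that asks each member classifier for a label and returns the most frequent one. The Java below is a minimal illustration, not Debellor code: the `Classifier` interface, the `Committee` class and the tie-breaking rule (the label predicted first wins a tie) are all hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of majority voting by a committee of classifiers.
class CommitteeDemo {
    public static void main(String[] args) {
        // Three toy "classifiers" standing in for Classifier A, B and C.
        Classifier a = x -> x[0] > 0.5 ? "yes" : "no";
        Classifier b = x -> x[1] > 0.5 ? "yes" : "no";
        Classifier c = x -> x[0] + x[1] > 1.2 ? "yes" : "no";
        Committee committee = new Committee(List.of(a, b, c));
        System.out.println(committee.predict(new double[] {0.9, 0.2}));
    }
}

// Hypothetical interface: maps a feature vector to a class label.
interface Classifier {
    String predict(double[] features);
}

// Combines the members' predictions by simple majority vote.
class Committee implements Classifier {
    private final List<Classifier> members;

    Committee(List<Classifier> members) {
        this.members = List.copyOf(members);
    }

    @Override
    public String predict(double[] features) {
        // Count votes, remembering the order in which labels first appear.
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (Classifier member : members) {
            votes.merge(member.predict(features), 1, Integer::sum);
        }
        // Pick the label with most votes; ties go to the label seen first.
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With the input {0.9, 0.2}, classifier A votes yes while B and C vote no, so the program prints no.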


Motivation

3. Nested algorithms

[Diagram: k-means nested inside an RBF neural network]
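Nesting means that one algorithm is used as a building block inside another. In an RBF network, for instance, k-means can pick the centers of the hidden Gaussian units. The sketch below assumes 1-D inputs, centers initialised from the first k points and a fixed basis-function width; the names are hypothetical, and training of the output-layer weights is left out.

```java
import java.util.Arrays;

// Hypothetical sketch: k-means (inner algorithm) chooses the centers
// of an RBF network's hidden layer (outer algorithm). 1-D data for brevity.
class NestedRbfDemo {
    public static void main(String[] args) {
        double[] data = {1, 2, 10, 11};
        double[] centers = kMeans(data, 2, 100);
        System.out.println(Arrays.toString(centers));
        // Hidden-layer activations of the RBF network for the input 1.5.
        System.out.println(Arrays.toString(rbfLayer(1.5, centers, 1.0)));
    }

    // Plain batch k-means; centers start at the first k data points.
    static double[] kMeans(double[] data, int k, int maxIter) {
        double[] centers = Arrays.copyOf(data, k);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double x : data) {
                int j = nearest(x, centers);
                sum[j] += x;
                count[j]++;
            }
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) continue; // keep the center of an empty cluster
                double updated = sum[j] / count[j];
                if (updated != centers[j]) changed = true;
                centers[j] = updated;
            }
            if (!changed) break; // assignments are stable: converged
        }
        return centers;
    }

    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) best = j;
        }
        return best;
    }

    // Gaussian basis functions exp(-(x - c)^2 / (2 * width^2)), one per center.
    static double[] rbfLayer(double x, double[] centers, double width) {
        double[] out = new double[centers.length];
        for (int j = 0; j < centers.length; j++) {
            double d = x - centers[j];
            out[j] = Math.exp(-d * d / (2 * width * width));
        }
        return out;
    }
}
```

For the data {1, 2, 10, 11} the inner k-means settles on the centers 1.5 and 10.5, which then define the two hidden units of the RBF network.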


Requirements

Versatile

Efficient

Simple


Features of Debellor

All types of data processing algorithms

Extendible data types

Stream architecture (for large data sets)

Multi-threading

Immutability of data objects (for safety)
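Immutability makes it safe to pass the same data object between cells, even cells running in different threads, with no copying or locking. A minimal sketch in Java, assuming a sample is a numeric feature vector with an optional label; this `Sample` class is a hypothetical stand-in, not Debellor's actual class.

```java
import java.util.Arrays;

class ImmutableSampleDemo {
    public static void main(String[] args) {
        double[] raw = {1.0, 2.0};
        Sample s = new Sample(raw, null);
        raw[0] = 99.0;                    // does not affect s
        Sample labelled = s.withLabel("yes");
        System.out.println(s);            // [1.0, 2.0] -> null
        System.out.println(labelled);     // [1.0, 2.0] -> yes
    }
}

// Hypothetical immutable sample: final class, final fields, defensive copies.
final class Sample {
    private final double[] features;
    private final String label; // may be null for unlabelled data

    Sample(double[] features, String label) {
        this.features = features.clone(); // the caller's array cannot alter us later
        this.label = label;
    }

    double feature(int i) {
        return features[i];
    }

    int size() {
        return features.length;
    }

    String label() {
        return label;
    }

    // "Modification" returns a new object; the original stays unchanged.
    Sample withLabel(String newLabel) {
        return new Sample(features, newLabel);
    }

    @Override
    public String toString() {
        return Arrays.toString(features) + " -> " + label;
    }
}
```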


Debellor


Algorithm = Cell

Cell cell = new RseslibClassifier("C45");
cell.set("pruning", "true");


Cell: data source

cell.open();
Sample s1 = cell.next(), s2 = cell.next(), ...
cell.close();
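The open(), next(), close() protocol can be sketched as an abstract class that checks the call order. This is an assumption-laden illustration, not Debellor's real `Cell`: samples are plain `Double` values, `ListSource` is a hypothetical source cell, and returning null at the end of the stream is an assumed convention.

```java
import java.util.ArrayList;
import java.util.List;

class DataSourceDemo {
    public static void main(String[] args) {
        Cell cell = new ListSource(List.of(1.0, 2.0, 3.0));
        cell.open();
        Double s1 = cell.next(), s2 = cell.next();
        System.out.println(s1 + ", " + s2); // 1.0, 2.0
        cell.close();
    }

    // Opens a cell, reads every sample until the end of the stream, closes it.
    static List<Double> readAll(Cell cell) {
        List<Double> out = new ArrayList<>();
        cell.open();
        for (Double s = cell.next(); s != null; s = cell.next()) out.add(s);
        cell.close();
        return out;
    }
}

// Hypothetical base class: a cell is a stream of samples (here: Doubles).
abstract class Cell {
    private boolean opened;

    final void open() {
        if (opened) throw new IllegalStateException("already open");
        opened = true;
        onOpen();
    }

    // Returns the next sample, or null at the end of the stream (assumed convention).
    final Double next() {
        if (!opened) throw new IllegalStateException("call open() first");
        return onNext();
    }

    final void close() {
        if (!opened) throw new IllegalStateException("not open");
        onClose();
        opened = false;
    }

    protected abstract void onOpen();
    protected abstract Double onNext();
    protected void onClose() {}
}

// Hypothetical source cell that streams the contents of a fixed list.
class ListSource extends Cell {
    private final List<Double> data;
    private int position;

    ListSource(List<Double> data) {
        this.data = List.copyOf(data);
    }

    @Override
    protected void onOpen() {
        position = 0; // every open() restarts the stream
    }

    @Override
    protected Double onNext() {
        return position < data.size() ? data.get(position++) : null;
    }
}
```

Keeping open(), next() and close() final in the base class lets it enforce the call order in one place, while subclasses only fill in onOpen() and onNext().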


Cell: data receiver

cell.setSource(anotherCell);

[Diagram: anotherCell passes data to cell]
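A receiver obtains its input by calling next() on the cell given to setSource(), so it processes one sample at a time. A minimal sketch under simplifying assumptions (Double samples, null marks the end of the stream); `Cell`, `ListSource` and `ScaleCell` are hypothetical names, not Debellor's classes.

```java
import java.util.ArrayList;
import java.util.List;

class DataReceiverDemo {
    public static void main(String[] args) {
        ScaleCell cell = new ScaleCell(10.0);
        cell.setSource(new ListSource(List.of(1.0, 2.0, 3.0)));
        System.out.println(readAll(cell)); // [10.0, 20.0, 30.0]
    }

    static List<Double> readAll(Cell cell) {
        List<Double> out = new ArrayList<>();
        cell.open();
        for (Double s = cell.next(); s != null; s = cell.next()) out.add(s);
        cell.close();
        return out;
    }
}

// Hypothetical, simplified cell interface: next() returns null at end of stream.
interface Cell {
    void open();
    Double next();
    void close();
}

// Data source: streams a fixed list of samples.
class ListSource implements Cell {
    private final List<Double> data;
    private int position;

    ListSource(List<Double> data) { this.data = List.copyOf(data); }

    public void open() { position = 0; }
    public Double next() { return position < data.size() ? data.get(position++) : null; }
    public void close() {}
}

// Data receiver: multiplies every sample coming from its source by a factor.
class ScaleCell implements Cell {
    private final double factor;
    private Cell source;

    ScaleCell(double factor) { this.factor = factor; }

    void setSource(Cell source) { this.source = source; }

    // Opening the receiver opens its source as well.
    public void open() {
        if (source == null) throw new IllegalStateException("no source set");
        source.open();
    }

    // Pulls exactly one sample from the source each time it is asked for one.
    public Double next() {
        Double s = source.next();
        if (s == null) return null;
        return s * factor;
    }

    public void close() { source.close(); }
}
```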


Trainable Cell

cell.setSource(…);
cell.learn();

[Diagram: cell changes state from EMPTY to TRAINED]
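A trainable cell reads its training stream in learn() and only then moves from EMPTY to TRAINED. The sketch assumes a toy model that learns the mean of its training data and subtracts it from new values; the class, its `apply()` method and the use of an `Iterable<Double>` as the source are hypothetical simplifications, not Debellor's API.

```java
import java.util.List;

class TrainableCellDemo {
    public static void main(String[] args) {
        MeanCentering cell = new MeanCentering();
        cell.setSource(List.of(2.0, 4.0, 6.0));
        System.out.println(cell.state());      // EMPTY
        cell.learn();
        System.out.println(cell.state());      // TRAINED
        System.out.println(cell.apply(10.0));  // 6.0
    }
}

// Hypothetical trainable cell with the two states shown on the slide.
class MeanCentering {
    enum State { EMPTY, TRAINED }

    private Iterable<Double> source;
    private State state = State.EMPTY;
    private double mean;

    void setSource(Iterable<Double> source) { this.source = source; }

    State state() { return state; }

    // Reads the training stream once, keeping only a running sum and count.
    void learn() {
        if (source == null) throw new IllegalStateException("no source set");
        double sum = 0;
        long count = 0;
        for (double x : source) {
            sum += x;
            count++;
        }
        if (count == 0) throw new IllegalStateException("empty training data");
        mean = sum / count;
        state = State.TRAINED;
    }

    // Using the model is only allowed after training.
    double apply(double x) {
        if (state != State.TRAINED) throw new IllegalStateException("call learn() first");
        return x - mean;
    }
}
```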


Data Streaming

[Diagram: cells A and B exchanging data in BATCH mode and in STREAM mode]

It is the cell itself that is responsible for asking for data.
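The difference between the two modes can be illustrated with plain `java.util.Iterator` objects standing in for cells A and B. In batch mode A produces its whole output before B starts; in stream mode B pulls samples from A on demand, so A computes only what B asks for. The `Generator` class and its `produced` counter are hypothetical, added only to make the effect visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BatchVsStreamDemo {
    // Cell A: generates n samples lazily and counts how many it has produced.
    static class Generator implements Iterator<Integer> {
        private final int n;
        int produced = 0;

        Generator(int n) { this.n = n; }

        public boolean hasNext() { return produced < n; }

        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            return produced++;
        }
    }

    // BATCH: A's entire output is materialised in memory before B sees it.
    static List<Integer> batch(Generator a) {
        List<Integer> all = new ArrayList<>();
        while (a.hasNext()) all.add(a.next());
        return all;
    }

    // Cell B: asks its source for samples one by one and stops when it has enough.
    static int sumFirst(Iterator<Integer> a, int howMany) {
        int sum = 0;
        for (int i = 0; i < howMany && a.hasNext(); i++) sum += a.next();
        return sum;
    }

    public static void main(String[] args) {
        Generator a1 = new Generator(100_000);
        List<Integer> all = batch(a1);
        System.out.println(sumFirst(all.iterator(), 3) + " after producing " + a1.produced);

        Generator a2 = new Generator(100_000); // STREAM: B pulls directly from A
        System.out.println(sumFirst(a2, 3) + " after producing " + a2.produced);
    }
}
```

Both runs print a sum of 3, but the batch run reports 100000 produced samples and the stream run only 3.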


Benefits of streaming

[Diagram: training of k-means]
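Streaming lets k-means train on data larger than memory, because only the current centers and per-cluster counts need to be kept. Shown here is sequential (online) k-means in the style of MacQueen, a single-pass variant chosen for illustration; it is not necessarily the algorithm Debellor implements. The sketch assumes 1-D data and initial centers taken from the first k samples.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class OnlineKMeansDemo {
    public static void main(String[] args) {
        double[] centers = fit(List.of(1.0, 10.0, 2.0, 11.0).iterator(), 2);
        System.out.println(Arrays.toString(centers)); // [1.5, 10.5]
    }

    // Single pass over the stream; memory use is O(k), independent of data size.
    static double[] fit(Iterator<Double> stream, int k) {
        double[] centers = new double[k];
        long[] counts = new long[k];
        int initialised = 0;
        while (stream.hasNext()) {
            double x = stream.next();
            if (initialised < k) { // the first k samples become the initial centers
                centers[initialised] = x;
                counts[initialised] = 1;
                initialised++;
                continue;
            }
            int j = nearest(x, centers);
            counts[j]++;
            // Move the center towards x: running mean of the samples assigned to it.
            centers[j] += (x - centers[j]) / counts[j];
        }
        if (initialised < k) throw new IllegalArgumentException("fewer than k samples");
        return centers;
    }

    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) best = j;
        }
        return best;
    }
}
```

For the stream 1, 10, 2, 11 with k = 2, the centers start at 1 and 10 and end at 1.5 and 10.5.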


Multi-threading

[Diagram: cells A and B in Thread_1]


Multi-threading

A.newThread();

[Diagram: cell A in Thread_2, cell B in Thread_1]
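Running cell A in its own thread can be sketched as a producer/consumer pair joined by a bounded queue: A's thread fills the queue and B takes samples from it, blocking when it is empty. This is one plausible design, assumed for illustration rather than taken from Debellor; `ThreadedSource`, the queue capacity and the empty-`Optional` end-of-stream marker are hypothetical.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.IntStream;

class MultiThreadingDemo {
    public static void main(String[] args) throws InterruptedException {
        // Cell A: a source of the numbers 1..1000 (stand-in for real work).
        Iterator<Integer> a = IntStream.rangeClosed(1, 1000).iterator();
        // Plays the role of A.newThread() in this sketch: A now runs in its own thread.
        ThreadedSource threadedA = new ThreadedSource(a, 16);
        // Cell B stays in the main thread and pulls samples as before.
        long sum = 0;
        for (Integer s = threadedA.next(); s != null; s = threadedA.next()) {
            sum += s;
        }
        System.out.println(sum); // 500500
    }
}

// Hypothetical wrapper: runs a source in its own thread, hands samples over via a queue.
class ThreadedSource {
    private final BlockingQueue<Optional<Integer>> queue;
    private boolean finished; // only touched by the consuming thread

    ThreadedSource(Iterator<Integer> source, int capacity) {
        BlockingQueue<Optional<Integer>> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        Thread producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    q.put(Optional.of(source.next())); // blocks while the queue is full
                }
                q.put(Optional.empty()); // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive if B stops reading early
        producer.start();
    }

    // Blocks until A has produced the next sample; returns null at end of stream.
    Integer next() throws InterruptedException {
        if (finished) return null;
        Integer sample = queue.take().orElse(null);
        if (sample == null) finished = true;
        return sample;
    }
}
```

The bounded capacity keeps A from running arbitrarily far ahead of B, so memory use stays constant while the two cells work in parallel.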


Available in version 0.6

Rseslib algorithms:
  classifiers (~20 algorithms)

Weka algorithms:
  ARFF reader
  classifiers (~60)
  filters (47)

Debellor algorithms:
  Train&Test evaluation
  k-means for large data (stream-based)

Data types:
  numeric and symbolic features
  vectors of features, vectors of vectors of …
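Since a vector may contain other vectors, the data types form a recursive structure. A minimal sketch with immutable classes; `Data`, `NumericFeature`, `SymbolicFeature` and `DataVector` are hypothetical names, not Debellor's type names, and a new data type would be added as another implementation of `Data`.

```java
import java.util.List;
import java.util.stream.Collectors;

class DataTypesDemo {
    public static void main(String[] args) {
        Data sample = new DataVector(List.of(
                new NumericFeature(5.1),
                new SymbolicFeature("red"),
                new DataVector(List.of(new NumericFeature(1), new NumericFeature(2)))));
        System.out.println(sample);              // [5.1, red, [1.0, 2.0]]
        System.out.println(countLeaves(sample)); // 4
    }

    // Counts the elementary features inside an arbitrarily nested structure.
    static int countLeaves(Data d) {
        if (d instanceof DataVector) {
            int n = 0;
            for (Data child : ((DataVector) d).elements()) n += countLeaves(child);
            return n;
        }
        return 1;
    }
}

// Hypothetical common supertype; new data types can be added by implementing it.
interface Data {}

final class NumericFeature implements Data {
    private final double value;
    NumericFeature(double value) { this.value = value; }
    double value() { return value; }
    @Override public String toString() { return Double.toString(value); }
}

final class SymbolicFeature implements Data {
    private final String symbol;
    SymbolicFeature(String symbol) { this.symbol = symbol; }
    String symbol() { return symbol; }
    @Override public String toString() { return symbol; }
}

// A vector of data items, which may themselves be vectors.
final class DataVector implements Data {
    private final List<Data> elements;
    DataVector(List<Data> elements) { this.elements = List.copyOf(elements); }
    List<Data> elements() { return elements; }
    @Override public String toString() {
        return elements.stream().map(Object::toString).collect(Collectors.joining(", ", "[", "]"));
    }
}
```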


Future releases

Multi-input & multi-output cells

Composite cells (e.g. meta-learning)

Serialization and copying




Summary

Platform

Stream architecture

Extendible

Multi-threaded

Weka & Rseslib partially integrated


www.debellor.org

