
CS 4200 - FINAL YEAR PROJECT REPORT

SIDDHI-CEP

By

Project Group 04

Suhothayan S. (070474R)
Gajasinghe K.C.B. (070137M)
Loku Narangoda I.U. (070329E)
Chaturanga H.W.G.S. (070062D)

Project Supervisors

Dr. Srinath Perera
Ms. Vishaka Nanayakkara

Coordinated By

Mr. Shantha Fernando

THIS REPORT IS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF THE DEGREE OF BACHELOR OF SCIENCE OF ENGINEERING AT UNIVERSITY OF MORATUWA, SRI LANKA.

3rd of September 2011


Abstract

Project Title : Siddhi-CEP - High Performance Complex Event Processing Engine

Authors : Suhothayan Sriskandarajah - 070474R
          Kasun Gajasinghe - 070137M
          Isuru Udana Loku Narangoda - 070329E
          Subash Chaturanga - 070062D

Coordinator : Dr. Shantha Fernando

Supervisors : Dr. Srinath Perera
              Mrs. Vishaka Nanayakkara

Over the last half a decade or so, Complex Event Processing (CEP) has been one of the most rapidly emerging fields. Due to the massive volume of business transactions and numerous new technologies such as RFID (Radio Frequency Identification), it has become a real challenge to provide real-time event-driven systems that can process data and handle high input data rates with near-zero latency.

The basic functionality of a complex event processor is to match queries with events and trigger a response. These queries describe the events that the system needs to search for within the input data streams. Unlike traditional systems such as relational database systems, which operate with hundreds of queries running for short durations against stored static data, event-driven systems operate with stored queries running constantly against extremely dynamic data streams. In fact, an event processing system is an upside-down view of a database. The tasks of CEP are to identify meaningful patterns, relationships and data abstractions among unrelated events and fire an immediate response.

Siddhi is an Apache-2.0 licensed Complex Event Processing engine. It addresses some of the main concerns of the event processing world, where there is an absolute need for an open-source variant capable of processing a flood of events that may well exceed one hundred thousand events per second with near-zero latency. This required careful design of the generic concepts of a CEP. Siddhi was designed after an in-depth literature review focusing on each concept separately. The current Siddhi implementation provides an extendable, scalable framework that the open-source community can extend to match specific business needs.

Table of Figures

Figure 1 Maven Structure
Figure 2 Some CEP engines and their brief distributions with their time line
Figure 3 S4 architecture
Figure 4 Esper architecture
Figure 5 PADRES broker network
Figure 6 PADRES router architecture
Figure 7 Aurora Pipeline Architecture
Figure 8 Siddhi high-level architecture
Figure 9 Siddhi Class Diagram
Figure 10 Sequence Diagram
Figure 11 Siddhi Sequence Diagram
Figure 12 Siddhi Implementation View
Figure 13 Siddhi Process View
Figure 14 Siddhi Deployment View
Figure 15 Siddhi Use Case Diagram
Figure 16 Siddhi Event Tuple
Figure 17 Siddhi Pipeline Architecture
Figure 18 Map holding Executor Listeners
Figure 19 Simple Siddhi Query
Figure 20 Siddhi Query Form UI Implementation Used in OpenMRS
Figure 21 Siddhi Query with Simple Condition
Figure 22 Siddhi Time Window
Figure 23 Siddhi Time Window Query
Figure 24 Siddhi Batch Window
Figure 25 Siddhi Unique Query
Figure 26 Scrum Development Process
Figure 27 Test-Driven Development (TDD)
Figure 28 Call Tree for the Time Window in JProfiler
Figure 29 The Memory Usage of Siddhi for a Time Window Query
Figure 30 Siddhi Benchmark
Figure 31 DocBook Documentation Configuration
Figure 32 Siddhi Web Site




Table of Graphs

Graph 1 Siddhi vs Esper Simple Filter Comparison
Graph 2 Siddhi vs Esper Average over Time Window Comparison
Graph 3 Siddhi vs Esper State Machine Comparison


Table of Tables

Table 1 Comparison between Database Applications and Event-driven Applications
Table 2 Different Every Operator Cases





Table of Contents

Abstract
Table of Figures
Table of Graphs
Table of Tables
1. INTRODUCTION
  1.1. Complex Event Processing
  1.2. Aims and Objectives
2. LITERATURE SURVEY
  2.1. Background
    2.1.1. What is Complex Event Processing?
    2.1.2. Why Complex Event Processing?
    2.1.3. CEP General Use Cases
  2.2. Terminology
  2.3. Tools & Technology Studies
    2.3.1. Compiler Generators
      2.3.1.1. ANTLR
    2.3.2. Building and Project Management Tools
      2.3.2.1. Apache Maven
      2.3.2.2. Apache ANT
    2.3.3. Version Control Systems
      2.3.3.1. Subversion
  2.4. CEP Implementation Related Study
    2.4.1. Some Well Known CEP Implementations
      2.4.1.1. S4 [3] [4]
      2.4.1.2. Esper/NEsper [7] [6]
      2.4.1.3. PADRES [6] [8]
      2.4.1.4. Intelligent Event Processor (IEP) [6] [10]
      2.4.1.5. Sopera [6] [11]
      2.4.1.6. Stream-based And Shared Event Processing (SASE) [12]
      2.4.1.7. Cayuga [6] [14] [15]
      2.4.1.8. Aurora and Borealis [6] [20] [21] [15]
      2.4.1.9. TelegraphCQ [6] [15] [26]
      2.4.1.10. STREAM [6] [33]
      2.4.1.11. PIPES
      2.4.1.12. BEA WebLogic [15]
      2.4.1.13. Coral8 [15]
      2.4.1.14. Progress Apama [15]
      2.4.1.15. StreamBase [15]
      2.4.1.16. Truviso [15] [41]
  2.5. Some Interesting Research Papers
    2.5.1. Event Stream Processing with Out-of-Order Data Arrival [42]
    2.5.2. Efficient Pattern Matching over Event Streams [43]
  2.6. What We Have Gained from the Literature Survey
3. SIDDHI DESIGN
  3.1. Siddhi Architecture
    Input Adapters
    Siddhi-core
    Output Adapters
    Compiler
    Pluggable UI
  3.2. 4+1 Model
    3.2.1. Logical View
      Class Diagram
      Sequence Diagram
    Implementation View
    Process View
    Deployment View
    Use Case View
      Use case 1
      Use case 2
      Use case 3
  3.3. Major Design Components
    3.3.1. Event Tuples
    3.3.2. Pipeline Architecture
    3.3.3. State Machine
      Sequence Queries
      Every Operator
      Design Decisions in Sequence Processor
      Pattern Queries
      Kleene Star Operator
      Design Decisions in Pattern Processor
    3.3.4. Processor Architecture
      Executors
      3.3.4.1. Event Generators
    3.3.5. Query Object Model
    3.3.6. Query Parser
    3.3.7. Window
      Time Window
      Batch Window
      Time Batch Window
      Length Batch Window
    3.3.8. “UNIQUE” Support
4. Implementation
  4.1. Process Models
    4.1.1. Scrum Development Process
    4.1.2. Test-driven Development (TDD)
  4.2. Version Control
  4.3. Project Management
  4.4. Coding Standards & Best Practices Guidelines for Siddhi
    4.4.1. General
    4.4.2. Java Specific
  4.5. Profiling
  4.6. Benchmark
  4.7. Documentation
  4.8. Web Site
  4.9. Bug Tracker
  4.10. Distribution
5. Results
  5.1. Performance Testing
6. Discussion and Conclusion
  6.1. Known Issues
  6.2. Future Work
    6.2.1. Incubating Siddhi at Apache
    6.2.2. Find Out a Query Language for Siddhi
    6.2.3. Out of Order Event Handling
  6.3. Siddhi Success Story
  6.4. Conclusion
Abbreviations
Bibliography
Appendix A




1. INTRODUCTION

1.1. Complex Event Processing

Data processing is one of the key functionalities in computing. It refers to the process a computer program performs to take in data and analyze it, converting the data into usable information. Basically, data is nothing but unorganized facts, which can be converted into useful information. Analyzing, sorting and storing data are a few of the major tasks involved in data processing.

Data processing has a tight connection with Event-Driven Architecture (EDA). EDA can be viewed as a subset of data processing in which a stream of data items (which can be called events) is processed.

One might think an event is just another piece of data that carries a time-stamp, but an event has a broader meaning. One of the better ways to describe the relationship between data and events is that “data is derived from events”. So events are different from data; in fact, the event representation contains the data.

During the last half a decade, Complex Event Processing (CEP) has been one of the most rapidly emerging fields in data processing. Due to the massive amount of business transactions and numerous new technologies like RFID (Radio Frequency Identification), it has now become a real challenge to provide real-time event-driven systems that can process data and handle high input data rates with near-zero latency (nearly real-time).

The basic functionality of the complex event processor is to match queries with events and trigger a response immediately. These queries describe the events that the system needs to search for within the input data streams. Unlike traditional systems such as relational database management systems (RDBMS), which operate with hundreds of queries running for short durations against stored static data, event-driven systems operate with stored queries running constantly against extremely dynamic data streams. In effect, an event processing system is an inverted version of a database: the search queries are stored in the system and matched against incoming data. Hence, Complex Event Processing is used in systems such as data monitoring centers, financial services, web analysis, and many more domains where extremely dynamic data is being generated.
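The inverted-database model described above — stored queries, streaming data — can be sketched in a few lines. This is a hypothetical illustration only, not Siddhi's actual API; the engine class, event shape and threshold below are invented for the example:

```python
# Hypothetical sketch of the "inverted database" model: queries are stored
# once, and every arriving event is pushed through them immediately.
class StandingQueryEngine:
    def __init__(self):
        self.queries = []  # (predicate, callback) pairs live for the engine's lifetime

    def register(self, predicate, callback):
        """Store a standing query; it runs against every future event."""
        self.queries.append((predicate, callback))

    def send(self, event):
        """Process one incoming event against all stored queries."""
        for predicate, callback in self.queries:
            if predicate(event):
                callback(event)  # fire an immediate response

# Usage: alert on any trade of one symbol above a threshold price.
alerts = []
engine = StandingQueryEngine()
engine.register(lambda e: e["symbol"] == "IBM" and e["price"] > 100,
                lambda e: alerts.append(e))

engine.send({"symbol": "IBM", "price": 95})   # no match
engine.send({"symbol": "IBM", "price": 120})  # match -> alert fired
```

Note how the query outlives any single event, which is the exact opposite of a database, where the data outlives any single query.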

In the abstract, the task of CEP is to identify meaningful patterns, relationships and data abstractions among unrelated events and fire an immediate response, such as an alert message.

Examples:

- Searching a document for a specific keyword
- Radio Frequency Identification (RFID)
- Financial market transaction pattern analysis

1.2. Aims and Objectives

The main aim of our project is to implement a 100% open-source, high-performance complex event processing engine. There are several commercial and very few open-source CEP engines currently available. Most of them were implemented early in this decade and have since become stable. Because they were implemented some time ago, there are improvements that could be made to those CEP implementations, even at the architectural level. But since they have become stable, there is little tendency to further improve their system base. So our main aim is to identify those weaknesses and implement a better open-source CEP implementation on a recent JDK.




- Carry out a literature survey, compare and contrast different implementations of event processing engines, and come up with an effective architecture that can process any type of external event stream and is computationally efficient. The factors to look for are support for high-speed processing with low memory consumption.

- Implement the basic Complex Event Processing engine framework.

- Build the planned features for the engine on top of the written framework.

- Do Java profiling in several iterations on the code base and improve efficiency. This ensures that there is little or no overhead due to the code written.

2. LITERATURE SURVEY

2.1. Background

The following sections provide the background of this project, describing what Complex Event Processing systems are, why they are needed, and their general use cases.

2.1.1. What is Complex Event Processing?

Event processing can be defined as a methodology that performs predefined operations on event objects, including analyzing, creating, reading, transforming or deleting them. Generally, Complex Event Processing (CEP) can be defined as an emerging technology that creates actionable, situational knowledge from distributed message-based systems, databases and applications in real time or near real time. In other words, a CEP software implementation aggregates information from distributed systems in real time and applies rules to discern patterns and trends that would otherwise go unnoticed. In another view, we can identify a CEP as a database turned upside-down: instead of storing the data and running queries against the stored data, the CEP engine stores queries and runs the data through them as a stream. So Complex Event Processing is primarily an event processing concept that deals with processing events from several event streams, identifying the meaningful events within the event domain, and firing when a match is found based on the rules provided. Basically, CEP uses techniques such as detection of complex patterns of events, event hierarchies, and relationships between events such as causality and timing.

Table 1 Comparison between Database Applications and Event-driven Applications

                  Database Applications         Event-driven Applications
Query Paradigm    Ad-hoc queries or requests    Continuous standing queries
Latency           Seconds, hours, days          Milliseconds or less
Data Rate         Hundreds of events/sec        Tens of thousands of events/sec or more

2.1.2. Why Complex Event Processing?

The IT industry is getting more complex day by day, and the state of today's industry can be identified as an event-driven era. In modern enterprise software systems, events are a very frequent commodity. Thus, extracting what is relevant and what is not can be a nightmare, especially when there are thousands of changes taking place per second. Complex Event Processing is a new way to deal with applications in which many agents produce huge amounts of data per second and we need to transform that data into reliable information in a short period of time. These applications produce massive amounts of data per second, beyond what traditional computing power (hardware and software) has the capacity to handle. Therefore, when we need to process a massive amount of incoming events in real time, the classical methodologies fail, and that is where CEP comes into play.

For example, say you have to trace the increase of stock prices for a particular set of companies. The traditional process was to put all of them in a database under a particular schema and, at the end of the day, go through the whole database to check whether there was an increase in stock price. Generally, there will be hundreds of such companies to track, and thousands of buys and sells per second on the stock market. Therefore the database will be enormous. This is where the real need for a CEP engine arises: it gives you real-time feedback based on the specified business rules while saving both time and space.
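The stock example can be sketched as a per-event check rather than an end-of-day database scan. This is an illustrative sketch only (the symbols and prices are invented); keeping just the last seen price per company replaces storing every transaction:

```python
# Detect a stock price increase the moment it happens, keeping only the
# last seen price per symbol instead of a full transaction history.
last_price = {}

def on_trade(symbol, price, on_increase):
    """Process one trade event in real time."""
    previous = last_price.get(symbol)
    last_price[symbol] = price
    if previous is not None and price > previous:
        on_increase(symbol, previous, price)  # immediate feedback

# Usage: stream a handful of trades through and collect the increases.
increases = []
trades = [("ABC", 10.0), ("ABC", 10.5), ("XYZ", 7.0), ("ABC", 10.2), ("XYZ", 7.3)]
for sym, p in trades:
    on_trade(sym, p, lambda s, old, new: increases.append((s, old, new)))
```

The state held in memory stays proportional to the number of tracked companies, not to the number of trades, which is the time-and-space saving the text refers to.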


As explained, unlike traditional static data analysis methodologies, CEP engines are event-driven: the logic of the analysis is applied in advance, and each new event is processed as soon as it arrives, immediately updating all high-level information and triggering any rules that have been defined. With CEP, businesses can map discrete events to expected outcomes and relate a series of events to key performance indicators (KPIs). Through this, CEP gives businesses more insight into which events will have the greatest operational impact, helping them to seize opportunities and mitigate risks.


2.1.3. CEP General Use Cases


In the early age of Complex Event Processing systems, they were used for monitoring stock trading systems, and many still believe that is the major use case of CEP. But these days there are many other interesting applications of CEP, especially across the IT industry, financial markets and manufacturing organizations. They are as follows:

Cleansing and validation of data: CEPs can recognize patterns in the data and filter out anomalous events which fall outside recognized parameters.

Alerts and notifications: CEP engines can monitor event streams, detect patterns, and send notifications by hooking into email servers, posting messages to web services, etc. (e.g., a real-time business system should be able to send notifications when problems occur).

Decision-making systems: CEPs are used in automated business decision-making systems that take current conditions into their knowledge base.

Feed handling: Most CEP platforms come with many built-in feed handlers for common market data formats.

Data standardization: CEP engines are capable of standardizing data about the same entity from different sources within a common reference schema.
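The first use case above, cleansing and validation, amounts to a predicate that drops events falling outside recognized parameters. The following is a minimal sketch under invented assumptions (the field name and bounds are examples, not from any real system):

```python
# Data cleansing: pass through only events whose reading falls inside
# recognized parameters; anomalous events are filtered out.
VALID_RANGE = (0.0, 150.0)  # example bounds for a sensor reading

def cleanse(stream):
    low, high = VALID_RANGE
    for event in stream:
        if low <= event["reading"] <= high:
            yield event  # valid event continues downstream
        # anomalous events are dropped here (or could be routed to an alert)

raw = [{"reading": 42.0}, {"reading": -3.0}, {"reading": 980.0}, {"reading": 99.9}]
clean = list(cleanse(raw))  # keeps only the in-range readings
```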

2.2. Terminology

This section covers a small set of basic terms related to event processing. However much event processing technologies have progressed, there is still no standardized terminology defined, even for basic terms such as ‘event’, ‘event stream’ and ‘event processing’. The definitions vary slightly depending on the implementation and the product. There is an ongoing effort to standardize this [1]. The definitions here give a generic idea of the terms, and show distinctions between different implementations.

The basic term that is used excessively in Complex Event Processing is ‘event’, and it is one term that is misused most of the time. Basically, an ‘event’ can be defined as anything that happens, or is contemplated as happening. However, the term is mostly used for the representation of an event. The authors of the Event Processing Glossary [1] have generalized this by defining two separate terms, i.e. ‘event’ and ‘event object’. The term ‘event object’ refers to the representation of a particular event. This can be a tuple, vector, or row implementation.

Events can be two-fold: simple events or composite (complex) events. A simple event refers only to an occurrence of a single event. A complex event is an aggregation of several events. To understand this simply, let’s take the example of stock-market transactions. There, each buying and selling of stock is a simple event. The simple event object may consist of the stock symbol, the buying/selling price and the volume. However, if we record all the buying and selling of stocks of a specific company, and return them as one event, that event can be considered a complex event. In this case, it consists of a number of simple events [2].

The event processing engine receives events through an event stream. An event stream is a linearly ordered sequence of events, ordered by the time of arrival. The usage of event streams varies depending on the implementation. Some implementations allow event streams to carry events of different types, while other implementations restrict a stream to events of a pre-defined type. Siddhi, for example, restricts the event type of a given stream; the user has the ability to create different streams to send events of different types. This makes the implementation clear and less confusing.

The processing of events comes with a variety of names. Mostly people call it ‘complex event processing’ (CEP) or ‘event stream processing’ (ESP). CEP is the name we have used for Siddhi throughout this report.

A ‘query’ is the basic means of specifying the rules/patterns to match in the incoming stream of events. It specifies the needed event streams, the operations that need to be performed on those events, and how to output the outcome of the operation. The outcome/output is generally also an event; mostly we can regard it as a composite event if there are aggregator functions specified in the query. The query language follows a SQL-like structure. There are differences in the processing of the query, but the language contains many similarities. For example, each of the SELECT, FROM, WHERE, HAVING and GROUP BY clauses intends to say the same thing, though the processing would be different. Different CEP implementations use different query languages, and there is no standard for them. These query languages extend SQL with the ability to process real-time data streams. In SQL, we send a query to be performed on the stored data rows in a database table. Here, the queries are fed to the system beforehand, and the real-time streams of events are passed through these queries, performing the operations specified. The query will fire a new event when a match for the rule/pattern occurs.
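This inverted-database idea — queries stored first, events streamed through them — can be sketched in a few lines of Java. This is a hypothetical illustration of the concept only, not Siddhi's actual API; every class and method name below is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of the CEP inversion: queries are registered beforehand,
// then each arriving event is pushed through every stored query.
public class TinyCep {
    // A registered "query": a matching condition plus the action fired on match.
    static class Query {
        final Predicate<Double> condition;
        final Consumer<Double> onMatch;
        Query(Predicate<Double> c, Consumer<Double> a) { condition = c; onMatch = a; }
    }

    private final List<Query> queries = new ArrayList<>();

    void addQuery(Predicate<Double> condition, Consumer<Double> onMatch) {
        queries.add(new Query(condition, onMatch));
    }

    // Unlike a database, the data (events) flows past the stored queries.
    void sendEvent(double price) {
        for (Query q : queries) {
            if (q.condition.test(price)) {
                q.onMatch.accept(price);
            }
        }
    }

    public static void main(String[] args) {
        TinyCep cep = new TinyCep();
        cep.addQuery(p -> p > 100.0, p -> System.out.println("High price: " + p));
        for (double p : new double[]{95.0, 120.5, 99.0, 150.0}) {
            cep.sendEvent(p); // only 120.5 and 150.0 trigger the listener
        }
    }
}
```

A real engine additionally maintains state per query (windows, partial pattern matches), but the registration-then-streaming control flow is the essential inversion described above.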

2.3. Tools & Technology Studies

The following sections describe the tools and technologies we have used to provide the basic infrastructure to develop Siddhi. This contains details about build management tools, the version controlling system, the compiler generator, etc.

2.3.1. Compiler Generators

As a part of our literature survey we looked at compiler generators. We have to construct a compiler which generates the query object model from a query. One of the popular tools which can be used to construct a compiler is ANTLR.

2.3.1.1. ANTLR

ANTLR, which stands for “ANother Tool for Language Recognition”, is a tool which can be used to create compilers, recognizers and interpreters. ANTLR provides a framework which greatly supports tree construction, translation, error reporting and recovery. ANTLR provides a single syntax notation for specifying both the lexer and the parser. This feature makes it easy for users to specify their rules. It also has a graphical grammar editor and a debugger called ANTLRWorks, which further enhances ease of use. ANTLR uses Extended Backus-Naur Form (EBNF) grammars and supports many target programming languages like Java, C, C++, C#, Ruby and Python. The parser generated by ANTLR is an LL(*) parser which provides infinite lookahead. Since ANTLR generates top-down parsers, it uses syntactic predicates to resolve ambiguities, such as those arising from left factoring, which plain top-down parsers cannot handle.
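As an illustration of the combined lexer/parser notation, a tiny SQL-like query rule might be written in ANTLR (v3-style) as follows; the grammar, rule and token names here are hypothetical, not Siddhi's actual grammar:

```antlr
// Hypothetical grammar fragment -- not the actual Siddhi grammar.
grammar TinyQuery;

query    : 'select' attrList 'from' ID EOF ;
attrList : ID (',' ID)* ;

ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Both token rules (upper-case names) and parser rules (lower-case names) live in the same file, which is the single-notation convenience mentioned above.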

2.3.2. Building and Project Management Tools

2.3.2.1. Apache Maven

Apache Maven is a project management tool. Maven was developed to make the build process much easier. Initially, Maven was created to manage the complex build process of the Jakarta Turbine project. Maven is rapidly evolving, and though its newest version is 3, version 2 is still widely used.

Features of Maven

I. Maven understands how a project is typically built.

II. Maven makes use of its built-in project knowledge to simplify and facilitate project builds.

III. Maven prescribes and enforces a proven dependency management system that is in tune with today’s globalized and connected project teams.

IV. Maven is completely flexible for power users; the built-in models can be overridden and adapted declaratively for specific application scenarios.

V. Maven is fully extensible for scenario details not yet covered by existing behaviors.

VI. Maven is continuously improved by capturing any newfound best practices and identified commonality between user communities and making them a part of Maven's built-in project knowledge.

VII. Maven can be used to create project files from build files, by using commands:

a. Creating an Eclipse artifact for any source containing a build script: mvn eclipse:eclipse

b. Creating an IntelliJ IDEA artifact: mvn idea:idea


Project Object Model (POM): The POM is a model for Maven 2 which is partially built into the Maven main engine. pom.xml, an XML-based metadata file, is the build file which holds the declarations of the components.
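For reference, a minimal pom.xml has the following shape; the coordinates below are illustrative, not the actual Siddhi ones:

```xml
<!-- Illustrative minimal POM; groupId/artifactId/version are made up. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>sample-component</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>
</project>
```

The groupId/artifactId/version triple uniquely identifies the produced artifact in a repository, which is what the dependency management model described next resolves against.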

Dependency management model: Dependency management is a key part of Maven. Maven's dependency management can be adapted to most requirements, and its model is built into Maven 2. This model is a proven, workable and productive model currently deployed by major open source projects.

Build life cycle and phases: These are the interfaces between Maven's built-in model and the plug-ins. The default lifecycle has the following build phases.

Figure 1: Maven Structure



validate - validate the project is correct and all necessary information is available

compile - compile the source code of the project

test - test the compiled source code using a suitable unit testing framework; these tests should not require the code to be packaged or deployed

package - take the compiled code and package it in its distributable format, such as a JAR

integration-test - process and deploy the package if necessary into an environment where integration tests can be run

verify - run any checks to verify the package is valid and meets quality criteria

install - install the package into the local repository, for use as a dependency in other projects locally

deploy - done in an integration or release environment; copies the final package to the remote repository for sharing with other developers and projects

Plug-ins: Most of the effective work of Maven is performed using Maven plug-ins. The following is a part of a pom.xml file. Here we have used the p2-feature plug-in to generate a p2 feature.

<plugins>
  <plugin>
    <groupId>org.wso2.maven</groupId>
    <artifactId>carbon-p2-plugin</artifactId>
    <version>1.1</version>
    <executions>
      <execution>
        <id>p2-feature-generation</id>
        <phase>package</phase>
        <goals>
          <goal>p2-feature-gen</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>


2.3.2.2. Apache ANT

Apache ANT is another popular build tool. The acronym ANT stands for “Another Neat Tool”. It is similar to other build tools like make and nmake. The major difference compared to other build tools is that ANT is written in Java, so ANT is very much suitable for Java projects. ANT uses XML to specify the build structure. Apache ANT provides a rich set of operations that we can use to write build scripts. ANT is widely used in the industry as the universal build tool for Java projects. ANT targets can be invoked by simple commands: to run an ANT target called foo, one may just type ‘ant foo’ at the command prompt.
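A minimal build.xml illustrating this XML build structure might look as follows; the project, target and directory names are illustrative:

```xml
<!-- Illustrative ANT build script; names and paths are made up. -->
<project name="sample" default="compile">
  <target name="compile">
    <mkdir dir="build"/>
    <javac srcdir="src" destdir="build"/>
  </target>
  <!-- invoked as: ant jar -->
  <target name="jar" depends="compile">
    <jar destfile="sample.jar" basedir="build"/>
  </target>
</project>
```

Unlike Maven's fixed lifecycle, ANT targets and their dependency ordering are declared explicitly by the build author.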

2.3.3. Version Control Systems

Most open source projects are not developed by a single developer; usually the projects are a team effort. Therefore there should be a way to manage the source code. That is the task of a version control system. A version control system manages files and directories over time.

2.3.3.1. Subversion

Subversion is a version control system which is distributed under an Apache/BSD-style open source license. It is a replacement for the CVS version control system. Subversion can be used by people on different computers: everyone can modify the source code of a project at the same time. If someone has done something incorrectly, we can simply undo those changes by looking into the project history.

Subversion has an official API (which CVS does not have). Subversion is written as a set of libraries in the C language. Although it was written in C, it has language bindings for many programming languages, so Subversion is a very extensible version control system. JavaHL is the Java language binding of Subversion.

Though the default UI of Subversion is a command line interface, there are many third-party tools developed to provide better user interfaces for different environments. For Windows there is a client called TortoiseSVN. Subclipse and Subversive are two plugins for the Eclipse IDE.

2.4. CEP Implementation Related Study

The following are some of the projects which have made significant efforts in the same research area. Some of these have not been completed, and some other projects have not yet released their CEP engine. This shows that even though CEP is a relatively old concept, there is still much significant work going on, and there is no single CEP that has taken over the market.

The following are some CEP engines and their brief descriptions with their timeline.




Figure 2: Some CEP engines and their brief descriptions with their timeline

2.4.1. Some Well-Known CEP Implementations

Let's look at some of the well-known Complex Event Processing engines in the market, with their features, advantages and disadvantages.

2.4.1.1. S4 [3] [4]

S4 was created and released by Yahoo!. It is a framework for “processing continuous, unbounded streams of data”. The framework allows for massively distributed computation over data that is constantly changing. It was initially developed to personalize search advertising products at Yahoo!, and Yahoo! has now released it under Apache License v2. The architecture of S4 resembles the Actors model [5], providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. This design choice also makes it relatively easy to reason about correctness due to the general absence of side-effects.

S4 was designed for big data, with the capability of mining information from continuous data streams using user-defined operators. Though the S4 design shares many attributes with IBM's Stream Processing Core (SPC) middleware [6], architecturally S4 has some differences. S4 is believed to achieve a greater level of simplicity due to the symmetry in its design, where all the nodes in the cluster are identical and there is no centralized control. Further, S4 accomplishes this by leveraging ZooKeeper [3], a simple and elegant cluster management service that can be shared by many systems in a data center.

The disadvantage of S4 is that it allows lossy failovers. Upon a server failure, processes are automatically moved to a standby; the state of the processes at the time of failure is stored in local memory, and data loss is allowed during this handoff process. The state is regenerated using the input streams, and downstream systems must degrade gracefully. Further, nodes cannot be added to or removed from a running cluster.

Figure 3: S4 architecture

2.4.1.2. Esper/NEsper [7] [6]

EsperTech [7] brings Complex Event Processing (CEP) to the mainstream with an open source approach, ensuring rapid innovation with quality productization, support and services for mission critical environments, from SOA to eXtreme Transaction Processing deployments. EsperTech runs on a Java 5 or Java 6 JVM and is fully embeddable.

A tailored Event Processing Language (EPL) allows registering queries in the engine, using Java objects (POJOs, JavaBeans) to represent events. A listener class, which is basically also a POJO, will then be called by the engine when the EPL condition is matched as events come in. The EPL allows expressing complex matching conditions that include temporal windows, joining different event streams, as well as filtering and sorting them [7].
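For illustration, an EPL-style statement computing a per-symbol moving average over a time window might look as follows; the event type and property names are made up for this example:

```sql
-- Illustrative EPL-style statement; StockTick and its properties
-- are hypothetical event definitions.
select symbol, avg(price)
from StockTick.win:time(30 sec)
group by symbol
```

Each time an event arrives (or leaves the 30-second window), the registered listener receives the updated averages.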

The internals of Esper are made up of fairly complex algorithms primarily relying on state machines and delta networks, in which only changes to data are communicated across object boundaries when required.

Esper is available under the GNU GPL license (also known as GPL v2). Esper and NEsper are embeddable components written in Java and C# respectively; these are not servers by themselves but are designed to hook into any sort of server, and are therefore suitable for integration into any Java process or .NET-based process, including J2EE application servers and standalone Java applications.

Esper has a pull query API. Events in Esper allow object representation and dynamic typing. Esper features a Statement Object Model API, which is a set of classes to directly construct, manipulate or interrogate EPL statements.

Figure 4: Esper architecture

Esper also has a commercial version, and the disadvantage of Esper is that its free version does not contain a GUI management, editor or portal application. Esper also does not currently have a server. Esper provides only a small number of key input and output adapters through EsperIO, and provides an adapter framework.

2.4.1.3.

PADRES
[
6
]

[
8
]

PADRES (Publish/Subs
cribe Applied to Distributed Resource Scheduling) is developed by
Middleware Systems Research Group (MSRG) and University of Toronto. This is
an

enterprise
-
grade event management infrastructure that is designed for large
-
scale event
management applications. Ongoing research seeks to add and improve enterprise
-
grade
qualities of the middleware.

A publish/subscribe middleware

[
9
]

provides many benefits to enterprise applications.
Content
-
based interaction simplifies the IT development and maintenance by decoupling
enterprise components. The expressive PADRES subscription language supports
sophisticated interactions among component
s, and allows fine
-
grained queries and event
management functions. Furthermore, scalability is achieved with in
-
network filtering and
processing capabilities.

Figure
5

PADRES

broker network

Figure
6

PADRES
router architecture


2.4.1.4. Intelligent Event Processor (IEP) [6] [10]

Intelligent Event Processor (IEP) is a product of CollabNet, Inc. It is an open source Complex Event Processing (CEP) engine. IEP is a JBI Service Engine and is a part of the OpenESB community. OpenESB is an open source project with the goal of building a world-class Enterprise Service Bus. An ESB provides a flexible and extensible platform on which to build SOA and Application Integration solutions.


2.4.1.5. SOPERA [6] [11]

SOPERA is a complete and proven SOA platform which is rigorously oriented to practical requirements. Companies and organizations benefit from the SOA know-how integrated in SOPERA during implementation of sophisticated SOA strategies.

SOPERA has the ability to predict failure of a business process by monitoring event patterns. SOPERA detects patterns which are schema based: when it discovers a certain schema of events that leads to a failure of the business process, and all of the events of the pattern have occurred within the time window, it fires a new complex event that alerts the staff in advance about a process failure in the future. This provides the ability to react proactively.


2.4.1.6. Stream-based And Shared Event Processing (SASE) [12]

The goal of the SASE research project, conducted by UC Berkeley and the University of Massachusetts Amherst, is to design and develop an efficient, robust RFID stream processing system that addresses the challenges in emerging RFID deployments, including the data-information mismatch, incomplete and noisy data, and high data volume, and that enables real-time tracking and monitoring.


The paper [13] presented on SASE gives insight into the different algorithms used in its efficient state machine implementation. SASE extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. SASE supports high volume streams, extracts events from large windows which even span up to 12 hours, and includes flexible use of negation in sequences, parameterized predicates, and sliding windows. This approach is based on a new abstraction of CEP, i.e., a dataflow paradigm with native sequence operators at the bottom, pipelining query-defined sequences to subsequent relational style operators.

The SASE language supports not only basic constructs such as sequence and negation that existing event languages have, but also offers flexible use of negation in event sequences, adds parameterized predicates for correlating events via value based constraints, includes sliding windows for imposing additional temporal constraints, and resolves the semantic subtlety of negation when used together with sliding windows. Unlike previous work that focuses on complex event “detection” (i.e., only reporting that an event query is satisfied but not how), SASE explicitly reports what events are used to match the query. This significantly increases the complexity of query processing.

The SASE approach employs an abstraction of complex event processing that is a dataflow (query-defined event sequence) paradigm with pipelined operators, as in relational query processing. As such, it provides flexibility in query execution, ample opportunities for optimization, and extensibility as the event language evolves.

The paper [13] provides a comparison between SASE and a relational stream processor, TelegraphCQ (TCQ) [12], developed at the University of California, Berkeley. TCQ uses an n-way join to handle an equivalence test over an event sequence. This certainly incurs high overhead when the sequence length is high. Moreover, TCQ only considers equality comparisons in joins. Therefore, temporal constraints for sequencing, e.g., “s.time > r.time”, are evaluated only after the join. In contrast, SASE uses the NFA to naturally capture sequencing of events, and the PAIS algorithm to handle the equivalence test during NFA execution, yielding much better scalability.

SASE also has some limitations. SASE does not handle a hierarchy of complex event types, where the output of one query can be used as an input to another. SASE assumes a total ordering of events; a known issue with this assumption arises in the scenario where a composite event usually obtains its timestamp from one of its primitive events: when such composite events are mixed together with primitive events to detect more complex events, the assumption of total order on all events no longer holds. Further, the SASE language can be extended to support aggregates such as count() and avg(), but these have not yet been implemented.
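A SASE pattern query combines sequencing, negation, correlating predicates and a window in a single statement. The sketch below is patterned after the RFID shoplifting example published with SASE [13]; the event and attribute names are illustrative:

```
EVENT  SEQ(SHELF-READING x, !(COUNTER-READING y), EXIT-READING z)
WHERE  x.id = y.id AND x.id = z.id
WITHIN 12 hours
```

Informally: an item seen on a shelf and later at the exit, with no checkout-counter reading for the same item in between, within a 12-hour window, fires a complex event.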

2.4.1.7. Cayuga [6] [14] [15]

This project is part of a 2007 AFRL/IF/AFOSR Minigrant titled “User-Centric Personalized Extensibility for Data-Driven Web Applications,” by James Nagy (AFRL/IFED) [16]. This minigrant focuses on Cayuga as a stateful publish/subscribe system for use in a graphical programming model (also being developed at Cornell) known as Hilda. An overview of both systems can be found in the Minigrant Proposal.

Researchers at Cornell describe Cayuga as a general-purpose complex event processing system [17]. The system can be used to detect event patterns in event streams. The Cayuga system is designed to leverage traditional publication/subscription techniques to allow for high scalability [18]. This leads to comparisons not only with other data stream management systems, but also with publish/subscribe systems, to demonstrate the applications and capabilities of Cayuga. The Cayuga system architecture is designed to efficiently support a large number of concurrent subscriptions. Its core components include a query processing engine, an index component, a metadata manager, and a memory manager.

One of the most novel components of Cayuga is the implementation of the processing engine, which utilizes a variation of nondeterministic finite automata [18]. However, the automata in Cayuga are a generalization of the standard nondeterministic finite automata model: these automata read relational streams instead of a finite input alphabet, and the state transitions are performed using predicates. The use of automata allows for the storing of input data, and new inputs can be compared against previously encountered events.
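The generalization can be sketched as follows: instead of transitions labeled by symbols from a finite alphabet, each transition is guarded by a predicate over event attributes. This is a simplified, hypothetical illustration of the idea, not Cayuga's implementation:

```java
import java.util.function.Predicate;

// Sketch of a predicate-guarded automaton in the spirit of Cayuga/SASE:
// transitions fire when a predicate over the event holds, not when a
// symbol from a finite alphabet is read.
public class PredicateNfa {
    // Pattern: an event with value > 100 followed (later) by one with value < 50.
    public static boolean matches(double[] events) {
        int state = 0; // 0 = start, 1 = saw high event, 2 = accept
        Predicate<Double> high = e -> e > 100.0;
        Predicate<Double> low  = e -> e < 50.0;
        for (double e : events) {
            if (state == 0 && high.test(e))      state = 1;
            else if (state == 1 && low.test(e))  state = 2;
        }
        return state == 2;
    }

    public static void main(String[] args) {
        System.out.println(matches(new double[]{90, 120, 80, 40})); // true
    }
}
```

A real engine would additionally keep multiple active automaton instances (one per partial match) and store the events that drove each transition, which is what enables comparing new inputs against previously encountered events.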

Cayuga requires users to specify their interests in the structured Cayuga Event Language (CEL). Not every Cayuga query can be implemented by a single automaton. In order to process arbitrary queries, Cayuga supports re-subscription. This is similar to pipelining: the output stream from one query is used as the input stream to another query. Because of re-subscription, query output must be produced in real time. Since each tuple output by a query has the same detection time as the last input event that contributed to it, its processing (by re-subscription) must take place in the same epoch in which that event arrived. This motivates the Cayuga Priority Queue, where the only form of query optimization performed by the engine is to merge manifestly equivalent states: events having the same timestamps are processed together and then updated in the automata's internal String Table (a weakly referenced hash table).

There is also research regarding a distributed implementation of Cayuga, known as FingerLakes [19].

2.4.1.8. Aurora and Borealis [6] [20] [21] [15]

The primary goal of the Aurora project [20] is to build a single infrastructure that can efficiently and seamlessly meet the requirements of demanding real-time streaming applications. This project has been superseded by the Borealis [21] project.

Both Aurora and Borealis are described as general-purpose data stream management systems [22] in the papers published by the creators at Brandeis University, Brown University, and the Massachusetts Institute of Technology. The goal of the systems is to support various real-time monitoring applications. The overall system architecture of Aurora and Borealis is based on the “boxes-and-arrows” process- and work-flow systems [23]. Data flows through the system as tuples, along pathways, which are the arrows in the model. The data is processed at operators, which are the boxes. After the last processing component, tuples are delivered to an application for processing [22].


Figure 7: Aurora Pipeline Architecture

There are three types of graphs used to monitor the Aurora and Borealis systems: latency graphs, value-based graphs, and loss-tolerance graphs. Based on the monitoring of these graphs, there are several optimizations that these systems are capable of carrying out to decrease system stress. The primary optimizations are the insertion of processing boxes, moving processing boxes, combining two boxes into a single, larger box, reordering boxes, and load shedding [22]. Load shedding is one of the most important optimizations introduced in these systems; it means that the number of tuples presented for processing is reduced to end the overloaded state. In the Aurora and Borealis systems, load shedding is done by opting to drop the tuples relating to systems that are more tolerant of lost and missing data.

Borealis, being the second generation system developed by Brandeis, Brown, and MIT [24], has improved and integrated the stream processing functionality of the Aurora system, and has also integrated into Borealis the distribution techniques from a project known as Medusa [25].

It should also be noted that the Aurora team has now commercialized the Aurora project through StreamBase [23].

2.4.1.9. TelegraphCQ [6] [15] [26]

TelegraphCQ was developed by the University of California at Berkeley, and it was designed to provide event processing capabilities alongside relational database management capabilities by utilizing PostgreSQL [27]. Since PostgreSQL is an open source database, they have modified its existing architecture to allow continuous queries over streaming data [28].

TelegraphCQ focuses on issues such as scheduling and resource management for groups of queries, support for out-of-core data, variable adaptivity, dynamic QoS support, and parallel cluster-based processing and distribution. Further, it also allows multiple simultaneous notions of time, such as logical sequence numbers or physical time.

TelegraphCQ uses different types of windows, which impose different requirements on the query processor and its underlying storage manager [27]. One fundamental issue TelegraphCQ has to deal with is the use of logical (i.e., tuple sequence number) vs. physical (i.e., wall clock) timestamps. If the former is used, then the memory requirements of a window can be known a priori, while in the latter case memory requirements will depend on fluctuations in the data arrival rate. Another issue related to memory requirements has to do with the type of window used in the query. Consider the execution of a MAX aggregate over a stream. For a landmark window, it is possible to compute the answer iteratively by simply comparing the current maximum to the newest element as the window expands. On the other hand, for a sliding window, computing the maximum requires the maintenance of the entire window. Further, the direction of movement and the “hop” size of the windows (the distance between consecutive windows defined by a for loop) also have a significant impact on query execution. For instance, if the hop size of the window exceeds the size of the window itself, then some portions of the stream are never involved in the processing of the query.
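The contrast between the two window types can be sketched in Java: the landmark MAX needs only a single running value, while the sliding-window MAX must retain part of the window so that expired elements can be discounted. This is an illustrative sketch, not TelegraphCQ code; the sliding version uses a standard monotonic-deque technique:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WindowMax {
    // Landmark window: iterative, O(1) state -- just the running maximum.
    static double landmarkMax(double[] stream) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : stream) max = Math.max(max, v);
        return max;
    }

    // Sliding window of the last n elements: keep a deque of candidate
    // maxima so expired elements can be removed; returns the final window's max.
    static double slidingMax(double[] stream, int n) {
        Deque<Double> candidates = new ArrayDeque<>();
        Deque<Double> window = new ArrayDeque<>();
        for (double v : stream) {
            window.addLast(v);
            // Drop candidates dominated by the new element.
            while (!candidates.isEmpty() && candidates.peekLast() < v) {
                candidates.removeLast();
            }
            candidates.addLast(v);
            // Expire the oldest element once the window is over-full.
            if (window.size() > n) {
                double expired = window.removeFirst();
                if (!candidates.isEmpty() && candidates.peekFirst() == expired) {
                    candidates.removeFirst();
                }
            }
        }
        return candidates.peekFirst();
    }

    public static void main(String[] args) {
        double[] s = {3, 9, 2, 5, 1};
        System.out.println(landmarkMax(s));   // max over everything seen
        System.out.println(slidingMax(s, 3)); // max over the last 3 elements
    }
}
```

Even with the deque optimization, the sliding case must hold O(n) elements, which is exactly the extra memory burden the paragraph above describes.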

There are several significant open problems in TelegraphCQ with respect to the complexity and quality of routing policies: understanding how ticket based schemes perform under a variety of workloads, how they compare to (NP-hard) optimal schedule computations, modifying such schemes to adjust the priority of individual queries, and evaluating the feasibility (in terms of computational complexity and quality) of more sophisticated schemes.

Routing decisions could consume significant portions of overall execution time. For this reason, two techniques play a key role in TelegraphCQ: batching tuples, by dynamically adjusting the frequency of routing decisions in order to reduce per-tuple costs, and fixing operators, by adapting the number and order of operators scheduled with each decision to reduce per-operator costs.

Since TelegraphCQ has been designed with a storage subsystem that exploits a sequential write workload and a broadcast-disk style read behaviour, queries accessing data that spans memory and disk also raise significant Quality of Service issues in terms of deciding what work to drop when the system is in danger of falling behind the incoming data stream.

Currently, the developers of TelegraphCQ are extending the Flux module of TelegraphCQ to serve as the basis of its cluster-based implementation.

There has also been a spate of work on sharing work across queries, related to the problem of multi-query optimization, originally posed by Sellis et al. [29] and continued by the group at IIT-Bombay [30] [31] [32].

It should also be noted that though TelegraphCQ is licensed under the BSD license, there is also a commercialized version of TelegraphCQ, named the Truviso event processing system.

2.4.1.10. STREAM [6] [33]

STREAM is a Distributed Data Stream Management System, produced at Stanford University [33]. The goal of STREAM is to be able to consider both structured data streams and stored data together. The queries over data streams are issued declaratively, but are then translated into flexible physical query plans. The STREAM system includes adaptive approaches in processing and load shedding, provides approximate answers, and also manipulates query plans during execution.

In STREAM, queries are independent units that logically generate separate plans, but those plans are then combined by the system and ultimately result in an Aurora-like mega plan.

One of the notable features of the STREAM system is its subscription language, known as the Continuous Query Language (CQL). CQL features two layers: an abstract semantics layer and an implementation of the abstract semantics. The implementation of the abstract semantics uses SQL to express relational operations and adds extensions for stream-related operations.
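For illustration, a CQL-style query combines a stream-to-relation operator (a window) with a relation-to-stream operator (Istream); the stream and attribute names here are made up:

```sql
-- Illustrative CQL-style continuous query.
SELECT Istream(*)
FROM   Trades [Range 30 Seconds]
WHERE  price > 100
```

The window turns the unbounded stream into a time-varying relation, the WHERE clause is ordinary SQL over that relation, and Istream turns newly inserted result rows back into an output stream.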

Currently STREAM has several limitations, such as merging sub-expressions with different window sizes, sampling rates, or filters. This is because it handles resource sharing and approximation separately. The number of tuples in a shared queue at any time depends on the rate at which tuples are added to the queue and the rate at which the slowest parent operator consumes the tuples; when queries with common sub-expressions produce parent operators that consume the tuples at different rates, it is preferable not to use a shared sub-plan, which STREAM is currently not handling.

STREAM was released under the BSD license, and according to the STREAM homepage the project has now officially wound down [33]. It is now used as the base for the Coral8 event processing engine.

2.4.1.11. PIPES

PIPES [34] is developed by the University of Marburg. It is a flexible and extensible infrastructure providing fundamental building blocks to implement a data stream management system (DSMS). PIPES covers the functionality of the Continuous Query Language (CQL). The first and last public release of PIPES was made in 2004, under the GNU Lesser General Public License.

2.4.1.12. BEA WebLogic [15]

BEA Systems developed the WebLogic Real Time and WebLogic Event Server systems, which focus on enterprise-level system architectures and service integrations. BEA WebLogic focuses on an event-driven service-oriented architecture, providing a complete event processing and event-driven service-oriented architecture infrastructure that supports high-volume, real-time, event-driven applications. BEA WebLogic is one of the few commercial offerings of a complete, integrated solution for event processing and service-oriented architectures.

The BEA WebLogic system includes a series of Eclipse-based developer tools for easy development, and some administration tools for monitoring throughput, latency, and other statistics. As BEA WebLogic has been acquired by Oracle Corporation, Oracle has released some non-programmatic interfaces to allow all interested parties to configure queries and rules for processing event data.

2.4.1.13. Coral8 [15]

The Coral8 event processing tool is designed to process multiple data streams and handle heterogeneous stream data. Coral8 is capable of processing operations that require filtering, aggregation, correlation (including correlation across streams), pattern matching, and other complex operations in near-real-time [35] [36]. The Coral8 engine is composed of two tools, the Coral8 Server and the Coral8 Studio, and also comes with a Software Development Kit (SDK) for performing further optimizations.

Coral8 Server is the heart of Coral8 [35] and provides clustering support [37]. The server also publishes a status data stream that can be used to monitor the performance and activity of the server, exposing it through the Simple Network Management Protocol (SNMP) for use by management consoles and monitoring frameworks.

Coral8 Studio provides an IDE-like interface that allows administrators to add and remove queries and input and output data streams. It uses a subscription language called the Continuous Computational Language (CCL) to manage queries.

2.4.1.14. Progress Apama [15]

The Progress Apama event stream processing platform [38] consists of several tools, including an event processing engine, data stream management tools, event visualization tools, adapters for converting external events into internal events, and some development tools.

The Apama technology was tested at the Air Force Research Laboratory by Robert Farrell (AFRL/IFSA) [15] to evaluate Apama's marketing claims relating to throughput and latency. The results showed that Apama could process events at rates measured in thousands of events per second [38].


2.4.1.15.

StreamBas
e

[
15
]

The StreamBase event processing engine is
developed
based on

research from the
Massachusetts Institute of Technology, Brown Univ
ersity, and Brandeis University. This is
an improved version of the
Aurora project

[
39
]
.

StreamBase provides a Server and a Studio module

[
40
]
. The Serv
er module is designed to be
able to scale from a sin
gle
-
CPU to a distributed system
. The Studio is Eclipse
-
based and

this
not only
provides graphical (drag
-
and
-
drop) creation of queries, but also supports text
-
based
ed
iting of the queries, which
uses

StreamSQL.


2.4.1.16.

Truviso

[
15
]

[
41
]

Truviso is a commercial event processing engine

that’s
based on the TelegraphCQ project

of

UC Berkeley. The
most importa
nt feature of
Truviso is that it s
upports a fully
-
functional
SQL,
alongside a stream processing engine.


The queries

of
Truviso

are simply standard SQL with extensions that add functi
onalities for
time window

and event processing
.

In addition, the use of an integrated relational database
of
Truviso
allows for easy caching, persistence, and archival of data streams,
for
queries that
include not only real
-
time data, but also the historical data.

2.5. Some Interesting Research Papers

Siddhi has taken inspiration from some valuable research papers when designing its architecture. We read several papers and then chose the algorithms that are most suitable for us and perform best.

2.5.1. Event Stream Processing with Out-of-Order Data Arrival [42]

This paper provides an in-depth architectural and algorithmic view of managing the arrival of out-of-order data (with respect to timestamps). Handling such arrivals matters because a CEP engine can lose accuracy in real time due to network traffic and other factors. The system presented here is very similar to SASE [12], in that it also uses stacks to handle event arrivals. Out-of-order handling is provided as a feature: all stacks are enabled at the beginning to maintain a clock that detects out-of-order event arrivals from their timestamps. The paper also provides an algorithm to handle the identified out-of-order events. This will be useful only for projects with a design similar to SASE.
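One common way to realize clock-based buffering of stragglers is sketched below for illustration. This is a generic K-slack-style reordering buffer, not the paper's exact algorithm: events delayed by at most `k` time units are released back in timestamp order.

```python
import heapq

# Illustrative K-slack reordering buffer (a common out-of-order
# technique; class and parameter names are invented for this sketch).
# An event is released only once the local clock has advanced at least
# `k` units past its timestamp, so any straggler delayed by <= k still
# comes out in timestamp order.

class ReorderBuffer:
    def __init__(self, k):
        self.k = k        # maximum tolerated delay
        self.clock = 0    # highest timestamp seen so far
        self.heap = []    # min-heap ordered by timestamp

    def insert(self, ts, payload):
        self.clock = max(self.clock, ts)
        heapq.heappush(self.heap, (ts, payload))
        released = []
        # release everything old enough that no straggler can precede it
        while self.heap and self.heap[0][0] <= self.clock - self.k:
            released.append(heapq.heappop(self.heap))
        return released

buf = ReorderBuffer(k=2)
out = []
for ts, p in [(1, 'a'), (3, 'b'), (2, 'c'), (5, 'd'), (4, 'e'), (8, 'f')]:
    out += buf.insert(ts, p)
# released events are in timestamp order despite out-of-order arrival
assert [ts for ts, _ in out] == [1, 2, 3, 4, 5]
```

The trade-off is latency: a larger `k` tolerates later stragglers but delays every event by up to `k` time units before it can be processed.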

2.5.2. Efficient Pattern Matching over Event Streams [43]

This paper focuses on richer query languages for efficient processing over streams. Its query evaluation framework is based on three principles: first, the evaluation framework should be sufficient for the full set of pattern queries; second, given such full support, it should be computationally efficient; third, it should allow optimization in a principled way.

To achieve the above, the authors tested a formal evaluation model, NFA^b, which combines a finite automaton with a match buffer. When testing SQL-TS, Cayuga, SASE+, and CEDR, they found SASE+ to be much richer and more useful.
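The NFA^b idea, an automaton whose runs each carry a match buffer, can be sketched as follows. This is a deliberately simplified illustration with invented event shapes; real SASE+-style evaluation also involves event-selection strategies and run-pruning optimizations omitted here.

```python
# Minimal sketch of an NFA with a match buffer: each run is a pair
# (buffer of consumed events, current state). A run that reaches the
# final state yields its buffer as a pattern match.
# Pattern here is SEQ(A, B): an 'A' event followed later by a 'B'.

def match_seq(pattern, events):
    """pattern: list of predicates, one per NFA transition."""
    runs = [([], 0)]          # start with one empty run in state 0
    matches = []
    for ev in events:
        new_runs = []
        for buf, state in runs:
            new_runs.append((buf, state))   # run may also wait (skip ev)
            if state < len(pattern) and pattern[state](ev):
                nbuf, nstate = buf + [ev], state + 1
                if nstate == len(pattern):
                    matches.append(nbuf)    # accepting state reached
                else:
                    new_runs.append((nbuf, nstate))
        runs = new_runs
    return matches

events = [('A', 1), ('C', 2), ('B', 3), ('B', 4)]
pattern = [lambda e: e[0] == 'A', lambda e: e[0] == 'B']
assert match_seq(pattern, events) == [[('A', 1), ('B', 3)],
                                      [('A', 1), ('B', 4)]]
```

Keeping the buffer inside each run is what lets the automaton report *which* events formed the match, not merely that a match occurred.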

2.6. What We Have Gained from the Literature Survey

The literature survey greatly helped us to understand the different implementations of existing CEP engines and to learn their pros and cons. Through this survey we were able to arrive at the best architecture for Siddhi-CEP by understanding how other CEP engines are implemented.

From the literature review we found that pipelines would be the most appropriate model for event passing, and we therefore decided to build our core using a producer-consumer architecture with an Aurora-like structure to obtain high performance [44]. We also found an interesting paper [45] that helped us implement Query Plan Management, which not only improves efficiency but also allows approximation of data in stream management.
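The producer-consumer pipeline model can be sketched as a chain of stages linked by queues. This is a minimal, hypothetical example (Siddhi's actual core is written in Java and is far more elaborate); the point is that each stage runs independently and the queues decouple the producer's rate from the consumer's.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream

def run_stage(process, inbox, outbox):
    """Generic pipeline stage: consume an event, process it, forward
    the result (dropping events for which process() returns None)."""
    while True:
        event = inbox.get()
        if event is STOP:
            outbox.put(STOP)
            break
        result = process(event)
        if result is not None:
            outbox.put(result)

q1, q2 = queue.Queue(), queue.Queue()
# stage: keep only even-valued events; it runs on its own thread, so
# the producer never blocks on the consumer's processing speed
t = threading.Thread(target=run_stage,
                     args=(lambda e: e if e % 2 == 0 else None, q1, q2))
t.start()

for e in [1, 2, 3, 4, 5, 6]:   # producer side
    q1.put(e)
q1.put(STOP)

results = []                    # consumer side
while (e := q2.get()) is not STOP:
    results.append(e)
t.join()
assert results == [2, 4, 6]
```

Chaining more `run_stage` threads with intermediate queues yields the Aurora-like operator pipeline mentioned above, where each operator's output queue is the next operator's input.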



3. SIDDHI DESIGN

This chapter discusses all the major sections of our project in terms of system architecture and design. We discuss the basic system architecture in detail with the use of various diagrams, such as architecture diagrams, use case diagrams, class diagrams, etc. This chapter also discusses the system design considerations we weighed before starting the implementation phase. Since the significant factor that makes this project different from other projects is performance, this section focuses on how we achieved performance through our system design and on the problems we faced.

3.1. Siddhi Architecture


Figure 8: Siddhi high-level architecture


Input Adapters

Siddhi receives events through input adapters. The major task of input adapters is to provide