


Project Acronym: FIRST

Project Title: Large scale information extraction and integration infrastructure for supporting financial decision making

Project Number: 257928

Instrument: STREP

Thematic Priority: ICT-2009-4.3 Information and Communication Technology

D2.2 Conceptual and technical integrated architecture design

Work Package: WP2 Technical analysis, scaling strategy and architecture

Due Date: 30/09/2011

Submission Date: 30/09/2011

Start Date of Project: 01/10/2010

Duration of Project: 36 Months

Organisation Responsible for Deliverable: ATOS

Version: 1.0

Status: Final

Author Name(s): Mateusz Radzimski, Murat Kalender (ATOS), Miha Grcar (JSI), Achim Klein, Tobias Haeusser (UHOH), Markus Gsell (IDMS), Jan Muntermann (UGOE), Michael Siering (GUF)

Reviewer(s): Achim Klein (UHOH), Irina Alic (UGOE)

Nature:
R - Report
P - Prototype
D - Demonstrator
O - Other

Dissemination level:
PU - Public
CO - Confidential, only for members of the consortium (including the Commission)
RE - Restricted to a group specified by the consortium (including the Commission Services)

Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)








D2.2



Revision history

Version | Date | Modified by | Comments
0.1 | 18/04/2011 | Mateusz Radzimski (ATOS) | Early version of ToC provided
0.2 | 13/06/2011 | Mateusz Radzimski (ATOS) | Primary content for "Requirements Analysis" section
0.? | 04/07/2011 | Murat Kalender (ATOS) | Primary content for "Integration approach" section
0.? | 11/07/2011 | Mateusz Radzimski (ATOS) | Primary contribution to "Architectural perspective" section
0.? | 26/07/2011 | Mateusz Radzimski (ATOS), Murat Kalender (ATOS) | Added content to "Architectural perspective" section, added Annex 1. Further contribution to "Integration Approach" section
0.? | 9/08/2011 | Markus Gsell (IDMS) | Added contribution to "Data storage design principles"
0.? | 10/08/2011 | Mateusz Radzimski (ATOS) | Document refactoring and minor corrections according to teleconference discussions
0.? | 18/08/2011 | Murat Kalender (ATOS) | Added "Integrated GUI" subsection
0.8.? | 29/08/2011 | Murat Kalender (ATOS), Mateusz Radzimski (ATOS), Miha Grcar (JSI) | Changes to "Integrated GUI" chapter. Added "Hosting platform" chapter
0.? | 14/09/2011 | Miha Grcar (JSI), Achim Klein, Tobias Haeusser (UHOH), Markus Gsell (IDMS), Jan Muntermann (UGOE), Michael Siering (GUF), Mateusz Radzimski (ATOS) | Added chapter "Example of the FIRST process"
0.? | 27/09/2011 | Mateusz Radzimski (ATOS), Murat Kalender (ATOS), Markus Reinhardt (NEXT) | Addressing reviewers' comments. Added "industrial perspective" subchapter. Contributions to "Design" chapter
0.95 | 29/09/2011 | Mateusz Radzimski (ATOS) | Final version
1.0 | 30/09/2011 | Tomás Pariente (ATOS) | Final QA and preparation for submission










Copyright © 2011, FIRST Consortium

The FIRST Consortium (www.project-first.eu) grants third parties the right to use and distribute all or parts of this document, provided that the FIRST project and the document are properly referenced.

THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

----------------









Executive Summary

This document presents a comprehensive architectural analysis of the FIRST system. Based on both technical and functional requirements, it defines a candidate architecture and a technical design intended to ensure that the FIRST large-scale financial information system meets its goals.

The analysis also investigates the system integration approach and selects the most suitable architectural patterns, mechanisms and technologies. It provides technical details on the high-level organisation of the subsystems, their interaction and specific aspects of the technical components, with an emphasis on performance and high scalability.











Table of Contents

Executive Summary
Abbreviations and acronyms
1. Introduction
2. Requirement analysis
2.1. Relation of technical requirements and user requirements
2.2. Architecturally significant requirements
3. FIRST architectural perspective
3.1. Overall project goals and architectural considerations
3.2. High level FIRST architectural view
4. Integration approach
4.1. State of the art on integration approach
4.1.1 Application Integration
4.1.2 Enterprise Application Integration
4.2. Pipeline processing
4.3. GUI integration
4.3.1 Introduction
4.3.2 Web Application Frameworks
4.4. Data storage design principles
4.4.1 Storage paradigms
4.4.2 Access layer
4.4.3 Mediation layer
5. Design
5.1. Detailed components interaction perspective
5.2. Sample FIRST process
5.2.1 Sample scenarios
5.2.2 Data acquisition and preprocessing pipeline
5.2.3 Information extraction pipeline
5.2.4 Decision support models
5.2.5 Data exchange between pipeline components
5.2.6 Role of the integration layer
5.2.7 GUI integration
5.2.8 Role of the storage components
6. Deployment
6.1. Hosting platform
6.2. Deployment scenarios
6.3. Industrial perspective of the FIRST system
7. Conclusion
References
Annex 1. Requirements groups










Index of Figures

Figure 1: Integrated architecture in the context of other workpackages
Figure 2: Architectural mechanisms applied to requirements (Eeles, 2005)
Figure 3: FIRST High-level architecture - logical view
Figure 4: Remote Procedure Invocation Architecture (Hohpe & Woolf, 2003)
Figure 5: Communication model of a broker based messaging system (Zeromq, 2010)
Figure 6: Communication model of a brokerless messaging system (Zeromq, 2010)
Figure 7: Communication model of a broker based messaging system (Kusak, 2010)
Figure 8: Messaging systems performance comparison results
Figure 9: ZeroMQ messaging patterns (Piël, 2010)
Figure 10: ZeroMQ messaging patterns performance comparisons
Figure 11: Data Acquisition and Information Extraction pipeline integration architecture
Figure 12: Client/server application and web application architectures (Howitz, 2010)
Figure 13: Detailed component view (overview)
Figure 14: Detailed component view (with highlighted interactions and data flow)
Figure 15: An example of the topic trend visualization
Figure 16: DJIA vs. smoothed time series of the sentiment index (see (Klein, Altuntas, Kessler, & Häusser, 2011))
Figure 17: The current topology of the data acquisition and preprocessing pipeline
Figure 18: Canyon Flow: a "view" of a hierarchy of document clusters
Figure 19: The Canyon Flow pipeline and the corresponding part of the Web-based integrated GUI
Figure 20: Annotated document corpus serialized into XML
Figure 21: Annotated document serialized into HTML and displayed in a Web browser
Figure 22: Integration between data acquisition and information extraction components w/load balancing
Figure 23: Example of synchronous high-level services invocation
Figure 24: Data-push mechanism for notification services
Figure 25: Instrument Cockpit of MACOC application augmented with FIRST data (mockup)



Index of Tables

Table 1: Use case vs. Technical requirements cross matrix
Table 2: Architecturally significant requirements analysis
Table 3: GWT advantages and disadvantages
Table 4: JSF advantages and disadvantages
Table 5: Spring MVC advantages and disadvantages
Table 6: A comparison of notable Java web application frameworks (Raible, Comparing JVM Web Frameworks, 2010)
Table 7: Main characteristics of relational and non-relational storage paradigms
Table 8: Portfolio selection scenario based on DJIA stocks and sentiment extracted from blog posts (Klein, Altuntas, Kessler, & Häusser, 2011)












© FIRST consortium Page 7 of 53

Abbreviations and acronyms

DOW - Description of Work
WP - Workpackage
TBD - To be defined
SOA - Service Oriented Architecture
API - Application Programming Interface
ESB - Enterprise Service Bus
UC - Use Case
PUB/SUB - Publish/Subscribe
REQ/REP - Request/Reply
JVM - Java Virtual Machine
CLR - Common Language Runtime
BOW - Bag of words
MVC - Model View Controller
UI - User Interface
JSF - JavaServer Faces
GWT - Google Web Toolkit









1.

Introduction

The main purpose of this document is to communicate the architectural and technical point of view within the project. The design will affect all technical components developed in other workpackages by outlining possible interaction patterns, dependencies and data flow. The architecture is also heavily influenced by the articulated requirements, both technical and use case requirements. For example, the processing methods envisaged for analysing huge data streams provide an important input that constrains the architectural techniques and determines further technological choices. The idea is therefore to bring all such requirements and constraints together into a coherent design that will enable seamless future development of the FIRST system.

It is also important to keep the description at a proper level of abstraction to avoid overengineering the design. As a research project, FIRST is driven by experiments and improvements of current state-of-the-art techniques. Defining all details at this stage of the project would therefore be infeasible. Instead, those details will be presented in the corresponding deliverables as the prototype release milestones are reached throughout the project lifetime.

Figure 1 shows the relation of this document to other deliverables and technical workpackages.



Figure 1: Integrated architecture in the context of other workpackages
[Diagram elements: use case requirements specification (D1.2) and technical requirements and state of the art (D2.1); integrated architecture (D2.2) and scaling strategy (D2.3); technical components (WP3, WP4, WP5, WP6, WP7); connected by "influences" relations.]






2.

Requirement analysis

This section recaps and analyses the technical requirements described in D2.1 that concern the architecture and behaviour of the overall FIRST system. We aim to ensure that the requirement analysis is sound and provides a firm basis for the subsequent technical architecture design. We therefore study and evaluate the requirements collected in D1.2 (use case perspective) and D2.1 (technical perspective). The purpose is to assess how well the technical provisioning captured in the D2.1 technical requirements analysis satisfies the business use cases defined in WP1. It is also important to analyse each requirement with regard to the benefit it brings to the overall system and its viability, given the project's resources and its technical and scientific feasibility. This allows requirement priorities to be assigned accordingly. It also shows which functionalities are essential for satisfying the use cases and project goals and which should be treated as supplemental.

A significant part of this chapter is devoted to the analysis of architecturally significant requirements. These are requirements that have a clear technical impact on the project and influence the architecture and design of the system. We proceed by choosing the relevant ones and, where necessary, extending them with the proposed design and technological details. They will serve as enablers of the architectural analysis.

2.1.

Relation of technical requirements and user requirements

The use case requirement specification in D1.2 gives a comprehensive overview of the system as seen by the use case stakeholders. It describes the system functionalities, non-functional attributes, actors and context. A similar list has been provided for each of the three use cases. Analysing all requirements together from a functional point of view yields eight main category groups:

1. Data feeding and acquisition
2. Retrieval of topics and features
3. Retrieval and classification of sentiments
4. User Interface and Service delivery
5. Access control and security
6. Configuration and maintenance
7. Storage, persistence and data access
8. Decision support and visualisation

Requirements in each category are closely related and together define a common fragment of system functionality. Non-functional requirements are orthogonal to these groups and are therefore not listed here. Annex 1 lists the assignment of each requirement to the groups above.


Clustering the use case requirements into functional categories reduces the number of items to analyse. This makes it viable to break down requirement coverage using a requirements traceability matrix. Table 1 shows the coverage of each group (horizontal axis) by the relevant technical requirements (vertical axis).

The analysis reduces the multidimensional nature of the system requirements to a two-dimensional matrix. By identifying cross-reference relationships, we obtain a quantitative indication of how well each functionality has been described in terms of technological provisioning. The table should be read as follows: an "x" in row R1.1 and column 1 means that technical requirement R1.1 covers some aspects of functionality group 1. The numbers in the row titled "Quantitative technical coverage" give the number of technical requirements related to each functionality.









Functionalities (columns 1-8):
1 = Data feeding and acquisition
2 = Retrieval of topics and features
3 = Retrieval and classification of sentiments
4 = User interface & service delivery
5 = Access control and security
6 = Maintenance and configuration
7 = Storage, persistence and data access
8 = Decision support system and visualisations

| Technical requirement name | Requirement ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Internet connection bandwidth | R1.1 | x |   |   |   |   |   |   |   |
| Concurrent execution of processes (hardware infrastructure) | R1.2 | x | x | x |   |   |   |   | x |
| Memory and persistent storage | R1.3 |   |   |   |   |   |   | x |   |
| API for external access | R2.1 |   |   |   | x |   |   |   |   |
| Flexibility of the infrastructure | R2.2 |   |   |   |   |   |   | x |   |
| Concurrent execution of processes (software infrastructure) | R2.3 | x | x | x |   |   |   |   |   |
| Logging & monitoring | R2.4 | x | x | x |   | x |   |   | x |
| Stability | R3.1 | x | x | x | x |   |   | x | x |
| Pipeline latency | R4.1 | x | x | x |   |   |   |   |   |
| Pipeline throughput | R4.2 | x | x | x |   |   |   |   |   |
| Document format and encoding | R4.3 | x |   |   |   |   | x |   |   |
| Interchange data format | R4.4 | x | x | x |   |   |   |   |   |
| Data formats | R5.1 | x |   |   |   |   |   | x |   |
| Unified access | R5.2 |   |   |   |   |   |   | x |   |
| Ontology format | R6.1 |   | x |   |   |   |   |   |   |
| Ontology availability | R6.2 |   | x |   |   |   |   | x |   |
| Ontology purpose | R6.3 |   | x | x |   |   |   |   |   |
| Ontology evolution | R6.4 |   | x | x |   |   |   |   |   |
| Data acquisition pipeline (functionality) | R7.1 | x | x |   |   |   | x |   |   |
| Data acquisition pipeline (supported Web content formats) | R7.2 | x | x |   |   |   |   |   |   |
| Information extraction components | R7.3 |   |   | x |   |   |   |   |   |
| Sentiment analysis | R7.4 |   |   | x |   |   |   |   |   |
| Decision support models (features) | R7.5 |   |   |   |   |   |   |   | x |
| Decision support models (streams) | R7.6 |   |   |   |   |   |   |   | x |
| Programming / runtime environment (data acquisition pipeline) | R8.1 | x | x |   |   |   |   |   |   |
| Programming / runtime environment (information extraction pipeline) | R8.2 |   |   | x |   |   |   |   |   |
| Runtime environment (knowledge base) | R8.3 |   |   |   |   |   |   | x |   |
| Programming / runtime environment (decision support components) | R8.4 |   |   |   |   |   |   |   | x |
| Quantitative technical coverage |   | 13 | 14 | 12 | 2 | 1 | 2 | 7 | 6 |
| Coverage level |   | High | High | High | Low | Low | Low | Medium | Medium |

Table 1: Use case vs. Technical requirements cross matrix
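The coverage row of Table 1 is a column-wise count over the traceability matrix. The Python sketch below transcribes the matrix as a mapping from requirement ID to the functionality groups that requirement covers, then recomputes the counts. The High/Medium/Low thresholds (10 and 5) are an assumption, chosen only because they reproduce the ratings shown in the table. This document does not state such a rule.

```python
from typing import Dict, Set

# Technical requirement ID -> functionality groups (1..8) it covers, transcribed from Table 1.
COVERAGE: Dict[str, Set[int]] = {
    "R1.1": {1}, "R1.2": {1, 2, 3, 8}, "R1.3": {7},
    "R2.1": {4}, "R2.2": {7}, "R2.3": {1, 2, 3}, "R2.4": {1, 2, 3, 5, 8},
    "R3.1": {1, 2, 3, 4, 7, 8},
    "R4.1": {1, 2, 3}, "R4.2": {1, 2, 3}, "R4.3": {1, 6}, "R4.4": {1, 2, 3},
    "R5.1": {1, 7}, "R5.2": {7},
    "R6.1": {2}, "R6.2": {2, 7}, "R6.3": {2, 3}, "R6.4": {2, 3},
    "R7.1": {1, 2, 6}, "R7.2": {1, 2}, "R7.3": {3}, "R7.4": {3},
    "R7.5": {8}, "R7.6": {8},
    "R8.1": {1, 2}, "R8.2": {3}, "R8.3": {7}, "R8.4": {8},
}


def coverage_counts(coverage: Dict[str, Set[int]], groups: int = 8) -> Dict[int, int]:
    """Count, for each functionality group, how many technical requirements cover it."""
    counts = {g: 0 for g in range(1, groups + 1)}
    for covered in coverage.values():
        for g in covered:
            counts[g] += 1
    return counts


def rating(count: int, high: int = 10, medium: int = 5) -> str:
    """Map a count to a coverage level; the thresholds are assumed, not taken from the source."""
    if count >= high:
        return "High"
    if count >= medium:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for group, count in coverage_counts(COVERAGE).items():
        print(group, count, rating(count))
```

Keeping the matrix in this form makes it easy to recompute the coverage row when requirements are added or reassigned in later deliverables.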








The cross-reference analysis shows that coverage is lower for the storage and decision-support components (Categories 7 and 8). At this stage of the project, the data acquisition and preprocessing pipeline and the information retrieval framework are relatively well defined, and early implementations have already started. The decision-support models, by contrast, are yet to be fully defined. For this reason, fewer constraints have been placed on this part of the system, so that there is enough flexibility when the use case goals are pursued later. According to the work plan, this part will receive more technical coverage in the WP6 deliverables. Consequently, the data storage components will need to adapt to the system at that time in order to properly store the models, predictions and potentially other relevant data and metadata.

Categories 4, 5 and 6 have relatively low coverage. Category 4 ("User interface and service delivery") is described further in the context of integration and GUI provisioning for building the Integrated Financial Market Information System (WP7); see chapter 4.3. Categories 5 and 6 are not considered main research or technical challenges within the project, and their priority is the lowest among the categories.

2.2.

Architecturally significant requirements

From the architectural perspective, some requirements are more important than others. Requirements that contain attributes or constraints related to architecture and design are called architecturally significant requirements, and the requirement lists already defined in D1.2 and D2.1 contain such requirements. The FURPS+ classification model (Grady, 1992) divides requirements according to the following characteristics (FURPS): Functionality, Usability, Reliability, Performance and Supportability. These requirements have been defined mostly in (FIRST D1.2 Usecase requirements specification, 2011). The "+" in the FURPS+ model adds further types, namely Design, Implementation, Interface and Physical requirements. They are covered by the specification of additional technical requirements in (FIRST D2.1 Technical requirements and state-of-the-art, 2011). The latter group usually defines concrete design or implementation features and therefore influences the architecture implicitly. The former group may need further analysis and elaboration (see Figure 2) to determine its effect on the architecture.



Figure 2: Architectural mechanisms applied to requirements (Eeles, 2005)








Table 2 presents the relevant architecturally significant requirements chosen from (FIRST D1.2 Usecase requirements specification, 2011) and (FIRST D2.1 Technical requirements and state-of-the-art, 2011). It also briefly analyses their technical impact on the overall architecture. These aspects will be considered further when defining the FIRST system design.

Requirement ID (1) | Analysis & Design mechanisms | Influence on architecture

R4.1 Pipeline latency (also reflected in the use case non-functional requirements), R4.2 "Pipeline throughput"
- Analysis & Design mechanisms: The huge data streams handled by FIRST will be processed in a pipeline fashion. This requires a different integration approach from the one typical SOA-based systems use. For instance, data is known to arrive as a constant stream, so explicit requests might increase latency and lower overall throughput by inducing extra traffic.
- Influence on architecture: The most important aspect is to analyse approaches other than Web Services (WS) and SOA, especially those suited to stream processing. These approaches should then be compared by their performance and throughput capabilities. As a result, some components may be integrated using different, more robust techniques, while others (less performance-constrained) follow traditional approaches.

UC1.1-? Alert reports and Cockpit
- Analysis & Design mechanisms: The alert and notification features of the use cases implicitly assume that messages are delivered as soon as they appear in the data stream. Typical constant polling for messages ("did alert X appear in the data?") might quickly become unscalable as the amount of data and the number of registered alerts grow.
- Influence on architecture: The system should deliver alerts and notifications in a "push"-based manner, directly to the subscribed parties. This approach should be preferred over the typical requesting of data (polling), which might become inefficient in the future.

R8.1, R8.2, R8.?
- Analysis & Design mechanisms: The system comprises components written using different technologies, so typical programmatic integration is not possible. A common means of integration should be provided that ensures robust exchange of data between the different execution environments.
- Influence on architecture: The integration middleware connecting such components should be able to communicate in a technology-independent manner.

R7.? and related requirements
- Analysis & Design mechanisms: Technical components involved in data processing provide certain functionalities that are used by end-user GUIs or for integration with other applications (e.g. use case implementations). This requires exposing them in a standard way, in the form of APIs abstracted from the underlying technical components.
- Influence on architecture: The exposed functionalities should be assembled in the architectural overview as a high-level service layer. This layer comprises APIs for developing the Integrated Financial Market Information System and the use case implementations, and is accessible to other interested components.

R2.2 Flexibility of the infrastructure
- Analysis & Design mechanisms: Architectural flexibility requires a certain level of decoupling between components, and the integration middleware should have this characteristic. Decoupling will also support the development and testing process.
- Influence on architecture: FIRST will follow component-based design techniques. The integration middleware will allow a clear system decomposition and will support deploying system components independently.

R2.3 Concurrent execution of processes (software infrastructure)
- Analysis & Design mechanisms: To support concurrent processing, the architecture should allow techniques such as parallelisation from the very beginning. This affects the middleware layer as well as the individual components.
- Influence on architecture: The architecture should make it possible, and easy, to distribute data across components and collect results from them. It should also support techniques such as load balancing and distributed processing.

Table 2: Architecturally significant requirements analysis

(1) Requirement IDs correspond to their counterparts defined in (FIRST D1.2 Usecase requirements specification, 2011) and (FIRST D2.1 Technical requirements and state-of-the-art, 2011).
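The push-based delivery recommended for alerts (second entry of Table 2) can be illustrated with a minimal in-process sketch. It assumes that each alert is a predicate over a processed document and that each subscriber is a plain callback. The class and method names, the document fields and the sentiment threshold are all hypothetical; they do not describe an actual FIRST interface. Each incoming document is matched once against the registered alerts and forwarded only to the interested subscribers, so no subscriber has to poll the data repeatedly.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Document = Dict[str, Any]                   # hypothetical processed document
Condition = Callable[[Document], bool]      # alert predicate
Callback = Callable[[str, Document], None]  # subscriber notification hook


class AlertDispatcher:
    """Pushes each processed document to the subscribers of every alert it triggers."""

    def __init__(self) -> None:
        self._conditions: Dict[str, Condition] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register_alert(self, name: str, condition: Condition) -> None:
        self._conditions[name] = condition

    def subscribe(self, name: str, callback: Callback) -> None:
        self._subscribers[name].append(callback)

    def on_document(self, doc: Document) -> None:
        # Called by the pipeline once per document; subscribers never poll.
        for name, condition in self._conditions.items():
            if condition(doc):
                for callback in self._subscribers[name]:
                    callback(name, doc)


if __name__ == "__main__":
    received: List[int] = []
    dispatcher = AlertDispatcher()
    dispatcher.register_alert(
        "acme_negative",
        lambda d: d["entity"] == "ACME" and d["sentiment"] < -0.5,
    )
    dispatcher.subscribe("acme_negative", lambda name, d: received.append(d["id"]))
    for doc in [
        {"id": 1, "entity": "ACME", "sentiment": 0.3},
        {"id": 2, "entity": "ACME", "sentiment": -0.8},
        {"id": 3, "entity": "OTHER", "sentiment": -0.9},
    ]:
        dispatcher.on_document(doc)
    print(received)  # [2]
```

In a distributed deployment, the in-process callback would be replaced by a publish/subscribe transport. Choosing that transport is outside the scope of this sketch.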







3.

FIRST architectural perspective

This chapter gives a high-level overview of the FIRST architecture that supports the FIRST project goals. It also presents a coarse-grained decomposition of the main FIRST building blocks and explains how the different use cases are built on top of the generic FIRST architecture.

3.1.

Overall project goals and architectural considerations

Defining the architecture means translating the project vision and the defined requirements into a common technical platform that will fulfil the project's goals. Architectural analysis focuses on defining a candidate architecture and constraining the architectural techniques to be used in the system (IBM, 2009). The input for this task comes from the project goals, the use case analysis and the requirements definition. Together they yield an architecture that ensures the overall project meets its objectives. In particular, the architecture accommodates the non-functional and technical requirements that strongly influence the final design and implementation. The result is a technical overview of the organisation and structure of the components (IBM, 2009).

From the global project perspective, the objective of the system is to improve the decision-making process based on unstructured data sources, such as news articles or blogs. Examples include reducing investment risk or increasing return; see (FIRST D1.1 Definition of market surveillance, risk management and retail brokerage usecases, 2011). For that reason, the system must process large amounts of data and perform automatic analysis that would otherwise be impossible. The architecture should therefore make it possible to:

- process, analyse and make sense of huge amounts of unstructured data,
- provide the results of financial news (articles, blogs, etc.) analysis and relevant information extraction within a reasonably short time (near real-time),
- integrate components and algorithms in order to process financial data automatically and allow sophisticated decision models to be applied,
- develop a graphical interface to present and visualise data relevant for decision making.

Furthermore, among other characteristics, such an architecture should ensure maintainability, extensibility and flexibility. Flexibility may open up further exploitation possibilities. Maintainability and extensibility allow a seamless development process, including the introduction of changes and technical modifications during development and later stages of the project.

The architecture should follow a component-based structure. The most important functionality comes from the data processing pipeline (FIRST D2.1 Technical requirements and state-of-the-art, 2011), but individual parts of the system might also be exploited separately in the future. However, technologies that support modular architecture and decomposition introduce an additional layer of abstraction. This usually imposes some performance limitations or extra message overhead. In SOA integration approaches, this layer may be a central broker or an application server for deploying services.

In FIRST, the integration of the architecture can be divided into two parts:

- Lightweight integration for pipeline processing: this part integrates the components directly involved in processing unstructured documents (the pipeline processing components), which exchange huge amounts of data. Here the focus is mostly on lightweight, performance-oriented approaches that will ensure the project goals are met.









- Classical SOA-based system integration: the integration of the other components of the Integrated Financial Market Information System should follow the best-known integration approaches. These components include services for constructing the user interface, the use case implementations and access to stored data. Some features of the components involved in pipeline processing might also be exposed as services that are not strictly performance-oriented. Offering some fragments of the pipeline as a separate service, for instance, might add value for further project exploitation or for exploiting components individually. Reusing components as services could also be viable: the isolated sentiment analysis functionality, for example, can be wrapped as a service and offered separately. This part also covers the GUI implementation, which is analysed later in this document.

The overall FIRST architecture will accommodate both of the aforementioned techniques in a coherent way in order to satisfy the project goals.
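To illustrate how one component can take part in both kinds of integration, the following Python sketch wraps a single function once as a streaming pipeline stage and once behind a request/response service facade. The scoring function, its word lists and the names `score_sentiment`, `sentiment_stage` and `SentimentService` are hypothetical placeholders, not FIRST components or APIs; a real sentiment analysis component would take the place of the toy scorer.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical toy lexicon standing in for a real sentiment analysis component.
POSITIVE = {"gain", "growth", "beat", "strong"}
NEGATIVE = {"loss", "drop", "miss", "weak"}


def score_sentiment(text: str) -> int:
    """Return (#positive words - #negative words) for one document."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_stage(docs: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Lightweight pipeline integration: annotate a stream of documents in place."""
    for doc in docs:
        yield {**doc, "sentiment": score_sentiment(doc["text"])}


class SentimentService:
    """SOA-style integration: the same function behind a request/response facade."""

    def handle(self, request: Dict[str, str]) -> Dict[str, object]:
        if "text" not in request:
            return {"status": "error", "reason": "missing 'text'"}
        return {"status": "ok", "sentiment": score_sentiment(request["text"])}


if __name__ == "__main__":
    stream = [{"id": "d1", "text": "Strong growth and a beat"},
              {"id": "d2", "text": "Weak outlook after loss"}]
    print([d["sentiment"] for d in sentiment_stage(stream)])  # [3, -2]
    print(SentimentService().handle({"text": "profit drop"}))
```

The scoring logic is written once; only the thin wrapper differs between the performance-oriented pipeline path and the service-oriented exposure.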

3.2. High-level FIRST architectural view

This section presents a high-level architectural perspective of the FIRST system. It provides a structural view of the components and explains their relationships with the other parts of the system. The static view is covered in this chapter, while a more detailed analysis, including the dynamic view, is presented in chapter 5. We loosely follow the approach of presenting an architecture as a set of different but coherent views of the system and its environment (IEEE Std 1471-2000, Recommended Practice for Architectural Description, 2000).

From the software engineering perspective, the very heart of the project is the sentiment analysis pipeline for information extraction from unstructured sources (FIRST D2.1 Technical requirements and state-of-the-art, 2011). It provides the core functionality and serves as a common part supplying the necessary data to all use cases. To allow flexible implementation of, for example, use case functionalities, a standard API will be defined that exposes high-level services to the use case providers and to the FIRST GUI demonstrating the system's capabilities. This API will further allow other parties to develop their own applications or integrate FIRST with their own systems in order to offer new services and added value on top of the FIRST system.

Based on that description, a multi-tier architecture is a suitable choice for describing the system. From the analytical point of view, a multi-tiered architectural pattern makes it possible to clearly distinguish and separate the different layers (tiers) of the system (Fowler, 2002). In the FIRST system, the logical, high-level view consists of the following layers (as depicted in Figure 3).









Figure 3: FIRST High-level architecture, logical view

From the architectural point of view, the system as envisaged consists of the following parts, depicted as layers. Starting from the top, these are:



- Use case and GUI implementation layer: consists of the implementation of the 3 FIRST use cases (market surveillance, risk management and retail brokerage) and of the FIRST Integrated GUI (Integrated Financial Market Information System). It includes the end-user interfaces and the provisioning of graphical widgets for displaying computation results, including sentiments, alerts, event predictions, decision support and data stream visualisations. Moreover, the FIRST Integrated GUI provides an "entry point" for showcasing FIRST functionalities. All 4 parts are implemented on top of the FIRST APIs and include the necessary technological provisioning (e.g. a web application deployment server).



- High-level FIRST services layer (FIRST APIs): a set of higher-level services running on top of the FIRST Analytical Pipeline and providing the necessary access to its computation results. These services provide all concrete functionalities offered by the underlying technical components, wrapping them as services and thereby forming a logical abstraction over the high-performance lower-level components. They are delivered by technical components implemented within the following work packages: WP3, WP4, WP5, WP6 and WP7.



- Middleware/integration layer: provides a fast, robust and lightweight infrastructure for integrating the different components of the FIRST Analytical Pipeline while supporting low latency, high performance and high throughput. It also offers advanced techniques for general pipeline scaling, as explained in (FIRST D2.3 Scaling Strategy, 2011). This layer integrates the components developed within the following work packages that take part in pipeline processing: WP3, WP4, WP6 and WP7.



- Pipeline layer: the FIRST Analytical Pipeline is a set of components that process a stream of documents sequentially. While in principle we use the term "pipeline" in the singular, it may consist of several parallel "pipelines" balanced for handling bigger data streams. The integration of these components is provided by the middleware/integration layer.



- Data storage layer: an underlying set of storage services providing unified data access to support pipeline operations such as storing intermediate documents and decision models, ontology versioning, or archiving computation results. The design has been carried out within WP5 (FIRST D5.1 Specification of the information-integration model, 2011).









4. Integration approach

In the following sections we analyse common integration approaches and choose the most suitable one for the FIRST architecture.

4.1. State of the art on integration approaches

In computer science, the term integration denotes the process of making disparate software modules work together by exchanging information in order to build a complete application or fulfil a task. According to the application area, integration can be categorised into three types (Westermann, 2009): Application Integration (AI), Enterprise Application Integration (EAI) and Business Integration (BI).

In general, Application Integration enables applications to exchange information; messaging is one of the most commonly used approaches in Application Integration. Enterprise Application Integration builds on application integration methodologies by dealing with the integration and orchestration of enterprise applications and services; the Enterprise Message Bus and the Enterprise Service Bus are commonly used technologies in EAI. Business Integration builds on EAI methodologies and deals with the technical infrastructure of an organisation, such as exposing parts of a business's operations on the public internet for use by customers (Westermann, 2009).

A scalable and efficient integration platform is required to connect the components of the FIRST analytical pipeline and build a complete system. In the following sections, Application Integration and Enterprise Application Integration approaches are presented in detail, and their suitability for the FIRST pipeline is discussed. Business Integration approaches are not analysed, because they are applied at the level of a business organisation for integrating several complex systems; for this reason they are not suitable for integrating the FIRST components into the FIRST system.

4.1.1 Application Integration

There are four main Application Integration approaches: File Transfer, Shared Database, Remote Procedure Invocation and Messaging (Hohpe & Woolf, 2003).

In the File Transfer approach, applications communicate via files that are accessible to all integrated applications: one application writes files and another application reads them later on. The applications must agree on the file names, locations and formats, and on how the files are maintained. The following figure shows the integration of two applications using the File Transfer approach.


Figure 1: File Transfer Architecture (Hohpe & Woolf, 2003)
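The agreement that File Transfer requires (location, file names, format, maintenance) can be made concrete with a minimal Python sketch. The shared "inbox" directory, the zero-padded sequence-number file names, the JSON format and the delete-after-read rule are all assumptions chosen for illustration; writing to a temporary name and renaming is one common way to keep readers from seeing half-written files.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List

# Assumed agreement between the applications (hypothetical):
#   location = shared inbox directory, name = "<seq>.json", format = JSON,
#   maintenance = the reader deletes each file after consuming it.


def write_document(inbox: Path, seq: int, payload: dict) -> Path:
    """Producer: write under a temporary name, then rename into place.
    os.replace is atomic on the same filesystem, so readers never see partial files."""
    tmp = inbox / f".{seq:06d}.json.tmp"
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    final = inbox / f"{seq:06d}.json"
    os.replace(tmp, final)
    return final


def read_new_documents(inbox: Path) -> List[dict]:
    """Consumer: read finished files in sequence order, then delete them."""
    docs = []
    for path in sorted(inbox.glob("[0-9]*.json")):
        docs.append(json.loads(path.read_text(encoding="utf-8")))
        path.unlink()
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        inbox = Path(d)
        write_document(inbox, 1, {"title": "Q2 results"})
        write_document(inbox, 2, {"title": "Rate decision"})
        print(read_new_documents(inbox))  # [{'title': 'Q2 results'}, {'title': 'Rate decision'}]
        print(read_new_documents(inbox))  # []
```

Note that the reader still has to poll the directory; this is the synchronisation problem of the file-based approaches discussed later in this section.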

The Shared Database approach integrates applications through a single shared database. Applications access the same database and information, so there is no need to transfer information between them directly. One of the biggest difficulties with the Shared Database approach is the design of the database schema. The following figure shows the integration of three applications using the Shared Database approach.








Figure 2: Shared Database Architecture
(Hohpe & Woolf, 2003)
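A minimal Python sketch of the Shared Database approach, using the standard `sqlite3` module only as a convenient stand-in for whatever database the applications would share. The `documents` table, its columns and the `processed` flag (one simple way for a consumer to track what it has already read) are hypothetical; the point is that the agreed schema is the whole integration contract and the two functions never call each other.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import List

# The agreed schema is the integration contract (hypothetical table and columns).
SCHEMA = """CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    body TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0)"""


def acquisition_app(db: Path, source: str, body: str) -> None:
    """Application A only writes rows; it never calls application B directly."""
    # closing() closes the connection; the inner `con` context commits the transaction.
    with closing(sqlite3.connect(db)) as con, con:
        con.execute(SCHEMA)
        con.execute("INSERT INTO documents (source, body) VALUES (?, ?)", (source, body))


def extraction_app(db: Path) -> List[str]:
    """Application B reads unprocessed rows from the same database and marks them."""
    with closing(sqlite3.connect(db)) as con, con:
        con.execute(SCHEMA)
        rows = con.execute(
            "SELECT id, body FROM documents WHERE processed = 0 ORDER BY id").fetchall()
        con.executemany("UPDATE documents SET processed = 1 WHERE id = ?",
                        [(row_id,) for row_id, _ in rows])
    return [body for _, body in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = Path(d) / "shared.db"
        acquisition_app(db, "blog", "ACME beats estimates")
        acquisition_app(db, "news", "Bank cuts outlook")
        print(extraction_app(db))  # ['ACME beats estimates', 'Bank cuts outlook']
        print(extraction_app(db))  # []
```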

The Remote Procedure Invocation approach integrates applications by exposing application functionalities that can be called remotely by other applications. Exposed functionalities can be used for data transfer between applications or for modification of the data by external applications. Web Services, which use standards such as SOAP and XML, are examples of Remote Procedure Invocation. The following figure shows the integration of two applications using the Remote Procedure Invocation approach.


Figure 4: Remote Procedure Invocation Architecture (Hohpe & Woolf, 2003)
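Remote Procedure Invocation can be sketched with XML-RPC from Python's standard library, used here as a simpler XML-based stand-in for SOAP web services (it is not SOAP). The exposed `count_words` function is a hypothetical example of application functionality; the client calls it as if it were local, which is exactly what hides the cost and failure modes of the network round trip.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_words(text: str) -> int:
    """Functionality exposed by the 'server' application (hypothetical example)."""
    return len(text.split())


def start_server() -> SimpleXMLRPCServer:
    # Port 0 asks the OS for a free port; logRequests=False keeps the console quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(count_words, "count_words")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        # Looks like a local call, but is an HTTP round trip that can be slow or fail.
        print(proxy.count_words("remote procedure invocation example"))  # 4
    server.shutdown()
    server.server_close()
```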

Messaging is the exchange of messages between applications in a loosely coupled, distributed manner (Java Message Service, 2011). Communication between applications can be established using TCP network sockets, HTTP, etc. Messaging channels are opened, and applications transfer messages by sending and receiving them through a channel; the applications must agree on the channel and the message format. There are two different messaging models: broker-based and brokerless. In a broker-based messaging system, there is a messaging server in the middle. Every application is connected to the central broker, and no application speaks directly to another: all communication passes through the broker. Figure 5 shows the communication model of a broker-based messaging system.


Figure 5: Communication model of a broker-based messaging system (Zeromq, 2010)








The advantages and disadvantages of broker-based messaging systems are:

Advantages
- Applications do not need to know the location of other applications.
- The lifetimes of the message sender and the message receiver do not have to overlap.
- Resistant to application failures.

Disadvantages
- The large amount of network communication decreases performance.
- The broker is the bottleneck of the whole system: when the broker fails, the whole system stops working.
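The broker-based model and its first two advantages can be illustrated with a toy in-process broker. This is a sketch for illustration only: it is not a real messaging product or API, and a real broker would run as a separate server process and typically persist messages. Producers and consumers know only the broker and a channel name, and because messages wait inside the broker, a consumer started after the producer has finished still receives them.

```python
import queue
from collections import defaultdict
from typing import DefaultDict, Optional


class Broker:
    """Toy in-process message broker (illustration only).

    Applications know only the broker and a channel name, never each other.
    Messages wait in the broker, so sender and receiver lifetimes need not overlap.
    """

    def __init__(self) -> None:
        self._channels: DefaultDict[str, "queue.Queue[bytes]"] = defaultdict(queue.Queue)

    def send(self, channel: str, message: bytes) -> None:
        self._channels[channel].put(message)

    def receive(self, channel: str, timeout: float = 1.0) -> Optional[bytes]:
        """Return the next message on the channel, or None if none arrives in time."""
        try:
            return self._channels[channel].get(timeout=timeout)
        except queue.Empty:
            return None


if __name__ == "__main__":
    broker = Broker()
    # The producer sends and may terminate before any consumer exists.
    broker.send("documents", b"doc-1")
    broker.send("documents", b"doc-2")
    # A consumer started later still gets the messages, in order.
    print(broker.receive("documents"))               # b'doc-1'
    print(broker.receive("documents"))               # b'doc-2'
    print(broker.receive("documents", timeout=0.1))  # None
```

Every message makes two hops (sender to broker, broker to receiver), and all traffic depends on the single `Broker` object, which mirrors the two disadvantages listed above.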

In a brokerless messaging system, clients interact directly with each other and there is no central messaging server. Figure 6 shows the communication model of a brokerless messaging system.


Figure 6: Communication model of a brokerless messaging system (Zeromq, 2010)

The advantages and disadvantages of brokerless messaging systems are:

Advantages
- No single bottleneck.
- High performance with less network communication.

Disadvantages
- Each application has to connect to the applications it communicates with, and thus has to know the network address of each of them.
- Application failures cause persistence and stability problems.
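For contrast, the brokerless model can be sketched as peers that deliver directly into each other's inboxes. The `Peer` class is a hypothetical in-process illustration, not the ZeroMQ API. Each message takes a single hop, but every peer must be explicitly given the "address" of the peers it talks to, and a message sent to a peer that is not running is lost because no broker holds it.

```python
import queue
from typing import Dict, Tuple


class Peer:
    """Toy brokerless endpoint (illustration only, not the ZeroMQ API).

    Each peer owns an inbox and delivers directly into the inboxes of peers
    it has been connected to, i.e. whose 'address' it knows.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.inbox: "queue.Queue[Tuple[str, bytes]]" = queue.Queue()
        self._address_book: Dict[str, "Peer"] = {}

    def connect(self, other: "Peer") -> None:
        """Explicit configuration step: this peer learns the other's address."""
        self._address_book[other.name] = other

    def send(self, to: str, message: bytes) -> None:
        target = self._address_book.get(to)
        if target is None:
            raise LookupError(f"{self.name} does not know the address of {to}")
        if not target.online:
            # No broker exists to hold the message until the receiver restarts.
            raise ConnectionError(f"{to} is not running; message not delivered")
        target.inbox.put((self.name, message))  # direct hop, no intermediary

    def receive(self) -> Tuple[str, bytes]:
        return self.inbox.get_nowait()


if __name__ == "__main__":
    acquisition, extraction = Peer("acquisition"), Peer("extraction")
    acquisition.connect(extraction)
    acquisition.send("extraction", b"doc-1")
    print(extraction.receive())  # ('acquisition', b'doc-1')
    extraction.online = False
    try:
        acquisition.send("extraction", b"doc-2")
    except ConnectionError as err:
        print(err)  # extraction is not running; message not delivered
```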

File Transfer, Shared Database and Messaging are data-based integration approaches, which enable applications to share their data but not their functionality. Remote Procedure Invocation enables applications to share their functionality, which makes them tightly coupled to (dependent on) each other. Remote calls are also much slower and much more likely to fail than local procedure calls, which may cause performance and reliability problems.

File Transfer and Shared Database keep the applications well decoupled, so different technologies can be used in each application. However, these approaches require a synchronisation mechanism to inform the integrated applications when shared data is ready for consumption. Moreover, they require disk access to store and retrieve data, which increases the cost of communication between applications.







To integrate applications in a loosely coupled and reliable way with high performance, Messaging would be the most suitable Application Integration approach for the FIRST pipeline. Messaging is reliable, since messaging systems use a retry mechanism to make sure that a message transfer succeeds. Applications are synchronised with each other through automatic notification of message transfers, which increases the performance of the system.
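The retry mechanism mentioned above can be sketched as "resend until acknowledged". In the following Python sketch, `FlakyChannel` is a hypothetical transport that fails a scripted number of times, and `send_with_retry` with exponential backoff is one common retry policy chosen for illustration; actual messaging systems implement their own policies internally.

```python
import time
from typing import Callable, List


class FlakyChannel:
    """Hypothetical transport that fails the first `failures` delivery attempts."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.delivered: List[bytes] = []

    def deliver(self, message: bytes) -> bool:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient network error")
        self.delivered.append(message)
        return True  # acknowledgement from the receiving side


def send_with_retry(deliver: Callable[[bytes], bool], message: bytes,
                    max_attempts: int = 5, base_delay: float = 0.01) -> int:
    """Retry until the message is acknowledged; return the number of attempts used.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if deliver(message):
                return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
        time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("message was never acknowledged")


if __name__ == "__main__":
    channel = FlakyChannel(failures=2)
    print(send_with_retry(channel.deliver, b"doc-1"))  # 3
    print(channel.delivered)                           # [b'doc-1']
```

Because a retry may follow a lost acknowledgement, this style of retry gives at-least-once delivery: receivers can see duplicates and should either be idempotent or de-duplicate messages.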

4.1.2 Enterprise Application Integration

The term Enterprise Application Integration denotes the use of tools to integrate enterprise applications and services. EAI tools typically act as middleware between applications, and communication between the EAI tools and the applications is handled internally by messaging middleware. There are four main types of Enterprise Application Integration: Point-to-Point topology, Hub-and-Spoke topology, Enterprise Message Bus Integration and Enterprise Service Bus Integration (Kusak, 2010). Figure 7 shows the architectures of these integration approaches.


Figure 7: Enterprise Application Integration architectures (Kusak, 2010)

In the Point-to-Point approach, the topology is formed by direct interaction between applications, which creates tight coupling between them. This approach is generally used when there are few applications and few interactions between them. In this type of integration, the integration logic is embedded in the applications, so there is no central control, administration or management.

The Hub-and-Spoke topology is formed by applications interacting via a central hub. The main advantage of this topology is that fewer connections are required than in the point-to-point topology, and interactions can be managed centrally. However, the hub becomes a single point of failure: when the hub fails, the whole integration topology fails.
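The claim that Hub-and-Spoke needs fewer connections can be made concrete with a worked count, assuming the worst case in which every one of n applications exchanges data with every other one (an assumption; real topologies are often sparser). A full point-to-point mesh needs one link per pair of applications, while Hub-and-Spoke needs one link per application:

```latex
\begin{aligned}
C_{\mathrm{p2p}}(n)  &= \binom{n}{2} = \frac{n(n-1)}{2}, & C_{\mathrm{hub}}(n)  &= n,\\
C_{\mathrm{p2p}}(5)  &= 10,                               & C_{\mathrm{hub}}(5)  &= 5,\\
C_{\mathrm{p2p}}(20) &= 190,                              & C_{\mathrm{hub}}(20) &= 20.
\end{aligned}
```

Since n(n-1)/2 > n exactly when n > 3, the hub saves connections once more than three applications are fully interconnected, and the saving grows quadratically with n; the price is the single point of failure noted above.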








The Enterprise Message Bus integration approach is an evolved version of the Hub-and-Spoke approach. A messaging bus forms the central part of the topology, and applications communicate with it via message brokers or adapters. The main advantage of this topology is the messaging that underlies the communication, which offers performance, persistence and reliability.

Enterprise Service Bus (ESB) is a software infrastructure used as a backbone for SOA implementations. Gartner Group defines ESB as "a new architecture that exploits Web services, messaging middleware, intelligent routing, and transformation. ESBs act as a lightweight, ubiquitous integration backbone through which software services and application components flow." ESB promotes lower complexity, flexibility and low cost by combining the benefits of standards-based Web Services with other EAI approaches (Jude Pereira, 2006).

Given its advantages over the other approaches, ESB would be the most suitable EAI approach for integrating the FIRST pipeline. However, it focuses on issues not present in the FIRST system (such as integrating a large number of components or content-based message routing), which makes it too complex and may add performance overhead to the information exchange. The FIRST pipeline is composed of few applications, and each of them has a clearly defined communication schema (passing data from one to the next in a component chain). For this reason, Application Integration approaches, specifically messaging, are more suitable because of their simplicity and their better performance compared to the more complex ESB middleware designed for much larger-scale integration.

4.2. Pipeline processing

The FIRST pipeline has three major modules to be integrated: Data acquisition and Ontology infrastructure (WP3), Semantic information extraction system (WP4) and Decision support infrastructure (WP6). For the first semantic information extraction prototype, the WP3 and WP4 modules are integrated using the messaging approach. This section presents the design and implementation of the messaging integration solution in detail.

High performance and support for integrating components written in different technologies (.NET, Java) are the most important requirements for the FIRST integration approach (FIRST D2.1 Technical requirements and state-of-the-art, 2011). To achieve programming language independence, messaging systems that support multiple platforms and programming languages were investigated as potential solutions to the integration problem, and messaging systems that support only a specific programming language were eliminated during our survey. After this elimination, the following popular messaging systems were chosen for further investigation: ZeroMQ¹, ActiveMQ², RabbitMQ³, OpenJMS⁴ and Apache Qpid⁵.

Performance is the most important criterion when selecting the messaging platform for pipeline integration. For this purpose, performance tests were carried out using randomly generated data with fixed message sizes. In the experiments, 1000 messages were transferred from one application to another running on the same machine, and the total transfer duration was used to calculate the throughput of each messaging system. Figure 8 shows the performance of these messaging systems:




¹ http://www.zeromq.org/
² http://activemq.apache.org/
³ http://www.rabbitmq.com/
⁴ http://openjms.sourceforge.net/
⁵ http://qpid.apache.org/








Figure 8: Messaging systems performance comparison results (throughput in MB/second)

The experimental results showed that ZeroMQ performed better than the other messaging platforms. ZeroMQ performs better because it is a brokerless messaging system and therefore requires less network communication. Among the broker-based messaging systems, RabbitMQ performed best. Based on these results, ZeroMQ was chosen as the messaging platform for the FIRST pipeline integration because of its performance and its numerous bindings for most programming environments.

ZeroMQ (ØMQ) is a high-performance asynchronous messaging library with which communication between applications can be established without much effort. ZeroMQ offers performance, simplicity and scalability without a dedicated message broker (Piël, 2010).

The library provides sockets to connect applications and exchange information. ZeroMQ provides several communication patterns:

- Request-reply connects a set of applications. First, the message consumer (client) requests a message transfer, and then the message producer (server) sends a message. The request-reply pattern supports multiple message producers and consumers; load is balanced between them, and each message is consumed once. The advantage of this approach is the synchronisation between producer and consumer: messages are consumed one by one. On the other hand, two messages (a request and a reply) are exchanged for each packet sent by the message producer, which increases network traffic and decreases performance.
- Publish-subscribe connects a set of publishers to a set of subscribers. Publishers publish messages, and each message is delivered to all subscribers.
- Push-pull (pipeline) connects applications in a pipeline pattern. The message producer pushes messages into the pipe without a request from the message consumer, and the messages are kept in a queue until they are consumed by the receiver. The pipeline approach performs faster than the request-reply approach; however, this higher performance comes with the risks of queue overflow and synchronisation problems.
- Exclusive pair connects two applications in an exclusive way; both applications can send and consume messages.
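The difference in message counts between the request-reply and push-pull patterns described above can be shown with a small in-process simulation. It uses Python's standard `queue` and `threading` modules rather than ZeroMQ (whose corresponding socket types are REQ/REP and PUSH/PULL), so it illustrates only the message flow, not ZeroMQ's API or its actual performance.

```python
import queue
import threading
from typing import List, Tuple


def request_reply(items: List[bytes]) -> Tuple[List[bytes], int]:
    """Consumer requests each item and the producer replies with it.

    Returns the received items and how many messages were exchanged.
    """
    requests: "queue.Queue[str]" = queue.Queue()
    replies: "queue.Queue[bytes]" = queue.Queue()

    def producer() -> None:
        for item in items:
            requests.get()     # wait until the consumer asks ...
            replies.put(item)  # ... then send exactly one reply

    threading.Thread(target=producer, daemon=True).start()
    received, messages = [], 0
    for _ in items:
        requests.put("next")
        messages += 1
        received.append(replies.get())
        messages += 1
    return received, messages


def push_pull(items: List[bytes]) -> Tuple[List[bytes], int]:
    """Producer pushes items as soon as they exist; the consumer only pulls."""
    pipe: "queue.Queue[bytes]" = queue.Queue()  # unbounded: grows if the consumer lags

    def producer() -> None:
        for item in items:
            pipe.put(item)

    threading.Thread(target=producer, daemon=True).start()
    received = [pipe.get() for _ in items]
    return received, len(items)


if __name__ == "__main__":
    docs = [b"d1", b"d2", b"d3"]
    print(request_reply(docs)[1], push_pull(docs)[1])  # 6 3
```

The unbounded `pipe` in `push_pull` also makes the queue-overflow risk visible: nothing slows the producer down if the consumer falls behind.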









Figure 9: ZeroMQ messaging patterns (Piël, 2010)


The messaging system is responsible for exchanging information from WP3 to WP4 in an efficient, stable and reliable way. The pipeline and request-reply message patterns are therefore the most suitable for our needs. To observe the performance and suitability of these patterns, experiments were run on a test data set collected by WP3. The test data set consists of 2,378 files with a total size of 5.26 GB. Each file is transferred as one message from one application to another using both message patterns.


Figure 10: ZeroMQ messaging patterns performance comparison (throughput in MB/second for the ZeroMQ pipeline and request-reply patterns)

In the experiments, we observed that the pipeline approach (data push) performs approximately three times better than the request-reply pattern (Figure 10), because there is no extra communication overhead from the data request sent each time by the client; instead, the data is pushed to the client as soon as it is available. For these reasons, the pipeline pattern was selected for transferring messages between the pipeline components (WP4 and WP6). Figure 11 shows the architecture of the pipeline between the systems.


Figure 11: Data Acquisition and Information Extraction pipeline integration architecture
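The selected push-based (pipeline) integration can be sketched as a chain of stages connected by queues, each stage pulling from its inbox and pushing results downstream. In this Python sketch, in-process threads and `queue.Queue` stand in for ZeroMQ PUSH/PULL sockets between separate processes, and the stage functions `acquire`, `extract` and `decide` are hypothetical placeholders for the WP3, WP4 and WP6 components. Bounded queues are one assumed way to address the queue-overflow risk of the push-pull pattern: a fast producer blocks instead of exhausting memory.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

STOP = object()  # end-of-stream marker passed down the pipeline


def start_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull items from inbox, apply fn and push the results downstream."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown to the next stage
                return
            outbox.put(fn(item))

    threading.Thread(target=run, daemon=True).start()


def run_pipeline(docs: Iterable[Any], steps: List[Callable[[Any], Any]],
                 capacity: int = 100) -> List[Any]:
    # Bounded queues make a fast producer block instead of overflowing memory.
    queues: List[queue.Queue] = [queue.Queue(maxsize=capacity) for _ in range(len(steps) + 1)]
    for fn, inbox, outbox in zip(steps, queues, queues[1:]):
        start_stage(fn, inbox, outbox)

    def feed() -> None:
        for doc in docs:
            queues[0].put(doc)
        queues[0].put(STOP)

    threading.Thread(target=feed, daemon=True).start()
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            return results
        results.append(item)


# Hypothetical stand-ins for acquisition (WP3), extraction (WP4) and decision support (WP6).
def acquire(raw: str) -> str:
    return raw.strip()


def extract(text: str) -> dict:
    return {"text": text, "tokens": len(text.split())}


def decide(annotated: dict) -> str:
    return "alert" if annotated["tokens"] > 3 else "ok"


if __name__ == "__main__":
    print(run_pipeline(["  ACME shares drop sharply today ", " Calm day "],
                       [acquire, extract, decide]))  # ['alert', 'ok']
```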

4.3. GUI integration

4.3.1 Introduction

A common front-end Graphical User Interface (GUI) is required to present the Integrated Financial Market Information System to end users. This section discusses the technical requirements of the GUI and presents state-of-the-art technologies that fit these requirements.

Two types of applications are commonly used to implement an integrated GUI: client/server applications and web applications. A client/server application follows a two-tier architecture: it runs on the client side and accesses information on a remote server. A web application follows an n-tier architecture: it is accessed via a web browser, and all application logic is handled on the server side (see Figure 12). Web applications are generally designed with a 3-tier architecture:



- Presentation tier: the front-end content rendered by the browser.
- Application tier: controls the application's functionality.
- Data tier: stores and retrieves information.


Figure 12: Client/server application and web application architectures (Howitz, 2010)








In a client/server application, the application must be installed on each client's computer. To avoid the burden of deploying and maintaining it on each user's machine, the integrated GUI will be implemented as a web application. In the FIRST system, the data tier is provided in the form of Data Storage APIs offered by the FIRST high-level services layer.

4.3.2 Web Application Frameworks

Web application frameworks (WAF) are widely used for developing web applications because of their benefits (e.g. simplicity, consistency, efficiency, reusability). A large number of programming-language-specific web application frameworks are available. For platform independence and performance, the integrated GUI will be implemented in Java, and for this reason we focus on Java-based web application frameworks. On the client side, only a web browser should be needed.

Java web application frameworks can be divided into five categories (Shan & Hua, 2006):

- Request-based frameworks use controllers and actions that handle incoming requests from users. In this type of framework, the user session is kept on the server side. Struts¹, WebWork² and Stripes³ are examples of request-based frameworks.
- Component-based frameworks abstract the request-handling mechanism and encapsulate the logic into reusable components. Events are triggered automatically to handle incoming user requests, so the development model is similar to that of desktop GUI frameworks. JSF⁴ and Apache Wicket⁵ are widely used component-based frameworks.
- Hybrid frameworks combine request-based and component-based frameworks: the entire data and logic flow of components is handled as in a request-based model. RIFE⁶, a full-stack web application framework, falls into this category.
- Meta frameworks provide a set of core interfaces for integrating components and services, and can be considered frameworks of frameworks. Spring⁷ and Keel⁸ are examples of meta frameworks.
- Rich Internet frameworks use a client-side container model in which requests are handled on the client side, which reduces the amount of server communication and the server load. Google Web Toolkit⁹ (GWT) and Echo2¹⁰ are popular Rich Internet frameworks.

¹ http://struts.apache.org/
² http://www.opensymphony.com/webwork/
³ http://www.stripesframework.org/display/stripes/Home
⁴ http://javaserverfaces.java.net/
⁵ http://wicket.apache.org/
⁶ http://rifers.org/
⁷ http://www.springsource.org/
⁸ http://www.keelframework.org/
⁹ http://code.google.com/webtoolkit/
¹⁰ http://echo.nextapp.com/site/echo2

The main purpose of using software frameworks is to reduce the time, effort and resources required to develop and maintain web applications. The performance of a framework is also a very important factor when choosing a web application framework. From this perspective, three popular frameworks are analysed: GWT, JSF and Spring MVC.

GWT is a component-based Rich Internet Framework that allows web developers to create and maintain complex JavaScript front-end applications. With GWT, AJAX applications are written in Java and the source is then compiled to highly optimised JavaScript that runs across all browsers, including mobile browsers for Android and the iPhone. The advantages and disadvantages of GWT are listed below:

Advantages
- Simplicity
  o No need to learn/use the JavaScript language (a reliable, strongly typed language, Java, is used for development and debugging)
  o Leverages the various tools of the Java programming language for writing/debugging/testing
- Performance
  o Generates optimised JavaScript code
  o Can use complex Java on the client
- Scalability
  o Stateful client, stateless server
- AJAX support
- Compatibility
  o No need to handle browser incompatibilities and quirks

Disadvantages
- Steep learning curve
- Heavy dependence on JavaScript
  o Results in client web browser applications that consume a lot of memory
- Not search engine friendly
- Need for more components
  o GWT does not come out of the box with all possible widgets; extra components are needed

Table 3: GWT advantages and disadvantages

JavaServer Faces (JSF) is a component-oriented, event-driven framework based on the Model-View-Controller (MVC) pattern, in which the view layer is separated from the controller and the model. Event-driven User Interface (UI) components are provided by the JSF API; the UI components and their state are represented on the server with a defined life-cycle. The advantages and disadvantages of JSF are listed below:

Advantages
- Simplicity
  o Easy to learn for existing Java web developers
  o Enables the use of IDEs for Rapid Application Development (NetBeans, JDeveloper, Eclipse, etc.)
  o Follows the MVC design pattern
- Compatibility
  o No need to handle browser incompatibilities and quirks

Disadvantages
- Performance
  o Every button or link click results in a form post, which might lead to a bad user experience from the end user's point of view
- Scalability
  o Component states are stored in session objects, which makes running in distributed mode difficult

Table 4: JSF advantages and disadvantages







Spring MVC is the request-based framework of the Spring Framework for developing web applications. The framework defines strategy interfaces for all of its responsibilities, which are tightly coupled to the Servlet API. The advantages and disadvantages of Spring MVC are:


Advantages
- Simplicity
  o Easy to test
  o Follows the MVC design pattern
  o Cleaner code
- Integrates seamlessly with many view options: JSP/JSTL, Tiles, Velocity, FreeMarker, Excel, XSL, PDF

Disadvantages
- Configuration intensive
- No common parent controller, resulting in the need to handle many issues individually
- No built-in Ajax support

Table 5: Spring MVC advantages and disadvantages

Table 6 compares the features of notable Java web application frameworks. Each feature is rated between 0 and 1 (a higher rating is better); the rating logic is described in (Raible, JVM Web Frameworks Rating Logic, 2010). According to the author's evaluation, Spring MVC and GWT are the highest-rated frameworks.

Criteria | Struts 2 | Spring MVC | Wicket | JSF 2 | Tapestry | Stripes | GWT | Vaadin
Developer Productivity | 0.5 | 0.5 | 0.5 | 0.5 | 1.0 | 0.5 | 1.0 | 1.0
Developer Perception | 0.5 | 1.0 | 1.0 | 0.0 | 0.5 | 1.0 | 1.0 | 1.0
Learning Curve | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0
Project Health | 0.5 | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0
Developer Availability | 0.5 | 1.0 | 0.5 | 1.0 | 1.0 | 0.5 | 1.0 | 0.5
Job Trends | 1.0 | 1.0 | 0.5 | 1.0 | 0.5 | 0.0 | 1.0 | 0.0
Templating | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 0.5 | 0.5
Components | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.5 | 1.0
Ajax | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0
Plugins or Add-Ons | 0.5 | 0.0 | 1.0 | 1.0 | 0.5 | 0.0 | 1.0 | 1.0
Scalability | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0 | 0.5
Testing | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0 | 0.5 | 0.5
i18n and l10n | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0
Validation | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0
Multi-language Support (Groovy / Scala) | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0
Quality of Documentation/Tutorials | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0
Books Published | 1.0 | 1.0 | 0.5 | 1.0 | 0.5 | 0.5 | 1.0 | 0.5
REST Support (client and server) | 0.5 | 1.0 | 0.5 | 0.0 | 0.5 | 0.5 | 0.5 | 0.5
Mobile / iPhone Support | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
Degree of Risk | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5
Totals | 14.5 | 17 | 15 | 13.5 | 15 | 14 | 17 | 15.5

Table 6: A comparison of notable Java web application frameworks (Raible, Comparing JVM Web Frameworks, 2010)
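The Totals row of Table 6 is the unweighted sum of the 20 criterion ratings. The following Python sketch recomputes the totals from the values transcribed from the table and lists the top-rated frameworks. It is a convenience check on the arithmetic, not part of the original rating method; weighting individual criteria differently would be an additional assumption.

```python
from typing import Dict, List

# Criteria and ratings transcribed from Table 6 (Raible, 2010), in table row order.
CRITERIA = [
    "Developer Productivity", "Developer Perception", "Learning Curve",
    "Project Health", "Developer Availability", "Job Trends", "Templating",
    "Components", "Ajax", "Plugins or Add-Ons", "Scalability", "Testing",
    "i18n and l10n", "Validation", "Multi-language Support (Groovy / Scala)",
    "Quality of Documentation/Tutorials", "Books Published",
    "REST Support (client and server)", "Mobile / iPhone Support", "Degree of Risk",
]
SCORES: Dict[str, List[float]] = {
    "Struts 2":   [0.5, 0.5, 1.0, 0.5, 0.5, 1.0, 1.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 1.0, 1.0],
    "Spring MVC": [0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0],
    "Wicket":     [0.5, 1.0, 0.5, 1.0, 0.5, 0.5, 1.0, 1.0, 0.5, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0],
    "JSF 2":      [0.5, 0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 0.5, 1.0, 0.0, 1.0, 1.0],
    "Tapestry":   [1.0, 0.5, 0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0],
    "Stripes":    [0.5, 1.0, 1.0, 0.5, 0.5, 0.0, 1.0, 0.0, 0.5, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0],
    "GWT":        [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5, 1.0, 1.0],
    "Vaadin":     [1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 0.5],
}


def totals(scores: Dict[str, List[float]] = SCORES) -> Dict[str, float]:
    """Unweighted sum of the per-criterion ratings for each framework."""
    for name, row in scores.items():
        if len(row) != len(CRITERIA):
            raise ValueError(f"{name}: expected {len(CRITERIA)} ratings, got {len(row)}")
    return {name: sum(row) for name, row in scores.items()}


def top_rated(scores: Dict[str, List[float]] = SCORES) -> List[str]:
    """All frameworks sharing the highest total, in alphabetical order."""
    t = totals(scores)
    best = max(t.values())
    return sorted(name for name, total in t.items() if total == best)


if __name__ == "__main__":
    print(totals())
    print(top_rated())  # ['GWT', 'Spring MVC']
```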

Performance and scalability are very important factors for the success of the FIRST project. For this reason, after analysing the advantages and disadvantages of the application frameworks, GWT was selected as the web application framework for developing the Integrated GUI; it is the most promising solution thanks to its simplicity, scalability and performance.

GWT enables the development of web applications without writing JavaScript code on the client side. In some special cases (e.g. integration with a non-GWT application), custom JavaScript code may need to be developed. jQuery¹, the most popular JavaScript library, would be used for this purpose. jQuery is a cross-browser JavaScript library designed to simplify client-side scripting. There is also a plug-in called GwtQuery, which can be used like jQuery within the GWT framework.

4.4. Data storage design principles

The design of the knowledge base is conducted with the following prerequisites in mind:

- Choice of paradigm(s) for the physical storage system
- Provide a stable access interface to hide the underlying complexity
- Encapsulate business logic in a mediation layer
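The last two prerequisites can be illustrated with a small Python sketch: an abstract access interface that callers depend on, an interchangeable backend behind it, and a mediation layer that holds the business rules. All class names and the validation rule are hypothetical and are not the FIRST knowledge base API, whose design is carried out in WP5.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DocumentStore(ABC):
    """Stable access interface: callers never see the physical storage paradigm."""

    @abstractmethod
    def put(self, doc_id: str, doc: dict) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]: ...


class InMemoryStore(DocumentStore):
    """One interchangeable backend; a relational or NoSQL backend could replace it."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, doc_id: str, doc: dict) -> None:
        self._data[doc_id] = dict(doc)  # store a copy, not the caller's object

    def get(self, doc_id: str) -> Optional[dict]:
        doc = self._data.get(doc_id)
        return None if doc is None else dict(doc)


class KnowledgeBase:
    """Mediation layer: business rules live here, not in clients or backends."""

    REQUIRED_FIELDS = ("source", "text")  # hypothetical validation rule

    def __init__(self, store: DocumentStore) -> None:
        self._store = store

    def add_document(self, doc_id: str, doc: dict) -> None:
        missing = [f for f in self.REQUIRED_FIELDS if f not in doc]
        if missing:
            raise ValueError(f"document {doc_id} is missing {missing}")
        self._store.put(doc_id, doc)

    def text_of(self, doc_id: str) -> Optional[str]:
        doc = self._store.get(doc_id)
        return None if doc is None else doc["text"]


if __name__ == "__main__":
    kb = KnowledgeBase(InMemoryStore())
    kb.add_document("d1", {"source": "rss", "text": "ACME beats estimates"})
    print(kb.text_of("d1"))  # ACME beats estimates
```

Swapping `InMemoryStore` for another `DocumentStore` implementation changes the storage paradigm without touching `KnowledgeBase` or its callers, which is the point of the stable interface.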

4.4.1 Storage paradigms

The most crucial and fundamental decision regarding data storage is the choice of the paradigm of the physical storage system. Besides the prevalent relational database systems, there is a variety of alternatives, each with its characteristic advantages and disadvantages, which are commonly referred to under the umbrella term NoSQL². Many of these non-relational storage alternatives provide better performance, but in most cases this comes at the cost of relaxing the ACID guarantees: the atomicity of transactions, the consistency of the data store before and after a transaction, the isolation of transactions and the durability of completed transactions. As Härder and Reuter already noted (Härder & Reuter, 1983, p. 291), "These four properties, atomicity, consistency, isolation, and durability (ACID), describe the major highlights of the transaction paradigm, which has influenced many aspects of development in database systems." This was true nearly three decades ago, when the statement was made, and it has also been the main guideline for advancements in database technology since then.
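Atomicity, the first ACID property, can be illustrated with Python's built-in `sqlite3` module. SQLite is used here only as a convenient relational engine, not as a storage choice made in FIRST, and the two tables are hypothetical. A document and its sentiment score are stored in one transaction, so a failure in the second insert also undoes the first.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """In-memory SQLite database with two hypothetical tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    con.execute("CREATE TABLE sentiments (doc_id INTEGER NOT NULL, score REAL NOT NULL)")
    return con


def store_annotated(con: sqlite3.Connection, doc_id: int, title: str,
                    score: Optional[float]) -> bool:
    """Store a document and its sentiment atomically: both rows or neither."""
    try:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, title))
            con.execute("INSERT INTO sentiments VALUES (?, ?)", (doc_id, score))
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(con: sqlite3.Connection, table: str) -> int:
    # The table name comes from trusted code only; never interpolate user input.
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    con = open_store()
    print(store_annotated(con, 1, "ACME beats estimates", 0.8))  # True
    print(store_annotated(con, 2, "Bank cuts outlook", None))    # False: score is NOT NULL
    print(row_count(con, "documents"), row_count(con, "sentiments"))  # 1 1
```

Without the transaction, the failed second call would leave an orphaned row in `documents`; many non-relational stores trade exactly this kind of multi-record guarantee for performance.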


Only with the advent of internet technology and strongly increasing read and write access to (potentially distributed) database systems was it questioned whether this underlying paradigm is the best fit for all use cases. As Brewer pointed out in his CAP theorem (Brewer, 2000), only two of the three desirable characteristics consistency, availability and partition