System S - Streaming - World Federation of Exchanges

radiographerfiction, Data Management

31 Oct 2013

© 2009 IBM Corporation

InfoSphere Streams


November 17th, 2009


Streaming data is ubiquitous

Volume

Every day, 15 petabytes of new information are generated. By 2010, the codified information base of the world is expected to double every 11 hours.

Variety

80% of new data growth is unstructured content, generated largely by email, with increasing contributions from documents, images, video, and audio.

Velocity

An average company with 1,000 employees spends $5.3 million a year to find information stored on its servers. 42% of managers say they use the wrong information at least once per week.

The volume, complexity, and semantic depth of the data that banks will need to analyse will increase significantly

[Figure: two panels of data sources, each feeding "Analytics & Insight".
First panel: Market Data; Risk Analytics Data; Historical Trade Data.
Second panel: Market Data; Risk Analytics Data; Video News Feeds; Corporate Press Reports; RSS Feeds; Web Pages; Weather Data; Government Statistics; Internal Message Bus; Blogs & Commentary; Historical Trade Data; Real World Sensors; "Roadmap?"; + Other Feeds.]

Quick facts about InfoSphere Streams

Complete software environment for stream processing: includes a programming language, runtime, and tools

Basic elements: streams and operators on streams

Consumes all manner of streaming digital data, with a programming model tailored to defining and manipulating streams

Edge adapters for interfacing with other systems

Operates on Linux clusters (from 1 to 125 nodes)

Complemented by IBM Research stream analytics research and skills

Example Application: algo-trading

[Figure labels: Stock and Options Exchanges; market data; Feed Handlers; Low Latency Zone; In the bank; Decision Engines; Theo Price; Risk; results; Market data; trades; positions; Trader; Primary Information; Position Database.]

Quick look: demo of low-latency, high-ingest options processing

Parameterized application: 2 OPRA channels, 6 decision engines

[Figure labels: 2 Ingesters; 2 Routers for Ex 1; 2 Routers for Ex 3; 2 Decision Engines for Ex 1; 2 Decision Engines for Ex 3; 3 Collectors for Ex 1, 2, & 3.]

Demo: Low Latency End-to-End Exemplar

[Figure labels: Recorded OPRA data; 24 channels; 163 Decision Engines; 356 Nodes; 356 PEs; 1,975 streams; 2,133 streams; 163 streams; 4,274 streams.]

Data rate: 15X recorded speed

Mean latency: 150 microseconds

Minimum latency: 50 microseconds

49 out of 65K with latency > 2 ms

InfoSphere Streams

Streams: a one-way flow of data tuples, managed by the system

Operators: a programmer’s basic building blocks
Many built-in operators: sink, source, join, split, aggregate…
User-defined operators for domain-specific functions

Data flow graph: representation of how a particular application’s operators are connected by streams

SPADE: declarative language and framework for writing stream-processing applications

<stream name> (stream schema) := <operator name> (input streams) {output assignments}
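To make the vocabulary above concrete, here is a minimal sketch in Python rather than SPADE. It models a stream as a generator of tuples and an operator as a function from streams to streams. The operator names follow the built-in operators listed on the slide, but their signatures, the sample attribute names (`symbol`, `price`), and `run_graph` are assumptions made for illustration, not the product's API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

StreamTuple = Dict[str, Any]  # one tuple flowing on a stream


def source(rows: Iterable[tuple]) -> Iterator[StreamTuple]:
    """Source operator: turns external (symbol, price) rows into a stream."""
    for symbol, price in rows:
        yield {"symbol": symbol, "price": price}


def functor(stream: Iterable[StreamTuple],
            keep: Callable[[StreamTuple], bool],
            assign: Callable[[StreamTuple], StreamTuple]) -> Iterator[StreamTuple]:
    """Functor-style operator: filter each tuple, then assign output attributes."""
    for t in stream:
        if keep(t):
            yield assign(t)


def aggregate(stream: Iterable[StreamTuple], size: int) -> Iterator[StreamTuple]:
    """Aggregate operator: average price over tumbling windows of `size` tuples."""
    window: List[float] = []
    for t in stream:
        window.append(t["price"])
        if len(window) == size:
            yield {"avg_price": sum(window) / size}
            window = []


def sink(stream: Iterable[StreamTuple]) -> List[StreamTuple]:
    """Sink operator: drains the stream (here, into a list)."""
    return list(stream)


def run_graph(rows: Iterable[tuple]) -> List[StreamTuple]:
    """Data flow graph: operators connected by named streams."""
    quotes = source(rows)
    ibm_prices = functor(quotes,
                         keep=lambda t: t["symbol"] == "IBM",
                         assign=lambda t: {"price": t["price"]})
    averages = aggregate(ibm_prices, size=2)
    return sink(averages)
```

Each assignment in `run_graph` mirrors the shape of the SPADE statement: a named output stream on the left, an operator applied to input streams on the right.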


Developer’s IDE

Lifecycle of streams applications

[Figure, built up over a sequence of slides. Labels: SPADE source; SPADE compiler; Job manager; Processing Element Containers; PEs; Sources; Sinks; Streams Data Fabric; Transport (TCP/IP / Ethernet); Blue Gene nodes; x86 blades.]

Strengths


Incremental development: start simple, then embellish

Design and test independent of ultimate deployment

Create your topology declaratively with many supplied operators

Customize with your own domain-specific operators

Generation of high-performance code

Intermediate results available to other applications

Visual presentation of deployed application

Thinking in terms of streams? Why not program directly in streams?

Example Application: algo-trading

[Figure repeated from the earlier "Example Application: algo-trading" slide.]

Data flow for OPRA data processing example

[Flowchart labels: OPRA FAST Data; Decode; Normalize; Want this Symbol? (Y/N); Msg. Type (Quote, Sale, NBBO); Quotes from this exchange? (Y/N); All decision engines for this symbol; All decision engines for this symbol-exchange combination; Discard; Result.]
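The flowchart's routing decisions can be sketched as a small function. The slide text does not preserve how the boxes are wired, so the wiring below is an assumption: unwanted symbols are discarded, Sale and NBBO messages go to all decision engines for the symbol, and Quote messages go to the engines for the symbol-exchange combination only when quotes from that exchange are wanted. `Message`, `normalize`, `route`, and the destination strings are hypothetical names, and FAST decoding is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Message:
    symbol: str
    exchange: str
    msg_type: str  # "QUOTE", "SALE" or "NBBO" (assumed encoding)


def normalize(raw: Dict[str, str]) -> Message:
    """Normalize step: trim and upper-case the decoded fields."""
    return Message(symbol=raw["symbol"].strip().upper(),
                   exchange=raw["exchange"].strip().upper(),
                   msg_type=raw["type"].strip().upper())


def route(msg: Message,
          wanted_symbols: Set[str],
          wanted_quotes: Set[Tuple[str, str]]) -> str:
    """Return the destination for one message (assumed wiring, see lead-in)."""
    if msg.symbol not in wanted_symbols:          # "Want this Symbol?" -> N
        return "discard"
    if msg.msg_type in ("SALE", "NBBO"):          # "Msg. Type" branch
        return f"engines:{msg.symbol}"
    if msg.msg_type == "QUOTE":
        # "Quotes from this exchange?" -> Y / N
        if (msg.symbol, msg.exchange) in wanted_quotes:
            return f"engines:{msg.symbol}@{msg.exchange}"
        return "discard"
    return "discard"                              # unknown message type
```

Keeping the routing rule a pure function of the message and two lookup sets makes it easy to test on its own before it is embedded in an operator.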


Different views for inspecting running applications


Low latency high ingest options processing


Questions & Discussion


Incremental development: start simple, then embellish

Design and test independent of ultimate deployment

Declare your topology (access to a large library of operators)

Extend your toolkit (your own new domain-specific operators)

Intermediate results available to other applications

Visual presentation of deployed application

Thinking in terms of streams

Trend Calculator Example

[Figure: three UTDF playback feeds (UTDF 1, UTDF 2, UTDF 3), the per-symbol algo parameters, and the symbols to be output feed a Trend Calculator running in InfoSphere Streams, which emits the up/down trend for the requested symbols.]

Trend Calculator: inputs and outputs

[Figure annotations: Data sources; Parameters source (two).]

All on one machine, fused into one multi-threaded process

All on one machine; each operator in its own process

Each operator in its own process; each process on its own machine
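The three deployments above run the same operator graph with different placement. A rough Python sketch of that idea follows. It compares a fused run (every operator called in one loop) with a run where each operator has its own worker connected by queues. Threads stand in here for the separate processes or machines on the slides, and `double`, `add_one`, `run_fused`, and `run_one_worker_per_operator` are hypothetical names, not Streams APIs.

```python
import queue
import threading
from typing import Callable, Iterable, List

Operator = Callable[[int], int]


def double(x: int) -> int:
    return 2 * x


def add_one(x: int) -> int:
    return x + 1


OPERATORS: List[Operator] = [double, add_one]
_DONE = object()  # end-of-stream marker passed down the pipeline


def run_fused(items: Iterable[int], operators: List[Operator]) -> List[int]:
    """All operators fused: each item passes through every operator in one loop."""
    out = []
    for item in items:
        for op in operators:
            item = op(item)
        out.append(item)
    return out


def _worker(op: Operator, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One operator in its own worker, reading and writing FIFO queues."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(op(item))


def run_one_worker_per_operator(items: Iterable[int],
                                operators: List[Operator]) -> List[int]:
    """Same graph, but each operator runs in its own worker thread."""
    queues = [queue.Queue() for _ in range(len(operators) + 1)]
    threads = [threading.Thread(target=_worker, args=(op, queues[i], queues[i + 1]))
               for i, op in enumerate(operators)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    out = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

Both runners accept the same operator list, so the application logic can be tested in the fused form and then spread out without being rewritten.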


Code and development cycle walkthrough


Accessing existing business function

Use your own single-threaded code library in Streams

Advantages:

Run on many threads

Distribute across more hardware

Manage it in Streams

Reuse in many contexts without recoding it

Incorporate legacy/complex business logic in your app

Build a C++ library with the business logic

Test the library stand-alone in your standard test harness

Create a SPADE user-defined operator (UDOP) in your Streams app

Call the business logic in the UDOP

Build the Streams application with the library

Run the Streams app with the business logic

Calling C or C++ code in your existing library

Input tuple arrives

Call to quant library

Construct output tuple

Send output tuple
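The four steps above have the following shape, sketched here in Python rather than in a C++ UDOP. `theoretical_price` is a hypothetical stand-in for the existing quant library (it simply returns the mid-price), and the attribute names `symbol`, `bid`, `ask`, and `theo` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

StreamTuple = Dict[str, Any]


def theoretical_price(bid: float, ask: float) -> float:
    """Hypothetical stand-in for the existing library call: the mid-price."""
    return (bid + ask) / 2.0


def udop(stream: Iterable[StreamTuple],
         library_call: Callable[[float, float], float]) -> Iterator[StreamTuple]:
    """User-defined operator that wraps an existing library function."""
    for in_tuple in stream:                                    # input tuple arrives
        value = library_call(in_tuple["bid"], in_tuple["ask"])  # call to quant library
        out_tuple = {"symbol": in_tuple["symbol"],              # construct output tuple
                     "theo": value}
        yield out_tuple                                         # send output tuple
```

Because the library function is passed in, the same function can be exercised alone in a test harness and then reused unchanged inside the operator.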


SPADE message schema


SPADE Source operators and UTDF parsers

[Figure annotations: Source operator; UTDF parser.]

SPADE UDOP operator and Join operator

[Figure annotations: incorporate algo library; filter on symbol.]

SPADE Functor operators and Sink operators

[Figure annotations: eliminate duplicates; filter on inflection.]
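The annotated stages (parse, apply the algo library, filter on symbol, filter on inflection) can be chained as below. This is a sketch under stated assumptions, not the deck's SPADE code: the input format is a made-up `SYMBOL,PRICE` line rather than real UTDF, the "algo library" is a trivial comparison with the previous price per symbol, and an inflection is taken to mean a change of trend direction for a symbol. Under that reading, repeated trends are dropped by the inflection filter, so no separate duplicate-elimination stage appears.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set

StreamTuple = Dict[str, Any]


def parse(lines: Iterable[str]) -> Iterator[StreamTuple]:
    """Stand-in parser for 'SYMBOL,PRICE' lines (not the real UTDF format)."""
    for line in lines:
        symbol, price = line.strip().split(",")
        yield {"symbol": symbol, "price": float(price)}


def trend(stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Stand-in algo library: up/down versus the previous price of the symbol."""
    last: Dict[str, float] = {}
    for t in stream:
        prev = last.get(t["symbol"])
        last[t["symbol"]] = t["price"]
        if prev is None or t["price"] == prev:
            continue  # no trend yet, or price unchanged
        yield {"symbol": t["symbol"],
               "trend": "up" if t["price"] > prev else "down"}


def filter_symbols(stream: Iterable[StreamTuple],
                   requested: Set[str]) -> Iterator[StreamTuple]:
    """Keep only tuples for the requested symbols."""
    for t in stream:
        if t["symbol"] in requested:
            yield t


def inflections(stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Emit a tuple only when a symbol's trend changes direction."""
    current: Dict[str, str] = {}
    for t in stream:
        prev = current.get(t["symbol"])
        current[t["symbol"]] = t["trend"]
        if prev is not None and prev != t["trend"]:
            yield t


def trend_calculator(lines: Iterable[str], requested: Set[str]) -> List[StreamTuple]:
    """Source -> parser -> algo -> symbol filter -> inflection filter -> sink."""
    return list(inflections(filter_symbols(trend(parse(lines)), requested)))
```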


Deployment flexibility: for example, a 26-way split


Key points


Incremental development: start simple, then embellish

Design and test independent of ultimate deployment

Declare your topology (access to a large library of operators)

Extend your toolkit (your own new domain-specific operators)

Intermediate results available to other applications

Visual presentation of deployed application

Thinking in terms of streams