Content-Based Event Routing

Dec 2, 2013

XML Data Binding: Encoding for High-Performance Content-Based Event Routing

Gail Kaiser

Phil Gross

Columbia University

Programming Systems Lab

Overview


PSL Intro


MEET Project


Encoding Conversion Efficiency


Encoding Size Efficiency


Encoding Classification Efficiency

Programming Systems Lab


“PSL conducts research on Web
technologies, collaborative work, virtual
worlds, process/workflow, extended
transaction models, software development
environments and tools, software
engineering, information management, and
distributed programming systems”


Lately, lots of XML stuff

PSL XML-related Research


FlexML: Flexible XML


Open-ended XML streams that may include “new” tags


Dynamic schema and semantics discovery and composition


XUES: XML-based Universal Event Service


Event Packager: Data mining over XML structured data


Event Distiller: XML event poset pattern matching


Learning new application-domain events to recognize


DISCUS: Decentralized Information Spaces for
Composition and Unification of Services


Rapid and secure application composition using Web
Services


Trust Evolution: PGP Trust + KeyNote + real-world business

MEET


Multiply Extensible Event Transport


Content-based multicast routing


Must be efficient enough for embedded and
high-performance applications

MEET Motivations


Personal Life Recorder (sensor oriented)


GroupWork Recorder (computer/DB
oriented)


Parallel/Grid computing


Distributed simulation


Battlefield C4I


Last, but not least:


Dissertation submission

Relationship to Other Work


Generally modeling communication like

[Diagram: Machine A (Relational), Machine B (XML)]


What actually goes over the line is an
afterthought


But with N-way Internet-scale
communication


Millions of publishers and subscribers


We can (must!) do better than ASCII text…


Line speed => ≈250 assembly instructions per
packet

MEET Extensibility


Want to scale up, to millions of pubs and subs


Want to scale down, to embedded and wireless


No single solution satisfactory at all scales


Composed of hot-swappable subsystems


Router, transports, clock/causality, types, etc.

Why Types


Event data is not just an opaque bag of bits


Subscriptions are Boolean functions over
events


Type safety would be nice


What type system to use?
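
To make "subscriptions are Boolean functions over events" concrete, here is a minimal Python sketch. The event type `TemperatureEvent`, the helper names, and the subscriber table are all hypothetical illustrations, not part of MEET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TemperatureEvent:  # hypothetical typed event, not a MEET type
    sensor_id: int
    celsius: float


# A subscription is simply a Boolean function over a typed event.
Subscription = Callable[[TemperatureEvent], bool]


def hotter_than(threshold: float) -> Subscription:
    return lambda e: e.celsius > threshold


def from_sensor(sensor_id: int) -> Subscription:
    return lambda e: e.sensor_id == sensor_id


def all_of(*subs: Subscription) -> Subscription:
    """Conjunction of several subscriptions."""
    return lambda e: all(s(e) for s in subs)


def route(event: TemperatureEvent, table: Dict[str, Subscription]) -> List[str]:
    """Return the subscribers whose predicate accepts the event."""
    return [name for name, sub in table.items() if sub(event)]


table = {
    "alice": hotter_than(30.0),
    "bob": all_of(hotter_than(30.0), from_sensor(7)),
}
matched = route(TemperatureEvent(sensor_id=3, celsius=35.5), table)
```

Because the event is typed, a predicate that refers to a field the type does not have can be rejected before any event is routed, which is the kind of type safety the slide asks for.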

Initial MEET Type Design


Initial design calls for supporting Java, C#,
and XML Schema defined objects “out of the
box”


XML Schema used as Ur-language/Esperanto for conversions


Subscriptions are arbitrary Boolean functions
on datatypes


XML Schema is not an ideal ur-type


Excessively complex, verbose, etc.

Encodings for Efficiency


Java, C#, XML, ASN.1 have well-defined but
proprietary encodings for instances


Would be nice to have an independent
encoding scheme with some desirable
properties missing from the above


Fast serialization/deserialization


Elimination of redundant information from
message sequences


Data organized for rapid classification/routing

Conversion Efficiency


Need to get to and from wire format as fast
as possible


Leverage homogeneity to eliminate
unnecessary conversions, e.g., network byte
order


ECho system from Eisenhauer et al.,
Georgia Tech


Using “native data” for ultra-low latency


Necessary for HPC
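
One common way to exploit homogeneity is for the sender to write its native layout and let the receiver convert only when the byte orders differ. The sketch below illustrates that idea in Python. It is an assumption-laden illustration, not ECho's actual wire format: the one-byte order flag and the function names are hypothetical, and both ends are assumed to use the same width for a C `int`.

```python
import sys
from array import array

LITTLE, BIG = 0, 1
NATIVE = LITTLE if sys.byteorder == "little" else BIG


def encode(values):
    """Sender writes its native layout, prefixed by a one-byte order flag."""
    return bytes([NATIVE]) + array("i", values).tobytes()


def decode(message):
    """Receiver swaps bytes only when the sender's order differs from its own."""
    ints = array("i")
    ints.frombytes(message[1:])
    if message[0] != NATIVE:
        ints.byteswap()  # heterogeneous case: pay for the conversion
    return ints.tolist()


def encode_foreign(values):
    """Simulate a message from a machine with the opposite byte order."""
    swapped = array("i", values)
    swapped.byteswap()
    return bytes([1 - NATIVE]) + swapped.tobytes()


same_arch = decode(encode([1, -2, 300000]))
other_arch = decode(encode_foreign([1, -2, 300000]))
```

Between two machines of the same architecture neither side swaps bytes, whereas a fixed network byte order would force little-endian hosts to convert twice.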

Size Efficiency


Ideal for a single message is self-describing
data


With multiple messages of same type, one
can pull out redundant type info, e.g.,
schema


Goal is to go further: If 90% of content of
messages is the same, generate a new
subtype with fixed values


From self-describing to all-schema is a
continuum
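
The step from "schema pulled out" to "subtype with fixed values" can be sketched as follows. This is a simplified illustration: it only factors out fields that are identical across every sampled message (the slide's 90% threshold is not modeled), messages are plain dictionaries, and the function names are hypothetical.

```python
def derive_fixed_fields(messages):
    """Find the fields whose value is identical in every sampled message."""
    first = messages[0]
    return {
        key: value
        for key, value in first.items()
        if all(key in m and m[key] == value for m in messages)
    }


def compress(message, fixed):
    """Send only the fields that the derived subtype does not pin down."""
    return {k: v for k, v in message.items() if k not in fixed}


def expand(body, fixed):
    """Receiver restores the full message from the subtype's fixed values."""
    return {**fixed, **body}


samples = [
    {"host": "lab-1", "unit": "C", "value": 21.5},
    {"host": "lab-1", "unit": "C", "value": 22.0},
]
fixed = derive_fixed_fields(samples)
wire_body = compress(samples[1], fixed)
restored = expand(wire_body, fixed)
```

Once both parties agree on the derived subtype, only `wire_body` needs to cross the wire for each later message of that subtype.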

Classification Efficiency


When bits start arriving serially at the router,
would like to begin cut-through routing as
soon as possible


Avoid the curse of IP/IPv6: source address first


Want key routing bits as close to the front as
possible


Want data in fixed locations

Fast Classifying: First Things First


In the packet, type info first (after magic)


Would like to represent type codes as bit string
with “most significant” info, e.g., parent type first,
followed by subtype identifier, sub-subtype, etc.


Need access to type hierarchy


Popular classification fields at the front


Need to tag with popularity metadata


“subscribers will want to select on me”
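
A hierarchical type code with the parent first lets a router test "is this a subtype of X?" with a single mask-and-compare, much like prefix matching on addresses. The sketch below assumes a hypothetical fixed-width code of three levels at four bits each, with identifier 0 reserved to mean "no subtype at this level". None of these sizes come from the MEET design.

```python
BITS_PER_LEVEL = 4   # assumed width of each level's identifier (1..15)
LEVELS = 3           # assumed maximum depth of the type hierarchy
WIDTH = BITS_PER_LEVEL * LEVELS


def type_code(*path):
    """Pack a path such as (1, 2) into a fixed-width code, parent level first."""
    code = 0
    for level in range(LEVELS):
        ident = path[level] if level < len(path) else 0
        code = (code << BITS_PER_LEVEL) | ident
    return code


def prefix_mask(depth):
    """Mask selecting the leading `depth` levels of a type code."""
    used = depth * BITS_PER_LEVEL
    return ((1 << used) - 1) << (WIDTH - used)


def matches(packet_code, subscribed_path):
    """True if the packet's type is the subscribed type or one of its subtypes."""
    mask = prefix_mask(len(subscribed_path))
    return (packet_code & mask) == (type_code(*subscribed_path) & mask)


packet = type_code(1, 2, 3)      # e.g. Sensor / Temperature / Celsius
wants_sensor = matches(packet, (1,))
wants_log = matches(packet, (2,))
```

Because the most significant bits carry the parent type, a subscription to a broad type can be decided from the first few bits of the packet, before the rest has arrived.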

Fast Classifying: Fixed Positions


Would like to avoid scanning through long or
variable-length fields


Long/Variable data needs to be in a separate
channel/section


Primitives and fixed-length references at the
front


References point into data section


Classifier can jump large, uninteresting data
quickly
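
The layout described above can be sketched with Python's `struct` module. The field set, the magic value, and the function names are hypothetical. Network byte order is used only to keep the example deterministic, not as a claim about MEET's wire format.

```python
import struct

# Assumed header: magic, type code, one int primitive, then a fixed-length
# (offset, length) reference into the variable-length data section.
HEADER = struct.Struct("!HHiII")
MAGIC = 0x4D45  # hypothetical magic number
PRIORITY_OFFSET = 4  # magic (2 bytes) + type code (2 bytes)


def pack_event(type_code, priority, note):
    """Place primitives and the reference up front, bulk data at the end."""
    offset = HEADER.size  # data section starts right after the header
    return HEADER.pack(MAGIC, type_code, priority, offset, len(note)) + note


def peek_priority(packet):
    """Classifier reads a fixed-position field without scanning the data."""
    return struct.unpack_from("!i", packet, PRIORITY_OFFSET)[0]


def read_note(packet):
    """Follow the reference only when the bulk data is actually needed."""
    _, _, _, offset, length = HEADER.unpack_from(packet)
    return packet[offset:offset + length]


packet = pack_event(0x120, 5, b"x" * 1000)
priority = peek_priority(packet)
```

`peek_priority` touches four bytes at a known offset regardless of how large the note is, which is what lets a classifier jump over uninteresting data.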

Plus: Schema Format


We’d like the schema format to be amenable
to programmatic manipulation and analysis


For instance, when negotiating formats, we’d
like to be able to compute how our original
format offer differs from the counter-offer


XML Schema is pretty good for this
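
As a rough illustration of computing how an offer differs from a counter-offer, the sketch below stands in plain dictionaries (field name to type name) for parsed schemas. Real XML Schema documents would first need to be parsed into some such structure; the field names and the `diff_formats` helper are hypothetical.

```python
def diff_formats(offer, counter):
    """Report fields added, removed, or retyped between two format proposals."""
    added = {k: counter[k] for k in counter.keys() - offer.keys()}
    removed = {k: offer[k] for k in offer.keys() - counter.keys()}
    changed = {
        k: (offer[k], counter[k])
        for k in offer.keys() & counter.keys()
        if offer[k] != counter[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


offer = {"id": "xs:int", "temp": "xs:double", "note": "xs:string"}
counter = {"id": "xs:long", "temp": "xs:double", "unit": "xs:string"}
delta = diff_formats(offer, counter)
```

A negotiating party could accept the counter-offer outright when `delta` contains only changes it can convert cheaply, and otherwise respond with a further proposal.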

Conclusions


Efficient instance transfer is an interesting
case for data-binding


Special needs for efficiency


But we can negotiate our own format among
the communicating parties


Some explicit support for this in a general
data-binding solution could help acceptance