Implementing Declarative Overlays


Boon Thau Loo¹, Tyson Condie¹, Joseph M. Hellerstein¹,²,
Petros Maniatis², Timothy Roscoe², Ion Stoica¹

¹University of California at Berkeley, ²Intel Research Berkeley

Overlays Everywhere…

Overlay networks are widely used today:


Routing and forwarding component of large-scale distributed systems


Provide new functionality over existing
infrastructure

Many examples, variety of requirements:


Packet delivery: Multicast, RON


Content delivery: CDNs, P2P file sharing, DHTs


Enterprise systems: MS Exchange

Overlay networks are an integral part of many large-scale distributed systems.

Problem

Non-trivial to design, build and deploy an overlay correctly:


Iterative design process:


Desired properties


Distributed algorithms and protocols


Simulation


Implementation


Deployment


Repeat…


Each iteration takes significant time and requires a variety of expertise

The Goal of P2

Make overlay development more accessible:


Focus on algorithms and protocol designs, not the implementation

Tool for rapid prototyping of new overlays:


Specify overlay network at a high level


Automatically translate specification to protocol


Provide execution engine for protocol

Aim for “good enough” performance


Focus on accelerating the iterative design process


Can always hand-tune implementation later

Outline

Overview of P2

Architecture By Example


Data Model


Dataflow framework


Query Language

Chord

Additional Benefits


Overlay Introspection


Automatic Optimizations

Conclusion

Traditional Overlay Node

[Figure: a traditional overlay node — an overlay program maintains per-node network state (e.g., a route table) and exchanges packets in and out with the network]
P2 Overlay Node

[Figure: a P2 overlay node — the overlay program is an overlay description, written either in a dataflow scripting language or, at a higher level, in a declarative query language that a planner compiles; the P2 query processor executes the resulting Network In / Network Out dataflows, which maintain the local tables (node, route) and exchange packets in and out]

Advantages of the P2 Approach

Declarative Query Language


Concise/high level expression


Statically checkable (termination, correctness)

Ease of modification

Unifying framework for introspection and implementation

Automatic optimizations


Query and dataflow level

Data Model

Relational data: relational tables and tuples

Two kinds of tables:


Stored, soft state:


E.g. neighbor(Src, Dst), forward(Src, Dst, NxtHop)

Transient streams:

Network messages: message(Rcvr, Dst)

Local timer-based events: periodic(NodeID, 10)
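As a rough illustration only (not P2's actual C++ implementation; every name below is made up), the two kinds of tables can be modeled as a soft-state relation whose tuples expire after a time-to-live, and a transient queue of event tuples that are consumed once:

import time
from collections import deque

class SoftStateTable:
    """Stored soft-state relation: tuples silently expire after a TTL."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}                                  # tuple -> insertion time

    def insert(self, row):
        self.rows[row] = time.time()

    def scan(self):
        now = time.time()
        # Lazily drop expired tuples, return the live ones.
        self.rows = {r: t for r, t in self.rows.items() if now - t < self.ttl}
        return list(self.rows)

# succ(NAddr, Succ, SAddr) as a soft-state table:
succ = SoftStateTable(ttl_seconds=30)
succ.insert(("IP40", 58, "IP58"))

# Transient stream: lookup(Addr, Req, K) messages are queued and consumed once:
events = deque()
events.append(("lookup", "IP40", "IP37", 59))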

Dataflow framework

Dataflow graph


C++ dataflow elements

Similar to Click:


Flow elements (mux, demux, queues)


Network elements (congestion control, retries, rate limiting)

In addition:


Relational operators (joins, selections, projections, aggregation)
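As a sketch of the idea (P2's elements are C++ classes wired into a dataflow graph; the helper names below are hypothetical), the relational elements can be viewed as composable stages over lists of tuples:

def select(tuples, pred):
    """Selection element: keep only tuples satisfying the predicate."""
    return [t for t in tuples if pred(t)]

def project(tuples, fields):
    """Projection element: keep (and reorder) the given column positions."""
    return [tuple(t[i] for i in fields) for t in tuples]

def join(left, right, lkey, rkey):
    """Hash-join element: concatenate tuples that match on one column each."""
    index = {}
    for r in right:
        index.setdefault(r[rkey], []).append(r)
    return [l + r for l in left for r in index.get(l[lkey], [])]

# Example: match incoming lookup(NAddr, Req, K) tuples against node(NAddr, N):
lookups = [("IP40", "IP37", 59)]
nodes   = [("IP40", 40)]
join(lookups, nodes, lkey=0, rkey=0)    # -> [("IP40", "IP37", 59, "IP40", 40)]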

Outline

Overview of P2

Architecture By Example


Data Model


Dataflow framework


Query Language

Chord in P2

Additional Benefits


Overlay Introspection


Automatic Optimizations

Conclusion

Simple ring routing example

Example: Ring Routing

[Figure: identifier ring with nodes 3, 13, 15, 18, 28, 37, 40, 58, 60 and objects 0, 22, 24, 33, 42, 56]

Each node has an address and an identifier

Each object has an identifier

Every node knows its successor

Objects “served” by successor

Ring State

[Figure: the same ring, annotated with each node’s local state]

node(IP40, 40), succ(IP40, 58, IP58)

node(IP58, 58), succ(IP58, 60, IP60)

Stored tables:

node(NAddr, N)

succ(NAddr, Succ, SAddr)

Example: Ring lookup

[Figure: the ring; node 37 sends lookup(IP40, IP37, 59) to node 40]

Find the responsible node for a given key k:

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)
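A minimal, illustrative sketch of this lookup over in-memory successor pointers, assuming a 6-bit identifier space as in the figure; note that the half-open interval test has to wrap around the ring:

def in_half_open(k, n, s, m=2**6):
    """True if k lies in the circular interval (n, s] on a ring mod m."""
    k, n, s = k % m, n % m, s % m
    if n < s:
        return n < k <= s
    return k > n or k <= s                  # the interval wraps past zero

def lookup(node, key, succ, addr):
    """Follow successor pointers until key falls in (node, successor]."""
    s = succ[node]
    if in_half_open(key, node, s):
        return addr[s]                      # the successor serves the key
    return lookup(s, key, succ, addr)       # otherwise forward the lookup

# The ring from the figure (nodes only), successor pointers, and addresses:
succ = {3: 13, 13: 15, 15: 18, 18: 28, 28: 37, 37: 40,
        40: 58, 58: 60, 60: 3}
addr = {n: "IP%d" % n for n in succ}
lookup(40, 59, succ, addr)                  # -> "IP60", since 59 is in (58, 60]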

Ring Lookup Events

[Figure: the ring with lookup and response messages flowing between nodes — lookup(IP37, IP37, 59), lookup(IP58, IP37, 59), response(IP37, 59, IP60); node 40 stores node(IP40, 40), succ(IP40, 58, IP58); node 58 stores node(IP58, 58), succ(IP58, 60, IP60)]

Event streams:

lookup(Addr, Req, K)

response(Addr, K, Owner)

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)

Dataflow “Strands”

[Figure: the P2 node dataflow — the Network In Dataflow feeding Strand 1, Strand 2, …, each reading the local tables (node, succ) and feeding the Network Out Dataflow]

Pseudocode:

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)

Dataflow strand: an event stream flowing through a chain of elements (Element 1 → Element 2 → … → Element n) to produce actions

Event: Incoming network messages, periodic timers



Strand Elements

[Figure: the node dataflow again, zooming in on the elements inside each strand]

Condition: Process event using strand elements

Action: Outgoing network messages, local table updates
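A toy driver for this event/condition/action cycle (assumed structure, not P2's actual scheduler; each element is just a function from a list of tuples to a list of tuples):

from collections import deque

def run_strand(event_queue, elements):
    """Pop each event, push it through the strand's elements, yield actions."""
    while event_queue:
        tuples = [event_queue.popleft()]            # Event: one incoming tuple
        for element in elements:                    # Condition: element chain
            tuples = element(tuples)
        for action in tuples:                       # Action: messages / table updates
            yield action

# Example with a single trivial element that keeps only odd lookup keys:
events = deque([("lookup", "IP40", "IP37", 59),
                ("lookup", "IP40", "IP37", 58)])
odd_keys = lambda ts: [t for t in ts if t[3] % 2 == 1]
list(run_strand(events, [odd_keys]))    # -> [("lookup", "IP40", "IP37", 59)]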

Pseudocode → Strand 1

Pseudocode:

n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)

Stored tables:

node(NAddr, N)

succ(NAddr, Succ, SAddr)

Event streams:

lookup(Addr, Req, K)

response(Addr, K, Owner)

Event: RECEIVE lookup(NAddr, Req, K)

Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]

Action: SEND response(Req, K, SAddr) to Req

[Figure: the node dataflow, with Strand 1 highlighted]
Pseudocode to Strand 1

Event: RECEIVE lookup(NAddr, Req, K)

Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]

Action: SEND response(Req, K, SAddr) to Req

Dataflow strand:

lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr) → Select (K in (N, Succ]) → Project (response(Req, K, SAddr)) → response

[Figure: the strand as a chain of elements probing the local node and succ tables, shown next to the pseudocode]
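Using the toy join/select helpers and interval test sketched earlier (illustrative only, not the generated C++ dataflow), Strand 1 corresponds roughly to:

def strand1(lookups, node_tbl, succ_tbl):
    """RECEIVE lookup -> join node -> join succ -> K in (N, Succ] -> response."""
    j1 = join(lookups, node_tbl, lkey=0, rkey=0)    # (NAddr, Req, K, NAddr, N)
    j2 = join(j1, succ_tbl, lkey=0, rkey=0)         # (..., NAddr, Succ, SAddr)
    hits = select(j2, lambda t: in_half_open(t[2], t[4], t[6]))
    # Format response(Req, K, SAddr); the dataflow then sends it to Req.
    return [("response", t[1], t[2], t[7]) for t in hits]

# Node 58 answers lookup(IP58, IP37, 59) with response(IP37, 59, IP60):
strand1([("IP58", "IP37", 59)], [("IP58", 58)], [("IP58", 60, "IP60")])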

Pseudocode to Strand 2

Event: RECEIVE lookup(NAddr, Req, K)

Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K not in (N, Succ]

Action: SEND lookup(SAddr, Req, K) to SAddr

Dataflow strand:

lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr) → Select (K not in (N, Succ]) → Project (lookup(SAddr, Req, K)) → lookup

[Figure: Strand 2 as a chain of elements probing the local node and succ tables, shown next to the pseudocode]
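Strand 2 differs only in its selection (K not in (N, Succ]) and in its action, which reformats the tuple as a lookup addressed to the successor; continuing the same toy sketch:

def strand2(lookups, node_tbl, succ_tbl):
    """RECEIVE lookup -> join node -> join succ -> K not in (N, Succ] -> forward."""
    j = join(join(lookups, node_tbl, 0, 0), succ_tbl, 0, 0)
    misses = select(j, lambda t: not in_half_open(t[2], t[4], t[6]))
    # Reformat as lookup(SAddr, Req, K); the dataflow then sends it to SAddr.
    return [("lookup", t[7], t[1], t[2]) for t in misses]

# Node 40 forwards lookup(IP40, IP37, 59) on to its successor at IP58:
strand2([("IP40", "IP37", 59)], [("IP40", 40)], [("IP40", 58, "IP58")])
# -> [("lookup", "IP58", "IP37", 59)]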

Strand Execution

[Figure: a lookup arrives via the Network In Dataflow and is handed to both strands; each strand probes the local node and succ tables, and the resulting response or forwarded lookup leaves via the Network Out Dataflow]

Actual Chord Lookup Dataflow

[Figure: the generated dataflow for Chord lookups — Network In, a demux on tuple name, rule strands L1–L3 (joins of lookup with node, bestSucc, and finger; Select K in (N, S]; min-aggregations over finger distances D = K - B - 1 for B in (N, K); Project lookupRes), table materializations and inserts, queues, TimedPullPush and Dup elements, a RoundRobin mux, a demux on locality (@local?), and Network Out]
Query Language: Overlog

“SQL” equivalent for overlay networks

Based on Datalog:


Declarative recursive query language


Well-suited for querying properties of graphs

Well-studied in the database literature

Static analysis, optimizations, etc.

Extensions:


Data distribution, asynchronous messaging, periodic timers and state modification
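For intuition on why Datalog suits graph queries, the classic two-rule reachability program — reachable(X,Y) :- link(X,Y). and reachable(X,Z) :- link(X,Y), reachable(Y,Z). — amounts to computing a transitive closure to a fixpoint; a naive bottom-up sketch:

def reachable(links):
    """Naive bottom-up (fixpoint) evaluation of the two reachability rules."""
    reach = set(links)                              # reachable(X,Y) :- link(X,Y).
    while True:
        derived = {(x, z) for (x, y) in links       # reachable(X,Z) :-
                          for (y2, z) in reach      #   link(X,Y),
                          if y == y2}               #   reachable(Y,Z).
        if derived <= reach:                        # fixpoint: nothing new derived
            return reach
        reach |= derived

reachable({("a", "b"), ("b", "c"), ("c", "d")})
# adds ("a","c"), ("b","d"), ("a","d") to the three base links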

Query Language: Overlog

Datalog rule syntax:

<head> :- <condition1>, <condition2>, … , <conditionN>.

Overlog rule syntax:

<Action> :- <event>, <condition1>, … , <conditionN>.

Query Language: Overlog

Event: RECEIVE lookup(NAddr, Req, K)

Condition: lookup(NAddr, Req, K) & node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]

Action: SEND response(Req, K, SAddr) to Req

response@Req(Req, K, SAddr) :-
    lookup@NAddr(NAddr, Req, K),
    node@NAddr(NAddr, N),
    succ@NAddr(NAddr, Succ, SAddr),
    K in (N, Succ].

Overlog rule syntax:

<Action> :- <event>, <condition1>, … , <conditionN>.

P2-Chord

Chord Routing, including:


Multiple successors


Stabilization


Optimized finger maintenance


Failure recovery

47 OverLog rules

13 table definitions

Other examples:



Narada, flooding, routing protocols


Performance Validation

Experimental Setup:


100 nodes on Emulab testbed


500 P2-Chord nodes

Main goals:


Validate expected network properties

Sanity Checks

Logarithmic diameter and state (“correct”)

Bandwidth-efficient: 300 bytes/s/node

Churn Performance

Metric: Consistency [Rhea et al.]

P2-Chord:

P2-Chord @ 64 min: 97% consistency

P2-Chord @ 16 min: 84% consistency

P2-Chord @ 8 min: 42% consistency

Hand-crafted Chord:

MIT-Chord @ 47 min: 99.9% consistency


Outperforms P2 under higher churn

Not intended to replace a carefully hand-crafted Chord

Benefits of P2

Introspection with Queries

Automatic optimizations

Reconfigurable Transport (work in progress)

Introspection with Queries

Unifying framework for debugging and implementation


Same query language, same platform

Execution tracing/logging


Rule and dataflow level


Log entries stored as tuples and queried

Correctness invariants, regression tests as queries:


“Is the Chord ring well formed?” (3 rules)


“What is the network diameter?” (5 rules)


“Is Chord routing consistent?” (11 rules)

With Atul Singh (Rice) and Peter Druschel (MPI)
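For instance, one way to check ring well-formedness offline from collected succ tuples (an illustrative Python sketch, not the actual 3-rule OverLog query): following successors from any node should visit every node exactly once and return to the start.

def ring_well_formed(succ):
    """succ: node id -> successor id, gathered from every node's succ table."""
    if not succ:
        return False
    start = next(iter(succ))
    seen, cur = set(), start
    while cur not in seen:
        seen.add(cur)
        if cur not in succ:                 # successor points at an unknown node
            return False
        cur = succ[cur]
    # Well formed iff we returned to the start after visiting every node once.
    return cur == start and seen == set(succ)

ring_well_formed({3: 13, 13: 15, 15: 3})    # True: a single cycle over all nodes
ring_well_formed({3: 13, 13: 15, 15: 13})   # False: the walk never returns to 3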

Automatic Optimizations

Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)

Multi-query sharing:


Common “subexpression” elimination


Caching and reuse of previously computed results


Opportunistically share message propagation across rules

[Figure: the two lookup strands drawn side by side — both begin with the same joins of lookup with node and succ, which can be computed once and shared, leaving only the differing Select (K in (N, Succ] vs. K not in (N, Succ]) and Project (response vs. forwarded lookup) elements per rule]
Automatic Optimizations

Cost-based optimizations



Join ordering affects performance

[Figure: the same two lookup strands with the joins against the node and succ tables reordered]
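A toy illustration of why join order matters, reusing the earlier join helper (the numbers are made up): joining the small node table before a large finger-style table keeps the intermediate result tiny, even though both orders produce the same matches.

events  = [("n1", k) for k in range(5)]              # 5 event tuples
node    = [("n1", 42)]                               # 1 node tuple
fingers = [("n1", b) for b in range(160)]            # a large per-node table

plan_a = join(join(events, fingers, 0, 0), node, 0, 0)   # intermediate: 800 tuples
plan_b = join(join(events, node, 0, 0), fingers, 0, 0)   # intermediate: 5 tuples
len(plan_a) == len(plan_b)                               # same number of matches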

Open Questions

The role of rapid prototyping?

How good is “good enough” performance for rapid prototypes?

When do developers move from rapid prototypes to hand-crafted code?

Can we achieve “production quality” overlays from P2?

Future Work

“Right” language

Formal data and query semantics

Static analysis


Optimizations


Termination


Correctness


Conclusion

P2: Declarative Overlays


Tool for rapid prototyping of new overlay networks

Declarative Networks


Research agenda: Specify and construct networks declaratively


Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)

Thank You

http://p2.cs.berkeley.edu

Latency CDF for P2-Chord

[Figure: CDF of lookup latency for P2-Chord]

Median and average latency around 1s.