REDUNDANCY IN NETWORK TRAFFIC:
FINDINGS AND IMPLICATIONS






Ashok Anand, Chitra Muthukrishnan, Aditya Akella
University of Wisconsin, Madison

Ramachandran Ramjee
Microsoft Research Lab, India





Redundancy in network traffic

Popular objects, partial content matches, headers

Redundancy elimination (RE) for improving network efficiency

Application layer object caching
Web proxy caches

Recent protocol independent RE approaches
WAN optimizers, de-duplication, WAN backups, etc.

2

Protocol independent RE

3

Message granularity: packet or object chunk

Different RE systems operate at different granularity

[Diagram: WAN link]

RE applications

4

Enterprise and data centers
Accelerate WAN performance

As a primitive in network architecture
Packet Caches [Sigcomm 2008]
Ditto [Mobicom 2008]

Protocol independent RE in enterprises






[Diagram: enterprises and data centers with WAN optimizers at each end of the WAN link]


Globalized enterprise dilemma

Centralized servers
Simple management
Hit on performance

Distributed servers
Direct requests to the closest servers
Complex management

RE gives the benefits of both worlds
Deployed in network middle-boxes
Accelerates WAN traffic while keeping management simple

RE for accelerating WAN backup applications



5



Recent proposals for protocol independent RE

[Diagram: an ISP connecting enterprises, web content, and a university]

RE deployment on ISP access links to improve capacity
Reduce load on ISP access links
Improve effective capacity

Packet caches [Sigcomm 2008]
RE on all routers

Ditto [Mobicom 2008]
Use RE on nodes in wireless mesh networks to improve throughput

6

Understanding protocol independent RE systems

Currently there is little insight into these RE systems:
How far are these RE techniques from optimal?
Are there other, better schemes?
When is network RE most effective?
Do end-to-end RE approaches offer performance close to network RE?
What fundamental redundancy patterns drive the design and bound the effectiveness?

Important for effective design of current systems as well as future architectures, e.g. Ditto, packet caches


7

Large scale trace-driven study

First comprehensive study
Traces from multiple vantage points
Focus on packet-level redundancy elimination

Performance comparison of different RE algorithms
Average bandwidth savings
Bandwidth savings in peak and 95th percentile utilization
Impact on burstiness

Origins of redundancy
Intra-user vs. inter-user
Different protocols

Patterns of redundancy
Distribution of match lengths
Hit distribution
Temporal locality of matches



8

Data sets

Enterprise packet traces (3 TB) with payload
11 enterprises
Small (10-50 IPs)
Medium (50-100 IPs)
Large (100+ IPs)
2 weeks
Protocol composition
HTTP (20-55%); Spring et al. (64%)
File sharing (25-70%)
Centralization of servers

UW Madison packet traces (1.6 TB) with payload
10,000 IPs; trace collected at the campus border router
Outgoing /24, web server traffic
2 different periods of 2 days each
Protocol composition
Incoming, HTTP 60%
Outgoing, HTTP 36%


9

Evaluation methodology

Emulate a memory-bound (500 MB to 4 GB) WAN optimizer
Entire cache resides in DRAM (packet-level RE)
Emulate only redundancy elimination
WAN optimizers perform other optimizations as well

Deployment across both ends of access links
Enterprise to data center
All traffic from the University to one ISP
Replay packet trace

Compute bandwidth savings as (saved bytes / total bytes), as sketched below
Includes packet headers in total bytes
Includes the overhead of shim headers used for encoding


10
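As a concrete reading of the savings metric above, here is a minimal sketch (not the authors' tool); `encode_packet` is a hypothetical RE encoder, and the per-shim overhead is an assumed constant.

```python
# Minimal sketch of the bandwidth-savings metric.  `encode_packet` is a
# hypothetical RE encoder returning (encoded_payload, number_of_shim_headers);
# the shim size below is an assumption, not the value used in the study.

SHIM_HEADER_BYTES = 12  # assumed per-encoded-region overhead

def bandwidth_savings(packets, encode_packet):
    """packets: iterable of (header_len, payload) pairs from the replayed trace."""
    total_bytes = 0
    sent_bytes = 0
    for header_len, payload in packets:
        encoded, shim_count = encode_packet(payload)
        total_bytes += header_len + len(payload)          # headers count in the total
        sent_bytes += header_len + len(encoded) + shim_count * SHIM_HEADER_BYTES
    return 100.0 * (total_bytes - sent_bytes) / total_bytes
```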

Large scale trace-driven study

Performance comparison of different RE algorithms

Origins of redundancy

Patterns of redundancy
Distribution of match lengths
Hit distribution

11

Redundancy elimination algorithms

Redundancy suppression across different packets (uses history):
MODP (Spring et al.)
MAXP (new algorithm)

Data compression only within packets (no history):
GZIP and other variants

12

MODP

Spring et al. [Sigcomm 2000]
Compute Rabin fingerprints over a sliding window of the packet payload
Value sampling: sample those fingerprints whose value is 0 mod p
Look up sampled fingerprints in the fingerprint table, which points into the packet store (see the sketch below)

[Diagram: sliding window over the packet payload, fingerprint table, and packet store holding Payload-1 and Payload-2]


13
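A minimal sketch of MODP-style value sampling. A polynomial rolling hash stands in for Rabin fingerprinting, and the window size and constants are illustrative; the selection rule (keep fingerprints whose value is 0 mod p) is the one described on this slide.

```python
# Sketch of MODP-style fingerprint selection (not the authors' implementation).
# A polynomial rolling hash stands in for Rabin fingerprinting.

WINDOW = 32          # bytes covered by each fingerprint (illustrative)
PRIME = 1000000007   # hash modulus (illustrative)
BASE = 257

def rolling_fingerprints(payload: bytes):
    """Yield (offset, fingerprint) for every WINDOW-byte region of the payload."""
    if len(payload) < WINDOW:
        return
    h = 0
    for b in payload[:WINDOW]:
        h = (h * BASE + b) % PRIME
    top = pow(BASE, WINDOW - 1, PRIME)
    yield 0, h
    for i in range(1, len(payload) - WINDOW + 1):
        # Slide the window one byte: drop payload[i-1], add payload[i+WINDOW-1].
        h = ((h - payload[i - 1] * top) * BASE + payload[i + WINDOW - 1]) % PRIME
        yield i, h

def modp_sample(payload: bytes, p: int = 32):
    """MODP value sampling: keep fingerprints whose value is 0 mod p."""
    return [(off, fp) for off, fp in rolling_fingerprints(payload) if fp % p == 0]
```

On a table hit, the matching packet is fetched from the packet store and the repeated region is encoded instead of being sent in full.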

MAXP

MAXP: choose fingerprints that are local maxima (or minima) over each p-byte region
Similar to MODP; only the selection criterion changes
MODP samples those fingerprints whose value is 0 mod p, which can leave a region with no fingerprint to represent it
MAXP gives a uniform selection of fingerprints (see the sketch below)

14
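A companion sketch of MAXP-style selection, reusing `rolling_fingerprints` from the MODP sketch above. Taking the maximum of each window of p consecutive positions is one plausible rendering of the "local maxima over a p-byte region" rule; the point it illustrates is that every p-byte stretch contributes a fingerprint, unlike MODP's value test, which can leave stretches unrepresented.

```python
def maxp_sample(payload: bytes, p: int = 32):
    """MAXP-style selection: in every window of p consecutive fingerprint
    positions, keep the position holding the maximum fingerprint value."""
    fps = list(rolling_fingerprints(payload))   # helper from the MODP sketch above
    if not fps:
        return []
    values = [fp for _, fp in fps]
    chosen = set()
    # Picking each window's maximum guarantees that no run of p consecutive
    # positions goes unrepresented, giving roughly uniform sampling.
    for start in range(max(1, len(values) - p + 1)):
        window = values[start:start + p]
        chosen.add(start + max(range(len(window)), key=window.__getitem__))
    return [fps[i] for i in sorted(chosen)]
```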

Optimal

Approximate upper bound on optimal savings
Store every fingerprint in a Bloom filter
Identify a fingerprint match if the Bloom filter contains the fingerprint (sketched below)
Low false-positive rate for the Bloom filter: 0.1%

15
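A minimal sketch of the "approximate optimal" estimate: every fingerprint is indexed (no sampling) in a Bloom filter, so recurring content is detected at the cost of a small false-positive rate. The filter size, hash count, and helper names are illustrative; in practice they would be sized for the 0.1% false-positive target, and `rolling_fingerprints` is reused from the MODP sketch.

```python
# Sketch of the Bloom-filter-based upper bound (illustrative sizes, not the
# study's parameters).

import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1 << 24, num_hashes: int = 10):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: int):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: int):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def count_repeated_fingerprints(payloads, bloom: BloomFilter) -> int:
    """Store every fingerprint; count those already present (i.e., matches)."""
    hits = 0
    for payload in payloads:
        for _, fp in rolling_fingerprints(payload):   # helper from the MODP sketch
            if fp in bloom:
                hits += 1
            bloom.add(fp)
    return hits
```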

Comparison of MODP, MAXP and optimal

MAXP outperforms MODP by 5-10% in most cases
Uniform sampling approach of MAXP
MODP loses due to non-uniform clustering of fingerprints
New RE algorithm which performs better than classical MODP

[Figure: bandwidth savings (%) vs. fingerprint sampling period p (4 to 128) for MODP, MAXP, and Optimal]
16

Comparison of different RE algorithms

GZIP offers 3-15% benefit
(10 ms buffering) -> GZIP increases the benefit by up to 5%
MAXP significantly outperforms GZIP, offering 15-60% bandwidth savings
MAXP -> (10 ms) -> GZIP further enhances the benefit by up to 8%

We can use a combination of RE algorithms to enhance the bandwidth savings (see the sketch below)

[Figure: bandwidth savings (%) for GZIP, (10 ms)->GZIP, MAXP, and MAXP->(10 ms)->GZIP across the Small, Medium, Large, Univ/24, and Univ-out traces]

17

-> means followed by
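A minimal sketch of the "MAXP -> (10 ms) -> GZIP" style pipeline, assuming a hypothetical `re_encode` callback in place of a MAXP encoder and using zlib as a stand-in for GZIP; timestamps come from the replayed trace.

```python
import zlib

def re_then_buffer_then_gzip(packets, re_encode, buffer_ms: float = 10.0):
    """packets: iterable of (timestamp_seconds, payload).  Each payload is first
    RE-encoded, output is accumulated for buffer_ms of trace time, and the
    accumulated batch is then compressed as one block."""
    buffer = bytearray()
    window_start = None
    for ts, payload in packets:
        if window_start is None:
            window_start = ts
        if ts - window_start >= buffer_ms / 1000.0 and buffer:
            yield zlib.compress(bytes(buffer))   # compress the buffered batch
            buffer.clear()
            window_start = ts
        buffer += re_encode(payload)             # redundancy elimination first
    if buffer:
        yield zlib.compress(bytes(buffer))
```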

Large scale trace-driven study

Performance study of different RE algorithms

Origins of redundancy

Patterns of redundancy
Distribution of match lengths
Match distribution

18

Origins of redundancy

[Diagram: flows 1-3 between an enterprise and data centers, passing through a middlebox at each end]

Different users accessing the same content, or the same content being accessed repeatedly by the same user?
Middle-box deployments can eliminate bytes shared across users
How much sharing across users is there in practice?

INTER-USER: sharing across users: (a) INTER-SRC, (b) INTER-DEST, (c) INTER-NODE
INTRA-USER: redundancy within the same user: (a) INTRA-FLOW, (b) INTER-FLOW
(See the classification sketch below.)

19
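One plausible reading of these categories as a classification rule over the endpoints of a matching packet pair; this is an illustrative sketch, not the authors' code.

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int

def classify_match(new: FlowKey, cached: FlowKey) -> str:
    """Classify a match between a new packet and the cached packet it matched."""
    same_src = new.src_ip == cached.src_ip
    same_dst = new.dst_ip == cached.dst_ip
    if same_src and same_dst:
        # Same pair of hosts: intra-user redundancy.
        return "INTRA-FLOW" if new == cached else "INTER-FLOW"
    if same_src:
        return "INTER-SRC"    # same source talking to different destinations
    if same_dst:
        return "INTER-DEST"   # different sources sending to the same destination
    return "INTER-NODE"       # both endpoints differ
```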

Study of composition of redundancy

90% of the savings is across destinations for UOut/24
For UIn/UOut, 30-40% of the savings is due to intra-user redundancy
For enterprises, 75-90% of the savings is due to intra-user redundancy

[Figure: contribution to savings (%) split into inter-src, inter-node, inter-dst (inter-user) and inter-flow, intra-flow (intra-user) for the UOut/24, UOut, UIn, Large, Medium, and Small traces]

20

Implication: End-to-end RE as a promising alternative

[Diagram: enterprise and data centers with a middlebox at each end]

21

End-to-end RE as a compelling design choice
Similar savings
Deployment requires just a software upgrade
Middle-boxes are expensive
Middle-boxes may violate end-to-end semantics

Large scale trace-driven study

Performance study of different RE algorithms

End-to-end RE versus network RE

Patterns of redundancy
Distribution of match lengths
Hit distribution

22

Match length analysis

Do most of the savings come from full packet matches?
If so, the simple technique of indexing full packets would be good enough

For partial packet matches, what should the minimum window size be?


23

Match length analysis for enterprise

70% of the matches are less than 150 bytes and contribute 20% of the savings
10% of the matches are full-packet matches and contribute 50% of the savings
Need to index small chunks of size <= 150 bytes for maximum benefit

[Figure: match length distribution and contribution to savings (%) across bins of different match lengths (in bytes)]

24

Hit distribution

Contributors of redundancy:
A few pieces of content repeated many times, so a small packet store would be sufficient
Many pieces of content repeated a few times, so a large packet store would be needed

25

Zipf-like distribution for chunk matches

Chunk ranking: unique chunk matches sorted by their hit counts
The straight line shows the Zipfian distribution
Similar to web page access frequency

How much do popular chunks contribute to the savings? (see the sketch below)

26
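A minimal sketch of the chunk-ranking analysis: count hits per chunk, rank chunks by popularity, and inspect the rank-frequency relation on log-log axes, where a roughly straight line indicates Zipf-like behavior. `chunks` is assumed to be an iterable of chunk fingerprints.

```python
import math
from collections import Counter

def rank_hit_counts(chunks):
    """Hit counts sorted in decreasing order (rank 1 = most popular chunk)."""
    return sorted(Counter(chunks).values(), reverse=True)

def log_log_points(hit_counts):
    """(log rank, log hits) pairs; near-linear points suggest a Zipf-like law."""
    return [(math.log(rank), math.log(hits))
            for rank, hits in enumerate(hit_counts, start=1) if hits > 0]
```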

Savings due to hit distribution

80% of the savings comes from 20% of the chunks
The remaining 80% of the chunks must be indexed for the remaining 20% of the savings
Diminishing returns for cache size

27

Savings vs. cache size

Small packet caches (250 MB) provide a significant percentage of the savings
Diminishing returns for increasing the packet cache size beyond 250 MB

[Figure: savings (%) vs. cache size (0-1500 MB) for the Small, Medium, and Large traces]
28

Conclusion

First comprehensive study of protocol independent RE systems

Key results
15-60% savings using protocol independent RE
A new RE algorithm that performs 5-10% better than the Spring et al. approach
Zipfian distribution of chunk hits; small caches are sufficient to extract most of the redundancy
End-to-end RE solutions are promising alternatives to memory-bound WAN optimizers for enterprises





29

Thank you!

Questions?

30

Backup slides

31

Peak and 95th percentile savings

32

[Figure: savings (%) vs. time window (1 to 100,000 seconds) for mean, median, 95th percentile, and peak]
Effect on burstiness

33

Wavelet-based multi-resolution analysis
Energy plot: higher energy means more burstiness (see the sketch below)
Compared with uniform compression

Results
Enterprise: no reduction in burstiness; peak savings lower than average savings
University: reduction in burstiness; positive correlation of link utilization with redundancy
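The slide names the analysis but not the wavelet; the sketch below assumes a Haar decomposition of per-interval byte counts. The quantity computed per level (the log of the mean squared detail coefficient) is the "energy": higher energy at a scale means more burstiness at that time scale.

```python
import math

def haar_energy_per_level(byte_counts):
    """Return (level, log2 energy) pairs for a Haar multi-resolution analysis
    of a per-interval byte-count series (assumed wavelet; see note above)."""
    signal = [float(x) for x in byte_counts]
    energies = []
    level = 1
    while len(signal) >= 2:
        details = [(signal[i] - signal[i + 1]) / math.sqrt(2)
                   for i in range(0, len(signal) - 1, 2)]
        approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
                  for i in range(0, len(signal) - 1, 2)]
        energy = sum(d * d for d in details) / len(details)
        energies.append((level, math.log2(energy) if energy > 0 else float("-inf")))
        signal = approx        # coarser resolution for the next level
        level += 1
    return energies
```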

Redundancy across protocols

34

Large enterprise

Protocol         Percentage volume    Percentage redundancy
HTTP             16.8                 29.5
SMB              45.46                21.4
LDAP             4.85                 44.33
Src code ctrl    17.96                50.32

University

Protocol         Percentage volume    Percentage redundancy
HTTP             58                   12.49
DNS              0.22                 21.39
RTSP             3.38                 2
FTP              0.04                 16.93