Data center networks



Enterprise networks vs. data center networks

(Albert Greenberg, “The Cost of a Cloud: Research Problems in Data Center Networks,” ACM CCR, Jan 2009)

VL2: A Scalable and Flexible Data Center Network

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Cost of a data center

Dominated by the servers

Maximizing utilization is important!

Network cost is significant

Enterprise vs. data center


Enterprise: IT cost dominates

Human-to-server ratio: 1:100

Automation is partial; configuration and monitoring are not fully automated

Data center: other costs dominate

Human-to-server ratio: 1:1000

Automation is mandatory

Enterprise vs. data center


Enterprise: scale up

Limited shared resources

Scale up: a few high-priced servers

Cost borne by the enterprise

Utilization is not critical

Data center: scale out

100,000 servers

Distributed workload spread over a large number of commodity servers

High upfront cost amortized over time and use

Pay per use for customers

Utilization is very important.

VL2: A Scalable and Flexible Data
Center Network


One big IP subnet

Avoid broadcasts with a directory service (remember SEATTLE).



Link-state routing on the switch topology

Load balancing with randomization (Valiant load balancing)

Issues for the current DC networks

Cisco, “Data center: Load balancing data center services,” 2004.

Issues for the current DC networks

Not enough capacity between servers:

1:5 oversubscribed for servers in different branches

1:80 to 1:240 oversubscribed for traffic at the highest levels of the tree (see the calculation sketch below).
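The oversubscription ratio at one layer is simply aggregate downstream capacity divided by aggregate upstream capacity, and the ratios multiply as traffic climbs the tree. A minimal calculation sketch in Python; the port counts and link speeds are hypothetical, chosen only to land in the range quoted above.

# Oversubscription = aggregate downlink capacity / aggregate uplink capacity.
# All port counts and link speeds below are hypothetical, for illustration only.

def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of downstream to upstream capacity at one switch layer."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# A ToR with 40 x 1 Gbps server ports and 8 x 1 Gbps uplinks is 5:1
# oversubscribed, matching the 1:5 figure above.
tor = oversubscription(40, 1, 8, 1)     # 5.0

# Ratios multiply going up the tree; a 4:1 aggregation layer already gives
# 20:1 end-to-end oversubscription for traffic crossing both layers.
agg = oversubscription(80, 1, 2, 10)    # 4.0
print(tor, agg, tor * agg)              # 5.0 4.0 20.0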

Issues for the current DC networks

No service isolation

Resource fragmentation with VLANs

Reduced cloud utilization

Complex L2/L3 configuration.

Data center traffic/churn analysis

Data center traffic analysis:

Ratio of traffic between servers to traffic entering/leaving the data center is 4:1

Computation is placed where high-speed access to data in memory or on disk is fast and cheap

Demand for bandwidth between servers is growing faster than demand for external bandwidth

Network is the bottleneck of computation

Flow distribution analysis:

Majority of flows are small; big flows are 100 MB–1 GB in size, and even larger flows are rare

50% of the time, each machine has about 10 flows; 5% of the time it has more than 80 flows

Adaptive routing may not react fast enough

VLB at flow level should work well

Data center traffic/churn analysis

Traffic matrix analysis:

Traffic patterns are hard to summarize concisely

Instability of traffic patterns

Unpredictable pattern

Failure characteristics:

Failures are mostly small in size (50% involve < 4 devices, 95% < 20 devices)

Downtimes can be significant: 95% < 1 min, 98% < 1 hr, 99.6% < 1 day, 0.09% > 10 days

With 1:1 redundancy, in 0.3% of failures all redundant components failed together

Use n:m redundancy

VL2 design objectives

Uniform high capacity among all servers

The maximum rate of a server-to-server flow should be limited only by the capacity of the servers' network cards

Assigning servers to a service should be independent of network topology

Performance isolation:

Traffic of one service should not be affected by traffic of other services

All nodes in a service appear connected by a single virtual switch

Layer-2 semantics:

Easily assign any server to any service

VM keeps the same IP address even after migration

VL2 key techniques

Scale-out topology: a folded Clos network between aggregation and intermediate switches

Link-state routing on the switch topology

VL2 key techniques

VL2 addressing and routing

Switches use location-specific IP addresses (LAs)

Applications use application-specific IP addresses (AAs)

Use a directory to map an AA to the LA of its ToR switch (see the sketch below)

ARP handled by a local agent (directory lookup instead of broadcast)

Routing table size: # of switches << # of servers
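A minimal sketch of this addressing scheme, assuming a toy in-memory directory; the names (Directory, resolve, encapsulate) are illustrative, not VL2's actual API.

# Sketch of VL2-style directory-based address resolution and encapsulation.
# The directory replaces ARP broadcasts; the host agent tunnels AA packets
# to the LA of the destination's ToR. All names here are illustrative.

class Directory:
    """Maps application addresses (AAs) to the locator address (LA) of the
    ToR switch that currently hosts them."""
    def __init__(self):
        self._aa_to_la = {}

    def update(self, aa, la):
        # Called when a server/VM is provisioned or migrates.
        self._aa_to_la[aa] = la

    def resolve(self, aa):
        # A lookup instead of an ARP broadcast.
        return self._aa_to_la[aa]

def encapsulate(packet, directory):
    """The sending host's agent wraps the AA packet in an LA header."""
    la = directory.resolve(packet["dst_aa"])
    return {"outer_dst_la": la, "inner": packet}

d = Directory()
d.update("10.0.5.7", "20.1.1.1")   # AA 10.0.5.7 lives behind the ToR with LA 20.1.1.1
pkt = {"src_aa": "10.0.3.2", "dst_aa": "10.0.5.7", "payload": b"hello"}
print(encapsulate(pkt, d))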

VL2 key techniques

Valiant Load Balancing

Every flow bounced off a random intermediate switch

Provably hotspot-free for any admissible traffic

Can be realized with ECMP (see the sketch below)
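A minimal sketch of flow-level VLB, assuming the usual ECMP-style realization: the flow's 5-tuple is hashed to pick the intermediate switch, so all packets of a flow bounce off the same switch (no reordering) while different flows spread across all of them. The switch names and hash choice are illustrative.

import hashlib

INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Deterministically map a flow's 5-tuple to one intermediate switch."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    idx = int.from_bytes(digest[:4], "big") % len(INTERMEDIATE_SWITCHES)
    return INTERMEDIATE_SWITCHES[idx]

# Same flow -> same bounce point; a different source port (a new flow) may
# land on a different intermediate switch.
print(pick_intermediate("10.0.3.2", "10.0.5.7", 41532, 80))
print(pick_intermediate("10.0.3.2", "10.0.5.7", 41533, 80))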

VL2 conclusion

VL2 achieves agility at scale via

L2 semantics

Uniform high capacity between servers

Performance isolation between services


Randomization can tame volatility

Add functionality where you have control

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric


R1: Any VM may migrate to any physical machine without changing its IP address

R2: An administrator should not need to configure any switch before deployment

R3: Any end host should be able to efficiently communicate with any other end host through any available physical path

R4: No forwarding loops

R5: Failure detection should be rapid and efficient


A single Layer 2 fabric for the entire data center (R1 & R2)

A MAC forwarding table with hundreds of thousands of entries (R3)

Needs to be optimized.

Efficient routing protocols adapting to topology changes (R5)



Use a fat-tree topology (folded Clos, multi-rooted tree) to provide high capacity

Design routing/forwarding specifically for the fat-tree topology instead of for general topologies

Can achieve good results without introducing extra complexity



Fat tree (folded Clos)

Three layers: edge, aggregation, and core

Split the fat tree into k pods, each having k^2/4 servers (see the sizing sketch below)
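The pod and server counts follow directly from building the tree out of identical k-port switches. A small sizing sketch; the formulas are standard fat-tree arithmetic, nothing PortLand-specific.

def fat_tree_size(k):
    """Sizes of a k-ary fat tree built from identical k-port switches."""
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),     # k/2 per pod
        "agg_switches": k * (k // 2),      # k/2 per pod
        "core_switches": (k // 2) ** 2,    # k^2 / 4
        "servers": k * (k // 2) ** 2,      # k^2/4 per pod, k^3/4 total
    }

print(fat_tree_size(4))                    # 16 servers, 20 switches
print(fat_tree_size(48)["servers"])        # 27648 servers from 48-port switches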

Positional addressing


Actual MAC (AMAC) address and positional pseudo MAC (PMAC) address

A hierarchical pseudo MAC (PMAC) address for each end host (see the encoding sketch below)


Use a location discovery protocol to discover the position (and address) of a host

Use a centralized fabric manager to store all mappings after location discovery

Host location and host identity are separated.
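A minimal sketch of the hierarchical PMAC layout, following the pod.position.port.vmid split of the 48 MAC-address bits (16 + 8 + 8 + 16) described in the PortLand paper; the helper names are illustrative.

def encode_pmac(pod, position, port, vmid):
    """Pack pod.position.port.vmid into a 48-bit pseudo MAC address."""
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return (value >> 32, (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF)

pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
print(pmac)                # 00:02:01:00:00:01
print(decode_pmac(pmac))   # (2, 1, 0, 1)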

Routing based on PMAC

Routing table indexed by PMAC prefixes (pod and position)

Much smaller than with flat-address routing (see the sketch below)

Proxy-based ARP
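A minimal sketch of why PMAC-based tables stay small: a core switch matches only the pod field of the destination PMAC, and an aggregation switch only (pod, position), instead of keeping one entry per flat host MAC. Table contents and the hash used to spread upward traffic are illustrative assumptions.

def core_next_hop(dst_pmac, pod_table):
    """Core switch: one forwarding entry per pod."""
    pod, _position, _port, _vmid = dst_pmac
    return pod_table[pod]

def agg_next_hop(dst_pmac, my_pod, position_table, uplinks):
    """Aggregation switch: off-pod traffic goes up (spread over uplinks),
    in-pod traffic matches the position field."""
    pod, position, _port, _vmid = dst_pmac
    if pod != my_pod:
        return uplinks[hash(dst_pmac) % len(uplinks)]   # ECMP-style spreading
    return position_table[position]

pod_table = {0: "core-port-0", 1: "core-port-1", 2: "core-port-2", 3: "core-port-3"}
position_table = {0: "down-port-0", 1: "down-port-1"}

print(core_next_hop((2, 1, 0, 1), pod_table))                           # core-port-2
print(agg_next_hop((2, 1, 0, 1), 0, position_table, ["up-0", "up-1"]))  # one of the uplinks
print(agg_next_hop((0, 1, 0, 1), 0, position_table, ["up-0", "up-1"]))  # down-port-1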

Loop-free forwarding

Up/down routing

Guarantees no loops (see the check sketched below)
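A minimal sketch of the up*/down* property behind this guarantee: a packet first travels only upward (toward the core) and then only downward, and never turns back up once it has started descending. The layer ranks (edge = 0, aggregation = 1, core = 2) are an illustrative encoding.

def is_up_down_path(layers):
    """True if the sequence of switch layers goes up, then only down."""
    going_down = False
    for prev, cur in zip(layers, layers[1:]):
        if cur > prev:            # an upward hop
            if going_down:
                return False      # up after down would permit a loop
        elif cur < prev:
            going_down = True     # started descending
    return True

print(is_up_down_path([0, 1, 2, 1, 0]))   # True: edge -> core -> edge
print(is_up_down_path([0, 1, 0, 1, 0]))   # False: bounces back up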

Fault tolerance

Have a fixed baseline topology

Fault tolerance is mainly concerned with detecting faults

Once a fault is found, the detecting switch informs the fabric manager, which informs the affected switches.



PortLand conclusion

A scalable, fault-tolerant Layer 2 routing and forwarding scheme for DCNs

Reduces routing and forwarding complexity by exploiting the specific fat-tree topology

PMACs encode the location of end hosts

AMAC-to-PMAC mapping is needed

Header rewriting.
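A minimal sketch of the header rewriting step, assuming (as in PortLand) that the ingress edge switch replaces the source AMAC with the host's PMAC and the egress edge switch restores the destination's real AMAC, so end hosts stay unmodified. The MAC values and table contents are illustrative.

def ingress_rewrite(frame, amac_to_pmac):
    """Ingress edge switch: hide the sender's actual MAC behind its PMAC."""
    out = dict(frame)
    out["src_mac"] = amac_to_pmac[frame["src_mac"]]
    return out

def egress_rewrite(frame, pmac_to_amac):
    """Egress edge switch: restore the destination's actual MAC."""
    out = dict(frame)
    out["dst_mac"] = pmac_to_amac[frame["dst_mac"]]
    return out

amac_to_pmac = {"02:19:b3:0c:41:7e": "00:02:01:00:00:01"}
pmac_to_amac = {"00:00:01:02:00:01": "00:1b:21:aa:10:02"}

frame = {"src_mac": "02:19:b3:0c:41:7e", "dst_mac": "00:00:01:02:00:01", "payload": b"hi"}
frame = ingress_rewrite(frame, amac_to_pmac)   # crosses the fabric on PMACs
frame = egress_rewrite(frame, pmac_to_amac)    # delivered with the real AMAC
print(frame)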