Data center networks




Enterprise networks vs. data center networks
(Albert Greenberg et al., “The Cost of a Cloud: Research
Problems in Data Center Networks,” ACM CCR, Jan
2009).



VL2: A Scalable and Flexible Data Center Network



PortLand: A scalable fault-tolerant layer 2 data center network fabric.


Cost of a data center





Dominated by the servers



Maximizing utilization is important!



Network cost is significant

Data center vs. Enterprise


Enterprise: IT cost dominates


Human-to-server ratio: 1:100


Automation is partial: configuration and monitoring are not fully automated


Data center: other costs dominate


Human-to-server ratio: 1:1000


Automation is mandatory

Data center vs. Enterprise


Enterprise: scale up


Limited shared resources


Scale up: a few high-priced servers


Cost borne by the enterprise


Utilization is not critical


Data center: scale out


100,000 servers


Distributed workload, spread over a number of commodity servers.


High upfront cost amortized over time and use


Pay per use for customers


Utilization is very important.

VL2: A Scalable and Flexible Data
Center Network


Ideas:



One big IP subnet



Avoid broadcast with directory service
(remember SEATTLE).


Clos topology


Link-state routing on the switch topology (RBridge)


Load balancing with randomization (Valiant load
balancing).

Issues for the current DC networks

Cisco, “Data center: Load balancing data center services,” 2004.

Issues for the current DC networks



Not enough capacity between servers:



1:5 oversubscribed for servers in different branches



1:80 to 1:240 oversubscribed for traffic on the highest
level of the tree.
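
For intuition, oversubscription is the ratio of aggregate server bandwidth to uplink capacity at a layer. A minimal sketch of the arithmetic (the numbers are illustrative assumptions, not figures from the cited Cisco report):

```python
# Illustrative oversubscription calculation (assumed numbers, not from the source).
def oversubscription(num_hosts: int, host_gbps: float, uplink_gbps: float) -> float:
    """Ratio of aggregate downstream demand to upstream capacity."""
    return (num_hosts * host_gbps) / uplink_gbps

# 20 servers at 1 Gbps sharing a 4 Gbps uplink -> 5.0, i.e. the "1:5" figure above.
print(oversubscription(20, 1.0, 4.0))
```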

Issues for the current DC networks



No service isolation



Resource fragmentation with VLAN



Reduced cloud utilization



Complex L2/L3 configuration.

Data center traffic/churn analysis


Data-center traffic analysis:


Ratio of traffic between servers to traffic entering/leaving the data center is 4:1


Computation is placed where high-speed access to data in memory or on disk is fast and cheap.


Demand for bandwidth between servers is growing faster than demand for external bandwidth


Network is the bottleneck of computation



Flow distribution analysis:


Majority of flows are small; big flows are 100 MB to 1 GB; flows larger than X GB are rare


50% of the time, each machine has about 10 flows; 5% of the time, more than 80 flows.


Adaptive routing may not react fast enough


VLB at flow level should work well

Data center traffic/churn analysis


Traffic matrix analysis:


Poor summarizability of traffic patterns


Instability of traffic patterns


Unpredictable patterns



Failure characteristics:


Failures are mostly small in size (50%, < 4 devices, 95%, < 20 devices)


Downtimes can be significant: 95% < 1min, 98% < 1hr, 99.6% < 1 day,
0.09% > 10 days


With 1:1 redundancy, in 0.3% of failures all redundant components fail together.


Use n:m redundancy

VL2 design objectives


Uniform high capacity among all servers


Maximum rate of server-to-server traffic flow should be limited only by the
capacity of the network interface cards


Assigning servers to a service should be independent of the network topology



Performance isolation:


Traffic of one service should not be affected by traffic of other services


All nodes in a service appear to be connected by a single virtual switch



Layer-2 semantics:


Easily assign any server to any service


VM keeps the same IP address even after migration


VL2 key techniques


Scale-out topology: a folded Clos network between aggregation and intermediate switches.


Link-state routing at Layer 2.

VL2 key techniques


VL2 addressing and routing


Switches use location-specific IP addresses (LAs)


Applications use application-specific IP addresses (AAs)


Use a directory service to map an AA to the LA of its ToR switch (see the sketch after this list)


ARP is intercepted by a local agent that queries the directory (no broadcast)


Routing table size: # of switches << # of servers
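
A minimal sketch of the AA-to-LA directory lookup described above, assuming a single in-memory table; the class and method names are ours, and a real VL2 directory is a replicated, high-performance service rather than one dictionary.

```python
# Minimal sketch of a VL2-style directory (hypothetical names; a real deployment
# uses a replicated directory service, not a single in-memory dict).
class Vl2Directory:
    def __init__(self):
        self._aa_to_la = {}          # application address -> ToR locator address

    def register(self, aa: str, tor_la: str) -> None:
        """Called when a server (or migrated VM) comes up under a ToR switch."""
        self._aa_to_la[aa] = tor_la

    def resolve(self, aa: str) -> str:
        """Used by the host agent instead of broadcasting ARP."""
        return self._aa_to_la[aa]

directory = Vl2Directory()
directory.register("10.0.5.7", "20.1.1.1")   # AA lives behind the ToR with LA 20.1.1.1
print(directory.resolve("10.0.5.7"))          # sender encapsulates traffic to 20.1.1.1
```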


VL2 key techniques


Valiant Load Balancing


Every flow bounced off a random intermediate switch


Provably hotspot free for any admissible traffic.


Can be realized with ECMP
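
A minimal sketch of per-flow Valiant load balancing as ECMP-style hashing would realize it; the switch names and the choice of hash are assumptions for illustration.

```python
import hashlib

# Sketch of per-flow Valiant load balancing: hash the flow's 5-tuple to pick one
# random-looking intermediate switch, so every packet of a flow bounces off the
# same switch (names below are illustrative).
INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto) -> str:
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return INTERMEDIATE_SWITCHES[digest % len(INTERMEDIATE_SWITCHES)]

print(pick_intermediate("10.0.5.7", "10.0.9.3", 5555, 80, "tcp"))
```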

VL2 conclusion


VL2 achieves agility at scale via


L2 semantics


Uniform high capacity between servers


Performance isolation between services


Lessons


Randomization can tame volatility


Add functionality where you have control

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric


Motivation:


R1: Any VM may migrate to any physical machine without changing its IP address


R2: An administrator should not need to configure any switch before
deployment


R3: Any end host should be able to communicate efficiently with any other end
host through any available path


R4: No forwarding loops


R5: Failure detection should be rapid and efficient


Implications:


A single Layer 2 domain for the whole data center (R1 & R2)


MAC forwarding tables with hundreds of thousands of entries (R3)


Needs to be optimized.


Efficient routing protocols adapting to topology changes (R5)



PortLand’s approach


Use a fat-tree topology (folded Clos, multi-root tree) to provide high capacity


Design routing/forwarding specifically for the fat-tree topology instead of for general topologies


Can achieve good performance without introducing extra overhead.

PortLand topology


Fat-tree (folded Clos)


Three layers: edge, aggregation, and core


Split the fat-tree into k pods, each having k^2/4 servers
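
A small sketch of the standard k-ary fat-tree arithmetic behind that bullet (the helper name is ours):

```python
# Standard k-ary fat-tree sizes (k must be even); the arithmetic behind
# "k pods, each having k^2/4 servers".
def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "edge_switches_per_pod": k // 2,
        "agg_switches_per_pod": k // 2,
        "core_switches": (k // 2) ** 2,
        "servers_per_pod": (k // 2) ** 2,   # = k^2 / 4
        "total_servers": k ** 3 // 4,
    }

print(fat_tree_sizes(48))   # e.g. k=48: 576 servers per pod, 27,648 servers total
```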

Positional addressing

Actual MAC address and Positional
Pseudo MAC address


A hierarchical pseudo MAC (PMAC) address
for each end host


Pod.position.port.vmid


Use a location discovery protocol to discover
the position (and address) of a host


Use a centralized fabric manager to store all
mappings after location discovery


Host location and identifier are separated.
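
A minimal sketch of packing and unpacking a pod.position.port.vmid PMAC, assuming the 16/8/8/16-bit field layout from the PortLand paper; the helper names are ours.

```python
# Sketch of PortLand-style PMAC packing: 48 bits laid out as
# pod(16) . position(8) . port(8) . vmid(16).  Helper names are illustrative.
def encode_pmac(pod: int, position: int, port: int, vmid: int) -> int:
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def decode_pmac(pmac: int):
    return (pmac >> 32) & 0xFFFF, (pmac >> 24) & 0xFF, (pmac >> 16) & 0xFF, pmac & 0xFFFF

def pmac_str(pmac: int) -> str:
    """Render as a conventional colon-separated MAC string."""
    return ":".join(f"{(pmac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

pmac = encode_pmac(pod=2, position=5, port=1, vmid=3)
print(pmac_str(pmac))       # 00:02:05:01:00:03
print(decode_pmac(pmac))    # (2, 5, 1, 3)
```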

Routing based on PMAC


Routing table indexed by:


Pod


Position


Port


Much smaller than flat-address forwarding tables
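
A simplified sketch of why the tables shrink: each switch level matches only the coarse PMAC fields, so table size scales with switches and pods, not with hosts (this abstracts away the actual prefix-based hardware forwarding; names are ours).

```python
# Simplified forwarding decision on PMAC fields at each switch level.
def next_hop(level, my_pod, my_position, dst_pod, dst_position, dst_port):
    if level == "core":
        return ("down_to_pod", dst_pod)               # match on the pod field only
    if level == "aggregation":
        if dst_pod != my_pod:
            return ("up_to_core",)                    # any core switch works (ECMP)
        return ("down_to_edge", dst_position)         # match on the position field
    if level == "edge":
        if (dst_pod, dst_position) != (my_pod, my_position):
            return ("up_to_aggregation",)
        return ("to_host_port", dst_port)             # match on the port field
    raise ValueError(level)

print(next_hop("aggregation", my_pod=1, my_position=0,
               dst_pod=2, dst_position=5, dst_port=1))   # ('up_to_core',)
```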

Proxy-based ARP
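
Sketch of the proxy ARP path: the edge switch intercepts the ARP request and asks the fabric manager for the destination's PMAC, so no broadcast is needed (the names and table contents below are illustrative assumptions).

```python
# Sketch of PortLand-style proxy ARP: the edge switch intercepts the ARP request
# and asks the fabric manager for the destination's PMAC, so the query never
# floods the fabric.  Names and data are illustrative.
FABRIC_MANAGER_IP_TO_PMAC = {"10.2.5.1": 0x000205010003}   # filled by location discovery

def handle_arp_request(dst_ip: str) -> str:
    pmac = FABRIC_MANAGER_IP_TO_PMAC.get(dst_ip)
    if pmac is None:
        return "no mapping yet: fall back / wait for the host to register"
    return f"unicast ARP reply: {dst_ip} is-at {pmac:012x}"

print(handle_arp_request("10.2.5.1"))
```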

Loop-free forwarding


Up-down routing


Guarantees no loops
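
A small sketch of the up*/down* invariant that rules out loops: a valid path climbs toward the core and then only descends, so it can never revisit a level (the level encoding and helper are illustrative).

```python
# Check that a path obeys up*/down* routing: switch levels (edge=0, agg=1,
# core=2) strictly increase toward the core and then strictly decrease.
# A path that never turns upward again cannot form a forwarding loop.
def is_up_down(path_levels):
    peak = path_levels.index(max(path_levels))
    rising = path_levels[:peak + 1]
    falling = path_levels[peak:]
    return all(a < b for a, b in zip(rising, rising[1:])) and \
           all(a > b for a, b in zip(falling, falling[1:]))

print(is_up_down([0, 1, 2, 1, 0]))   # edge -> agg -> core -> agg -> edge: True
print(is_up_down([0, 1, 0, 1, 0]))   # turns upward again after descending: False
```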


Fault tolerance


Have a fixed baseline topology


Fault tolerance is mainly a matter of detecting faults


Once a fault is detected, the switch informs the fabric manager, which informs the affected switches.

PortLand conclusion


A scalable, fault tolerant layer 2 routing and
forwarding scheme for DCN


Reduces routing and forwarding complexity by exploiting the specific fat-tree topology


PMACs encode the location of each end host


AMAC-to-PMAC mapping is needed


Header rewriting.