Performance Optimized Ethernet Switching - SmartData

Oct 26, 2013


Cajun White Paper #1 Performance Optimized Ethernet Switching 1

Performance Optimized Ethernet Switching
Lucent Technologies’ Crossbar Switch Architecture:
Performance-Optimized Ethernet Switching
Crossbar switches offer substantial benefits in performance scaling when compared with
shared memory and shared bus architecture switches. This is because with a crossbar switch,
capacity scales linearly as more elements are added to the crossbar. By contrast, shared
memory/bus architectures become increasingly complex as the size and capacity of the
switch increase.
This paper explains the fundamental differences between shared memory/shared bus
technologies and crossbar switching technology. It also explains how Lucent
Technologies enhances the crossbar switch to overcome some of the
known limitations of crossbar technology, such as crossbar contention.
Finally, this paper provides detailed information concerning the key features of the P550
Switch’s core architecture, and how these features provide the P550 with the robustness and
scalability to perform in the backbone of even the busiest campus local area network (LAN).

An ideal Ethernet switch would be one that moves packets from any network segment to
any other segment with zero time delay, irrespective of the number of segments or their
operational differences (such as, types of traffic, traffic loads, and traffic patterns). This ideal
switch should cost-effectively scale from a very small number of ports to the highest
imaginable number of ports without compromising any of the performance and
functionality that make it desirable in the first place.
The scalability of an Ethernet switch depends on its underlying architecture. Of the
architectures currently employed by the industry for increasing capacity – shared buses,
shared memory and crossbars – crossbar switches scale the best.
Typical bus-oriented LAN switches (see Figure 1) have maximum capacities in the range of 640
megabits per second (Mbps) to 2.4 gigabits per second (Gbps). When bus contention
overhead is taken into account, their aggregate capacity is even less.
Figure 1. Bus-Oriented Switch Example
Typical shared memory systems (see Figure 2) have aggregate capacities ranging from 4 Gbps to
around 10 Gbps. The major disadvantage of a shared memory architecture is the complexity
and cost that are added as you attempt to increase bandwidth. Gaining greater capacity
requires multiple shared memories, very wide memory access paths, very complex
arbitration schemes, or some combination of these.

Figure 2. Shared Memory Switch Example
Crossbar switches can be designed to easily scale to higher capacities because data is passed
through the switch using dedicated switching elements, as shown in Figure 3, rather than a
shared resource such as a shared bus or shared memory. Each connection from a crossbar
fabric input element to a crossbar fabric output element represents a dedicated path through
the switch. Therefore, adding more elements (i.e., links) to the crossbar provides a
corresponding linear increase in the crossbar switch’s bandwidth.
Figure 3. Crossbar Switch Example
(Figure 3 labels: Input and Output sides; twelve 4 x 1 Gbps port groups; 12 fabric ports @ 1.76 Gbps.)

However, a crossbar output port (output link) may already be busy when another input
needs it, a condition called crossbar contention. Because of this potential for contention at the
output connections, worst-case output link utilization can fall as low as 60 percent of raw
capacity. Unless enhancements are made to the basic crossbar design, crossbar contention
can make the shared memory design seem more efficient than the crossbar switch.
After considering both shared memory and crossbar architectures, Lucent Technologies
decided that to address the growing need for a highly scalable Ethernet switch, the most
logical solution was to build a highly optimized crossbar switch architecture. This optimized
architecture solves the switch fabric contention problem in three ways:
1. It speeds up the switching matrix so that, even at the worst-case 60 percent efficiency,
the crossbar's effective capacity is substantially higher than the maximum offered load.
2. It optimizes the basic crossbar for LAN-oriented broadcast and multicast traffic.
3. It employs a queuing strategy that keeps links unblocked, thereby enhancing the
potential of crossbars to achieve unlimited scalability. Intelligence in the queue
managers also provides advantages that can be more costly or difficult to provide with
the other architectures.
This approach results in tremendous bandwidth, virtually unlimited scalability, and unique
flexibility. These capabilities provide network managers with a consistent switching
environment that allows them to design and implement networks to meet not only today’s
needs, but the bandwidth demands of future networks as well.
Why Build a Gigabit Ethernet Backbone Switch?
Increasingly, organizations of all sizes are stressing the limited bandwidths of their networks.
Corporate intranets, sophisticated multimedia applications, and an increase in subnet traffic
traversing the corporate backbone have pushed the capacity, throughput, latency, quality of
service and manageability of existing LAN technologies beyond their limits.
At the same time, traditional router architectures are limiting access to the campus backbone,
while the backbone technologies themselves, such as Fiber Distributed Data Interface (FDDI)
or Fast Ethernet, are already overwhelmed. With the wide availability of cheap 100 Mbps
Ethernet adapter cards, many LANs now have 100 Mbps switched Ethernet ports feeding a
100 Mbps Fast Ethernet or FDDI backbone. As an analogy, imagine ten busy four-lane
highways converging into another (even busier) four-lane highway! The one guarantee is very
slow, unpredictable traffic. In data networks, the problem is even worse than this imaginary
So, how do you solve this problem? You can try to manage the traffic better on existing
roadways to get every ounce of capacity out of the existing infrastructure. However, this
adds complexity and carries a variety of additional expenses – mostly administrative. Lucent
Technologies includes sophisticated traffic management capabilities in its products, but we
believe that the most important solution to the bandwidth crunch is more fundamental:
Build bigger roads. If you can scale your imaginary “backbone” so that it’s large enough to
handle all of the traffic, bandwidth management becomes less important for solving the
bandwidth problem, and the added complexity is avoided.

Why Does Architecture Matter?
With such a large number of vendors (both startup and established industry players) vying
for attention, the issue of a switch’s architecture may on the surface seem like just another
choice you have to make, comparable to choosing the type of box (e.g., standalone versus
multi-module chassis systems). However, the choice is not that simple.
The architecture decision affects not only the network's performance today but also the
performance of future networks, as well as the network manager's ability to deploy
network performance when and where it is needed. Ultimately, the best architecture is also
the one that can continue to meet your needs – not just for Gigabit bandwidth, but also for
maximum configuration flexibility, and a wide variety of features. For virtually all
organizations, the logical choice for high gigabit bandwidth is Lucent's crossbar architecture.
The remainder of this paper explains why Lucent’s crossbar architecture is inherently more
scalable than other switching architectures. We feel that this scalability makes our
architecture superior for building a high-capacity modular switch, particularly for the
network backbone where bandwidth needs will grow fastest, and will be needed first.
Why Shared Bus/Memory Architectures Are Less Scalable
Shared bus architectures are typically used in previous-generation switches designed for
large numbers of 10 Mbps ports. These systems have internal bus capacities ranging from
hundreds of Mbps to a few Gbps, typically from 640 Mbps to around 2.4 Gbps. Sometimes, shared
buses are used to interconnect shared memory switches over modular backplanes. Shared
bus systems can be the simplest and most cost-effective for limited implementations. In the
latest generation of Ethernet switches, shared buses are typically used at the low end for
inexpensive, oversubscribed 10/100 Mbps switches with 8 to 24 ports.
Shared memory switch operation, shown in Figure 2, is based on providing a pool of buffers
shared among input and output ports. Incoming packets are written to the pool and
outgoing packets are read from it. The use of a single logical memory is the greatest
advantage of shared memory switches. In smaller configurations, a shared memory switch may
offer lower latency than a shared bus or crossbar switch.
However, for the switch to run at wire speed with every port at full utilization, the memory
bandwidth must be at least twice the sum of the port speeds, because every packet is written
into the memory once and read out once. (Depending on how memory is allocated, the
bandwidth may actually have to be even higher to accommodate the differences between
packet sizes and units of buffering.) Most shared memory systems use a sophisticated memory
controller to allocate memory for incoming packets and to arbitrate access to determine
which packets will be transmitted next. In addition to the controller having to operate at
twice the sum of the port packet rates, a CPU is typically required to access the same memory
pool (for filtering, address lookup and/or routing operations) that is used for switching. This
adds to the performance requirements of the shared memory.

The main disadvantage of a shared memory architecture is that its memory speeds limit its
ability to scale its bandwidth capacity. While shared memory is fine for a large number of
small ports or a small number of large ports, a shared memory interconnect is pushing its
limits when there are more than a modest number of gigabit ports. Further, depending on
the system’s design, the memory controller may be very sophisticated and expensive to
engineer to run at high speeds. The fast, wide memory and the controller required for a large
shared memory system would not be cost-effective as the bandwidth grows beyond the
limits of practical memory pool implementations.
The following example illustrates the point. Shared memory for a system with 8 x 1 Gbps
Ethernet ports could be built using 4-nanosecond memory that is 128 bits wide (too
expensive because of the required memory speed) or using 32-nanosecond memory that
is 1024 bits wide (very complex because of the memory width). In both cases, very wide
memory buses are required, which leads to complex implementations. These theoretical designs
would also have to process 48 million packets per second (24 million input operations and
24 million output operations, since each gigabit per second of traffic can carry about
1.5 million minimum-size packets per second).
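A short Python sketch of the arithmetic behind this example follows. It assumes that the "twice the sum of the port speeds" rule is applied per direction (8 Gbps of ingress for eight ports) and that "1.5 million packets" refers to minimum-size 64-byte frames, each of which also occupies 8 bytes of preamble and a 12-byte interframe gap on the wire. Under these assumptions both memory designs deliver the same 32 Gbps, above the 16 Gbps floor, which illustrates how memory width and memory speed trade against each other.

```python
# Arithmetic behind the 8 x 1 Gbps shared-memory example.
# Assumption: the "twice the sum of the port speeds" rule is applied per direction.


def memory_bandwidth_gbps(width_bits: int, cycle_ns: float) -> float:
    """Peak bandwidth of a memory that transfers width_bits every cycle_ns."""
    return width_bits / cycle_ns  # bits per nanosecond is the same as Gbps


def required_memory_gbps(ports: int, port_gbps: float) -> float:
    """Rule of thumb from the text: each packet is written once and read once."""
    return 2 * ports * port_gbps


def min_frame_rate_pps(link_bps: float) -> float:
    """Frames per second of minimum-size (64-byte) frames on a saturated link.
    Each frame also occupies 8 bytes of preamble and a 12-byte interframe gap."""
    bits_per_frame = (64 + 8 + 12) * 8  # 672 bits on the wire
    return link_bps / bits_per_frame


print(memory_bandwidth_gbps(128, 4), memory_bandwidth_gbps(1024, 32))  # 32.0 32.0
print(required_memory_gbps(8, 1.0))                                    # 16.0
print(round(min_frame_rate_pps(1e9)))                                  # 1488095
```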
A second disadvantage is that at some point, the traffic from the slowest speed port in a
shared memory/bus system must speed up enough to talk on a very high-speed bus. This
typically requires intermediate buffering, which further increases both the complexity and
the cost of the system.
The primary advantage of a shared memory interconnect is the low latency that results from
minimizing packet copying. Further, since the memory pool is RAM, the CPU or controller
has more flexibility to perform advanced queue management functions. For example, it can
sort incoming packets on the fly into queues using linked lists. There can even be a variable
number of queues, also using linked lists. For output, the controller implements the policies
that choose which queues and packets get serviced next. Because the controller is a potential
bottleneck to system performance, the challenge is to engineer it to run fast enough to
manage memory, sort the packets, and evaluate service policies in the time available to it.
The end result is that there is an identifiable limit to how “big” shared memory and shared bus
switches can be built. After examining the emerging traffic patterns in large networks,
we at Lucent Technologies decided that the scalability upper limits of shared memory/bus
systems were not sufficient to build an affordable and scalable backbone switch.
Crossbars – Simple Point-to-Point Connections
A crossbar switch fabric operates in cycles and provides a network of paths between ports.
With every cycle, the controller considers the traffic presented by the input side of the
switch, and makes a set of connections from the fabric’s inputs to its outputs. Unlike shared
buses or shared memory, a crossbar always connects input and output ports over a dedicated
link. This is analogous to a classic telephone switch that creates a hardwired connection
between callers (even on opposite sides of the planet), versus a classic computer network
which creates virtual circuits over shared pathways. A 13 x 13 crossbar fabric is built into
each 7-slot P550™ Cajun™ Switch chassis. The port assignments are shown in Figure 4.
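The per-cycle behavior described above can be sketched in a few lines of Python. The sketch assumes each input presents at most one requested output per cycle and breaks ties by lowest input number; that tie-breaking rule and the function name are illustrative assumptions, not the P550's actual arbitration.

```python
from typing import Dict, Optional


def crossbar_cycle(requests: Dict[int, Optional[int]]) -> Dict[int, int]:
    """One crossbar cycle: requests maps input port -> requested output port
    (None means idle). Returns the input -> output connections made this
    cycle; no output is connected to more than one input."""
    connections: Dict[int, int] = {}
    busy_outputs = set()
    for inp in sorted(requests):  # hypothetical fixed-priority tie-break
        out = requests[inp]
        if out is None or out in busy_outputs:
            continue  # idle, or lost contention for this output
        connections[inp] = out
        busy_outputs.add(out)
    return connections


# Inputs 1 and 3 both want output 5; only one of them is connected this cycle.
print(crossbar_cycle({1: 5, 2: 7, 3: 5, 4: None}))  # {1: 5, 2: 7}
```

Input 3 simply waits for a later cycle, which is the contention the following sections address.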

Figure 4. P550 Crossbar Switch Ports
Each of the 13 input fabric ports has a dedicated path to each of the 13 output fabric ports. Thus, the
crossbar switch fabric is like a mesh network. The mesh-like Cajun crossbar switch fabric
layout is shown in Figure 5.
Figure 5. Lucent P550 Crossbar Switch Fabric
Slot            Port(s)
Slot 1 (CPU)    1
Slot 2          2, 3
Slot 3          4, 5
Slot 4          6, 7
Slot 5          8, 9
Slot 6          10, 11
Slot 7          12, 13

The performance advantage of crossbars is simple: a dedicated link is the fastest way to
communicate. No sharing means nothing else gets in the way. The time it takes to set up the
link is limited only by the hardware's ability to read an output port address presented by the
input port, which can happen very quickly. The only limit on how fast a crossbar can pass information
relates to the physical medium that is used for the individual crossbar paths (that is, the
information can be passed at or near light speed). Furthermore, crossbars are inherently
very scalable, limited only by the number of links available. Add links and you can add
users and still maintain peak performance. The process remains the same: read an
address, select a link and go. In the P550 Cajun Switch, the crossbar switch fabric is very
simple and very fast, capable of making up to 37 million connections per second. In contrast,
a shared memory architecture has additional performance requirements on the shared
memory due to the added tasks of filtering, address lookup, and routing. Expanded
examples of the shared memory and crossbar architectures are provided in Figures 6 and 7.
Figure 6. Shared Memory Architecture Example
(Figure 6 labels: I/O cards, each with a resolution engine, connected to a 15 Gbps shared memory switch fabric on the switch fabric & CPU card.)

Figure 7. Crossbar Switch Architecture Example
The only inherent drawback of the crossbar fabric architecture is that with random traffic
distributions and multiple inputs, there can be contention for any one of the output ports on
the crossbar fabric, as shown in Figure 8. Because of this potential for contention at the
output connections, worst case output link utilization can fall as low as 60 percent of the
raw aggregate capacity.
Figure 8. Output Fabric Port Contention
(Figure 7 labels: I/O cards, each with a resolution engine, connected to a 45.76 Gb/sec crossbar switch fabric on the switch fabric & CPU card, with a control bus to the supervisor module. Figure 8 labels: fabric ports 1 to 4 on the input and output sides.)

Enhancements must be made to the basic crossbar design to counter crossbar contention.
Otherwise, switch fabric contention could make shared memory switching systems seem
more efficient. The graph in Figure 9 illustrates the efficiency of a crossbar as the port count
increases from 2 up to 32 ports.
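The shape of this efficiency curve can be approximated with a small Monte Carlo simulation. The Python sketch below assumes textbook conditions that the paper does not state: every input always has a packet waiting, destinations are uniformly random, each input queue is first-in first-out, and each output picks one contending input at random per cycle with no speedup. Under these assumptions throughput falls from 75 percent at 2 ports toward 2 - sqrt(2), about 58.6 percent, as the port count grows, which is close to the 60 percent worst case quoted above.

```python
import random


def crossbar_throughput(ports: int, cycles: int, seed: int = 1) -> float:
    """Fraction of output capacity used in a saturated, non-sped-up crossbar
    with FIFO input queues (head-of-line blocking)."""
    rng = random.Random(seed)
    # Destination of the packet currently at the head of each input queue.
    head = [rng.randrange(ports) for _ in range(ports)]
    delivered = 0
    for _ in range(cycles):
        contenders = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)          # one packet per output per cycle
            head[winner] = rng.randrange(ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep their head packet and retry next cycle.
    return delivered / (ports * cycles)


for n in (2, 4, 8, 16, 32):
    print(n, round(crossbar_throughput(n, 10_000), 3))
```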
Figure 9. Graph of Crossbar Efficiency
However, this worst-case efficiency translates into worst-case throughput only if the entire
system operates at the speed of the inputs. If the crossbar is sufficiently undersubscribed, it
can compensate for the relative inefficiency of the contention process. The amount of
undersubscription required for 1 Gbps throughput in this worst-case scenario can be
calculated as follows:
Required fabric bandwidth = 1 Gbps offered load / 0.60 worst-case efficiency ≈ 1.66 Gbps
This means that the minimum acceptable bandwidth through the crossbar for each 1 Gbps of
offered load should be 1.66 Gbps in order to sustain 1.0 Gbps inputs and outputs with no
blocking. Put another way, operating the switch fabric in excess of 1.66 Gbps allows each
1 Gbps output link to run at 100% utilization, regardless of any contention for the fabric output port.
(Figure 9 axes: output utilization with no speedup vs. number of ports.)

Figure 10. P550 Cajun Switch Non-blocking Crossbar Architecture
The Lucent switching fabric operates at 1.76 Gbps per switch port (32 x 55 MHz), which
exceeds the 1.66 Gbps worst-case requirement and so keeps head-of-line blocking caused by
contention for the switch fabric output ports from limiting throughput.
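As a quick check of these figures, the Python arithmetic below compares the fabric rate with the 1.66 Gbps requirement. It assumes that "32 x 55 MHz" means a 32-bit path clocked at 55 MHz, and it reuses the paper's 60 percent worst-case efficiency.

```python
# Assumption: "32 x 55 MHz" is a 32-bit fabric path clocked at 55 MHz.
WORST_CASE_EFFICIENCY = 0.60  # worst-case output utilization quoted in the text
PORT_GBPS = 1.0               # Gigabit Ethernet port speed

fabric_gbps = 32 * 55e6 / 1e9                      # 1.76 Gbps per fabric port
required_gbps = PORT_GBPS / WORST_CASE_EFFICIENCY  # 1.666..., the 1.66 Gbps in the text
usable_gbps = fabric_gbps * WORST_CASE_EFFICIENCY  # worst-case useful fabric rate

print(f"{fabric_gbps:.2f} {required_gbps:.3f} {usable_gbps:.3f}")  # 1.76 1.667 1.056
```

Even at 60 percent efficiency the fabric still moves slightly more than 1 Gbps per port.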
Any remaining contention issues caused by uneven traffic distribution (such as many ports
sending to a single server connection) are managed by buffering traffic both before and after
the switch fabric, as shown in Figure 10. Input queues competing for the same output are
serviced in a round-robin fashion. Because the fabric runs faster than the inputs, the queues
cannot fall very far behind while output contention exists, and once contention ends, they
quickly catch up.
Relative simplicity is another major advantage of the crossbar switch architecture. A crossbar
with the capacity to handle tens of gigabits of traffic can be built without exceeding the
limits of today’s semiconductor technology. The principal reason for this is that each
point-to-point connection has to operate at only 1.66 times the speed of the offered load.
Even in a switch with 100 x 1 Gbps ports, each connection on the crossbar still needs to
operate only fast enough to pass 1 Gbps of traffic (1.66 Gbps with speedup). To handle this
capacity, a shared memory/bus design would have to be capable of passing 100 Gbps of
traffic in a single data path. Worse
still, each connection would have to be able to pass traffic to the memory bus at this very
high rate. With today’s technology, a shared memory switch with 100 Gbps capacity is
virtually impossible to build. A crossbar switch with this capacity is a relatively simple
extension to the current Lucent crossbar switch.
Of course, over a prolonged period of time no switch, no matter what its architecture, can
output more traffic over an output link than the link’s maximum capacity. Any network can
become congested. When considering architectures, keep in mind that congestion is the
biggest cause of packet loss and delay in an internet. The only sure cure for congestion is
sufficient capacity, and Lucent’s crossbar architecture allows the P550 Cajun Switch to
provide huge bandwidth at low cost.
Scalability, simplicity, and the advantages of being able to support both current and future
gigabit throughput requirements are the reasons Lucent chose to go with this architecture.
(Figure 10 labels: 1000 Mbps input-buffered ports, a fabric at 32 @ 55 MHz giving a 1.76 Gbps speedup, and 1000 Mbps output-buffered ports.)

P550 Cajun Switch Core - Complementing the Crossbar
Although two fundamental features (crossbar switch fabrics and intelligent buffering) make
it possible to design and build switches at gigabit bandwidths with unlimited scalability, there
are other measures that can further optimize crossbar switch architectures. These measures
enhance switch flexibility and fault tolerance, and provide networks that are very easy to
configure and manage for a wide range of applications. The following discussion of Lucent’s
P550 Cajun Switch architecture, shown in Figure 11, highlights some of these key features.
Figure 11. Cajun Switch Core – ASIC Block Diagram
(Figure 11 labels: Lucent Cajun Switch Architecture Overview. Repeated per port (labels show Port 0, Port 1, Port 2 and Port 13): a Queue Manager Switch and a Packet Look-Up Engine with A16/D64 and A16/D32 memory interfaces and A_Bus/B_Bus connections. Shared: Switch Matrix, Switch Controller and an 860 CPU.)

Optimized Silicon
The P550 Cajun Switch can switch at up to 33 million packets per second, or route IP and
IPX at up to 18 million packets per second, using a state-of-the-art ASIC
architecture. Figure 11 shows the basic P550 Cajun Switch architecture.
No-delay address lookup
Address lookup occurs at line speed, off the wire and before buffering, so packets are never
left waiting. (The address lookup table can hold more than 24,000 MAC addresses.)
Unknown addresses are flooded until learned, and no delay is introduced into the
forwarding process by the learning function.
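The learn-then-flood behavior described above is the standard transparent-bridge algorithm, sketched here in Python. The class and method names are hypothetical, and table size limits and address ageing are omitted.

```python
from typing import Dict, List


class LearningBridge:
    """Hypothetical sketch of transparent-bridge learning and forwarding."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.table: Dict[str, int] = {}  # MAC address -> port it was learned on

    def forward(self, in_port: int, src: str, dst: str) -> List[int]:
        """Learn the source, then return the ports the frame is sent to."""
        self.table[src] = in_port  # learn before looking up the destination
        out = self.table.get(dst)
        if out is None:
            return [p for p in self.ports if p != in_port]  # unknown: flood
        return [] if out == in_port else [out]  # filter same-segment traffic


br = LearningBridge([1, 2, 3])
print(br.forward(1, "aa", "bb"))  # [2, 3]  bb is unknown, so the frame is flooded
print(br.forward(2, "bb", "aa"))  # [1]     aa was learned on port 1
```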
Input buffering
ASIC logic enforces priorities. For example, if traffic backs up on an inbound link, the most
performance-sensitive traffic is serviced first. This is one of several places where Quality of
Service (QoS) is implemented.
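Servicing the most performance-sensitive traffic first can be sketched as strict-priority queuing. In this Python sketch the number of priority levels and the frame labels are illustrative assumptions; the ASIC's actual queue structure is not described here.

```python
from collections import deque
from typing import Deque, List, Optional


class StrictPriorityQueues:
    """Hypothetical strict-priority scheduler: level 0 is the highest priority."""

    def __init__(self, levels: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(levels)]

    def enqueue(self, frame: str, priority: int) -> None:
        self.queues[priority].append(frame)

    def dequeue(self) -> Optional[str]:
        """Always drain the highest-priority non-empty queue first."""
        for q in self.queues:
            if q:
                return q.popleft()
        return None


sp = StrictPriorityQueues(2)
for frame, prio in [("bulk1", 1), ("voice1", 0), ("bulk2", 1), ("voice2", 0)]:
    sp.enqueue(frame, prio)
print([sp.dequeue() for _ in range(5)])
# ['voice1', 'voice2', 'bulk1', 'bulk2', None]
```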
Output buffering
Organizations typically employ segments of different bandwidths in the same network.
Output buffering allows Lucent’s switches to send traffic through the switch at 1.76 Gbps,
while still buffering it for transmission at slower speeds. A flexible set of QoS disciplines is
also maintained in the output queuing.
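The reason output buffering is needed becomes clear from a simple calculation. While a burst arrives from the 1.76 Gbps fabric, a slower port drains only a fraction of it, so the output buffer must hold the difference. The 100 KB burst in this Python sketch is an arbitrary illustrative value.

```python
def peak_backlog_bytes(burst_bytes: float, in_bps: float, out_bps: float) -> float:
    """Bytes left in the output buffer when a burst finishes arriving at in_bps
    while the port transmits at out_bps."""
    if out_bps >= in_bps:
        return 0.0
    arrival_time_s = burst_bytes * 8 / in_bps  # time to receive the burst
    drained = out_bps * arrival_time_s / 8     # bytes transmitted meanwhile
    return burst_bytes - drained


# A 100 KB burst from a 1.76 Gbps fabric port toward a 100 Mbps port.
print(round(peak_backlog_bytes(100_000, 1.76e9, 100e6)))  # 94318
```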
Flood rate limiting
This is another technique that aids traffic management. The architecture supports the ability
to selectively filter multicast, broadcast and flooded traffic from higher-speed segments as it
flows onto a lower-speed segment. One key option is the ability to limit multicasts so that only
a certain percentage of multicast traffic is forwarded to selected outbound ports.
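One possible reading of percentage-based flood limiting is sketched below in Python: on each outbound port, flooded or multicast bytes are forwarded only up to a configured percentage of the port's capacity in each measurement interval, and the excess is dropped. The interval-based counter, the class name and its parameters are all assumptions for illustration, not the P550's documented mechanism.

```python
class FloodLimiter:
    """Hypothetical per-port limiter for flooded and multicast traffic."""

    def __init__(self, port_bps: float, percent: float, interval_s: float) -> None:
        # Bytes of flooded traffic allowed per interval.
        self.budget = port_bps * percent / 100 * interval_s / 8
        self.interval_s = interval_s
        self.window_start = 0.0
        self.used = 0.0

    def allow(self, now_s: float, frame_bytes: int) -> bool:
        if now_s - self.window_start >= self.interval_s:
            self.window_start, self.used = now_s, 0.0  # start a new interval
        if self.used + frame_bytes > self.budget:
            return False  # over the limit: drop this flooded frame
        self.used += frame_bytes
        return True


# 10 Mbps port, 5 percent flood limit, 1 ms intervals: 62.5 bytes per interval.
lim = FloodLimiter(10e6, 5, 0.001)
print([lim.allow(0.0, 60), lim.allow(0.0005, 60), lim.allow(0.0012, 60)])
# [True, False, True]
```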
A Feature Set that Completes the Picture
Regardless of architecture and performance, a high-capacity backbone switch must have a
feature set that optimizes its usefulness in real networks. ASIC features that add value to the
P550 are described in the following sections.
Queue Management Engine (QMSx)
The Queue Management Engine is optimized for new multicast applications and supports:
• Packet pipelining – packets flow through the switch fabric with less interpacket gap
than when they are received off the wire, which is achieved using pipelined parallelism
in the lookup, routing and forwarding operations.
• Packed frame buffers – the packet frame buffers scale as each new module is added
to the switch. The switch optimally uses up to 21 MB of packet buffering by packing
large and small packets consecutively in memory.
• Class of Service/Quality of Service (CoS/QoS) - A rich feature set for CoS/QoS
includes flow control, prioritized traffic, and configurable queue thresholds.
• Hardware-assisted multicast address pruning – The hardware provides
additional multicast pruning within VLAN boundaries.

Layer 2 Packet Lookup Engine (PLE)
This feature implements Layer 1, 2 and 3 VLAN lookup, and 24,000 address bridging
functionality at gigabit wire speed. IEEE 802.1Q VLAN tagging is also implemented, as well as
other proprietary formats currently in widespread use.
Layer 3 Packet Routing Engine (PRE)
The PRE routes IP and IPX traffic using traditional “packet-by-packet” routing. The Packet
Routing Engine also implements IP unicast, IP multicast, IPX unicast, RSVP filtering and
flow separation, and additional Layer 4 forwarding rules.
A highly configurable package
The P550’s modularity and flexibility allow you to mix and match port speeds and
functionality for different application areas. You may mix and match any of the following:
• Up to 24 x 1 Gbps 1000BASE-X ports
• Up to 60 x 100 Mbps 100BASE-FX ports
• Up to 120 x 10/100 Mbps 10/100BASE-TX ports
• Up to 12 x 1 Gbps 1000BASE-X Layer 3 routing ports
• Up to 60 x 100 Mbps 100BASE-FX Layer 3 routing ports
• Up to 72 x 10/100 Mbps 10/100BASE-TX Layer 3 routing ports
Integrated Routing
The P550 Cajun Switch architecture is available as a pure Layer 2 Ethernet switch, or as a
Layer 2/Layer 3 switch with integrated routing. Based on the axiom that the customer should
pay for routing functionality only where it is needed, the P550 Cajun Switch with integrated
routing can supply up to 18 million packets per second of routing capacity, which exceeds
traditional campus routers by over two orders of magnitude.
Deterministic Address Lookup
The address table lookup performs the same whether there is one address or 24,000
addresses. Each ASIC always runs at over 1,488,000 lookups per second, which is made
possible by a unique two-stage deterministic hashing algorithm.
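The paper does not describe Lucent's two-stage hashing algorithm. As a generic stand-in, the Python sketch below uses two-choice bucketed hashing, which shares the property claimed above: each address may live in only one of two fixed-size buckets chosen by two different hashes, so a lookup inspects at most 2 x BUCKET_SLOTS entries regardless of how full the table is. The hash functions, bucket size and names are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

BUCKET_SLOTS = 4  # hypothetical fixed bucket size


class TwoChoiceTable:
    """Generic bounded-probe MAC table (not Lucent's algorithm)."""

    def __init__(self, buckets: int) -> None:
        self.buckets: List[List[Tuple[bytes, int]]] = [[] for _ in range(buckets)]

    def _candidates(self, mac: bytes) -> Tuple[int, int]:
        digest = hashlib.sha256(mac).digest()  # arbitrary choice of hash
        n = len(self.buckets)
        return (int.from_bytes(digest[:4], "big") % n,
                int.from_bytes(digest[4:8], "big") % n)

    def insert(self, mac: bytes, port: int) -> bool:
        b1, b2 = self._candidates(mac)
        for b in (b1, b2):  # update in place if the address is already present
            for i, (key, _) in enumerate(self.buckets[b]):
                if key == mac:
                    self.buckets[b][i] = (mac, port)
                    return True
        target = min((b1, b2), key=lambda b: len(self.buckets[b]))
        if len(self.buckets[target]) >= BUCKET_SLOTS:
            return False  # both buckets full: the caller must flood or evict
        self.buckets[target].append((mac, port))
        return True

    def lookup(self, mac: bytes) -> Optional[int]:
        for b in self._candidates(mac):      # at most two buckets
            for key, port in self.buckets[b]:  # at most BUCKET_SLOTS each
                if key == mac:
                    return port
        return None


t = TwoChoiceTable(buckets=1024)
t.insert(bytes.fromhex("00a0c9112233"), 7)
print(t.lookup(bytes.fromhex("00a0c9112233")), t.lookup(b"\x00" * 6))  # 7 None
```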
Up to 240,000 Routes per switch
The P550 Cajun Switch with integrated routing supports up to 240,000 unique fine-grained
routes per switch. Each route can be a unique IP destination, an IPX destination, an IP
Source Address/Destination Address (SA/DA) flow, or a full Layer 4 application flow that
includes TCP or UDP port numbers. Each route can have its own unique destination and
QoS classification.
Up to 1024 VLANs per switch
The P550’s VLAN capabilities, and more than 24,000 MAC addresses in its Layer 2
forwarding tables, enable the switch to take maximum advantage of the crossbar's inherent
scaling capabilities. The high number of possible VLANs reduces the scope of VLAN
broadcast domains, and therefore limits the impact of Layer 2 broadcast traffic on network
performance as networks grow.

Multicast Optimization
Traditional crossbar architectures were optimized around point-to-point applications. The
P550 Cajun Switch crossbar architecture is optimized for LAN-oriented broadcast and
multicast requirements. Any single input can be sent to multiple outputs simultaneously
without having to make multiple copies of a frame. The Cajun crossbar fabric actually makes
copies of broadcast frames on the fly, copying those frames to the required output ports in
one transfer cycle.
Multicast Traffic Pruning
Multicast pruning reduces the scope of multicast flooding to only those ports that are
involved in the multicast group. Additionally, the intelligent queue management should
discard broadcast, unknown destination, and unknown multicast traffic that exceeds a
threshold specific for that particular port. To further address the flooding issue, the switch
should also support the evolving IEEE 802.1p Group Address Registration Protocol (GARP).
Fault tolerance
The more scalable an architecture is, the easier it is to make it fault tolerant. That is because a
necessary prerequisite of fault tolerance (i.e., parallelism) is often a natural extension of
scalable systems. Parallelism can be applied to improve reliability as well as performance. An
example is the P550's N+1 switching elements that switch on when a failure in an active
element occurs. Other examples include the ability to hot-swap modules, cables, and power
Low Latency
Due to its high internal data rates, pipelined switching logic and ASIC-intensive data paths,
the typical latency for the P550 is under 10 microseconds; with Gigabit Ethernet traffic,
latency is as low as 3.5 microseconds. Fast Ethernet traffic has a minimum latency of 8.5 microseconds.
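One component of latency that depends directly on link speed is serialization delay, the time needed to clock a frame onto or off a link. The Python sketch below computes it for minimum and maximum Ethernet frame sizes; it illustrates why slower links add delay in general and is not a model of the P550's measured latencies.

```python
def serialization_us(frame_bytes: int, link_bps: float) -> float:
    """Time in microseconds to clock frame_bytes onto a link of link_bps."""
    return frame_bytes * 8 / link_bps * 1e6


for name, bps in (("Fast Ethernet", 100e6), ("Gigabit Ethernet", 1e9)):
    print(name, [round(serialization_us(n, bps), 2) for n in (64, 1518)])
# Fast Ethernet [5.12, 121.44]
# Gigabit Ethernet [0.51, 12.14]
```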
Enhanced Multi-Level Spanning Tree
This spanning tree method reduces the likelihood that a loop through an untagged access
port could cause a tagged trunk link to go into a blocking state. In a traditional spanning tree
implementation, all but one of the parallel paths to the root node are blocked – including
trunks – in order to prevent loops. By supporting parallel active trunks to the root and
reducing VLAN domains, Lucent effectively provides a type of spanning tree fault tolerance.
Even though a link between a VLAN and its backbone might be blocked, this does not cause
the trunk (backbone) itself to be blocked. Traffic can continue between segments connected
to other VLANs and other trunks.
Class of Service/Quality of Service
One QoS/CoS feature results from the switch’s ability to queue frames of different priorities
separately, allowing high-priority traffic to bypass lower-priority traffic. Figure 12 shows
how the Lucent P550 Cajun Switch performs under load with various mixes of classes of
service. Note that as the offered load approaches 100 percent, even at a 50/50 ratio between
high and normal priority traffic, the mean delay of the high priority traffic remains bounded.
The importance of this graph is that it shows that the mean delay or latency of jitter-sensitive
traffic flowing through the P550 Cajun Switch can be kept at the lowest
levels, even as the offered load reaches its peak. Recent third-party benchmark results have
confirmed the data in this graph, even as offered load approached 200 percent. (Please
contact Lucent Technologies for more specific data.)

Figure 12. High Priority vs. Low Priority Delay
Industry Leading Trunking Capability
OpenTrunk™ in the P550 Cajun Switch enhances network scalability, fault tolerance,
performance, and interoperability. Lucent continues to set the industry standard for allowing
network managers to leverage the newest in Gigabit switching and routing technology while
leveraging their existing investment in pre-standard VLAN-capable switching, routing and
server equipment.
Lucent's OpenTrunk implementation is a collection of three capabilities common to many of the
newest-generation LAN switches. Yet, in each case, the Cajun Switch family enhances these
three capabilities in ways that unquestionably provide far higher functionality and more
practical usability than the competition.
OpenTrunk consists of:
• OpenTrunk VLAN Tagging
• OpenTrunk Multi-Link Trunk Groups
• OpenTrunk Spanning Tree
The following sections provide more detail on each of these three capabilities.
(Figure 12 axes: high-priority mean delay vs. offered load; high- and low-priority curves for 90/10, 70/30 and 50/50 traffic mixes.)

OpenTrunk™ VLAN Tagging
The P550 Cajun Switch introduced multi-vendor interoperability to VLAN tagging on
Ethernet. To date, no one else provides the degree of VLAN interoperability that the Cajun
switch provides. Period.
OpenTrunk™ VLAN Tagging performs interoperably with most popular standard and
pre-standard VLAN capable LAN switches. This includes tagging schemes such as standard
IEEE 802.1Q, pre-standard IEEE 802.1Q (such as Bay Networks), 3COM VLT as used in the
LinkSwitch family, and a multi-layer tagging scheme widely deployed by a major vendor.
Not only is VLAN tagging information supported, but the P550 Cajun Switch can also process
QoS or priority signalling from each of these tagging systems and queue frames
appropriately. In fact, the P550 Cajun Switch is the only product in the market that can
translate frames from a single VLAN between all of these tagging dialects.
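As a rough illustration of what processing priority signalling involves, the sketch below handles only the standard IEEE 802.1Q format: a 4-byte tag (TPID 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit CFI flag, and a 12-bit VLAN ID) inserted after the source MAC address. The function names and the two-queue priority mapping are hypothetical; the pre-standard and proprietary dialects the P550 translates are not modeled, and the Cajun ASICs perform this work in hardware rather than in software.

```python
import struct

TPID_8021Q = 0x8100  # IEEE 802.1Q Tag Protocol Identifier


def parse_8021q(frame: bytes):
    """Return (vid, pcp, untagged_frame) for an 802.1Q-tagged Ethernet frame,
    or None if the frame does not carry an 802.1Q tag."""
    if len(frame) < 18:  # 12 bytes of MACs + 4-byte tag + 2-byte EtherType
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    pcp = tci >> 13        # 3-bit priority code point
    vid = tci & 0x0FFF     # 12-bit VLAN identifier (bit 12 is the CFI flag)
    return vid, pcp, frame[:12] + frame[16:]


def add_8021q(untagged: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not (0 <= vid <= 0x0FFF and 0 <= pcp <= 7):
        raise ValueError("vid must fit in 12 bits and pcp in 3 bits")
    tci = (pcp << 13) | vid  # CFI bit left at 0
    return untagged[:12] + struct.pack("!HH", TPID_8021Q, tci) + untagged[12:]


def queue_for(pcp: int) -> str:
    """Hypothetical two-queue mapping: priorities 4-7 high, 0-3 low."""
    return "high" if pcp >= 4 else "low"


# Example: tag a frame for VLAN 100 at priority 5, then parse it back.
untagged = bytes.fromhex("ffffffffffff" "0000aa000001" "0800") + b"payload"
tagged = add_8021q(untagged, vid=100, pcp=5)
vid, pcp, stripped = parse_8021q(tagged)
print(vid, pcp, queue_for(pcp), stripped == untagged)  # 100 5 high True
```

Translating between dialects amounts to parsing the tag in one format and re-encoding the same VLAN ID and priority in another; only the 802.1Q half of that round trip is shown here.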
Even though these translations might appear to add software overhead to frame
forwarding, the Cajun ASICs handle them at gigabit wire speed with no performance
penalty. The P550 Cajun Switch has been field-tested for interoperability with other vendors'
equipment. It is in everyday use with products such as Intel's 'tagging' Fast Ethernet server
adapter, Cisco's Catalyst 5x00 switches and high-end routers such as the 7500, 3Com's
SuperStack II Switch 1000 and 3000, as well as standards-based implementations that are
only now reaching the market in competitors' products.
OpenTrunk™ Multi-Link Trunk Groups
Trunking has created a lot of interest in the LAN industry: trunk groups provide the
ability to load-balance traffic across multiple parallel links between two switches. Originally
introduced as the Hunt Group feature set in the P550 Cajun Switch, multi-link trunk groups set
the standard for capacity, scalability, fault tolerance, and interoperability in the LAN-switching
industry.
One potential problem is that a broad range of potentially non-interoperable trunking
implementations is being deployed. For example, competitive systems from Cisco, 3Com,
and Extreme Networks limit trunk groups to no more than four to six ports, require that the
ports in a trunk group be sequential within the switch, or restrict the selected ports to a
single module. The Multi-link Trunking feature offered with the Cajun family provides a
truly superior capability.
The P550 Cajun Switch supports up to 15 trunk groups per switch, and a virtually unlimited
number of Gigabit Ethernet or Fast Ethernet ports per trunk group. Multi-link Trunking
group capacity is limited only by switch port density (up to 24 Gigabit Ethernet ports or up
to 120 Fast Ethernet ports), effectively providing network-wide scalability up to the full
45.76 Gbps bandwidth of each P550 Cajun Switch. Recovery time from link failures within a
trunk group is virtually unmeasurable, and traffic is almost instantaneously rebalanced
across the remaining ports within the trunk group. Fault tolerance is further optimized by
allowing a trunk group's ports to be spread across multiple modules. Effectively, mean time
to repair (MTTR) can be reduced to virtually zero, since a module with failed ports in an
active trunk group can be quickly hot-swapped while the remaining ports carry the traffic.
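The paper does not describe how the Cajun family distributes traffic across trunk members. A common approach, assumed here purely for illustration, hashes each source/destination MAC pair onto the list of active ports so that the frames of one conversation stay in order on one link; when a link fails, the port is dropped from the list and its flows rehash onto the survivors. The `TrunkGroup` class and the (module, port) naming are hypothetical.

```python
import zlib


class TrunkGroup:
    """Hypothetical model of a multi-link trunk group. Each conversation
    (source/destination MAC pair) is hashed onto one active member port,
    so frames within a conversation stay in order on a single link."""

    def __init__(self, ports):
        # Ports are (module, port) pairs and may span several modules.
        self.active = sorted(ports)

    def port_for(self, src_mac: str, dst_mac: str):
        if not self.active:
            raise RuntimeError("trunk group has no active ports")
        key = f"{src_mac}-{dst_mac}".encode()
        return self.active[zlib.crc32(key) % len(self.active)]

    def link_down(self, port) -> None:
        # Dropping the failed port rebalances its flows onto the survivors
        # on the next lookup; no other state has to change.
        self.active.remove(port)


# Example: four ports on two modules; module 3, port 1 then fails.
group = TrunkGroup([(1, 1), (1, 2), (3, 1), (3, 2)])
flows = [(f"00:00:00:00:00:{i:02x}", "00:00:00:00:ff:01") for i in range(8)]
print([group.port_for(*f) for f in flows])
group.link_down((3, 1))
print([group.port_for(*f) for f in flows])  # (3, 1) no longer appears
```

Plain modulo hashing also moves some flows that were already on surviving links whenever membership changes; hardware implementations may use other distribution schemes.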
Finally, to demonstrate the openness of our Multi-link Trunking technology, we
have tested it with other vendors' products, such as Sun's Quad Ethernet Adapter and
Cisco's Fast EtherChannel, to ensure interoperability (with and without VLAN tagging).

OpenTrunk™ Spanning Tree
The P550 Cajun Switch works cooperatively with almost any implementation or derivative
of IEEE 802.1D spanning tree. Offering three basic modes of operation, the Cajun Switch's
OpenTrunk Spanning Tree supports a simple single tree implementation, as well as two
sophisticated spanning tree-per-VLAN implementations.
Single Spanning Tree
The most typical of the many legacy switching and bridging spanning trees is the single
spanning tree specified by IEEE 802.1D. The IEEE 802.1D spanning tree is easy to
understand, but suffers from poor scalability and cannot differentiate between leaf
and trunk connections in a VLAN-capable network. For example, if the IEEE 802.1D
spanning tree is incorrectly configured, a loop between two untagged ports can shut down a
trunk link and interrupt VLAN connectivity.
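To make the single-tree limitation concrete, the sketch below computes a simplified IEEE 802.1D tree over point-to-point links: the bridge with the lowest ID becomes root, each other bridge keeps the link on its cheapest path to the root (ties broken by upstream bridge ID, then by link index as a stand-in for port ID), and every other link is blocked. It omits BPDUs, timers, and port states; the function name and the topologies are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def spanning_tree(bridges, links):
    """Simplified IEEE 802.1D tree over point-to-point links.

    bridges: iterable of bridge IDs (lower ID wins the root election).
    links:   list of (bridge_a, bridge_b, path_cost).
    Returns (root, forwarding_link_indices, blocked_link_indices)."""
    root = min(bridges)
    adj = defaultdict(list)
    for i, (a, b, cost) in enumerate(links):
        adj[a].append((b, cost, i))
        adj[b].append((a, cost, i))

    # Dijkstra from the root. Heap entries are ordered by root path cost,
    # then upstream bridge ID, then link index (standing in for port ID).
    heap = [(0, root, root, -1)]
    done, root_links = set(), set()
    while heap:
        cost, _upstream, node, link = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if link >= 0:
            root_links.add(link)  # this bridge's root port
        for nbr, c, i in adj[node]:
            if nbr not in done:
                heapq.heappush(heap, (cost + c, node, nbr, i))

    blocked = set(range(len(links))) - root_links
    return root, root_links, blocked


# Three switches in a triangle, all links at cost 4: link 1 (2-3) is blocked.
print(spanning_tree([1, 2, 3], [(1, 2, 4), (2, 3, 4), (1, 3, 4)]))

# A tagged trunk (cost 4) plus an untagged link misconfigured at cost 2:
# the untagged link wins and the trunk (link 0) is blocked.
print(spanning_tree([1, 2], [(1, 2, 4), (1, 2, 2)]))
```

In the second example the untagged link's cost of 2 undercuts the cost of 4 that IEEE 802.1D-1998 recommends for a 1 Gb/s link, so it wins the root-port election. The single tree has no notion that the blocked link was the tagged trunk, so every VLAN other than the one carried untagged loses connectivity between the two switches.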
Spanning Tree per VLAN
With the more complex spanning tree-per-VLAN implementation, interoperability is still
provided for IEEE 802.1D spanning trees on untagged ports, but multiple spanning trees
(including multiple roots) on a per-VLAN basis are also supported. VLAN load-balancing can
be performed across multiple tagged trunk paths, which provides better convergence and
tuning within each VLAN. Spanning tree-per-VLAN operation in the P550 Cajun Switch is
completely interoperable with systems implemented by Cisco in the Catalyst 5x00 family.
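Per-VLAN trees let different VLANs elect different roots, so a link blocked for one VLAN can forward for another. The sketch below is a generic model, not Cisco's or Lucent's implementation: it computes one least-cost tree per VLAN from hypothetical per-VLAN bridge priorities (32768 is the IEEE 802.1D default bridge priority; 4096 is an arbitrary lower value chosen to force a particular root).

```python
import heapq
from collections import defaultdict


def tree_links(root, links):
    """Indices of the links on the least-cost tree rooted at `root`
    (ties broken by upstream bridge ID, then link index)."""
    adj = defaultdict(list)
    for i, (a, b, cost) in enumerate(links):
        adj[a].append((b, cost, i))
        adj[b].append((a, cost, i))
    heap, done, used = [(0, root, root, -1)], set(), set()
    while heap:
        cost, _upstream, node, link = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if link >= 0:
            used.add(link)
        for nbr, c, i in adj[node]:
            if nbr not in done:
                heapq.heappush(heap, (cost + c, node, nbr, i))
    return used


def per_vlan_trees(priorities, links):
    """priorities: {vlan: {bridge_id: priority}} (hypothetical settings).
    Each VLAN's root is the bridge with the lowest (priority, bridge ID).
    Returns {vlan: (root, forwarding_links, blocked_links)}."""
    result = {}
    for vlan, prio in priorities.items():
        root = min(prio, key=lambda b: (prio[b], b))
        forwarding = tree_links(root, links)
        result[vlan] = (root, forwarding, set(range(len(links))) - forwarding)
    return result


# Core switches 1 and 2, access switch 3 with one uplink to each core.
links = [(1, 2, 4), (1, 3, 4), (2, 3, 4)]
priorities = {
    10: {1: 4096, 2: 32768, 3: 32768},  # VLAN 10 rooted at switch 1
    20: {1: 32768, 2: 4096, 3: 32768},  # VLAN 20 rooted at switch 2
}
for vlan, (root, fwd, blocked) in per_vlan_trees(priorities, links).items():
    print(f"VLAN {vlan}: root {root}, forwarding {sorted(fwd)}, blocked {sorted(blocked)}")
```

Here VLAN 10 forwards over access switch 3's uplink to switch 1 and VLAN 20 over its uplink to switch 2, so both uplinks carry traffic, whereas a single tree would leave one of them idle.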
While spanning tree-per-VLAN implementations are more sophisticated, they suffer from
architectural limitations that prevent bridge/routers from providing redundant paths for
bridged protocols, such as LAT or NETBEUI. Further, if two untagged ports on different VLANs
are accidentally interconnected, both VLANs will unintentionally merge as their spanning trees
collapse into a single spanning tree.
Multi-Level Spanning Tree
Lucent's exclusive Multi-level Spanning Tree feature provides both interoperability and
sophistication. It provides all of the flexibility needed for multiple paths and trunk load-
balancing. And, with Multi-level Spanning Tree, the Cajun Switch can also detect untagged
VLAN loops, protect against VLAN merging, provide smaller spanning tree domains, and
provide much quicker convergence. Further, traditional bridge/routers that route IP or IPX,
but bridge LAT or NETBEUI, can now implement bridging redundancy using their own
legacy spanning tree with no ill effects. Finally, in cases where non-standard spanning trees
have been implemented by the vendor (such as the Bay Networks 28115), Lucent’s
Multi-level Spanning Tree provides an overlay that limits the scale and scope of
interoperability issues within the network.
OpenTrunk™ Summary
The OpenTrunk technology in Lucent Technologies’ P550 Cajun Switch brings multi-vendor
interoperability to VLAN tagging, multi-link trunking, and spanning tree, all without
compromising performance or functionality. OpenTrunk sets a practical and useful
standard of performance that no other switch on the market today can match.

We began this discussion with a definition of what an ideal switch should be — one that
moves packets from any network segment to any other segment with zero time delay,
irrespective of the number of segments or their differences (i.e., kinds of traffic, traffic loads,
or traffic patterns).
The Lucent P550 Cajun Switch is a very close approximation to that ideal switch. The reason
for this is the scalability of the crossbar architecture, Lucent's unique ability to exploit that
architecture to its fullest, and designed-in fault-tolerance to ensure maximum up-time. The
P550 is not only fast; its flexible configuration and its many features and options allow
its speed to be deployed wherever and whenever network managers need it.
Unlimited capacity is the best cure for increasing bandwidth demands in the LAN, and an
optimized crossbar architecture is the best way to provide virtually unlimited scalable