

MPLS-TE IN NETWORKS WITH VARYING CAPACITY CONSTRAINTS

Authors: Kyriaki Levanti (CMU), Vijay Gopalakrishnan (AT&T Labs Research), Hyong S. Kim (CMU), Seungjoon Lee (AT&T Labs Research), Aman Shaikh (AT&T Labs Research)

1.1 Motivation

Multi Protocol Label Switching (MPLS) is a widely deployed forwarding mechanism in backbone networks. Unlike popular intradomain routing protocols such as OSPF and ISIS that perform shortest-path routing given a single cost metric (e.g., link capacity), MPLS also allows for fine-grained traffic-engineering (TE) and fast-reroute (FRR) during failures. Further, constraint-based routing (CB-R) allows the selection of paths other than the shortest paths chosen by OSPF/ISIS, and tunnel priorities allow the differentiated treatment of traffic. These two capabilities enable important network objectives such as load-balancing and Quality-of-Service (QoS) for specific flows.

The ability to balance load and provide QoS is becoming increasingly important in networks. As traffic demands increase, network providers can no longer afford to overprovision network resources. Network providers need to efficiently use their network resources by balancing traffic across their links. Similarly, as traffic demands increase and the Internet is required to support many different types of services, it is essential for network providers to be able to offer service-specific QoS guarantees to different traffic flows.

Network providers are currently using MPLS in the following ways: 1) QoS service to customer networks: customer networks specify their requirements in terms of the volume of traffic to be forwarded, destination points, and quality of service, and provider networks set up and manage the tunnels that will meet the customer's needs. 2) Pinning down important or large traffic flows: network providers set up static tunnels between Points-of-Presence (PoPs) in their network in order to closely monitor the traffic that flows between these PoPs. Operators gain traffic awareness once they pin down the traffic on specific known paths. In network operations, traffic awareness translates to simple and effective traffic management. 3) Creating an iBGP-routing-free core backbone network: operators create a full mesh of tunnels between the network's PoPs. The paths of the tunnels are either statically defined in configuration or calculated by the intradomain routing protocol. In this architecture, the core backbone routers only perform forwarding and do not need to run internal BGP (iBGP). We note that iBGP distributes the routes learned through external BGP to the network's backbone routers. The iBGP-routing-free core backbone architecture results in less processing overhead on the network's core routers. Additionally, this architecture allows for path protection through MPLS fast-reroute.

Although there are considerable benefits in the full mesh tunnel design as described above, this MPLS design does not achieve load-balancing, one of the key motivating factors for deploying MPLS. However, MPLS-TE achieves network-wide load-balancing. This is feasible when the tunnel configurations include bandwidth requirements. If the per-tunnel bandwidth reservations are statically defined, then there is a disconnect between the routing design and the network traffic demands. In order for the routing design to follow fluctuating traffic demands, operators need to regularly monitor the traffic demands and manually reconfigure the bandwidth of the tunnels as needed.

Network-wide load-balancing is also feasible without manual intervention through a self-adjusting tunnel design. The reconfiguration of a tunnel's bandwidth reservation and the dynamic adjustment of the reservation can be automated to track changes in the per-tunnel traffic volumes. In particular, the auto-bandwidth option automatically adjusts a tunnel's bandwidth reservation according to its incoming traffic rate. Upon bandwidth adjustment, a tunnel may stay on the same path or it may change to a new path that better accommodates the adjusted bandwidth requirement. Auto-bandwidth is one of the few router mechanisms that dynamically changes network configuration to avoid human intervention. Auto-bandwidth is implemented by multiple router vendors.

Dynamic traffic-engineering refers to the traffic-engineering practices that adjust to changes in the network's traffic demands. Dynamic traffic-engineering, as implemented by auto-bandwidth, reduces the management overhead of intradomain routing in backbone networks, but it also includes challenges and risks. First of all, the automated adjustment of tunnel bandwidth can increase the number of tunnel path changes in the network. Consequently, it can reduce the tunnel path visibility. Tunnel path visibility refers to the operators' awareness of the paths that the tunnels take in the network at all times. Additionally, when the routing depends on external factors such as the traffic volumes forwarded by neighboring networks, then the risk of network-wide routing instabilities increases. A single tunnel with highly variable bandwidth requirements can potentially trigger multiple tunnel path changes. A tunnel path change follows the make-before-break paradigm and does not cause packet loss. However, the path change can cause packet reordering, reduce the application throughput, and deteriorate the data plane performance. Also, tunnel path changes can cause tunnel preemptions. Tunnel preemption is tearing down one or more low-priority tunnels and using the previously reserved bandwidth for setting up a high-priority tunnel. Tunnel preemption does not follow the make-before-break paradigm, and therefore, it intermittently disrupts low-priority traffic. Overall, it is important to investigate the challenges and risks involved with dynamic traffic-engineering in order to ensure that it is both an effective as well as a safe traffic-engineering solution.

In this chapter, we propose the tunnel visibility system, a system that calculates the network's tunnel paths in an offline manner, even when the network deploys dynamic traffic-engineering. The system uses as input the network's configuration and traffic demands. Then, we apply the tunnel visibility system on the backbone network of a major tier-1 network and investigate the impact of dynamic traffic-engineering on the stability of the tunnel setup.

1.2 Related Work

First, we present two recent network measurement studies on the prevalence of MPLS in the Internet and on the performance of MPLS-TE in a service provider network. The results of these works further motivate the investigation of dynamic traffic-engineering. We also present previous works on MPLS traffic-engineering (MPLS-TE). We observe that the early works on MPLS-TE focus on the tunnel placement problem and address the static traffic-engineering problem. Also, many works assume multiple tunnels between two edge nodes. However, the operational practice is to have one or only a few tunnels per ingress-egress pair. The proposed traffic-engineering algorithms output explicitly-routed tunnels, whereas the common practice is to use dynamic tunnel path calculation so that the tunnel setup automatically adjusts according to network topology changes. Overall, these works do not consider the operational aspects in MPLS-TE deployments but focus on the traffic-engineering algorithms. Finally, we present some previous works on load-sensitive routing. These works investigate packet-based load-sensitive routing, whereas auto-bandwidth implements flow-based load-sensitive routing.

Two recent works perform measurement studies of operational MPLS deployments. [1] measures the prevalence and characteristics of MPLS deployments in the Internet over the past 3.5 years. This study is based on traceroute-style path measurements that include MPLS label stacks. However, the MPLS label stack is not always visible to a public user of traceroute. Therefore, this measurement methodology underestimates the number of MPLS tunnels in the Internet. Nevertheless, they find that 7% of all ASes have been consistently deploying MPLS over the past 3.5 years, and that 25% of all paths in 2011 cross at least one MPLS tunnel. Also, they find that the largest deployments are found in tier-1 providers, and that many ASes deploy traffic classification and engineering in their tunnels. Overall, this study shows that a growing number of ASes adopt MPLS traffic-engineering. Thus, MPLS traffic-engineering practices become increasingly important.

[2] studies the performance of MPLS in Microsoft's online service network (MSN). This network connects Microsoft's data centers to each other and to peering ISPs. MSN uses dynamic traffic-engineering in order to route the changing traffic demands between the data centers using auto-bandwidth. The study focuses on the increased latency caused by routing the traffic along paths with sufficient bandwidth, instead of routing the traffic along shortest paths. They find that 80% of the increase in latency occurs due to tunnel path changes presumably caused by auto-bandwidth. This increase in latency reflects the tunnel path changes where the new path is longer than the old path but has sufficient capacity for the new bandwidth reservation. In our work, we perform a thorough investigation of auto-bandwidth, focusing on the risks involved in deploying a dynamic traffic-engineering scheme in backbone networks.

[3] presents a set of techniques for network-wide tunnel path optimization subject to the routing constraints imposed by QoS requirements. The goal is to find the explicit routes that will jointly optimize a set of tunnel demands. They use multi-commodity flow solution methods as primitives and focus on the scalability and speed-of-response of the proposed techniques. [4] addresses the same problem but with a non-linear programming approach. Both works suggest a centralized network-wide approach with the assumption of pre-programmed Label Switched Paths (LSPs) that change on relatively long time scales.

The following works target specific traffic-engineering problems that are not common in operational environments: in backbone networks, the tunnel specifications are usually static, and there are only a few, if any, tunnel pairs between two edge nodes. [5] focuses on the problem of dynamically routing bandwidth-guaranteed tunnels when the tunnel requests arrive one-by-one and there is no a priori knowledge of what bandwidth requirements the future LSP requests will have. Their approach is routing a new tunnel on a path that results in minimum interference to potential future LSP setup requests. In particular, they propose an algorithm that outputs explicit routes that avoid loading the bottleneck links. [6] focuses on bidirectional LSP setup and proposes an optimal LSP pair (upward and downward LSP) selection method in the case where there are multiple LSP pairs between two edge nodes.

The following works implement dynamic traffic-engineering but are not operationally viable solutions. MATE does not respond to real-time traffic changes, and TeXCP assumes multiple tunnels between two edge nodes. We note that, although ISPs may deploy more than one tunnel between two edge nodes, they generally maintain a small number of such tunnels for scalability reasons. In detail, MATE and TeXCP require active monitoring of the network's state. Specifically, MATE focuses on load-balancing short-term traffic fluctuations among multiple LSPs between two edge nodes. They assume that the network-wide LSP setup is determined using long-term traffic matrices, and they achieve effective load-balancing by probing the current state of the network. However, their approach cannot react to real-time traffic changes. TeXCP, on the other hand, proposes an online distributed traffic-engineering protocol that performs load-balancing in real time. TeXCP requires an agent residing at each ingress router and multiple tunnels to deliver the traffic to the egress node. The agent moves traffic from the over-utilized to the under-utilized paths between the same source/destination pair. This work is the first to address the stability requirement of online TE protocols. Finally, COPE is a class of traffic-engineering algorithms that optimize for the expected traffic demands and provide a worst-case guarantee for the unexpected traffic demands. As the authors note, their future work includes developing an efficient implementation of COPE that can be integrated with the current operational practices, that is, MPLS, OSPF, and online traffic-engineering.

Next, we present works on packet-based load-sensitive routing. In Section 1.6.2.2, we compare packet-based load-sensitive routing with flow-based load-sensitive routing. The latter is implemented by auto-bandwidth. [7][8][9] investigate packet-based load-sensitive routing in the early days of the Internet. In particular, they investigate link cost metrics for intradomain routing protocols that reflect the network load. They find that dynamic metrics lead to routing instabilities and oscillations. We elaborate on these works and compare packet-based with flow-based load-sensitive routing in Section 1.6.2.2.

1.3 Contributions

Our contributions in intradomain routing management for backbone networks are:



- An offline tunnel visibility system that is applicable to networks deploying dynamic traffic-engineering: To the best of our knowledge, this is the first system that performs lightweight simulations of operational MPLS domains. The tunnel visibility system predicts the paths of all the tunnels in the network given the network topology, the tunnels' configuration, and the tunnels' traffic volume measurements. In other words, it predicts the network-wide tunnel setup. Additionally, it enables the network-wide simulation of the auto-bandwidth mechanism. The purpose of this system is to provide offline visibility of the MPLS functionality to network operators.



- The measurement of the impact of factors that contribute to unpredictable tunnel setups: We verify the simulation of dynamic traffic-engineering with network measurement data from the backbone network of a major ISP. We expose the factors that lead to unpredictable, or so-called non-deterministic, tunnel setups. The network under analysis is configured with a full mesh of tunnels between edge routers with different priorities and network-wide deployment of the auto-bandwidth mechanism. We find that, when auto-bandwidth is enabled, the following factors play a significant role in the tunnel path predictions: (i) Tunnel traffic measurements missing from our dataset reduce the accuracy with which we can predict auto-bandwidth events by 15%. (ii) The timing factor that is reflected in the order with which tunnels establish their paths can reduce the tunnel reroutes in a two-month period by 8%. (iii) The tiebreaking process between paths of the same cost makes 5% of the tunnel paths unpredictable. Due to these three factors, the tunnel visibility system cannot predict the exact tunnel setup. However, we find that the predicted tunnel dynamics, i.e., the number of tunnel reroutes and failures, follow the real network operation.

- The investigation of dynamic traffic-engineering using the tunnel visibility system: The auto-bandwidth option exposes the intradomain routing to external factors, namely the network's traffic demands. This makes the tunnel setup vulnerable to network-wide instabilities. Tunnel path instabilities can be harmful to the data plane performance. We identify the risks involved in MPLS tunnel designs that automatically adjust the tunnel bandwidth reservations according to the observed traffic demands, and we investigate their impact through extensive simulations based on a real network dataset. Our dataset includes the network topology, the network topology changes, the tunnel configuration, and the tunnel traffic demands. We focus on various aspects of auto-bandwidth: the impact of the mechanism on the amount of tunnel reroutes, failures, and preemptions, the responsiveness of the mechanism to highly variable traffic patterns, and the stability of the mechanism.



- The analysis of the impact of dynamic traffic-engineering on networks with varying capacity constraints: The tunnel visibility system along with a large dataset from a tier-1 ISP allows us to investigate the operational practices of dynamic traffic-engineering. We find that as the capacity of the network reduces to 50% of its initial capacity: (i) Auto-bandwidth increases the tunnel reroutes by almost 41% and the tunnel preemptions by 34 times. (ii) Most of the additional reroutes occur because of the preemption of lower-priority traffic as higher-priority tunnels adjust their bandwidth. These reroutes are indirectly caused by auto-bandwidth, so they represent the cascading effect of auto-bandwidth on the network-wide tunnel setup. (iii) Auto-bandwidth causes the reroute of increasingly large tunnels and the preemption of an increasing number of small lower-priority tunnels. Overall, the total size of the tunnels that are rerouted increases by three times. Thus, large amounts of traffic are impacted by the increased tunnel dynamics.



- The identification of a routing design detail with major impact on networks with stringent capacity constraints: When tunnels cannot find a path to accommodate their adjusted bandwidth reservation, they maintain their old reservation and do not resize to the maximum bandwidth that can be accommodated. We call this an auto-bandwidth failure. Auto-bandwidth failures are soft failures because the tunnels do establish a path but the bandwidth reservation does not correspond to the amount of incoming traffic. We find that their impact aggravates in networks that have less spare capacity. In detail, auto-bandwidth failures barely occur in the real network but, when the capacity of the network decreases by 50%, large amounts of traffic are impacted by auto-bandwidth failures.



- The investigation of the auto-bandwidth responsiveness and stability: We test the behavior of the auto-bandwidth mechanism in the presence of extreme traffic shifts and we find that the bandwidth adjustments are significantly delayed by the moving average algorithm run by the routers of a major vendor for calculating the tunnel traffic rates. In terms of the stability of auto-bandwidth, we analyze the previous experiences with packet-based load-sensitive routing and conclude that auto-bandwidth is stable. This means that auto-bandwidth does not cause recurring tunnel path changes to a single tunnel. However, this holds provided that the tunnel path changes do not cause the reaction of end-to-end traffic-engineering that is external to the network.











1.4 Background

1.4.1 MPLS-TE Basics

Multi Protocol Label Switching (MPLS) [10] uses labels to forward traffic across the MPLS domain. When a packet enters an MPLS domain, the router imposes a label on the packet, and the label, as opposed to the IP header, determines the next-hop. The label is removed at the egress point of the MPLS domain. When a packet arrives at a Label Switching Router (LSR), the incoming label determines the outgoing interface. The LSR may also swap the incoming label to the appropriate outgoing label. Labels are assigned to packets based on Forwarding Equivalence Classes (FECs). Packets belonging to the same FEC are forwarded in the same way within the MPLS domain. The Label Switched Path (LSP) that a packet takes in an MPLS domain is what we call an MPLS tunnel.

Traffic-engineering (TE) refers to the process of selecting the path for data traffic belonging to some service so that the service's objectives are satisfied. TE mostly aims at simultaneously optimizing the network utilization and the traffic performance. Existing Interior Gateway Protocols (IGPs) are inadequate for traffic-engineering because their routing decisions are based on shortest path algorithms and do not take into account bandwidth requirements or other service-specific requirements.

In Figure 3-A, we show the block diagram of MPLS Traffic-engineering (MPLS-TE) as implemented at the head-end router, the LSR that originates a tunnel. MPLS-TE relies on the IGP to distribute the network topology and network state information: TE-specific link metrics, available link bandwidth, link attributes (e.g., core-core link or edge-core link), etc. This information forms the TE Topology Database.

Path selection is based on the collected TE information and the Constraint-Based Routing (CB-R) algorithm. CB-R calculates the best path for a tunnel. Path selection takes place when (i) there is a new tunnel request, (ii) an existing tunnel has failed, or (iii) the path of an existing tunnel is reoptimized [11]. Tunnel reoptimization refers to the re-running of the CB-R algorithm by the head-end router. The head-end router may find a better path for the tunnel and reroute it. Reoptimization takes place periodically, and the reoptimization frequency is specified in the head-end router's configuration.

In one example CB-R implementation [12], the algorithm ignores the links that do not have sufficient resources or that violate policy constraints, and then it runs Dijkstra on the remaining topology. This returns the shortest path (or paths) that also satisfies the routing constraints specified by the operator. In IGPs, traffic uses multiple paths with the same cost to the destination if such paths exist (Equal-Cost Multi-Path). However, CB-R looks for a single path from the tunnel's source to the tunnel's destination. Therefore, in the case of multiple paths, the following tiebreakers determine the tunnel's path [12] (a sketch combining the pruning step and the tiebreakers follows the list):

- Choose the path with the highest minimum available bandwidth.

- Then, choose the path with the lowest number of routers on the path.

- Then, choose a random path. This is usually the first path in the list of chosen paths.
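To make the prune-then-Dijkstra structure of CB-R and the tiebreakers above concrete, here is a minimal Python sketch. It is an illustration under our reading of [12], not the vendor implementation: names such as cspf and avail_bw are hypothetical, and a production implementation would enumerate all equal-cost shortest paths before applying the tiebreakers rather than folding them into the search key as done here.

import heapq

def cspf(links, src, dst, bw_request):
    """Constraint-based routing sketch: prune links that cannot fit the
    requested bandwidth, then run Dijkstra with tiebreaking.

    links: dict mapping (u, v) -> {"cost": ..., "avail_bw": ...}
    Returns a single path (list of nodes) or None if no feasible path exists.
    """
    # Step 1: ignore links with insufficient reservable bandwidth.
    adj = {}
    for (u, v), a in links.items():
        if a["avail_bw"] >= bw_request:
            adj.setdefault(u, []).append((v, a))

    # Step 2: Dijkstra on the pruned topology. The search key encodes the
    # tiebreakers: lowest cost first, then highest minimum available
    # bandwidth on the path (stored negated), then fewest hops; remaining
    # ties fall to heap order, i.e., "the first path in the list".
    heap = [(0, float("-inf"), 0, src, [src])]
    visited = set()
    while heap:
        cost, neg_min_bw, hops, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, a in adj.get(node, []):
            if nxt not in path:  # avoid loops
                heapq.heappush(heap, (cost + a["cost"],
                                      max(neg_min_bw, -a["avail_bw"]),
                                      hops + 1, nxt, path + [nxt]))
    return None  # no path satisfies the bandwidth constraint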

Afterwards, the head-end router signals the chosen path and sets up the tunnel. The tunnel setup is implemented by a reservation protocol that supports TE, namely RSVP-TE [13] or CR-LDP [14]. We elaborate on RSVP-TE because it is the most widely used protocol for MPLS-TE. RSVP-TE is an extension of the RSVP protocol. The head-end router sends an RSVP-TE PATH message along the selected path in order to request resources for the tunnel. The tail-end router sends back an RSVP-TE RESV message upon the receipt of the PATH message and initiates the label distribution process. Once the LSP is established, traffic can be forwarded across the tunnel.

Figure 3-A: MPLS-TE system block diagram. http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6557/ps6608/whitepaper_c11-551235.html

Finally, we elaborate on the tunnel attributes. Every tunnel is specified in the configuration file of the head-end router. We describe the most important tunnel attributes:

1) Path calculation: The tunnel's path can be explicitly specified in router configuration or it can be dynamically calculated by the head-end router using the CB-R algorithm.

2) Bandwidth requirement: The tunnel's bandwidth requirement can be statically defined in the router configuration or it can dynamically adjust to the tunnel's traffic rate. In Section 1.4.2, we elaborate on the auto-bandwidth option that implements the automatic bandwidth adjustment.

3) Priority: Tunnels can preempt each other to acquire bandwidth based upon their defined priority value. Priority values range from 0 to 7.

4) Adaptability: This is the frequency of reoptimizing the tunnel's path.

5) Resiliency: Fast-reroute (FRR) is the dominant mechanism for tunnel path protection. When FRR is deployed, more options regarding the backup path calculation need to be specified.

6) Affinity flags: Affinities are the properties that the tunnel requires in its links. The permitted or excluded link affinities (otherwise known as link colors) introduce additional constraints in the CB-R algorithm. For example, an operator may specify an affinity flag that does not allow edge-to-core links to be included in the tunnel's path.

Finally, we elaborate on tunnel priorities and on how preemptions are performed in the MPLS domain. Each tunnel is configured with a setup and a hold priority. In particular, a new tunnel with high setup priority can be established by preempting tunnels with lower hold priority. This happens when there is insufficient reservable bandwidth for the new tunnel to establish but there is sufficient reservable bandwidth if tunnels of lower hold priority are torn down. The head-end routers implement proprietary decision logic algorithms when deciding which tunnels to preempt. We also note that, when a tunnel is preempted, its head-end router tries to re-establish it on another path with its previously signalled bandwidth.
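The feasibility test behind preemption can be sketched as follows. This is a hypothetical illustration: as noted above, the actual choice of which tunnels to preempt is proprietary vendor logic, so the preempt-lowest-priority-first order used here is only one plausible policy. In RSVP-TE, priority 0 is the highest and 7 the lowest.

def preemption_victims(link_capacity, reservations, new_bw, setup_priority):
    """Return the lower-hold-priority tunnels to tear down so that a new
    tunnel of the given setup priority fits on the link, or None if even
    full preemption cannot free enough reservable bandwidth.

    reservations: list of (hold_priority, reserved_bw) tuples on the link.
    """
    reserved = sum(bw for _, bw in reservations)
    if link_capacity - reserved >= new_bw:
        return []  # fits without preempting anyone

    # Only tunnels with numerically greater (i.e., lower) hold priority
    # than the new tunnel's setup priority are preemptable.
    candidates = sorted((r for r in reservations if r[0] > setup_priority),
                        key=lambda r: -r[0])  # assumed: lowest priority first

    freed, victims = 0.0, []
    for hold, bw in candidates:
        if link_capacity - (reserved - freed) >= new_bw:
            break
        victims.append((hold, bw))
        freed += bw
    if link_capacity - (reserved - freed) >= new_bw:
        return victims
    return None  # insufficient preemptable bandwidth: tunnel setup fails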

1.4.2 Auto-bandwidth Mechanism
The auto-bandwidth mechanism enables dynamic traffic-engineering. Auto-bandwidth automatically adjusts the bandwidth reservation of an MPLS tunnel based on how much traffic is flowing through the tunnel. Hence, it automates the monitoring and reconfiguration of the tunnel's bandwidth. Auto-bandwidth adjusts the tunnel bandwidth based on the largest average output rate observed during the last adjust-interval, as long as this rate stays within configured maximum and minimum bandwidth values. The output rate is estimated by sampling with a configured frequency value. When the tunnel's bandwidth is adjusted to a new bandwidth constraint, the head-end router generates a new RSVP-TE PATH message. If the new bandwidth reservation cannot be satisfied by any path in the network, the current path will continue to be used and the tunnel's bandwidth reservation will remain unchanged.

Figure 3-B: Auto-bandwidth example. http://s-tools1.juniper.net/solutions/literature/app_note/350080.pdf

Figure 3-B shows an example of auto-bandwidth in time. Next, we present an elaborate list of the auto-bandwidth parameters that control the operation of the mechanism. The auto-bandwidth operation is determined per tunnel by the following parameters (a sketch of how the thresholds combine follows the list):




- Adjust-interval: the interval between bandwidth adjustments.

- Collection-interval: the interval of collecting output rate information for the tunnel.

- Maximum-bandwidth: the maximum automatic bandwidth for the tunnel.

- Minimum-bandwidth: the minimum automatic bandwidth for the tunnel.

- Adjust-threshold percentage: the bandwidth change percentage threshold that triggers an adjustment if the maximum bandwidth in the adjust-interval is higher or lower than the current bandwidth reservation.

- Adjust-threshold minimum: the bandwidth change value threshold that triggers an adjustment if the maximum bandwidth in the adjust-interval is higher or lower than the current bandwidth reservation.

- Overflow/underflow threshold percentage and minimum: the bandwidth change percentage and value thresholds that trigger an immediate adjustment of the tunnel's bandwidth.

- Overflow/underflow limit: the necessary number of consecutive collection-intervals that exceed the specified overflow/underflow thresholds and trigger an immediate bandwidth adjustment.
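To see how the two adjust-threshold knobs interact (both the percentage and the minimum must be exceeded for an adjustment to fire, as the pseudocode in Section 1.5.1.3 confirms), the following Python sketch groups the parameters and encodes that test. The field names are our shorthand, not vendor configuration syntax.

from dataclasses import dataclass

@dataclass
class AutoBwParams:
    adjust_interval: int         # seconds between regular adjustments
    collection_interval: int     # seconds between output-rate samples
    maximum_bandwidth: float
    minimum_bandwidth: float
    adjust_threshold_perc: float  # e.g., 0.10 for 10%
    adjust_threshold_min: float   # absolute bandwidth change floor
    overflow_limit: int = 1       # consecutive intervals over threshold

def should_adjust(p, signalled_bw, max_avg_bw):
    """Adjustment fires only if BOTH the relative and the absolute
    thresholds are exceeded, in either direction."""
    grew = (max_avg_bw > (1 + p.adjust_threshold_perc) * signalled_bw
            and max_avg_bw - signalled_bw > p.adjust_threshold_min)
    shrank = (max_avg_bw < (1 - p.adjust_threshold_perc) * signalled_bw
              and signalled_bw - max_avg_bw > p.adjust_threshold_min)
    return grew or shrank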

Finally, we comment on how the router estimates the output rate of a tunnel. Routers from a major vendor use a five-second moving average algorithm [15]. This algorithm has the following effect: the output rate rises less quickly on traffic spikes and falls less rapidly if traffic drops suddenly, as opposed to not using this algorithm and setting the reported output rate equal to the actual output rate. The output rate in a five-second interval is given by the following weighted average formula:

new rate = ((old rate - current rate) * exp(-5 secs / 5 mins)) + current rate

where new rate is the output rate reported to the auto-bandwidth mechanism, old rate is the output rate five seconds ago, and current rate is the actual output rate in the last five seconds. The exponential factor applies exponential decay to the deviation of the current rate from the previously reported output rate.
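A direct transcription of this formula in Python (a sketch; timer alignment and rounding in the actual router implementation may differ):

import math

DECAY = math.exp(-5.0 / 300.0)  # exp(-5 secs / 5 mins) ≈ 0.9835

def smoothed_rate(old_rate, current_rate):
    """Five-second moving average reported to auto-bandwidth: the gap
    between the previously reported rate and the measured rate decays
    exponentially, so the reported rate tracks spikes and drops slowly."""
    return (old_rate - current_rate) * DECAY + current_rate

# Example: if the measured rate jumps from 100 to 200 Mb/s, the reported
# rate after one five-second interval is (100-200)*0.9835 + 200 ≈ 101.7,
# which is why sudden traffic shifts reach auto-bandwidth only gradually.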


1.5 Tunnel Visibility

The visibility of a routing mechanism is essential
in network operations. It
happens

that the routing
mechanism and protocol implementations do not follow the network operators’ perception about how
the
se

mechanisms or protocols function. In these cases, it is difficult to redesign the routing when
network

problems appear or when the network objectives
change
.
When

operators can accurately predict
the

routes selected

by the

routing mechanism

given
sufficient

information about the network’s state, then
the routing mechanism is
visibl
e

and easier to manage.
H
ere, by network state, we refer to the network
topology, the routing configuration, and the traffic demands.

Tunnel visibility refers to the accurate determination of the tunnel setup. In detail, a tunnel visibility system estimates the network-wide tunnel setup in an offline manner based on the tunnel specifications in the network's configuration. When dynamic traffic-engineering is deployed, the tunnel visibility system also requires as input the tunnel traffic demands. If the tunnel visibility system cannot provide an accurate view of the real tunnel setup, then it ignores factors that affect the tunnel setup in an operational environment. We implement a tunnel visibility system for networks deploying dynamic traffic-engineering and investigate the presence and impact of such factors.

In the rest of this section, we describe and evaluate the tunnel visibility system. This system simulates the network-wide MPLS deployment. In a nutshell, the tunnel visibility system predicts the paths that the tunnels take based on an offline constraint-based routing algorithm implementation. We verify the tunnel visibility system's accuracy by applying it to a tier-1 network both when dynamic traffic-engineering is enabled and when it is not. We use multiple data sources in order to compare the system's output with the real network operation. We further investigate the determinism of the tunnel setup with respect to the sequence in which the head-end routers reoptimize their tunnels. Finally, we discuss our findings.

To summarize, dynamic traffic-engineering reduces the tunnel visibility because: (i) It requires detailed traffic volume measurements in order to accurately predict the operation of the auto-bandwidth mechanism. In our dataset, 20% of the auto-bandwidth adjustments are mispredicted because the available traffic volume measurements are more coarse-grained than the ones used by the auto-bandwidth mechanism in the real network. (ii) When dynamic traffic-engineering is enabled, the CB-R tiebreaking introduces non-determinism in the tunnel setup: 5% of the tunnel paths are inaccurately estimated. (iii) Timing factors that are not controlled by the operators, such as the order in which the head-end routers reoptimize their tunnels, introduce small but not negligible non-determinism in the tunnel setup. For example, we find one order of tunnel reoptimization that results in 8% fewer tunnel reroutes than the average number of tunnel reroutes measured over three random tunnel reoptimization orders in a two-month period.

1.5.1 System

The tunnel visibility system consists of an integrated set of components: (i) an open source toolbox for traffic-engineering methods (TOTEM) [16][17], and (ii) the AutoBw Simulator, an implementation of the auto-bandwidth mechanism. In detail, we modify and extend TOTEM to convert it from a simulator of traffic-engineering methods into a simulator of network-wide MPLS operation. We extend TOTEM to enable the simulation of dynamic traffic-engineering and of other mechanisms that determine the MPLS functionality in production networks. We implement the AutoBw Simulator according to the auto-bandwidth implementation of a major router vendor. Figure 3-C illustrates the architecture of the tunnel visibility system. In the rest of this section, we present TOTEM, the open source toolbox for traffic-engineering methods. Then, we present the extensions and modifications introduced to TOTEM. Finally, we describe the AutoBw Simulator.

Figure 3-C: Tunnel visibility system architecture. The system includes two main components: the AutoBw Simulator and the modified and extended TOTEM toolbox. The input is the network's state: the network topology, the tunnels' configuration, and the per-tunnel traffic volume measurements. The output is the network-wide tunnel setup, that is, the tunnel paths.

1.5.1.1 TOTEM

The TOolbox for Traffic Engineering Methods (TOTEM) [16][17] simulates how traffic is routed on a network using Shortest-Path-First (SPF), Constrained-Shortest-Path-First (CSPF), and other traffic-engineering routing algorithms. CSPF implements Constraint-Based Routing (CB-R). From now on, we use the terms CSPF and CB-R interchangeably. TOTEM is a flow-based event-driven simulator and not a packet-based simulator. This means that it simulates the calculation of paths for network flows but not the per-packet per-hop behavior. In other words, it does not simulate the routing of the packets in the network but it simulates the control-plane functionality that defines the tunnel paths through which packets flow.

In detail, TOTEM includes a repository of traffic-engineering methods grouped into several categories: 1) IP: algorithms using only IP information such as the IGP weights, 2) MPLS: algorithms using MPLS-TE functionalities such as MPLS source-based routing, the computation of primary LSP paths using CSPF, and the computation of backup LSP paths using link/node disjoint constraints, 3) BGP: a BGP decision process simulator that combines interdomain routing information with intradomain routing algorithms, 4) generic: classic optimization and search algorithms used by various components of the toolbox.

TOTEM also includes a set of components that interface with the algorithms repository and allow the flexible use of the toolbox in an operational setting. The three most essential components are the topology manager, the scenario manager, and the traffic matrix manager. The topology manager handles the network representation. The scenario manager handles the automatic execution of simulation events. Some events already integrated with TOTEM are the LSP creation, link and node failures, and link utilization computations. The traffic matrix manager handles the traffic representation. Traffic flow volumes can be combined with the traffic-engineering algorithms to generate network utilization statistics. In addition to these three components, TOTEM includes a topology and traffic generator, a web service interface for remote online control of the toolbox, a native interface for toolbox extensions, and a graphical user interface for network operators. Figure 3-D summarizes the TOTEM architecture.

Figure 3-D: TOTEM toolbox architecture. TOTEM includes the algorithms repository, the topology, scenario and traffic matrix managers, and various interfaces for the flexible usage of the toolbox.

1.5.1.2 TOTEM Extensions and Modifications

TOTEM simulates various path calculation algorithms but it omits MPLS-TE functionality performed by head-end routers that is important for the accurate representation of the network operation. Hence, we extend and modify TOTEM to reflect the operation of an MPLS domain.

We extend TOTEM to handle tunnel auto-bandwidth events and network-wide reoptimization events. In detail, we simulate a tunnel bandwidth adjustment by rerunning the CSPF algorithm with the tunnel's adjusted bandwidth reservation. The auto-bandwidth events are generated by the AutoBw Simulator that we describe next. After the CSPF algorithm runs with the adjusted bandwidth requirement, the tunnel may: 1) resize its bandwidth but stay on the same path, 2) change path in order to accommodate the new bandwidth reservation, 3) stay on the same path without adjusting its bandwidth reservation because no path is found to accommodate the new bandwidth reservation. We extend TOTEM to simulate this behavior.

We simulate tunnel reoptimization by rerunning the CSPF algorithm for each tunnel. The sequence of reoptimizing the tunnels can be inserted in TOTEM or calculated by TOTEM according to some criteria. We elaborate on the tunnel reoptimization sequence in Section 1.5.3. In reality, each tunnel maintains its own reoptimization timer, and this timer expires according to the reoptimization period that is specified in the router configuration.

In addition to these extensions, we make four major modifications to TOTEM. These modifications are necessary to accurately simulate the stateful router operation in production networks:

- CSPF tiebreaking: TOTEM randomly picks one of the multiple paths returned by the constrained-shortest-path-first calculation, whereas routers run specific tiebreaking rules before choosing the single path on which they will signal the tunnel. We implement CSPF tiebreaking according to the major router vendor's implementation.

- Preemption decision logic: TOTEM randomly picks a sufficient set of lower priority tunnels that are on the path where the higher priority tunnel establishes. However, head-end routers in production networks run specific preemption decision logic algorithms. We implement the preemption decision logic of the router vendor.

- Handling preempted tunnels: TOTEM tears down the preempted tunnels, whereas routers attempt to re-establish these tunnels on an alternative path. We implement this functionality.

- Maintenance of tunnel state: TOTEM does not keep state of the tunnels that fail to establish and/or are torn down. However, head-end routers periodically attempt to re-establish the tunnels that have failed to set up a path in the past. We add tunnel state to TOTEM and simulate the periodic router attempts to re-establish failed tunnels (see the sketch below).
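A minimal sketch of the tunnel-state modification, with hypothetical names (the real change lives inside TOTEM's scenario handling and uses TOTEM's own CSPF entry points): failed tunnels are remembered together with their last signalled bandwidth and retried at every network-wide reoptimization.

class FailedTunnelState:
    """Remembers tunnels that failed to establish, for periodic retry."""

    def __init__(self):
        self.down = {}  # tunnel name -> last signalled bandwidth

    def mark_down(self, name, signalled_bw):
        self.down[name] = signalled_bw

    def retry(self, run_cspf):
        """Called at each network-wide reoptimization event; run_cspf is a
        callback that attempts to signal a path for (name, bandwidth)."""
        for name, bw in list(self.down.items()):
            if run_cspf(name, bw) is not None:
                del self.down[name]  # tunnel re-established, clear state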

Overall, the extended and modified version of TOTEM results in an operational version of TOTEM, a version that simulates the operation of an MPLS-TE-enabled production network.


1.5.1.3 AutoBw Simulator

The AutoBw Simulator simulates the auto-bandwidth mechanism as described in Section 1.4.2. The AutoBw Simulator takes as input a series of traffic rates (one value per collection-interval) and returns a sequence of bandwidth adjustment events for the tunnel. Below, we describe the AutoBw Simulator in pseudocode. We assume that the overflow/underflow limit is set to 1 for simplicity.

At the end of each adjust-interval, there is a bandwidth adjustment if the maximum bandwidth measured during this interval (MaxAvgBW) exceeds the adjust-threshold percentage (Adjust-Threshold Perc) and the adjust-threshold value (Adjust-Threshold Min). At the end of each collection-interval, there is a bandwidth adjustment if the maximum bandwidth measured during the adjust-interval (MaxAvgBW) exceeds the overflow/underflow-threshold percentage (Over/Underflow-Threshold Perc) and the overflow/underflow-threshold value (Over/Underflow-Threshold Min).

while (end of Collection-Interval)
    BW = tunnel traffic rate
    MaxAvgBW = max(BW) within Adjust-Interval

    # Auto-Bandwidth Adjustment
    if (end of Adjust-Interval)
        if (MaxAvgBW > (1 + Adjust-Threshold Perc) * Signalled BW
                && MaxAvgBW - Signalled BW > Adjust-Threshold Min)
           || (MaxAvgBW < (1 - Adjust-Threshold Perc) * Signalled BW
                && Signalled BW - MaxAvgBW > Adjust-Threshold Min)
            BW ADJUSTMENT: Signalled BW = MaxAvgBW
            restart Adjust-Interval

    # Overflow/Underflow Event
    else
        if (MaxAvgBW > (1 + Overflow-Threshold Perc) * Signalled BW
                && MaxAvgBW - Signalled BW > Overflow-Threshold Min)
           || (MaxAvgBW < (1 - Underflow-Threshold Perc) * Signalled BW
                && Signalled BW - MaxAvgBW > Underflow-Threshold Min)
            BW ADJUSTMENT: Signalled BW = MaxAvgBW
            restart Adjust-Interval
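For readers who prefer runnable code, the following self-contained Python sketch replays the same logic. It is our transcription of the pseudocode above: the overflow/underflow limit is again fixed at 1, and the separate overflow and underflow thresholds are collapsed into one pair for brevity.

def autobw_events(rates, signalled_bw, samples_per_adjust,
                  adj_perc, adj_min, over_perc, over_min):
    """Replay one traffic rate per collection-interval and return the
    bandwidth adjustment events as (sample_index, new_signalled_bw)."""
    events, window = [], []
    for i, bw in enumerate(rates):
        window.append(bw)
        max_avg_bw = max(window)  # MaxAvgBW within the adjust-interval
        end_of_adjust = len(window) == samples_per_adjust

        # Regular adjustment check at the end of the adjust-interval,
        # overflow/underflow check at every other collection-interval.
        perc, floor = (adj_perc, adj_min) if end_of_adjust \
            else (over_perc, over_min)

        grew = (max_avg_bw > (1 + perc) * signalled_bw
                and max_avg_bw - signalled_bw > floor)
        shrank = (max_avg_bw < (1 - perc) * signalled_bw
                  and signalled_bw - max_avg_bw > floor)

        if grew or shrank:
            signalled_bw = max_avg_bw        # BW adjustment
            events.append((i, signalled_bw))
            window = []                      # restart adjust-interval
        elif end_of_adjust:
            window = []                      # begin the next adjust-interval
    return events

# Example: autobw_events([100, 120, 130, 90, 80, 70], 100, 3,
#                        0.10, 5, 0.50, 50) returns [(2, 130), (5, 90)]:
# an upward adjustment at the end of the first adjust-interval and a
# downward one at the end of the second.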



1.5.2 Verification

We investigate the accuracy of the tunnel visibility system by simulating the MPLS deployment of a tier-1 backbone network and by verifying the simulation observables against the real network operation, as extracted from various network measurements. In detail, we simulate the MPLS-TE functionality for two months in 2011. We extract the network's state from router configurations, IGP messages, and SNMP traffic measurements, and use this data as input to the tunnel visibility system. During simulation, we capture the tunnel activity (i.e., the tunnel path changes) and compare it against router logs on MPLS-TE and tunnel state measurements.

Next, we describe our dataset, the simulation setup, and the results from comparing the simulated tunnel dynamics with the real tunnel dynamics. We find that the AutoBw Simulator accurately predicts on average 80% of the real auto-bandwidth adjustments and that its prediction accuracy is heavily dependent on the completeness of the traffic volume dataset. We also find that the simulated tunnel paths match the real tunnel paths in 95% of the cases and that the mismatches are attributed to the CSPF tiebreaking. Finally, although the tunnel visibility system provides good estimates of the tunnel activity, it cannot predict the exact tunnel setup. The factors that contribute to this non-determinism are the granularity of the traffic measurements, the CSPF tiebreaking, as well as the tunnel reoptimization order analyzed in Section 1.5.3.

1.5.2.1 Dataset

Network topology: We obtain the network topology using IGP topology snapshots generated by a network-specific IGP tool that monitors the exchange of IGP protocol messages. This network uses OSPF as the intradomain routing protocol. The OSPF topology includes the links that tunnels can traverse. We retrieve specific link attributes, such as link cost and link capacity, from parsed daily snapshots of the router configuration files and convert the generated topology into the TOTEM-specific format. However, a topology snapshot is valid only for as long as there are no node up/down events, link up/down events, or link cost changes. In order to maintain an accurate network topology at all times, we track such events by processing the OSPF messages collected by the OSPF monitoring tool and note the changes in the topology during the two-month simulation period. These topology changes are included in the simulation scenarios, as described in the next section.

Tunnel configuration: The configuration of the tunnels is available in both router configuration files and in high-level design documents. Tunnel configuration also includes the auto-bandwidth parameters used by the AutoBw Simulator. In the first month of our dataset, auto-bandwidth is gradually deployed on the network's tunnels. In the second month of our dataset, auto-bandwidth is enabled throughout the network. Except for this change, the tunnel configuration remains static throughout the two-month period.

Traffic measurements: We have per-tunnel traffic volume measurements for 3-6 minute intervals. This dataset is collected by the Simple Network Management Protocol (SNMP), and it is used both for generating the network traffic matrices and for simulating the auto-bandwidth mechanism.

SYSLOG messages: The SYSLOG messages that pertain to MPLS-TE routing are available from all the routers in the network. These alarms include tunnel events such as reroutes after the expiration of the reoptimization timer, bandwidth adjustments because of the auto-bandwidth mechanism, fast-reroutes, and changes in tunnel state (i.e., a tunnel going down or coming back up). Each alarm includes the accurate timing of the event, the affected tunnel name, and the event type.

Tunnel path measurements: SNMP provides us with tunnel path snapshots along with incremental updates of the tunnel path changes. We note that this dataset only shows the path changes but includes no information about the root-cause of the change. Table 0-I illustrates an overview of our dataset.



Table 0-I: Dataset Overview

Network Data             | Description
Network Topology         | Router configuration files, IGP topology snapshots and updates
Tunnel Configuration     | Router configuration files, MPLS-TE design documents
Traffic Measurements     | SNMP data including 3-6 min traffic volume rates per tunnel
SYSLOG Messages          | MPLS-TE SYSLOG messages per router
Tunnel Path Measurements | SNMP data including tunnel path snapshots and updates


1.5.2.2 Simulation Setup

In this section, we describe the scenario generation for TOTEM and the setup for the AutoBw Simulator. The scenarios include the daily events ordered in time. Therefore, for each day we perform the following steps to generate the scenario that is representative of the MPLS domain activity for that day:

1. Scenario initialization: We (i) load the topology snapshot of the network at the beginning of the day, (ii) start the routing algorithm, that is, CSPF where the path length is determined by the OSPF costs and the constraint is the tunnel's bandwidth reservation, and (iii) establish the tunnels. Since our dataset does not include per-tunnel bandwidth reservations, we assume that the initial tunnel reservations coincide with the output rate of each tunnel at the beginning of the day. In reality, the signalled bandwidth of the tunnel and its output rate should be close in value because of the auto-bandwidth mechanism.

2. Topology updates: We insert OSPF events, such as router up, router down, link up, link down, and link cost change. When TOTEM executes these events, it updates the loaded network topology but it does not act upon the tunnel setup. However, when routers are informed of such events from the OSPF protocol messages, they recalculate the tunnel paths. Therefore, after each OSPF event we also insert a tunnel reoptimization event. We note that, in reality, when a router or link goes down, fast-reroute (FRR) is triggered within 50ms. In our simulations, we focus on the dynamics introduced by dynamic traffic-engineering, and thus, omit FRR, the prevalent MPLS-TE network restoration mechanism. This will inevitably lead to an inaccurate prediction of the tunnel setup during network failures because the tunnels that are fast-rerouted in reality will be torn down in simulation. However, this does not interfere with our goal of investigating the auto-bandwidth mechanism. Nevertheless, the tunnel visibility system can be extended to include the FRR mechanism. We leave this as future work.

3. Auto-bandwidth events: We insert the per-tunnel bandwidth adjustment events that are generated by the AutoBw Simulator. We elaborate on the setup of the AutoBw Simulator next.

4. Periodic tunnel reoptimization: We insert a network-wide reoptimization event according to the frequency of reoptimization that is specified in the router configuration. During each reoptimization, the extended version of TOTEM (i) reruns CSPF for all the tunnels that are up and (ii) attempts to re-establish all the tunnels that are down, given their latest signalled bandwidth before going down.

5. Show link utilization: We output network-wide link utilization statistics after each reoptimization event in order to track the network's utilization. Although we have performed a network-wide analysis of the utilization levels in the network, we do not disclose this information because it is proprietary for the network under analysis.

We order the topology updates, auto-bandwidth events, and periodic tunnel reoptimizations in time and convert them into the TOTEM-specific scenario format, as sketched below. Finally, we set the AutoBw Simulator parameters to the values specified in router configuration and run the simulator on a per-tunnel basis given the tunnel's traffic volume measurements during a single day.
a single day. Again, we assume that
(i)
the tunnel h
olds

an
initial bandwidth r
eservation
equal to the tunnel’s

traffic rate at the beginning of the d
ay
, and (ii) the
auto
-
bandwidth timer
that

defines the adjustment intervals starts at the beginning of the day
.
Furthermore, we find that the auto
-
bandwidth collection frequency

is
set to a value
that

is smaller than
the traffic
volume
measurement

frequency in our dataset
. Therefore
, we feed the
simulator with the
Figure
0
-
E
:
Overview of the scen
ario generation.

26


same
traffic rate
value for multiple collection intervals
. This

assume
s

that the rate remains constant
during the measurement window. We

expect that the lack of fine
-
grained traffic volume measurements
will
result in

inaccuracies between the auto
-
bandwidth simulator and the auto
-
bandwidth mechanism

deployed in the network
.
Figure
3
-
E

summarizes the simulation setup

for

the tunnel visibility system
.

1.5.2.3 Simulation Observables

After the execution of each scenario event, we output the impact of the event on the MPLS domain. On a high level, changes in tunnel state are caused by (i) topology changes, (ii) traffic volume changes, and (iii) periodic reoptimization. Tunnel state changes refer to tunnel (i) reroutes, (ii) failures to establish a path, and (iii) setups after failed attempts to establish a path. Figure 3-F illustrates the MPLS domain as a black-box where network events result in tunnel state changes.


In detail, topology updates can cause tunnels to (i) reroute, (ii) fail (i.e., go down), or (iii) re-establish (i.e., come back up). Auto-bandwidth events can cause a tunnel to (i) resize but stay on its path, (ii) reroute, or (iii) fail to resize and stay on its path. We call the last case an auto-bandwidth failure. Finally, a reoptimization event can cause tunnels to (i) reroute or (ii) re-establish (head-end routers periodically attempt to re-establish failed tunnels). Additionally, we perform a separate analysis of a particular type of reroutes and failures: the reroutes and failures of lower-priority tunnels because of preemption. After each event, we output the impact of the event, including detailed information about the affected tunnel(s), the path changes, the bandwidth reservation changes, and the inflicted preemptions.

Figure 3-F: The MPLS domain as a black-box of tunnel dynamics.

We elaborate on the metrics calculated after processing TOTEM's output for each daily scenario. We focus on tunnel reroutes and failures and on the amount of traffic impacted by these events. We omit tunnel setups because they follow the tunnel failures; thus, they do not provide any additional insights on the tunnel dynamics. Finally, we analyze tunnel preemptions because of their impact on low-priority traffic. Note that preemptions follow the break-before-make paradigm. Thus, when a tunnel is preempted, its traffic is temporarily disrupted. Table 0-II summarizes the simulation observables:



- Tunnel Reroutes: Reroutes represent tunnel path changes in the MPLS domain. There are four causes for a tunnel to change paths: (i) topology update (OSPF reroute), (ii) auto-bandwidth event (AutoBw reroute), (iii) tunnel reoptimization (ReOpt reroute), (iv) tunnel preemption (preemption reroute).

- Reroute Impact: The reroute impact represents the amount of traffic that is affected by the reroute. When auto-bandwidth is enabled, the impact is reflected by the tunnel's bandwidth reservation. Therefore, we estimate the impact of the reroute using the bandwidth reservation of the affected tunnel. We further discuss the impact of a tunnel reroute on the data plane when we analyze the dynamic traffic-engineering in Section 1.6.

- Tunnel Failures: Tunnels experience hard and soft failures. In hard failures, tunnels fail to find a path and go down after (i) a setup attempt (ReEstabl failure), (ii) a topology change (OSPF failure), or (iii) being preempted (preemption failure). In soft failures, we include the auto-bandwidth failures (AutoBw failure) where a tunnel maintains its path but fails to resize.

- Failure Impact: The failure impact represents the amount of traffic that is affected by the failure. Again, when auto-bandwidth is enabled, the impact is reflected by the tunnel's bandwidth reservation.

- Tunnel Preemptions: There are four causes for a low-priority tunnel to be preempted by a higher-priority tunnel: (i) high-priority tunnel setup or re-establishment (ReEstabl preemption), (ii) high-priority tunnel reroute because of a topology change (OSPF preemption), (iii) high-priority tunnel bandwidth adjustment (AutoBw preemption), (iv) high-priority tunnel reroute because of a reoptimization event (ReOpt preemption).

Table 0-II: Simulation Observables

Observable         | Sub-Classification Based on Root-Cause
Tunnel Reroutes    | OSPF, AutoBw, ReOpt, preemption
Reroute Impact     | OSPF, AutoBw, ReOpt, preemption
Tunnel Failures    | ReEstabl, OSPF, preemption, AutoBw
Failure Impact     | ReEstabl, OSPF, preemption, AutoBw
Tunnel Preemptions | ReEstabl, OSPF, AutoBw, ReOpt


1.5.2.4 Verification Results

We verify the accuracy of the tunnel visibility system by comparing the TOTEM output with the various real network measurements collected during the simulated period. The verification process includes multiple steps. First, we verify the AutoBw Simulator by comparing the simulated auto-bandwidth events with the real auto-bandwidth events. Then, we verify the SPF and CSPF algorithms by comparing the TOTEM tunnel paths with the real tunnel paths. Finally, we verify the TOTEM functionality as a whole by comparing the aggregate values of the TOTEM observables per day with the corresponding values inferred from the real network measurements.

We find that the auto-bandwidth mechanism cannot be accurately simulated unless fine-grained traffic volume measurements are available. Given our dataset, the AutoBw Simulator predicts 80% of the real bandwidth adjustments. The SPF and CSPF algorithms are deterministic except for their tiebreaking process, which results in 5% unpredictable paths. These two factors, along with the tunnel reoptimization order analyzed next, challenge the offline tunnel path calculation. However, the general levels of tunnel activity, given by the total number of reroutes and failures within a day, match the levels predicted by the tunnel visibility system.


AutoBw Simulator: We compare the number of auto-bandwidth adjustment events observed in the SYSLOG messages with the number of auto-bandwidth adjustment events generated by the AutoBw Simulator on a daily basis. We do so only for the month when auto-bandwidth is fully enabled throughout the network. From the SYSLOG messages, we exclude the underflow and overflow events because the granularity of the traffic volume measurements does not allow the observation of these events (the auto-bandwidth collection interval is lower than the measurement collection window). We also note that the completeness of the traffic volume measurements varies per day. By completeness, we refer to the number of available traffic volume measurements per day in comparison to a complete dataset (i.e., 288 5-minute measurements for each tunnel in the network).

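As an illustration of this completeness metric, the following sketch computes the percentage of expected 5-minute samples that were actually collected in a day; the mapping name and sample counts are hypothetical.

# A minimal sketch of the completeness metric: the fraction of the
# expected 288 daily 5-minute samples that were actually collected.
# 'samples_per_tunnel' maps a tunnel ID to the number of traffic volume
# measurements received for that tunnel on a given day (hypothetical).
EXPECTED_SAMPLES = 288  # 24 hours / 5-minute bins

def completeness(samples_per_tunnel):
    expected = EXPECTED_SAMPLES * len(samples_per_tunnel)
    received = sum(samples_per_tunnel.values())
    return 100.0 * received / expected if expected else 0.0

print(completeness({"tunnel-a": 288, "tunnel-b": 144}))  # 75.0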
Figure 3-G illustrates the real auto-bandwidth adjustments, the simulated auto-bandwidth adjustments, the matching events, and the completeness percentage of our measurements on a daily basis. In particular, we count a matching event when a SYSLOG bandwidth adjustment matches a simulated event in a time window of two adjust-intervals around the real timing of the event. When excluding the days with low measurement completeness (less than 75%), on average 80% of the real events are also predicted by the AutoBw Simulator. This percentage increases to 85% on days 26, 27, and 28, when the completeness percentage is almost 100%.
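A minimal sketch of this matching rule follows, assuming event timestamps in seconds and a uniform adjust-interval; both are illustrative simplifications, since the real adjust-interval is configured per tunnel.

ADJUST_INTERVAL = 300  # hypothetical adjust-interval length in seconds

def match_events(real_times, sim_times, window=2 * ADJUST_INTERVAL):
    # Count real adjustments that have a simulated counterpart within
    # +/- 'window' seconds; each simulated event is consumed at most once.
    unmatched_sim = sorted(sim_times)
    matches = 0
    for t in sorted(real_times):
        for s in unmatched_sim:
            if abs(s - t) <= window:
                unmatched_sim.remove(s)
                matches += 1
                break
    return matches

print(match_events([100, 5000], [450, 9000]))  # 1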
Even then, the AutoBw Simulator cannot predict all auto-bandwidth adjustments because (i) it is not aware of the initial tunnel bandwidth reservations, (ii) it does not know the exact timing of the adjust-intervals for each tunnel, and (iii) the auto-bandwidth mechanism uses more fine-grained measurements than the ones available in our per-tunnel traffic volume dataset.

We conclude that the AutoBw S
imulator yields reasonable accuracy given the granularity of the
measu
rement data
that

serves as input to the simulator. We also investigate the simulated auto
-
bandwidth events
that

do not match with a
ny

r
eal auto
-
bandwidth event
and we find that these events
are very close to the thresholds
that

trigger an adjustment. This
is
justified
because these events may
pass the
auto
-
bandwidth
thresholds in
simulation but not in
reality.
It

is
for

these borderline cases that
we need
fine
-
grained

measurements in order to accurately predict the behavior

of the mechanism
.

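The sensitivity of these borderline cases can be seen in a simplified model of the adjustment decision. The relative-threshold semantics below follow common vendor behavior, but the function, its parameters, and the numbers are assumptions for illustration only.

def should_adjust(reservation_bps, max_sample_bps, threshold_pct=10.0):
    # Adjust when the largest traffic sample in the adjust-interval
    # deviates from the current reservation by more than the threshold.
    if reservation_bps == 0:
        return True
    deviation_pct = 100.0 * abs(max_sample_bps - reservation_bps) / reservation_bps
    return deviation_pct > threshold_pct

# A borderline case: a coarse measurement may land just below the
# threshold while the true fine-grained maximum lands just above it.
print(should_adjust(100e6, 109e6))  # False (simulation, coarse samples)
print(should_adjust(100e6, 111e6))  # True  (reality, fine-grained samples)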
Figure 3-G: Verification of the AutoBw Simulator. We show the real bandwidth adjustments as inferred from the SYSLOG messages, the bandwidth adjustments predicted by the AutoBw Simulator, and the bandwidth adjustment events that are present in both datasets. We also show the traffic volume measurement completeness.
SPF and CSPF algorithms: The second step towards verifying the tunnel visibility system compares the routing algorithm implementations. First, we verify the SPF implementation by comparing the TOTEM tunnel paths with the OSPF paths for 30 snapshots during the first month. During this time, auto-bandwidth is not fully enabled and the routers signal nominal bandwidth reservations. Because of the static nominal bandwidth reservations, tunnels take the shortest path to their destination routers. We obtain these paths from the network-specific OSPF monitoring tool. In the case of ECMP, the tool returns multiple paths. We consider it a match when the TOTEM path matches one of the ECMP paths. We find that the TOTEM paths match the OSPF paths 100% of the time. Therefore, the SPF algorithm implementations in the operational environment and in the tunnel visibility system are consistent.
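The ECMP-aware comparison is straightforward; a sketch with illustrative router names:

def path_matches(totem_path, ecmp_paths):
    # A TOTEM path counts as a match if it equals any of the equal-cost
    # paths returned by the OSPF monitoring tool.
    return tuple(totem_path) in {tuple(p) for p in ecmp_paths}

ecmp = [["r1", "r2", "r4"], ["r1", "r3", "r4"]]
print(path_matches(["r1", "r3", "r4"], ecmp))  # True
print(path_matches(["r1", "r5", "r4"], ecmp))  # False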
Then, we verify the CSPF implementation by comparing the TOTEM tunnel paths with the real paths that tunnels take in the network, as given by the tunnel path measurements, for 30 snapshots during the second month, when auto-bandwidth is fully enabled. We assume that each tunnel holds a bandwidth reservation equal to the tunnel’s traffic volume at the time of the snapshot. Again, even though auto-bandwidth is enabled, tunnels are expected to take the shortest path to their destination routers because the network has enough spare capacity to accommodate multiple concurrent network failures. Thus, all tunnel bandwidth requirements can be fulfilled by the shortest paths.
Figure 3-H shows the percentage of the path matches, the path mismatches, and the paths that we do not verify because the tunnel path measurements do not include the corresponding information. On average, 95% of the tunnel paths estimated by TOTEM coincide with the real tunnel paths in the network for the time snapshots under investigation. We look into the path mismatches and find that the path reported by TOTEM and the path extracted from the tunnel path measurements are equal-cost paths. This is justified because the first CSPF tiebreaker depends on the exact tunnel reservations in the network. The simulated and real tunnel reservations do not match exactly because of the granularity of the traffic volume measurements and the assumption we make for the initial tunnel reservations. Thus, the CSPF algorithm implementations are consistent but, for this network, the CSPF tiebreaking in conjunction with dynamic traffic-engineering reduces the determinism of the tunnel setup by up to 5%.
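To illustrate why the tiebreak is sensitive to the reservations, the sketch below applies one common CSPF tiebreak, preferring the equal-cost path with the largest bottleneck (minimum available) bandwidth. The link names and bandwidths are hypothetical.

def pick_path(candidate_paths, available_bw):
    # candidate_paths: equal-cost paths as tuples of link IDs.
    # available_bw: link ID -> unreserved bandwidth in bps.
    return max(candidate_paths,
               key=lambda p: min(available_bw[l] for l in p))

avail = {"l1": 6e9, "l2": 5.1e9, "l3": 6e9, "l4": 5.0e9}
print(pick_path([("l1", "l2"), ("l3", "l4")], avail))  # ('l1', 'l2')

Shifting as little as 0.2 Gb/s of reservation between l2 and l4 reverses the outcome, which is why small errors in the simulated reservations can flip the chosen path.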
TOTEM functionality: The last step for verifying that the tunnel visibility system yields behavior similar to the real network compares the numbers of reroutes, failures, and preemptions observed in the TOTEM simulations with the same numbers extracted from the real network measurements. Specifically, we infer the real values of the three metrics from the MPLS-TE SYSLOG messages and the tunnel path measurements. Given that the simulated auto-bandwidth events include events that do not match real auto-bandwidth events, we do not expect these numbers to match 100%.
Our analysis includes aggregate values over the course of a day for the month when auto-bandwidth is enabled throughout the network. We exclude a few days when our measurement dataset is less than 75% complete. We elaborate on the challenges involved in collecting complete measurement datasets in Section 1.5.4. Finally, we note that we run the simulations using five different tunnel reoptimization orders. We analyze the impact of this parameter next. In the rest of this section, when we present TOTEM numbers, we present the average numbers over all five simulation sets.
Figure 3-H: Verification of the CSPF algorithm.

Reroutes: In Figure 3-I, we plot (i) the number of SYSLOG messages noting reroutes, (ii) the number of tunnel path changes observed in the tunnel path measurements, and (iii) the number of TOTEM reroutes. We make the following observations: 1) TOTEM reroutes exhibit the same trend as the reroutes in the real network, but TOTEM overestimates the number of reroutes. 2) The number of SYSLOG messages and the number of tunnel path changes are very close to each other but do not coincide. This shows that neither of these measurement datasets is 100% complete.
Failures: In Figure 3-J, we plot (i) the number of SYSLOG messages noting when a tunnel is down, and (ii) the number of TOTEM tunnel failures. We observe that TOTEM overestimates the number of tunnel failures but follows the actual network behavior. We further investigate the tunnel failures predicted by TOTEM and find that the vast majority of these tunnel failures occur because of router maintenance operations and not because of insufficient capacity. When a router is taken off the OSPF topology, all tunnels originating from and destined to this router go down. However, the tunnel traffic is not impacted because it has already been rerouted to other tunnels by routing mechanisms external to the MPLS domain. Nevertheless, in this case, the router under maintenance may not report the tunnel failures to the SYSLOG server. This likely explains the discrepancy between the tunnel failures predicted by TOTEM and the ones reported by the SYSLOG messages.
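This failure mode is mechanical: removing a router from the topology takes down every tunnel that originates or terminates at it. A minimal sketch, with illustrative tunnel and router names:

tunnels = [("t1", "r1", "r4"), ("t2", "r2", "r4"), ("t3", "r2", "r3")]

def failed_tunnels(tunnels, down_router):
    # Tunnels are (name, head-end, tail-end) triples; a tunnel fails
    # when its head-end or tail-end router leaves the OSPF topology.
    return [name for name, head, tail in tunnels
            if down_router in (head, tail)]

print(failed_tunnels(tunnels, "r4"))  # ['t1', 't2']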
Figure 3-I: Normalized number of reroutes per day in reality and in simulation. The values are normalized to the maximum value, TOTEM reroutes on day 17.
Preemptions: In Figure 3-K, we plot (i) the number of SYSLOG messages noting when a tunnel has been preempted, (ii) the number of TOTEM tunnel preemptions, and (iii) the number of SYSLOG messages noting when a tunnel has been fast-rerouted (FRR). We explain why we show the FRR SYSLOG messages next. We observe that tunnel preemptions are not frequent. Also, we observe that the preemptions projected by TOTEM do not match the preemptions observed in the real network on days 9 and 28.

On these same days, we observe a high number of SYSLOG FRR messages. We find that this happens because TOTEM misses the preemptions caused by FRR. In detail, when a tunnel is fast-rerouted, it may preempt lower-priority tunnels on the backup path. However, since we do not simulate FRR, TOTEM underestimates the number of preemptions that take place after links go down. Additionally, we note that the number of preemptions varies significantly per simulation set. Note that each simulation set runs with a different tunnel reoptimization order. We present more details on this in Section 1.5.3.
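For intuition, the sketch below models the preemption step that TOTEM misses when a fast-rerouted tunnel lands on a backup link: lower-priority reservations are evicted until the incoming reservation fits. The rule and numbers are a simplification of ours; following the MPLS-TE convention, a numerically smaller priority value is more important.

def preempt(link_capacity, reserved, new_bw, new_setup_prio):
    # reserved: list of (tunnel, hold_priority, reserved_bps) on the link.
    # Returns the tunnels to evict, or None if the new tunnel cannot fit.
    free = link_capacity - sum(bw for _, _, bw in reserved)
    evicted = []
    # Evict the least important tunnels (largest hold priority) first,
    # but only those with strictly lower priority than the new tunnel.
    for name, prio, bw in sorted(reserved, key=lambda r: -r[1]):
        if free >= new_bw or prio <= new_setup_prio:
            break
        evicted.append(name)
        free += bw
    return evicted if free >= new_bw else None

print(preempt(10e9, [("lo1", 7, 4e9), ("lo2", 5, 4e9)], 5e9, 3))  # ['lo1']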
Figure 3-J: Normalized number of failures per day in reality and in simulation. The values are normalized to the maximum value, TOTEM failures on day 15.
To summarize, the tunnel visibility system provides good estimates of the total number of tunnel reroutes that take place in a specified time window, but it does not provide good estimates of the total numbers of tunnel failures and preemptions because it does not simulate the FRR mechanism. However, the FRR mechanism can easily be included in the tunnel visibility system for the purpose of investigating the fault-tolerance of the network. Next, we investigate the impact of the tunnel reoptimization order on the network-wide tunnel dynamics.
Figure 3-K: Normalized number of preemptions per day in reality and in simulation. The values are normalized to the maximum value, SYSLOG FRR alarms on day 9.
1.5.3 Tunnel Reoptimization Order
After verifying the accuracy of the tunnel visibility system, we investigate the impact of the tunnel reoptimization order on the MPLS deployment. The order in which the per-tunnel reoptimization timers expire is a factor that cannot be controlled by the network operators. This order could potentially be defined by a centralized system that controls the tunnel setup in a network-wide manner.
We investigate the impact of the tunnel reoptimization order by running the simulation of the two-month period with five different tunnel reoptimization orders: (i) three different random orders (O1-random, O2-random, O3-random), and (ii) two orders in which, before each reoptimization, TOTEM sorts the tunnels based on their priority and bandwidth reservation (O4-minFirst and O5-maxFirst). In O4-minFirst, the tunnel with the lowest priority and lowest bandwidth reservation is reoptimized first, and so on. In O5-maxFirst, the tunnel with the highest priority and highest bandwidth reservation is reoptimized first, and so on. O4-minFirst and O5-maxFirst assume that there is a centralized system with knowledge of all the tunnel priorities and reservations and that it controls the reoptimization process throughout the network.
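A sketch of the five orders, modeling tunnels as (name, priority, reservation) triples; following the MPLS-TE convention, a numerically smaller priority value denotes a more important tunnel. The names and values are illustrative.

import random

tunnels = [("t1", 7, 10e6), ("t2", 3, 500e6), ("t3", 7, 200e6)]

def random_order(ts):                     # O1/O2/O3-random
    ts = list(ts)
    random.shuffle(ts)
    return ts

def min_first(ts):                        # O4-minFirst
    # Least important (largest numeric priority), smallest reservation first.
    return sorted(ts, key=lambda t: (-t[1], t[2]))

def max_first(ts):                        # O5-maxFirst
    # Most important (smallest numeric priority), largest reservation first.
    return sorted(ts, key=lambda t: (t[1], -t[2]))

print([t[0] for t in min_first(tunnels)])  # ['t1', 't3', 't2']
print([t[0] for t in max_first(tunnels)])  # ['t2', 't3', 't1']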
In Figure 3-L, we present the impact of the tunnel reoptimization order on the number of reroutes, the reroute impact, and the number of preemptions. Tunnel failures are not impacted by the tunnel reoptimization order because tunnels fail due to topology changes that effectively are maintenance operations. In these cases, the head-end router cannot find any path to the tunnel’s destination, as the destination is not available. Therefore, we do not include tunnel failures in our results. In detail, we show values that are normalized to the average value of the metric for the three random orders O1-random, O2-random, and O3-random.
random
. We make the following observations:
1. In terms of tunnel reroutes, O4-minFirst and O5-maxFirst do not significantly differ from the random orders O1-random, O2-random, and O3-random. O4-minFirst causes only 1% more reroutes than the random orders, and O5-maxFirst causes only 2% fewer reroutes than the random orders. Also, we observe that the reroute impact is not affected by the tunnel reoptimization order.