Goals and vision:


In the last three decades, connectionless packet-switching technologies have been widely deployed in enterprise LANs and WANs to form the current-day Internet. While the Internet uses the communication-link infrastructure of the circuit-switched telephone network, with leased circuits serving as links between IP routers, the IP routers forward packets from one network to another on a connectionless basis. In this mode of operation, no bandwidth is reserved for any single communication session prior to the start of data transfer.

Bandwidth sharing on the Internet is largely controlled by TCP, which runs at end hosts. Observing the rate of packet losses and/or changes in round-trip delay, TCP senders adjust their sending rates. The bandwidth share of any single session therefore depends largely on the number of concurrent sessions and the sending rates of those sessions.
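The rate-adjustment behavior described above can be illustrated with a minimal AIMD (additive-increase/multiplicative-decrease) loop. The window sizes and event sequence below are hypothetical, and the sketch ignores slow start, timeouts, and other TCP details:

```python
def aimd_window(events, init_cwnd=10.0, incr=1.0, decr=0.5):
    """Trace a TCP-like congestion window over a list of per-RTT
    events: 'ack' (no loss) grows the window additively, 'loss'
    shrinks it multiplicatively."""
    cwnd = init_cwnd
    trace = [cwnd]
    for e in events:
        if e == "loss":
            cwnd = max(1.0, cwnd * decr)   # multiplicative decrease
        else:
            cwnd += incr                   # additive increase per RTT
        trace.append(cwnd)
    return trace

# A sender ramps up until a loss, then halves its rate and probes again.
print(aimd_window(["ack", "ack", "ack", "loss", "ack"]))
```

With several such senders sharing one link, each session's share emerges from this decentralized probing rather than from any reservation.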

In contrast, connection-oriented networking technologies, whether circuit-switched or packet-switched, allow an end application to reserve bandwidth for its own communication session prior to data transfer. The only connection-oriented network that supports such a high degree of bandwidth sharing, where applications can request bandwidth reservations for individual communication sessions, is the telephone network. The amount of bandwidth reserved per session is small, 64 kbps, and, more importantly, is uniform for all calls. Over the last three decades, we have experimented with different types of connection-oriented data networks, such as X.25 networks, ATM networks, and now MPLS and GMPLS¹ networks. There were concerted efforts to extend ATM networks to the desktop and create end-to-end ATM virtual circuits (VCs). When these proved too expensive, efforts centered on creating “partial VCs,” by which we mean virtual circuits that extend across only part of the end-to-end path. The thinking was that automatic flow-classification techniques at IP routers could be used to redirect packets from long-lived flows to these dynamically set up partial ATM VCs [ref: Ipsilon Peter Newman paper from Infocom 96]. However, prediction of flow length is difficult, and hence automatic flow-length detection and flow redirection to ATM networks were not realized in practice.

IP routers in the current Internet2 backbone, Abilene, and DOE’s ESNET have built-in MPLS capabilities. Projects such as Abilene’s BRUW [ref] and ESNET’s OSCARS [ref] provide end users web-based access to centralized schedulers for advance reservation of MPLS VCs. These projects differ from the earlier partial-connection ATM effort in a significant way: instead of requiring automatic flow classification, the end user specifies the flow identifiers for which the advance bandwidth reservation is being requested. These data are used by the centralized scheduler to configure the ingress IP router to map packets from only the specified flow(s) onto the partial MPLS VC, whose setup is triggered by the centralized scheduler just before the agreed-upon reservation time.

In this proposal, our goal is to carry the work done in the BRUW and OSCARS projects further by answering two questions: (1) how does an end user application that wants to communicate with a remote host determine whether it needs to reserve bandwidth on any link of the end-to-end path, and if so, on which links should it reserve bandwidth? (2) what security and pricing mechanisms are required to support immediate requests for bandwidth, rather than advance reservations? While the BRUW and OSCARS projects provide centralized schedulers to manage requests for advance reservations of bandwidth on the Abilene and ESNET networks, these projects do not attempt to answer the question of how an end user decides that he/she needs to make such a reservation. They are designed primarily for scientists, who require high-bandwidth connectivity and/or rate-/delay-controlled connectivity for remote instrument control.

¹ ATM: Asynchronous Transfer Mode; MPLS: MultiProtocol Label Switching; GMPLS: Generalized MPLS

Our goal is to extend this work to more commonplace applications that would scale to millions of users. The process of determining where and how much bandwidth to request should run unbeknownst to the end user of the application. Our answer is based on the observation that many links in current LANs and WANs are lightly loaded, while enterprise access links are often heavily loaded. For example, the University of Virginia campus LAN consists almost entirely of GbE² switches with lightly loaded links. The campus is, however, connected to Internet2 via an OC3 link, which is heavily loaded during workday hours. Traffic weather maps of the Internet2 connector and backbone links show fairly light loads [Ref: web site]. This variable loading on different links provided us the insight that it may be sufficient to reserve bandwidth for specific communication sessions only on heavily loaded links rather than on every link of the end-to-end path.

An analogy for “partial VCs” exists in the transportation world. A typical airline traveler uses the (connectionless) roadways system³ to get to an airport, occupies a reserved seat on a flight (connection-oriented), and then travels across another (connectionless) set of roads to reach the destination. Analogous to the explicit reservation made for the flight by the traveler is the notion of providing the centralized scheduler an identifier for the flow for which the partial MPLS VC is being reserved.

The analogy seems to break down when considering the location of the partial path. In the transportation world, reservations are seemingly required for the long-distance portion of the end-to-end path, but in the data-communications world, our intuition indicates a need for VCs on highly loaded links. [Rick: should we work in the black cloud experiment and even suggest using VCs for long-distance paths? But then TCP needs to be terminated at the edges of the VC.] In both cases, reservations are required on the expensive segments. Just as we cannot afford to run constant flights and allow travelers to simply wander into airports and grab a seat as needed, the WAN access segments are clearly loaded because of the expense involved in upgrading them. Hence it appears that the expensive segments are the ones that need support for reservation-based service.


As Modiano notes in [ref], packet switching has the advantage under light loads while circuit switching has the advantage under heavy loads. Consider, for example, the use of TCP for bandwidth sharing in connectionless packet-switched networks. Under heavy loads, either losses are incurred, in which case retransmissions (which waste bandwidth) are required, or rate reductions may be over-aggressive, causing momentary underutilization of links. Prior partitioning of bandwidth into circuits/VCs under heavy loads could therefore lead to lower delays and better utilization. On the other hand, on lightly loaded links, partitioning off a small amount of bandwidth for a particular flow will lead to longer transfer delays than if that flow were allowed to run freely and enjoy as much of the link bandwidth as possible. We propose to run simulations to gain a quantitative understanding of the advantages and disadvantages of these two bandwidth-sharing techniques. Based on this intuition, we contend that reservations for bandwidth should be made primarily on heavily loaded links, analogous to the use of High-Occupancy Vehicle (HOV) lanes on heavily loaded roads.

² GbE: 1 Gigabit/sec Ethernet
³ The roadways system is deemed “connectionless” because one does not make a reservation for a timeslot along a road before driving onto it.
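A back-of-the-envelope comparison illustrates why the two regimes favor different techniques. The sketch below compares a file’s transfer time on a reserved slice against an idealized equal TCP share of the whole link; the link speed, reservation size, and flow counts are illustrative assumptions, and real TCP dynamics (losses, retransmissions, ramp-up) are deliberately ignored:

```python
def transfer_time(file_bits, bw_bps):
    """Seconds to move file_bits at a fixed rate bw_bps."""
    return file_bits / bw_bps

def compare(file_bits, link_bps, reserved_bps, n_flows):
    """Compare transfer time on a reserved slice vs an idealized
    TCP-fair share of the whole link split among n_flows."""
    fair_share = link_bps / n_flows          # idealized equal sharing
    return (transfer_time(file_bits, reserved_bps),
            transfer_time(file_bits, fair_share))

# 1 GB file on a 1 Gb/s link with a 100 Mb/s reservation.
file_bits = 8e9
for n in (2, 50):
    t_res, t_share = compare(file_bits, 1e9, 1e8, n)
    print(f"{n:3d} flows: reserved {t_res:.0f}s, shared {t_share:.0f}s")
```

With 2 flows the free-running share (500 Mb/s) beats the 100 Mb/s slice, while with 50 flows the slice wins, matching the intuition that reservations pay off only on heavily loaded links.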

Our proposed method is to design a gridlock-potential determination and bypass (GPDB) server, ideally one per IP router, that collects measurements on all the router interfaces and determines whether a particular link is likely to become a gridlock point for a particular flow (with a specified bandwidth and duration). A definition of “a gridlock-potential link” in our context (with respect to a particular call request for a specific bandwidth, duration, and other QoS parameters, such as delay, loss and jitter) is a link with the following property: the probability that the QoS available on the link, if it were not reserved for the call, falls below the required levels at some time during the call exceeds some probability-threshold value (say 0.001). In other words, a link is a gridlock-potential link if it is likely to violate the call’s QoS requirements. We start by limiting QoS to just bandwidth.
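Limiting QoS to bandwidth, the threshold test in this definition can be sketched as follows. The empirical-frequency estimator over historical available-bandwidth samples is our own placeholder; the actual estimation algorithm is left to the modeling section:

```python
def is_gridlock_potential(avail_bw_samples, required_bps, p_threshold=0.001):
    """Flag a link as gridlock-potential for a call requesting
    required_bps: estimate, from samples of the link's unreserved
    available bandwidth, the probability that availability falls below
    the requirement, and compare against the probability threshold
    (0.001 in the definition above)."""
    shortfalls = sum(1 for b in avail_bw_samples if b < required_bps)
    p_violation = shortfalls / len(avail_bw_samples)
    return p_violation > p_threshold

# Hypothetical headroom samples (bits/sec); two of five dip below 100 Mb/s,
# so this link would be flagged for a 100 Mb/s call.
samples = [3e8, 2.5e8, 9e7, 3.1e8, 8e7]
print(is_gridlock_potential(samples, 1e8))
```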

An important point to note in the above definition of a gridlock-potential link is that there could be multiple such links on an end-to-end path. An analogy can again be drawn in the roadways transportation world. Consider a driver headed from southern Virginia to Connecticut. Such a driver is likely to be caught in gridlocks on the highways around Washington, DC (e.g., the Beltway) and around NYC. A “gridlock-potential determination server” for the Beltway can independently answer the question of whether or not there is a potential for gridlock when the driver arrives on the Beltway. For example, it can answer a query such as “Am I likely to require more than 20 minutes to drive across a particular 20-mile stretch of the Beltway?”

If a link is identified as being a gridlock-potential link, the server would initiate the setup of a bandwidth-reserved circuit/VC to bypass the gridlock if such a path is available. Hence we call our architecture Gridlock-Potential Determination and Bypass (GPDB), and design a highly distributed (and hence scalable) implementation of GPDB.

The second question, of supporting immediate requests for bandwidth rather than advance reservations (as in BRUW and OSCARS), is motivated by the desire to support more commonplace applications rather than those envisioned for high-end scientific needs. For example, we consider file transfers, video-telephony and Internet gaming. The control-plane protocols for signaling and routing in MPLS and GMPLS networks are designed for highly distributed implementations. These protocols currently do not support advance reservations. For example, the RSVP-TE signaling protocol⁴ for both MPLS and GMPLS networks does not have parameters to specify a future start time or call duration. Therefore the BRUW and OSCARS advance-reservation systems cannot use the bandwidth-management capabilities of the RSVP-TE engines built into IP routers. This functionality is instead centralized for all links in the Abilene and ESNET networks into the BRUW and OSCARS schedulers, respectively. RSVP-TE signaling is used for provisioning of the MPLS virtual circuits just prior to the reservation time. Given our focus on scalability, we choose to demonstrate immediate-request types of calls in the “partial VC” context, and relegate advance-reservation calls to future projects. We expect immediate-request calls from applications such as file transfers and video-telephony, and that these calls will be short-lived. High call-handling throughput will thus be necessary, making centralized bandwidth-management schedulers unsuitable.

⁴ RSVP-TE: Resource reSerVation Protocol with Traffic Engineering

A major concern of network providers in allowing such immediate requests for bandwidth is security. In the BRUW and OSCARS centralized advance-reservation schedulers, authentication and authorization mechanisms are applied to verify the identity of the user requesting the advance reservation before accepting the request. With applications running RSVP-TE clients that send in requests for bandwidth unbeknownst to users, security becomes a major concern. Having encountered this issue in the CHEETAH network (which supports immediate-request calls with RSVP-TE signaling from end hosts), we designed an IPsec-based solution in which host-level authentication is provided [ref]. We plan to apply this IPsec solution, or one based on SSL⁵ if user authentication proves important, to address this very important aspect.

In addition, given the FIND goals to create designs that will foster investment through economic means, we will research pricing issues in connection-oriented networks. For example, in telephone networks, call-related data is collected for later billing. We plan to investigate whether there are intrinsic pricing-related advantages in connection-oriented networks (where users signal intent to use before actual usage) relative to CL networks.

This vision of making bandwidth reservations only on gridlock-potential links feeds into plans for creating hybrid networks in both the HOPI⁶ and CHEETAH testbeds. The HOPI testbed was built to encourage experimentation into the use of both packet- and circuit-switched networks. The main networking nodes of the HOPI testbed are GMPLS-enabled Ethernet switches with IEEE 802.1q virtual LAN (VLAN) and 802.1p QoS capabilities. These switches connect to IP routers on the Abilene network, which makes it an ideal testbed for our GPDB architecture. We are currently planning to connect the CHEETAH testbed into HOPI, which will allow for interesting heterogeneous VC testing (given CHEETAH is SONET-based).

An important goal for this project is to inform future Internet architectures. We will answer questions such as “should connection-oriented service be supported (a) in end-to-end mode or in partial-VC mode, (b) in immediate-request mode or advance-reservation mode, (c) in the form of multiple classes of service (varying bandwidth levels)?” Does CO service help with the problems of security and pricing encountered in today’s CL Internet? When is the CL mode appropriate and when are connections useful?

⁵ SSL: Secure Sockets Layer
⁶ HOPI: Hybrid Optical and Packet Infrastructure


Figure 1: An example GPDB architecture with two enterprise networks connected to a WAN

GPDB Architecture:

We propose using basic RFC 2205 RSVP to signal requests for bandwidth, since this protocol has been standardized for exactly this purpose. An RSVP signaling client runs on end hosts participating in this architecture. A library function to issue RSVP requests is embedded into applications such as the Unix very secure file transfer protocol (vsftp) for file transfers. An external bandwidth-requestor process is loaded onto the end hosts for applications such as games, whose source code is unavailable for modification. The RSVP client is configured with the IP address of the closest GPDB server. In most enterprises, this is likely to be required only at the WAN access router, as shown in Enterprise II of Figure 1. However, for generality, we show that it could be placed even at an Ethernet switch, as in Enterprise I of Figure 1. Within WANs, GPDB servers can be associated with any router that has links that could potentially become gridlocked. For example, if a router has an OC192 interface that is far from being loaded (e.g., some of the Abilene links [ref to traffic weather URL]), then a GPDB server is not required for this interface. Links could be shared between connectionless (CL) and connection-oriented (CO) modes, or a separate CO network could be available, as shown in the example WAN of Figure 1. IP routers equipped with MPLS engines effectively have links that can be shared between the CL and CO modes.

GPDB servers run a neighbor discovery protocol and compile next-hop GPDB server data corresponding to different destination addresses. This allows GPDB servers to communicate with each other to determine if gridlocked links are concatenated, in which case a multiple-link VC can be provisioned with a bandwidth allocation.

When a vsftp or web server receives a request for a file download, it can determine whether the file size and round-trip delay warrant the use of bandwidth-reserved VCs (see work on routing decision algorithms in the CHEETAH project [ref]). If it decides that a VC is worth attempting, the vsftp/web server, through its RSVP library, sends an RSVP Path message to the first GPDB server.
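As a sketch of that decision: the actual routing-decision algorithms are in the cited CHEETAH work; the TCP-time estimate, the default setup delay, and the assumed TCP share below are placeholder assumptions:

```python
def should_attempt_vc(file_bytes, rtt_s, vc_bps, setup_s=0.5,
                      tcp_share_bps=5e7):
    """Crude routing decision: attempt a VC only if the circuit
    transfer time (including signaling/setup delay) beats a rough
    estimate of the TCP transfer time on the CL path."""
    bits = 8 * file_bytes
    t_vc = setup_s + bits / vc_bps
    # Rough TCP estimate: a few RTTs of ramp-up plus steady transfer.
    t_tcp = 6 * rtt_s + bits / tcp_share_bps
    return t_vc < t_tcp

print(should_attempt_vc(10e9, 0.05, 1e9))   # 10 GB file: VC worthwhile
print(should_attempt_vc(10e3, 0.05, 1e9))   # 10 kB file: plain TCP wins
```

The qualitative behavior is what the text requires: small transfers never amortize the setup delay, so only large files with long transfer times trigger a Path message.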

For example, assume a host in Enterprise II of Figure 1 runs the web server that decides to attempt a VC setup. It signals GPDB server II.1, this having been configured as the nearest GPDB server to the host. GPDB server II.1 extracts the destination IP addresses from the RSVP Path message, and checks whether the outgoing interface corresponding to the destination could potentially become a gridlock link within the duration of the call. This requires us to add an object to carry a new parameter, call duration, in the RSVP Path message. If the link is likely to become a gridlock, the GPDB server triggers VC setup on the enterprise access link (e.g., by sending an appropriate CLI⁷ command). As this is an inter-domain link, we expect this VC to be a single-link VC. If VC setup is successful, the GPDB server issues a second CLI command to configure the IP router to filter out IP datagrams corresponding to a particular flow (extracting the source and destination port numbers from the RSVP Path message) and map these to the newly established VC. Thus packets from this flow are isolated and given the QoS treatment requested.
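The server’s handling of a Path message, as described above, might be organized as follows. Here `route_lookup`, `gridlock_check`, and `router_cli` are hypothetical stand-ins for the routing-table lookup, the gridlock-potential algorithm, and the router CLI session; the dataclass fields include the proposed call-duration object:

```python
from dataclasses import dataclass

@dataclass
class PathRequest:
    """Fields a GPDB server extracts from an RSVP Path message
    (duration_s is the new object proposed in the text)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    bandwidth_bps: float
    duration_s: float

def handle_path(req, route_lookup, gridlock_check, router_cli):
    """Sketch of GPDB server II.1's logic: find the outgoing interface
    for the destination, test its gridlock potential over the call
    duration, and if needed trigger VC setup plus a flow-to-VC mapping
    via two CLI commands."""
    iface = route_lookup(req.dst_ip)
    if not gridlock_check(iface, req.bandwidth_bps, req.duration_s):
        return "forward-unchanged"          # CL path is good enough
    vc = router_cli(f"setup-vc {iface} {req.bandwidth_bps}")
    router_cli(f"map-flow {req.src_ip}:{req.src_port} "
               f"{req.dst_ip}:{req.dst_port} -> {vc}")
    return "bypass-vc"
```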

Our preliminary security solution is to run RSVP over an IPsec tunnel between end hosts and their corresponding GPDB servers. Furthermore, as RSVP message-parsing capability is built into the GPDB servers, they can perform authorization based on source IP address for the QoS requested. This is similar to the Intelligent Network (IN) design used in telephone networks, where user parameters are extracted from SS7 ISUP⁸ messages and the source telephone number is validated against an accounting database before call setup proceeds.

Each GPDB server independently decides whether the interface on which packets from the flow will be routed, if a VC is not established, is a gridlock-potential link, as RSVP signaling proceeds hop-by-hop (GPDB server hop to GPDB server hop). For example, if GPDB server III.2 finds that the link onto which packets from the flow will be routed is not likely to become a gridlock, it simply sends the RSVP signaling request to GPDB server III.1 without creating a bypass VC. The latter makes the decision on whether a bypass is required for the access link to Enterprise I (assuming the vsftp/web client is located at that enterprise). If, on the other hand, GPDB server III.2 finds that the outgoing interface has a potential to be gridlocked, then, seeing that it has a bypass path through a separate CO network to the router at which the next neighbor GPDB server en route to the destination is located, it sends the RSVP request on to GPDB server III.1 with an additional parameter: the IP address of its interface to the CO switch. This will allow GPDB server III.1 to initiate VC setup from its router to the router interface indicated in the RSVP message. This intra-domain VC thus bypasses the two router links. Perhaps only the first one has the potential for gridlock. Nevertheless, if the VC path is available, it will be useful to set up the VC because one of the two intra-domain links is a gridlock-potential link. Note that if VC setup fails on any bypass attempt, the request is denied and the requestor receives a call-failed error message.⁹

Thus, piecemeal bypass VCs may be set up to avoid gridlocked links. The RSVP Resv message is sent back in the reverse direction through the chain of GPDB servers. This message can be held up if the bypass VC on any path is not fully established. This allows for pipelining of the forward RSVP Path signaling with VC setups. Any reduction in VC setup delay overheads is useful to improve utilization and reduce total delay.

⁷ CLI: Command Line Interface
⁸ SS7 ISUP: Signaling System No. 7 ISDN User Part
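The hop-by-hop forward march described above can be sketched as a loop over the links a flow would traverse, with each GPDB server deciding independently. Here `gridlock` and `setup_vc` are hypothetical callables (the per-link test and the VC-triggering action), and a failed bypass denies the call as in the text:

```python
def signal_chain(links, gridlock, setup_vc):
    """Hop-by-hop GPDB decisions along a path: each server checks only
    its own outgoing link and sets up a bypass VC where needed. Returns
    the list of bypassed links, or None if any VC setup fails (the
    call-failed case)."""
    bypassed = []
    for link in links:                # forward march of the Path message
        if gridlock(link):
            if not setup_vc(link):    # any failed bypass denies the call
                return None
            bypassed.append(link)
    return bypassed                   # piecemeal bypass VCs, possibly []

# Two of three links are flagged as gridlock-potential (hypothetical names).
hot = {"access-II", "core-III.2"}
print(signal_chain(["access-II", "core-III.2", "access-I"],
                   lambda l: l in hot, lambda l: True))
```

An empty result means the Path message traverses the whole chain with no bypass, i.e. the flow stays on the CL path end to end.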

A GPDB server runs the following processes:

- Measurements process, which processes the SNMP and Netflow measurement data to derive metrics useful to the gridlock-potential determination process,
- Gridlock-potential determination process, which runs the algorithm described in XX (modeling section),
- RSVP process, which parses and constructs RSVP messages with the additional GPDB parameters,
- Router-interface process, which interfaces with the IP routers using CLI to trigger VC setup/release, map particular flows to VCs, set the CO-bandwidth threshold, and read Netflow data and IP routing tables, and uses SNMP¹⁰ to read management information bases (MIBs),
- Neighbor GPDB server discovery process,
- VC-triggering process, which will generate the RSVP-TE messages to the external CO networks if such connectivity is available (e.g., in the WAN in Figure 1).

Note that GPDB servers do not perform any bandwidth-management functions. They only receive and parse RSVP messages to extract destination, source and QoS parameters, such as bandwidth. Bandwidth management is strictly performed by the RSVP-TE engines running at the MPLS/GMPLS switches.

The Utility Optimizing Server (UOS) is used to set CO-bandwidth threshold parameters on the links that offer both CL and CO services. In practice, we plan to exploit the label-stacking feature of MPLS to implement this threshold. An MPLS VC with a bandwidth equal to the CO-bandwidth threshold is set up on interfaces that participate in the GPDB architecture. This VC can then be configured as an “interface” at the router, allowing inner MPLS VCs to be set up within this outer VC. Algorithms run at the UOS are domain-wide to optimize utility across the network. Details are described in Section XX (modeling).




⁹ An important point about “call failure” in the context of this work is that it is only the reserved capacity that fails. The application could still proceed using standard IP CL service in perhaps a somewhat degraded fashion.

¹⁰ SNMP: Simple Network Management Protocol

Experimental track:

We plan to use a combination of Abilene and the HOPI testbed to test our GPDB architecture. Given that the loading on Abilene backbone links is currently light, we plan to provision 100 Mbps MPLS virtual circuits between strategically selected Abilene IP routers, and limit the traffic routed to these VCs to datagrams sourced from and destined to the GPDB-experimental subnets (which will be located at the four institutions). This is illustrated in Figure 2. For example, traffic between GPDB hosts at UVA and G.Tech will be routed on the provisioned VC between the Washington, DC, and Atlanta routers, while traffic between GPDB hosts at CSU and SLAC will be routed on a VC between the Sunnyvale and Denver routers. We plan to run multiple flows between pairs of GPDB hosts so that the virtual-circuit links allocated to our project on Abilene backbone links become highly loaded. This would allow the measurements and GPDB-determination algorithm to require bypassing the CL path with a VC set up on-demand through the HOPI path.


Figure 2: Abilene-HOPI testbed for GPDB evaluation

[Rick, Bob, Matt (Internet2 co-PIs): are there better links on which to create MPLS VCs for overloading and rerouting to HOPI, e.g., Wash-NY-Chicago? Can we add GPDB hosts to the Abilene racks, leveraging the Observatory opportunity? How about the GPDB servers? Would you prefer that these GPDB servers be located in the racks at the Abilene nodes? The VC-triggering process will talk directly with the VLSRs in HOPI, but the CLI commands to the router are necessary to set the PBR mapping of a flow to the HOPI interface VLAN. We also need to read the routing table and the SNMP and Netflow databases. The CLI commands will be sent using SSH. Would you like to be listed as the developer team for the router-interface process within the GPDB server?]

[Russ: should we discuss applications in this experimental track?]

[Edwin, all: Should we not include the UOS as part of the experimental track? Too ambitious?]

[Warren: I continued using the word “server” rather than “services” because I wanted to be specific: we have one GPDB server per router that has links which could potentially become gridlocked. Do you see how we could fit web services into this description? I thought perhaps a provider may not want to offer gridlock identification as a generic web service for others to access. Here the fact that a link is a gridlock is not advertised back to the end host. This way service providers can keep the fact that their links are heavily loaded to themselves. Thoughts?]

[Les: On measurements, do we have a raw sample of Netflow or SNMP data? Do we still need end-to-end measurements from iperf or thrulay? What data about a specific link can be construed from such measurements? Given this definition of a bottleneck (gridlock) link (per link, independent of others), how do end-to-end measurements help? Russ suggested that we should keep these as an option in case a service provider refuses Netflow data. In our previous version, I had been thinking that the end host collects data about the path, figures out itself which link should be bypassed, and flies in a signaling request to one edge of that link, a la the airlines model, where the user has information about all the segments and then sends out a specific request for a seat on a specific flight. But in this version, I have presented it such that a provider does not explicitly tell the end host whether or not its network has a gridlocked link. It silently bypasses it. Does that make sense from a pricing point of view? Does the user need to be told? Considerations of call-setup delay made me want to do this in one round trip rather than in multiple round trips, which would be needed if the end host had to first collect information on the bottlenecks and then fly in requests for reservations. I even had a bit of pipelining between the VC setup and the forward march of gridlock determination.]

[Les: on end-to-end measurements, we can use these to cache data at end hosts to make the preliminary routing decision on whether to even bother asking for a VC. For example, use TCP for small files. In the CHEETAH project, we gained some experience here.]

[Russ: for workload identification, could we draw out the software modules we would need to develop for the end hosts, the GPDB server, and the UOS server? This would help with the coordination piece, to establish to reviewers that we have a nice coordinated story.]


Results from prior NSF support:

Our recent experience with GMPLS networks in the NSF-funded CHEETAH¹¹ project demonstrated the value of these partial connections. The vision of the CHEETAH project was to cross-connect Ethernet links from end hosts to wide-area SONET circuits, exploiting the dominance of Ethernet in LANs and SONET in MANs/WANs, to create end-to-end (host-to-host) dedicated circuits. Using GMPLS signaling and routing protocols, CHEETAH offers a distributed, dynamic, call-by-call high-speed bandwidth-sharing mechanism. While this network is currently deployed and available to networking and scientific researchers, our experience taught us that this vision of creating end-to-end high-speed circuits entails high costs. .... hence partial VCs....

¹¹ CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture