Providing Quality of Service Networking to High Performance GRID Applications

Miguel Rio, Andrea di Donato, Frank Saka, Nicola Pezzi, Richard Smith, Saleem Bhatti, Peter Clarke
Networked Systems Centre of Excellence, University College London

ABSTRACT

This paper reports on QoS experiments and demonstrations done in the MB-NG and DataTAG EU projects. These are leading edge network research projects involving more than 50 researchers in the UK, Europe and North America, concerned with the development and testing of protocols and standards for the next generation of high speed networks.

We implemented and tested the Differentiated Services Architecture (DiffServ) in a multi-domain, 2.5 Gbit/s network (the first such deployment), defining appropriate Service Level Agreements (SLAs) to be used between administrative domains to guarantee end-to-end Quality of Service.

We also investigated the behaviour of DiffServ on high bandwidth, high delay development networks connecting Europe and North America, using a variety of manufacturers' equipment.

These quality of service tests also included innovative MPLS (Multi-Protocol Label Switching) experiments to establish guaranteed bandwidth connections to GRID applications in a fast and efficient way.

We finally report on experiences delivering quality of service networking to high performance applications such as Particle Physics data transfer and High Performance Computation. This included implementation and development of middleware, incorporated in the Globus toolkit, that enables these applications to easily use these network services.


I. INTRODUCTION


Recent years have seen the appearance of a wide range of scientific applications with extremely high demands on the network. Once again scientists require the network to be pushed to its limits, challenging the idea that bandwidth availability is no longer a problem.

Unlike traditional applications like email, WWW or even peer-to-peer systems, these applications require reliable file transfers on the order of 1 Gbit/s and often have tight delay requirements. High Energy Physics, Radio Astronomy or High Performance Steered Simulations cannot achieve their goals in a sustainable, efficient and reliable way with current production networks, which are almost totally based on a best-effort service model.

Although part of the networking community believes that bandwidth over-provisioning will always solve every network problem, our work shows that Quality of Service enabled networks play a vital role in supporting high performance applications efficiently, inexpensively and with little additional configuration work.


QoS performance has been studied exhaustively through analytical work and simulation (see for example [1]). These works are of extreme importance and relevance, but they have two drawbacks. First, simulation models have proved to be incomplete [2,3] because they fail to represent all possible real configurations of the network. Second, they fail to account for several implementation details of “real” networks, including operating system tuning, driver configurations, memory and CPU overflows, etc. Therefore testbed networks play a vital role in network research as a way of consolidating technology and, through exhaustive debugging and testing, providing implementation guidelines to the QoS network user community.


The work reported here used two testbeds with different characteristics: a United Kingdom testbed used in the context of the MB-NG project, and a European/transatlantic one used in the context of the DataTAG project.


The Managed Bandwidth - Next Generation (MB-NG) project [4] created a pan-UK networking and Grid testbed that focused upon advanced networking issues and interoperability of administrative domains. The project addressed the issues which arise in the sector of high performance inter-Grid networking, including sustained and reliable high performance data replication and end-to-end advanced network services.


The MB-NG testbed can be seen in Figure 1. It consists of a triangle connecting RAL, the University of Manchester and London at OC-48 (2.5 Gbit/s) speeds using Cisco 12000 routers. Each of the edge domains is built with two Cisco 7600s interconnected at 1 Gbit/s.



Figure 1: MB-NG Network




The DataTAG project [5] created a large-scale intercontinental Grid testbed involving the European DataGrid project, several national projects in Europe, and related Grid projects in the USA. It involves more than 40 people and, among other things, is researching inter-domain quality of service and high throughput transfers over high delay networks (making use of an intercontinental link connecting Geneva to Chicago, which can be seen in Figure 2).


This paper is organized as follows. In the next section we describe the Differentiated Services Architecture and some experimental results implementing it in our testbeds. In section III we describe MPLS and why it is a useful technology for GRID applications. In section IV we discuss the definition of Service Level Agreements between administrative domains, followed by the description of middleware and the control plane in section V. We describe some GRID demonstrations using our QoS testbeds in section VI and present conclusions and further work in section VII.



Figure 2: DataTAG Network


II. DIFFERENTIATED SERVICES

In the last decade there have been many attempts to provide a Quality of Service enabled network that extends the current best-effort Internet by allowing applications to request specific bandwidth, delay or loss. These included completely new networks like ATM [6] and “extensions” to TCP/IP like the Integrated Services Architecture [7]. Both these approaches required state to be stored in every router of the entire path for every connection. It was soon realized that the core routers would not be able to cope and these architectures would not scale.


With the Differentiated Services Architecture [8] a simpler solution was proposed. Traffic at the edges is classified into classes, and routers in the core only have to schedule the traffic among these classes. Typically routers in the core only deal with approximately 10 classes, making the architecture easier to implement and manage. Routers at the edges may have to maintain per-flow state (especially the first router in the path), but since the amount of traffic there is several orders of magnitude smaller this is not a significant problem.


However, applications can now only choose which class they want to be classified into, as opposed to giving a complete traffic specification as in the IntServ architecture. This makes experimentation in testbeds crucial to understand how applications should make use of a DiffServ enabled network.
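To make class selection concrete: an end host (or first-hop classifier) expresses the chosen class by setting the DiffServ code point (DSCP) in each packet's IP header, which occupies the upper six bits of the old TOS byte. The sketch below is illustrative only; in our testbeds classification was done by the edge routers, not by the hosts.

```python
import socket

# DSCP values: EF = 46 (RFC 2598), LBE = 8, best effort = 0.
# The DSCP sits in the upper six bits of the IP TOS byte,
# so it is shifted left by two when passed to IP_TOS.
DSCP_EF = 46
DSCP_LBE = 8

def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Request that packets sent on this socket carry the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

# Example: a UDP sender asking for Expedited Forwarding treatment.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_socket(sock, DSCP_EF)
```

Whether the network honours a host-set DSCP depends, of course, on the edge router's classification policy.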



Our first tests used iperf [9], a traffic generation tool (which we used to generate UDP flows), to understand the end to end effects produced by such a network. We tested the implementation of two new services, Less than Best-effort (LBE) and Expedited Forwarding (EF), which will be of use to different kinds of applications. Both classes were tested individually against Best-effort traffic.


The Less than Best-effort tests can be seen in Figure 3, where we injected traffic in two classes, increasing the offered load on both of them simultaneously. Here scheduling mechanisms guarantee at least 1% of the link capacity to LBE and the rest to normal Best-effort. When there is no congestion both classes share the link equally. When the link gets saturated LBE traffic gets dropped and, in the extreme, only 1% of the link capacity is guaranteed to be given to LBE.



[Plot: per-flow offered load (Mbps) vs. throughput (Mbps) for Best Effort and LBE traffic]

Figure 3: Less than Best-effort
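The sharing behaviour described above can be captured by a toy model: Best-effort is served first, but cannot squeeze LBE below its guaranteed slice of the link. This is an illustrative sketch with hypothetical numbers, not a model of the router's actual scheduler.

```python
def lbe_share(offered_be, offered_lbe, capacity, lbe_guarantee=0.01):
    """Idealised link share for Best-effort (BE) vs. Less than Best-effort (LBE).

    BE may use all capacity except the slice guaranteed to LBE;
    LBE takes whatever remains, down to its guaranteed minimum.
    All quantities are in the same bandwidth unit (e.g. Mbit/s).
    """
    guarantee = capacity * lbe_guarantee
    # BE is served first, but cannot push LBE below its guarantee
    # (assuming LBE offers at least that much traffic).
    be = min(offered_be, capacity - min(offered_lbe, guarantee))
    lbe = min(offered_lbe, capacity - be)
    return be, lbe

# Uncongested: both classes carry their full offered load.
print(lbe_share(200, 200, 2500))    # -> (200, 200)
# Saturated: LBE is squeezed down to its 1% guarantee.
print(lbe_share(2500, 2500, 2500))  # -> (2475.0, 25.0)
```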



Expedited Forwarding is the premium service in the DiffServ architecture. Whatever the congestion level of the link, EF traffic should always receive the same treatment. In our example (see Figure 4) 10% of the link capacity is allocated to EF and this is always guaranteed. If EF traffic exceeds that percentage, the excess traffic is dropped or, less frequently, remarked to lower priority.



[Plot: per-flow offered load (Mbps) vs. throughput (Mbps) for aggregate Best-effort (BE1+BE2+BE3) and EF traffic on an OC-48 link with 10% allocated to EF]

Figure 4: Expedited Forwarding
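The EF guarantee relies on policing at the edge: traffic within the configured rate passes, and the excess is dropped or remarked. The usual conformance test is a token bucket; a minimal sketch follows, with hypothetical rate and bucket-depth values.

```python
class TokenBucketPolicer:
    """Decide whether packets conform to a configured rate.

    Tokens accumulate at `rate` bytes per second up to `depth` bytes;
    a packet conforms if enough tokens are available when it arrives.
    Non-conforming packets would be dropped or remarked to lower priority.
    """

    def __init__(self, rate: float, depth: float):
        self.rate = rate          # bytes per second
        self.depth = depth        # maximum burst, in bytes
        self.tokens = depth       # bucket starts full
        self.last = 0.0           # time of the previous packet

    def conforms(self, packet_bytes: int, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False              # excess: drop, or remark

# Police at 1000 bytes/s with a 1500-byte bucket.
p = TokenBucketPolicer(rate=1000, depth=1500)
print(p.conforms(1500, now=0.0))  # True: bucket starts full
print(p.conforms(1500, now=0.5))  # False: only 500 tokens accrued
print(p.conforms(1500, now=2.0))  # True: bucket refilled
```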


III. MPLS

MPLS (Multiprotocol Label Switching) [10] has its origins in the IP over ATM effort. Nevertheless, it was soon realized that a label switching technology could be extended to other layer 2 technologies and complement IP on a global scale.

MPLS is considered by some as a layer 2.5 technology since it resides between IP and the underlying medium. It adds a small label to each packet, and the forwarding decision is made solely based on this label. To establish these labels a signaling protocol, like RSVP-TE [11], must be used. These signaling protocols allow the route that MPLS flows will take to be specified explicitly, bypassing normal routing tables (see Figure 5). There we can see Label Edge Routers (LERs) classifying traffic into a specific label and Label Switch Routers (LSRs) forwarding traffic according to these labels.
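Conceptually, forwarding in an MPLS core reduces to one exact-match lookup on the incoming label followed by a label swap. The sketch below illustrates this; all labels, prefixes and interface names are hypothetical.

```python
# A minimal model of MPLS forwarding state.
# LER: classify a packet into a Forwarding Equivalence Class and push a label.
# LSR: look up the incoming label, swap it, forward out the chosen interface.

fec_table = {"192.0.2.0/24": 17}          # LER: destination prefix -> label

lsr_table = {17: (42, "POS4/1"),          # LSR: in-label -> (out-label, out-if)
             23: (None, "GigE2/0")}       # None: pop the label (egress LER)

def ler_push(prefix: str) -> int:
    """Ingress classification: choose the label for a packet's FEC."""
    return fec_table[prefix]

def lsr_forward(label: int) -> tuple:
    """Core forwarding: one lookup, swap the label, pick the interface."""
    out_label, out_if = lsr_table[label]
    return out_label, out_if

label = ler_push("192.0.2.0/24")
print(lsr_forward(label))   # -> (42, 'POS4/1')
```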


MPLS has two major uses: Traffic Engineering and VPNs (Virtual Private Networks). In this work we were mainly concerned with the former. The ability to switch based on a label, as opposed to traditional IP forwarding based solely on the IP destination address, allows us to manage available bandwidth in a more efficient way. Since GRID applications have traffic flows orders of magnitude bigger than traditional applications, but will represent a small percentage of the flows in the network, it becomes cost efficient to select dedicated paths for their flows. This way we can be sure that the network complies with the QoS constraints and that the bandwidth available in the network is used in a more efficient way.


Figure 5: MPLS Example
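Selecting a dedicated path for a large Grid flow amounts to a constrained path computation: prune links without enough spare capacity, then route over what remains; the resulting hop list could feed an explicit route in a signaling protocol such as RSVP-TE. A sketch follows, with a hypothetical topology (loosely modeled on the testbed triangle; the capacities are invented, not measured).

```python
from collections import deque

def explicit_route(links, src, dst, demand_mbps):
    """Find an explicit route using only links with enough spare bandwidth.

    `links` maps (a, b) -> available Mbit/s; links are bidirectional here.
    Returns a minimum-hop list of nodes, or None if no feasible path exists.
    """
    graph = {}
    for (a, b), avail in links.items():
        if avail >= demand_mbps:          # prune constrained links
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)
    # Breadth-first search over the pruned topology.
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:       # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

links = {("RAL", "London"): 2500, ("RAL", "Manchester"): 300,
         ("Manchester", "London"): 2500}
print(explicit_route(links, "Manchester", "RAL", 400))
# -> ['Manchester', 'London', 'RAL']: the direct link lacks capacity
```

Production implementations use constrained shortest-path computations over link-state data rather than this toy pruning, but the principle is the same.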



Unfortunately, at the time of writing the MPLS implementations we used did not allow MPLS traffic to be isolated in case of congestion. Although the signaling protocol allows specifying a bandwidth for the flow, this will not be a guaranteed bandwidth in the case of congested link(s). To enforce this bandwidth guarantee, traffic needs to be policed at the edge of the network.


In MB-NG we executed tests to verify whether MPLS could be used to reserve bandwidth for a given TCP flow. In Figure 6 we can see the result of reserving an MPLS tunnel for a given TCP connection using explicit routes. Not only could we optimize the network bandwidth, but we could also guarantee with very good precision a 400 Mbit/s connection to our application (in this case just simulated traffic with iperf). The typical TCP saw-tooth behaviour did not prevent us from having a very stable TCP connection in a congested network.



[Plot: time vs. throughput for a single TCP connection over the MPLS tunnel and for background traffic]

Figure 6: MPLS for Flow Reservation


IV. SERVICE LEVEL AGREEMENTS


An important issue in the implementation of an end to end Differentiated Services enabled network is the definition of Service Level Agreements (SLAs) between administrative domains. Because individual flows are only inspected at the edge of the network, strong, enforceable agreements about the traffic aggregates need to be made at each border of every pair of domains, so that all the guarantees made by all the providers can be met.


One of the goals of MB-NG, as a leading multi-domain DiffServ experimental network, is to provide guidelines for the definition and implementation of Service Level Agreements.


As our first definition, we are trying to standardize the definition of IP Premium (or EF, Expedited Forwarding [12], in the DiffServ literature). This is to be used by applications that require tight delay bounds. The SLA is divided in two parts: an administrative part and a Service Level Specification (SLS) part. The SLS contains information about:




• Scope - defines the topological region to which the IP Premium service will be provided.

• Flow Description - indicates for which IP packets the QoS guarantees of the SLS will be applied.

• Performance Guarantees - depicts the guarantees that the network offers to the customer for the packet stream described by the flow descriptor over the topological extent given by the scope value. The suggested performance parameters for IP Premium are:

  o One-way delay
  o Inter-packet delay variation
  o One-way packet loss
  o Capacity
  o Maximum Transfer Unit

• Traffic Envelope and Traffic Conformance

• Excess treatment

• Service Schedule

• Reliability

• User visible SLS metrics


This SLA template is inspired by the one DANTE defined in [13] and can be read in more detail in [14].
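To make the SLS fields concrete, they can be carried as a structured record. The sketch below is illustrative only: the field names and the values are hypothetical and do not reproduce the actual templates of [13] or [14].

```python
from dataclasses import dataclass, field

@dataclass
class PerformanceGuarantees:
    one_way_delay_ms: float
    delay_variation_ms: float      # inter-packet delay variation
    one_way_loss: float            # loss probability
    capacity_mbps: float
    mtu_bytes: int

@dataclass
class ServiceLevelSpec:
    scope: tuple                   # (ingress, egress) topological extent
    flow_description: dict         # which packets the guarantees apply to
    guarantees: PerformanceGuarantees
    traffic_envelope: dict         # e.g. token bucket parameters
    excess_treatment: str          # "drop" or "remark"
    service_schedule: str
    reliability: str
    user_visible_metrics: list = field(default_factory=list)

# A hypothetical IP Premium SLS between two edge domains.
sls = ServiceLevelSpec(
    scope=("UCL-edge", "RAL-edge"),
    flow_description={"dscp": 46, "dst_net": "192.0.2.0/24"},
    guarantees=PerformanceGuarantees(20.0, 2.0, 1e-5, 400.0, 9000),
    traffic_envelope={"rate_mbps": 400, "burst_bytes": 150000},
    excess_treatment="drop",
    service_schedule="24x7",
    reliability="99.9% availability",
    user_visible_metrics=["one_way_delay", "loss"],
)
```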



V. MIDDLEWARE FOR THE CONTROL PLANE


The final piece for providing the potential of
QoS enabled networks to the applications is a
usable and efficient control plane. Even when
the network is configured to support
Differentiated Services and/or MPLS it is
unreasonable to assume human intervention for
every flow request. There has to be a way for
Applications to request resources from the
network. In the Integrated Services
Archit
ecture applications would use a signaling
protocol like RSVP [15] to allocated resources.
In a DiffServ network resources are not
allocated in the entire path for a specific flow.
The literature [16] describes two mechanisms
to achieve this: in the first R
SVP is used and
DiffServ clouds are seen as single hops. In the
second a Bandwidth Broker [8] per domain is
used. The application “contacts” the Bandwidth
Broker which is responsible to check,
guarantee and possible reserve resources.
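The core of such a broker is admission control over (possibly advance) reservations: a request is admitted only if, at every instant it overlaps existing bookings, the committed bandwidth plus the request stays within the capacity set aside for the class. A sketch (all numbers hypothetical):

```python
def admit(existing, start, end, bw, capacity):
    """Admission check for an advance bandwidth reservation.

    `existing` is a list of (start, end, bw) tuples already booked.
    The request is admitted only if, at every point in [start, end),
    committed bandwidth plus the request stays within `capacity`.
    """
    # Committed load is piecewise constant, changing only at booking
    # boundaries, so it suffices to check the request start and every
    # booking start that falls inside the requested interval.
    points = {start} | {s for s, e, _ in existing if start <= s < end}
    for t in points:
        load = sum(b for s, e, b in existing if s <= t < e)
        if load + bw > capacity:
            return False
    return True

booked = [(0, 10, 300), (5, 15, 300)]
print(admit(booked, 8, 12, 300, capacity=1000))  # True: peak load 600 + 300
print(admit(booked, 8, 12, 500, capacity=1000))  # False: 600 + 500 > 1000
```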


We have instigated a project to develop such a solution, which we call Grid Resource Scheduling (GRS), following the Bandwidth Broker model. The initial design was detailed in [S. N. Bhatti, S.-A. Sørensen, P. Clarke, J. Crowcroft, “Network QoS for Grid Systems”, International Journal of High Performance Computing Applications, vol. 17, no. 3, August 2003]. Our goal is to enable Grid users (or applications acting on their behalf) to micro-manage network capacity allocations at the edge of the network.

Our Bandwidth Broker is called the Network Resource Scheduling Entity (NRSE), and we currently have one per domain. Client systems in a domain send reservation requests to their local NRSE. The local domain will have agreements with other domains to carry DiffServ EF traffic. The NRSE knows how much DiffServ bandwidth is available, given existing reservations, and checks whether the new reservation can be accommodated. If it can, it contacts the NRSE in the remote domain, where the same check is made before the reservation is admitted. When the reservation is due to begin, the NRSE instructs the gateway router to begin marking packets belonging to the specified flow as DiffServ EF.

Reservations for non-realtime traffic only may be modified by the NRSE. For example, local policy might dictate that large file transfers happen overnight. If it is not possible to schedule a continuous block of bandwidth, the NRSE may choose to split a non-realtime reservation into multiple smaller reservations.

The GRS protocol uses human readable XML for signaling between clients and NRSEs (a in the diagram), as well as between NRSE peers (b). SLA requests contain token bucket parameters, filter specifications, authentication credentials, start times, etc. As well as booking reservations, the protocol supports administrative functions (e.g. querying the NRSE's scheduling table) and signaling operations (e.g. signaling to a client that a reservation is about to end).

GRS scales well because reservation state is only stored at the edges of the network. Also, authentication is peer-to-peer. Users authenticate with their local NRSE, which in turn authenticates itself with a remote NRSE, so there is no need for a global database of users. Domain administrators are free to choose their own policies for authentication as well as reservation scheduling.

We have now implemented the NRSE for the Java platform and made an initial release with support for Linux routers. We used the BEEP application protocol framework [M. Rose, “The Blocks Extensible Exchange Protocol Core”, RFC 3080, March 2001] to carry our XML-based GRS protocol. Reservations are stored in a PostgreSQL database, providing long-term persistence for advance reservations. Authentication is done via PGP signatures, but this mechanism is extensible and we intend to add other types of authentication.

The interface between the NRSE and the gateway router(s) is modular (c in the diagram), and a module to support Cisco routers has now been written and is being tested on MB-NG.

We have a low-level client library which exposes the full functionality, as well as a higher level library designed to be easy to bolt on to existing applications. We have produced an FTP client which uses this library to make reservations automatically for its file transfers.

Java has performed acceptably on our current testbeds, but we have not yet investigated scaling GRS to a large number of domains. We intend to implement clients and servers for other platforms. We are also looking at the case where the bandwidth bottleneck is not in either of the edge domains.

GRS is not dependent on the Globus platform, although we are currently investigating the possibility of a GRS-OGSA gateway.


Figure 7: Bandwidth Broker Architecture
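A reservation request of the kind such a broker protocol might carry can be sketched in XML. The element and attribute names below are hypothetical illustrations, not the actual GRS wire format:

```python
import xml.etree.ElementTree as ET

# Build a hypothetical reservation request: token bucket parameters,
# a flow filter, and a start/end schedule, as an XML document.
req = ET.Element("reservation-request")
ET.SubElement(req, "token-bucket", rate_mbps="400", burst_kbytes="150")
ET.SubElement(req, "filter", src="192.0.2.10", dst="198.51.100.20",
              proto="tcp", dport="5001")
schedule = ET.SubElement(req, "schedule")
ET.SubElement(schedule, "start").text = "2003-08-01T02:00:00Z"
ET.SubElement(schedule, "end").text = "2003-08-01T06:00:00Z"

# Serialise for transmission to the broker.
print(ET.tostring(req, encoding="unicode"))
```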


VI. DEMONSTRATIONS


The concluding part of the work reported here was to execute demonstrations of high performance applications on our QoS testbed. These kinds of applications will drive future network research and are, therefore, a vital piece of our work. We worked with two applications: High Performance Visualisation and High Energy Particle Physics.



High Performance Computing (HPC) visualisation applications have quite different requirements from pure data transfer applications. Because they are frequently interactive, they have tight delay constraints in both directions of the communication. When these requirements are coupled with high transfer rates (between 500 Mbit/s and 1 Gbit/s), the necessity of new network paradigms becomes evident.


The RealityGRID project (see Figure 8) aims to grid-enable the realistic modeling and simulation of complex condensed matter structures at the meso- and nanoscale levels, as well as the discovery of new materials. The project also involves applications in bioinformatics, and its long-term ambition is to provide generic technology for grid based scientific, medical and commercial applications.


Our tests with RealityGRID consist of transferring visualization data from a remote high performance graphics server to a user's client interface. The Differentiated Services enabled network allows the application to run seamlessly across multiple domains under several degrees of network congestion.




Figure 8: Communications between simulation, visualisation and client in the RealityGRID project




The second application area on which we focused our tests is High Energy Particle Physics. By the nature of its large international collaborations and data-intensive experiments, particle physics has long been in the vanguard of computer networking for scientific research. The need for particle physics to engage in the development of high-performance networks for research is becoming stronger in the latest generation of experiments. These experiments are producing, or will produce, so much data that the traditional model of data production and analysis, centred on the laboratories at which the experiments are located, is no longer viable, and the exploitation of the experiments can only be performed through the creation of large distributed computing facilities. These facilities need to exchange very large volumes of data in a controlled production schedule, often with low real-time requirements on such characteristics as delay or jitter, but with very high aggregate throughputs.


The requirements of HEP in data transport and management are one of the high profile motivations for “hybrid service networks”. The next generation of collider experiments will produce vast datasets measured in Petabytes that can only be processed by globally distributed computing resources (see Figure 9). High-bandwidth data transport between federated processing centres is therefore an essential component of the reconstruction and analysis of events recorded by HEP experiments.


Our tests in an HEP environment tend to involve bulk data transfer, where the delay requirements are not as tight. The use of LBE (Less than Best-effort) is appropriate for this scenario, since we can use spare capacity when available without affecting the rest of the traffic when the network is congested or near congestion.


Figure 9: HEP Data Collection

VII. CONCLUSIONS AND FURTHER WORK


Experimental work in network testbeds plays a crucial part in network research. Many problems that go undetected by analytical and simulation work are found in their early stages by practical experiments. Testbeds also provide good feedback for new topics of theoretical research.


In our experiments we concluded that quality of service networks will play an important role in future GRIDs and in IP networks in general. Bandwidth over-provisioning of the core network does not solve all the problems and will be impossible to guarantee end to end.


Both Differentiated Services and MPLS allow for the creation of valuable services for the scientific community with no major extra administration effort.


We successfully demonstrated high performance applications in a multi-domain QoS network, showing major qualitative improvements in the quality perceived by the final users.


In current work we are trying to integrate GARA middleware [17] into the OGSA [20] architecture and researching how we can scale the Bandwidth Broker architecture to several domains. Work is also being done on the integration of AAA (Authentication, Authorization and Accounting) mechanisms into the GARA framework, to solve crucial security problems arising in the GRID community.


Our QoS tests are being extended to research the behaviour of new proposals for TCP in a Differentiated Services enabled network. This will enable applications that require reliable transfers to make more efficient use of a QoS network.



VIII. REFERENCES

[1] Best-Effort versus Reservations: A Simple Comparative Analysis, Lee Breslau and Scott Shenker, in Proceedings of SIGCOMM 98, September 1998.
[2] Difficulties in simulating the Internet, Sally Floyd and Vern Paxson, IEEE/ACM Transactions on Networking, volume 9, number 4, 2001.
[3] Internet Research needs better models, Sally Floyd and Eddie Kohler, in Proceedings of HOTNETS-1, October 2002.
[4] http://www.mb-ng.net.org
[5] http://www.datatag.org
[6] Essentials of ATM Networks and Services, Oliver Ibe, Addison Wesley, 1997.
[7] RFC 1633 - Integrated Services in the Internet Architecture: an Overview, R. Braden, D. Clark, S. Shenker, June 1994.
[8] RFC 2475 - An Architecture for Differentiated Services, S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, W. Weiss, December 1998.
[9] http://dast.nlanr.net/Projects/Iperf
[10] MPLS: Technology and Applications, Bruce S. Davie and Yakov Rekhter, Morgan Kaufmann Series on Networking, May 2000.
[11] RSVP Signaling Extensions for MPLS Traffic Engineering, White Paper, Juniper Networks, May 2001.
[12] RFC 2598 - An Expedited Forwarding PHB, V. Jacobson, K. Nichols, K. Poduri, June 1999.
[13] SLA definition for the provision of an EF-based service, Christos Bouras, Mauro Campanella and Afrodite Sevasti, Technical Report.
[14] SLA definition - MB-NG Technical Report (work in progress).
[15] RSVP - A New Resource Reservation Protocol, Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zappala, IEEE Network, volume 5, number 5, September 1993.
[16] Internet Quality of Service: Architectures and Mechanisms, Zheng Wang, March 2001.
[17] A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation, Ian Foster, A. Roy, V. Sander, in Proceedings of the 8th International Workshop on Quality of Service (IWQoS 2000), June 2000.
[18] Decentralised QoS Reservations for protected network capacity, S. N. Bhatti, S. A. Sorenson, P. Clarke and J. Crowcroft, in Proceedings of TERENA Networking Conference 2003, 19-22 May 2003.
[19] http://www.globus.org
[20] The Physiology of the GRID: An Open Grid Services Architecture for Distributed Systems Integration, Ian Foster, Carl Kesselman, Jeffrey M. Nick and Steven Tuecke.



APPENDIX A: CONFIGURATION EXAMPLE

The following shows an example of the configuration of DiffServ in Cisco IOS. As can be seen, the amount of configuration needed in each router is minimal.


class-map match-any EF
  match ip dscp 46
class-map match-any BE
  match ip dscp 0
class-map match-any LBE
  match ip dscp 8
!
!
policy-map UCL_policy
  class BE
    bandwidth percent 88
  class LBE
    bandwidth percent 1
  class EF
    priority percent 10
!
interface POS4/1
  service-policy output UCL_policy
































