

Control and Provisioning of Ultra-High Speed Networks for Large Science Applications





Principal Investigator: Biswanath Mukherjee
Co-Principal Investigators: Dipak Ghosal and Xin Liu
Department of Computer Science
University of California
Davis, CA 95616
E-mail: mukherjee@cs.ucdavis.edu
Phone: +1-530-752-4826; +1-530-752-7004
Fax: +1-530-752-4767




Submitted to:
Office of Science
Notice DE-FG01-04ER04-03
High-Performance Network Research: Scientific Discovery through Advanced Computing (SciDAC) and Mathematical, Informational, and Computational Sciences (MICS)


Program Manager:
Dr. Thomas D. Ndousse
Mathematical, Informational, and Computational Sciences Division
Germantown Bldg/SC-31
Office of Science
U.S. Department of Energy
1000 Independence Avenue, SW
Washington, DC 20858-1290
Email: tndousse@sc.doe.gov
Phone: +1-301-903-9960, Fax: +1-301-903-7774



Table of Contents

1. DOE Large Science Applications
2. Current State of the Networks
3. Research Plan
3.1 Traffic Grooming and Bandwidth Provisioning
3.2 Survivability and Fault-tolerant Network Provisioning
3.3 Distributed Space Time Scheduling
3.4 Low Latency and Fault-tolerant Signaling Plane
3.5 Scalable and Fault-tolerant Network Primitives and Intelligent Services
4. Collaboration and Applications
5. Statement of Work
6. References
7. Appendix A: Budget
8. Appendix B: Biographical Information



1. DOE Large Science Applications


Next-generation supercomputers hold enormous promise for meeting the demands of large-scale scientific computations in fields as diverse as earth science, climate modeling, astrophysics, fusion energy science, molecular dynamics, nanoscale materials science, and genomics (see Table 1 [DOE03] for a list of such applications and their characteristics). Among the DOE-sponsored large-science applications, a specific example is the Genomes To Life (GTL) Program. The goal of GTL is to use DNA sequences of microbes and higher organisms, including humans, as starting points to systematically answer questions related to the fundamental underlying processes of living systems. Towards this end, the key goals of the GTL program [GTL03] are to (1) identify the protein machines that carry out critical life functions, (2) characterize the gene regulatory networks that control these machines, (3) explore the functional repertoire of complex microbial communities in their natural environments to provide a foundation for understanding and using their remarkably diverse capabilities to address DOE missions, and (4) develop the computational capability to integrate and understand these data and begin to model complex biological systems.


Table 1 Characteristics of various large-scale science applications [DOE03].

Science Areas | Current End2End Throughput | 5 Years End2End Throughput | 5-10 Years End2End Throughput | General Remarks
High Energy Physics | 0.5 Gbps E2E | 100 Gbps E2E | 1.0 Tbps | high throughput
Climate Data & Computations | 0.5 Gbps E2E | 160-200 Gbps | n Tbps | high throughput
SNS NanoScience | does not exist | 1.0 Gbps steady state | Tbps & control channels | remote control & high throughput
Fusion Energy | 500MB/min (burst) | 500MB/20sec (burst) | n Tbps | time critical transport
Astrophysics | 1TB/week | N*N multicast | 1TB+ & stable streams | computational steering & collaborations
Genomics Data & Computations | 1TB/day | 100s users | Tbps & control channels | high throughput & steering


Consider a scenario in which a user participating in a genomics research project runs software such as mpiBLAST [DCF03], which, at different stages of the computation, needs to make repeated accesses to bio-databases scattered all over the country. These databases are very large, typically several gigabytes in size, and are growing faster than Moore's Law [DOE03], i.e., doubling every 12 months rather than every 18 months. Research groups can currently install local copies of these databases to save data-transfer time. However, as the amount of bio-data increases exponentially due to advances in analytical technologies for biology, it will soon become unrealistic to keep local copies of all bio-databases in every biology lab. As a result, large chunks of data will have to be downloaded from a number of different bio-databases from time to time during the computation. In addition, future applications may also require distributed collaborative visualization, remote computational steering, and remote instrument control [DOE03]. This poses important and challenging networking research and development issues, as highlighted in the workshop report [DOE03]:


An ultra high-performance network with powerful and flexible provisioning and transport modalities is needed to meet the demand of the DOE large-scale science application.


Dynamic provisioning of ultra high-speed networks is identified as a critical area of research for the networking technologies of DOE large-scale science projects. In this proposal, we focus on the control and provisioning of ultra high-speed networks. Specifically, we will address the following challenges:


1. Traffic Grooming and Bandwidth Provisioning: Fiber-optic technology is the dominant choice for building and operating long-haul backbone networks because of fiber's enormous bandwidth capacity. A single strand of fiber can support 160 wavelength channels, each operating at 10 Gbps, using today's commercial off-the-shelf components (extendable to 320 channels and 40 Gbps/channel in the foreseeable future). However, not all network nodes or interfaces (e.g., bio-databases, supercomputer interfaces, or other DOE large-science applications) may need such a large capacity. Therefore, how to efficiently provision high-capacity pipes of diverse bandwidth granularities (perhaps ranging from several wavelength channels to a single wavelength to sub-wavelength channel capacity) between network nodes and interfaces is a very important problem, known as the traffic-grooming problem. Concisely, traffic grooming refers to the mechanisms for intelligently aggregating/de-aggregating and switching lower-speed traffic streams between higher-capacity trunks (such as wavelength channels). Based on our preliminary work, we plan to further develop grooming strategies for large-science applications.


2. Fault-tolerant Network Provisioning: Reliability and fast restoration are highly desirable features of a network designed to support DOE large-science applications. Given the huge capacity of a fiber and the fact that network failures (particularly fiber cuts) occur more often than we would wish, it is imperative that excellent protection and restoration schemes be designed into the next-generation UltraScienceNet. Given the diversity of bandwidth granularities of the various high-capacity pipes, there are several additional research challenges. Should each bandwidth pipe be protected separately, or should protection be set up at the wavelength-channel level? Should the spare capacity be set up for "dedicated protection" of a connection or for "shared protection", which can be pooled for different connections? Should the protection/recovery be performed on a per-link basis, on a connection's end-to-end basis, or on the basis of a "sub-path" of a connection? Should the recovery paths be pre-computed (and periodically recomputed based on the current network state for efficiency), or should they be dynamically discovered after a failure occurs? These methods will have different performance tradeoffs in reliability, restorability, restoration time, etc. It is envisioned that, perhaps, all of these approaches may need to co-exist in the same network because different applications may have different requirements on fault tolerance. We propose to investigate the applicability of the above fault-management approaches to bandwidth provisioning for the GTL application as well as other DOE large-scale science applications.


3. Space-Time Scheduling of Large Data Transfers: Aggregating large data files from distributed databases and/or terascale computing facilities will be a common task in many large science applications. This requires intelligent distributed space-time scheduling of large data transfers. In this project, we consider the problem of aggregating large data files from distributed databases and address the corresponding challenges from a network architecture perspective. The objective is to minimize the total time delay for data aggregation. The need to determine both the path (space) and the time makes the problem difficult and differentiates it from the machine-scheduling problems that have been reported in the literature [Pin02]. We formulated the problem as a Time-Path Scheduling Problem (TPSP). We showed that TPSP is NP-complete and developed heuristic algorithms for efficient scheduling. In this project, we will extend our previous work and further develop scheduling strategies suitable for large-science applications.


4. Low-delay and Fault-tolerant Signaling and Control Plane Architecture: The scheduled transfer of large data sets will require a signaling and control plane architecture that can be used to set up the schedule as well as manage and control the network resources [BBM03]. A key aspect of such an architecture will be to minimize the end-to-end delay of the signaling and control messages. In the context of genomics applications, low-delay requirements also arise in sending control messages to supercomputers. The prediction or modeling tasks for biological sciences, such as the simulation of the dynamics of bio-molecules over large time-scales [DOE03], will be carried out on supercomputers at different physical sites and will require data to be transferred from bio-databases to supercomputers and various inputs and control messages from the user running the simulations to be processed. To provide the computational power needed for these long time-scale simulations, tasks must be very tightly coordinated to ensure the effective utilization of the supercomputers. Thus, it is important that end-to-end message delays be minimized over the networks to ensure that the supercomputers do not idle waiting for control messages. We will explore an in-fiber, out-of-band signaling architecture and investigate the use of redundant signaling paths to meet the low-delay and fault-tolerance requirements.


5. Scalable Network Primitives and Services: High-performance networking consists not only of the infrastructure but also of various protocols and services. New transport-layer protocols will be required that can transport large amounts of data efficiently and with low latency. Algorithms must be developed to mitigate receiver-side bottlenecks that may arise when large amounts of data from a number of different databases are aggregated at a client. Network primitives such as application-layer multicasting [Jan00], caching [Ora01], intelligent data replication [PBB01], data bundling based on access patterns [KoG99], and sharing of partial computation among experts will be required to extend the capabilities of the UltraNet.



The remainder of the proposal is organized as follows. Section 2 outlines the current state of the networks for large-scale science applications; in particular, we discuss ESnet and the goals of the newly proposed UltraNet. Section 3 gives details of the research plan. Section 4 outlines collaborations and applications, and Section 5 enumerates the statement of work. The references, the budget, and the biographical information of the PIs are provided in Sections 6 through 8.



2. Current State of the Networks


The Energy Sciences Network, or ESnet (shown in Figure 1), is a high-speed network serving thousands of DOE scientists and collaborators worldwide. A pioneer in providing high-bandwidth, reliable connections, ESnet enables researchers at national laboratories, universities, and other institutions to communicate with each other using the collaborative capabilities needed to address some of the world's most important scientific challenges. The newer challenges of DOE large-scale science applications require capabilities that far transcend those of ESnet's production network. Consequently, the next-generation network demands are simply beyond the capabilities of ESnet, both in terms of the required bandwidths and the sophistication of the capabilities. First, there is no provision in ESnet for testing Gbps dedicated cross-country connections with dynamic switching capability. Second, during the technology development process, it is quite possible for various components of the network to be unavailable for production operations; such situations would cause undue disruptions to normal ESnet activities.



Figure 1 The ESnet backbone network.

A number of proposals have recently been funded to extend ESnet. The science UltraNet [RWD03] is one important effort with which this proposed research will be closely aligned. The key goal of UltraNet is to eliminate the ever-widening performance gap between link speeds and application throughputs. While optical technologies promise lambda-switched links at Tbps rates, they do not provide the provisioning and transport technologies needed to deliver this performance to the application layer. Legacy protocols, including the most widely deployed transport protocol, namely the Transmission Control Protocol (TCP), and other network components (that are optimized for low network speeds) cannot easily scale to the unprecedented optical link bandwidths. UltraNet is exploring innovative, scalable architectural options that use a minimum number of layers and make wavelengths available directly to the applications [RWD03].


UltraNet will provide a rich environment in which to explore high-performance transport protocols that achieve throughputs on the order of the available capacity in the optical core networks. TCP was designed and optimized for low-speed data transfers over congested IP-based networks. However, its effectiveness in ultra high-speed networks based on the emerging all-optical networks is being seriously questioned, especially for the transfer of petabytes of data over intercontinental distances [PFD03,STP03,Flo01,SCTP]. Another key issue to be addressed by UltraNet is traffic engineering. While MPLS has recently been extended to IP-based DWDM networks to take advantage of the optical bandwidths to address the congestion problem in the IP layer, the required advanced traffic engineering methods have unfortunately not been widely deployed in operational networks because they involve complex inter-domain signaling and costing. UltraNet will provide an excellent environment in which to prototype the needed practical traffic engineering methods within the context of DOE networking environments.



Clearly, the goal of UltraNet is to develop the infrastructure and networking technologies required to support the needs of DOE large-scale science applications. The purpose of this proposed research is to extend the capabilities of UltraNet by equipping it with scalable and fault-tolerant network services and primitives that will allow rapid deployment of large science applications.


3. Research Plan


In this project, we focus on the control and provisioning of ultra-high speed networks for large-scale science applications. We expect our project to complement and extend the research of UltraNet. Our research proposal includes (a) traffic grooming and bandwidth provisioning, (b) survivability and fault-tolerant network provisioning, (c) distributed space-time scheduling of large data transfers over ultra-high speed networks, (d) a low-delay and fault-tolerant signaling and control plane architecture, and (e) scalable network primitives and services. Figure 2 shows the roadmap of the proposed research and its potential impact on UltraNet and ESnet.





Figure 2 Roadmap of the proposed research.


3.1 Traffic Grooming and Bandwidth Provisioning


We envision that large-science applications and the next-generation communication infrastructure will employ high-bandwidth optical networks as the dominant backbone technology. Optical networks based on wavelength-division multiplexing (WDM) technology can satisfy the bandwidth requirements of large-science applications and the future Internet infrastructure by scaling up existing capability (particularly bandwidth) by 2 or 3 orders of magnitude. Under WDM, the optical transmission spectrum is carved up into a number of non-overlapping wavelength (or frequency) bands, with each wavelength supporting a single communication channel operating at whatever rate one desires, e.g., peak electronic speed. By allowing users to transmit simultaneously on different WDM channels, the huge opto-electronic bandwidth mismatch problem is solved and the aggregate traffic carried by the network is increased.


Point-to-point WDM transmission technology is quite mature today, while the corresponding switching technologies (optical crossconnects (OXCs)) are still maturing. But bandwidth is precious, especially for large-science applications. Once WDM transmission technology is deployed on the network backbone, efficiently utilizing the huge bandwidth at our disposal is of paramount importance.


While a single fiber strand has over a terabit-per-second bandwidth and a wavelength channel has over a gigabit-per-second transmission speed, the network may still be required to support traffic connections at rates that are lower than the full wavelength capacity. The capacity requirement of these low-rate traffic connections can range from STS-1 (51.84 Mbps) or lower up to the full wavelength capacity. In order to save network cost and to improve network performance, it is very important for the network operator to be able to mux/demux multiple low-speed connections onto/from high-capacity circuit pipes and to intelligently switch them at intermediate nodes. This is referred to as the traffic-grooming problem [ZhM02,ClG02,Gro99,ToN94,ZhS00,FTU02,OZM02].


For traffic grooming, a node should switch traffic at wavelength granularity as well as at finer granularity. Figure 3 shows the logical view of a simplified grooming-node architecture. (In this figure, the Mux/Demux blocks form the transmission system, while the other blocks form the switching system.) This hierarchical grooming node consists of a wavelength-switch fabric (W-Fabric) and a grooming fabric (G-Fabric). The W-Fabric performs wavelength routing; the G-Fabric performs multiplexing, demultiplexing, and switching of low-speed connections. A portion of the incoming wavelengths to the W-Fabric can be dropped to the G-Fabric through the grooming-drop ports for sub-wavelength-granularity switching. The groomed traffic can then be added to the W-Fabric through the grooming-add ports. The number of grooming ports determines the grooming capacity of a node.





Figure 3 Grooming-node architecture and the corresponding auxiliary graph.


We propose a generic graph model for traffic grooming [ZhM02]. This model uses an auxiliary graph to represent the different grooming-node architectures and the current network state, and takes into account various resource constraints, such as the number of free wavelengths on each fiber and the number of available grooming ports at each node. Figure 3 shows the grooming-node architecture (left) and its corresponding auxiliary graph (right). The W-Fabric is modeled as the wavelength layer, consisting of an input vertex and an output vertex (see footnote 1); the G-Fabric is modeled as the access layer, consisting of input vertex A_I and output vertex A_O; a grooming-add port is modeled by an edge from vertex A_O to the wavelength-layer output vertex; and a grooming-drop port is modeled by an edge from the wavelength-layer input vertex to vertex A_I. A unidirectional fiber is represented as an edge from the wavelength-layer output vertex at the source node of the link to the wavelength-layer input vertex at the destination node of the link. A lightpath layer, consisting of input vertex L_I and output vertex L_O, is added to model existing lightpaths sourced/sunk at a node. A lightpath is represented as an edge from vertex L_O at the source node to vertex L_I at the destination node. Every edge is associated with two attributes: one indicating the available capacity and the other indicating the cost of the resource that the edge represents.

Footnote 1. For clarity, we refer to a node and a link in the auxiliary graph as a vertex and an edge, respectively.


Given a connection request, by computing the shortest path from the access-layer output port (A_O) at the source node to the access-layer input port (A_I) at the destination node, we can determine how to set up lightpath(s) and how to route the connection onto these lightpath(s) and/or some existing lightpath(s).
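
The shortest-path step can be illustrated with a minimal sketch on a three-node chain. Everything specific here is an assumption made for illustration: the wavelength-layer vertices are labeled `W_I`/`W_O`, the edge costs are arbitrary, the zero-cost intra-node edges (bypass, grooming, mux, demux) are our own simplification, and the capacity attribute of each edge is omitted. The helper names `build_auxiliary_graph` and `shortest_path` are hypothetical.

```python
import heapq

def add_edge(edges, u, v, cost):
    edges.setdefault(u, []).append((v, cost))

def build_auxiliary_graph(nodes, fibers, lightpaths):
    """Return {vertex: [(neighbor, cost), ...]}; a vertex is (node, port)."""
    edges = {}
    for n in nodes:
        add_edge(edges, (n, "A_O"), (n, "W_O"), 1.0)  # grooming-add port
        add_edge(edges, (n, "W_I"), (n, "A_I"), 1.0)  # grooming-drop port
        # Assumed intra-node edges, not spelled out in the text:
        add_edge(edges, (n, "W_I"), (n, "W_O"), 0.0)  # optical bypass in W-Fabric
        add_edge(edges, (n, "A_I"), (n, "A_O"), 0.0)  # switching in G-Fabric
        add_edge(edges, (n, "A_O"), (n, "L_O"), 0.0)  # mux onto existing lightpath
        add_edge(edges, (n, "L_I"), (n, "A_I"), 0.0)  # demux from existing lightpath
    for u, v, cost in fibers:
        add_edge(edges, (u, "W_O"), (v, "W_I"), cost)  # free wavelength on a fiber
    for u, v, cost in lightpaths:
        add_edge(edges, (u, "L_O"), (v, "L_I"), cost)  # existing lightpath
    return edges

def shortest_path(edges, src, dst):
    """Plain Dijkstra; returns (vertex list, total cost) or (None, inf)."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in edges.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

# Chain 0 - 1 - 2 with one existing lightpath 0 -> 1 that has spare capacity.
graph = build_auxiliary_graph(
    nodes=[0, 1, 2],
    fibers=[(0, 1, 10.0), (1, 2, 10.0)],
    lightpaths=[(0, 1, 1.0)],
)
path, cost = shortest_path(graph, (0, "A_O"), (2, "A_I"))
for vertex in path:
    print(vertex)
print("total cost:", cost)
```

With these illustrative costs, the cheapest route reuses the existing lightpath from node 0 to node 1, passes through the access layer at node 1, and then traverses a fiber edge from node 1 to node 2, which corresponds to setting up one new lightpath on that hop.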


Given a traffic demand T(s, d, g, m), we need to determine how to route the traffic under the current network state. In general, for a traffic demand T(s, d, g, m) in a network, there are four possible operations that can be used to carry the traffic without altering the existing lightpaths.



Operation 1: Route the traffic onto an existing lightpath directly connecting the source s and the destination d.

Operation 2: Route the traffic through multiple existing lightpaths.

Operation 3: Set up a new lightpath directly between the source s and the destination d and route the traffic onto this lightpath. Using this operation, we set up only one lightpath if the amount of traffic is less than the capacity of the lightpath.

Operation 4: Set up one or more lightpaths that do not directly connect the source s and the destination d, and route the traffic onto these lightpaths and/or some existing lightpaths. Using this operation, we need to set up at least one lightpath. However, since some existing lightpaths may be utilized, the number of wavelength-links used to set up the new lightpaths is probably less than the number of wavelength-links needed to set up a lightpath directly connecting the source s and the destination d.


Different orderings of the possible operations form different grooming policies [ZhM02]. A grooming policy determines how to carry the traffic in a given situation and reflects the intentions of the network operator. In this project, we plan to compare the properties of various grooming policies and develop policies based on the characteristics of GTL and other large-science applications.
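
The idea that a policy is simply an ordering of the four operations can be sketched as follows. This is a toy illustration, not one of the policies from [ZhM02]: the policy names `reuse_first` and `direct_first` and the function `choose_operation` are hypothetical, and the set of feasible operations is supplied by hand rather than derived from a network state.

```python
def choose_operation(policy, feasible_ops):
    """Return the first operation in `policy` that is currently feasible."""
    for op in policy:
        if op in feasible_ops:
            return op
    return None  # no operation can carry the demand: the request is blocked

# Hypothetical policies: each one is just an ordering of Operations 1-4.
POLICIES = {
    "reuse_first": (1, 2, 4, 3),   # prefer existing lightpaths over new ones
    "direct_first": (1, 3, 2, 4),  # prefer a single direct lightpath
}

# Assume the current state allows multi-hop routing over existing
# lightpaths (Operation 2) or a new direct lightpath (Operation 3).
feasible = {2, 3}
choices = {name: choose_operation(order, feasible)
           for name, order in POLICIES.items()}
print(choices)
```

For the same network state, the two orderings select different operations, which is how the ordering encodes the operator's intent.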




3.2 Survivability and Fault-tolerant Network Provisioning


Reliability and fast restoration are highly desirable features of a network designed to support DOE large-science applications. However, network failures occur more often than we would wish. Table 2 shows some typical data on failure rates and failure-repair times of network components (transmitter, receiver, fiber link (cable), etc.) according to Bellcore (now Telcordia). In Table 2, FIT (failure-in-time) denotes the average number of failures in 10^9 hours, Tx denotes optical transmitters, Rx denotes optical receivers, and MTTR denotes mean time to repair. Although the problem of how connection availability is affected by network failures is currently attracting a lot of interest [HoM02,ACQ02,RaM02,WSM02,WSM02FOEC], we still lack a systematic methodology to quantitatively estimate a connection's availability, especially when protection schemes are used. It is imperative that excellent protection and restoration schemes be designed into the next-generation UltraScienceNet. The reliability requirements of these applications may not be identical because of their diverse service characteristics. In commercial networks, availability requirements are specified using a Service Level Agreement (SLA), which is a contract between the network operator and a customer. Usually, service reliability is represented by connection availability, which is defined as the probability that the connection will be found in the operating state at a random time in the future. Table 3 shows some typical SLAs. Connection availability can be computed statistically based on the failure frequency and failure repair rate, reflecting the percentage of time a connection is "alive" or "up" during its entire service period.


Table 2: Failure rates and repair times (Bellcore).

Metric | Bellcore Statistics
Equipment MTTR | 2 hrs
Cable-Cut MTTR | 12 hrs
Cable-Cut Rate | 4.39/yr/1000 miles
Tx failure rate | 10867 FIT
Rx failure rate | 4311 FIT
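
As a rough illustration of how the figures in Table 2 translate into connection availability, the sketch below uses the standard steady-state relation availability = MTTF / (MTTF + MTTR). The scenario is an assumption of ours, not taken from the proposal: an unprotected connection over a single 1000-mile fiber link with one transmitter and one receiver, independent failures, and all three components required to be up (series model). The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0

def availability(mttf_hours, mttr_hours):
    """Steady-state availability of a repairable component."""
    return mttf_hours / (mttf_hours + mttr_hours)

def mttf_from_fit(fit):
    """FIT is the number of failures per 10^9 hours."""
    return 1e9 / fit

def mttf_from_cut_rate(cuts_per_year_per_1000_miles, length_miles):
    cuts_per_year = cuts_per_year_per_1000_miles * length_miles / 1000.0
    return HOURS_PER_YEAR / cuts_per_year

# Values from Table 2; the 1000-mile link length is an assumed example.
a_tx = availability(mttf_from_fit(10867), mttr_hours=2.0)
a_rx = availability(mttf_from_fit(4311), mttr_hours=2.0)
a_cable = availability(mttf_from_cut_rate(4.39, 1000.0), mttr_hours=12.0)

# Series model: the connection is up only if every component is up.
a_connection = a_tx * a_rx * a_cable
print(f"Tx {a_tx:.6f}  Rx {a_rx:.6f}  cable {a_cable:.6f}")
print(f"unprotected connection: {a_connection:.4f}")
```

Under these assumptions the cable term dominates: the unprotected connection comes out at roughly 99.4% availability, which gives a sense of why protection is needed to reach stricter availability targets.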


There are two types of fault-recovery mechanisms. If backup resources (routes and wavelengths) are pre-computed and reserved in advance, we call it a protection scheme. Otherwise, if another route and a free wavelength have to be discovered dynamically for each interrupted connection when a failure occurs, we call it a restoration scheme. Generally, dynamic restoration schemes are more efficient in utilizing network capacity because they do not allocate spare capacity in advance, and they provide resilience against different kinds of failures (including multiple failures). Protection schemes, on the other hand, have faster recovery times, and they can guarantee recovery from the service disruptions they are designed to protect against (a guarantee that restoration schemes cannot provide).



Table 3: Illustrative service classes.

Service Type | Availability | Down Time/Year
Basic | 99% | 87.6 hours
Premium | 99.5% | 43.8 hours
Silver | 99.9% | 8.76 hours
Gold | 99.99% | 52.56 mins
Platinum | 99.999% | 5.26 mins
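
The down-time column of Table 3 follows directly from the availability column, assuming a 365-day year (8760 hours). The short sketch below reproduces the arithmetic; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year

def downtime_hours_per_year(availability_percent):
    """Expected yearly down time for a given availability in percent."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR

service_classes = [
    ("Basic", 99.0),
    ("Premium", 99.5),
    ("Silver", 99.9),
    ("Gold", 99.99),
    ("Platinum", 99.999),
]
for name, percent in service_classes:
    hours = downtime_hours_per_year(percent)
    print(f"{name:9s} {percent:7.3f}%  {hours:8.4f} h  = {hours * 60:9.3f} min")
```

Each additional "nine" of availability cuts the allowed down time by a factor of ten, from 87.6 hours per year at 99% to about 5.26 minutes per year at 99.999%.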




Protection schemes can be classified as ring protection and mesh protection. Ring-protection schemes include Automatic Protection Switching (APS) and Self-Healing Rings (SHR). Both ring protection and mesh protection can be further divided into two groups: path protection and link protection. In path protection, the traffic is rerouted through a link-disjoint backup route (backup path) once a link failure occurs on its working path (primary path) (see footnote 2). In link protection, the traffic is rerouted only around the failed link. While path protection leads to efficient utilization of backup resources and lower end-to-end propagation delay for the recovered route, link protection provides faster protection-switching time. Recently, researchers have proposed the idea of sub-path protection in a mesh network, which divides a primary path into a sequence of segments and protects each segment separately. Compared with path protection, sub-path protection can achieve high scalability and fast recovery time for a modest sacrifice in resource utilization.

Footnote 2. Node failures can also be considered by calculating node-disjoint routes. However, one should also note that carrier-class optical crossconnects (OXCs) in network nodes must be 1+1 (master/slave) protected in hardware for both the OXC's switch fabric and its control unit. The OXC's port cards, however, do not have to be 1+1 protected since they take up the bulk of the space (perhaps over 80%) and cost of an OXC; also, a port-card failure can be handled as a link and/or wavelength-channel failure. Nevertheless, node failures are important to protect against in scenarios where an entire node (or a collection of nodes in a part of the network) may be taken down, possibly due to a natural disaster or a malicious attacker.


Link, sub-path, and path protection schemes can be dedicated or shared. In dedicated protection, there is no sharing between backup resources, while in shared protection, backup wavelengths can be shared on some links as long as their protected segments (links, sub-paths, paths) are mutually diverse. If shared protection is used, OXCs on backup paths cannot be configured until the failure occurs. Therefore, recovery time in shared protection is longer than in dedicated protection, but its resource utilization is better.
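
The capacity saving of shared protection can be seen in a small sketch. It assumes a single-link-failure model, one wavelength per connection, undirected links, and path protection; the topology and the function names are hypothetical. Under dedicated protection, every backup path reserves its own wavelength on a link; under shared protection, the link only needs enough backup wavelengths for the worst single failure among the primaries it protects.

```python
def links(path):
    """Undirected links traversed by a path given as a node sequence."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def backup_wavelengths(connections, link, shared):
    """Backup wavelengths to reserve on `link`.

    connections: list of (primary_path, backup_path) node sequences.
    """
    # Primaries whose backup path crosses this link.
    users = [primary for primary, backup in connections
             if link in links(backup)]
    if not shared:
        return len(users)  # one reserved wavelength per backup path
    # Shared: size for the worst single-link failure on the primaries.
    failures = set().union(*(links(p) for p in users))
    return max((sum(1 for p in users if f in links(p)) for f in failures),
               default=0)

# Two connections with link-disjoint primaries whose backups share link 3-4.
connections = [
    ([1, 2], [1, 3, 4, 2]),
    ([5, 6], [5, 3, 4, 6]),
]
link_34 = frozenset((3, 4))
dedicated = backup_wavelengths(connections, link_34, shared=False)
shared = backup_wavelengths(connections, link_34, shared=True)
print("dedicated:", dedicated, "shared:", shared)
```

Because the two primary paths have no link in common, a single link failure can disrupt at most one of them, so one backup wavelength on link 3-4 suffices when sharing is allowed, versus two under dedicated protection.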


Dynamic restoration can also be classified as link, sub-path, or path based, depending on the type of rerouting. In link restoration, the end nodes of the failed link dynamically discover a route around the link for each connection (or "live" wavelength) that traverses the link. In path restoration, when a link fails, the source and destination nodes of each connection that traverses the failed link are informed about the failure (possibly via messages from the nodes adjacent to the failed link). The source and destination nodes of each connection independently discover a backup route on an end-to-end basis. In sub-path restoration, when a link fails, the upstream node of the failed link detects the failure and discovers a backup route from itself to the corresponding destination node for each disrupted connection. Link restoration is the fastest and path restoration is the slowest among the above three schemes; sub-path restoration time lies in between. Figure 4 summarizes the classification of protection and restoration schemes.


Figure 4: Different protection and restoration schemes in WDM mesh networks.


In summary, given the diversity of bandwidth granularities of the various high-capacity pipes, there are several additional research challenges. Should each bandwidth pipe be protected separately, or should protection be set up at the wavelength-channel level? Should the spare capacity be set up for "dedicated protection" of a connection or for "shared protection", which can be pooled for different connections? Should the protection/recovery be performed on a per-link basis, on a connection's end-to-end basis, or on the basis of a "sub-path" of a connection? Should the recovery paths be pre-computed (and periodically recomputed based on the current network state for efficiency), or should they be dynamically discovered after a failure occurs? These methods will have different performance tradeoffs in reliability, restorability, restoration time, etc. It is envisioned that, perhaps, all of these approaches may need to co-exist in the same network because different applications may have different requirements on fault tolerance. We propose to investigate the applicability of the above fault-management approaches to bandwidth provisioning for the GTL application as well as other DOE large-science applications.



3.3 Distributed Space Time Scheduling


To support various system-level tasks of the large-science applications, it will be necessary to form interdisciplinary centers consisting of experts located at various academic, government, and industrial research labs in different geographical locations. The data related to different aspects of the system will be collected, processed, analyzed, and stored at different locations. This data must be cooperatively accessed and analyzed by teams of experts. It will be cost-effective to support such efforts over ultra-high-speed networks. This requires intelligent distributed space-time scheduling of large data transfers. In this project, we consider the problem of aggregating large data files from distributed databases and address the corresponding challenges involved from a network architecture perspective. We believe that an Optical Burst-Switched (OBS) network is a suitable candidate for this application.

The problem is modeled as one of identifying a time-path schedule (TPS) in a graph representation of the network, as described in the following. The TPS problem (TPSP) is proven to be NP-complete [BSZ04]. Thus, we propose a Mixed Integer Linear Programming (MILP)-based approach and three heuristics to solve TPSP.


We first formulate the TPSP. Let us consider an OBS mesh network topology. The mesh network can be represented as a graph G(V, E), as shown in Figure 5. Vertices V represent OBS nodes, and the edges E represent optical links connecting the OBS nodes. The assumption is that all the optical links have the same capacity C (say OC-192). For simplicity of exposition as well as for application to a non-WDM burst-switched network, let each optical link have one wavelength only, since the problem can be easily extended for incorporating WDM.


Each (Genome) data warehouse is connected to an OBS node through a dedicated link of capacity C. There may be multiple data warehouses connected to one OBS node. A supercomputer is connected to the OBS node through dedicated links, so that there is no bandwidth bottleneck from the OBS node to the supercomputer. Because all of the above links are dedicated, they are not represented in the graph.


Figure 5. Graph representation of the TPSP.

At a certain step in the computation, the supercomputer may require data aggregated from multiple data warehouses before it resumes computation. This process is modeled as the transfer of files that must be sent from the source OBS nodes (to which the corresponding data warehouses are connected) to the destination supercomputer. It should be noted that one OBS node may be connected to several data warehouses. A query is first issued by the supercomputer to all data warehouses to determine the file size required from each warehouse. Alternatively, based on how some of these applications develop in the future, the file-size information may already be available at the supercomputer. The file size provides information on its expected transmission delay, as the file is transferred from the source node to the destination. The time that it takes to transfer a file along a route is the sum of the transmission delay, the propagation delay, and the overhead. Because the file size is typically large (typically greater than 5 Gbytes, and perhaps as large as Petabytes in some (future) applications), the transmission delay dominates.
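As a rough illustration of why the transmission delay dominates, the sketch below adds up the three delay components for a 5-Gbyte file on an OC-192 link. The 4000-km route length, the zero overhead, the fiber propagation speed, and the use of the OC-192 line rate (rather than the slightly lower payload rate) are all assumptions chosen for illustration; they do not come from the proposal.

```python
# Illustrative only: route length, overhead, and fiber speed are assumed values.
OC192_LINE_RATE_BPS = 9.95328e9   # OC-192 line rate in bits per second
FIBER_SPEED_M_PER_S = 2.0e8       # approximate signal speed in fiber (assumed)


def transfer_time(file_bytes, route_km, overhead_s=0.0,
                  rate_bps=OC192_LINE_RATE_BPS):
    """Return (total, transmission, propagation) delays in seconds."""
    transmission = file_bytes * 8 / rate_bps
    propagation = route_km * 1000 / FIBER_SPEED_M_PER_S
    return transmission + propagation + overhead_s, transmission, propagation


total, tx, prop = transfer_time(file_bytes=5e9, route_km=4000)
print(f"transmission {tx:.2f} s, propagation {prop * 1000:.0f} ms, total {total:.2f} s")
```

Under these assumptions the transmission delay is on the order of seconds while the propagation delay is on the order of tens of milliseconds, and the gap widens as files grow.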





In our graph model, each OBS node holds a set of files, and the time to transfer each file is pre-computed. The OBS node to which the supercomputer is connected is modeled as the destination node, to which all the files are destined.

The objective is to determine the following:

1. Route: The path through which a file should be transferred from the source to the destination.

2. Time schedule: The time at which a file has to be transmitted in a single burst so that it can be transferred through the route determined in Step 1. This is important because two files which share a link on their routes should not be transmitted at the same time, to avoid collision due to the constraints of an OBS network described below.

In an OBS network, although limited data buffering at OBS nodes is currently possible using fiber delay lines, it is inadequate for buffering very large files as they exist in our case. Hence, once a data warehouse starts transmitting a file, it must reach the destination in a single burst, and there is no possibility of buffering it along the path. We assume that the files cannot be fragmented. This simplifies the burst-assembly process, reduces the overhead of burst regeneration at the destination, and eliminates the possibility of errors arising due to misaligned fragments. Hence, we utilize only a single path from the source to the destination. We also assume that this path may contain no cycles. OBS switches do not have the ability to multiplex two different incoming data streams onto the same outgoing link. Therefore, each link can transfer only one file at a time.
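The constraints above can be sketched as a feasibility check on a candidate time-path schedule. This is a minimal sketch under simplifying assumptions that are not part of the proposal: propagation delay is ignored, so a file is taken to occupy every link of its route for its whole transfer time, and links are treated as undirected. The function names and the schedule layout are hypothetical.

```python
from itertools import combinations


def links_of(path):
    """Return the set of (undirected) links used by a node path."""
    return {tuple(sorted(pair)) for pair in zip(path, path[1:])}


def is_feasible(schedule):
    """schedule maps file id -> (path, start_time, transfer_time)."""
    for path, _start, _duration in schedule.values():
        if len(set(path)) != len(path):       # a path may contain no cycles
            return False
    for a, b in combinations(schedule.values(), 2):
        path_a, start_a, dur_a = a
        path_b, start_b, dur_b = b
        shares_link = links_of(path_a) & links_of(path_b)
        overlaps = start_a < start_b + dur_b and start_b < start_a + dur_a
        if shares_link and overlaps:          # one file per link at a time
            return False
    return True


def finish_time(schedule):
    """Aggregation time: when the last file reaches the destination."""
    return max(start + duration for _path, start, duration in schedule.values())


schedule = {
    "f1": (["A", "B", "D"], 0.0, 4.0),
    "f2": (["C", "B", "D"], 4.0, 2.0),  # waits until f1 releases link B-D
}
print(is_feasible(schedule), finish_time(schedule))
```

The quantity returned by `finish_time` is the one the scheduling problem seeks to minimize over all feasible schedules.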


The aim is to minimize the total time for data aggregation. This assumes that the last file to reach the destination is the bottleneck, since computation cannot begin until all the data is accumulated. The two dimensions of determining both the path and the time make this problem exceptionally hard, and differentiate it from all machine-scheduling problems which have been reported in the literature [Pin02]. Thus, we formulated the problem as a Mixed Integer Linear Program (MILP) [BSZ04], which can be solved using a MILP solver such as CPLEX [CPL]. However, the size of the MILP grows rapidly with the number of files because a set of several equations is created for every pair of files. Hence, the MILP is not very efficient for solving larger problems, and we use it only for a comparative study. We therefore also proposed three heuristic algorithms to yield close-to-optimum solutions for TPSP, as summarized in the following:


LONGEST-FILE-FIRST (LFF) SCHEDULING: This heuristic is based on the intuition that the longest files (those having the largest transfer times) are the bottleneck for scheduling, because they require more resources in terms of the amount of time required to be free on the links for them to be transferred. Therefore, the LFF algorithm aims at scheduling the longest files first, so that they get priority on the network’s resources and get scheduled earlier. For choosing the path over which to transfer a file, the algorithm chooses the best path among K randomly chosen paths. The overall worst-case running-time complexity of LFF is expressed in terms of r, the path length, and f, the number of files.
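The LFF idea can be sketched as follows. This is a hedged illustration rather than the algorithm of [BSZ04]: candidate paths are supplied by the caller instead of being generated, "best" is taken to mean the earliest possible start time, propagation delay is ignored, and the names `lff_schedule`, `earliest_start`, `k`, and `seed` are hypothetical.

```python
import random


def earliest_start(path_links, busy, duration):
    """Earliest time at which every link on the path is free for `duration`."""
    intervals = sorted(iv for link in path_links for iv in busy.get(link, []))
    t = 0.0
    for start, end in intervals:
        if t + duration <= start:   # the window fits before this reservation
            break
        t = max(t, end)
    return t


def lff_schedule(files, candidate_paths, k=3, seed=0):
    """files: {file_id: (source, transfer_time)};
    candidate_paths: {source: [node paths to the destination]}."""
    rng = random.Random(seed)
    busy = {}       # link -> list of (start, end) reservations
    schedule = {}   # file_id -> (path, start_time, transfer_time)
    # Longest transfer time first.
    for fid, (src, duration) in sorted(files.items(), key=lambda kv: -kv[1][1]):
        options = candidate_paths[src]
        best = None
        for path in rng.sample(options, min(k, len(options))):
            links = [tuple(sorted(p)) for p in zip(path, path[1:])]
            start = earliest_start(links, busy, duration)
            if best is None or start < best[0]:
                best = (start, path, links)
        start, path, links = best
        for link in links:
            busy.setdefault(link, []).append((start, start + duration))
        schedule[fid] = (path, start, duration)
    return schedule


files = {"f1": ("A", 4.0), "f2": ("C", 2.0), "f3": ("A", 1.0)}
paths = {"A": [["A", "B", "D"], ["A", "D"]], "C": [["C", "B", "D"], ["C", "D"]]}
for fid, (path, start, duration) in lff_schedule(files, paths, k=2).items():
    print(fid, path, start, start + duration)
```

Because each file reserves every link of its chosen path for its full transfer time, the resulting schedule never places two files on the same link at once.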


DISJOINT-PATH (DP) SCHEDULING: This heuristic is based on the intuition that files can be transferred along link-disjoint paths in parallel. The idea is to compute the maximum number of disjoint paths from the sources of the files to the destination. This can be computed through an implementation of the Max-Flow algorithm [CLR01] on the following modified graph. All the links have unit capacity. A dummy source node is connected to all the nodes which have files not yet scheduled, with link capacity equal to the number of files. The destination is connected to a dummy destination with capacity equal to the number of files yet to be scheduled. The Max-Flow algorithm then identifies the disjoint paths as those consisting of links with unit flow.
T
he worst
-
case running
-
time
complexity of the DP heuristic is
.
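The modified graph used by the DP heuristic can be sketched with a small Edmonds-Karp max-flow routine. This is an illustrative sketch, not the authors' implementation: it only counts the link-disjoint paths rather than extracting them, it reads "number of files" on a dummy-source link as the number of unscheduled files at that node, it models each undirected link as unit capacity in both directions, and the node labels `SRC*` and `DST*` are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def count_disjoint_paths(links, files_at_node, destination):
    """Maximum number of link-disjoint paths usable by the pending files."""
    capacity = defaultdict(dict)
    for u, v in links:                       # unit capacity on every link
        capacity[u][v] = 1
        capacity[v][u] = 1
    for node, count in files_at_node.items():
        capacity["SRC*"][node] = count       # dummy source (hypothetical label)
    capacity[destination]["DST*"] = sum(files_at_node.values())
    return max_flow(capacity, "SRC*", "DST*")


links = [("A", "B"), ("B", "D"), ("A", "D"), ("C", "B"), ("C", "D")]
print(count_disjoint_paths(links, {"A": 2, "C": 1}, "D"))
```

In this small example the destination D has three incident links and three files are pending, so three files can be sent in parallel.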


MOST-DISTANT-FILE-FIRST (MDFF) SCHEDULING: This heuristic is based on the intuition that files which are most distant in terms of number of links from the destination occupy more links and are hence the bottleneck for scheduling. The heuristic aims at scheduling these files first when the network is relatively resource-free.
T
he worst
-
case running time is
.
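The ordering step of MDFF can be sketched with a breadth-first search from the destination. This is a minimal sketch under assumptions not stated in the proposal: distance is measured as the shortest hop count, ties are broken by file identifier, and the function names are hypothetical.

```python
from collections import deque


def hop_distances(links, destination):
    """Shortest hop count from every reachable node to the destination."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    dist = {destination: 0}
    queue = deque([destination])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mdff_order(files, links, destination):
    """files: {file_id: source_node}; returns file ids, most distant first."""
    dist = hop_distances(links, destination)
    return sorted(files, key=lambda fid: (-dist[files[fid]], fid))


links = [("A", "B"), ("B", "C"), ("C", "D")]
print(mdff_order({"f1": "C", "f2": "A", "f3": "B"}, links, "D"))
```

The files would then be assigned paths and start times in this order, in the same way as in the other heuristics.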




These approaches are compared through simulations on a 24-node topology in Figure 6. The Longest-File-First (LFF) heuristic performs very well when the number of files to be aggregated is small, while the Disjoint-Paths (DP) heuristic should be preferred for a large number of files. Also, LFF performs close to the MILP for small networks where the MILP can provide a solution in a reasonable computing time. We plan to extend our existing heuristic algorithms and further develop algorithms that are tailored for large-science applications. We also plan to test our algorithms in real large-science projects, e.g., GTL applications on UltraNet.





Figure 6: Performance of the heuristics with lower bound on finish time.





3.4 Low-delay and Fault-tolerant Signaling and Control Plane Architecture


The need for a low-delay and fault-tolerant signaling and control plane architecture for a network such as UltraNet arises for many reasons. First, supercomputers at different physical sites will be harvested to provide the computational power needed for these long time-scale simulations, which must be tightly coordinated to ensure the effective utilization of computers. It is important that the end-to-end message delays be minimized over the networks to ensure that the supercomputers do not idle waiting for control messages. It is important to note that a single second of idle time represents the loss of several teraflops of compute power [DOE03]. The end-to-end delay minimization represents a significant challenge to networking technologies. Such problems are currently addressed in a limited way in overlay networks and daemons, but highly focused research and development efforts are needed for an effective solution to this class of applications. We will identify the needs of large-science applications and propose corresponding schemes.




Second, in order to implement a space-time schedule of data transfers between end hosts, supercomputers, and databases, it is necessary to have a signaling and control network that can be used to set up the schedule and to manage and control the network resources. One issue is how to implement the signaling network. Current approaches employ in-fiber-in-band techniques, which are simple but do not provide the capability to deploy sophisticated scheduling algorithms. As part of this research, we will investigate in-fiber-out-of-band signaling. To address the fault-tolerance issue, we will investigate multipath approaches for signaling and control messages.


Third, in order to manage and control the network resources of the ultra-high speed network, it is necessary to develop a very fast, reliable, and powerful control plane. The control plane must guarantee delivery of control messages with minimum delay. As part of the research we will investigate the requirements of the control plane architecture for large-science applications and build upon the knowledge plane architecture proposed in [CPR03].


3.5 Scalable and Fault-Tolerant Network Primitives and Intelligent Services


In this research project we will investigate how application-layer multicasting, caching, and intelligent data replication can be used to implement a high-performance network infrastructure for GTL applications. Towards this end, we will build upon the pseudo-serving paradigm proposed in [KoG99]. Pseudo-serving is a P2P file sharing system comprising two components: a superserver and a set of pseudoservers. The former grants the latter access to files in exchange for some amount of network and storage resource, specified through a contract between the system and each user. Under normal circumstances, no resources are requested and the superserver acts as a regular server. As demand begins to exceed the superserver’s ability to provide service, the superserver offers a contract to the requesting pseudoserver. In it, the pseudoserver is obligated to serve the file it will retrieve to N other requesters within T seconds. It is released from its obligations should it service N other requesters before T seconds or should T seconds have passed without it having serviced N requesters. In exchange for this resource contribution, the superserver gives to the requesting pseudoserver a referral to another pseudoserver. This other pseudoserver is the one closest to the requester known to contain the file and is obligated to provide service as part of its contractual obligations. Pseudo-serving provides a framework for sharing local storage and bandwidth and even partial computation across organizational boundaries.


The current application of pseudo-serving to dissipate the flash-crowd problem is somewhat limiting because what is transferred is the same file; in GTL applications there will be a small number of users who will retrieve large amounts of data. This restriction can be removed by organizing files into sets of files, or packages. Before we see how packages work, we first examine why pseudo-serving may not work well when more than one file is requested from the super-server, which for the GTL application is a meta-data controller for all the bio-databases. Suppose there are many files on a bio-database. There is nothing that prevents pseudo-serving from working on a per-file basis, so that contracts are set based on the incoming rate of request for individual files. The problem arises when this per-file rate of request is low but the cumulative rate of request for all the files on the super-server is high. Under such circumstances, the super-server may not be able to handle the incoming stream of requests alone, and pseudo-servers are not able to satisfy contracts set by the super-server, and so pseudo-serving is ineffective. Now, suppose files are organized into packages, where each package is a group of N files. Moreover, users are allowed to retrieve only packages. Under this arrangement, the rate of request for each package is N times the rate of request for individual files. The file holding time is therefore reduced by a factor of N. Using this packaging mechanism, contracts previously too difficult to satisfy because of their long file holding times can be made more attractive to the user. The problem with packaging, of course, is that users now need to download a package that may be significantly larger than the original file. Depending on the load on the bio-databases, from a user's point of view, retrieving packages may or may not be more attractive than retrieving only the file directly from the super-server. The optimal package size strikes a good balance between making contracts reasonably satisfiable and providing sufficient benefit to the user in reducing the total download time. This topic will be investigated as part of this research.
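The packaging tradeoff can be made concrete with a small calculation. This is an illustrative sketch only: it assumes all files are equally popular and equally sized, and that the expected holding time is the number of requesters in the contract divided by the request rate. The function and parameter names are hypothetical and the numbers are made up.

```python
def holding_time(contract_requests, request_rate):
    """Expected time (s) to serve `contract_requests` requesters
    arriving at `request_rate` requests per second."""
    return contract_requests / request_rate


def package_tradeoff(per_file_rate, file_size, package_size, contract_requests):
    """Holding time and download size when files are grouped into packages."""
    package_rate = package_size * per_file_rate   # N times the per-file rate
    return {
        "holding_time_s": holding_time(contract_requests, package_rate),
        "download_size": package_size * file_size,
    }


# One request per file every 100 s; the contract asks for 5 served requesters.
for n in (1, 5, 10):
    print(n, package_tradeoff(per_file_rate=0.01, file_size=1.0,
                              package_size=n, contract_requests=5))
```

Larger packages shorten the holding time in proportion to the package size but increase the amount each user must download by the same factor, which is the balance the optimal package size has to strike.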





4. Collaboration and Applications


This project will be executed in close collaboration with the DOE UltraScienceNet project at Oak Ridge National Laboratory (ORNL), in particular with the UltraScienceNet PIs, Dr. Nagi Rao and Dr. Bill Wing, and their colleagues.



The PI of our proposed project, Professor Biswanath Mukherjee, has been cooperating with Dr. Nagi Rao, Dr. Bill Wing, and others over the past 1.5 years towards defining the research challenges for the bandwidth-provisioning problems for DOE Large-Science Applications. In fact, Professor Mukherjee was invited by Dr. Nagi Rao and Dr. Bill Wing to co-chair (along with Dr. Wing) the "Provisioning Group" of the "DOE Workshop on Ultra-High Speed Transport Protocols and Provisioning for Large Scale Science Applications" held at Argonne National Laboratory in April 2003 [DOE03]. Professor Mukherjee made important contributions to the workshop by co-leading the discussions on provisioning. Then, he contributed to the final workshop report through his ideas on: (1) dynamic provisioning of high-capacity pipes of various bandwidth granularities ranging from multiple wavelengths to a full wavelength to sub-wavelength capacity; (2) how to employ generalized multi-protocol label switching (GMPLS) to facilitate the dynamic provisioning; (3) survivable bandwidth provisioning; (4) separated control channel with deterministic or bounded delay and jitter for control-loop operations; etc.



We propose to build upon the relationship that has been set up between Professor Mukherjee and the UltraScienceNet team. Specifically, we plan to extend it to a true research collaboration to ensure that our research team at UC Davis is working on important research problems (w.r.t. the missions of the UltraScienceNet team and the DOE). We anticipate that our research results will complement (and extend the knowledge gained from) the UltraScienceNet, and our research results can be tested on the UltraScienceNet platform as well.




One of our Co-PIs, Professor Dipak Ghosal, also has a long working relationship with Dr. Nagi Rao. They share common interests in transport-layer protocols and application-layer research problems. Our collaboration will build upon this relationship as well.



It is also worth mentioning that Professors Mukherjee and Ghosal have an ongoing collaborative research project with Dr. Wu-Chun Feng of Los Alamos National Laboratory using a UC-LANL seed-grant project entitled "Wide-Area Transport and Signaling Protocols for Genome-to-Life (GTL) Applications"; $45,214; 9/1/03 - 8/31/04. We propose to exploit this collaboration also for successful execution of the proposed project.



Our additional collaborations in the DOE community include Professor Ghosal's research collaboration with Dr. Rose P. Tsang of Sandia National Laboratory. This relationship will also be utilized, if necessary, for the proposed project.



The following are the expected outcomes of this proposed research:

1. A report on the networking issues for the DOE GTL program. This report will discuss the specific requirements of the GTL program and the corresponding requirements on the networking infrastructure and protocols.

2. This research will develop various heuristics to perform space-time scheduling of large file transfers that minimize the total data aggregation delay.

3. The research will compare and contrast various methods to mitigate receiver-side congestion that will arise when large data is simultaneously transferred from the various bio-databases to a client. The analysis will be done using a combination of simulation and analytical models.

4. The research will develop the requirements for a low-delay and fault-tolerant signaling and control plane architecture.

5. This research will investigate traffic grooming algorithms for efficient network bandwidth utilization.

6. We will propose reliable network provisioning strategies that meet the requirements for large-science applications.

7. The research will investigate and design a framework with which users can share local storage and partial computation. The applicability of application-layer multicasting, caching, and data replication in the GTL applications will also be determined.

8. We will fully test the proposed algorithms and protocols in UltraNet in collaboration with UltraNet researchers.




5. Statement of Work


The main components of this proposal include (a) traffic grooming and bandwidth provisioning, (b) survivability and fault-tolerant network provisioning, (c) distributed space-time scheduling of large data transfers over ultra-high speed networks, (d) low-delay and fault-tolerant signaling and control plane architecture, and (e) scalable network primitives and services.


Research, engineering, and application milestones:

Year 1:

• Identify and analyze suitable traffic grooming algorithms for GTL applications.

• Develop and extend existing heuristic space-time scheduling algorithms under dynamic network states.

• Propose and study suitable network survivability strategies for large-science applications.

Year 2:

• Technology transfer to UltraNet

  o Test proposed space-time scheduling and grooming algorithms on UltraNet (at ORNL) with the collaboration of UltraNet researchers

• Develop low-latency signaling strategies to control GTL simulations at remote DOE terascale computing facilities.

• Develop a robust application-layer multicasting framework for simultaneous downloading of very large datasets to multiple clients.

Year 3:

• Technology transfer to UltraNet

  o Test proposed low-latency signaling strategies and application-layer multicasting algorithms on UltraNet (at ORNL) with the collaboration of UltraNet researchers

  o Fully test algorithms developed for a set of large-science applications, in addition to GTL.

• Develop the requirements for a fault-tolerant in-fiber-out-of-band signaling architecture.

• Develop a scalable control plane architecture for UltraNet taking into account the requirements of the various large-science applications.





6. References³


[ACQ02] V. Anand, S. Chauhan, and C. Qiao, “Sub-path protection: A new framework for optical layer survivability and its quantitative evaluation,” Dept. of Computer Science and Engineering, State University of New York at Buffalo, Tech. Report 2002-01, Jan. 2002.

[BBM03] Alessandro Bassi, Micah Beck, Terry Moore, James S. Plank, Martin Swany, Rich Wolski, and Graham Fagg, “The Internet Backplane Protocol: A Study in Resource Sharing,” Future Generation Computing Systems, 19(4), May 2003, pp. 551-561. Elsevier.

[BSZ04] A. Banerjee, N. Singhal, J. Zhang, C. N. Chuah, and B. Mukherjee, “A Time-Path Scheduling Problem (TPSP) for Aggregating Large Data Files from Distributed Databases using an Optical-Burst Switched Network,” accepted for presentation and publication in the proceedings of International Communications Conference (ICC 2004), Paris, France.

[ClG02] M. Clouqueur and W. D. Grover, “Availability analysis of span-restorable mesh networks,” IEEE J. Selected Areas in Communications, vol. 20, pp. 810-821, May 2002.

[CLR01] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, “Introduction to Algorithms,” Second Edition, MIT Press, 2001.

[CPL] http://www.ilog.com/products/cplex/product/suite.cfm

[CPR03] D. D. Clark, C. Partridge, J. C. Ramming, and J. Wroclawski, “A Knowledge Plane for the Internet,” ACM SIGCOMM 2003.

[DCF03] A. Darling, L. Carey, and W. Feng, “The Design, Implementation, and Evaluation of mpiBLAST,” ClusterWorld 2003, Best Paper Award, June 2003.

[DOE03] DOE Workshop on Ultra-High Speed Transport Protocols and Provisioning for Large Scale Science Applications. Argonne National Lab, Argonne, IL, 2003. http://www.csm.ornl.gov/ghpn/wk2003_workshops.html

[Flo01] Internet Engineering Task Force, ICSI Center for Internet Research, Berkeley, California. http://www.icir.org/floyd/papers/draft-floyd-tcp-highspeed-01.txt

[FTU02] A. Fumagalli, M. Tacca, F. Unghvary, and A. Farago, “Shared path protection with differentiated reliability,” in Proc. IEEE ICC, pp. 2157-2161, April 2002.

[GTL03] DOE Genomes to Life Program. http://doegenomestolife.org/

[Gro99] W. D. Grover, “High availability path design in ring-based optical networks,” IEEE/ACM Trans. Networking, vol. 7, pp. 558-574, Aug. 1999.

[HoM02] P.-H. Ho and H. Mouftah, “A framework for service-guaranteed shared protection in WDM mesh networks,” IEEE Communications Mag., vol. 40, pp. 97-103, Feb. 2002.





³ There exists a vast literature in the areas of networking, optics, and applications related to this proposal. Far from being a complete set of references, this list is only a small sample of the literature, chosen to support the presentation.

[Jan00] J. Jannotti, et al., “Overcast: Reliable Multicasting with an Overlay Network,” in Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), San Diego, California, USA, 2000.

[KoG99] K. Kong and D. Ghosal, “Mitigating Server Side Congestion Through Pseudo-Serving,” IEEE/ACM Transactions on Networking, 7(4), 1999.

[Muk97] B. Mukherjee, “Optical Communication Networks,” McGraw-Hill, pp. 259-288, 1997.

[Ora01] A. Oram, “Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology,” O'Reilly & Associates, 2001.

[OZM02] C. Ou, H. Zang, and B. Mukherjee, “Sub-path protection for scalability and fast recovery in optical WDM mesh networks,” in Proc. OFC, p. ThO6, Mar. 2002.

[Pin02] M. Pinedo, “Scheduling: Theory, Algorithms, and Systems,” Second Edition, Prentice Hall, 2002.

[PBB01] James S. Plank, Alexander Bassi, Micah Beck, Terence Moore, D. Martin Swany, and Rich Wolski, “Managing Data Storage in the Network,” IEEE Internet Computing, 5(5), September/October 2001, pp. 50-58.

[PFD03] PFDLNet, First International Workshop on Fast Long-Distance Networks, CERN, Geneva, Switzerland, 2003.

[RaM02] S. Ramamurthy and B. Mukherjee, “Survivable WDM mesh networks, Part II -- restoration,” in Proc. IEEE ICC, pp. 2023-2030, June 1999. (Also, IEEE JLT, to appear, 2002.)

[RWD03] N. S. Rao, W. R. Wing, and T. H. Dunigan, “DOE UltraScience Net: Experimental Ultra-Scale Network Research Testbed.” http://www.csm.ornl.gov/ultranet/UltraNet_ORNL_Prop.pdf

[SCTP] Stream Control Transmission Protocol (SCTP). http://www.sctp.de

[STP03] Scheduled Transfer Protocol (ST), High-Performance Parallel Interface Standards Group. http://www.hippi.org/cST.html

[ToN94] M. To and P. Neusy, “Unavailability analysis of long-haul networks,” IEEE J. Selected Areas in Communications, vol. 12, pp. 100-109, Jan. 1994.

[WSM02] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, “Path vs. sub-path vs. link restoration for fault management in IP-over-WDM networks: Performance comparisons using GMPLS control signaling,” IEEE Communications Mag., vol. 40, pp. 2-9, Nov. 2002.

[WSM02FOEC] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, “Fault monitoring and restoration in optical WDM networks,” in Proc. National Fiber Optic Engineers Conference, Sep. 2002.

[ZhM02] K. Zhu and B. Mukherjee, “On-line Provisioning Connections of Different Bandwidth Granularity in WDM Mesh Networks,” Proc., IEEE/OSA Optical Fiber Communication Conference (OFC) ’02, Anaheim, CA, March 2002.

[ZhS00] D. Zhou and S. Subramaniam, “Survivability in optical networks,” IEEE Network, vol. 14, pp. 16-23, Nov./Dec. 2000.





7. Appendix A: Budget


The cost of this project is $K in the first year, $K in the second year, and $K in the third year (for FY2004 through 2006). The cost for each year includes graduate student assistantships and tuition fees for five graduate students, and one month of salary for each PI. In each year, one full-time post-doctoral fellow will be supported at UC Davis. The post-doctoral fellow will be involved in all aspects of the research plan and will work closely with the graduate students and the PIs. The budget includes travel money for travel to DOE project meetings, meetings for technology transfer to UltraNet, and attending conferences and workshops to present some of the relevant research results. Finally, the budget also includes equipment money to buy desktop PCs/workstations for graduate students and the postdoc.


Graduate Student Fees for Research Assistants including non-California-resident tuition. The University of California, Davis campus does not have an “Out of State” or “Non-Resident” Tuition Remission program. However, we are requesting approval to charge the tuition for 3 students in the first year.


Technical support salary is requested for computer technical support directly related to the scientific research objectives of this proposed project. This will include installation, troubleshooting, and maintenance of specialized software and/or networking capabilities required for simulation software such as ns, as well as installation of hardware and/or research instrumentation required to meet the scientific research objectives of this project.








8. Appendix B: Biographical Information



BISWANATH MUKHERJEE

Department of Computer Science
University of California
Davis, CA 95616, USA
Phone: +1-530-752-4826; FAX: +1-530-752-4767
Electronic mail: mukherjee@cs.ucdavis.edu
WWW: http://networks.cs.ucdavis.edu/~mukherje/

EDUCATION

1987 Ph.D., Electrical Engineering, University of Washington, Seattle
1980 B.Tech. (Hons.), Electronics and Electrical Communications Engg., IIT Kharagpur (India)

ACADEMIC APPOINTMENTS

1995- Professor/Computer Science, University of California, Davis
1997-00 Department Chair/Computer Science, University of California, Davis
1992-95 Associate Professor/Computer Science, University of California, Davis
1987-92 Assistant Professor/Computer Science, University of California, Davis
1984-87 Research & Teaching Assistant/Electrical Engineering, University of Washington

CURRENT RESEARCH INTERESTS

Lightwave Networks; Wireless Networks; Network Security


AWARDS

1984-85 GTE Teaching Fellowship, University of Washington
1986-87 General Electric Foundation Fellowship, University of Washington
1991 Co-winner, Best Paper Award, 14th National Computer Security Conference, for the paper "DIDS (Distributed Intrusion Detection System) - Motivation, Architecture, and an Early Prototype."
1994 Co-winner, Paper Award, 17th National Computer Security Conference, for the paper "Testing Intrusion Detection Systems: Design Methodologies and Results from an Early Prototype."


RESEARCH PUBLICATIONS

Please visit B. Mukherjee's website (
http://networks.cs.ucdavis.edu/~mukherje/
) for details on his research
publications.

A. List of up to Five Publications Most C
losely Related to Proposed Project:

1.

B. Mukherjee, Optical Communication Networks, New York: McGraw
-
Hill, July 1997.

2.

B. Mukherjee, "WDM
-
Based Local Lightwave Networks
--

Part I: Single
-
Hop Networks; Part II: Multihop
Networks," IEEE Network, vol. 6: P
art I: no. 3, pp. 12
-
27, May 1992; Part II: no. 4, pp. 20
-
32, July 1992.
(Nominated for IEEE and IEEE Communications Society Paper Awards. Also, revised/updated version
published in Encyclopedia for Telecommunications as an Invited Article.)

3.

L. Sahasrab
uddhe and B. Mukherjee, ``Light
-
Trees: Optical Multicasting for Improved Performance in
Wavelength
-
Routed Networks,'' IEEE Communications Magazine, vol. 37, no. 2, pp. 67
-
73, Feb. 1999.

4.

D. Datta, B. Ramamurthy, H. Feng. J.P. Heritage, and B. Mukherjee,
"Impact of transmission impairments on
the teletraffic performance of wavelength
-
routed optical networks," IEEE/OSA Journal of Lightwave
Technology, vol. 17, no. 10, pp. 1713
-
1723, Oct. 1999.

5.

B. Mukherjee, ``WDM Optical Communication Networks: Progress
and Challenges" (Invited Paper), IEEE
Journal on Selected Areas in Communications (Special Issue on ``Protocols and Architectures for Next
Generation Optical WDM Networks"), vol. 18, no. 10, pp. 1810
-
1824, Oct. 2000.

B. List of up to Five Other Significant

Publications.

1.

B. Mukherjee and J. S. Meditch, "The p(i)
-
persistent protocol for unidirectional broadcast bus networks," IEEE
Transactions on Communications, vol. 36, pp. 1277
-
1286, Dec. 1988.



2. B. Mukherjee and J. S. Meditch, "Integrating voice with the p(i)-persistent protocol for unidirectional broadcast bus networks," IEEE Transactions on Communications, vol. 36, pp. 1287-1295, Dec. 1988.

3. B. Mukherjee, D. Banerjee, S. Ramamurthy, and A. Mukherjee, "Some principles for designing a wide-area optical network," IEEE/ACM Transactions on Networking, vol. 4, pp. 684-696, Oct. 1996. (Originally appeared in IEEE Infocom '94, was selected by the IEEE Infocom '94 conference program committee as one of the top few papers (out of 449 submissions), recommended to the IEEE/ACM Transactions on Networking, and published in the journal after its own independent review.)

4. B. Mukherjee, L. T. Heberlein, and K. N. Levitt, "Network intrusion detection," IEEE Network, vol. 8, no. 3, pp. 26-41, May/June 1994.

5. B. Guha and B. Mukherjee, "Network security via reverse engineering of TCP code: Vulnerability analysis and proposed solutions," IEEE Network, vol. 11, no. 4, pp. 40-49, July/August 1997.


PROFESSIONAL SERVICE

Editor, IEEE/ACM Transactions on Networking (1994-2000)

Editor-at-Large, Optical Communications and Networking, IEEE Communications Society (1999-2000)

Technical Program Chair, IEEE INFOCOM '96 Conference

Member of the Editorial Board and Senior Technical Editor, IEEE Network (1997-2000)

Member of the Editorial Board, Journal of High-Speed Networks

Member of the Editorial Board, ACM/Baltzer Wireless Information Networks (WINET) journal

Member of the Editorial Board, Photonic Network Communications journal

Member of the Editorial Board, Optical Networks journal

Proposal Evaluation Panel: National Science Foundation (1993-present)

NSF Panels/Workshops: (1) All-Optical Networks (Jan. 93); (2) Optical Commun. and Networks (March 94); (3) CISE International Cooperation (Oct. 97); (4) Ultra-High-Capacity Optical Networks (Oct. 02).


Member of the Technical Program Committee, IEEE INFOCOM 89-90, 92-99 conferences; IEEE GLOBECOM 92; ACM SIGCOMM 93; (and many other conferences)


Reviewer of proposals for: National Science Foundation; NASA/HPCC; State of California MICRO Program; Hong Kong Research Grants Council; Govt. of Singapore; Israel Science Foundation


Founder, Chairman, and Chief Technology Officer, Summit Networks, San Jose, CA (Feb. '00 - Aug. '02): A startup specializing in building optical networking equipment.


Member, Board of Directors, IPLocks, San Jose, CA (Feb. '02 - present): building computer security products.


Graduate Students and Postdoctoral Researchers Supervised:

Subrata Banerjee, PhD (Cisco; previously Director of Software, Accordion Networks; Asst. Prof. at Stevens Tech, Phillips Research); Feiling Jia, PhD (Atoga Systems, previously at SBC/Pacific Bell); Shao-kong Kao, PhD (Foundry Networks; previously Sun Microsystems, Alidian); Michael S. Borella, PhD (3Com; previously Asst. Prof. at DePaul Univ.); Dhritiman Banerjee, PhD (VP/cofounder of Internet Photonics; previously at Bell Labs./Lucent); Jason Iness, PhD (Intel); Byrav Ramamurthy, PhD (Asst. Prof. at Univ. of Nebraska); S. Ramu Ramamurthy, PhD (CIENA; previously at Tellium and Bellcore); Jason Jue, PhD (Asst. Prof. at University of Texas-Dallas); Laxman H. Sahasrabuddhe, PhD (SBC; previously at Amber Networks); Nick Puketza, PhD (Lecturer at UC Davis); Xiaoxin Wu, PhD (postdoc at Purdue; previously at Arraycom); Wushao Wen, PhD (CIENA; previously at Mahi Networks); Hui Zang, PhD (Sprint Adv. Technology Lab.); Shun Yao, PhD (postdoc at UC Davis); Jian Wang, PhD (Asst. Prof. at Florida Intl. Univ.); L. T. Heberlein, MS (Net Squared); Justin Doak, MS (LANL); Kui Zhang, MS (Cisco); Biswaroop Guha, MS (Hewlett-Packard); Kirk Bradley, MS (SRI); plus 11 other MS degrees. Currently supervising approx. 13 graduate students, mostly PhDs.


PI's PhD Advisor:

Professor James S. Meditch, University of Washington, Seattle







DIPAK GHOSAL

Department of Computer Science

University of California

Davis, CA 95616

E-mail: ghosal@cs.ucdavis.edu

Tel. No.: (530) 754 9251

Fax: (530) 752 4767

WWW: http://networks.cs.ucdavis.edu/~ghosal


Education



Post-Doctoral Studies, Computer Science, Institute for Advanced Computer Studies, University of Maryland, USA, September 1990.

Ph.D., Computer Science, The Center for Advanced Computer Studies, University of Louisiana, USA, July 1988.

M.Sc.(Engg.), Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore, India, December 1985.

B.Tech., Dept. of Electrical Engineering, Indian Institute of Technology, Kanpur, India, May 1983.


Professional Experience



July 1999 - Present: Associate Professor, Department of Computer Science, University of California, Davis, CA 95616.

January 1996 - June 1999: Assistant Professor, Department of Computer Science, University of California, Davis, CA 95616.

September 1990 - December 1995: Member of the Technical Staff, Bell Communications Research, Red Bank, New Jersey 07701, USA.

September 1988 - August 1990: Research Associate, Institute for Advanced Computer Studies, The University of Maryland, College Park, MD 20742, USA.

July 1986 - July 1988: Research Assistant, The Center for Advanced Computer Studies, University of Louisiana, Lafayette, LA 70504, USA.

August 1983 - December 1985: Research Fellowship, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India.



Recent Research Publications



Julee Pandya, Prasant Mohapatra, and Dipak Ghosal, “Asymptotic Analysis of a Peer Enhanced Cache Invalidation Scheme,” WiOpt'04: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, 24th-26th of March, 2004, University of Cambridge, UK.



Stephen Mueller, Rose P. Tsang, and Dipak Ghosal, “Multipath Routing in Mobile Ad Hoc Network -- Issues and Challenges,” Invited paper. To appear in Lecture Notes in Computer Science, 2004.



Dipak Ghosal, Benjamin Poon, and Keith Kong, “P2P Contracts: A Framework for Resources and Service Exchange,” accepted for publication in the special issue of Future Generation Computer Systems, 2004.



S. Kovvuri, V. Pandey, B. Mukherjee, D. Ghosal, and D. Sarkar, “A Call-admission Control (CAC) Algorithm for Providing Guaranteed QoS in Cellular Networks,” Intl. Journal of Wireless Information Networks, 2003. {Preliminary version: S. Kovvuri, V. Pandey, D. Ghosal, B. Mukherjee, and D. Sarkar, “A call-admission control (CAC) algorithm for providing guaranteed QoS in cellular networks,” Proc., IEEE Wireless Access Systems, San Francisco, CA, Dec. 2000.}



W. Wen, B. Mukherjee, S.-H. Gary Chan, and D. Ghosal, “LVMSR -- An efficient algorithm to multicast layered video,” Computer Networks, March 2003. {Preliminary version: W. Wen, S.-H. Gary Chan, D. Ghosal, and B. Mukherjee, “LVMSR -- An efficient algorithm to multicast layered video,” Proc., IEEE ICC 2000 conference, New Orleans, LA, pp. 254-258, June 2000.}





B. Reynolds and D. Ghosal, “STEM: Secure Telephony Enabled Middlebox,” IEEE Communications Magazine, Special Issue on Security in Telecommunication Networks, October 2002.





J. Burns and D. Ghosal, “Design and Analysis of a New Algorithm for Automatic Detection and Control of Media-Stimulated Focussed Overload,” to appear in Telecommunication Systems, 2002.



J. Abramson, X-yan Fang, and D. Ghosal, “Analysis of an Enhanced Signaling Network for Scalable Mobility Management in Next Generation Wireless Networks,” in IEEE Globecom, November 2002.



B. Reynolds and D. Ghosal, “STEM: Secure Telephony Enabled Middlebox,” to appear in IEEE Communications Magazine, Special Issue on Security Issues in Telecommunications Networks, October 2002.

Professional Service



Served in many NSF and UC Core panels



Program Committee Member of 1995 Distributed Computing Conference, Infocom 1995-1997, 2000, 2001, 2003, Performance 1996, SDPS 1996, MASCOT 1994, 2001



Referee for NSF Proposals, IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Computer Magazine, IEEE Transactions on Software Engineering.



Member of IEEE Computer Society, IEEE Communications Society, and ACM.


Grants and Awards

1998-1999: MICRO Grant. Title “Emerging Customer Data Network Management.” (Industry support committed from SBC). PIs: Biswanath Mukherjee and Dipak Ghosal.

1997-2002: NSF Career Award. Proposal Title “A Career Development Plan for Research and Education in High Speed Networks.” PI: Dipak Ghosal.

1998-2003: NSF Award. Proposal Title “Complementing Internet Caching with Pseudo-serving to Mitigate Network Congestion.” PIs: Dipak Ghosal and Louis S. Hakimi.

2002-2003: HP Technology Award, Mobile Technology Solutions Grant. PIs: Prasant Mohapatra and Dipak Ghosal.

2003-2004: Sandia Labs. Title “Application of Mobile Ad Hoc and Sensor Networks for Facilities Protection.” PI: Dipak Ghosal.

2003-2004: Los Alamos National Labs. Title: Wide-Area Transport and Signaling Protocols for Genome To Life (GTL) Applications. PIs: Biswanath Mukherjee, Dipak Ghosal, and Wu-Fung Chung.

2003-2005: NSF Award. Proposal Title: Security Architecture for IP Telephony. PIs: Dipak Ghosal and S. Felix Wu.

2003-2004: California Institute for Energy Efficiency (CIEE). Proposal Title: Enabling Demand Response with Vehicular Mesh Networks (VMesh). Status: pending. PIs: Chen-Nee Chuah, Dipak Ghosal, and Michael H. Zhang.

Patents/Inventions



Keith Kong and Dipak Ghosal, “A Self-Scaling Scheme for Avoiding Server-Side Congestion in the Internet,” Approved October 2002, US Patent 6,473,401 B1.


Names of graduate and post-graduate advisors, advisees, and collaborators



Ph.D. advisor: Dr. Laxmi N. Bhuyan, Professor, Department of Computer Science, University of California, Riverside.



Post-doctoral advisor: Dr. Satish K. Tripathi, Dean of Engineering and Johnsons Professor of Engineering, University of California, Riverside.



Advisees: Jennifer Yick, Howard Cheung, Archana Bhratidhasan, Vijay Ponduru, Brennen Reynolds, Julee Pandya, Jeremy Abramson, James Xiao-yan Fang, Keith Kong, Vijoy Pandey, Sujatha Balaraman, Xiaoxin Wu, Raja Mukhopadhaya, Ashok Swamy, Arijit Mukherji, Narana Kannappan.



Research Collaborators: Biswanath Mukherjee, Randy Katz, Rajeev Motwani, Matthew Caesar, T. V. Lakshman, Tsong-Ho Wu, Gopal Mempat, Jonathan Chao, Debanjan Saha, Satish Tripathi, Erol Gelenbe, Giuseppe Serazzi.








XIN LIU

Department of Computer Science

University of California

Davis, CA 95616

E-mail: liu@cs.ucdavis.edu

Tel.: (530) 754-6907

Fax: (530) 752-4767

http://www.cs.ucdavis.edu/~liu



EDUCATION


2002 Ph.D. Electrical & Comp. Engineering, Purdue University

1997 M.S. Electrical Engineering, Xi’an Jiaotong University

1994 B.S. Electrical Engineering, Xi’an Jiaotong University


ACADEMIC APPOINTMENTS

2003- Assistant Professor/Computer Science, University of California, Davis

2002-2003 Post-doctoral Research Associate, Univ. of Illinois, Urbana-Champaign


CURRENT RESEARCH INTERESTS

Wireless Networks; Network Security


AWARDS

2003 Best Paper Award, Computer Networks (Elsevier) Journal, for the paper "A Framework for Opportunistic Scheduling in Wireless Networks."


RECENT RESEARCH PUBLICATIONS

1. X. Liu, E. K. P. Chong, and N. B. Shroff, “A Framework for Opportunistic Scheduling in Wireless Networks,” Computer Networks, vol. 41, no. 4, pp. 451-474, March 2003.

2. X. Liu, E. K. P. Chong, and N. B. Shroff, “Opportunistic Transmission Scheduling with Resource-Sharing Constraints in Wireless Networks,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 10, pp. 2053-2064, October 2001.

3. X. Liu, E. K. P. Chong, and N. B. Shroff, “Joint Scheduling and Power-Allocation for Interference Management in Wireless Networks,” Proceedings of the 2002 IEEE Vehicular Technology Conference, Vancouver, Canada, September 2002, vol. 3, pp. 1892-1896.

4. X. Liu, E. K. P. Chong, and N. B. Shroff, “Efficient Scheduling in Wireless Networks,” Proceedings of the 2001 IEEE INFOCOM, Alaska, April 2001, vol. 2, pp.

5. X. Liu and R. Srikant, “The Timing Capacity of Single-server Queues with Multiple Input and Output Terminals,” to appear in the Proceedings of the DIMACS Workshop on Network Information Theory, 2004.


PROFESSIONAL SERVICE



Member, Technical Program Committee, IEEE INFOCOM 2003-2004



Referee for IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Computer Magazine



Member of IEEE Computer Society, IEEE Communications Society, and ACM.






Graduate and Postdoctoral Advisors



Edwin K. P. Chong, Dept. of Elec. & Comp. Engr., Colorado State University



Ness B. Shroff, Dept. of Elec. & Comp. Engr., Purdue University



R. Srikant, Dept. of Elec. & Comp. Engr., University of Illinois