
Oct 24, 2013


GridFTP Pipelining

Solution to the LOSF Problem

John Bresnahan, Mike Link, Rajkumar Kettimuthu, Dan Fraser, Ian Foster


Argonne National Laboratory

University of Chicago

GridFTP


Fast and robust data transfer protocol


Two-channel protocol


Control channel


Used to request data transfers


Command response protocol


Data channel


Used to stream data


Transfer requests are serialized


One at a time
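The serialized control-channel exchange described above can be illustrated with a toy sketch. It assumes FTP-style commands (RETR) and reply codes (226, 221) as defined in RFC 959, uses socket.socketpair() as a stand-in for the control connection, and leaves out the data channel entirely; toy_server and serialized_client are hypothetical names, not part of any Globus API.

```python
import socket
import threading


def toy_server(conn: socket.socket) -> None:
    """Hypothetical control-channel server: reads one CRLF-terminated
    command at a time and answers each with exactly one reply line."""
    with conn, conn.makefile("rb") as reader:
        for raw in reader:
            command = raw.decode("ascii").strip()
            if command == "QUIT":
                conn.sendall(b"221 Goodbye\r\n")
                return
            # A real server would move the file over the data channel here;
            # this sketch only acknowledges the request.
            conn.sendall(f"226 Transfer complete: {command}\r\n".encode("ascii"))


def serialized_client(conn: socket.socket, files: list[str]) -> list[str]:
    """Traditional client: only one request is on the wire at a time."""
    replies = []
    with conn.makefile("rb") as reader:
        for name in files:
            conn.sendall(f"RETR {name}\r\n".encode("ascii"))
            # Block until the completion reply arrives: one RTT per file.
            replies.append(reader.readline().decode("ascii").strip())
        conn.sendall(b"QUIT\r\n")
        reader.readline()  # consume the 221 reply
    return replies


client_end, server_end = socket.socketpair()
server = threading.Thread(target=toy_server, args=(server_end,))
server.start()
replies = serialized_client(client_end, ["a.dat", "b.dat", "c.dat"])
server.join()
client_end.close()
print(replies)
```

Each RETR is written only after the previous reply has been read, so the client pays a full control-channel round trip per file no matter how small the file is.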

Lots of Small Files (LOSF) Problem


Performance is best on large data sets


Overhead has less impact


Data set can be partitioned into many files


GridFTP has traditionally been fast only with large file partitions


Overhead is added on a per-file basis


Bad for a large data set partitioned into many small files
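A back-of-the-envelope model makes the per-file overhead concrete. It assumes equal-sized files, a fixed path bandwidth, and a constant cost per file that nothing overlaps (for example, one control-channel round trip); per_file_overhead and the numbers below are illustrative assumptions, not measurements from these experiments.

```python
def effective_throughput(total_bytes: float, n_files: int,
                         bandwidth: float, per_file_overhead: float) -> float:
    """Bytes per second achieved when every file pays a fixed overhead.

    Assumes data moves at `bandwidth` bytes/s and that the per-file
    overhead is never overlapped with data movement (serialized case).
    """
    transfer_time = total_bytes / bandwidth        # independent of file count
    overhead_time = n_files * per_file_overhead    # grows linearly with file count
    return total_bytes / (transfer_time + overhead_time)


# Hypothetical setting: 1 GB over a 100 MB/s path with 50 ms overhead per file.
for n in (1, 10, 100, 1000, 10000):
    rate = effective_throughput(1e9, n, 100e6, 0.05)
    print(f"{n:>6} files: {rate / 1e6:6.1f} MB/s")
```

The bytes moved are identical in every row; only the file count changes. At 1,000 files the fixed overhead (50 s) already exceeds the 10 s spent actually moving data.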


LOSF Issues


Transfer request latency


1 control-channel RTT between transfers


1) Transfer request, 2) completion acknowledgment


The data channel is idle between requests


A smaller fraction of the total time is spent transferring data


TCP issues


An idle data channel can cause the TCP congestion window to close


Each file must go through TCP slow start again
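The cost of restarting slow start for every file can be estimated with an idealized model. It assumes the congestion window starts at initial_window segments (2 here, an assumed value), doubles every round trip, and is never limited by loss, ssthresh, or the receiver window; the 1460-byte MSS is likewise an assumption. Real TCP stacks stop doubling much earlier, so the numbers show the trend only.

```python
def slow_start_rounds(file_bytes: int, mss: int = 1460, initial_window: int = 2) -> int:
    """Round trips needed to send `file_bytes` starting from a fresh window.

    Idealized slow start: the congestion window (in segments) doubles every
    round trip, with no loss, no ssthresh cap, and no receiver-window limit.
    """
    segments_left = -(-file_bytes // mss)  # ceiling division
    cwnd = initial_window
    rounds = 0
    while segments_left > 0:
        segments_left -= cwnd   # send one full window this round trip
        cwnd *= 2               # window doubles for the next round trip
        rounds += 1
    return rounds


# 1,000 files of 64 KiB, each restarting slow start, versus one continuous stream.
per_file = 1000 * slow_start_rounds(64 * 1024)
one_stream = slow_start_rounds(1000 * 64 * 1024)
print(per_file, one_stream)  # 5000 15
```

Under these assumptions each 64 KiB file needs 5 round trips just to ramp up, so the small-file session spends 5,000 round trips in slow start, while the same bytes sent as one stream need only 15.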

Traditional Transfer Session

[Figure: traditional transfer session; labels: Client, Sender, Receiver, Data]

Pipelining


Allow many outstanding transfer requests


Send next request before previous completes


Latency is overlapped with the data transfer


When one finishes, the next has already traversed the network


Cache data channels


Once established, it is used for all transfers


Not part of pipelining itself; a separate feature
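The effect of keeping several requests outstanding can be sketched with a small timing model. All parameters are hypothetical: every file takes transfer_time seconds on the data channel, the control channel adds rtt / 2 in each direction, the server performs one transfer at a time, and the client keeps at most depth requests outstanding, sending a new one whenever a completion reply arrives. depth = 1 corresponds to the traditional serialized session.

```python
def session_time(n_files: int, transfer_time: float, rtt: float, depth: int) -> float:
    """Time until the client sees the last completion reply."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    one_way = rtt / 2
    reply_at: list[float] = []   # when each completion reply reaches the client
    server_free = 0.0            # when the server finished the previous transfer
    for i in range(n_files):
        # The first `depth` requests go out at once; later ones wait for a reply.
        sent = 0.0 if i < depth else reply_at[i - depth]
        arrives = sent + one_way
        start = max(arrives, server_free)  # server idles if the request is late
        server_free = start + transfer_time
        reply_at.append(server_free + one_way)
    return reply_at[-1] if reply_at else 0.0


# Hypothetical WAN-like case: 100 files of 10 ms each over a 50 ms RTT.
for depth in (1, 2, 6, 100):
    print(depth, round(session_time(100, 0.010, 0.050, depth), 3))
```

In this model the server never waits once depth >= 1 + rtt / transfer_time, so the total drops from n_files * (rtt + transfer_time) to roughly rtt + n_files * transfer_time. The longer the RTT is relative to the per-file transfer time, the deeper the pipeline must be to hide it.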

Implementation


Backward compatible


Wire protocol doesn’t change


The client sends commands sooner, but the server reads them at the same point it otherwise would


TCP handles flow control


Commands sent early may just sit in TCP kernel buffers


Minimal server-side changes


None required for performance benefits


Changes to protect against possible denial-of-service (DoS) attacks
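Because pipelining changes only when the client writes, a server that reads and answers one command at a time needs no modification. The sketch below pairs such a hypothetical server with a client that keeps up to depth requests in flight; early commands simply wait in the socket buffer until the server reads them. socket.socketpair() stands in for the TCP control connection, the RETR/226/221 strings follow FTP conventions (RFC 959), and all function names are illustrative, not GridFTP APIs.

```python
import socket
import threading


def one_at_a_time_server(conn: socket.socket) -> None:
    """Hypothetical unmodified server: read a command, reply, repeat.
    It has no notion of pipelining."""
    with conn, conn.makefile("rb") as reader:
        for raw in reader:
            command = raw.decode("ascii").strip()
            if command == "QUIT":
                conn.sendall(b"221 Goodbye\r\n")
                return
            conn.sendall(f"226 Transfer complete: {command}\r\n".encode("ascii"))


def pipelined_client(conn: socket.socket, files: list[str], depth: int = 4) -> list[str]:
    """Keep up to `depth` RETR requests outstanding on the control channel."""
    replies: list[str] = []
    sent = 0
    with conn.makefile("rb") as reader:
        while len(replies) < len(files):
            # Top up the window before waiting; these commands may sit in the
            # socket buffer until the server gets around to reading them.
            while sent < len(files) and sent - len(replies) < depth:
                conn.sendall(f"RETR {files[sent]}\r\n".encode("ascii"))
                sent += 1
            replies.append(reader.readline().decode("ascii").strip())
        conn.sendall(b"QUIT\r\n")
        reader.readline()  # consume the 221 reply
    return replies


files = [f"part{i:03d}.dat" for i in range(20)]
client_end, server_end = socket.socketpair()
server = threading.Thread(target=one_at_a_time_server, args=(server_end,))
server.start()
replies = pipelined_client(client_end, files)
server.join()
client_end.close()
```

Replies come back in request order because the server still handles one command at a time. Bounding depth matters: a client that wrote an unbounded list without reading replies could fill both socket buffers and stall the session.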

Traditional vs. Pipeline

[Figure: two panels, Traditional and Pipeline]

Experiments


TeraGrid 64-bit machines


LAN: UC <-> UC


WAN: UC <-> SDSC


With and without security


Memory-to-memory transfers (no disk I/O)


1 GB data sets


Partitioned into an increasing number of smaller files


[Result charts: LAN, no security; LAN, security; WAN, no security; WAN, security]

Results


Benefits are more pronounced on the WAN


Longer control channel RTT


Effects of security


Shown in differences between cached and standard


Adds latency for encrypting responses


Begins to affect pipelining at a file-count threshold

Availability


Globus 4.1.2


globus-url-copy


GridFTP server


globus_ftp_client library


Java CoG Kit


In CVS branch