20080128_30_F2F_HTP


Domain Workspace Requirements and Impacts

Bulk Data Transfer

Martin Morgan,
mtmorgan@fhcrc.org

Shannon Hastings,
hastings@bmi.osu.edu


January 2008

Agenda


ICR use cases and requirements


Data types, ‘workflows’, implementation


Available solutions


Endpoint references, WS-Enumeration, GridFTP


Significant cross-cutting issues


VCDE: binary type descriptions, endpoint reference metadata


ARCH: endpoint reference lifetime and choreography


caGrid: integrated security, development- and deployment-friendly


Emerging caGrid solutions and directions


Discussion


Next Steps


Team Members


ICR HTP Workgroup


Lead: Martin Morgan


Key participants: Shannon Hastings, Patrick McConnell, Elaine Freund
(ICR Workspace Lead), Juli Klemm


caGrid


Tony Pan, Ashish Sharma, Justin Permar, and Scott Oster

ICR use cases and requirements


Data transfer and parsing requirements


Very large objects, each with very simple structure


Numerous complex objects


Binary file transfer


‘Workflow’ requirements


Interactive


at least the illusion of responsiveness


Stateful


repeated data transformations


Cooperative


large data transfer between data / analytic services, not
directly apparent to the end user


Implementation requirements


caGrid security


Strongly typed


Interoperable


Reusable beyond the original application

Available solutions


Endpoint references


Lightweight ‘proxy’ for actual results


Defined methods and some metadata


Provides stateful services


WS-Enumeration


Iterate over a large object, each iteration returning a (strongly typed)
portion of the object


Provides illusion of interactivity


some results appear quickly


Reduces requirement to parse very large objects


‘Pipeline’ development tasks, e.g., separate database access from
transfer


GridFTP


‘Out-of-band’ file transfer


E.g., compressed XML; binary files


Efficient file transfer
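
The WS-Enumeration idea above (iterate over a large object, each iteration returning a portion of it) can be illustrated with a minimal Python sketch. This is not the WS-Enumeration protocol or any caGrid API; the function name enumerate_chunks and the chunk size are hypothetical, chosen only to show why a consumer can start working before the whole object has been produced.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def enumerate_chunks(source: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items from source.

    Hypothetical helper: each yielded list stands in for one
    enumeration response carrying a portion of a large object.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(source)
    while True:
        # Pull only the next portion; the rest of the source is untouched.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # The first portion is available before the source is exhausted.
    for portion in enumerate_chunks(range(7), 3):
        print(portion)
```

Because the source is consumed lazily, the receiver never has to parse or hold the complete object, which is the property the slide attributes to enumeration.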

Significant cross-cutting issues


VCDE


Endpoint reference metadata


Strongly typed binary file transfer


ARCH


Endpoint reference lifetime / choreography


Allowable or required sequence of operations on an endpoint reference


caGrid


Packaged interface


Transport-layer details should be hidden from the end user


Endpoint reference and GridFTP security integration


Endpoint reference implementation


Several methods in a single service returning different endpoint references


Ongoing attention to ‘authoritative’ documentation
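
The choreography issue raised above (the allowable or required sequence of operations on an endpoint reference) can be made concrete with a small state-machine sketch. The states, the operation names, and the ResourceHandle class are all assumptions made for illustration; the slides do not define a lifecycle, and this is not a caGrid or Globus interface.

```python
class ChoreographyError(Exception):
    """Raised when an operation is not allowed in the current state."""


# Hypothetical lifecycle: which operations are allowed in each state.
ALLOWED = {
    "created": {"read", "destroy"},
    "reading": {"read", "destroy"},
    "destroyed": set(),
}

# Hypothetical transitions: the state an operation leads to.
NEXT_STATE = {"read": "reading", "destroy": "destroyed"}


class ResourceHandle:
    """Stand-in for a client-side endpoint reference (illustrative only)."""

    def __init__(self) -> None:
        self.state = "created"

    def apply(self, operation: str) -> str:
        """Apply an operation if the choreography permits it."""
        if operation not in ALLOWED[self.state]:
            raise ChoreographyError(
                f"'{operation}' is not allowed in state '{self.state}'"
            )
        self.state = NEXT_STATE[operation]
        return self.state
```

Writing the allowed sequences down as data, rather than leaving them implicit in service code, is one way such a choreography could be documented and checked.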

caGrid: response to HTP

Many caGrid user groups have expressed the need to transfer large data files
across the grid without paying the cost of serialization and deserialization,
and without having to hold the entire data set in core memory. Early in the
project we added support for using GridFTP in these scenarios. Several issues
with that approach have led us to look for another solution. Some of the issues
are as follows:



The GridFTP server is not cross-platform (it is currently available only in the
C builds of Globus).


GridFTP requires a separate Globus C installation, because caGrid uses the
Java WS Core installation of Globus.


GridFTP has to be extended to make authorization callouts to Java-based
middleware such as caGrid.


GridFTP installation and configuration is quite advanced for our user
community.


Globus does not currently support SOAP attachments.

caGrid: new requirements

To better serve our users, we have come up with the following requirements for
an alternative, non-grid, high-performance delivery mechanism:



Cross-platform (use the container that will host Java WS Core Globus).


One click/command install with no required configuration.


Work within the same web application container in which caGrid is deployed.


Utilize GSI sockets for securely transporting the data using the same proxy
certificates issued in caGrid.


No deserialization or serialization required on server or client.


No minimum requirement for core memory.


Support upload and download of data.
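
Two of the requirements above, no serialization step and no minimum requirement for core memory, amount to copying bytes through a fixed-size buffer. The sketch below shows that pattern under stated assumptions: stream_copy and its buffer_size default are hypothetical names, and in-memory streams stand in for the file and network endpoints.

```python
import io
from typing import BinaryIO


def stream_copy(src: BinaryIO, dst: BinaryIO, buffer_size: int = 64 * 1024) -> int:
    """Copy src to dst one block at a time and return the byte count.

    Memory use is bounded by buffer_size regardless of how large the
    data is, and the bytes are passed through without being parsed.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    total = 0
    while True:
        block = src.read(buffer_size)
        if not block:
            return total
        dst.write(block)
        total += len(block)


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 200_000)
    sink = io.BytesIO()
    print(stream_copy(source, sink, buffer_size=4096))
```

The same loop works for upload and download, since only the roles of the two streams change.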

caGrid Transfer


caGrid Transfer will address all of the preceding requirements:


Simple: a Transfer Service and a Transfer Servlet work together in the same
container to deliver the data stream over an HTTP/HTTPS connection.


Secure: if the data is in a secure container, it is streamed only over a GSI
socket, and only when the requester's credentials match those of the caller
that created the data.


Common: provided as an Introduce extension.


No-configuration installation (simply run ant deploy).


Metadata is extensible, so recommendations from the BDF WG can be added.



Will be released with caGrid 1.2



For now, caGrid will support both techniques (GridFTP and HTTP).



http://www.cagrid.org/mwiki/index.php?title=CaGridTransfer
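
The security rule described above, that data is streamed only to a caller whose credentials match those of the caller that created it, can be sketched as an ownership check. Everything here is an assumption for illustration: TransferRegistry, stage, and fetch are hypothetical names, and plain identity strings stand in for the GSI proxy credentials a real deployment would verify. It is not the caGrid Transfer implementation.

```python
import secrets
from typing import Dict, Tuple


class TransferRegistry:
    """Illustrative store that ties staged data to its creator's identity."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def stage(self, owner_identity: str, payload: bytes) -> str:
        """Record data for later retrieval and return an opaque transfer id."""
        transfer_id = secrets.token_hex(16)
        self._entries[transfer_id] = (owner_identity, payload)
        return transfer_id

    def fetch(self, transfer_id: str, caller_identity: str) -> bytes:
        """Return the data only if the caller is the one who staged it."""
        owner_identity, payload = self._entries[transfer_id]
        if caller_identity != owner_identity:
            raise PermissionError("caller did not create this data")
        return payload
```

In this sketch the service side would call stage and hand the returned id back to the client, and the servlet side would call fetch when the client connects to retrieve the data.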

Discussion points


Important but omitted requirements?


Comments on caGrid Transfer?


Additional significant cross-cutting issues?


Cross-cutting directions?


Binary types


Endpoint reference metadata


Endpoint reference choreography



Next steps (content from discussion)


VCDE



ARCH



caGrid


Resources


caGrid wiki


http://cagrid.org


‘How-to’ and tutorial documentation


Grid security


Endpoint reference creation and management


ICR HTP Working Group wiki


http://ccis1716.duhs.duke.edu/wiki/index.php?title=HTP


Use cases and requirements


Case studies using currently available solutions


Significant issues