Connecting arbitrary data sources to the grid

Nov 12, 2013

Shunde Zhang

Australian Research Collaboration Service
(ARCS)

eResearch SA

School of Computer Science, University of
Adelaide

Background


Australian Research Collaboration
Service


A successor of APAC


Services


HPC


Data


Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai

ARCS Data Fabric

ARCS Data Fabric (cont.)


A national service


Provided to all Australian
researchers


Based on iRODS

The Problem


Interoperability with “The Grid”


“The Grid”: Globus, gLite, Condor, etc.


Data sources


GridFTP-compatible: dCache


Non-GridFTP-compatible: iRODS, SRB


Possible solutions


“Manual” copy (or do it in a PBS script)


Copy queue

The Problem (cont.)


Movement of massive data


Both ends use the same software (speaking the same protocol)


Different systems are used (speaking different protocols)


Efficiency


Possible solutions


Transfer via an intermediate point

A solution - old-fashioned


AWS Import/Export for Amazon S3


Ship the hard disks by courier

Our Solution - GridFTP


De facto standard


Compatible with the Grid, and many grid
clients


Efficiency


Parallel transfer


Data channel reuse


Large file transfer - in small blocks


Compatible with many file transfer
services


Monitoring


Scheduling
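
The "parallel transfer" and "large file transfer in small blocks" points above can be pictured with a short sketch. It is not taken from Griffin or Globus: the class names Block and BlockPlanner, and the round-robin assignment of blocks to streams, are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** One contiguous byte range of a file (hypothetical helper type). */
final class Block {
    final long offset;  // position of the first byte within the file
    final long length;  // number of bytes in this block
    final int stream;   // index of the data connection assumed to carry it

    Block(long offset, long length, int stream) {
        this.offset = offset;
        this.length = length;
        this.stream = stream;
    }
}

final class BlockPlanner {
    /**
     * Splits fileSize bytes into blocks of at most blockSize bytes and
     * assigns them round-robin to the given number of streams.
     */
    static List<Block> plan(long fileSize, long blockSize, int streams) {
        if (fileSize < 0 || blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("sizes and stream count must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            long length = Math.min(blockSize, fileSize - offset);
            blocks.add(new Block(offset, length, index % streams));
            index++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 10 MiB file in 4 MiB blocks over 2 streams.
        for (Block b : plan(10L * 1024 * 1024, 4L * 1024 * 1024, 2)) {
            System.out.println("stream " + b.stream + ": offset=" + b.offset
                    + " length=" + b.length);
        }
    }
}
```

Because every block carries its own offset, the receiver does not depend on arrival order, which is what lets several streams work on one file at once.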

An overview of the GridFTP protocol


Based on FTP with extensions


Third-party transfer


Intermediate point not needed


Security - GSI


Extended block mode


Parallel transfer


Striped transfer


Partial transfer


Reliable and restartable


TCP and UDP
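
Extended block mode is what makes parallel, striped, and partial transfers possible: each block on a data channel is prefixed with a header stating how many bytes follow and where they belong in the file. The sketch below encodes and decodes such a header. It assumes the layout given in the GridFTP protocol extensions (one descriptor byte, a 64-bit byte count, a 64-bit offset, in network byte order) and the descriptor values 8 for end-of-data and 64 for end-of-file; check both against the specification before relying on them. The class name EBlockHeader is hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch of an extended-block-mode header (layout and flag values are assumptions). */
final class EBlockHeader {
    static final int HEADER_SIZE = 17; // 1 descriptor byte + 8 count bytes + 8 offset bytes
    static final int FLAG_EOD = 8;     // assumed: end of data on this connection
    static final int FLAG_EOF = 64;    // assumed: end of file

    final int descriptor; // bit flags
    final long count;     // number of payload bytes following the header
    final long offset;    // where the payload belongs in the file

    EBlockHeader(int descriptor, long count, long offset) {
        this.descriptor = descriptor;
        this.count = count;
        this.offset = offset;
    }

    /** Serializes the header; ByteBuffer is big-endian (network order) by default. */
    byte[] encode() {
        return ByteBuffer.allocate(HEADER_SIZE)
                .put((byte) descriptor)
                .putLong(count)
                .putLong(offset)
                .array();
    }

    /** Parses the first 17 bytes of raw as a header. */
    static EBlockHeader decode(byte[] raw) {
        if (raw.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header needs " + HEADER_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int descriptor = buf.get() & 0xFF; // treat the byte as unsigned
        long count = buf.getLong();
        long offset = buf.getLong();
        return new EBlockHeader(descriptor, count, offset);
    }

    boolean isEndOfData() {
        return (descriptor & FLAG_EOD) != 0;
    }

    public static void main(String[] args) {
        EBlockHeader last = new EBlockHeader(FLAG_EOD, 65536L, 1048576L);
        EBlockHeader parsed = decode(last.encode());
        System.out.println("count=" + parsed.count + " offset=" + parsed.offset
                + " eod=" + parsed.isEndOfData());
    }
}
```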

The Architecture

GridFTP interface

Generic File System Framework

Data Source Plugin

Data Source

Generic File System Framework

FileSystem creates FileSystemConnection

FileSystemConnection creates FileObject

FileObject creates RandomAccessFileObject

FileSystem interface

public String getSeparator();

public void init() throws IOException;

public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws FtpConfigException, IOException;

public void exit();

FileSystemConnection interface

public FileObject getFileObject(String path);

public String getHomeDir();

public String getUser();

public void close() throws IOException;

public boolean isConnected();

public long getFreeSpace(String path);

FileObject interface

public String getName();

public String getPath();

public boolean exists();

public boolean isFile();

public boolean isDirectory();

public int getPermission();

public String getCanonicalPath() throws IOException;

public FileObject[] listFiles();

public long length();

public long lastModified();

public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;

public boolean delete();

public FileObject getParent();

public boolean mkdir();

public boolean renameTo(FileObject file);

public boolean setLastModified(long t);

RandomAccessFileObject interface

public void seek(long offset) throws IOException;

public int read() throws IOException;

public int read(byte[] b) throws IOException;

public int read(byte[] b, int off, int len) throws IOException;

public void close() throws IOException;

public String readLine() throws IOException;

public void write(int b) throws IOException;

public void write(byte[] b) throws IOException;

public void write(byte[] b, int off, int len) throws IOException;

public long length() throws IOException;
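
The seek method is what allows a data source plugin to accept blocks in whatever order they arrive. The following is a minimal sketch of how a local-file adaptor might satisfy the interface; it is not Griffin's actual plugin. The interface here is a trimmed redeclaration with only the methods the sketch needs, the class names LocalRandomAccessFileObject and BlockWriter are hypothetical, and it assumes the type argument is an access mode such as "r" or "rw".

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Trimmed copy of the interface above: only the methods used in this sketch. */
interface RandomAccessFileObject {
    void seek(long offset) throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void write(byte[] b, int off, int len) throws IOException;
    long length() throws IOException;
    void close() throws IOException;
}

/** Hypothetical local-file adaptor that delegates to java.io.RandomAccessFile. */
final class LocalRandomAccessFileObject implements RandomAccessFileObject {
    private final RandomAccessFile file;

    LocalRandomAccessFileObject(File target, String type) throws IOException {
        // Assumption: type is an access mode understood by RandomAccessFile.
        this.file = new RandomAccessFile(target, type);
    }

    @Override public void seek(long offset) throws IOException { file.seek(offset); }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        return file.read(b, off, len);
    }
    @Override public void write(byte[] b, int off, int len) throws IOException {
        file.write(b, off, len);
    }
    @Override public long length() throws IOException { return file.length(); }
    @Override public void close() throws IOException { file.close(); }
}

final class BlockWriter {
    /** Places one received block at its offset, regardless of arrival order. */
    static void writeBlock(RandomAccessFileObject target, long offset, byte[] data)
            throws IOException {
        target.seek(offset);
        target.write(data, 0, data.length);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("griffin-sketch", ".bin");
        tmp.deleteOnExit();
        RandomAccessFileObject out = new LocalRandomAccessFileObject(tmp, "rw");
        writeBlock(out, 5, new byte[] {'w', 'o', 'r', 'l', 'd'}); // second block arrives first
        writeBlock(out, 0, new byte[] {'h', 'e', 'l', 'l', 'o'});
        System.out.println("file length: " + out.length());
        out.close();
    }
}
```

A plugin for a data source without native random access would have to emulate seek, for example by buffering, which is one reason the interface is kept separate from FileObject.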

The Implementation - Griffin

[Diagram, top to bottom]

Clients: GridFTP client, Grid job submission system, Data transfer service

Griffin: GridFTP interface, Generic file system framework

Adaptors: Adaptor for iRODS, Adaptor for local file system, Other adaptors

Data sources: iRODS, Local File System, Other data source


Griffin

Features


GridFTP protocol version 1


Java-based


Spring framework


OS-independent


Lightweight, stand-alone, self-contained


No need to install Globus Toolkit


Two plugins included


iRODS plugin


Local file system plugin


Open source (Apache 2 & GPL)

Parallel transfer with Griffin

[Diagram: Client to Griffin over WAN; Griffin to Data Source over LAN/localhost]

Authentication


GSI


iRODS plugin


User mapping


local file system plugin


XML file


Maps GSI authentication (certificate DN)
to internal user management system
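
The XML-based user mapping for the local file system plugin can be sketched as a lookup from certificate DN to local user name. The slides do not show the file format, so everything concrete here is an assumption: the users and user elements, the dn and name attributes, the example DN, and the class name DnUserMap are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical DN-to-user map loaded from an XML document. */
final class DnUserMap {
    private final Map<String, String> usersByDn = new HashMap<>();

    /** Parses XML of the assumed form <users><user dn="..." name="..."/></users>. */
    static DnUserMap parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        DnUserMap map = new DnUserMap();
        NodeList users = doc.getElementsByTagName("user");
        for (int i = 0; i < users.getLength(); i++) {
            Element user = (Element) users.item(i);
            map.usersByDn.put(user.getAttribute("dn"), user.getAttribute("name"));
        }
        return map;
    }

    /** Returns the local user for a certificate DN, or null if the DN is not mapped. */
    String lookup(String dn) {
        return usersByDn.get(dn);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users>"
                + "<user dn=\"/C=AU/O=Example/CN=Jane Doe\" name=\"jane\"/>"
                + "</users>";
        DnUserMap map = parse(xml);
        System.out.println(map.lookup("/C=AU/O=Example/CN=Jane Doe"));
    }
}
```

A DN with no entry returns null, which a server would treat as an authentication failure rather than falling back to a default account.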

Use case


Integration of the Grid and Data
Fabric


iRODS plugin for Data Fabric


Third-party transfer to cluster (Globus GridFTP)


Tested with


Globus.org


globus-url-copy (5.0 and 4.x)


Globus GridFTP GUI

Performance Evaluation


Server: Two quad-core Xeon 3.16GHz CPUs, 16GB memory


Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory


Network: 1Gbps LAN


WAN: two 10Gbps links


Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB


iCommands


globus-url-copy

Evaluation Setup - Griffin vs iCommands

[Diagram components: Client (globus-url-copy, iCommands), Griffin with Jargon Adaptor, iRODS, Local File System]

Evaluation Result Chart - Griffin vs iCommands

Evaluation Setup - Griffin vs Globus GridFTP

[Diagram components: Client (globus-url-copy), Globus GridFTP server, Griffin with Local FS Adaptor, Local File System]

Evaluation Result Chart - Griffin vs Globus GridFTP

Related work


Client library


SAGA/jSAGA


Commons-vfs


Data transfer service


Stork


PAFTP


Globus


XIO


DSI

Griffin vs. Globus GridFTP

Griffin              Globus GridFTP
Java                 C
OS-independent       *nix
Simple, standalone   Complex

Conclusion


A generic solution to connect
arbitrary data sources to the grid


Data in/out of the grid


Data transfer between different data
sources


Java-based implementation


Standalone, lightweight


Pluggable


Does not depend on Globus

Future work


Currently working on a plugin for
MongoDB


Java NIO


UDP


Striped transfer

MongoDB plugin


MongoDB


NoSQL database


Stores JSON-style documents


GridFS component


Stores files


Plugin for Griffin


Read/write files via GridFS

Acknowledgements


ARCS funded

Current Status


ARCS production service


Used to transfer data in/out of
ARCS Data Fabric


Website


https://projects.arcs.org.au/trac/griffin

Thank you!

Questions/Comments?