grid poster


Jan 31, 2013


The Global Land Cover Facility is sponsored by NASA and the University of Maryland.

The GLCF is a founding member of the Federation of Earth Science Information Partners.

The Global Land Cover Facility

Integrating Earth Science Data With Grid Technologies



Storage Resource Broker (SRB)

SRB Architecture

The Earth Science community generates large quantities of scientific
data, and bringing these data together for use presents a
difficult challenge. Data are typically stored at many small,
distributed repositories that have different access methods and
query mechanisms. A typical workflow involves either downloading all of
the required data before processing or working through the data
iteratively in small, manageable batches.

By using data grid technologies, data can be viewed as belonging to
one large repository. Data access and retrieval are performed by
referencing the data itself in a uniform namespace, independent of the
data’s physical location and underlying storage format and medium.
Data can be obtained from the grid when needed, and the services required
to access the data grid can be provided through one access method.
This simplifies working with data in a distributed environment.
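
The idea of a uniform namespace can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SRB's actual interface: the `Namespace` and `Replica` classes, the site names, and every path below are hypothetical. The point it shows is that a client asks for data by logical name and never needs to know which site or storage medium holds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str           # hypothetical site name
    storage: str        # e.g. "filesystem" or "archive"
    physical_path: str  # where the bytes actually live at that site


class Namespace:
    """Toy logical namespace: maps logical names to physical replicas."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, replica):
        # A logical name may have more than one physical copy.
        self._entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name):
        """Return the first registered replica for a logical name."""
        replicas = self._entries.get(logical_name)
        if not replicas:
            raise KeyError(f"not registered: {logical_name}")
        return replicas[0]


ns = Namespace()
ns.register("/grid/landsat/scene_001.tif",
            Replica("site-a", "filesystem", "/data/vol3/l7/0001.tif"))
ns.register("/grid/landsat/scene_002.tif",
            Replica("site-b", "archive", "tape://pool7/0002"))

# The client only ever uses logical names.
for name in ("/grid/landsat/scene_001.tif", "/grid/landsat/scene_002.tif"):
    replica = ns.resolve(name)
    print(name, "->", replica.site, replica.storage)
```

Both logical names share one namespace even though the hypothetical copies sit at different sites and on different kinds of storage.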

Through funding from a SEEDS grant, the University of Maryland,
University of New Hampshire, and George Mason University are
working together to create a data grid test bed using the Storage
Resource Broker (SRB) developed by the San Diego Supercomputer
Center.

Data is registered and organized in the SRB as objects and
collections. Files are considered objects, the fundamental storage
item. Objects can be grouped together into collections,
and collections can in turn be grouped into other collections. In
addition to these constructs, containers can be used to group
together small files that should be read and written as one unit. For
example, the thumbnails and metadata files for a remotely sensed
image can be placed in a container, since these are typically
retrieved together when requested.
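
The relationship between objects, collections, and containers can be modeled roughly as follows. This is a minimal sketch and does not reflect how the SRB implements these constructs: the class names, file names, and sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    size: int  # bytes


@dataclass
class Container:
    """Small objects that are stored and fetched as a single unit."""
    name: str
    members: list = field(default_factory=list)

    def read_all(self):
        # One request returns every member, instead of one request per file.
        return [obj.name for obj in self.members]


@dataclass
class Collection:
    name: str
    objects: list = field(default_factory=list)
    subcollections: list = field(default_factory=list)

    def walk(self, prefix=""):
        """Yield the logical path of every object below this collection."""
        path = f"{prefix}/{self.name}"
        for obj in self.objects:
            yield f"{path}/{obj.name}"
        for sub in self.subcollections:
            yield from sub.walk(path)


# Hypothetical layout: a scene collection nested inside a sensor collection.
scene = Collection("scene_001", objects=[DataObject("band1.tif", 60_000_000)])
landsat = Collection("landsat", subcollections=[scene])

# Thumbnail and metadata are small and usually requested together.
previews = Container("scene_001_previews", members=[
    DataObject("thumbnail.jpg", 20_000),
    DataObject("metadata.txt", 2_000),
])

print(list(landsat.walk()))
print(previews.read_all())
```

Collections nest recursively, while a container is flat: it exists only so that a group of small files can be read or written in one operation.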

The figure on the left shows how Landsat satellite imagery could be
organized inside the SRB.

Data is registered into the SRB data grid with the MCAT server.
Clients requesting data contact the MCAT, which finds where the needed
data resides and directs the client to the appropriate SRB master.
Each site with data runs an SRB master, which serves data to the
clients.

The MCAT stores all of the necessary catalog information in a
database. Many popular databases are supported, including Oracle,
DB2, PostgreSQL, Informix, and Sybase. Data registered in the
MCAT can also be loaded with relevant metadata. This allows clients
to access data not only by name but also by querying and filtering
against attributes.
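
Attribute-based discovery of this kind can be illustrated with a small relational catalog. The sketch below uses Python's built-in `sqlite3` module purely for convenience. The two-table schema, the `register` and `find` helpers, and the sample attributes are assumptions made for illustration and do not describe the MCAT's real schema or query interface.

```python
import sqlite3

# In-memory stand-in for a metadata catalog database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, logical_name TEXT UNIQUE);
    CREATE TABLE metadata (
        object_id INTEGER REFERENCES objects(id),
        attr TEXT,
        value TEXT
    );
""")


def register(logical_name, **attrs):
    """Register an object and attach attribute/value metadata to it."""
    cur = conn.execute("INSERT INTO objects (logical_name) VALUES (?)",
                       (logical_name,))
    conn.executemany(
        "INSERT INTO metadata (object_id, attr, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, str(v)) for k, v in attrs.items()],
    )


def find(attr, value):
    """Return logical names of objects whose attribute equals the value."""
    rows = conn.execute(
        """SELECT o.logical_name FROM objects o
           JOIN metadata m ON m.object_id = o.id
           WHERE m.attr = ? AND m.value = ?
           ORDER BY o.logical_name""",
        (attr, str(value)),
    )
    return [row[0] for row in rows]


# Hypothetical scenes with hypothetical attributes.
register("/grid/landsat/scene_001.tif", sensor="ETM+", year=2001)
register("/grid/landsat/scene_002.tif", sensor="TM", year=1990)
register("/grid/landsat/scene_003.tif", sensor="ETM+", year=2002)

print(find("sensor", "ETM+"))
```

Storing metadata as attribute/value rows lets new attributes be added without changing the schema, at the cost of storing every value as text.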

SRB masters support various native access methods to retrieve the
data at each site. Data can be served transparently to the client
without exposing the underlying storage, which can range from simple
filesystems to complex hierarchical storage systems like HPSS.

Additional Features

Parallel Transfers: Retrieval of data can be faster through the
SRB than with traditional methods by using parallel
transfers from one or many sites.

Replication: Registered data can be replicated to multiple sites
for reliability through redundancy, or copied and moved to
other sites transparently to the clients.

Security: The Grid Security Infrastructure provides a
common authentication mechanism and allows for
granular access control on data stored in the grid.

Utilities: Many utilities, such as inQ, mySRB, and the
command suite, already exist for the input and output
of data to the SRB, and APIs for many of the popular
languages provide solutions for custom applications.
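
The general idea behind parallel transfers can be sketched with Python's standard thread pool: a file is split into byte ranges, the ranges are fetched concurrently, and the pieces are joined in order. This is a generic illustration, not the SRB's transfer protocol. `REMOTE_FILE`, `read_range`, and `parallel_fetch` are hypothetical stand-ins, and no network is involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a remote file; in a real grid each range read would go
# to a storage server, possibly at a different site holding a replica.
REMOTE_FILE = bytes(range(256)) * 40


def read_range(start, end):
    """Hypothetical range read: return bytes [start, end) of the file."""
    return REMOTE_FILE[start:end]


def parallel_fetch(size, chunk_size, workers=4):
    """Fetch fixed-size ranges concurrently and reassemble them in order."""
    ranges = [(s, min(s + chunk_size, size))
              for s in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the inputs, so the chunks
        # can be joined directly even if they finish out of order.
        chunks = pool.map(lambda r: read_range(*r), ranges)
        return b"".join(chunks)


data = parallel_fetch(len(REMOTE_FILE), chunk_size=1000)
print(len(data), data == REMOTE_FILE)
```

When replicas exist at several sites, the same pattern could in principle draw different ranges from different sites, which is one way transfers "from one or many sites" might be combined.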

Planned Work

At this time, the partners in this project have set up a basic test
bed with some minimal data registered. We plan to expand our
holdings and make more data available in the current grid. The
SRB supports basic metadata information but lacks support for
the storage and querying of spatial information. We plan on
creating extensions to the SRB to integrate the MCAT with
standard spatial database extenders such as the Informix Spatial
Datablade. The GLCF also plans on modifying its current version
of the Earth Science Data Interface (ESDI) to create a web
query and download tool for all data registered in this test bed.