Introduction to gCube:

promoting an ecosystem approach to controlled resource sharing

Pasquale Pagano

Pasquale.pagano@isti.cnr.it

gCube - FAO's Information Systems Architecture Forum


FAO, Rome

25 January 2011

www.d4science.eu

www.d4science.eu

Introduction to gCube

Rome, 25 January 2011

Outline


- D4Science-II challenges
- gCube identity: starting point
- gCube e-Infrastructure enabler: VRE innovation
- gCube interoperability framework: the challenge
- The assumptions
- The vision
- The approach
- The solutions

D4Science Ecosystem Challenges

- Heterogeneous resources
- Heterogeneous computational platforms
- Rich set of legacy applications
- Multiple administrative domains
- Evolving communities

[Figure: the D4Science infrastructure and its portal serving user groups A, B, and C; labelled elements are Hadoop, EGEE/EGI, INSPIRE, DRIVER, GENESI-DR, AquaMaps, FAO Geonetwork, and FAO FIGIS]

D4Science II Current Status


File-based data:
- 64 large data collections gathering 164087 objects

Tabular data:
- Time series (catch statistics)
- Environmental authority (properties of ~250K marine areas)
- Species environmental envelope (environmental description of 11k species)
- Species assignment (assignment of a species to cell areas, ~2.75 billion records)

Geospatial data:
- Environmental (SST, salinity, sea ice concentration, distance to land, etc.)
- Layers and several thousand species distribution layers (~25k layers)

Other data resources:
- Metadata collections in multiple schemas (163)
- Full-text, forward, and geospatial indexes (165)
- Transformation programs (41)

Data Resources

HW and SW Resources

From a testbed to a production ecosystem

Diligent (Oct. ’04 - Nov. ’07): Testbed
Empower the grid middleware to:
> manage data and metadata as primary resources
> virtualise the VO environment
Prototype => gCube 0.9

D4Science (Jan. ’08 - Dec. ’09): Production
Stabilize gCube by supporting two large user communities:
> FARM
> EM
Software Framework => gCube 1.6 (stable and open source)
=> d4science e-Infrastructure

D4Science II (Oct. ’09 - Sept. ’11): Production
Promote interoperability across e-Infrastructures by empowering large user communities
Open Platform => gCube 2.0 (feature rich and interop.)
=> d4science ecosystem

gCube Open Platform

gCube is physically distributed across

- import, collect, store, index, transform, search, describe, manage, and annotate data.
- server-side libraries, client-side libraries, plugins
- interactive components, mediators

Designed for working at large scale
- over wide-area links and across administrative domains
- to cope with the computational demands
- can be easily deployed in a single site

[Figure: three component groups labelled Services, Libraries, and Portlets]

gCube Release Cycle Procedure

Preparing Cycle

Delivering Components

Delivering Subsystems

Releasing to Integration

Building & Packaging

Deployment Testing

Functional Testing

Releasing to Production

Bug Fixing

Patching the Production

Release 2.2.2:
- 23 subsystems
- 307 software packages
- 22 full-time developers
- 4 testers

gCube Software License: EUPL

EUPL licensing makes software Open Source (or more generally “Free / Libre / Open Source Software”, FLOSS) because the EUPL ensures the following rights to the licensee:
- Obtain the source code from a free access repository
- Modify the software, and/or make derivative works out of it
- Reproduce (copy, duplicate) the software
- Use the software in any circumstance and for all usage
- Communicate the software to the public by using it through a public network or by distributing services based on it
- Distribute the software or copies thereof to other users
- Lend and rent the software or copies thereof
- Sub-license rights in the software or copies thereof.

gCube e-Infrastructure

A gCube e-Infrastructure promotes effective consumption of shared resources:
- hardware resources
- data resources
- software resources

to facilitate research collaborations that span institutions, disciplines, and countries within a coherent model, regardless of the location of their research facilities

- It extends the e-Infrastructure concept by promoting sharing and collaboration and enforcing policies
- It increases flexibility in the organization of community resources with Virtual Research Environments


Virtual Organization

A Virtual Organization (VO) specifies how a set of users can access a set of resources:
- what is shared
- who is allowed to share
- the conditions under which sharing can occur

Is the VO adequate to represent a growing aggregation of resources tailored to satisfy the evolving needs of the user community?

NO, it is not!

Common scenarios
- Data needs to be assessed before it is made publicly exploitable by the VO members.
- A restricted set of users has to collaborate to refine processes and implement showcases.
- Products generated through elaboration of data or simulation have to be validated by expert users.

Virtual Research Environment

A Virtual Research Environment (VRE) is
- a distributed and dynamically created environment
- where a subset of resources can be assigned to a subset of users via interfaces
- for a limited timeframe
- at little or no cost for the providers of the infrastructure

[Figure: VRE 1 and VRE 2 defined within a VO]

gCube is a first example of a VRE management system
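
The relation between a VO and its VREs can be illustrated with a minimal sketch. It is not gCube code: the class names, fields, and the `can_access` check are hypothetical, chosen only to show a VRE as a time-limited subset of the users and resources that a VO has registered.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class VirtualOrganization:
    """Hypothetical model: users and resources registered by one community."""
    name: str
    users: set = field(default_factory=set)
    resources: set = field(default_factory=set)


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model: a time-limited subset of a VO's users and resources."""
    name: str
    vo: VirtualOrganization
    users: set
    resources: set
    expires_at: datetime

    def __post_init__(self):
        # A VRE may only draw on what its VO has already registered.
        if not self.users <= self.vo.users:
            raise ValueError("VRE users must be VO members")
        if not self.resources <= self.vo.resources:
            raise ValueError("VRE resources must be registered in the VO")

    def can_access(self, user, resource, now):
        """True only for assigned users and resources, and only before expiry."""
        return (now < self.expires_at
                and user in self.users
                and resource in self.resources)
```

In this reading, adding or removing a VRE never changes what the VO itself owns, which is one way to interpret "at little or no cost for the providers of the infrastructure".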

How does it work?

- Each community (VO) registers its own resources under its domain, and registers and authorises its users.
- Starting from this set of resources (hardware, data, and applications), VREs can be dynamically set up and activated.
- Each user logs in to the VO’s personalized environment and from there searches, elaborates, and stores shared and personal information.
- Later on, the community administrators can dynamically add or remove resources and users from their domain.

Why is sharing through VREs key?

A Virtual Research Environment (VRE) supports cooperative activities:
- Metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies
- Process refinement and showcase implementation (restricted to a set of users)
- Data assessment (required to make data publicly exploitable by VO members)
- Validation by expert users of products generated through data elaboration or simulation



Why is sharing through VREs key?

The integrated environment of a VRE offers a set of functionality to support and perform activities:
- the ability to integrate heterogeneous data and services,
- the ability to process information on demand and ingest the results,
- to share data and processes with other users,
- to customize collections of information,
- to store user actions and exploit them for further use,
- to aggregate relevant information into ad-hoc information sources and keep them updated.

Why is sharing through VREs key?

Through the VRE, groups of users have controlled access to distributed data and services integrated under a personalised interface.



Building Virtual Research Environments



VRE Facilities

- Workspace: a virtual desktop to organize the working environment
- Report Management: a virtual live document to describe research results
- Tools supporting specific tasks: Species Maps Generation, Time Series Management

[Figure: the facilities are drawn on top of Search, Annotation, Visualisation, Transformation, and Storage building blocks]

Workspace


A collaboration-oriented suite providing for
- seamless access and organisation facilities on a rich array of objects (e.g. Information Objects, Queries, Files, Templates)
- mediation between external world objects, systems, and infrastructures (import/export/publishing)
- support for common file manager features (drag & drop, contextual menu)
- support for an effective rich object sharing facility

Species Distribution Maps Generation

AquaMaps is an application*
- tailored to predict global distributions of marine species, initially designed for marine mammals and subsequently generalised to marine species,
- that generates color-coded species range maps using half-degree latitude and longitude blocks
- by interfacing several databases and repository providers

* Algorithm by Kaschner et al. 2006
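
To make the "half-degree latitude and longitude blocks" concrete, the sketch below maps a point to the block that contains it. The row/column numbering (rows northward from 90°S, columns eastward from 180°W) is an assumption made for illustration and is not the cell-coding scheme actually used by AquaMaps.

```python
import math

CELL_SIZE_DEG = 0.5  # half-degree blocks, as stated on the slide


def half_degree_cell(lat, lon):
    """Return (row, col) of the half-degree block containing a point.

    Hypothetical numbering: rows count northward from 90S,
    columns count eastward from 180W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp so that the northern and eastern edges fall in the last block.
    row = min(int(math.floor((lat + 90.0) / CELL_SIZE_DEG)), 359)
    col = min(int(math.floor((lon + 180.0) / CELL_SIZE_DEG)), 719)
    return row, col
```

With this numbering a global grid has 360 x 720 blocks, land and sea together.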

Species Distribution Maps Generation

AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences
- to determine environmental envelopes (species tolerances)
- to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)

Very large volume of input and output data:
- HSPEC native range: 56,468,301
- HSPEC suitable range: 114,989,360

Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species requires 125 million computations

(Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
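
Matching species tolerances against local environmental conditions can be sketched as follows. The slides do not give the scoring function, so the trapezoidal response, the four bounds per variable, and the multiplication of per-variable scores are all assumptions of this example; the variable names and numbers are invented.

```python
def trapezoid(value, lo, pref_lo, pref_hi, hi):
    """Suitability in [0, 1] for one variable under an assumed trapezoidal response."""
    if value < lo or value > hi:
        return 0.0          # outside the tolerated range
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)      # rising edge
    if value > pref_hi:
        return (hi - value) / (hi - pref_hi)      # falling edge
    return 1.0              # inside the preferred range


def cell_suitability(envelope, conditions):
    """Combine per-variable scores for one cell by multiplication (assumed rule)."""
    score = 1.0
    for variable, bounds in envelope.items():
        score *= trapezoid(conditions[variable], *bounds)
    return score


# Hypothetical envelope: (min, preferred min, preferred max, max) per variable.
envelope = {"sst": (2, 10, 20, 28), "salinity": (30, 33, 36, 38)}
print(cell_suitability(envelope, {"sst": 24, "salinity": 37}))
```

Running one such evaluation for every species and every cell is what makes the computation count grow so quickly.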

Time Series Management

Offers a set of tools to manage capture statistics
- Supports the complete TS lifecycle
- Supports validation, curation, and analysis
- Provides support for data reallocation
- Produces uniform data sets

Time Series


Offers a set of tools to operate on capture statistics
- Multiple key families support
- Filtering, grouping, and aggregation
- Union
- Mining

Automatically produces provenance information
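
The operations listed above, together with automatic provenance, can be illustrated with a small sketch. It does not reflect the gCube Time Series API: the dictionary layout, the function names, and the wording of the provenance entries are hypothetical.

```python
from collections import defaultdict


def filter_rows(ts, predicate, description):
    """Keep the rows matching a predicate and record the step."""
    rows = [r for r in ts["rows"] if predicate(r)]
    return {"rows": rows,
            "provenance": ts["provenance"] + [f"filter: {description}"]}


def aggregate(ts, key, value):
    """Group rows by `key`, sum `value`, and record the step."""
    totals = defaultdict(float)
    for r in ts["rows"]:
        totals[r[key]] += r[value]
    rows = [{key: k, value: v} for k, v in sorted(totals.items())]
    return {"rows": rows,
            "provenance": ts["provenance"] + [f"sum {value} by {key}"]}


def union(a, b):
    """Concatenate two time series and merge their histories."""
    return {"rows": a["rows"] + b["rows"],
            "provenance": a["provenance"] + b["provenance"] + ["union"]}
```

Each operation returns a new series whose `provenance` list extends the history of its inputs, so the origin of any derived table can be traced without extra work from the user.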

Report Management


A collaboration-oriented suite providing for
- template-oriented, feature-rich, and flexible document format definition
- effective and infrastructure-integrated report compilation (drag & drop workspace items)
- collaborative and distributed editing (workspace based)
- standard-based report materialisation (HTML, OpenXML)

gCube model

Wide-area computing based on shared computing, data, and service resources.
- provision as Federation but resources can be acquired by the infrastructure
- added value for consumers and providers
- ownership is decentralised but control is autonomic
- resources are heterogeneous
- security is pervasive but mostly hidden by gCube middleware

Application model is dominantly resource-oriented

VREs
- profiled as aggregation of resources dynamically deployed, executed, and terminated
- are interactive
- are built on shareable resources (including workflow) in their own right
- are published and discoverable
- may integrate storage elements sited at community sites
- may host applications that can also be executed by interfacing classic grid and cloud

Deployment model
- dynamic and autonomic

Development platform
- complete service programming abstraction

Interoperability: Assumptions

Consolidated facts:
- Very rich applications and data collections are currently maintained by a multitude of authoritative providers
- Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
- Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
- Several standards are adopted in the same domain

Societal observations

A rich variety of protocols, models, and formats
- creates barriers in the usage of resources
- dramatically delays new exploitation patterns

Technical observations
- Heterogeneity of protocols, models, and formats increases load
- Load increases failures




Interoperability: Landscape

- Resource Discovery
- Data Storage
- Data Discovery
- Data Access
- Data Process
- Security

- Unstructured Data: blob (binary) and textual files
- Structured Data: tabular, statistical, geospatial, temporal, and textual data
- Compound Data: data composed of unstructured and structured data entities

Interoperability: gCube Vision

gCube objectives:
- hide heterogeneity, i.e. abstract over differences in location, protocol, and model;
- embrace heterogeneity, i.e. allow for multiple locations, protocols, and models;

Technical goals
- no bottlenecks: scale no less than the interfaced resources
- no outages: keep failures partial and temporary
- autonomicity: system reacts and recovers


Hiding Heterogeneity


- Heterogeneous resources are virtually accessible in a common ecosystem of resources, despite their locations, technologies, and protocols
- Different communities have access to different views, according to the conditions under which the sharing can occur
- Each community can define its own VRE, for a limited timeframe and at no cost for the providers of the resource
- Several VREs can coexist without interfering with each other, even when competing for the same resources

Embracing Heterogeneity

Approaches and solutions to achieve interoperability:

Blackboard-based
- asynchronous communication between components in a system
- one protocol to R/W and one language to specify messages

Wrapper/Mediator-based
- translates one interface for a component into a compatible interface

Proxy-based
- exposes the same interface but allows additional operations over received calls

Adaptor-based
- provides a unified interface to a set of other component interfaces and encapsulates how this set of objects interacts

Broker-based
- specialises an Adaptor by coordinating communication
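
Two of the approaches above, the wrapper and the proxy, differ in a way that a short sketch makes visible. All class and method names here are hypothetical and unrelated to gCube components: the wrapper changes the interface of a component, while the proxy keeps the interface and adds behaviour around each call.

```python
class LegacyCatalogue:
    """Stand-in for an external component with its own interface (hypothetical)."""

    def fetch_record(self, record_id):
        return {"id": record_id, "title": f"Record {record_id}"}


class CatalogueWrapper:
    """Wrapper/mediator: translates the legacy interface into get(key)."""

    def __init__(self, legacy):
        self._legacy = legacy

    def get(self, key):
        return self._legacy.fetch_record(key)


class CountingProxy:
    """Proxy: exposes the same get(key) interface and adds call counting."""

    def __init__(self, target):
        self._target = target
        self.calls = 0

    def get(self, key):
        self.calls += 1  # the additional operation performed on each received call
        return self._target.get(key)


source = CountingProxy(CatalogueWrapper(LegacyCatalogue()))
print(source.get(7), source.calls)
```

Because the proxy and the wrapper share the `get(key)` interface, they can be stacked in any order without the caller noticing.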


Interoperability Approaches: Resource Discovery

Each resource is represented by a profile (metadata) characterising:
- the interface
- the state
- the list of dependencies
- the run-time status
- the policies
- the configuration
- the pending tasks to execute

A resource profile
- is published by the resource owner
- is discovered by the resource consumers asynchronously through a common resource-independent protocol

gCube offers a distributed and scalable Information System (blackboard) to store, discover, and access resource profiles
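
The publish/discover cycle over resource profiles can be pictured with a toy, in-memory blackboard. This is only a sketch under stated assumptions: the real Information System is distributed, and the class name, method names, and profile fields below are invented for the example.

```python
class InformationSystem:
    """Hypothetical in-memory blackboard holding resource profiles."""

    def __init__(self):
        self._profiles = {}

    def publish(self, resource_id, profile):
        """Called by the resource owner; stores a copy of the profile."""
        self._profiles[resource_id] = dict(profile)

    def discover(self, **criteria):
        """Called by consumers; returns ids whose profile matches every criterion."""
        return sorted(
            rid for rid, profile in self._profiles.items()
            if all(profile.get(k) == v for k, v in criteria.items())
        )


board = InformationSystem()
board.publish("idx-1", {"type": "index", "status": "ready"})
board.publish("idx-2", {"type": "index", "status": "pending"})
board.publish("store-1", {"type": "storage", "status": "ready"})
print(board.discover(type="index", status="ready"))
```

Owners and consumers never talk to each other directly: both use the same two operations regardless of the kind of resource, which is the point of a resource-independent protocol.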

Interoperability Approaches: Content Interoperability

gCube Open Content Management Architecture (OCMA)

Assumption
- data stored in different storage back-ends
- diverse locations, models, access types
- few common primitives: documents, collections, repositories

gCube allows users to
- reach content that lies outside the system
- expose content (reachable from) inside the system
- perform coarse-grained as well as fine-grained retrieval, update, and addressing

Runtime scalability
- autonomic read-only state replication
- maximize throughput, minimize response time: discovery-time load balancing
- reduce latencies

Software
- plugin-based architecture to reduce development costs
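
A plugin-based design over the three common primitives (documents, collections, repositories) might look like the sketch below. The interface is an assumption made for illustration and is not the OCMA API; each back-end would supply its own plugin behind the same two methods.

```python
from abc import ABC, abstractmethod


class RepositoryPlugin(ABC):
    """Hypothetical plugin contract: one implementation per storage back-end."""

    @abstractmethod
    def collections(self):
        """Return the names of the collections in this repository."""

    @abstractmethod
    def document(self, collection, doc_id):
        """Return one document from a collection."""


class InMemoryRepository(RepositoryPlugin):
    """Toy back-end used only to exercise the contract."""

    def __init__(self, data):
        self._data = data

    def collections(self):
        return sorted(self._data)

    def document(self, collection, doc_id):
        return self._data[collection][doc_id]


class ContentManager:
    """Routes requests to whichever plugin serves the named repository."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, plugin):
        self._plugins[name] = plugin

    def read(self, repository, collection, doc_id):
        return self._plugins[repository].document(collection, doc_id)
```

Supporting a new back-end then means writing one more plugin, leaving the callers of `ContentManager.read` untouched.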


Interoperability Approaches: Data Discovery and Access

gCube offers

Several index types
- Forward indexing, which supports ultra-fast lookups on tabular typed metadata;
- XML indexing, which supports semi-structured lookups on content metadata;
- Textual field indexing, which supports full-text and qualified lookups on (mainly) textual metadata;
- Metadata full-text indexing, which enables full-text lookups on metadata;
- Content full-text indexing, which enables full-text lookups on text extracted from content;
- Geospatial/temporal indexing, which enables geospatial proximity and coverage queries to be executed over geospatial/temporal metadata;
- Feature indexing, which enables high-dimension vector indexing for feature lookup (currently the feature is inactive);

Runtime scalability - WORM (Write Once Read Many) behavior pattern
- multiple readers (Lookups in gCube lingo)
- single updater for each index
- Autonomic sync under a dynamically expanding/shrinking
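
As a rough illustration of a lookup over typed tabular metadata under a write-once, read-many pattern, the sketch below builds a sorted structure once and then only reads from it. The class is hypothetical and is one plausible reading of "forward indexing", not the gCube implementation.

```python
import bisect


class ForwardIndex:
    """Sorted (value, doc_id) pairs: built once, then only read."""

    def __init__(self, pairs):
        # The single "write": sort all pairs at construction time.
        self._pairs = sorted(pairs)

    def lookup_range(self, low, high):
        """Return doc ids whose value lies in [low, high], in value order."""
        # (low,) sorts before every (low, doc_id), so this finds the first value >= low.
        start = bisect.bisect_left(self._pairs, (low,))
        result = []
        for value, doc_id in self._pairs[start:]:
            if value > high:
                break
            result.append(doc_id)
        return result


index = ForwardIndex([(2009, "b"), (2008, "a"), (2011, "d"), (2010, "c")])
print(index.lookup_range(2009, 2010))
```

Because the structure never changes after construction, any number of readers can share or replicate it without coordination, while a single updater would publish a fresh instance.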

Interoperability Approaches: Data Representation and Manipulation

gCube offers

Open transformation service framework
- Extensible with specific source-target mediators
- To use for metadata and data crosswalk transformations
- Tailored for statistical, geospatial, temporal, and textual data

Rich set of reference data
- Extensible with domain-specific reference data
- To reuse in services for data curation and harmonization

Support for geospatial services
- To capture, manage, analyze, and display all forms of data that can be geographically referenced

Integrated resources registry
- Format agnostic
- To support discovery and access
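
The idea of a transformation framework that is extended with source-target mediators can be sketched as a registry keyed by format pairs. The names and the format labels are hypothetical; the actual gCube transformation service is not described in enough detail here to reproduce it.

```python
class TransformationService:
    """Hypothetical registry of mediators keyed by (source, target) format."""

    def __init__(self):
        self._mediators = {}

    def register(self, source, target, mediator):
        """Extend the service with one more source-target mediator."""
        self._mediators[(source, target)] = mediator

    def transform(self, payload, source, target):
        """Apply the mediator registered for this format pair."""
        try:
            mediator = self._mediators[(source, target)]
        except KeyError:
            raise LookupError(f"no mediator from {source} to {target}") from None
        return mediator(payload)


service = TransformationService()
service.register(
    "csv-row", "record",
    lambda line: dict(zip(("species", "catch"), line.split(","))),
)
print(service.transform("cod,10", "csv-row", "record"))
```

New crosswalks are added by registering another mediator, without modifying the service itself.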


Interoperability Approaches: Process Execution [1/2]

gCube offers solutions to:
- Decouple the business domain and infrastructure-specific logic from the core “execution” functionality
- Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs
- Support most execution paradigms: batch, map-reduce, synchronous call
- Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop
- Control and monitor the execution of a processing flow
- Stage data among different storage providers
- Stream data among computation elements

Interoperability Approaches: Process Execution [2/2]

By using adaptors that
- operate on a specific third-party language and translate it into native constructs,
- allow for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures
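
An adaptor of this kind can be pictured as a translator from an external workflow notation into steps the engine runs natively. The pipe-separated notation, the step registry, and both function names below are invented for illustration; real adaptors would target actual third-party workflow languages.

```python
def adapt_pipeline(text, registry):
    """Translate 'a | b | c' (hypothetical third-party syntax) into native callables."""
    steps = []
    for name in (part.strip() for part in text.split("|")):
        if name not in registry:
            raise KeyError(f"unknown step: {name}")
        steps.append(registry[name])
    return steps


def run(steps, value):
    """Native execution: feed the output of each step into the next."""
    for step in steps:
        value = step(value)
    return value


# Registry of native steps; each could front a different technology.
registry = {"strip": str.strip, "upper": str.upper}
print(run(adapt_pipeline("strip | upper", registry), "  cod "))
```

Since the engine only sees native steps, each entry in the registry is free to delegate to a different platform.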

Conclusions

gCube System:
- Stable software that has been improved over the last 5 years
- Powerful ecosystem management system equipped with advanced infrastructure management functionality

gCube offers a variety of patterns, tools, and solutions
- to deliver interoperability solutions and interconnect heterogeneous digital content, heterogeneous repository systems, and heterogeneous computation platforms
- to decrease the cost of adoption
- to reduce the time to market of new ideas
- to deal with the plethora of standards


Supported Standards

WSRF Specifications
- WS-ResourceProperties (WSRF-RP)
- WS-ResourceLifetime (WSRF-RL)
- WS-ServiceGroup (WSRF-SG)
- WS-BaseFaults (WSRF-BF)

JSR
- 168: Simple Portlets
- 286: 168 update
- 160: JMX

WSN Specifications:
- WS-BaseNotification
- WS-Topics
- (WS-BrokeredNotification)

WS-* Standards
- SOAP
- WSDL
- WS-Addressing

ISO:
- ISO3166 countries
- ISO4217 currencies
- ISO1915 geo-location

X-*
- XML
- XSD
- XSL
- XSLT
- XPath
- XQuery

OGC
- Web Coverage Processing Service
- Web Coverage Service
- Web Feature Service
- Web Map Context
- Web Map Service
- Web Map Tile Service
- Web Processing Service
- Web Service Common

OGF Standard:
- Glue Schema (2)
- ……….

Comply with:
- OAI-PMH
- OAI-ORE



Find us

www.gcube-system.org

www.d4science.eu

Pasquale Pagano

D4Science-II Technical Director

pasquale.pagano@isti.cnr.it

Thank You For Your Attention