Accessing Biodiversity Resources in Computational Environments from Workflow Application




J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray,
N. J. Fiddian, T. Sutton, P. Brewer, C. Yesson, N. Caithness,
A. Culham, F. A. Bisby, M. Scoble, P. Williams and S. Bhagwat


WORKS 2006, Paris



Overview


The Biodiversity World (BDW) Project


The three exemplars chosen for BDW


BDW Architectural Components

a) Resource Wrappers

b) BiodiversityWorld-GRID Interface (BGI) Communications Layer

c) BDW Datatypes

d) The Metadata Repository (MTR)


Using BDW for bioclimatic modelling


Access to computational resources in BDW environment


Further Work & Conclusions




The BDW System


A framework for biodiversity problem-solving


Provides access to widely dispersed, disparate
data sources and analytical tools


Intended particularly for analysis and modelling of
biodiversity patterns


Provides access to resources originally
designed for use in isolation


Resources may be composed into complex
workflows

BDW Exemplars


A. Biodiversity richness analysis and
conservation evaluation

B. Bioclimatic modelling and global climate
change

C. Phylogenetic analysis and biogeography


Biodiversity Richness Analysis and
Conservation Evaluation

Aim:


analysis of biodiversity richness patterns for a
particular taxon (e.g. group of species) around the
world

The BDW System enables:


Taxonomic verification using the Species 2000
Catalogue of Life service


Composition of distribution datasets for the chosen
taxon from various sources around the world


Use of the WorldMap System to


visualise the distribution datasets, and


help identify priority areas for biodiversity conservation

Bioclimatic Modelling and Global Climate
Change

Aim:


Understand impact of global climate change on
distribution and diversity of plant & animal species


Identify climatic & ecological conditions under which
a single species lives, extrapolating from known
occurrences


Hence calculate a potentially wider set of areas
where the species might occur, or predict future
distribution under anticipated climatic conditions


A bioclimatic modelling workflow example follows
later

Phylogenetic Analysis and Biogeography

Aim:


Discover ancestral relationships between groups of
organisms using methods of
phylogenetic analysis


Estimate ages of species


Use estimates of historical climate to produce
plausible estimates of geographical distributions


Assess historical relationships between changing
climate and development of new species

The BDW System provides (1):


A flexible and extensible problem solving
environment (PSE)


Means of


bringing together heterogeneous, globally distributed,
biodiversity-related resources & analytical tools


assembling resources into workflows to perform complex
scientific analyses


Consistent mechanisms to achieve interoperability
of system components

The BDW System provides (2):


Uniform interfaces for heterogeneous
resources (resource wrappers)


Mechanism for data packaging & transfer


Compatibility with the Triana Workflow
System for assembling and executing
workflows


Web Services-based Grid middleware for
accessing remote computational resources

The BDW System Architecture

BDW architectural components (1)

Resource Wrappers




Provide consistent interface to local & remote resources, and
standard resource access/invocation mechanism


Insulate the core BDWorld System from resource
heterogeneity


Wrap various kinds of resources and analytical tools, and can
be deployed in a Grid/Web Services environment


Give consistent form to data retrieved by encapsulating them
into BDWorld data types


Resources wrapped include AVH, GBIF, OpenModeller, etc.
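
The wrapper idea can be illustrated with a minimal Python sketch. It is not the BDW implementation: the class names, the operation name getOccurrences and the result layout are all hypothetical, and an in-memory list stands in for a remote data source.

```python
from abc import ABC, abstractmethod


class ResourceWrapper(ABC):
    """Hypothetical uniform interface that every wrapped resource exposes."""

    @abstractmethod
    def operations(self):
        """Return the names of the operations this resource supports."""

    @abstractmethod
    def invoke(self, operation, **params):
        """Run one operation and return the result as a plain dict."""


class InMemoryOccurrenceWrapper(ResourceWrapper):
    """Toy wrapper around a local list of records (stands in for a remote source)."""

    def __init__(self, records):
        self._records = records

    def operations(self):
        return ["getOccurrences"]

    def invoke(self, operation, **params):
        if operation != "getOccurrences":
            raise ValueError(f"unsupported operation: {operation}")
        species = params["species"]
        points = [(r["lat"], r["lon"]) for r in self._records
                  if r["species"] == species]
        # The result has one consistent shape, whatever the source looks like.
        return {"datatype": "OccurrenceSet", "species": species, "points": points}


wrapper = InMemoryOccurrenceWrapper([
    {"species": "Trifolium patens", "lat": 45.1, "lon": 7.7},
    {"species": "Trifolium repens", "lat": 51.5, "lon": -0.1},
])
result = wrapper.invoke("getOccurrences", species="Trifolium patens")
```

Callers only ever see operations() and invoke(), so a second wrapper for a different source could be swapped in without changing the calling code.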


Resource Wrapper Architecture

BDW architectural components (2)

BDW-GRID Interface (BGI) Layer



Provides standard mechanisms for invoking operations on
heterogeneous resources


Acts as an integrated mechanism for accessing all resource
wrappers


Isolates resource wrapper implementation to a separate layer
to enable the use of web services/grid technologies
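
A single access point of this kind might look like the following Python sketch. The BGI class, its register and invoke methods, and the fake_catalogue handler are assumptions made for illustration; the slides do not show the real BGI API.

```python
class BGI:
    """Hypothetical single entry point that forwards calls to registered wrappers."""

    def __init__(self):
        self._wrappers = {}

    def register(self, resource_name, handler):
        # handler is any callable taking (operation, params) and returning a result.
        self._wrappers[resource_name] = handler

    def invoke(self, resource_name, operation, **params):
        if resource_name not in self._wrappers:
            raise KeyError(f"unknown resource: {resource_name}")
        return self._wrappers[resource_name](operation, params)


def fake_catalogue(operation, params):
    """Toy stand-in for a taxonomic name service; the name list is made up."""
    known = {"Trifolium patens", "Trifolium repens"}
    if operation != "verifyName":
        raise ValueError(f"unsupported operation: {operation}")
    return {"name": params["name"], "accepted": params["name"] in known}


bgi = BGI()
bgi.register("catalogue", fake_catalogue)
answer = bgi.invoke("catalogue", "verifyName", name="Trifolium patens")
```

Because the client talks only to the BGI object, the handler behind a resource name could later be replaced by a web service call without touching client code.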


BDW architectural components (3)

BDW Datatypes



Encapsulate different types of data and sub-datatypes for
transporting data between end points


Can be transformed into XML representations which can be
easily serialised


Flexible enough to encapsulate user-defined XML documents
or data in a string representation


Extensible; new datatypes can be incorporated
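
As a rough illustration of a datatype that can be turned into XML and back, here is a Python sketch using the standard library. The BDWData class and the bdwData/field element names are hypothetical; the actual BDW XML schema is not given in the slides.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class BDWData:
    """Hypothetical container: a type name plus named string values."""
    type_name: str
    fields: dict = field(default_factory=dict)

    def to_xml(self):
        root = ET.Element("bdwData", {"type": self.type_name})
        for key, value in self.fields.items():
            child = ET.SubElement(root, "field", {"name": key})
            child.text = str(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text):
        root = ET.fromstring(text)
        # Values come back as strings; any richer typing is left to the caller.
        fields = {c.get("name"): c.text for c in root.findall("field")}
        return cls(type_name=root.get("type"), fields=fields)


original = BDWData("SpeciesName", {"genus": "Trifolium", "epithet": "patens"})
xml_text = original.to_xml()
restored = BDWData.from_xml(xml_text)
```

A new datatype is then just a new type_name with its own set of fields, which is one simple way to read the "extensible" point above.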

BDW Datatypes



BDW architectural components (4)


BDW Metadata Repository


A specialised BDWorld resource



Provides information such as:


Available resources


Operations supported by each resource


Data types used by operations


Location of resource wrapper



Stores semantic information in the BDWorld ontology,
to answer questions such as


‘Which resources can provide me with species data?’


‘Which available operations can accept the outputs from a
specific operation?’
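
The two example questions can be sketched with a flat lookup table in Python. This is only a stand-in: the real repository stores semantic information in an ontology, and every resource, operation and datatype name below is invented for the example.

```python
# Hypothetical registry: (resource, operation) -> input and output datatypes.
OPERATIONS = {
    ("SpeciesCatalogue", "verifyName"):
        {"inputs": ["SpeciesName"], "output": "SpeciesName"},
    ("OccurrenceSource", "getOccurrences"):
        {"inputs": ["SpeciesName"], "output": "OccurrenceSet"},
    ("ClimateModeller", "buildModel"):
        {"inputs": ["OccurrenceSet", "ClimateLayers"], "output": "ClimateModel"},
    ("ClimateModeller", "projectModel"):
        {"inputs": ["ClimateModel", "ClimateLayers"], "output": "PredictionMap"},
}


def resources_providing(datatype):
    """Which resources have an operation that outputs this datatype?"""
    return sorted({res for (res, _), sig in OPERATIONS.items()
                   if sig["output"] == datatype})


def operations_accepting_output_of(resource, operation):
    """Which operations take the output type of the given operation as an input?"""
    produced = OPERATIONS[(resource, operation)]["output"]
    return sorted(key for key, sig in OPERATIONS.items()
                  if produced in sig["inputs"])


providers = resources_providing("OccurrenceSet")
followers = operations_accepting_output_of("OccurrenceSource", "getOccurrences")
```

A workflow editor could use the second query to suggest which task may legally be connected after the one the user has just placed.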


Bioclimatic Modelling (1)


By using the known localities of a species, a
climate preference profile is produced by
cross-referencing with present-day climate
data



This climate preference profile is then used to
locate other areas where such a climate
exists, indicating areas climatically suitable for
the species
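
The two steps above can be sketched in Python as a simple rectangular climate envelope: take the minimum and maximum of each climate variable at the known localities, then keep every cell that falls inside those ranges. This is a deliberately simplified technique chosen for illustration, not the modelling algorithm used in BDW, and the grid cells and climate values are made up.

```python
def build_profile(localities, climate):
    """Return per-variable (min, max) over the cells where the species is known."""
    samples = [climate[cell] for cell in localities if cell in climate]
    if not samples:
        raise ValueError("no known locality has climate data")
    variables = samples[0].keys()
    return {v: (min(s[v] for s in samples), max(s[v] for s in samples))
            for v in variables}


def suitable_cells(profile, climate):
    """Return the cells whose climate lies inside the profile for every variable."""
    return sorted(
        cell for cell, values in climate.items()
        if all(lo <= values[v] <= hi for v, (lo, hi) in profile.items())
    )


# Hypothetical present-day climate for four grid cells.
present = {
    "A": {"temp_c": 12.0, "rain_mm": 700},
    "B": {"temp_c": 14.0, "rain_mm": 800},
    "C": {"temp_c": 13.0, "rain_mm": 750},
    "D": {"temp_c": 25.0, "rain_mm": 200},
}
profile = build_profile(["A", "B"], present)   # species known from cells A and B
candidates = suitable_cells(profile, present)
```

Here cell C is flagged as climatically suitable although the species has not been recorded there, while the hot, dry cell D is excluded. Passing a different climate dictionary to suitable_cells would project the same profile onto another time period.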

Bioclimatic Modelling (2)


Using present-day climate:


assess areas under threat from invasive species,
or


those that may benefit from the introduction of a
new crop


Using climate predictions for the future:


assess possible effects of global climate change
on the distribution of study species


Using climate predictions for the past:


assess changes caused by natural factors in the
past


Bioclimatic Modelling Workflow performed by
Triana workflow package in BDW system

Example model output for the clover species Trifolium patens Schreber (a member of the bean
family). The map shows areas (shaded regions across Central and Eastern Europe, South America,
Asia and Australia) predicted to be suitable for the species in the 2050s using the bioclimatic
modelling algorithm GARP and the Hadley Centre climate model using the SRES A1F climate
scenario.

The Current BDW Architecture:

Enables execution of BDW workflow tasks on
remote nodes, but with limited scope.


- Does not give the user sufficient control and
flexibility.


- Does not provide the functionality of
distributing user jobs across several nodes.


- Depends on libraries at the client side.

The new BDW System architecture (1):



Provides the user with access to:

- Biodiversity resources.

- Computational resources.


Uses the existing mechanism of invoking
operations on remote resources via resource
wrapper web services.


Also uses Condor middleware for utilising
computational resources and distributing
workload across available nodes.



The new BDW System architecture (2):



Provides access to the Condor pool via the web
service interface.


Gives the user the flexibility to choose an available
computational node by using the Ganglia cluster
monitoring toolkit.


Enables matching of a workflow task with preferred
resource(s).
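
One way to picture node selection and task matching is the Python sketch below. It assumes per-node load figures are already available as a dictionary (for example, one-minute load averages of the kind a monitoring tool reports) and renders a Condor submit description that pins the job to the chosen machine. The function names, node names and file names are hypothetical, and the slides do not say how BDW builds its Condor jobs.

```python
def pick_node(load_by_node, preferred=None):
    """Return the preferred node if it is available, else the least-loaded one."""
    if not load_by_node:
        raise ValueError("no nodes available")
    if preferred in load_by_node:
        return preferred
    return min(load_by_node, key=load_by_node.get)


def submit_description(executable, arguments, node):
    """Render a Condor submit description restricted to one machine."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # The requirements expression limits matching to the chosen machine.
        f'requirements = (Machine == "{node}")',
        "output = task.out",
        "error = task.err",
        "log = task.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical load figures for two pool members.
loads = {"node1.example.org": 0.9, "node2.example.org": 0.2}
chosen = pick_node(loads)
description = submit_description("run_model.sh", "Trifolium_patens", chosen)
```

If the user names a preferred node that is present in the load table, pick_node returns it unchanged; otherwise the least-loaded node is used.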




Conclusions and Further Work


BDW brings together varied, distributed resources and
analytical tools so that biodiversity researchers can analyse
biodiversity patterns


Disparate resources can be accessed in the Web
Service-enabled BDW PSE.


The BDW PSE has uniform access to heterogeneous
resources


BDW allows linking of tools and resources in a workflow to
automate different activities of an experiment


Three current exemplar study areas


The new BDW architecture also provides access to
computational resources.


Security


Shibboleth/chroot



Acknowledgements


BDW team


Species 2000


OpenModeller Community (including CRIA)


BBSRC