





Dissemination and Exploitation of Grids in Earth Science

DEGREE-IST-2005-034619

PUBLIC

DEGREE

SURVEY OF EXISTING DATA TECHNOLOGIES IN EARTH SCIENCE AND DATA USAGE POLICIES



DELIVERABLE: D2.1

Document identifier: DEGREE_D2.1.doc
Date: 18/03/2013
Activity: WP2:
Lead Partner: FhG/SCAI
Authors: Horst Schwichtenberg, Ute Karabek, Monique Petitdidier, Marek Ciglan, Andrey Poliakov, Gerald Vetois, Jisamma Kallumadikal
Document status: Final Deliverable
Document link:

Abstract: This document contains the results of Task 1 and 2 of WP2. The task is to gather and analyze existing data technologies and data usage policies in ES.







DISSEMINATION AND EXPLOITATION OF GRIDS IN EARTH SCIENCE
SURVEY OF EXISTING DATA TECHNOLOGIES IN EARTH SCIENCE AND DATA USAGE POLICIES
Doc. Identifier: towerdevelopment_2fab9bbf-4bd2-4139-9ddd-bb2f4593d871.doc.doc
Date: 3/18/2013
DEGREE-IST-2005-034619
PUBLIC


Delivery Slip

 | Name | Partner/Activity | Date | Signature
From | Ute Karabek, Horst Schwichtenberg, Jisamma Kallumadikal, Marek Ciglan, Monique Petitdidier, Cathy Boonne, Mathieu Lonjaret, Andrey Poliakov, Vetois Gerald | FhG SCAI, UI SAV, CNRS, GC RAS, CGG | 3/18/2013 |
Reviewed by | Wim Som de Cerff | KNMI | 19.6.2007 |
Approved by | Ladislav Hluchy | UISAV/PM | 22.6.2007 |




Document Log

Issue | Date | Comment | Author/Partner
1 | 01.02.2007 | Data Policy | C. Boonne, M. Petitdidier
2 | 12.03.2007 | Update after 1. internal review | H. Schwichtenberg
3 | 16.03.2007 | Integration of WP1 questionnaires | M. Ciglan (IISAS)
4 | 20.03.2007 | Merge versions and update CGG part. | G. Vetois (CGG)
5 | 18.05.2007 | Revised version | M. Petitdidier
6 | 31.05.2007 | Revised Version | H. Schwichtenberg, J. Kallumadikal
7 | 20.06.2007 | Final Version | H. Schwichtenberg, J. Kallumadikal














Document Change Record

Issue | Item | Reason for Change







Table of contents

1 INTRODUCTION
1.1 PURPOSE
1.2 DOCUMENT ORGANIZATION
1.3 REFERENCES
1.4 DOCUMENT AMENDMENT PROCEDURE
1.5 TERMINOLOGY
2 EXECUTIVE SUMMARY
3 APPLICATIONS AND PROJECTS
4 TYPICAL SCENARIOS AND USE CASES
5 DATA PROVISION AND DATA FLOW
5.1 DATA PROVISION
5.2 DATA FLOW DURING WORKFLOW
5.3 DATA FLOW DURING COMPUTATION
5.4 RELATION TO GRID ENVIRONMENTS
6 DATA ACCESS
6.1 DATA ORGANIZATION
6.2 INFORMATION SYSTEMS
6.3 DATA FORMATS
6.4 DATA FILE SIZES
6.5 NETWORKED DATA PROTOCOLS
6.6 DATA ANALYSIS AND VISUALIZATION CLIENTS
7 DATA POLICIES
8 AUTHORIZATION, AUTHENTICATION, ACCOUNTING
9 CONCLUSION AND FUTURE PLANS
10 QUESTIONNAIRE






1 INTRODUCTION

1.1 PURPOSE

The purpose of this document is to give an overview of the current status of the data technologies and data usage policies used in Earth Science. It was developed on the basis of the answers to a questionnaire that was sent to DEGREE partners and several members of the Earth Science community. The document gives an impression of today's typical data management in Earth Science applications. It describes the ES data management technologies in use, which will serve as the starting point for Task 2.3.

The questionnaire was proposed in the first project month, in parallel to the activity of Work Package 1. Later, the questionnaires of Work Packages 1 and 2 were merged, and the results relating to data management are included in this document.



1.2 DOCUMENT ORGANIZATION

The document is organized as follows. This section continues with a list of the references and the terminology used in this deliverable. It is followed by an executive summary in which we outline first results about data technologies and data usage policies. The answers to the questionnaire focus on specific applications; Section 3 contains a brief description of these applications. Section 4 lists typical scenarios and use cases in ES. Section 5 deals with the origin of data and their flow during computation and workflow. The ways to access the data, described in Section 6, lead to the policies for data usage. The authentication, authorization and accounting mechanisms connected to the data policy are described in Sections 7 and 8. Finally, the conclusion and future plans for Earth Science are pointed out, followed by the concluding remarks.

1.3 REFERENCES

Table 1: Table of references

No. | Title | Web address (if applicable)
R1 | Federation of Digital Seismograph Network | http://www.fdsn.org
R2 | Earthquake Research Institute, in Japan | http://www.eri.u-tokyo.ac.jp/
R3 | DEGREE D1.1 ES family of applications and their Grid requirements | http://www.eu-degree.eu/DEGREE/internal-section/wp1/DEGREE-WP1-D1-1.3.pdf/view
R4 | K-Wf Grid | http://www.kwfgrid.net/
R5 | MEDIgRID | http://www.eu-medigrid.org/
R5 | C3-GRID | http://www.c3grid.de/
R6 | Open Geospatial Consortium | www.openspatial.com
R7 | Semantic OGSA | http://www.semanticgrid.org/GGF/ggf16/papers/OntoGrid-GGF16-SemGrid-Wrkshp.pdf




R8 | Computational Seismology | http://www.geodynamics.org/cig/software/packages/seismo/
R9 | ADAGUC | http://adaguc.knmi.nl
R10 | GeoTIFF | http://www.remotesensing.org/geotiff/geotiff.html
R11 | CEOS | http://www.ifremer.fr/cersat/en/data/manuals/ceos.htm


1.4 DOCUMENT AMENDMENT PROCEDURE

Comments concerning this document should be sent directly to the authors during the review period.

1.5 TERMINOLOGY

This subsection provides the definitions of terms, acronyms, and abbreviations required to properly interpret this document.


Definitions

Storage Element: Interface to physical data repositories in grid environments, e.g. EGEE

Worker Node: Compute node in a cluster

User Interface: The access point to the EGEE Grid is the User Interface (UI). This can be any machine where users have a personal account and where their user certificate is installed.

Replica Location Service: A service in the Replica Management System

Metadata: Additional information about ES product data and geolocation: quality information, quick browse image, user help, keywords, processing parameters (e.g. related to image calibration), algorithm information (e.g. programs, interactive services, documentation)

Virtual Organisation (VO): A VO is an entity which typically corresponds to a particular organization or group of people in the real world

Announcement of Opportunity (AO): These are regularly issued by ESA. Interested scientists respond to these AO with a written proposal that is then evaluated by ESA. If accepted, scientists have access to data/resources/facilities as described in the related AO.







Glossary and abbreviations

AAA: Authorization, Authentication, Accounting
ACL: Access Control List
BUFR: Binary Universal Form for the Representation of meteorological Data
C3: Collaborative Climate Community
CDF: Common Data Format
CF: Climate Forecast
CGG: Compagnie Générale de Géophysique, CGG-Veritas
CMT: Centroid and Seismic Moment Tensor
CSDGM: Content Standard for Digital Geospatial Metadata
DG: Directorate-General
DIF: Directory Interchange Format
EC: European Commission
ECMWF: European Centre for Medium-Range Weather Forecasts
EGEE: Enabling Grids for E-sciencE
EO: Earth Observation
ES: Earth Science
ESSE: Environmental Scenario Search Engine
GOME: Global Ozone Monitoring Experiment
G-POD: Grid Processing on Demand
GRIB: Gridded Binary
GRIMI: Grid MIPAS
GTS: Global Telecommunication System
HDF: Hierarchical Data Format
HIRLAM: High Resolution Limited Area Model
K-Wf: Knowledge based Work Flow
MARS: Meteorological Archiving System
MEDIgRID: Mediterranean grid of Multi-Risk Data and Models
MERIS: Medium Resolution Imaging Spectrometer
MIMOSA: Modélisation Isentrope du transport Méso-échelle de l'Ozone Stratosphérique par Advection (French)
MIPAS: Michelson Interferometer for Passive Atmospheric Sounding
MPI: Message Passing Interface
NCEP: National Centers for Environmental Prediction
NL-SCIA-DC: Netherlands Sciamachy Data Center
NetCDF: Network Common Data Form
NOAA: National Oceanic and Atmospheric Administration
NWP: Numerical Weather Prediction
OAI: Open Archive Initiative




OGC: Open Geospatial Consortium
OGSA-DAI: Open Grid Service Architecture - Data Access and Integration
OpenDAP: Open Data Access Protocol
RLS: Replica Location Service
SAR: Synthetic Aperture Radar
SCIAMACHY: Scanning Imaging Absorption Spectrometer for Atmospheric Chartography
SCIAPLUS: Project name for experimenting the coupling of NL-SCIA-DC and G-POD infrastructure
SE: Storage Element
SEED: Standard for Exchange of Earthquake Data
SGBD: Sistema de Gestión de Bases de Datos (Spanish: Database Management System)
SPECFEM3d: Free Seismic Simulation Code, see R8
SPIDR: Space Physics Interactive Data Resource
UI: User Interface
USGS: United States Geological Survey
VOMS: Virtual Organization Management System
WMO: World Meteorological Organization
WN: Worker Node
WSC: Web Service Center
WSRF: Web Service Resource Framework




2 EXECUTIVE SUMMARY

This report provides an analysis of the current data management technologies used in the ES community. The method of analysis includes a review of the different ES related applications, their use cases, data provision and data flow, data access, authorisation, authentication, accounting and data policies. The conclusions are based on an evaluation of the answers to the questionnaire, which focussed mainly on the current data technologies and data usage policies used by the ES community. The Earth Science community has a large variety of applications and it is not possible to cover every domain. Therefore we selected the most important and widely used applications and explained them in detail.


First, a brief description of each of the applications is given. The applications vary in their nature and cover most of the application areas related to ES. A few of the applications are new to the grid environment, whereas the others are already related to the Grid. This is followed by a definition of the use cases of the applications, which gives a clear picture of the flow of data in these applications. The flow mainly starts by gathering the input, processing it and registering the output. This procedure is similar in most cases, whereas the internal processing varies depending on the application.


The requirements of the Earth Science community address different developers. These include, for example, data management middleware developers, developers of ontologies and metadata models, database developers, etc. They are dealing with a large amount of data. Hence the provision of this data and its flow during workflow and computation in the different applications are defined. This includes pre-processing of the data, their storage and post-processing. Data flow during computation requires the transfer of data before the computation starts.


ES deals with a huge amount of data, which is scattered all around the globe and needs to be organized. Flat files or databases can be used to organize this data. The selection of the technology depends on the requirements of the user. Data retrieval is an important issue: a metadata catalog needs to be defined and filled automatically after result data is stored in a simulation step of a complex scenario. Middleware developers are asked to provide services to handle those scenarios.
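
The idea of a catalog that is filled automatically when a simulation step stores its result can be illustrated with a minimal sketch. It is not taken from any of the surveyed projects: the `MetadataCatalog` class, the `run_step` helper and the metadata fields are hypothetical names chosen for illustration, and an in-memory dictionary stands in for a real catalog service.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class MetadataCatalog:
    """Hypothetical in-memory stand-in for a metadata catalog service."""
    entries: dict = field(default_factory=dict)

    def register(self, path: Path, **metadata) -> None:
        # Key each record by file location so the data can be found again.
        self.entries[str(path)] = {"size_bytes": path.stat().st_size, **metadata}

    def search(self, **criteria) -> list:
        # Return the locations whose metadata match every given criterion.
        return [
            location
            for location, meta in self.entries.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


def run_step(name: str, payload: str, workdir: Path, catalog: MetadataCatalog) -> Path:
    """Store the result of one simulation step and register it immediately."""
    result = workdir / f"{name}.out"
    result.write_text(payload)
    catalog.register(result, step=name, format="text")
    return result


if __name__ == "__main__":
    catalog = MetadataCatalog()
    with tempfile.TemporaryDirectory() as tmp:
        run_step("meteorology", "forecast data", Path(tmp), catalog)
        run_step("hydrology", "runoff data", Path(tmp), catalog)
        print(catalog.search(step="hydrology"))
```

Because registration happens inside `run_step`, a later step can discover its input by metadata rather than by a file name agreed in advance.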



A brief description of the different information systems is given, which focuses on the organization of the different ES related observations, analyses and outputs. This section covers the Data Information System and the Geographic Information System, which gives a detailed view on the storage, analysis and visualization of geographical information. As ES deals with different data formats, it might be useful to have interoperability between the different data formats and a common metadata format. Although this will be very complex, as every domain within Earth Science has one or more data format standards, it is worthwhile investigating. The different data formats like GeoTIFF, FITS, ESRI Shapefile, GRIB etc. are described briefly.


Concerning the amount of data, no common pattern can be seen. The size of the data and the number of files depend on the application. A further important point is the development of interfaces, platforms and layers for distributed storage and access of data using web service standards. This does not seem to be established for the complete Earth Science community; many different



solutions coexist. For comparison, we mention that such approaches are established in the geo community (a sub-domain of Earth Science). The OGC will establish standards by means of a defined OpenGIS Specification. Nowadays a large variety of Web Services is available for data organization, data exchange and data search. OpenDAP and the Open Archive Initiative (OAI) are other examples of standards used in the climate sub-domain of Earth Science. As ES deals with a huge amount of scientific data, this data can be analysed more easily using different visualization clients. A few visualization packages relevant to this community are described.


AAA plays a significant role for providers of ES data. This security mechanism has to be provided especially for workflows. This is followed by a conclusion, which points out the requirements and future plans.





3 APPLICATIONS AND PROJECTS

A brief description of all the applications and the partners involved in each project is given below. Up to now, the applications mentioned here are mainly those that resulted from the WP2 questionnaire. The applications gathered by WP1 have also been added. New applications will be added and analyzed when they provide information to the DEGREE project.


Application/project: Hydrogeological modeling at the University of Neuchatel
Partners: University of Neuchatel
Project description: Description of the current practice of hydrogeological modeling at the University of Neuchatel.

Application/project: Flood Application in K-Wf Grid
Partners: Institute of Informatics, Slovak Academy of Sciences (II SAS)
Project description: The Flood Forecasting Simulation Cascade (FFSC) is a hydro-meteorological application, which tries to predict oncoming floods based on series of simulations of meteorological, hydrological and hydraulic conditions in the target area. The application consists of several simulation steps (with more choices of used models for each step), attached pre-processing and post-processing and visualization tools.
All the required simulation models are available and already tested. Input data for the application's first step, the meteorological forecast, is also available on a regular basis from the Slovak Hydrometeorological Institute. All other input data is either already available to II SAS or is computed from existing input data.
The K-Wf Grid system optimizes the composition and execution of complex workflows in a Grid web service-based computing environment by reusing the knowledge gathered through the monitoring and performance analysis of previously executed applications: the semantic knowledge base offers an innovative approach in managing information and supporting UI, while the modular architecture allows an effective customization of the system to support workflow based applications on different scenarios.




Application/project: MEDIgRID, Risk assessment of natural disaster
Partners:
- Flood application: Institute of Informatics, Slovak Academy of Sciences (II SAS)
- Forest fire simulations: Associaçao para o Desenvolvimento da Aerodinâmica (ADAI), Entente Interdépartementale en vue de la Protection de la Foret et de l'Environnement contre l'Incendie (EIPFEI/CEREN), Tecnoma SA, Algosystems SA
- Landslides and erosion simulations: School of Civil Engineering and Geosciences, University of Newcastle Upon Tyne
- Vegetation regrowth models: Algosystems SA
Project description: "Several hazards are triggered by the occurrence of the fire in the forest in case extreme post-fire weather phenomena should occur in areas dominated by vulnerable soil and appropriate geomorphological conditions.
Flash floods, debris flow, landslides, soil erosion and deforestation are natural hazards directly associated with the occurrence and the behavior of forest fires.
MEDIGRID is a R&D project of the DG Research of the EC that addresses the challenge of providing a modular decision support framework for assessing multiple hazards, based on Grid-enabled applications and distributed data architecture."

Application/project: Thematic Campaigns
Project description: In meteorological, atmospheric and oceanic fields, campaigns of observations coupled with modelling and simulations are organised on a national or international basis to better understand a phenomenon. Data management that takes the various data policies into account is an important point in making the raw and/or pre-processed data available to the relevant campaign partners.

Application/project: CMT, the Centroid, Seismic Moment Tensor application
Partners: Institut de Physique du Globe de Paris (IPGP)
Project description: CMT is a procedure to obtain the Centroid and the Seismic Moment Tensor of a given earthquake. As soon as an earthquake occurs, time and location (geographical coordinates) are made available by the United States Geological Survey (USGS), which is responsible for monitoring, reporting, and researching earthquakes and earthquake hazards. The data come from Geoscope, a network of seismometers monitoring seismic activity from all around the globe. Then the information and the Geoscope data are used to run the code.

Application/project: Risk management of high water in Dresden, Germany
Partners: Dresdner Grundwasserforschungszentrum (DGFZ), Fraunhofer Institute für Algorithmen und Wissenschaftliches Rechnen (SCAI), Technische Universität Dresden (TUD)
Project description: This is a local project based on the topological data of one town. Factors include:
- Water flow on surface,
- Flow of ground water simulation and
- Water flow in canals.




Application/project: C3 Grid
Partners: Large consortium of German research institutes (Alfred-Wegener-Institute for Polar and Marine Research, Bremerhaven; Zuse Institute Berlin (ZIB); University Dortmund; Max Planck Institute for Meteorology; University of Cologne; German Weather Service (DWD); German Aerospace Center; Potsdam Institute for Climate Impact Research (PIK))
Project description: C3 Grid stands for Collaborative Climate Community Data and Processing Grid. It will support the workflow of Earth system and climate research. The Grid system will be built out of existing Grid technology and use and develop next generation Grid technology especially in the area of constant description of data, new storage mechanisms, integrated analysis tools, replication services and new scheduling mechanisms.

Application/project: Mosaic
Partners: International Institute for Applied Systems Analysis (IIASA)
Project description: Prediction of the global forest development and the amount of biomass for bio-energy production.
Mosaic calculates the annual biomass increment in forests and the cash income and expenditure. It provides global maps of forest biomass, amount of biomass for energy, afforested or deforested areas and income from forestry under different prices and development scenarios.

Application/project: GOME
Partners: European Space Agency (ESA, Frascati, Italy), the Royal Netherlands Meteorological Institute (KNMI, de Bilt, Holland) and Institute Pierre Simon Laplace (IPSL, Paris, France)
Project description: The selected use case here involves processing of the 7 years of global atmospheric ozone observations made by the GOME instrument flying on board of the European ERS satellite and validating the ozone profiles so obtained with lidar data.

Application/project: Geocluster on the Grid, CGG
Partners: CGGVeritas
Project description: Geocluster on Grid offers geoscientists a comprehensive set of processing modules and applications to deal with all aspects of seismic processing. The software is a single portable source code for all supported platforms, guaranteeing consistency across workstations, supercomputers and grids.
The processing modules are designed to execute single or complex functions on a seismic data flow (batch mode). Some modules have parallelized and optimized versions designed for certain types of multi-processor computers, clusters or grids.
A set of graphic interactive applications provides users with functions such as flow design, data analysis, parameter definition, event picking, velocity interpretation, refraction statics, attribute quality control, and so on.




Application/project: Environmental Scenario Search Engine (ESSE)
Partners: Geophysical Center, Russian Academy of Sciences, Moscow, Russia; Moscow State University, Russia; National Geophysical Data Center, NOAA, Boulder, CO, USA; Microsoft Research, Cambridge, UK
Project description: Grid framework and software toolbox for the parallel mining of very large distributed databases from multiple environmental domains.

Application/project: GRIMI-2 (GRIMI)
Partners: ESA Centre for Earth Observation (ESA/ESRIN)
Project description: GRIMI-2 is the scientific prototype of the operational Envisat/MIPAS level 2 processor.

Application/project: MERIS global mosaic (MERIS)
Partners: European Space Agency (ESA)
Project description: On-demand generation of true color Earth image mosaics from Envisat data.

Application/project: Monte Carlo Korba aquifer (KORBA)
Partners: Université de Neuchâtel
Project description: Monte Carlo simulations of the flow and the transport in the Korba aquifer, analysis of the uncertainty in the pumping rates.

Application/project: Modeling contamination migration in heterogeneous aquifers (CONTAMINATION)
Partners: Université de Neuchâtel
Project description: A set of software that is used to characterize the underground and predict the groundwater and contaminant movement in order to design a remediation scheme.

Application/project: SPECFEM3d
Partners: Institut de Physique du Globe de Paris
Project description: Earthquake simulations in complex three-dimensional geological structures.

Application/project: Space Physics Interactive Data Resource (SPIDR)
Partners: Geophysical Center, Russian Academy of Sciences; National Geophysical Data Center, NOAA, Colorado, USA
Project description: Distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.

Application/project: Stratospheric Ozone in polar regions (MIMOSA)
Partners: Institut Pierre Simon Laplace
Project description: Modeling and validation of the ozone concentration in the stratospheric polar regions with the MIMOSA simulation.

Application/project: High Resolution Limited Area Model (HIRLAM)
Partners: The Royal Netherlands Meteorological Institute (KNMI, de Bilt, Holland)
Project description: The aim of the Hirlam programme is to develop and maintain a numerical short-range weather forecasting system for operational use by the participating institutes.




Application/project: NWP data processing
Partners: The Royal Netherlands Meteorological Institute (KNMI, de Bilt, Holland)
Project description: In order to get an overview and a better insight in the workflow of the various ES applications, it should be analyzed how the data is processed in the different steps of a typical workflow.

Application/project: SCIAMACHY product exploitation and user service (SCIAPLUS)
Partners: The Royal Netherlands Meteorological Institute (KNMI, de Bilt, Holland)
Project description: To facilitate the generation of consistent data sets of SCIAMACHY products, sometimes in synergy with other instruments, and to make them available.







4

TYPICAL SCENARIOS AN
D USE CASES


A brief description
of
scenarios and use cases with respect to the data flow is given below. The

applications are divided in

three families:

Family 1:
Simple applications

Family 2:

Complex applications

Family 3:

Com
plex workflow applications


Family 1:
S
imple applications

By "simple" it is intended that these applications do not require any complex interaction with the user,
nor do they follow a complicated workflow. They usually are (massive) batch applications, whi
ch can
be easily executed by the means of a script.


Family 2: Complex applications

This family groups all the applications that have a special need, which makes them more complex and more difficult to port to a Grid. This complexity may result from various requirements, such as:

- Processing requiring user interaction
- A complicated data access
- A complicated authentication or security mechanism
- Applications running in parallel as local memory processes, based for example on the Message Passing Interface (MPI)
- Real-time or operational requirements, or license needs


Family 3: Complex workflow applications

Specific to this family is the complex orchestration of the different parts of the application, the workflow. Whenever an application does not run in a linear or straightforward way, or performs different tasks depending on some conditions, the complexity increases significantly because one has to take into account all the possible scenarios. Applications of this kind are usually also complex in the Family 2 sense; however, the applications categorized in Family 3 have distinct requirements.


Another characterization is whether the applications are

- already on the grid (G), i.e. their functions are completely or partly deployed in a grid environment, or
- not yet gridified (N).




Application (Family, Grid), followed by its scenarios and/or use cases and/or typical data management operations and/or typical workflow:

Hydrogeological modeling (Family 2, G)
- Data collection
- Preprocessing

Flood Application in K-Wf Grid (Family 2, G)
- Precondition: all necessary data is stored on the grid, data are registered in RLS, metadata stored in ontological store
- Data discovery in ontological store
- Retrieving data location
- Copy files to local storage
- Store result files and metadata

MEDIgRID - Risk assessment of natural disaster (Family 1, G)
- Data discovery
- Retrieving of data location
- Transferring the data from data location
- Registering of newly created data sets in metadata catalogue and file catalogue

Thematic Campaigns (Family 1, N)
- Transfer data from instruments to CD/DVD or directly to storage
- Data extraction
- Data quality check
- Adding of additional parameters
- Store data at data centre

CMT - The Centroid, Seismic Moment Tensor application (Family 1, N)
- Data collection via email and web page
- Preprocessing including user interaction
- Copy files to grid user interface
- Results stored in archive files on the grid

Risk management of high water in Dresden, Germany (Family 2, N)
- Collect data from governmental organization and additional measurements
- Extraction of simulation data
- Store extracted data on local storage
- Store results on local storage

C3-Grid (Family 1, G)
- Data search
- User specific data preparation
- Comparison of data of different types and data providers
- Visualization

Mosaic (Family 1, N)
- Precondition: input data on local storage
- Store results on local storage

EO Data Grid Application (Family 1, G)
- Transfer input data to the grid
- Store results, extract and store metadata
- Retrieve result data for validation
- Visualization

CGG GeoCluster on the Grid (Family 2, G)
- Transfer data from tape drives to the grid, and the reverse way
- Multiple transfers between UI, SE and WNs
- Store result files and metadata on storage place or Oracle database
- Data selection and transformation
- Visualization

Environmental Scenario Search Engine (ESSE) (Family 3, G)
- User selects environmental data source, "probes" (representing spatial locations of interest, e.g. Moscow), environmental parameters and sets the fuzzy constraints on them
- Fuzzy search web service collects data from the data source for the selected parameters and time interval, performs the data mining, and returns to the ESSE web application a ranked list of candidate events with links to the event visualization and data export pages
- Visualization of interesting events and retrieval of the event-related subset of the data for download

MERIS global mosaic (MERIS) (Family 3, G)
- Input products selection via the Web Portal
- Product retrieving
- Binning and aggregation of products
- Mosaicking
- Conversion to image formats and publishing

Monte Carlo Korba aquifer (KORBA) (Family 3, G)
- Data gathering
- Homogenization of the data in a spatial database (ArcGIS)
- Simulation (statistical analysis, conceptualization of the groundwater, calibration of the 3D groundwater model)
- Set up of the Monte Carlo simulations
- Geostatistical analysis
- Write scripts (shell, Matlab) in order to format (flat files: ASCII, TXT...) the pumping rates simulations and other inputs of the groundwater model
- Run CODESA-3D on 100 grid nodes to solve flow and transport using one of the 100 simulations available for the pumping rate
- Post-process the outputs of the model (statistical analysis, maps of probability drawing...)

Modeling contamination migration in heterogeneous aquifers (CONTAMINATION) (Family 3, G)
- Data preconditioning
- Shell script controlled jobs workflow
- Results extraction
- Validation

SPECFEM3d (Family 2, G)
- Edit input files
- Compile program (for memory optimization)
- Run a binary to generate input files
- Upload outputs on one (or several) Storage Elements
- Fetch generated files for second stage of computation (2 MPI binaries)
- Results uploaded to Storage Element

Space Physics Interactive Data Resource (SPIDR) (Family 3, G)
- Execute advanced search environmental archive queries based not only on metadata but on the included data content
- Conduct content-based query and data retrieval from virtual observatories
- Generate on-the-fly products interactively using existing data and metadata, as well as conducting detailed analysis
- Expand their ability to use and incorporate data from disciplines other than their own

Stratospheric Ozone in polar regions (MIMOSA) (Family 2, G)
- Gather and preprocess data (format conversion)
- Execute MIMOSA-CHIM application
- Run results through Matlab (ozone maps)
- Validate the results in comparison to satellite data

High Resolution Limited Area Model (HIRLAM) (Family 3, G)
- Data preprocessing
- Data assimilation
- Integration of model equations
- Horizontal and vertical interpolation of in- and output
- Outputs post processing

NWP data processing (Family 2, N)
- Meteorological observations are screened on validity
- Merged with a simulated atmosphere from physical model equations
- The results are archived
- Extract data from the Meteorological Archival and Retrieval System (MARS) at ECMWF
- Copy the data to local workstation by FTP
- Process the data and visualize the result

SCIAMACHY product exploitation and user service (SCIAPLUS) (Family 2, G)
- Pre-processing step, based on the date and time of the input orbit (SCIAMACHY level 1b or level 1c orbits as input, initialization parameters files)
- Computation
- Processing in chronological order (one SCIAMACHY measurement after the other)
- The retrieval per measurement can be a fit to one or more modeled spectra and can contain several iteration steps
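The two characterizations used in this table (family and grid status) can be captured in a small data structure, which makes it easy to query the survey results. The following is a minimal sketch only: the class name, field names and helper functions are our own assumptions for illustration, and only a few rows of the table are copied in.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    family: int    # 1 = simple, 2 = complex, 3 = complex workflow
    on_grid: bool  # True for "G", False for "N" (not yet gridified)


# A few rows copied from the table; the list is deliberately incomplete.
APPLICATIONS = [
    Application("Hydrogeological modeling", 2, True),
    Application("Thematic Campaigns", 1, False),
    Application("CMT", 1, False),
    Application("C3-Grid", 1, True),
    Application("ESSE", 3, True),
    Application("HIRLAM", 3, True),
    Application("NWP data processing", 2, False),
]


def not_yet_gridified(apps):
    """Names of the applications marked "N" in the table."""
    return [a.name for a in apps if not a.on_grid]


def count_by_family(apps):
    """Number of applications per family."""
    return Counter(a.family for a in apps)
```

With the complete table entered, the same two helpers would give the number of applications per family and the list of candidates for gridification.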





5 DATA PROVISION AND DATA FLOW


In this section we describe the data provision and data flow for the earth science applications and
activities whose descriptions were collected in initial tasks of DEGREE WP2.


The process of generating results can be divided into four steps:

1. Data flow step: data collection
2. Data flow step: data search (metadata, ontologies), data preparation
3. Data flow step: storage operations before and after computing steps
4. Data flow step: data "exploitation/evaluation" (postprocessing)

5.1 DATA PROVISION


The nature of the data in the different steps of the data flow gives information about the storage types and access types needed by the ES community. In this section we describe data provision in ES applications and ES activities.


We distinguish between applications which are already running in a grid environment and applications which are not yet prepared for grid computing. Data provision and data preparation are sometimes part of the processing flow and sometimes they are not. Within this text we consider preprocessing (data preparation) as part of the processing flow where possible. Whether this is possible depends on the sources of the data and their location. We will see that the provision of the data cannot always be implemented as part of the processing flow.

Considering the origin of the input data we can distinguish between:

1. measurements, radar or satellite images,
2. scientific knowledge (to add information to raw data in the preprocessing step),
3. results or input data of a prior computation/simulation/job.


Measurements are usually stored at governmental or scientific sites. Data is provided via CD/DVD, databases and flat files on (computer) storage via direct access (login), grid middleware solutions or web services. They can be stored locally or on remote sites.

Data from origins 1 and 2 are treated separately because of their origin: they cannot be handled as directly as simulation data. A lot of research is currently under way on handling this kind of data, but the results are not yet widely used. Data from origin 3 may be integrated in a complex workflow. In general the data is not encrypted.


Depending on the scenario and the origin, the data is collected in different ways. Examples are given below.



In the CMT scenario the question is how to get the initial data when an earthquake occurs: The user receives an alert from the USGS when an earthquake occurs. He/she can then look for the available seismic stations, depending on the distance to the earthquake location and the time scale of the phenomenon pertinent for the algorithm. The user contacts the Geoscope data centre using a NetDC request (email) about the earthquake. The data centre creates and makes available the relevant data in the SEED file format, even if the data come from another network.



In the case of K-Wf Grid, for example, the necessary data is already stored on a storage element. The collection of data in a workflow is supported by an ontology description.

5.2 DATA FLOW DURING WORKFLOW


In order to get an overview and a better insight into the data flow of the various ES applications, we need to analyse how the data are processed in the different steps of a typical workflow. In this section we describe data flows in ES applications and ES activities.


Data search and preparation (Preprocessing)

In Earth Science a typical task in the preprocessing phase is the extraction of the data. Computations usually need just a certain time period and a certain location. The original data which suit these constraints must be found and extracted. As a second step the information is converted to a suitable format and coordinate system for the application. In some cases it is necessary that the user interacts during this process.


Preprocessing:
- Data search
- Data collection
- Extraction based on time and space constraints
- Format conversion
- Coordinate transformation
- Diagnosis of additional variables
- Mesh generation
- Check of the data quality
- Retrieving of relevant parameters
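The extraction based on time and space constraints, followed by a coordinate transformation, can be illustrated with a minimal sketch. It assumes that the observations are available as in-memory records with hypothetical fields `station`, `time`, `lat` and `lon`, and that longitudes arrive in the 0..360 convention; none of these names come from the surveyed applications. The bounding box test is a plain comparison and does not handle boxes that cross the date line.

```python
from datetime import datetime


def to_lon180(lon):
    """Map a longitude from the 0..360 convention to -180..180."""
    return ((lon + 180.0) % 360.0) - 180.0


def extract(records, t_start, t_end, lat_range, lon_range):
    """Keep records inside the time window and the latitude/longitude box.

    Longitudes are converted to -180..180 before the box test, and the
    converted value is stored in the returned copies.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    selected = []
    for r in records:
        lon = to_lon180(r["lon"])
        if (t_start <= r["time"] <= t_end
                and lat_min <= r["lat"] <= lat_max
                and lon_min <= lon <= lon_max):
            selected.append({**r, "lon": lon})
    return selected


# Hypothetical observations (longitudes in the 0..360 convention).
records = [
    {"station": "A", "time": datetime(2007, 1, 5), "lat": 50.0, "lon": 350.0},
    {"station": "B", "time": datetime(2007, 1, 6), "lat": 55.5, "lon": 37.5},
    {"station": "C", "time": datetime(2007, 3, 1), "lat": 50.0, "lon": 2.5},
]

subset = extract(
    records,
    datetime(2007, 1, 1), datetime(2007, 1, 31),
    lat_range=(40.0, 60.0), lon_range=(-20.0, 10.0),
)
```

Only station A survives: B lies outside the longitude box and C outside the time window. In the real applications this step operates on files or database queries rather than on Python lists.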


Examples of preprocessing activities in the different applications (application, followed by its preprocessing activities):

C3-GRID
- Reduction in size by space-time-variable constraints
- Coordinate transformation
- Format conversion
- Diagnosis of additional variables
- Temporal mean
- User defined

K-Wf Grid
- Data reduction

High water
- Data extraction and reduction
- Mesh generation

CMT
- Data reduction
- Format conversion
- Check of data quality

Thematic campaigns
- Estimation of the quality of data
- Retrieving of relevant parameters
- Adding of additional information

Hydrogeological modelling
- Data collection
- Identification of geometry
- Discretization

Geocluster
- Format transformation (from SEGY/SEGD to CGG formats)
- Noise filtering

NWP data processing
- Quality control
- Data thinning

Monte Carlo Korba aquifer (KORBA)
- Format and homogenize data
- Format, interpolate and filter data in order to be adapted to the needs of the various "sub-applications" involved

Modeling contamination migration in heterogeneous aquifers (CONTAMINATION)
- Extract data from larger data sets, format and convert data from one application to the other

MERIS global mosaic (MERIS)
- Data extraction based on temporal range and geographical area selected by the user
- Format conversion
- Coordinate conversion


Storage operations

From the use cases we gathered, we can derive that the necessary storage operations do not differ from the well-known requirements and implementations found in grid solutions. As far as we could analyse the information we received, there are usually either no dependencies or linear dependencies between the data produced in the different steps of a technical workflow. The only difference concerns the need for access control to data stored on Grid storage according to the attached data policy.


Postprocessing

As postprocessing and follow-up activities we identified the following:

- Visualization
- Validation of the results
- Registering results in the metadata catalogue


5.3 DATA FLOW DURING COMPUTATION


Data are usually transferred to the execution site before the computation takes place. This procedure
is typically performed for each step of the workflow.




Data flow during computation can be reduced to MPI messaging between the parallel processes of the job. MPI is used in C3-Grid, risk management of high water in Dresden, the flood prediction application, HIRLAM, NWP processing and SPECFEM3d.


5.4 RELATION TO GRID ENVIRONMENTS



Some of the cases gathered in DEGREE WP tasks are already adapted to the grid environment. The following table shows all the considered applications and their relation to the grid.

WS - Web Service, G - Grid, NG - Not yet Gridified, N - No, Y - Yes, U - Unknown






Application/Project | Domain | Family 1 | Family 2 | Family 3 | G | NG | WS
University of Neuchatel | Hydrogeological modeling | N | Y | N | Y | N | N
K-Wf Grid | Flood Application | N | Y | N | Y | N | Y
MEDIgRID | Risk assessment of natural disaster | Y | N | N | Y | N | Y
Thematic Campaigns | Meteorology, atmospheric and oceanic | Y | N | N | N | Y | N
CMT | Earthquake | Y | N | N | N | Y | N
Dresden, Germany | Risk management of high water | N | Y | N | N | Y | N
C3 Grid | Climate | Y | N | N | Y | N | Y
Mosaic | Forest | Y | N | N | N | Y | U
EO DataGrid Applications (ESA, KNMI, IPSL) | Space | Y | N | N | Y | N | U
CGG | Geocluster on the Grid | N | Y | N | Y | N | Y
ESSE | Environmental Scenario Search Engine | N | N | Y | Y | N | Y
MERIS global mosaic (MERIS) | Generation of true colour earth images from Envisat data | N | N | Y | Y | N | U
Monte Carlo Korba aquifer (KORBA) | Groundwater exploitation | N | N | Y | Y | N | U
CONTAMINATION | Modeling contamination migration in heterogeneous aquifers | N | N | Y | Y | N | U
SPECFEM3d | Numerical simulation of earthquakes | N | Y | N | Y | N | U
SPIDR | Visualise and model historical space weather data | N | N | Y | Y | N | U
MIMOSA | Stratospheric Ozone in polar regions | N | Y | N | Y | N | U
HIRLAM | Weather forecasting system | N | N | Y | Y | N | U
NWP data processing | Analysing data in a typical workflow | N | Y | N | N | Y | U
SCIAMACHY (SCIAPLUS) | Product exploitation and user service | N | Y | N | Y | N | U



They differ in the way they benefit from the grid: the C3-Grid and flood prediction applications exploit the grid environment to use the high-performance computing power it provides. The atmospheric ozone observations, the seismic moment tensor application and the risk management applications in the MEDIgRID project take advantage of the data sharing aspect of the grid infrastructure. The risk management applications in MEDIgRID use the grid-like environment for sharing risk management applications between the consortium institutes.


In experiments with the Grid, many files were stored on a Storage Element (SE). For the GOME satellite the number of files was approximately 76000, corresponding to 7 years' worth of GOME ozone profiles for two algorithms. The metadata catalogue for the GOME satellite and lidar files was managed by PostgreSQL installed on an OGSA-DAI server. In order to find collocations between satellite and lidar stations, the satellite orbits were described by polygons, and geospatial requests were used to find the relevant lidar and satellite files.
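The collocation idea can be sketched in a few lines: an orbit footprint is a polygon, a lidar station is a point, and the geospatial request asks which polygons contain the point. The sketch below is not the implementation used in the experiments, where the test was performed by the database. It uses a standard ray-casting test in plane longitude/latitude coordinates, ignores spherical geometry and the date line, and all orbit identifiers and coordinates are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    The polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first one.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def collocated_orbits(station, footprints):
    """Return the ids of all orbit footprints that contain the station."""
    lon, lat = station
    return [orbit_id for orbit_id, polygon in footprints.items()
            if point_in_polygon(lon, lat, polygon)]


# Hypothetical orbit footprints as (lon, lat) vertex lists.
footprints = {
    "orbit_A": [(0.0, 40.0), (10.0, 40.0), (10.0, 50.0), (0.0, 50.0)],
    "orbit_B": [(20.0, 40.0), (30.0, 40.0), (30.0, 50.0), (20.0, 50.0)],
}
station = (5.7, 43.9)  # hypothetical lidar station (lon, lat)
matches = collocated_orbits(station, footprints)
```

Here only `orbit_A` contains the station. A spatial database performs the same containment test with the help of a spatial index, which matters when tens of thousands of orbit files are catalogued.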




6 DATA ACCESS

6.1 DATA ORGANIZATION


Introduction

Like other sciences, ES needs to deal with enormous amounts of data (both in size and number of files) and large computational needs. What makes ES different from other science domains is that it deals with geospatial data with time components (4D) and consists of many different domains, scattered among all countries in numerous institutes, using complex workflows. For ES, e-science can be an essential improvement in research, especially when Grid services can be coupled to specific ES (legacy) web services.

The data comes from a very large variety of sources, which can be divided into two categories:


1. Instruments: instruments providing 1D, 2D or 3D observations from the ground as well as aboard balloons, planes, rockets and satellites with various time resolutions, and
2. Models: numerical results from simulations or modeling.

The variety of sources has an immediate impact on the data size and, as a consequence, on the data organization: flat files or database, with or without metadata. These data concern different ES communities, which in general each have their own specific standard format for data exchange.



Flat files

Flat files are the most common way to store data in Earth Science. Depending on the user community and the origin of the data, a description of the file content may exist in a metadata catalogue or can be included in the file itself. The organization of the data into files depends on their size, the end-users, and the number of files created. Different formats are used within each ES domain; common examples are NetCDF (climate domain), HDF (4/5/EOS, satellite domain), GRIB and BUFR (meteorological domain).

As an example, the ozone profiles observed by a given ground-based lidar are stored in a monthly file containing daily profiles averaged over each night. The two parameters characterizing the flat files are the lidar station and the date (year and month). The GOME satellite data of total ozone content or ozone profiles are stored by orbit, with 14.5 orbits a day. In this case the only parameter is the date and time. The geographical location of an observation is determined afterwards by geospatial search. If the geographical location is also a main parameter, then every ozone profile retrieved from GOME data is stored in its own file, which leads to a very large number of files to manage, approximately 20000 files per day.

Taking into account user and data parameters, flat files may have a very complex architecture.
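The effect of the file organization on the number of files can be checked with simple arithmetic. The sketch below uses the 14.5 orbits per day given above; the year length, the station label and the file naming scheme are our own assumptions for illustration.

```python
ORBITS_PER_DAY = 14.5   # from the text
DAYS_PER_YEAR = 365.25  # assumed average year length


def orbit_files(years, algorithms):
    """Estimated number of files when one file is stored per orbit."""
    return round(years * DAYS_PER_YEAR * ORBITS_PER_DAY * algorithms)


def lidar_files(stations, years):
    """Number of files when one file is stored per station and month."""
    return stations * years * 12


def lidar_file_name(station, year, month):
    """Hypothetical naming scheme keyed by station and (year, month)."""
    return f"{station}_{year:04d}{month:02d}.dat"


estimate = orbit_files(years=7, algorithms=2)
```

For 7 years and two algorithms the estimate is about 74000 per-orbit files, the same order as the approximately 76000 GOME files reported for the Grid experiments in Section 5.4. A single lidar station over the same period produces only 84 monthly files, which shows how strongly the choice of the characterizing parameters drives the number of files.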




Database

ES data can also be organized in databases. This depends on the needs of the end-users, the size of the data and the organization of the data provider. Commonly there are three ways of referencing data in a database:

1. The first way is to divide metadata and data files into two sets. Metadata is integrated in database tables, which reference the path to the data files stored as flat files. Generally, this way of referencing is used for huge data files.
2. The second way is to integrate metadata and data files together in database tables. Generally, data files are stored as BLOBs, big vectors or as individual data. This way is used for small or medium-size data files.
3. The data is composed of database records. No external files are needed. The data and metadata are in the same database. Examples are time series of measurement data.


In the first two cases the data request goes through the metadata catalog. The first case preserves data integrity (native data format organization); the second case requires the data file organization to be rebuilt after a request, but permits extraction or calculation on the data.
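The first two ways of referencing data can be sketched with an in-memory SQLite database. This is an illustration only: the table names, column names, station label and file path are hypothetical, and the surveyed catalogs use the RDBMSs named in the next section rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Way 1: metadata in the table, data stays in a flat file
    -- that is referenced by its path.
    CREATE TABLE file_catalog (
        station TEXT, year INTEGER, month INTEGER, path TEXT
    );
    -- Way 2: metadata and data together, file content stored as a BLOB.
    CREATE TABLE blob_catalog (
        station TEXT, year INTEGER, month INTEGER, content BLOB
    );
""")

conn.execute(
    "INSERT INTO file_catalog VALUES (?, ?, ?, ?)",
    ("station_a", 2003, 7, "/archive/lidar/station_a_200307.dat"),
)
conn.execute(
    "INSERT INTO blob_catalog VALUES (?, ?, ?, ?)",
    ("station_a", 2003, 7, b"\x00\x01\x02\x03"),
)


def find_path(conn, station, year, month):
    """Way 1: the query returns where the file is, not its content."""
    row = conn.execute(
        "SELECT path FROM file_catalog "
        "WHERE station = ? AND year = ? AND month = ?",
        (station, year, month),
    ).fetchone()
    return row[0] if row else None


def load_content(conn, station, year, month):
    """Way 2: the query returns the stored bytes directly."""
    row = conn.execute(
        "SELECT content FROM blob_catalog "
        "WHERE station = ? AND year = ? AND month = ?",
        (station, year, month),
    ).fetchone()
    return row[0] if row else None
```

In the first case the file keeps its native format and only the path travels through the database; in the second case the client has to rebuild a file from the returned bytes if a file is needed.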


6.2 INFORMATION SYSTEMS


One of the main questions in ES is how observations, output and analysis data are organized and stored, whereas the challenges for Earth Science are the distribution and heterogeneity of data sources, the size of the data and the annotation of data.


Data Information systems

Metadata stored in catalogs are used to document and describe the attributes and contents of datasets, databases, images, maps, print documents, interactive applications and other catalogs and collections of resources that are available both on-line and off-line. A metadata record of a database may contain a detailed description of the resource, along with access and/or ordering information, much like a library catalog describes the books in a library and where to find them. Metadata records enable the discovery of data and information.


So we can distinguish the information systems or repositories by the different types of metadata, such as:

- Resource metadata, describing compute and storage resources for matchmaking (job and data placement) and monitoring.
- Discovery metadata, describing data objects for scientific description, for example with ISO 19115/19139 or Dublin Core.
- Use metadata, describing data objects and files as needed for access to the data, especially at file level (NetCDF).


Most often the Content Standard for Digital Geospatial Metadata (CSDGM), Vers. 2 (FGDC-STD-001-1998), the US federal metadata standard, is used, or the Directory Interchange Format (DIF) from NASA. Standard metadata catalog attributes are: name of the experiment and sensor, data type (ground, airborne, satellite), parameter, temporal coverage, spatial coverage, altitude coverage, data level, version, format, number and size of data files, and description of the data production.
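The attribute list above maps naturally onto a record type. The following sketch is neither a CSDGM nor a DIF implementation; the field names, units and example values are assumptions chosen only to mirror the listed attributes.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class MetadataRecord:
    experiment: str
    sensor: str
    data_type: str            # "ground", "airborne" or "satellite"
    parameter: str
    temporal_coverage: tuple  # (start date, end date)
    spatial_coverage: tuple   # (lat_min, lat_max, lon_min, lon_max)
    altitude_coverage: tuple  # (min, max), here assumed to be in km
    data_level: str
    version: str
    data_format: str
    file_count: int
    total_size_bytes: int
    production_description: str

    def covers(self, day):
        """True if the given date lies inside the temporal coverage."""
        start, end = self.temporal_coverage
        return start <= day <= end


# Hypothetical example record for a ground-based ozone lidar.
record = MetadataRecord(
    experiment="example campaign",
    sensor="ozone lidar",
    data_type="ground",
    parameter="ozone profile",
    temporal_coverage=(date(2003, 1, 1), date(2003, 12, 31)),
    spatial_coverage=(43.9, 43.9, 5.7, 5.7),
    altitude_coverage=(10.0, 45.0),
    data_level="2",
    version="1.0",
    data_format="ASCII",
    file_count=12,
    total_size_bytes=1_200_000,
    production_description="monthly files of nightly averaged profiles",
)

as_row = asdict(record)  # flat dictionary, e.g. for insertion into a catalog
```

A discovery query then reduces to filtering such records, for example on `data_type` and on `covers(day)`.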


The main RDBMSs used are MySQL, Microsoft SQL Server, PostgreSQL, and Oracle. The choice depends on the number of files to list, their content complexity, the need for specific computations or functionalities, and the community tradition.

Another characteristic of ES is the use of the same datasets for different applications on different computers. The metadata catalog is then created only once, updated regularly and used by the different groups. (The synchronization of all the copies is what makes it difficult to duplicate the database on another RDBMS.)


GEOGRAPHIC INFORMATION SYSTEMS

Geographic Information Systems (GIS) provide functions and tools necessary for storage, analysis and visualization of geographical (spatial) information. The key components of a GIS software product are: tools to input and manage geographical information, a database backend, tools to support spatial queries, analysis and visualization, and a graphical user interface.


Data representation

Data in GIS can be represented as a collection of thematic layers such as relief, cities, roads, population density, etc. There are two basic data types in GIS: vector and raster. The vector type is used to manage points, lines and polygons. The raster type is used for imagery data such as satellite images, digital elevation models (DEM), etc. Usually data can be converted by a GIS from one type to the other. Most of the GIS used by the Earth Science (ES) community support both data types. The most common storage is a relational database with support for spatial queries, for example MySQL, PostgreSQL or Oracle. Spatial databases have a set of functions for finding distances between points and polygons, location and topology, and they have a spatial index for speeding up those operations. Data in GIS can also be stored in a proprietary data format, e.g. shapefiles. A shapefile is a thematic layer that is used to store vector data. The size of the data can vary from several kilobytes (city list) to terabytes (DEM).
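As an illustration of the kind of distance function a spatial database provides, the sketch below computes the great-circle distance between two points with the haversine formula and uses it for a nearest-neighbour lookup. It is not the implementation of any of the databases named above; the spherical Earth with a mean radius of 6371 km and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(point, candidates):
    """Name of the candidate closest to point.

    point is (lat, lon); candidates maps a name to (lat, lon).
    """
    return min(candidates,
               key=lambda name: great_circle_km(*point, *candidates[name]))
```

This lookup compares the point with every candidate; the spatial index mentioned above exists precisely to avoid such a full scan on large layers.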




Data analysis

There are different areas of application for GIS and geospatial data. GIS allows the analysis of a wide range of geological and ecological information, and it has great potential for managing dynamic models of environmental processes. For example, numerical models can predict watersheds, urban growth, forest fires, etc. Such models can be written as GIS plug-ins in computer languages like Fortran, Python or Java, and they can be implemented within a Grid infrastructure.


GIS software

GIS software can be commercial or open source. The major commercial GIS are ESRI ArcGIS, Autodesk MapGuide, GeoMedia and MapInfo. Open source systems include GRASS, Minnesota MapServer and PostGIS. Networked GIS software has two types of architecture, based on thick or thin client technology: GIS viewers and Web APIs. Most GIS software supports Open Geospatial Consortium (OGC) standards for data access and exchange, but sometimes vendors use their own network protocols.


Web Services

OGC supports several standards for geospatial web services. The main specifications are the Web Map Service (WMS), the Web Feature Service (WFS) and the Web Coverage Service (WCS).

WMS manages raster data type objects represented as images (digital maps). It has three types of queries: GetCapabilities, GetMap and GetFeatureInfo. It produces a raster map in a graphical format such as GIF or JPEG. Operations with WMS can be performed with JavaScript or by submitting a request URL via a web browser.
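Because a WMS query is an ordinary URL with key-value parameters, a GetMap request can be assembled with a few lines of code. The sketch below uses the parameter names of WMS 1.1.1; the server address and the layer name are hypothetical, and the helper function is our own.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/jpeg",
                   version="1.1.1"):
    """Build a WMS GetMap request URL (WMS 1.1.1 parameter names)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                            # default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name.
url = wms_getmap_url(
    "http://example.org/wms", ["ozone"], (-180, -90, 180, 90), 800, 400
)
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

The resulting URL can be pasted into a browser or fetched by a script; the server answers with an image in the requested format. GetCapabilities and GetFeatureInfo requests are built the same way with a different `REQUEST` value and parameter set.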

WFS manages vector data type objects described in Geography Markup Language (GML), an XML encoding of geospatial data. It allows a user to create multilayer maps from different sources by overlaying vector data. In applications WFS is used for visualization of waypoints, cities, etc.

WCS serves to describe, request and deliver multi-dimensional coverages or multilayer rasters. It supports the exchange of geospatial data as "coverages" containing values or properties of geographic locations. WCS can be used to dispatch netCDF or other binary datasets via network protocols to a client application.


GIS Viewers

Most GIS viewers are based on thick client technology. These applications can read and display different file formats, such as GeoTIFF and shapefiles. Some of them can access remote GIS servers over proprietary protocols. The viewers (GIS Viewer, ArcExplorer) are usually free and have basic functionality. They can perform zoom, select, query and buffering operations. There are also advanced applications: NASA World Wind, Google Earth and Microsoft Virtual Earth 3D. These are 3D applications, which require hardware OpenGL or DirectX support on the client computer. They can be extended by communities that provide their own toolboxes. For example, NASA World Wind can be extended with toolboxes written in C++, and it supports OGC standards. An example of an "extension" for World Wind is the Environmental Scenario Search Engine (ESSE) toolbox. The ESSE toolbox requests weather and satellite data from an OPeNDAP server and overlays it on the 3D globe. Google Earth uses advanced features called Keyhole Markup Language (KML) and GeoRSS. KML is a tag-based XML structure which allows users to place their own data over GIS data in the Google Earth client. It is also used for modeling and storing geographic features such as points, lines, polygons and images. GeoRSS is similar to KML, but it has advanced features such as visualization styles, view angle and others. Microsoft Virtual Earth also supports the GeoRSS standard and allows users to create their own toolboxes and tile servers. These features are needed for data exchange and make it possible to create new hybrid services (mashups) for the ES community.
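To show how little is needed to place user data in a KML-aware viewer, the sketch below writes a single placemark with the Python standard library. It assumes the KML 2.2 namespace of the later OGC standard (viewers of the period also used earlier namespace versions); the function name and the example location are our own.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed KML version


def placemark_kml(name, lon, lat, alt=0.0):
    """Return a KML document (as text) containing one point placemark."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML orders coordinates as longitude,latitude[,altitude].
    coordinates = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
    coordinates.text = f"{lon},{lat},{alt}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical probe location near Moscow.
doc = placemark_kml("Moscow probe", 37.62, 55.75)
root = ET.fromstring(doc)
```

Saved to a `.kml` file, such a document can be opened directly in a KML-aware client. A frequent mistake is to swap latitude and longitude, since KML puts the longitude first.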


Web API

Web API is based on thin client technology. There are several implementations of Web API. In most cases image rendering is done on the server side, and the client has only a web browser with JavaScript, Visual Basic for Applications (VBA) or Flash support. Examples are Google Maps, Virtual Earth and Minnesota MapServer. The use of browsers and scripts allows the creation of distributed GIS, independent of the computer platform and the operating system of the client.





6.3 DATA FORMATS


In ES, there is no single data format. Each community or data provider builds its own "standard"
format, either on top of a self-describing format such as NetCDF or HDF (e.g. HDF-EOS) or without
one. These "standard" formats may differ in many respects. The data are stored in ASCII as well as
in binary form, depending on their volume and the format chosen. In practice, the data format is
often tied to the database where the data are stored, at national, international or campaign level,
so the same data can be stored in different formats depending on their storage location. There is a
multitude of formats that are not self-describing, owing to the large variety of parameters and of
uses of the data.

A brief description of the different data formats used in ES is given below.


GeoTIFF

GeoTIFF is a public domain metadata standard which allows georeferencing information to be
embedded within a TIFF file. Aldus-Adobe's public domain Tagged-Image File Format (TIFF) has
emerged as one of the world's most popular raster file formats. Plain TIFF, however, remains limited
in cartographic applications, because it provides no publicly available, stable structure for
conveying geographic information. GeoTIFF was defined to fill this gap.

The potential additional information includes projections, coordinate systems, ellipsoids, datums,
and everything else necessary to establish the exact spatial reference for the file. The GeoTIFF
format is fully compliant with TIFF 6.0, so software incapable of reading and interpreting the
specialized metadata will still be able to open a GeoTIFF file.
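
As an illustration of how the embedded georeferencing is used, the sketch below converts a pixel position to map coordinates from the values of two GeoTIFF tags, ModelTiepointTag (33922) and ModelPixelScaleTag (33550). It assumes the common north-up case with a single tiepoint and no rotation; the function name and the sample values are hypothetical, and reading the tags from a real file (for example with a TIFF library) is left out.

```python
def pixel_to_map(col, row, tiepoint, pixel_scale):
    """Convert a raster position (col, row) to map coordinates (x, y).

    tiepoint    -- (I, J, K, X, Y, Z): raster point (I, J) is located at
                   map position (X, Y), as stored in ModelTiepointTag.
    pixel_scale -- (ScaleX, ScaleY, ScaleZ) as stored in ModelPixelScaleTag.

    Assumption: north-up image, one tiepoint, no rotation or shear.
    """
    i, j, _k, x, y, _z = tiepoint
    scale_x, scale_y, _scale_z = pixel_scale
    map_x = x + (col - i) * scale_x
    # Raster rows increase downwards while map northing increases upwards.
    map_y = y - (row - j) * scale_y
    return map_x, map_y


if __name__ == "__main__":
    # Made-up values: upper-left corner at (500000, 4600000), 30 m pixels.
    tiepoint = (0.0, 0.0, 0.0, 500000.0, 4600000.0, 0.0)
    pixel_scale = (30.0, 30.0, 0.0)
    print(pixel_to_map(10, 20, tiepoint, pixel_scale))
```

Software that ignores these tags still sees an ordinary TIFF image; only georeferencing-aware software performs a conversion of this kind.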


FITS

FITS stands for "Flexible Image Transport System" and is the standard astronomical data format
endorsed by both NASA and the IAU. FITS is much more than just another image format (such as
JPG or GIF) and is primarily designed to store scientific data sets consisting of multidimensional
arrays and 2-dimensional tables containing rows and columns of data.

A FITS file consists of one or more Header + Data Units (HDUs), where the first HDU is called the
"Primary HDU" or "Primary Array". The primary array contains an N-dimensional array of pixels, such
as a 1-D spectrum, a 2-D image, or a 3-D data cube. Five different primary data types are supported:
unsigned 8-bit bytes, 16 and 32-bit signed integers, and 32 and 64-bit single or double precision
floating point reals. FITS can also store 16 and 32-bit unsigned integers.

Any number of additional HDUs may follow the primary array; these additional HDUs are called
FITS "extensions". There are currently 3 types of extensions defined by the FITS Standard:

Image Extension - an N-dimensional array of pixels, like in a primary array

ASCII Table Extension - rows and columns of data in ASCII character format






Binary Table Extension - rows and columns of data in binary representation
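
The header part of an HDU is plain ASCII: a sequence of 80-character "cards", terminated by an END card and padded with blanks to a multiple of 2880 bytes. The following sketch, written for illustration only, builds and parses such a header with the Python standard library. The helper names are hypothetical and the parser is deliberately simplified (for instance, it would mishandle string values containing a slash); production code would normally rely on an established FITS library.

```python
CARD_LENGTH = 80     # every header record ("card") is 80 ASCII characters
BLOCK_LENGTH = 2880  # headers are padded to a multiple of 2880 bytes


def make_header(cards):
    """Build a FITS-style header from (keyword, value) string pairs.

    Simplified sketch: values are written right-justified as plain text.
    """
    text = "".join(
        f"{key:<8}= {value:>20}".ljust(CARD_LENGTH) for key, value in cards
    )
    text += "END".ljust(CARD_LENGTH)
    padding = (-len(text)) % BLOCK_LENGTH
    return (text + " " * padding).encode("ascii")


def parse_header(raw):
    """Return a dict of keyword -> value string, stopping at the END card."""
    header = {}
    for offset in range(0, len(raw), CARD_LENGTH):
        card = raw[offset:offset + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Anything after a slash is treated as a comment.
            header[keyword] = card[10:].split("/", 1)[0].strip()
    return header


if __name__ == "__main__":
    # BITPIX = -32 denotes 32-bit floating point pixels.
    raw = make_header([("SIMPLE", "T"), ("BITPIX", "-32"), ("NAXIS", "2"),
                       ("NAXIS1", "100"), ("NAXIS2", "50")])
    print(len(raw), parse_header(raw))
```

The NAXIS keywords in the example describe a 2-D primary array of 100 by 50 pixels.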


ESRI Shapefile

The ESRI Shapefile is a popular geospatial vector data format for geographic information systems
(GIS) software. It is developed and regulated by ESRI as a (mostly) open specification for data
interoperability among ESRI and other software products. A "shapefile" commonly refers to a
collection of files with ".shp", ".shx", ".dbf", and other extensions on a common prefix name (e.g.,
"lakes.*"). Strictly speaking, the shapefile is the file with the ".shp" extension; however, this
file alone is incomplete for distribution, as the other supporting files are required.

Shapefiles spatially describe points, polylines and polygons. These could represent, for example,
water wells, rivers and lakes, respectively. Each item may also have attributes that describe it,
such as the name or temperature.
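
As a small illustration of the ".shp" layout, the sketch below decodes the fixed 100-byte main file header with the Python standard library: a big-endian file code (9994) and file length, followed by a little-endian version (1000), shape type and bounding box. The function name is hypothetical, only a few shape type codes are listed, and the record contents that follow the header are not decoded here.

```python
import struct

# A few of the shape type codes defined for the format.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header):
    """Decode the 100-byte main file header of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])            # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", header[24:28])       # 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"code {shape_type}"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


if __name__ == "__main__":
    # Synthetic header for demonstration: an empty polygon shapefile.
    demo = (struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
            + struct.pack("<ii", 1000, 5)
            + struct.pack("<8d", 0.0, 1.0, 10.0, 11.0, 0.0, 0.0, 0.0, 0.0))
    print(parse_shp_header(demo))
```

In practice the first 100 bytes would be read from a file opened in binary mode, for example the "lakes.shp" member of the collection mentioned above.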


GRIB

The World Meteorological Organization (WMO) Commission for Basic Systems (CBS) Extraordinary
Meeting Number VIII (1985) approved a general purpose, bit-oriented data exchange format,
designated FM 92-VIII Ext. GRIB (GRIdded Binary). It is an efficient vehicle for transmitting large
volumes of gridded data to automated centers over high-speed telecommunication lines using modern
protocols. By packing information into the GRIB code, messages (or records; the terms are
synonymous in this context) can be made more compact than character-oriented bulletins, which
results in faster computer-to-computer transmissions. GRIB can equally well serve as a data storage
format, generating the same efficiencies relative to information storage and retrieval devices.
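
The compactness comes from storing each grid value as a small unsigned integer instead of text. A minimal sketch of the "simple packing" idea used by GRIB is given below, in which an original value Y is related to a packed integer X by Y * 10^D = R + X * 2^E, with reference value R, binary scale factor E and decimal scale factor D. The function names and the way E is chosen are assumptions made for illustration; the encoding of the surrounding GRIB sections and the bit-level layout are not reproduced.

```python
import math


def simple_pack(values, nbits, decimal_scale=0):
    """Pack floats into unsigned integers of at most nbits bits.

    Returns (reference value R, binary scale E, list of integers X) such
    that value * 10**D is approximately R + X * 2**E.
    """
    scaled = [v * 10 ** decimal_scale for v in values]
    reference = min(scaled)
    span = max(scaled) - reference
    max_int = 2 ** nbits - 1
    # Smallest binary scale E for which the largest value still fits in nbits.
    binary_scale = 0 if span == 0 else math.ceil(math.log2(span / max_int))
    packed = [min(max_int, round((s - reference) / 2 ** binary_scale))
              for s in scaled]
    return reference, binary_scale, packed


def simple_unpack(reference, binary_scale, packed, decimal_scale=0):
    """Invert simple_pack, up to the quantization error."""
    return [(reference + x * 2 ** binary_scale) / 10 ** decimal_scale
            for x in packed]


if __name__ == "__main__":
    temperatures = [273.15, 280.0, 290.5, 301.25]  # made-up values in kelvin
    ref, e, ints = simple_pack(temperatures, nbits=8)
    print(ref, e, ints)
    print(simple_unpack(ref, e, ints))
```

With 8 bits per value the example keeps the temperatures to within about 0.06 K, which shows the trade-off between message size and precision.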


Each GRIB record intended for either transmission or storage contains a single parameter with values
located at an array of grid points, or represented as a set of spectral coefficients, for a single
level (or layer), encoded as a continuous bit stream. Logical divisions of the record are designated
as "sections", each of which provides control information and/or data.


BUFR

The Binary Universal Form for the Representation of meteorological data (BUFR) is a World
Meteorological Organization (WMO) standard binary code for the exchange and storage of data. The
self-describing format is designed to represent any meteorological data as a continuous binary
stream. A BUFR "message" (or record) containing observational data of any sort also contains a
complete description of what those data are: the description identifies the parameter in question
(height, temperature, pressure, latitude, date and time, etc.), the units, any decimal scaling that
may have been employed to change the precision from that of the original units, any data compression
that may have been applied for efficiency, and the number of binary bits used to contain the numeric
value of the observation. This data description is contained in tables, which are the major part of
the BUFR documentation.


CDF

The Common Data Format (CDF) is a self-describing data format for the storage and manipulation of
scalar and multidimensional data in a platform- and discipline-independent fashion. When one first



hears the term "Common Data Format" one intuitively thinks of data formats in the traditional (i.e.
messy/convoluted storage of data on disk or tape) sense of the word. Although CDF has its own
internal self-describing format, it consists of more than just a data format. CDF is a scientific
data management package (known as the "CDF Library") which allows programmers and application
developers to manage and manipulate scalar, vector, and multi-dimensional data arrays.


HDF

Hierarchical Data Format, commonly abbreviated HDF, HDF4, or HDF5, is a library and multi-object
file format for the transfer of graphical and numerical data between computers. It is created and
maintained by the NCSA. The freely available HDF distribution consists of the library, command-line
utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView).

HDF supports several different data models, including multidimensional arrays, raster images, and
tables. Each defines a specific aggregate data type and provides an API for reading, writing, and
organizing the data and metadata. New data models can be added by the HDF developers or users.

HDF is self-describing, allowing an application to interpret the structure and contents of a file
without any outside information. One HDF file can hold a mixture of related objects which can be
accessed as a group or as individual objects. Users can create their own grouping structures called
"vgroups."


NetCDF

Network Common Data Form (NetCDF) is a machine-independent, self-describing, binary data format
standard for exchanging scientific data. The format is an open standard [NetCDF], which was
originally based on the conceptual model of the NASA CDF but has since diverged and is not
compatible with it.

The data format is "self-describing". This means that there is a header which describes the layout
of the rest of the file, in particular the data arrays, as well as arbitrary file metadata in the
form of name/value attributes. The format is platform independent, with issues such as endianness
being addressed in the software libraries. The data arrays are rectangular, not ragged, and stored
in a simple and regular fashion that allows efficient subsetting and hyperslabbing.

The NetCDF standard describes in detail the syntactic and binary representations of the data files,
including the network access protocol, but it does not fully specify the semantics of the data
stored. Several conventions exist on how to store metadata (variable names, units of measure) in the
header of a NetCDF file; among them, the COARDS conventions sponsored by the Cooperative
Ocean/Atmosphere Research Data Service are the most frequently used.


CEOS

The Committee on Earth Observation Satellites (CEOS) has defined a structure for SAR data and the
headers that accompany it. A CEOS data set consists of 3 files: a leader file, an image file, and a
trailer file. In addition, each tape has a volume descriptor. This structure, called the CEOS
format, defines the file structure of the Spaceborne Imaging Radar-C (SIR-C) data as it is found on
CEOS format tapes. The CEOS format tape is similar to data normally distributed by NASA.




Software has been written to read this data from tape, write it to disk, decode the CEOS headers,
decompress the data, average the data, synthesize byte images, and convert between various other
common formats.


6.4 DATA FILE SIZES

Due to the large variety of instruments and of model or simulation outputs, data file sizes vary
widely, ranging from several bytes to terabytes. Often a compromise has to be found between the
number of files and their sizes. Different ES-specific applications need fast transfer of large
files as well as fast transfer of large numbers of smaller files. An infrastructure has to be
implemented for these kinds of transfers.


6.5 NETWORKED DATA PROTOCOLS

OPeNDAP

OPeNDAP, an acronym for "Open-source Project for a Network Data Access Protocol", is a data
transport architecture and protocol widely used by earth scientists. The protocol is based on HTTP
and the current specification is the OPeNDAP 2.0 draft. OPeNDAP includes standards for encapsulating
structured data, annotating the data with attributes and adding semantics that describe the data.

An OPeNDAP client could be an ordinary browser, although this gives limited functionality. Usually,
an OPeNDAP client is a graphics program or web application linked with an OPeNDAP library. An
OPeNDAP client sends requests to an OPeNDAP server, and receives various types of documents or
binary data as a response. One such document is the DDS (received when a DDS request is sent),
which describes the structure of a data set. A data set, seen from the server side, may be a file, a
collection of files or a database. Another document type that may be received is the DAS, which
gives attribute values for the fields described in the DDS. Binary data is received when the client
sends a DODS request.
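
Because the protocol is layered on HTTP, the three request types described above differ only in the suffix appended to the data set URL, optionally followed by a constraint expression that selects variables and index ranges. The sketch below only constructs such URLs; the server address, the file name and the variable name `sst` are hypothetical, and no network access is performed.

```python
from urllib.parse import quote


def dap2_urls(dataset_url, constraint=""):
    """Return the DDS, DAS and DODS request URLs for one data set.

    constraint -- optional constraint expression, for example
                  "sst[0:1:0][10:1:20]" (start:stride:stop per dimension).
    """
    # Keep the characters that are meaningful in constraint expressions.
    query = f"?{quote(constraint, safe='[]:,.&=<>')}" if constraint else ""
    return {
        "dds": f"{dataset_url}.dds{query}",    # structure of the data set
        "das": f"{dataset_url}.das",           # attributes of its fields
        "dods": f"{dataset_url}.dods{query}",  # binary data
    }


if __name__ == "__main__":
    # Hypothetical server and data set, for illustration only.
    urls = dap2_urls("http://example.org/opendap/sst.nc",
                     "sst[0:1:0][10:1:20][30:1:40]")
    for kind, url in urls.items():
        print(kind, url)
```

A browser pointed at the DDS or DAS URL would display the text documents directly, which is the limited functionality mentioned above; decoding the binary DODS response is the task of an OPeNDAP client library.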


McIDAS ADDE

McIDAS (Man computer Interactive Data Access System) is a suite of sophisticated software
packages that perform a wide variety of functions with satellite imagery, observational reports,
numerical forecasts, and other geophysical data. Those functions include displaying, analyzing,
interpreting, acquiring and managing the data.

ADDE (Abstract Data Distribution Environment) is a remote data access protocol originally
developed for geolocated data. It communicates requests from client applications to servers, which
then return data objects to the client. Supported satellite data include AVHRR from the NOAA POES
satellites and MODIS from NASA's Terra and Aqua satellites.





The McIDAS software package has built-in ADDE support. A Java interface (part of VisAD) is available
for non-McIDAS clients, such as VisAD, Unidata IDV, MATLAB and IDL. These free open source
ADDE servers are a subset of the servers included with the full version of McIDAS.


6.6 DATA ANALYSIS AND VISUALIZATION CLIENTS

In this section we present data analysis and visualization programs relevant for the scientific
data formats and network servers mentioned earlier in the report. There are two main commercial
general-purpose data analysis and visualization packages, MATLAB and IDL, as well as a series of
high-quality open-source software packages and applications capable of working with the OPeNDAP and
WMS data services and with GeoTIFF/NetCDF/HDF data files. Among them we single out the
Vis5D/VisAD client and library from SSEC at the University of Wisconsin-Madison, the IDV interactive
data viewer from Unidata, the ODC data connector from the OPeNDAP consortium, and the NASA
World Wind 3D globe viewer.


MATLAB

MATLAB is a numerical computing environment and programming language. Created by The
MathWorks, MATLAB allows easy matrix manipulation, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and interfacing with programs in other
languages. MATLAB is a proprietary product of The MathWorks, so users are subject to vendor
lock-in.

MATLAB is built around the MATLAB language, sometimes called M-code. The simplest way to
execute M-code is to type it in at the prompt, >> , in the Command Window, one of the elements of
the MATLAB Desktop. In this way, MATLAB can be used as an interactive mathematical shell.
Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available. There are many M-code packages
for different disciplines, such as Databases, Image Processing (supports WMS),
Mapping (supports GeoTIFF), etc. Since version 6 the MATLAB environment has been integrated with
Java, so Java classes and data structures can be used from within MATLAB. Open-source NetCDF and
OPeNDAP MATLAB toolboxes (http://mexcdf.sourceforge.net/ and
http://www.opendap.org/download/ml-structs.html) are available. The most recent MATLAB version
uses the HDF data format to serialize Workspace data to hard disk when the data file size exceeds
4 GB.


IDL and ENVI

IDL, the Interactive Data Language, is popular software for data analysis, visualization, and
cross-platform application development. IDL is vectorized, numerical, and interactive, and it is
commonly used for interactive processing of large amounts of data, especially image processing. The
syntax includes many constructs from Fortran and some from C. IDL is very fast at vector operations
(sometimes as fast as a well-coded custom loop in Fortran or C) but quite slow if elements need
to be processed individually. Hence part of the art of using IDL for numerically heavy computations
is to make use of the built-in vector operations. IDL has a very rich set of data viewing functions
for publication-quality graphs, images and maps.





ENVI, the Environment for Visualizing Images, allows earth scientists to easily process, analyze and
display multispectral, hyperspectral or radar remote sensing data. Because ENVI has no limits on
file size or number of bands, users can efficiently access and analyze any size or type of file.
ENVI's open architecture handles data from leading providers such as NASA, NOAA, and ESA. ENVI
is a menu-driven application based on the IDL data processing engine. ENVI combines a
complete image processing package with easy-to-use spectral tools and data drivers for the
global remote sensing community. ENVI also provides geometric correction, terrain
analysis, radar analysis, raster and vector GIS capabilities and extensive support for images from a
wide variety of sources.

Both IDL and ENVI are proprietary products of ITT, so users are subject to vendor lock-in.


VisAD and Vis5D

VisAD is a Java component library for interactive and collaborative visualization and analysis of
numerical data. The name VisAD is an acronym for "Visualization for Algorithm Development". The
system combines:

- The use of pure Java for platform independence and to support data sharing and real-time
  collaboration among geographically distributed users. Support for distributed computing is
  integrated at the lowest levels of the system using Java RMI distributed objects.

- A general mathematical data model that can be adapted to virtually any numerical data, that
  supports data sharing among different users, different data sources and different scientific
  disciplines, and that provides transparent access to data independent of storage format and
  location (i.e., memory, disk or remote). The data model has been adapted to netCDF, HDF-5,
  FITS, HDF-EOS, McIDAS, Vis5D, GIF, JPEG, TIFF, QuickTime, ASCII and many other file
  formats.

- A general display model that supports interactive 3-D, data fusion, multiple data views, direct
  manipulation, collaboration, and virtual reality. The display model has been adapted to
  Java3D and Java2D and used in an ImmersaDesk virtual reality display.

- Data analysis and computation integrated with visualization to support computational steering
  and other complex interaction modes.

- Support for two distinct communities: developers who create domain-specific systems based
  on VisAD, and users of those domain-specific systems.

VisAD is designed to support a wide variety of user interfaces, ranging from simple data browser
applets to complex applications that allow groups of scientists to collaboratively develop data
analysis algorithms.

The Vis5D visualization system and the VisAD Java library were written by programmers at the
Visualization Laboratory of the Space Science and Engineering Center (SSEC) at the University of
Wisconsin-Madison.


Vis5D is a system for interactive visualization of large 5-D gridded data sets such as those
produced by numerical weather models. One can make isosurfaces, contour line slices, colored slices,
volume renderings, etc. of data in a 3-D grid, then rotate and animate the images in real time.
There's also a



feature for wind trajectory tracing, a way to make text annotations for publications, support for
interactive data analysis, etc.

Originally written in 1994 for OpenGL visualization in the Unix X-Windows graphics environment, the
Vis5D client was ported to the Windows platform by our group at the Geophysical Center RAS in 1999.
The client sources are freely available. Fig. 1 shows a screenshot of Vis5D generating a spreadsheet
display of four members of an ECMWF ensemble forecast.

Figure 1 Vis5D weather visualization screenshot

IDV

The Integrated Data Viewer (IDV) from Unidata is a Java-based software framework for analyzing
and visualizing geoscience data. This IDV release includes a software library and a reference
application made from that software. It uses the VisAD library and other Java-based utility
packages.

The IDV is developed at the Unidata Program Center (UPC), part of the University Corporation for
Atmospheric Research, Boulder, Colorado, which is funded by the National Science Foundation. The
software is freely available under the terms of the GNU Lesser General Public License.

The IDV "reference application" is a geoscience display and analysis software system with many of
the standard data displays that other Unidata software (e.g. GEMPAK and McIDAS) provides. It
brings together the ability to display and work with satellite imagery, gridded data (for example,
numerical weather prediction model output) and surface observations, all within a unified interface.
It



also provides 3-D views of the earth system and allows users to interactively slice, dice, and probe
the data, creating cross-sections, profiles, animations and value read-outs of multi-dimensional
data sets.

Figure 2 IDV view of Hurricane Charley, August 13, 2004, integrating satellite, radar, model and
geopolitical data

The IDV can display any Earth-located data if it is provided in a known format.


Data Type         | Description                                   | Supported Formats                 | Access method
------------------|-----------------------------------------------|-----------------------------------|---------------------------------
Gridded data      | Numerical weather prediction models, climate  | netCDF, GRIB, Vis5D               | local files, HTTP, TDS servers
                  | analysis, gridded oceanographic datasets,     |                                   |
                  | NCEP/NCAR Reanalysis                          |                                   |
Satellite imagery | Geostationary satellite imagery, MODIS,       | ADDE, McIDAS AREA                 | ADDE servers, local files
                  | derived satellite products                    |                                   |
GIS data          | Data typically used in Geographic Information | ESRI Shapefile, USGS DEM, GeoTIFF | local files, HTTP
                  | Systems (GIS)                                 | (limited support)                 |


The Network Common Data Form (netCDF) provides a common data access method for Unidata
applications. This format can be used to store a variety of data types that encompass single-point



observations, time series, regular grids, and satellite and radar images. The mere use of netCDF by
itself is not sufficient to make data "self-describing" and meaningful to the IDV.

Generally, the IDV requires that datasets in netCDF format use the CF, COARDS or NUWG metadata
conventions to be able to fully understand and geolocate the dataset.


ODC

The OPeNDAP Data Connector (ODC) is a program which allows searching for and retrieving
datasets published by OPeNDAP data servers. OPeNDAP servers located at major institutions around
the world serve a wide variety of data including climatic data, satellite imagery, and ocean sensor
results. The ODC allows a user to find these datasets, download them to the local machine, save
them, import them into client applications like IDL, MATLAB and MS Excel, or into databases such as
MS Access and Oracle, and plot them with advanced graphics capabilities.

Fig. 3 shows a pseudocolor Sea Surface Temperature product from the NASA Terra satellite plotted
with the standard configuration using the ODC viewer.

Figure 3 Sea Surface Temperature data plotted by ODC from an OPeNDAP data source


World Wind

World Wind is a free open source virtual globe developed by NASA and the open source community for
use on personal computers running Microsoft Windows. The program overlays NASA and USGS
satellite imagery, aerial photography, topographic maps and publicly available GIS data on 3D models
of the Earth and other planets. Apart from the Earth there are several worlds in World Wind: Moon,
Mars, Venus, Jupiter (with the four Galilean moons Io, Ganymede, Europa and Callisto) and SDSS
(imagery of stars and galaxies).





Users interact with the selected planet by rotating it, tilting the view, and zooming in and out.
Placenames, political boundaries, latitude/longitude lines, and other location criteria can be
displayed. World Wind provides the ability to browse maps and geospatial data on the internet using
the OGC's WMS servers (version 1.4 also uses WFS for downloading placenames) and to import ESRI
shapefiles. Other features of World Wind include support for .X (DirectX 3D polygon mesh) models and
advanced visual effects such as atmospheric scattering or sun shading. Microsoft has allowed World
Wind to incorporate Virtual Earth high resolution data for non-commercial use.
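
For readers unfamiliar with WMS, the map requests that such a client sends are ordinary HTTP GET requests. The sketch below assembles a WMS 1.1.1 GetMap URL from its standard parameters; the server address and the layer name are hypothetical, and the code only builds the URL without contacting any server.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Return a WMS 1.1.1 GetMap request URL.

    bbox -- (min_x, min_y, max_x, max_y) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(wms_getmap_url("http://example.org/wms", ["bluemarble"],
                         (-180, -90, 180, 90), 1024, 512))
```

A tile-based client issues many such requests with different bounding boxes as the user zooms and pans.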


World Wind uses digital elevation model (DEM) data collected by NASA's Shuttle Radar Topography
Mission. This means one can view topographic features such as the Grand Canyon or Mount Everest
in three dimensions. In addition, World Wind has bathymetry data which allows users to see ocean
features, such as trenches and ridges, in 3D.

Figure 4 NASA World Wind 3D viewer

Low resolution Blue Marble datasets are included with the initial download; as a user zooms in to
certain areas, additional high resolution data is downloaded from the NASA servers. The size of all
currently available data sets is about 4.6 terabytes.

All images/movies created with World Wind using Blue Marble, Landsat or USGS public domain
data can be freely modified, re-distributed, and used for commercial purposes.





Despite being open source, World Wind is still restricted to Windows, relying on the .NET libraries
and DirectX. Future versions of World Wind will be developed in Java with JOGL. The new version
will have an API-centric architecture with functionalities "off-loaded" to modular components,
leaving the API at the core. The intent is to allow plugins to be used as interchangeably as
possible (i.e. via Python). This refactoring exercise will also allow World Wind to be accessed via
a browser. A preview of the World Wind Java SDK was released on May 11, 2007 during Sun
Microsystems' annual JavaOne conference.





7 DATA POLICIES

There is a large variety of data policies in Earth Science, depending on the users, their usage of
the data, the source of the data and the organization delivering them. However, there are general
points to take into consideration. Whatever the data policy is:

- Only the owner can distribute the data, with possible restrictions imposed by his/her
  organization. In no case is re-distribution of the data from one user to another allowed.

- The data policy has to be taken into account for the use of the data, especially in case of
  confidentiality, and for the publication of the results (co-authorship, acknowledgment, reference).

Even if access to the data is free on a web site, it does not mean that there is no data policy or
restriction, especially for operational, commercial or industrial use. Often users need to identify
themselves via a login or the IP address of their machine, or to justify the use of the data or
software. The same situation arises if the data are bought: their use can be restricted to the
buyer. In addition, for academic users the cost is only related to the handling of the data, whereas
for companies the cost also includes the cost of the data themselves.

Other characteristics of data policy in Earth Science are related to data from thematic campaigns
and satellite experiments. In most campaigns, access to the data is limited to the participants for
around 2 years. After this period, other scientists can access the data after accepting the data
policy. The data policy for satellite data differs between ESA and NASA. For ESA satellites, only
scientists whose proposals in response to an Announcement of Opportunity (AO) on data usage are
accepted can access the data, and only for the proposed use. For NASA, all the data are available to
those who need them; nevertheless there is a data policy for publication, and registration or a
justification of use is sometimes required.

The most restrictive data policy comes from the European Centre for Medium-Range Weather
Forecasts (ECMWF). The authorization is given to a person or to an institute. However, a person from
one institute is not authorized to use the same product extracted by another institute without
authorization from ECMWF.

In conclusion, this large variety of data policies leads to some constraints on data management on
the Grid:

- Restriction of access to metadata and data to a person or a group on the Storage Element

- Encryption of the data in some limited cases

- Need for confidentiality of new data registered on the Grid or used by members of a given VO
  towards the other VOs or group members. When different teams using the same set of data are
  competing on nearly the same topic, knowledge of the subset of data selected is too informative.




8

AUTHORIZATION, AUTHE
NTICATION, ACCOUNTIN
G


There are different ways to access data in today's Earth Science.
An important issue for the access of
public

or non public

data is authorization and au
thentication.
Furthermore

resource providers

have to
provide evidence of the use of the data they provide in order to
justify their

fundings
.
Then following
the data policy an accounting may be installed, especially for data with charge.

To be able to use
data there are two needed steps, authorization and authentification

As indicated in the paragraph on data policy, only the data provider can authorize the use of a
data set by a given user. The way the authorization is given depends on the data and on the
provider responsible for the data set. Depending on the provider, it is given to those who accept
the data policy, participate in a project, belong to academia or industry, or submit a proposal
for the data use.

The agreement is personal, unless it has been requested for a group (team, institutes...).
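
The distinction between personal and group agreements can be illustrated with a minimal sketch.
It is not taken from any of the surveyed systems: the `User` and `DataSetGrants` names and the
sample users are hypothetical, and a real provider would add its own criteria (acceptance of the
data policy, project membership, proposals).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    institute: str


@dataclass
class DataSetGrants:
    """Grants issued by the data provider for one data set (hypothetical model)."""
    persons: set = field(default_factory=set)      # personal agreements
    institutes: set = field(default_factory=set)   # group agreements

    def is_authorized(self, user: User) -> bool:
        # A user may use the data set if the provider granted access either
        # to the user personally or to the user's institute as a group.
        return user.name in self.persons or user.institute in self.institutes


# Hypothetical sample data: one personal grant and one group grant.
grants = DataSetGrants()
grants.persons.add("alice")
grants.institutes.add("Institute A")

alice = User("alice", "Institute B")   # covered by the personal grant
bob = User("bob", "Institute A")       # covered by the group grant
carol = User("carol", "Institute C")   # no grant at all
```

In this sketch a grant to one institute does not extend to users of another institute, which
mirrors the restrictive case described in the data policy paragraph.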

Authentication depends on the organisation; the mechanisms in use include a login ID, an
authorized IP address of the machine, a certificate, etc. In any case, user activity may be
controlled via the log.

In a grid environment, the same data policy must be applied for a given data set.

Personal certificates, proxies and virtual organisations take care of authorization and
authentication. However, a mechanism has to be implemented to control access to the data stored on
Grid storage, or to encrypt the data. Encryption is not commonly used in Earth Science.

Accounting for data has not yet been taken into account in the Grid infrastructure. The only
possibility will be to store the data on a specific server with controlled access, as is done for
licensed algorithms (Geocluster).
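
As an illustration of what data accounting could record, the following minimal sketch logs each
data access and aggregates the log into usage figures that a provider could report. It is an
assumption for illustration only, not an existing Grid service: the `AccessRecord` and `AccessLog`
names, the fields and the sample entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    user: str
    dataset: str
    size_bytes: int


class AccessLog:
    """In-memory access log (hypothetical); a real service would persist it."""

    def __init__(self):
        self._records = []

    def record(self, user, dataset, size_bytes):
        # Append one entry per successful data access.
        self._records.append(AccessRecord(user, dataset, size_bytes))

    def accesses_per_dataset(self):
        # Number of accesses per data set: evidence of use for the provider.
        return Counter(r.dataset for r in self._records)

    def bytes_per_user(self):
        # Transferred volume per user: a possible basis for charging.
        totals = Counter()
        for r in self._records:
            totals[r.user] += r.size_bytes
        return totals


# Hypothetical sample entries.
log = AccessLog()
log.record("alice", "dataset-A", 1000)
log.record("bob", "dataset-A", 500)
log.record("alice", "dataset-B", 200)
```

The per-data-set counts correspond to the evidence of use that providers need, while the per-user
volumes would only matter for data that is charged for.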


The table shows some examples:

Application                               | Authorization and authentication mechanism
------------------------------------------|----------------------------------------------------
C3-Grid                                   | Not central via user certificates, e.g. Shibboleth
KWF-Grid                                  | Standard Globus Toolkit, proxies
High water, Hydrogeological application   | None
Medigrid                                  | Based on Globus WSRF framework, ACL
EO Data Grid Application, CMT             | VOMS proxies based on Globus, central control
CGG, Geocluster on the Grid               | Standard Globus Toolkit, ACL, login password and
                                          | software central control


9 CONCLUSION AND FUTURE PLANS


This document presents a realistic overview of data management and data policy in Earth Science.

Examining the answers to the questionnaires and the different applications and technologies
described in this deliverable shows that priority goes to the use of common standards for ES data
and of standards defined in the context of Web services, such as those of OASIS, for ES
applications. However, some ES fields that introduce new requirements for the Grid, such as
geoinformatics and large data servers, are missing from this deliverable.

The deliverable of WP1 contains a detailed list of requirements, some of which also relate to data
management. We have therefore arrived at a set of requirements drawn from the overall analysis of
this deliverable and from WP1. These requirements are stated below:




- Support for metadata-intensive applications in distributed environments

- User/role-based access control to metadata and data has to be supported

- Interfaces for access to heterogeneous and federated data resources (OGSA-DAI or AMGA are
  examples)

- The fast transfer of large files and of a large number of different files has to be taken into
  account

- Accessing data from different locations, esp. from locations not directly included in grid
  infrastructures like EGEE

- Web service based interfaces to data resources, esp. on the basis of the OGC service
  definitions

- Data management should not be limited to UNIX-based systems; esp. in the GIS domain,
  Microsoft .NET based applications are used

- For complex workflows, robust and fast replication of data is indispensable

- Ontology technologies should be available for the data in ES-specific domains

- Support for interoperability between query languages and data models, e.g. WSC, SQL, OPeNDAP

- For easy sharing of data between ES domains, conversion tools have to be provided and
  supported by the middleware, e.g. a conversion service for atmospheric data and GIS, see [R9]


Many of the technologies proposed in Semantic OGSA [R7] will address some of the above
requirements, but they are not yet integrated in today's classical Grid middleware.


The future plan includes an analysis of the results of Task 2.1 and Task 2.2 with regard to
grid-based collaboration. Available grid solutions for distributed data management (e.g. OGSA-DAI,
AMGA, etc.) will be analyzed and matched against the requirements of ES applications and
practices. In Task 3 of the Work Package, we will compare existing middleware tools with the
status and requirements described in this document.



10 QUESTIONNAIRE


The information collected in this paper is based on the following questionnaire.





WP2: Existing data technologies and data usage policies in the ES community


What is a typical data flow in complex scenarios?


In order to get an overview of, and better insight into, the data flow of the various ES
applications, it should be analysed how the data is processed in the different steps of a typical
workflow.

What are typical use cases?


What is typical for data provision?


The nature of the data in the different steps of the data flow gives information about the storage
types and access types needed by the ES community.

Is preprocessing done (for example reduction, reformatting, ...)?

Will the data be extracted from data archives? From different data archives and data servers?

How is the content of the files accessed?

Are files managed as entities (copy, move, delete, ...)?

What type of IO is typically used (byteIO, web service, ...)?

Is the data encrypted?

Which data formats are used (flat files, databases, ...)?

Are the data archives/RDBs and repositories behind firewalls, and what methods are used to tunnel
the data?

What IO type is used in grid systems (gridftp, web service (via XML formats, ...))?

Are Message Level Security protocols or secure channels (e.g. SSL, ...) used?


Relation to Grid environments

The impact of the needs of the ES community on the Grid middleware is a central issue.

Are Grid systems used?

If so, which data management in Grid systems is used (EGEE-SE, OGSA-DAI, ...)?

Do they use logical files / file replica management?

Access to local or remote (relational) databases (information systems)

The way of access and the location of the data give clues about the mechanisms needed. The role of
the database as a manipulator of the data helps to decide which software requirements are imposed
on the system.


What are the schemas?


What is typical? Web-based access, access from simulation codes directly, access from Grid
environments



Are manipulations done by the database (e.g. Postgres/GIS geometry calculations)?

How are the data information systems/repositories organized?

Whether a standardization of the data format is currently used is of interest.



In data and metadata? Are they stored at one site or at different sites?


What metadata models are used? Do they conform to WMO/ISO standards?

The local or remote location of the data is important for deriving access rules.


Which user authentication is used?


Security is an important issue; in order to understand the needs of the ES community, the current
policies are of interest.

What are the policies (Security policy management)?

Do they also use authorisation and accounting (so that we have AAA)?


What AAA services are required?


What is the flow of data during computations?

The transport mechanism between the different processes during a simulation is an important
requirement for a grid.

Is data exchange done between processes in simulation runs (via MPI, streams, ...)?

Is data-dependent job management/scheduling organized (required)?


What are the typical sizes?


Is a huge bandwidth needed?



Which access rates are needed?