An On-line Collaborative Data Management System

Roger Curry¹, Cameron Kiddle¹, Rob Simmonds¹ and Gilberto Z. Pastorello Jr.²

¹ Grid Research Centre, University of Calgary
² Centre for Earth Observation Science, University of Alberta


Outline

- Data Challenges
- Related Work
- Data Management System
- Use Case: GeoChronos
- Summary and Future Work

GCE 2010
Nov. 14, 2010



Data Challenges - I

Data Acquisition
- Much scientific data stored on off-line media
  - Cumbersome and time-consuming to access
- Making data available on-line difficult
  - Insufficient storage and bandwidth

Sharing of Data
- Lack of willingness to share data
- Proprietary data: need for controlled access



Data Challenges - II

Usability of Data
- Insufficient metadata to describe data
- Various metadata standards in some domains, but many domains lack metadata standards
  - Many scientists use their own metadata format

Finding Data
- Difficult to find the data you need
- Different data organized / stored differently
- Tools to browse, search and visualize data often lacking



Related Work - I

Content Management Systems
- e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
- Offer a rich set of features but lack:
  - Meaningful support for specific data formats
  - Efficient association of metadata and ancillary files with data sets
  - Access to a variety of data processing tools
  - Uniform handling of outputs from processing tools

Spectral Libraries
- e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
- Available on-line but lack:
  - Ability to dynamically restructure metadata for browsing
  - Collaboration features enabled by social networking



Related Work - II

Spectral Library Tools
- e.g., DLR-DFD Spectral Archive, SPECCHIO
- Flexible in creating / handling metadata but:
  - Have a fixed metadata schema
  - Do not support new metadata needs

Data repositories for other domains
- e.g., Astrophysics Data System, FLUXNET, European Bioinformatics Institute (EBI) Databases
- Offer a wide range of functionality but:
  - Primarily focus on data that is already validated and structured
  - Do not handle preliminary, intermediate or untested data (e.g., research in progress)

Digital Libraries
- e.g., Planetary Data System, NCore, SciPort
- Have flexible functionality but:
  - Most focus on well-defined digital artefacts
  - Limited in handling collaboration on evolving data, metadata and schemas



Data Management System - Overview

Supports the following functionality:
- On-line access to data
- Enables scientists to share data while maintaining control of who sees it
- Ability to add and edit metadata while working with multiple schemas
- Collaboratively create new schemas to facilitate consistent / accurate recording of metadata
- Dynamically restructure the way data is browsed


Data Management System - Framework



User & Data:
- User acquires data from sensor and uploads to portal
- Direct acquisition of data also possible

Elgg Portal:
- Built on top of Elgg
  - Open source social networking platform
  - Fine-grained access control
  - Flexible data model

Data Storage:
- Currently local NFS storage
- Working on distributed iRODS-based system

Data Ingestion Service:
- Creates records, parses metadata, establishes ancillary relationships
- Deployed on cloud-based Condor pool
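The ingestion steps listed above (create a record, parse metadata, link ancillary files) can be illustrated with a minimal Python sketch. It is not the GeoChronos service: the `key: value` header layout, the `Record`, `parse_header` and `ingest` names, and the rule that an ancillary file shares its data file's base name are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class Record:
    """Hypothetical record: one data file plus its parsed metadata."""
    filename: str
    metadata: dict[str, str] = field(default_factory=dict)
    ancillary: list[str] = field(default_factory=list)


def parse_header(text: str) -> dict[str, str]:
    """Read leading 'key: value' lines; stop at the first line without a colon."""
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            break  # end of the assumed header block; data rows follow
        metadata[key.strip()] = value.strip()
    return metadata


def ingest(files: dict[str, str]) -> list[Record]:
    """Create one record per '.txt' data file, then attach ancillary files.

    Assumed rule: any non-'.txt' file sharing a data file's stem
    (e.g. 'sample1.jpg' next to 'sample1.txt') is ancillary to that record.
    """
    records: dict[str, Record] = {}
    for name, content in files.items():
        path = PurePath(name)
        if path.suffix == ".txt":
            records[path.stem] = Record(name, parse_header(content))
    for name in files:
        path = PurePath(name)
        if path.suffix != ".txt" and path.stem in records:
            records[path.stem].ancillary.append(name)
    return list(records.values())
```

For example, `ingest({"sample1.txt": "Instrument: ASD\n350 0.12", "sample1.jpg": ""})` returns one record whose metadata is `{"Instrument": "ASD"}` and whose ancillary list is `["sample1.jpg"]`.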

Data Management System - Data Model


[Figure: Elgg data model] (Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)



- Arbitrary metadata can be assigned to any entity
- Annotations allow users to comment on entities not owned by them
- Data management system adds three new types of ElggObjects:
  - Schema
  - Collection
  - Record
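A minimal Python sketch of the data model described above, using simplified stand-ins for Elgg's entities. The class names, the subtype strings `schema`, `collection` and `record`, and the rule that only the owner may set metadata are illustrative assumptions, not Elgg's actual API. The sketch shows that metadata and annotations attach to any entity and that annotations can come from users who do not own it.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    author: str
    text: str


@dataclass
class Entity:
    """Hypothetical stand-in for an Elgg entity: owner, metadata, annotations."""
    guid: int
    owner: str
    metadata: dict[str, object] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)

    def set_metadata(self, user: str, key: str, value: object) -> None:
        # Assumption for this sketch: only the owner may change metadata.
        if user != self.owner:
            raise PermissionError(f"{user} does not own entity {self.guid}")
        self.metadata[key] = value

    def annotate(self, user: str, text: str) -> None:
        # Any user may comment, including users who do not own the entity.
        self.annotations.append(Annotation(user, text))


@dataclass
class ElggObject(Entity):
    subtype: str = "object"


def make_object(guid: int, owner: str, subtype: str) -> ElggObject:
    """Create one of the three hypothetical subtypes the system adds."""
    if subtype not in {"schema", "collection", "record"}:
        raise ValueError(f"unknown subtype: {subtype}")
    return ElggObject(guid, owner, subtype=subtype)
```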

Data Management System - Schemas



Create schemas
- Custom or standards-based (e.g., Dublin Core)
- Individually or as a collaborative team

Schemas consist of
- Namespace
- Description
- Read/write access permissions
- Series of metadata keys

Metadata keys consist of
- Name
- Description
- Type (text, latlong, ancillary)
- Optionality: required, recommended, optional
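The schema structure above suggests how a record's metadata might be checked against a schema. The following Python sketch is one possible approach under stated assumptions: access permissions are omitted, a `latlong` value is assumed to be a `"lat,long"` string in decimal degrees, and missing required keys are treated as errors while missing recommended keys only produce warnings. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Optionality(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class MetadataKey:
    name: str
    description: str
    type: str  # "text", "latlong" or "ancillary"
    optionality: Optionality


@dataclass
class Schema:
    namespace: str
    description: str
    keys: list[MetadataKey]


def _is_latlong(value: str) -> bool:
    # Assumed encoding: "lat,long" in decimal degrees.
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = (float(p) for p in parts)
    except ValueError:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def check(schema: Schema, metadata: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a record's metadata under a schema."""
    errors, warnings = [], []
    for key in schema.keys:
        value = metadata.get(key.name)
        if value is None:
            if key.optionality is Optionality.REQUIRED:
                errors.append(f"missing required key '{key.name}'")
            elif key.optionality is Optionality.RECOMMENDED:
                warnings.append(f"missing recommended key '{key.name}'")
            continue
        if key.type == "latlong" and not _is_latlong(value):
            errors.append(f"'{key.name}' is not a 'lat,long' pair: {value!r}")
    return errors, warnings
```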


Data Management System - Collections

- Group of related data
  - e.g., spectral library, set of satellite data
- Collection consists of
  - Name, description, read/write access permissions, metadata, records
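Read/write access permissions on a collection could be modelled as below. This Python sketch is hypothetical: the `Collection` fields, the `public` flag and the rule that writers can also read are assumptions for illustration, not a description of how Elgg's access levels actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection with separate read and write access lists.

    Assumptions: the owner can always read and write; writers can also read;
    'public' opens read access to everyone.
    """
    name: str
    owner: str
    public: bool = False
    readers: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)
    records: list[str] = field(default_factory=list)

    def can_read(self, user: str) -> bool:
        return (self.public or user == self.owner
                or user in self.readers or user in self.writers)

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers

    def add_record(self, user: str, record: str) -> None:
        if not self.can_write(user):
            raise PermissionError(f"{user} cannot add to {self.name}")
        self.records.append(record)

    def visible_records(self, user: str) -> list[str]:
        # Users without read access see an empty collection.
        return list(self.records) if self.can_read(user) else []
```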


Data Management System - Records



- Atomic unit of the data management system
- Usually represents a single file, but does not need to be associated with a file
- Tabbed interface for viewing:
  - Spectral plot, metadata, ancillary data, map, comments
  - Custom tabs based on data type
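Choosing record tabs by data type can be sketched as a small registry. This Python example is an assumption about one possible design: the registry and function names are hypothetical, and showing the map tab only when a location is available is an illustrative choice. The tab order follows the list above.

```python
# Hypothetical registry: data type -> tabs that only make sense for that type.
TYPE_TABS: dict[str, list[str]] = {}


def register_tabs(data_type: str, *tabs: str) -> None:
    """Register extra tabs for records of a given data type."""
    TYPE_TABS.setdefault(data_type, []).extend(tabs)


def tabs_for(data_type: str, has_location: bool) -> list[str]:
    """Assemble the tab list for one record."""
    tabs = list(TYPE_TABS.get(data_type, []))  # copy so the registry is not mutated
    tabs += ["Metadata", "Ancillary Data"]
    if has_location:
        tabs.append("Map")  # assumed: shown only when a latlong value exists
    tabs.append("Comments")
    return tabs


register_tabs("spectrum", "Spectral Plot")
```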

Data Management System - Virtual Directory Structure



- Dynamic restructuring of data for browsing purposes
- Folders based on metadata keys/values
- User can customize the metadata keys used to establish the directory hierarchy
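The virtual directory idea amounts to grouping records by an ordered list of metadata keys, one folder level per key. Here is a minimal recursive Python sketch; the record layout (a dict with a `name` entry) and the `(unspecified)` folder for records missing a key are assumptions.

```python
from collections import defaultdict
from typing import Union


def build_tree(records: list[dict[str, str]], keys: list[str]) -> Union[dict, list]:
    """Nest records into folders, one folder level per metadata key.

    Leaves are sorted lists of record names. Records lacking a key are
    filed under '(unspecified)' (an assumption, not documented behaviour).
    """
    if not keys:
        return sorted(r["name"] for r in records)
    first, rest = keys[0], keys[1:]
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[record.get(first, "(unspecified)")].append(record)
    # Each folder is split again by the next key in the user's chosen order.
    return {value: build_tree(group, rest) for value, group in sorted(groups.items())}
```

With records carrying `site` and `material` keys, the key order `["site", "material"]` gives one folder per site containing one folder per material; reversing the list gives the opposite hierarchy over the same records. No data moves; only the view changes.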

Use Case - GeoChronos


GeoChronos - Overview

(http://geochronos.org/)

- An on-line platform
- For:
  - Earth Observation Scientists
- Facilitating:
  - Collaboration between scientists
  - Data access, management and sharing
  - Application access, management and sharing
- Leveraging:
  - Web 2.0 and social networking technologies
  - Cloud computing technologies
- Funded by:
  - CANARIE - Network Enabled Platform (NEP-1) program
  - Cybera


GeoChronos - Project Team


- Dr. Arturo Sanchez-Azofeifa, University of Alberta
- Dr. John Gamon, University of Alberta
- Dr. Benoit Rivard, University of Alberta
- Dr. Rob Simmonds, University of Calgary

[Figure: project team diagram with groups Principal Investigators, Project Coordination, Platform Development, Domain Scientists]

GeoChronos - Virtual Organization



GeoChronos - Spectral Libraries

Libraries created
- Ingested some existing on-line libraries
  - USGS, ASTER, Vegetation Spectral Library (VSL)
  - Many enhanced features as part of GeoChronos Spectral Library module: improved browsing, dynamic plotting, mapping, annotations, ...
- Domain scientists have contributed libraries
  - Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples

Data formats / parsers supported
- ENVI, UNISPEC, ASD, several ASCII formats

Schemas incorporated
- Library specific
  - USGS, ASTER, VSL, ...
- Sensor/Format specific
  - UNISPEC, ENVI, ...
- Other Standards
  - Dublin Core

Currently hosting (including MODIS data)
- 10+ schemas
- 20+ collections (libraries)
- 20,000+ records
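Supporting several formats (ENVI, UNISPEC, ASD, ASCII variants) suggests a parser registry keyed by format name. The Python sketch below shows only the dispatch mechanism with one hypothetical two-column ASCII layout; it does not implement ENVI, UNISPEC or ASD parsing, and the format name `ascii-2col` is made up.

```python
from typing import Callable

Spectrum = tuple[list[float], list[float]]  # (wavelengths, values)
PARSERS: dict[str, Callable[[str], Spectrum]] = {}


def parser(fmt: str):
    """Decorator that registers a parser function under a format name."""
    def wrap(func: Callable[[str], Spectrum]) -> Callable[[str], Spectrum]:
        PARSERS[fmt] = func
        return func
    return wrap


@parser("ascii-2col")
def parse_two_column(text: str) -> Spectrum:
    """Hypothetical ASCII layout: 'wavelength value' per line, '#' comments."""
    wavelengths, values = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        wl, val = line.split()[:2]
        wavelengths.append(float(wl))
        values.append(float(val))
    return wavelengths, values


def parse(fmt: str, text: str) -> Spectrum:
    """Dispatch to the parser registered for `fmt`."""
    func = PARSERS.get(fmt)
    if func is None:
        raise ValueError(f"no parser registered for format '{fmt}'")
    return func(text)
```

Adding a format then means registering one more decorated function, without touching the dispatch code.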


GeoChronos - MODIS Satellite Data


- Developed an automated workflow service for mosaicing, subsetting, reprojecting and masking MODIS satellite data
  - Significantly reduces the time scientists have spent manually performing such workflows
- Data management system used to store raw MODIS satellite data and data products derived from the workflow
- Parsers/schemas specific to MODIS data have been added to the system
- Users are provided with the same powerful interface as the Spectral Libraries for browsing, accessing and viewing data
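Three of the four workflow steps can be illustrated on plain nested lists standing in for raster tiles. This is a toy Python sketch, not the GeoChronos workflow: reprojection is left out because it requires a geospatial library, and the quality-mask convention (flag `0` means a good pixel) is an assumption rather than MODIS QA semantics.

```python
Grid = list[list[float]]


def mosaic_horizontal(left: Grid, right: Grid) -> Grid:
    """Join two tiles with the same number of rows side by side."""
    if len(left) != len(right):
        raise ValueError("tiles must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def subset(grid: Grid, row0: int, row1: int, col0: int, col1: int) -> Grid:
    """Keep rows [row0, row1) and columns [col0, col1)."""
    return [row[col0:col1] for row in grid[row0:row1]]


def mask(grid: Grid, quality: list[list[int]], good: int = 0,
         fill: float = float("nan")) -> Grid:
    """Replace pixels whose (assumed) quality flag is not `good` with `fill`."""
    return [[v if q == good else fill for v, q in zip(v_row, q_row)]
            for v_row, q_row in zip(grid, quality)]
```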



GeoChronos - User Feedback

- Have developed the data management system in an interactive, iterative fashion
- Domain scientists on the project have provided much guidance, testing and feedback
- Have customized and enhanced the data management system based on feedback received



Summary

- Identified data-related challenges facing scientists
- Discussed some related efforts and the shortcomings of these approaches
- Presented an on-line collaborative data management system addressing many of these data challenges
- Showed example usage of the data management system by GeoChronos



Next Steps

- Currently have a single local data repository
  - Working on extending the data management system to work with distributed data repositories using iRODS
- Currently have powerful browsing functionality
  - Need to add search functionality across collections and based on metadata values
- Currently support custom metadata schemas
  - Plan to use Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
- Currently work with spectral and MODIS satellite data
  - Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data and phenology tower data
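The planned search across collections could, in its simplest form, filter records by metadata values. This Python sketch is hypothetical and covers neither indexing nor Semantic Web mappings; the case-insensitive exact match and the record layout (a dict with a `name` entry) are assumptions.

```python
def search(collections: dict[str, list[dict[str, str]]],
           **criteria: str) -> list[tuple[str, str]]:
    """Return (collection, record name) pairs whose metadata matches all criteria.

    Matching is exact but case-insensitive (an assumption for this sketch).
    """
    hits = []
    for coll_name, records in collections.items():
        for record in records:
            if all(record.get(k, "").lower() == v.lower() for k, v in criteria.items()):
                hits.append((coll_name, record["name"]))
    return hits
```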


Contact Information


http://geochronos.org/

info@geochronos.org

http://grid.ucalgary.ca/

http://ceos.ualberta.ca/

http://www.cybera.ca/