Gateway Implementation

ESG-CET Meeting, Boulder, CO, April 2008

4/30/2008


Overview


Implementation Technologies / Tools


Science Metadata Implementation


Browse Interface


RDF Search Integration


Data Downloading


Metrics Integration


Database Driven Approach


All metadata and associated elements stored in a single database


Data integrity for all elements enforced at the database level


Normalization reduces the amount of duplicated data compared with the previous system


Concurrency and transaction control spanning all related elements


Hot backups supported
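
The points above about database-level integrity and transaction control can be illustrated with a minimal sketch. It is written in Python with an in-memory SQLite database purely as a stand-in; the table names, column names, and helper functions are hypothetical and are not taken from the gateway schema.

```python
import sqlite3

# In-memory SQLite stands in for the real database engine.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE dataset (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE logical_file (
        id         INTEGER PRIMARY KEY,
        dataset_id INTEGER NOT NULL REFERENCES dataset(id),
        name       TEXT NOT NULL
    );
""")


def add_dataset_with_files(conn, dataset_name, file_names):
    """Insert a dataset and its files atomically: all rows or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO dataset (name) VALUES (?)", (dataset_name,)
        )
        for file_name in file_names:
            conn.execute(
                "INSERT INTO logical_file (dataset_id, name) VALUES (?, ?)",
                (cur.lastrowid, file_name),
            )


def count(conn, table):
    """Return the number of rows in one of the two known tables."""
    if table not in ("dataset", "logical_file"):
        raise ValueError("unknown table")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


add_dataset_with_files(conn, "run1", ["a.nc", "b.nc"])
```

A call that violates a constraint partway through (for example a file name of `None`) raises `sqlite3.IntegrityError`, and the dataset row inserted earlier in the same call is rolled back with it.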


Database Implementation


PostgreSQL 8.3 selected as the database engine


Better performance and scalability than MySQL


Feature rich and good SQL standard compliance


Full transactional support


BSD-style license (the PostgreSQL License), no dual licensing issues


Gateway Implementation


Java based


Spring Framework:


Lightweight Inversion of Control Container (IoC)


Acegi (Spring Security)


Web application support


Database access abstractions (transactions, exception handling, etc.)


Full application support, integration of many useful libraries
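
To make the inversion-of-control idea concrete, here is a minimal sketch in Python (the gateway itself is Java based and uses Spring). The `Container`, `BrowseService`, and `InMemoryDatasetRepository` classes are hypothetical and only illustrate the pattern of components receiving their collaborators from a container rather than constructing them.

```python
class InMemoryDatasetRepository:
    """Hypothetical data-access component."""

    def __init__(self):
        self._names = ["run1", "run2"]

    def list_names(self):
        return list(self._names)


class BrowseService:
    """Receives its repository from outside instead of constructing it."""

    def __init__(self, repository):
        self._repository = repository

    def titles(self):
        return [name.upper() for name in self._repository.list_names()]


class Container:
    """Tiny stand-in for an IoC container: maps names to factories."""

    def __init__(self):
        self._factories = {}
        self._singletons = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def get(self, name):
        # Build each component once, on first request, and reuse it.
        if name not in self._singletons:
            self._singletons[name] = self._factories[name](self)
        return self._singletons[name]


container = Container()
container.register("repository", lambda c: InMemoryDatasetRepository())
container.register("browse_service", lambda c: BrowseService(c.get("repository")))

service = container.get("browse_service")
```

Because `BrowseService` never names a concrete repository class, a test can register a different factory under `"repository"` without touching the service.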


Gateway Implementation


Hibernate: Object Relational Mapping


Maps Java objects to the database


Greatly reduces the amount of database code that needs to be written


Built-in caching, optimized join lookups, and other performance enhancements
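
As a rough illustration of what object-relational mapping removes from application code, the sketch below derives table creation, inserts, and loads from a class definition. It is a hand-rolled toy in Python over SQLite, not Hibernate; the `Dataset` class and the helper functions are hypothetical.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Python type -> SQL column type, for the two types this sketch supports.
SQL_TYPES = {int: "INTEGER", str: "TEXT"}


@dataclass
class Dataset:
    id: int
    name: str


def create_table(conn, cls):
    """Create a table whose columns mirror the dataclass fields."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__.lower()} ({columns})")


def save(conn, obj):
    """Insert one object as one row."""
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {type(obj).__name__.lower()} VALUES ({placeholders})",
        astuple(obj),
    )


def load_all(conn, cls):
    """Load every row back into objects (assumes the class has an id field)."""
    names = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(
        f"SELECT {names} FROM {cls.__name__.lower()} ORDER BY id"
    )
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn, Dataset)
save(conn, Dataset(1, "run1"))
save(conn, Dataset(2, "run2"))
```

The application code only works with `Dataset` objects; the SQL is generated from the class. A real ORM adds relationships, caching, and query optimization on top of this basic idea.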


Database Schema


Still under very active development


Currently 92 tables


Database is separated into 4 logical schemas:


Metadata


Metrics


Security


Workspace


Science Metadata Schema (subset)


Browse Interface


Driven completely from the database


Efficient queries and data structures


Straightforward to cache queries and results


Relatively static structures involved
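
Because the browse structures change rarely, query results can be cached and invalidated only on update. The following is a minimal sketch of that idea in Python with SQLite; the `collection` table and the `child_names` function are hypothetical.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,
        name      TEXT
    );
    INSERT INTO collection VALUES
        (1, NULL, 'root'),
        (2, 1, 'atmosphere'),
        (3, 1, 'ocean');
""")


@lru_cache(maxsize=1024)
def child_names(parent_id):
    """Return the child collection names; repeated calls skip the database."""
    rows = conn.execute(
        "SELECT name FROM collection WHERE parent_id = ? ORDER BY name",
        (parent_id,),
    )
    # A tuple is returned so callers cannot mutate the cached value.
    return tuple(name for (name,) in rows)


first = child_names(1)   # runs the query
second = child_names(1)  # served from the cache
```

When the underlying tables do change, calling `child_names.cache_clear()` discards the cached results so the next call reads fresh data.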


Future Features


Annotations


User submitted comments on resources


Can be applied to collections and logical files


Notifications sent to resource owners and admins for review


Tagging


User defined and assigned keywords


Can be assigned at the collection level


Browsable and searchable


Notifications sent to resource owners and admins for review


RDF Integration


Database is the authoritative source for the RDF search data


Event mechanism to trigger RDF updates when the underlying database changes


Database contains detailed information beyond what is stored in RDF
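
One way to picture the event mechanism is a listener that rebuilds the affected triples whenever a record is saved. The sketch below is in Python and is only an assumption about the general shape; `MetadataStore`, `RdfIndex`, the `urn:example:` subject scheme, and the single `dc:title` predicate are all hypothetical.

```python
class MetadataStore:
    """Hypothetical stand-in for the database layer; emits change events."""

    def __init__(self):
        self._datasets = {}
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def save_dataset(self, dataset_id, title):
        # The store is updated first, then every listener is notified.
        self._datasets[dataset_id] = title
        for listener in self._listeners:
            listener(dataset_id, title)


class RdfIndex:
    """Holds (subject, predicate, object) triples derived from the store."""

    def __init__(self):
        self.triples = set()

    def on_dataset_saved(self, dataset_id, title):
        subject = f"urn:example:dataset:{dataset_id}"
        # Drop stale triples for this subject, then add the current one.
        self.triples = {t for t in self.triples if t[0] != subject}
        self.triples.add((subject, "dc:title", title))


store = MetadataStore()
index = RdfIndex()
store.add_listener(index.on_dataset_saved)

store.save_dataset(7, "Old title")
store.save_dataset(7, "New title")  # replaces the earlier triple
```

The store remains the authoritative copy; the index holds only what search needs and can be regenerated from the store at any time.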


Data Download


Data can be retrieved directly from data nodes, or from the gateway when the data is local


Files can be directly downloaded through the gateway interface


Bulk data retrieval scripts can be created through the user interface


wget is currently supported


Additional options such as DML to come


Deep storage retrieval requests generated from the same interface
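
A bulk retrieval script of the kind described above could be generated along these lines. This is a minimal sketch in Python; the `build_wget_script` function and the example URLs are hypothetical, and the actual scripts produced by the gateway may look different.

```python
import shlex


def build_wget_script(urls):
    """Return a shell script that downloads each URL with wget."""
    lines = ["#!/bin/sh", "# Generated download script (illustrative)"]
    for url in urls:
        # -c lets wget resume a partial download if the script is re-run.
        # shlex.quote protects characters such as ? and & from the shell.
        lines.append(f"wget -c {shlex.quote(url)}")
    return "\n".join(lines) + "\n"


script = build_wget_script([
    "http://example.org/data/a.nc",
    "http://example.org/get?file=b.nc&v=1",
])
```

Quoting each URL matters because query strings often contain characters the shell would otherwise interpret.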


Authorization Tokens


Lightweight tokens are used to allow users to download restricted files using standard tools, such as standard HTTP clients


Limited lifetime


Grants a particular user access to only a specific resource


Currently implemented for direct gateway downloads and appropriately configured TDS servers
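
The token properties listed above (limited lifetime, bound to one user and one resource) can be sketched with an HMAC-signed payload. This is an illustrative Python example, not the gateway's actual token format; `SECRET_KEY`, `issue_token`, and `verify_token` are hypothetical names, and the token layout is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def issue_token(user, resource, lifetime_seconds, now=None):
    """Create a signed token granting one user access to one resource."""
    now = time.time() if now is None else now
    expires = int(now) + lifetime_seconds
    payload = json.dumps([user, resource, expires]).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token, user, resource, now=None):
    """Return True only for an untampered, unexpired, matching token."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    # The payload is parsed only after the signature has been checked.
    token_user, token_resource, expires = json.loads(payload)
    return token_user == user and token_resource == resource and now < expires


token = issue_token("alice", "/data/a.nc", lifetime_seconds=60, now=1000)
```

Because the token is a plain string, it can travel as a URL parameter or header, which is what lets ordinary HTTP clients fetch restricted files without a login session.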



Metrics System


Metrics data integrated with access control and metadata schemas


Associated with user accounts and inventory metadata


Accurate associations of activities without duplication of data


Use of JasperReports to allow more flexible options for creating new metrics reports in the system


Evaluating the use of star schemas to allow for better report query performance / options
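
For readers unfamiliar with star schemas, the sketch below shows the basic shape: one fact table of download events surrounded by small dimension tables, so that reports reduce to simple joins and aggregates. It uses Python with SQLite as a stand-in; every table, column, and value is hypothetical and not taken from the metrics schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_user    (user_key    INTEGER PRIMARY KEY, institution TEXT);
    CREATE TABLE dim_dataset (dataset_key INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one row per download event, keyed to the dimensions.
    CREATE TABLE fact_download (
        user_key    INTEGER REFERENCES dim_user(user_key),
        dataset_key INTEGER REFERENCES dim_dataset(dataset_key),
        bytes       INTEGER
    );

    INSERT INTO dim_user    VALUES (1, 'inst_a'), (2, 'inst_b');
    INSERT INTO dim_dataset VALUES (1, 'run1'), (2, 'run2');
    INSERT INTO fact_download VALUES
        (1, 1, 100), (1, 1, 50), (2, 1, 30), (2, 2, 70);
""")


def bytes_by_dataset(conn):
    """Total bytes downloaded per dataset."""
    return conn.execute("""
        SELECT d.name, SUM(f.bytes)
        FROM fact_download f
        JOIN dim_dataset d ON d.dataset_key = f.dataset_key
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()


def bytes_by_institution_and_dataset(conn):
    """Total bytes downloaded per institution and dataset."""
    return conn.execute("""
        SELECT u.institution, d.name, SUM(f.bytes)
        FROM fact_download f
        JOIN dim_user u    ON u.user_key = f.user_key
        JOIN dim_dataset d ON d.dataset_key = f.dataset_key
        GROUP BY u.institution, d.name
        ORDER BY u.institution, d.name
    """).fetchall()
```

Adding a new report dimension means adding one join and one `GROUP BY` column, which is the flexibility the evaluation is looking for.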