
31 Jan 2013


Daniel Alderman and Cory Stone

Project Motivation
Goal: integrate a workflow repository into the P-GRADE Portal
Why? To allow users to upload, share, and browse application workflows.



Project Approach
Create a new portlet for the P-GRADE Portal that uses a repository backend for storing application workflows.
No need to reinvent the wheel: instead, examine current repository technologies and try to adapt one for grid use with the P-GRADE Portal.

Outline
Comparison Criteria
Applications we considered for integration with the P-GRADE Portal:
DSpace
Fedora
myExperiment
Archimède
ACS
Applications with features for consideration:
NGS
Download.com
Results and Conclusions

Comparison Criteria
Functionality: What can this application do for us?
API: Is this application well documented?
GUI: Does this application have an intuitive interface?
Applied standards: What standards in repository technology does this application use?
License: Is this project open source?
Version: How far into development is this project?
Is this a standalone application or a web service? Is it free?
Installation: How long will it take? Is it complicated? What kind of hardware/software is needed?
References: What kind of community does it have? Has this application been successful? Has it been used with grids?


DSpace
Developed by MIT Libraries and Hewlett-Packard
Designed for easy adaptation/customization
Documented Java APIs can be customized to allow compatibility with currently used systems
University of Texas at Austin: https://pacer.ischool.utexas.edu/

DSpace Functionality
Completely customizable to your needs
User interface that can be used as-is or customized
Metadata format can be changed from Dublin Core to other formats
Configurable browse and search
Can use a PostgreSQL or Oracle database
Available in over 20 different languages; the default language is configurable
OAI metadata harvesting
Bit integrity checking
History system for logging changes
No support for versioning or statistics currently
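Bit integrity checking amounts to recomputing a checksum for each stored file and comparing it with the value recorded when the file was ingested. The following is a minimal Python sketch under stated assumptions: MD5 as the recorded algorithm, and a hypothetical workflow file standing in for a stored bitstream. It is not DSpace's own checker code.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Compute a hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bitstream(path: Path, recorded_checksum: str, algorithm: str = "md5") -> bool:
    """Return True if the file still matches the checksum recorded at ingest."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workflow = Path(tmp) / "workflow.xml"  # hypothetical stored workflow file
        workflow.write_bytes(b"<workflow name='demo'/>")
        recorded = file_checksum(workflow)  # value stored at ingest time
        print(verify_bitstream(workflow, recorded))  # True
        workflow.write_bytes(b"<workflow name='changed'/>")
        print(verify_bitstream(workflow, recorded))  # False: content was altered
```

Reading in fixed-size chunks keeps memory use flat even for large files; a periodic job would simply run the verification over every stored file and report mismatches.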

DSpace Technology
API
Content Management API: Java classes for reading and manipulating content stored in the DSpace system
Harvesting API: allows callers to extract information about items modified within a particular timeframe and scope
Browse API: maintains indices of dates, authors, titles and subjects, and allows callers to extract parts of these
GUI
Web UI built on Java Servlet and JSP technology
Applied standards
OAI-PMH
Dublin Core
OpenURL
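In DSpace the Harvesting API is a Java API. Over HTTP, the same "items modified within a timeframe and scope" selection is what an OAI-PMH harvester asks for. The following is a minimal sketch that only builds the request URL. The endpoint and the set name are hypothetical. The verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `until`, `set`) come from the OAI-PMH 2.0 specification, not from DSpace's Java API.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, start: date, end: date,
                     set_spec: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH 2.0 ListRecords request for records changed between start and end."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # oai_dc = unqualified Dublin Core
        "from": start.isoformat(),          # inclusive lower bound on record datestamps
        "until": end.isoformat(),           # inclusive upper bound on record datestamps
    }
    if set_spec is not None:
        params["set"] = set_spec            # restricts the harvest to one set (the "scope")
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set, used only for illustration.
print(list_records_url("https://repository.example.org/oai/request",
                       date(2008, 1, 1), date(2008, 6, 30), set_spec="workflows"))
```

The response is XML. Large result sets are paged, and the harvester continues by sending the `resumptionToken` returned with each page.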

DSpace Legal/Commercial Issues
BSD open source license: freely use, modify, and integrate into applications
Version status: currently released v1.5.1
Standalone application

DSpace Installation
Installation Time
1 day for prototype installation
1 day
1 week of exploring software
1 day for production installation with basic software
HW/SW requirements
Requires a reasonably good server and a decent amount of memory and disk storage. Examples from the community:
HP Server rx2600, dual Intel 64-bit processors (900 MHz), 2 GB RAM, 26 GB disk space, HP StorageWorks MSA1000 with high-performance controller: $40,000
Dell PowerEdge 2650 with dual Xeon processors (2.4 GHz), 2 GB RAM, 2 x 73 GB SCSI disks; 2.5 TB Apple Xserve; DLT tape library: $10K
JDK 5 or later, Ant 1.6.2 or later, Maven 2.0.8 or later
PostgreSQL or Oracle must be installed
Servlet engine (e.g., Tomcat, Jetty)
Perl

DSpace References
Community
Large, active community of developers, researchers, and users
80 developers contributing code
15 lead developers planning releases and integrating features/fixes suggested by the community
Over 250 institutions currently using DSpace within their organizations
Success stories
Countless: consider all the institutions using DSpace to meet their storage needs
Preferred platform in a 2007 survey of institutional repositories in the USA
Grid Integration
Not currently used by grid communities, but a strong candidate for grid integration: free and open source, with a well-documented and easily customizable Java implementation


Fedora Repository, developed by Fedora Commons, a non-profit organization providing sustainable technologies for sharing and preserving digital content
Goals are similar to DSpace's
Public Library of Science: http://www.plosone.org/home.action

Fedora Functionality
Powerful digital object model
Extensible metadata management
Expressive object-to-object relationships
Web service integration
Content versioning
Access control and authentication
A number of features involving digital preservation

Fedora Technology
API
Exposed as web services
Management API: administration interface
Access API: interface for accessing objects in the repository
Search API: provides basic field search of the repository
Resource Index Search API: searches for digital objects based on:
Object properties
Object-to-object relationships
Metadata about datastreams and disseminations
Default Dublin Core record
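Each Fedora object carries a default Dublin Core record in the `oai_dc` format, the same format OAI-PMH harvesters receive. The following is a minimal sketch that pulls the fields out of such a record with Python's standard XML parser. The sample record is made up; only the two namespace URIs are the standard `oai_dc` and Dublin Core ones.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up record for illustration; only the namespaces are standard.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example grid workflow</dc:title>
  <dc:creator>Jane Doe</dc:creator>
  <dc:subject>workflow</dc:subject>
  <dc:subject>P-GRADE</dc:subject>
  <dc:identifier>demo:1</dc:identifier>
</oai_dc:dc>"""


def dublin_core_fields(xml_text: str) -> Dict[str, List[str]]:
    """Map each Dublin Core element name to its values (elements may repeat)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{OAI_DC_NS}}}dc":
        raise ValueError(f"not an oai_dc record: {root.tag}")
    prefix = f"{{{DC_NS}}}"
    fields: Dict[str, List[str]] = {}
    for elem in root:  # direct children only: dc:title, dc:creator, ...
        if elem.tag.startswith(prefix) and elem.text:
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


print(dublin_core_fields(SAMPLE))
```

Values are kept as lists because Dublin Core allows any element to repeat, as the two `dc:subject` entries show.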

Fedora Technology (cont.)
GUI
Provided as an underlying architecture; no front-end application for end users is included
Applied standards
OAI-PMH
Dublin Core
Integration with P-GRADE Portal
Can be easily integrated

Fedora Legal/Commercial Issues
License
Open source under Creative Commons Attribution-ShareAlike 3.0 Unported
Version
Current release v3.1
Standalone application

Fedora Installation
Installation
Well-detailed installation documentation
Options for Quick, Custom, or Client-only installation
HW/SW requirements
Windows/UNIX
JDK 5 or 6, Ant 1.7 or later
One of the following databases must be installed first:
MySQL
PostgreSQL
Oracle
McKoi (not recommended for production use)




Fedora References
Community
Fairly large and active community of developers, vendors, and users
Growing community of developers, currently 22
Core community of vendors providing software integration services
156 listed Fedora users, including corporations, government agencies, and universities
Success stories
Public Library of Science: shifting the scientific and medical communities from subscription-based journals to an open-access online commons
Encyclopedia of Chicago: managing historical content in which objects are complex and have varying presentations
Many others
Grid integration
Used by the EGEE Library for access to various training and dissemination resources


myExperiment
Developed in the U.K. by the University of Manchester and the University of Southampton
Designed for ease of use by end users
Simple, intuitive layout with sections
Wiki documentation
Implemented with Ruby on Rails
Essentially a social networking website with the ability to share workflows
Part of the myGrid consortium, which develops the Taverna Workflow Workbench for creating and executing scientific workflows
http://www.myexperiment.org/workflows

myExperiment Functionality
Content versioning
User profiles
Resource sharing
Tags
Credits and attributions
Messaging
News feeds

myExperiment Technology
API
Well laid-out API, easy to understand
Clear documentation
GUI
Web-based, RESTful interface; custom interfaces can be designed
Harder to integrate with the P-GRADE Portal
More of a social networking site, with unrelated features

myExperiment Legal/Commercial Issues
BSD open source license: freely use, modify, and integrate into applications
Version
Currently in beta
Standalone application, but also available as a free online service

myExperiment Installation
Built on top of Linux; many packages that are not included must be installed
Runs on the Debian or Ubuntu distributions of Linux
Need to install: Ruby, Rails, SVN, Taverna, various Debian packages, etc.
Regular maintenance required
Usual IT overhead for data backup, content management, ongoing development...
Requires a reasonably good server and a decent amount of memory and disk storage. Example from a test installation:
Pentium D 2.8 GHz, 1 GB RAM, 250 GB hard disk, Debian Lenny (Linux kernel 2.6.22-3-686 #1 SMP), Apache 2.2.8, MySQL (Version 14.12, Distribution 5.0.51a), Java version 1.5.0

myExperiment References
Small community of developers and users
9 developers contributing code
4 lead developers planning releases and integrating features/fixes suggested by the community
1,624 users currently
Current applications
Taverna plugin: a plugin for Taverna that allows workflows to be launched directly from myExperiment
Workflows for Facebook: displays myExperiment workflows in Facebook, leveraging existing social networks for workflow sharing
Spacebook: social networking between astronomers


Archimède
Developed in Canada by Laval University
Designed as an institutional repository for publications by faculty members and research communities
Inspired by the DSpace model
Appears to be no longer under development
http://www.erudit.org/apropos/info.html

Archimède Functionality
Fine-grained security
Versioning
Locks for exclusive content editing
Support for portal integration (JSR-168 portlets)
Custom metadata formats
Supports multiple languages

Archimède Technology
API
Well laid-out Javadoc, good documentation
GUI
Has a web interface
Applied standards
OAI-PMH
Dublin Core
Java Content Repository standard (JSR-170)

Archimède Legal/Commercial Issues
GPL open source license: freely use, modify, and integrate into applications, provided that distributed derivative works are also released under the GPL
Version status: currently version 2, no longer in development
Standalone application

Archimède Installation
Installation appears relatively simple, with clearly defined procedures
HW/SW Requirements
Runs on any OS
JDK 1.4.1, J2EE Servlet 2.3, Apache Ant
Any database supported by Torque (http://db.apache.org/torque/)

Archimède References
Community
No larger than the developers and users at Laval University
Development appears to have ceased in 2005
Broken links on website
Success stories
Integrated with the Laval University Library system, but does not seem to be used by any other popular communities
Grid Integration
Not currently used by grid communities, but could possibly be integrated

Application Contents Service (ACS)
Workflow repository designed for the Business Grid Computing Project
Aim was to create a repository for application-related information that meets the standards of the Open Grid Services Architecture (OGSA) and does not depend on any particular grid implementation
Appears to be no longer under development


ACS Functionality
Application lifecycle management
Version control
Ability to store resource properties (author information, etc.)


ACS Technology
API
Does not have a well-documented API, and it is difficult to understand how it works
Plenty of documentation, but somewhat confusing
GUI
None included; command-line interface only
Applied standards
OGSA

ACS Legal/Commercial Issues
Apache License Version 2.0
Version status:
1 release available, listed as a prototype (incomplete?)
Inactive since early 2007
Implemented in NAREGI Middleware
v1.1.3 will be available soon
Standalone application

ACS Installation
Installation seems complex
HW/SW Requirements
1 GHz CPU, 512 MB RAM, 10 GB disk space
Red Hat Linux 9.0, Java SDK 1.4.2, Apache Ant 1.6.2, GT4 (Globus Toolkit 4)
Appears to run only on NAREGI Infrastructure Middleware



ACS References
Community
Small community: only a handful of developers
Grid Integration
Integrated with the NAREGI Infrastructure Middleware used by the Business Grid Computing Project


National Grid Service
NGS implemented a workflow repository with a JSP portlet for its Applications Repository
Fairly crude GUI, but it does have good searching/filtering and browsing functionality
The repository can be explored without certificates, but a certificate is needed to run a job (functionality that we desire)
https://portal.ngs.ac.uk/JobProfiles.jsf
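The NGS behavior we want to reproduce is open browsing with certificate-gated job submission, and it can be stated as a small access rule. The following is a hypothetical Python sketch. The action names, the `User` type and its `has_certificate` flag are illustrative assumptions, not NGS or P-GRADE Portal code.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BROWSE = "browse"
    SEARCH = "search"
    RUN_JOB = "run_job"


# Hypothetical policy: only job execution needs a grid certificate.
CERTIFICATE_REQUIRED = frozenset({Action.RUN_JOB})


@dataclass
class User:
    name: str
    has_certificate: bool  # hypothetical flag: a valid grid credential is available


def is_allowed(user: User, action: Action) -> bool:
    """Browsing and searching are open to everyone; certificate-gated actions are not."""
    return action not in CERTIFICATE_REQUIRED or user.has_certificate


guest = User("guest", has_certificate=False)
member = User("alice", has_certificate=True)
print(is_allowed(guest, Action.BROWSE))    # True
print(is_allowed(guest, Action.RUN_JOB))   # False
print(is_allowed(member, Action.RUN_JOB))  # True
```

Keeping the gated actions in one set means that a later decision to require a certificate for downloads as well would touch only the policy, not the portlet logic.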


NGS - Issues
Unable to locate the source code for their repository
Documentation could not be found either
Unable to fully analyze or consider for integration without more information


CNET Download.com
Third-party service provided as a means of software distribution
Content is controlled by Download.com staff and must be approved before being made available
Inappropriate for the needs of the P-GRADE Portal
http://www.download.com

Results

Criterion      DSpace  Fedora  myExperiment  Archimède  ACS
Functionality  4       5       3             2          1
Documentation  4       5       2             3          1
GUI            3       2       5             4          1
Standards      -       -       -             -          -
License        -       -       -             -          -
Version        4       5       3             2          1
Standalone     -       -       -             -          -
Installation   5       4       2             3          1
References     5       4       3             2          1
Totals         25      25      18            16         6

Each scored row ranks the five applications from 5 (best) to 1 (worst); rows marked "-" were not scored and do not count toward the totals.
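The totals are plain, equally weighted sums of the scored rows. The following is a short Python sketch that reproduces them from the results table. The optional `weights` parameter is an illustrative addition, not part of the original evaluation. It shows how the ranking would shift if some criteria were judged more important.

```python
# Scores copied from the results table; unscored rows are omitted.
SCORES = {
    "Functionality": {"DSpace": 4, "Fedora": 5, "myExperiment": 3, "Archimède": 2, "ACS": 1},
    "Documentation": {"DSpace": 4, "Fedora": 5, "myExperiment": 2, "Archimède": 3, "ACS": 1},
    "GUI":           {"DSpace": 3, "Fedora": 2, "myExperiment": 5, "Archimède": 4, "ACS": 1},
    "Version":       {"DSpace": 4, "Fedora": 5, "myExperiment": 3, "Archimède": 2, "ACS": 1},
    "Installation":  {"DSpace": 5, "Fedora": 4, "myExperiment": 2, "Archimède": 3, "ACS": 1},
    "References":    {"DSpace": 5, "Fedora": 4, "myExperiment": 3, "Archimède": 2, "ACS": 1},
}


def totals(scores, weights=None):
    """Sum each application's scores; criteria missing from weights count with weight 1."""
    weights = weights or {}
    result = {}
    for criterion, row in scores.items():
        weight = weights.get(criterion, 1)
        for app, score in row.items():
            result[app] = result.get(app, 0) + weight * score
    return result


ranking = sorted(totals(SCORES).items(), key=lambda item: item[1], reverse=True)
print(ranking)
# [('DSpace', 25), ('Fedora', 25), ('myExperiment', 18), ('Archimède', 16), ('ACS', 6)]
```

For example, `totals(SCORES, {"Functionality": 2})` breaks the DSpace/Fedora tie in Fedora's favor: Fedora scores 30 and DSpace scores 29.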

Conclusions
DSpace and Fedora are the two best choices for integration with the P-GRADE Portal
A combination of both? (DSpace and Fedora are currently undertaking a joint venture)
While other technologies were ruled out, some of their best features will be considered for inclusion if feasible
Some GUI components of myExperiment and NGS