Foundations of Excellence

DSpace vs Fedora: Or what I do on my summer vacation

TRLN: Staff Enrichment Series: 8 Nov, 2007

Objectives


Background: Why we even considered a digital repository

FOE version 1

DSpace & Fedora: 50,000 foot view

FOE version 2

FOE version 3


Where to from here?



Background


75th Anniversary

Duke University School of Medicine established in 1930

2005: year-long celebration


New published history


Articles, videos, speeches


Alumni weekend gala event


Josiah C. Trent Foundation Grant


Digitization Project


500 images documenting the first 3 decades of the School of Medicine and Hospital


Image groups:


Buildings


Education


Events


Clinical


People


Technology


Digitization Project (cont.)


Selection: Whole staff

Digitization: Outsourced to University Photography

Description: Technical services and Reference coordinators

Subject terms: Technical services coordinator; Head, Cataloging services

Controlled vocabulary: NoteTab templates and libraries

FOE1.0

XML, XSLT, and PostgreSQL


FOE1.0


600 images = 600 XML files, processed by 2 XSLT stylesheets

XML = EAD 2002

XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements

PostgreSQL database used only for search

Result: http://archives.mc.duke.edu/projects/bld/bld00012.html
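
The one-record, two-outputs idea on this slide can be sketched as follows. This is only an illustration, not the project's stylesheets: it uses Python's standard library in place of XSLT and an in-memory SQLite database in place of PostgreSQL. The record layout, the `images` table, the `bld00000` identifier, and the sample title are all hypothetical and far simpler than a real EAD 2002 file.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified record; the real files were EAD 2002.
RECORD = """<item id="bld00000">
  <title>Example building photograph</title>
  <subject>Buildings</subject>
</item>"""


def to_html(item):
    """Output 1: a static HTML fragment used for display."""
    return "<h1>{}</h1><p>Subject: {}</p>".format(
        escape(item.findtext("title")), escape(item.findtext("subject"))
    )


def to_sql(item):
    """Output 2: a parameterized INSERT that only feeds the search database."""
    return (
        "INSERT INTO images (id, title, subject) VALUES (?, ?, ?)",
        (item.get("id"), item.findtext("title"), item.findtext("subject")),
    )


item = ET.fromstring(RECORD)
page = to_html(item)
statement, params = to_sql(item)

db = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL search database
db.execute("CREATE TABLE images (id TEXT PRIMARY KEY, title TEXT, subject TEXT)")
db.execute(statement, params)

# Searching touches only the database; display uses the generated HTML page.
hits = db.execute(
    "SELECT id FROM images WHERE title LIKE ?", ("%building%",)
).fetchall()
```

Keeping the XML file as the single source and regenerating both outputs from it means the database can be dropped and rebuilt at any time without losing data.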


Issues


SQL search statements worked…not


No indexing by search engines


JDBC


I am not a programmer


Definite need for improvements


DSpace & Fedora: A Bird's-eye View


Need for a Digital Repository


DSpace


First released in 2002. Developed by MIT Libraries and Hewlett-Packard (USA Today)

Current version (download)

Optimal performance in a *nix environment, but should operate in any environment

Written in Java

VERY active listservs

Manakin: TAMU-created “front-end” which makes for easier UI localization


Need for a Digital Repository (cont.)


FEDORA (Flexible Extensible Digital Object and Repository Architecture)

Began as a DARPA- and NSF-funded research project at Cornell in 1997

2001, UVA and Cornell: $1M Mellon grant

1.0 released 2003

Current version 2.2.1 (download)

Optimal performance in a *nix environment, but will run on Windows-based systems

Written in Java

Several front-end tools developed (more in a moment)




Side by side testing


Testing environment:


Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7 (the Linux distribution), 2.6.23 kernel, Java 1.5



Requirements


DSpace

Java 1.4+

Apache Ant 1.6.2+

PostgreSQL 7.3+ (or Oracle 9+)

Jakarta Tomcat 4.x/5.x (I used 6.x)

Can also run on Jetty or Caucho Resin


Fedora

JDK 1.5+

Optional

MySQL

PostgreSQL

Oracle 9

Jakarta Tomcat

Ant 1.6.5+ if building from source code


File Size & Download times


DSpace


16 MB


1:43 over a T1 line


1:13 on a T line


Fedora


72 MB


7:49 over a T1 line


1:53 over a T line
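
To compare the two downloads on equal terms, the times above can be converted into an effective throughput. The sketch below assumes that the file sizes are megabytes of 10**6 bytes and that the times are minutes:seconds; neither is stated on the slide, and the function names are made up for this example.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '1:43' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def effective_mbps(size_mb, mmss):
    """Effective throughput in megabits per second.

    Assumes size_mb is megabytes of 10**6 bytes (8 megabits each).
    """
    return size_mb * 8 / to_seconds(mmss)


# Figures from the slide: (package, line) -> (size in MB, download time)
downloads = {
    ("DSpace", "T1 line"): (16, "1:43"),
    ("DSpace", "T line"): (16, "1:13"),
    ("Fedora", "T1 line"): (72, "7:49"),
    ("Fedora", "T line"): (72, "1:53"),
}

rates = {key: effective_mbps(size, time) for key, (size, time) in downloads.items()}
for (package, line), rate in rates.items():
    print(f"{package} over a {line}: {rate:.2f} Mbit/s")
```

Under those assumptions both T1 downloads work out to roughly 1.2 Mbit/s, against a nominal T1 rate of 1.544 Mbit/s, so the longer Fedora download reflects the larger file rather than a slower connection.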


Installation time


DSpace

PostgreSQL installation and setup: 8 minutes

Ant build and configuration: 8 minutes

DSpace/Tomcat configuration and deployment: 8 minutes

Total time to live: 24 minutes


Fedora

PostgreSQL installation and setup: 8 minutes

Fedora install: 5 minutes

Total time to live: 13 minutes



Initial Live View


DSpace


Front Page


Fedora


Front Page

FOE2.0

Choosing our Digital Repository


Deciding Factors


DSpace

Off-the-shelf view

Workflow process

Individual submitters, one project admin

Item submission form (link here)

Bulk load script (dc, item, mapfile)

Searchbot harvestable

OAI harvestable


Fedora

Off-the-shelf view

One submitter

Item submission not intuitive (link)

Bulk load script (FOXML)

Content Models (will return)

Disseminators

Behavior Definitions

Would require extensive programming
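
The "dc, item, mapfile" note for the DSpace bulk load presumably refers to DSpace's batch import layout, in which each item is a directory holding a `dublin_core.xml` file, the files themselves, and a `contents` file listing them. A minimal sketch of preparing one such directory is shown below. The helper name, the metadata values, and the file name are hypothetical, and the import step itself (which writes the mapfile recording which directory became which item) is not shown.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(batch_dir, name, metadata, files):
    """Write one item directory: dublin_core.xml, the files, and a contents list."""
    item_dir = Path(batch_dir) / name
    item_dir.mkdir(parents=True)

    # One <dcvalue> per metadata field, keyed by Dublin Core element and qualifier.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The contents file names each file in the item, one per line.
    for filename, data in files.items():
        (item_dir / filename).write_bytes(data)
    (item_dir / "contents").write_text("\n".join(files) + "\n")
    return item_dir


batch = tempfile.mkdtemp()  # throwaway directory for the example
item_dir = write_item(
    batch,
    "item_000",
    [
        ("title", "none", "Hypothetical building photograph"),
        ("subject", "none", "Buildings"),
    ],
    {"example.jpg": b"placeholder bytes, not a real image"},
)
```

Because every item is just a directory of plain files, a batch of a few hundred images can be generated from existing metadata with a short script and then loaded in one pass.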


FOE2.0 = DSpace

Cup is Half Full


March 2006


Foundations' new home

Data submission form

Item View: bld00012


Item Update


Access Restrictions


Handle server


FOE2.0 = DSpace

Cup is Half Empty


Object is entered as one item


DSpace is self-contained

No real way to show complex relationships

All or nothing metadata

Access Restrictions

Handle server

Searchbot indexing:

DSpace@DukeMed: Item 2193/77
Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov-2005 ...
Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...


FOE3.0

“Our goal is to never be satisfied”

Content Models

Reusing datastreams

(next 2 slides borrowed from EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)


Atomistic Model

[Diagram: the TEI etext of a 3-page letter is one object with its own Persistent ID (PID), Disseminators, and System Metadata. Its text contains three <image pointer tag> references, one per page. The Page 1, Page 2, and Page 3 images are each a separate object with its own PID, Disseminators, and System Metadata.]

Compound Model

[Diagram: the TEI etext of a 3-page letter is a single object with one Persistent ID (PID), Disseminators, and System Metadata. Its text contains three <image pointer tag> references, which point to the screen size images for pages 1, 2, and 3 shown within the same object.]
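
The difference between the two content models can be sketched with plain data structures. This is not Fedora's API: the `DigitalObject` class, the `demo:` PIDs, and the datastream names are hypothetical, chosen only to mirror the 3-page letter in the two diagrams.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A repository object: one PID plus its named datastreams."""
    pid: str
    datastreams: dict = field(default_factory=dict)


# Atomistic: the letter and each page image are separate objects.
# The letter's TEI points at the image objects by PID.
atomistic = {
    "demo:letter": DigitalObject(
        "demo:letter", {"TEI": ["demo:page1", "demo:page2", "demo:page3"]}
    ),
    "demo:page1": DigitalObject("demo:page1", {"IMAGE": "page 1 image"}),
    "demo:page2": DigitalObject("demo:page2", {"IMAGE": "page 2 image"}),
    "demo:page3": DigitalObject("demo:page3", {"IMAGE": "page 3 image"}),
}

# Compound: one object carries the TEI and all three page images.
compound = {
    "demo:letter": DigitalObject(
        "demo:letter",
        {
            "TEI": ["PAGE1", "PAGE2", "PAGE3"],
            "PAGE1": "screen size image for page 1",
            "PAGE2": "screen size image for page 2",
            "PAGE3": "screen size image for page 3",
        },
    ),
}


def page_images(repo, pid):
    """Follow an object's TEI pointers to its page images in either model."""
    obj = repo[pid]
    images = []
    for pointer in obj.datastreams["TEI"]:
        if pointer in repo:  # atomistic: the pointer is another object's PID
            images.append(repo[pointer].datastreams["IMAGE"])
        else:  # compound: the pointer is a datastream in the same object
            images.append(obj.datastreams[pointer])
    return images


# Reuse: in the atomistic layout a second object can point at an existing image.
atomistic["demo:exhibit"] = DigitalObject("demo:exhibit", {"TEI": ["demo:page2"]})
```

The trade-off the sketch makes visible: the atomistic layout needs four objects instead of one, but an image stored once can be referenced from any number of other objects, which is the "reusing datastreams" point.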

An old favorite blanket


2005-2007 Fedora minimally utilized

Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures)

Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down)

Began looking at front-end GUIs
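
The address pattern quoted in the XACML bullet can be tried on its own to see what it admits. The sketch below only exercises the regular expression; the surrounding XACML policy is not shown, and the helper name and the sample addresses are made up for illustration.

```python
import re

# Pattern copied from the slide: any address under 156.16.
ALLOWED = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")


def is_allowed(ip):
    """True when the whole address string matches the lock-down pattern."""
    return ALLOWED.fullmatch(ip) is not None


for address in ["156.16.10.200", "156.160.10.200", "10.1.156.16"]:
    print(address, is_allowed(address))
```

Escaping the dots matters: an unescaped `.` matches any character, so `156.16` would also match a string such as `156x16`. Note that `\d{1,3}` does not check that an octet stays at or below 255, which is harmless here because such a value can never be a real client address.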


Front End tools


Fez

A web front-end management system for Fedora that is developed in PHP.

Fez functionality includes: Web-based browsing and searching; Semi-advanced searching; Complex security; Basic image handling; Dublin Core.
http://espace.library.uq.edu.au/documentation/


Elated

ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs.

Dublin Core metadata entry and search; Custom metadata by collection; Automatic previews for images; Collections with simple editorial workflow; Indexing and searching of content; User feedback, enabled by collection; Select and import existing Fedora objects
http://elated.sourceforge.net/


Both require extensive programming for localization


External Forces at play


In fall 2006 we began a project to digitize 10,000+ cytopathology slides.

Images converted to JPEG2000 to improve the user experience (example)

Archives purchased Aware JPEG2000 Image Server

History of Medicine image database, Historical Images in Medicine (HIM), needed a new platform


Call out of the blue


VTLS


Vital


Open Repositories


FOE3.0 = Fedora/Vital

Cup is Half Full


June 2007


Foundations' new home (link)

Data submission (3 ways to enter items)

Item View: bld00012

Object is entered as many datastreams (fedora view)

Vital/Fedora/Aware…interoperability

Complex relationships

Multiple metadata streams

Handle server

Searchbot indexing:

A. Jack Tannenbaum. | MeDSpace
Description: A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...
per00165, A. Jack Tannenbaum. 302.3 kB, JPEG 2000 Image ...


FOE3.0 = Fedora/Vital

Cup is Half Empty


Fedora is open source, Vital is not


Customization possible with programming knowledge

No way at this time to implement XACML policies (workarounds exist)

Vital upgrades require full software installation

Local customization can cause breaks in certain functions


Conclusions and obligatory links


Selected Links

DSpace: http://dspace.org

Manakin: http://di.tamu.edu/projects/xmlui/install

Fedora: http://www.fedora-commons.org/

Elated: http://elated.sourceforge.net/

Fez: http://espace.library.uq.edu.au/documentation/

Vital: http://vtls.com

DSpace@DukeMed: http://dspace.mclibrary.duke.edu

MeDSpace: http://medspace.mc.duke.edu/vital/access/manager/Index