Fedora Repository
1


Running head: FEDORA REPOSITORY CASE STUDY










Fedora Repository Case Study:
An image-oriented digital library

Jeanne C. Samuel

Louisiana State University










Introduction



Figure 1. Fedora Commons Banner (http://www.fedora.info/)


“Digital art lasts forever or 5 years, whichever comes first”
Richard Rinehart [1]

History


Fedora Commons, Inc. published two tutorials on July 23, 2008. A summary of the history portion of Tutorial #1 follows. Since 2008, with the creation of Fedora Commons (http://fedora-commons.org/), a 501(c)(3) non-profit organization, the goal of the digital repository has been to be part of a technology solution which enables “durable access to digital content containing our cultural and scientific heritage” [2] (2008, p. 6).

Fedora is an open-source application available from SourceForge.net. FEDORA, an acronym for Flexible Extensible Digital Object Repository Architecture, was originally a 1997 Cornell University project funded by DARPA (Defense Advanced Research Projects Agency [3]) and NSF (National Science Foundation [4]). Fedora was initially a CORBA-based (Common Object Request Broker Architecture [5]) application but is now Java-based [6]. In 1999, the University of Virginia created the first practical implementation of Fedora and improved performance by adding a relational database management system. In 2002, the Andrew W. Mellon Foundation funded “full-scale” project development, which led to Fedora version 1.0 in May 2003. This version incorporated XML (Extensible Markup Language [7]) and SOAP (Simple Object Access Protocol [8]) technologies and supported the REST



[1] http://dmax.bampfa.berkeley.edu/blog/2008/07/digital-art-lasts-forever-or-5-years-whichever-comes-first/
[2] http://www.fedora.info/documentation/3.0/userdocs/tutorials/tutorial1.pdf
[3] http://www.darpa.mil/
[4] http://www.nsf.gov/
[5] http://www.omg.org/gettingstarted/history_of_corba.htm
[6] http://java.com/en/download/whatis_java.jsp
[7] http://www.w3schools.com/XML/xml_whatis.asp
[8] http://www.w3schools.com/soap/soap_intro.asp



(Representational State Transfer [9]) architecture. Each quarter, new versions are released to correct defects and add functionality to the Fedora Repository.

System Requirements


The Fedora Installation and Configuration Guide (http://fedora-commons.org/confluence/display/FCR30/Installation+and+Configuration+Guide) is a helpful resource. According to the guide, Fedora 3.1 requires Java SE Development Kit (JDK) 5.0. Some discussion boards have instructions for running Fedora with JDK 6. The JAR [10] installer sets up the following components if they are not already installed on the server: the database, the server application, and the client application. The Fedora installer can install the database and Tomcat [11] 5.5.26. If you are installing Fedora from source code, you will need Ant [12] version 1.7 or higher. Fedora runs on an Apache2 [13] server and supports the MySQL, Mckoi SQL Database 1.0.3, Oracle 9, and PostgreSQL database applications.


What is a digital repository?


A digital repository holds information encoded as sequences of bits [14]. From the user's perspective, the information takes the form of objects such as text, video, images, or audio. Information about the objects is stored as metadata, that is, data about data. Repositories are designed to permit object storage, management, searching, and access. Michael Lesk states that digital libraries store content and keep it persistently available (2004, p. 4).




[9] http://en.wikipedia.org/wiki/Representational_State_Transfer
[10] http://java.sun.com/docs/books/tutorial/deployment/jar/
[11] http://tomcat.apache.org/
[12] http://ant.apache.org/
[13] http://httpd.apache.org/
[14] Modified from Arms’ digital library definition, http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter1.html









Figure 2. Digital Libraries (Arms, 2000)



William Arms, in his book Digital Libraries (2000, chapter 1), lists the following benefits of digital libraries: (1) the digital library brings the library to the user, (2) computer power is used for searching and browsing, (3) information can be shared, (4) information is easier to keep current, (5) the information is always available, and (6) new forms of information become possible.

Fedora Features


The Fedora repository addresses three digital repository challenges. According to the Fedora Tutorial #1 Introduction to Fedora document (2008, p. 5), it overcomes the “islands of information” problem by permitting use of the same repository for many applications, it avoids “vendor lock-in” since “you control your content assets”, and it provides “durable access to content”. Durable access addresses the risk that, as content items age, the object (format) can no longer be accessed. Fedora supports varied digital content formats, the metadata about the content, and cross-content relationships. The tutorial identifies ten Fedora development goals, which are noted in Appendix A. One of the key advantages of the Fedora design is that it is a distributed repository (p. 9). Distributed, virtual, digital collections are called Fedorations [15]. Fedora holds a URL (Uniform Resource Locator) pointer to data objects stored remotely (p. 12).


In addition to OAIS (Open Archival Information System [16]) compliance, Fedora offers three preservation and archiving features: the ability to maintain content versions, an object change audit trail, and the ability to link related objects as “parent/child relationships” within the metadata (p. 9). The Switch website, http://www.switch.ch/de/els/LOR/evaluation.html, compares Fedora version 2.2 and DSpace [17].


Fedora Repository

The Fedora Digital Object Model


The Fedora digital object, according to Tutorial #1, has three parts: (1) a digital object identifier, (2) system properties [18], and (3) a datastream or datastreams (2008, p. 11). The Persistent ID (PID) is assigned when the object is created. Figure 3 shows the model of a Fedora digital object.


Figure 3. Fedora Digital Object Data Model (Figure 1, Tutorial #1, p. 11)




[15] http://lib.virginia.edu/digital/resndev/fedora_grant2.html
[16] http://www.virtual-museum.at/glossary/OAIS
[17] http://www.dspace.org/
[18] http://fedora-commons.org/confluence/display/FCR30/Introduction+to+FOXML


Digital Object Identifier
o Persistent ID (PID)

System Properties
o Object properties (FOXML metadata; FOXML is Fedora Object XML)
o Manage and track the object within the repository
o Required by the Fedora repository architecture (ex. object type, state, content model, creation date, label, last modified date)

Datastreams
o Object content; aggregates content items
o Essence of the object (ex. audio recordings, encoded text, digital images)
o Content metadata



As noted earlier, a Fedora digital object may have more than one datastream. A datastream can hold metadata or the content “essence”. Each datastream has a Datastream Identifier that is unique within the object. Fedora may use one to three reserved identifiers. Dublin Core (DC) is used by default, is created automatically if one is not provided, and contains metadata about the object. The other two identifiers are AUDIT and RELS-EXT (2008, p. 12). AUDIT records the object’s change audit trail; this Datastream is under system control and cannot be edited. RELS-EXT provides relationship description information. A Fedora object may also contain custom (user-defined) Datastreams. Appendix B contains basic Fedora Datastream properties. Figure 4 is an example of an object’s system properties.



Figure 4. http://www.hull.ac.uk/esig/repomman/downloads/D-D4-iterative-dev-0602
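
The object model described in this section can be sketched in code. The Python fragment below is a minimal illustration and not Fedora's own implementation: the class names, the add_datastream helper, and the PID value demo:1 are hypothetical. It assumes only what the tutorial states, namely that an object carries a PID, system properties, and one or more datastreams, that a DC datastream exists even when none is supplied, and that AUDIT is under system control.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastream:
    ds_id: str            # identifier, unique within one digital object
    mime_type: str
    content: bytes = b""


@dataclass
class DigitalObject:
    pid: str                                                         # Persistent ID, set at creation
    properties: Dict[str, str] = field(default_factory=dict)         # system properties
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Mirror the described default: create a DC datastream if none was given.
        if "DC" not in self.datastreams:
            self.datastreams["DC"] = Datastream("DC", "text/xml")

    def add_datastream(self, ds: Datastream) -> None:
        # AUDIT is described as system-controlled, so user code may not set it.
        if ds.ds_id == "AUDIT":
            raise PermissionError("AUDIT is under system control")
        if ds.ds_id in self.datastreams:
            raise ValueError(f"duplicate datastream id: {ds.ds_id}")
        self.datastreams[ds.ds_id] = ds


# Hypothetical object with one custom (user-defined) datastream.
obj = DigitalObject(pid="demo:1", properties={"label": "Sample image", "state": "Active"})
obj.add_datastream(Datastream("IMAGE", "image/jpeg", b"\xff\xd8"))
print(sorted(obj.datastreams))  # ['DC', 'IMAGE']
```

Keying the datastreams by identifier makes the "unique within the object" rule a property of the data structure rather than something each caller has to check.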

User access


Users access the repository via four APIs (Application Programming Interfaces [19]). SOAP or HTTP [20] is used to access the search, access, or management APIs of the Fedora repository. HTTP is used to access the OAI [21] provider API (2008, p. 29).
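
As an illustration of HTTP access to an OAI provider, the sketch below builds request URLs using the verbs defined by the OAI-PMH 2.0 protocol. It is a hedged example: the function name oai_request_url and the endpoint address are hypothetical, no network call is made, and the actual address of the OAI provider must be taken from the installation being queried.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH 2.0 specification.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH GET request URL; nothing is sent over the network."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hypothetical endpoint; substitute the OAI provider address of a real installation.
BASE = "http://repository.example.org/oai"
url = oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc")
print(url)  # http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would fetch such a URL over HTTP and parse the XML response; that step is omitted here because it depends on a live repository.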

Implementation




[19] http://en.wikipedia.org/wiki/API
[20] http://www.w3.org/Protocols/
[21] http://www.openarchives.org/




Sites using Fedora


Unlike DSpace, which has many web-accessible implementation examples [22], Fedora has fewer web-documented installations. The best resource for these case studies is http://www.fedora.info/usecases/. An example of Fedora use in education is the National Science Digital Library (NSDL.org), http://www.fedora.info/usecases/education.php?pid=NSDL, which contains “resources for science, technology, engineering, and mathematics education and research” which “promote active, inquiry-based teaching and learning”. Examples of libraries that use Fedora (http://www.fedora.info/usecases/libraries.php) include: (1) Tufts University: The Digital Collections and Archives Department, (2) University of Virginia Library, (3) RUcore (Rutgers Community Repository), (4) Irish Virtual Research Library and Archive (IVRLA), and (5) Digital Collections at the University of Maryland. Appendix C contains a list of Fedora installations.


There is an online video, in either WMV [23] or MOV [24] format, which describes in detail how the Encyclopedia of Chicago uses Fedora. The name of the video is “Access and Management Stories From the Fedora Community: Digital Libraries and Collections”. The URL of the page containing these videos is http://www.fedora.info/resources/outreach.php#brochure.




[22] http://www.dspace.org/index.php/DSpace-Repositories/Repositories-Alphabetical.html
[23] http://www.digitalpreservation.gov/formats/fdd/fdd000091.shtml
[24] http://en.wikipedia.org/wiki/QuickTime








Figure 5. Screenshot of Fedora Video Part 1

Features from the users’ perspective


For the purposes of this paper, the University of Hull Fedora implementation, http://edocs.hull.ac.uk/muradora/, was evaluated. As indicated in Figure 6, users may browse by collection, title, subject, author, year, latest additions, and recently updated.






Figure 6. http://edocs.hull.ac.uk/muradora/ (Home)



These options appear as tabbed navigation (see Figure 7) once any browse option has been selected. There is currently only one page of collections.




Figure 7. Browse by Collection

Referring again to Figure 6, there is search and advanced search by terms capability, in addition to sorting by date added and modified date. Advanced search options include the following criteria, which can be combined with Boolean And/Or operations: title, author, description, format, language, and subject. Figure 8 shows the results of searching the Images collection for the term “computer”.









Figure 8. Search Images Collection

You may only sort the Images collection by “name” or “last updated”. The result of selecting one of the content images is displayed in Figure 9. You may view or download the image or the image thumbnail. This paper focused on image libraries. However, I did compare the object for a PhD education dissertation to see the difference in the metadata and object property information collected. They did in fact differ, and the latter appeared more comprehensive.






Figure 9. Item/Object Selected




Phil Cryer, in his SlideShare presentation (n.d.), lists the following advantages of the repository:

o Online collection publication
o The digital collection is sustainable
o Best practices for storage and sharing are ensured due to standards compliance
o No new methodologies need to be adopted
o The support and development community is active
o The software is open source

He states the following disadvantages of the repository:

o The learning curve is steep
o It can be difficult to import existing data
o Initial development took longer than documentation
o It was difficult to get the site up quickly since there was no simple web-based front end. (Note: this analysis may predate version 3.1.)


Conclusion


Fedora is a robust repository. It does take a bit of work to install and configure. I did not test the ability to “ingest” (place an object in the repository). It appears to take a lot of time to perform a custom installation and to define content models. If your organization has the technical expertise to install, configure, and maintain (administer) a Fedora installation, I think the scalability, consideration of preservation, ability to create custom metadata and define objects yet to exist, and remote/distributed capability make it a viable digital repository solution. That said, since it is an open-source product, what you save in software expense, you may spend in technical support. My next step is to attempt a Fedora installation.





References

Access and Management Stories from the Fedora Community: Digital Libraries and Collections. Retrieved January 25, from the Fedora Commons Web site: http://www.fedora.info/videos/part1.wmv

Arms, W. (2000). Digital Libraries: Chapter 1. Retrieved January 17, 2009, from Cornell University Computer Science Web site: http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter1.html

Cryer, P. (n.d.). Using Fedora Commons To Create A Persistent Archive [Slideshow]. Retrieved February 2, 2009, from SlideShare: http://www.slideshare.net/phil.cryer/using-fedora-commons-to-create-a-persistent-archive-presentation

eDocs. (n.d.). Retrieved January 29, 2009, from the University of Hull Web site: http://edocs.hull.ac.uk/muradora/

Fedora Commons Web site (n.d.). Retrieved January 25, from http://www.fedora.info/

Fedora Tutorial #1 Introduction to Fedora (2008, July 23). Retrieved January 25, 2009, from http://fedora-commons.org/documentation/3.0/userdocs/tutorials/tutorial1.pdf

Green, R. (2006, February 13). RepoMMan Project. Retrieved February 2, 2009, from The University of Hull Web site: http://www.hull.ac.uk/esig/repomman/downloads/D-D4-iterative-dev-0602

History of the “Fedora Project” and “Fedora Commons” Names (n.d.). Retrieved January 25, 2009, from http://www.fedora.info/about/history.php

Installation and Configuration Guide (2009). Retrieved February 2, 2009, from http://fedora-commons.org/confluence/display/FCR30/Installation+and+Configuration+Guide

Lesk, M. (2005). Understanding Digital Libraries (2nd Edition). San Francisco, CA: Morgan Kaufmann Publishers.

Rinehart, R. (2008, July 10). Digital art lasts forever or 5 years, whichever comes first. Retrieved January 29, from Digital Culture Web site: http://dmax.bampfa.berkeley.edu/blog/2008/07/digital-art-lasts-forever-or-5-years-whichever-comes-first/

The Fedora™ Phase 2 Andrew W. Mellon Foundation Grant. (2008, June 2). Retrieved January 29, 2009, from Digital Initiative Web site: http://lib.virginia.edu/digital/resndev/fedora_grant2.html

Use Cases (n.d.). Retrieved January 29, 2009, from http://www.fedora.info/usecases/






Appendix A


Fedora Development Goals [25]

1. Provision for persistent identifiers; unique names for all resources without respect to machine address
2. Support for inter-object relationships
3. XML-based normalization (tame content) of heterogeneous content and metadata
4. Efficient, integrated (repository) management by administrators of data, metadata, supporting programs, and services and tools which make data and metadata presentation possible
5. Provision of interoperable access by means of a standard protocol to information about objects and access to object content
6. Fedora is scalable. It provides support for > 10 million objects
7. Security: provisions for flexible authentication and policy enforcement
8. Persistence/Preservation: provision for longevity and archival support, including XML object serialization and content versioning
9. Content (object) repurpose and reuse, including object content being present in any number of contexts within the repository; object repurposing allows dynamic transformations to fit new presentation requirements
10. Ability of digital objects to disseminate launch-pads or tools (self-actualizing objects) for end-user/content interaction






[25] http://fedora-commons.org/documentation/3.0/userdocs/tutorials/tutorial1.pdf




Appendix B


Fedora Object Datastream Basic Properties [26]

Datastream Identifier: an identifier for the Datastream that is unique within the digital object (but not necessarily globally unique)

State: the Datastream state of Active, Inactive, or Deleted

Created Date: the date/time that the Datastream was created (assigned by the repository service)

Modified Date: the date/time that the Datastream was modified (assigned by the repository service)

Versionable: an indicator (true/false) as to whether the repository service should version the Datastream. By default the repository versions all Datastreams.

Label: a descriptive label for the Datastream

MIME Type: the MIME type of the Datastream (required)

Format Identifier: an optional format identifier for the Datastream. Examples of emerging schemes are PRONOM and the Global Digital Format Registry (GDFR).

Alternate Identifiers: one or more alternate identifiers for the Datastream. Such identifiers could be local identifiers or global identifiers such as Handles or DOI.

Checksum: an integrity stamp for the Datastream which can be calculated using one of many standard algorithms (MD5, SHA-1, etc.)

Bytestream Content: the "stuff" that the Datastream is about (such as a document, digital image, video, metadata record)

Control Group: pertaining to the bytestream content, a new Datastream can be defined as one of four types, or control groups, as follows: Internal XML Metadata, Managed Content, External Referenced Content, and Redirect Referenced Content

Control Group

Internal XML Metadata: In this case, the Datastream will be stored as XML that is actually stored inline within the digital object XML file.

Managed Content: In this case, the Datastream content will be stored in the Fedora repository and the digital object XML file will store an internal identifier to that Datastream.

External Referenced Content: In this case, the Datastream content will be stored outside of the Fedora repository, and the digital object will store a URL to that Datastream.

Redirect Referenced Content: In this case, the Datastream content is also stored outside the repository and the digital object points to its URL ("by-reference").
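
The properties in this appendix map naturally onto a small record type. The following Python sketch is illustrative only and is not taken from the Fedora code base: the class and field names are hypothetical, the single-letter control group codes are an assumption that should be checked against the FOXML documentation, and the checksum is computed with Python's standard hashlib module rather than by the repository service.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class ControlGroup(Enum):
    # Single-letter codes are an assumption; verify against the FOXML documentation.
    INTERNAL_XML = "X"   # stored inline in the digital object XML file
    MANAGED = "M"        # content stored inside the repository
    EXTERNAL = "E"       # content outside the repository, referenced by URL
    REDIRECT = "R"       # content outside the repository, by-reference redirect


@dataclass
class DatastreamRecord:
    ds_id: str                     # unique within the digital object
    mime_type: str                 # required
    control_group: ControlGroup
    state: str = "Active"          # Active, Inactive, or Deleted
    versionable: bool = True       # the repository versions Datastreams by default
    label: str = ""
    content: bytes = b""           # bytestream content, when held locally

    def checksum(self, algorithm: str = "md5") -> str:
        """Return a hex digest of the bytestream (e.g. 'md5' or 'sha1')."""
        return hashlib.new(algorithm, self.content).hexdigest()


# Hypothetical managed-content Datastream.
ds = DatastreamRecord("THUMBNAIL", "image/png", ControlGroup.MANAGED,
                      content=b"example bytes")
print(ds.checksum("md5"))
print(ds.checksum("sha1"))
```

Recomputing the digest later and comparing it with the stored value is the usual way such an integrity stamp is used to detect a changed or corrupted bytestream.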




[26] http://fedora-commons.org/documentation/3.0/userdocs/tutorials/tutorial1.pdf, pages 13-14





Appendix C


Fedora Installations [27]

Education
o National Science Digital Library: www.nsdl.org

eScience/eScholarship
o The RepoMMan Project: http://www.hull.ac.uk/esig/repomman/
o eSciDoc Project of the Max Planck Society and FIZ Karlsruhe: http://www.escidoc-project.de/homepage.html

Libraries
o Cornell University Library: http://arxiv.org/
o Digital Collections at the University of Maryland: http://www.lib.umd.edu/digital/index.jsp
o Irish Virtual Research Library and Archive: http://www.ucd.ie/ivrla
o RUcore (Rutgers Community Repository): http://rucore.libraries.rutgers.edu/
o Tufts University: The Digital Collections and Archives Department: http://dl.tufts.edu/
o University of Virginia Library: http://www.lib.virginia.edu/digital/

Museums/Culture
o The Encyclopedia of Chicago: www.encyclopedia.chicagohistory.org

Other
o Australian Research Repositories Online to the World (ARROW): http://arrow.edu.au/
o Islandora: http://www.upei.ca/library/index.html, http://vre.upei.ca/mhl/
o University of Hull: http://edocs.hull.ac.uk/muradora/
o Tibetan and Himalayan Digital Library: www.thdl.org, http://www.thlib.org/#wiki=/access/wiki/site/0b308aa3-d044-469b-009a-d34c7841413d/thdl%20status%20reports.html

Open Access
o Public Library of Science's PLoS ONE: http://plosone.org/







[27] http://www.fedora.info/usecases/




Appendix D


Fedora Development Timeline

Timeline (Adapted from http://www.fedora.info/about/history.php)