Running head: FEDORA REPOSITORY CASE STUDY
Fedora Repository Case Study:
oriented digital library
Jeanne C. Samuel
Louisiana State University
“Digital art lasts forever or 5 years, whichever comes first” (Rinehart, 2008)
Fedora Commons, Inc. published two tutorials on July 23, 2008. A summary of the history portion of Tutorial #1 follows.
Since 2008, with the creation of Fedora Commons, a 501(c)(3) nonprofit, the goal of the digital repository has been to be part of a technology solution that enables “durable access to digital content containing our cultural and scientific heritage” (2008, p. 6).
Fedora is an open-source application available from Fedora Commons. Fedora, an acronym for Flexible Extensible Digital Object Repository Architecture, was originally a 1997 Cornell University project funded by DARPA (Defense Advanced Research Projects Agency) and the National Science Foundation.
Fedora was initially a CORBA (Common Object Requesting Broker Architecture) application but is now Java-based.
In 1999, the University of Virginia
created the first practical implementation of Fedora.
They improved performance by adding a relational database management system.
In 2002, the Andrew W. Mellon Foundation funded “full-scale” project development, which led to Fedora version 1.0 in May 2003.
This version incorporated XML (Extensible Markup Language) and SOAP (Simple Object Access Protocol) technologies and supported REST (Representational State Transfer).
Each quarter, new versions are released to correct defects and add functionality to the Fedora Repository.
The Fedora installation and configuration guide is a helpful resource. According to the guide, Fedora 3.1 requires the Java SE Development Kit (JDK) 5.0, and the discussion boards have instructions for JDK 6 support. The guide also lists the application environments to install on the server, if they are not already present, for the server application and the client application. The Fedora installer can install the McKoi database and Tomcat 5.5.26. If you are installing Fedora from source code, you will need Ant version 1.7 or higher. Fedora runs on an Apache2 server and supports MySQL, McKoi 1.0.3, Oracle 9, and PostgreSQL databases.
What is a digital repository?
A digital repository holds information encoded as sequences of bits. From the user’s perspective, however, the repository holds objects such as text, video, images, or audio. Information about the objects is stored as metadata; metadata is data about data.
Repositories are designed to permit object storage, management, searching, and access. Michael Lesk (2005) states that digital libraries store and …
The definition above is adapted from Arms’ digital library definition. William Arms, in chapter 1 of his book Digital Libraries (2000), lists the following benefits of digital libraries: (1) The digital library brings the library to the user, (2) Computer power is used for searching and browsing, (3) Information can be shared, (4) Information is easier to keep current, (5) The information is always available, and (6) New forms of information become possible.
The Fedora repository addresses three digital repository challenges. The Fedora Tutorial #1 Introduction to Fedora document (2008, p. 5) states that Fedora overcomes the “islands of information” problem by permitting use of the same repository for many applications, that there is no “vendor lock-in” since “you control your content assets”, and that it provides “durable access to content”. Durable access addresses the risk that, as content items age, their objects (formats) can no longer be accessed.
Fedora supports varied digital content formats, the
metadata about the
The tutorial identifies ten Fedora development goals, which are listed in Appendix A.
One of the key advantages of the Fedora
design is that it is a distributed repository (p. 9).
Distributed, virtual, digital collections are called
Fedora holds a URL (Uniform Resource Locator) pointer to data objects stored
remotely (p. 12).
Fedora’s preservation and archiving features, in addition to OAIS (Open Archival Information System) support, are the ability to maintain content versions, …, and the ability to link related objects as “parent/child relationships” within the metadata (p. 9).
The Switch website compares Fedora version 2.2 and DSpace.
The Fedora Digital Object Model
The Fedora digital object, according to Tutorial #1, has three parts: (1) a digital object identifier, (2) system properties, and (3) a datastream or datastreams (2008, p. 11).
A persistent ID (PID) is assigned when the object is created.
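A minimal sketch of assigning a PID at creation time follows. It assumes a Fedora 3.x-style PID syntax (a namespace, a colon, and an object ID, such as `demo:1`, at most 64 characters); that syntax is an assumption to check against the Fedora documentation, and `PidMinter` and `is_valid_pid` are hypothetical helpers, not part of Fedora.

```python
import re
from itertools import count

# Assumed syntax, modelled on Fedora 3.x PIDs such as "demo:1":
# a namespace, a colon, then an object ID; at most 64 characters overall.
PID_PATTERN = re.compile(r"([A-Za-z0-9.-]+):(([A-Za-z0-9._~-]|%[0-9A-F]{2})+)")
MAX_PID_LENGTH = 64


def is_valid_pid(pid: str) -> bool:
    """Check a candidate PID against the assumed syntax."""
    return len(pid) <= MAX_PID_LENGTH and PID_PATTERN.fullmatch(pid) is not None


class PidMinter:
    """Hypothetical helper that assigns a new PID each time an object is created."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._counter = count(1)  # simple in-memory sequence: 1, 2, 3, ...

    def mint(self) -> str:
        pid = f"{self.namespace}:{next(self._counter)}"
        if not is_valid_pid(pid):
            raise ValueError(f"generated PID {pid!r} is not valid")
        return pid


minter = PidMinter("demo")
print(minter.mint(), minter.mint())
```

A real repository would persist the counter so that identifiers are never reused after a restart; the in-memory `count` here only demonstrates the idea.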
Figure 3 shows the model of a Fedora digital object.
Fedora Digital Object Data Model (Figure 1, Tutorial #1, p. 11)
Persistent ID (PID): Digital Object Identifier
Object Properties (FOXML metadata, where FOXML is Fedora Object XML): manage and track the object within the repository; required by the Fedora repository architecture (e.g., object type, state, content model, creation date, label, last modified date)
Datastream(s): object content; aggregates the content items that form the essence of the object (e.g., audio recordings, encoded text, digital images)
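The three-part model in the figure can be expressed as a small data structure. This Python sketch is illustrative: the class and field names are hypothetical, only a few of the system properties are shown, and real Fedora objects are serialized as FOXML rather than held as Python objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ObjectProperties:
    """System properties used to manage and track the object (subset shown)."""
    label: str
    state: str = "Active"
    created: datetime = field(default_factory=_now)
    last_modified: datetime = field(default_factory=_now)


@dataclass
class Datastream:
    ds_id: str       # unique within one object, not necessarily globally
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    pid: str                        # part 1: persistent identifier
    properties: ObjectProperties    # part 2: system properties
    datastreams: Dict[str, Datastream] = field(default_factory=dict)  # part 3

    def add_datastream(self, ds: Datastream) -> None:
        """Attach a datastream, enforcing per-object uniqueness of its ID."""
        if ds.ds_id in self.datastreams:
            raise ValueError(f"{self.pid} already has a datastream {ds.ds_id!r}")
        self.datastreams[ds.ds_id] = ds
        self.properties.last_modified = _now()


obj = DigitalObject("demo:1", ObjectProperties(label="Sample image"))
obj.add_datastream(Datastream("IMAGE", "image/jpeg", b"\xff\xd8..."))
print(sorted(obj.datastreams))
```

Keying datastreams by identifier in a dictionary mirrors the rule that an identifier need only be unique within its own object.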
As noted earlier, a Fedora digital object may have more than one datastream, and each datastream can hold either metadata or content. Each Datastream Identifier is unique within its object. Fedora reserves three Datastream identifiers. Dublin Core (DC) is used by default, is created automatically if one is not provided, and contains metadata about the object. The other two are AUDIT and RELS-EXT. AUDIT records the object’s change audit trail; this Datastream is under system control and cannot be edited. RELS-EXT provides relationship description information. A Fedora object may also contain custom (user-defined) Datastreams.
Appendix B contains basic Fedora Datastream properties.
is an example of an object’s system properties.
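The reserved-datastream rules above can be sketched as follows, assuming datastreams are held in a plain dictionary keyed by identifier. The function names are hypothetical, and the minimal Dublin Core record (title taken from the label, identifier taken from the PID) is only an approximation of what Fedora generates automatically.

```python
from xml.sax.saxutils import escape

RESERVED_IDS = {"DC", "AUDIT", "RELS-EXT"}
SYSTEM_CONTROLLED = {"AUDIT"}  # maintained by the repository, never edited by users


def with_default_dc(datastreams: dict, pid: str, label: str) -> dict:
    """Return a copy of datastreams, adding a minimal Dublin Core record if none exists."""
    result = dict(datastreams)
    if "DC" not in result:
        result["DC"] = (
            '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
            'xmlns:dc="http://purl.org/dc/elements/1.1/">'
            f"<dc:title>{escape(label)}</dc:title>"
            f"<dc:identifier>{escape(pid)}</dc:identifier>"
            "</oai_dc:dc>"
        )
    return result


def set_datastream(datastreams: dict, ds_id: str, content: str) -> None:
    """Create or replace a datastream, refusing to touch system-controlled ones."""
    if ds_id in SYSTEM_CONTROLLED:
        raise PermissionError(f"{ds_id} is under system control and cannot be edited")
    datastreams[ds_id] = content


def custom_ids(datastreams: dict) -> list:
    """List the user-defined (non-reserved) datastream IDs."""
    return sorted(ds_id for ds_id in datastreams if ds_id not in RESERVED_IDS)


streams = with_default_dc({"IMAGE": "..."}, "demo:1", "Sample & test")
set_datastream(streams, "RELS-EXT", "<rdf:RDF/>")
print(custom_ids(streams))
```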
Users access the repository via four APIs (Application Programming Interfaces).
are used to access the
APIs of the Fedora repository.
HTTP is used to access the
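As a sketch of HTTP access, the code below builds a datastream URL and retrieves it with Python’s standard library. Both the base URL `http://localhost:8080/fedora` and the path pattern `/objects/{pid}/datastreams/{dsID}/content` are assumptions modelled on the Fedora 3.x REST interface; confirm them against the documentation for the installed version, which may not expose this interface.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed deployment location; adjust for your server.
BASE_URL = "http://localhost:8080/fedora"


def datastream_content_url(pid: str, ds_id: str, base_url: str = BASE_URL) -> str:
    """Build a URL for a datastream's content (assumed Fedora 3.x-style REST path)."""
    return (
        f"{base_url.rstrip('/')}/objects/{quote(pid, safe=':')}"
        f"/datastreams/{quote(ds_id, safe='')}/content"
    )


def fetch_datastream(pid: str, ds_id: str, base_url: str = BASE_URL,
                     timeout: float = 10.0) -> bytes:
    """Retrieve datastream content with a plain HTTP GET."""
    with urlopen(datastream_content_url(pid, ds_id, base_url), timeout=timeout) as response:
        return response.read()


print(datastream_content_url("demo:1", "DC"))
```

Percent-encoding the PID and datastream ID keeps unusual characters from breaking the path, while the colon in a PID is left as-is.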
Sites using Fedora
which has many web-accessible implementation examples
, Fedora has
The best resource for these
the National Science Digital Library contains “resources for science, technology, engineering, and mathematics education and …”, which “promote active, inquiry-based teaching and learning”.
Examples of other institutions that use Fedora are (1) Tufts University: The Digital Collections and Archives Department, (2) University of Virginia Library, (3) RUcore (Rutgers Community Repository), (4) Irish Virtual Research Library and Archive (IVRLA), and (5) Digital Collections at the University of Maryland.
Appendix C contains a list of Fedora implementations.
There is an online video, in either WMV or … format, that describes in detail how the Encyclopedia of Chicago uses Fedora. The video is titled “Access and Management Stories From the Fedora Community: Digital Libraries and Collections”, and the page containing these videos is on the Fedora Commons website.
Screenshot of Fedora Video Part 1
Features from the
For the purposes of this paper, the University of Hull Fedora implementation was reviewed.
As indicated in Figure 6, users may browse by collection, title, subject, author, year, latest additions, and … These options appear as tabbed navigation (see Figure 7) once any browse option has been selected. There is currently only one page of collections.
Browse by Collection
Referring again to Figure 6, the site offers search and advanced search by terms, in addition to sorting by date added and by date modified.
Advanced search options include field criteria combined with Boolean AND/OR operations. You may conduct an advanced search by title, author, description, format, language, and subject.
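The following Python sketch shows one way field criteria and Boolean AND/OR operations can combine. It is not the University of Hull implementation; the record layout, field names, and sample records are hypothetical, and matching is a simple case-insensitive substring test.

```python
from typing import Iterable, List, Tuple


def field_matches(record: dict, field: str, term: str) -> bool:
    """Case-insensitive substring match of term within one metadata field."""
    return term.lower() in str(record.get(field, "")).lower()


def advanced_search(records: Iterable[dict], criteria: List[Tuple[str, str]],
                    operator: str = "AND") -> List[dict]:
    """Return records satisfying all (AND) or any (OR) of the (field, term) criteria."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    combine = all if operator == "AND" else any
    return [r for r in records if combine(field_matches(r, f, t) for f, t in criteria)]


# Hypothetical sample records for illustration.
records = [
    {"title": "Computer lab, 1985", "format": "image/jpeg", "language": "en"},
    {"title": "Library reading room", "format": "image/tiff", "language": "en"},
    {"title": "Computer cluster", "format": "image/tiff", "language": "fr"},
]
hits = advanced_search(records, [("title", "computer"), ("format", "tiff")], "AND")
print([r["title"] for r in hits])
```

Switching the operator from AND to OR widens the result set from records that meet every criterion to records that meet at least one.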
Figure 8 shows the results of searching the Images
collection for the term “computer”.
Search Images Collection
You may sort the Images collection only by “name” or “last updated”. The result of selecting one of the content images is displayed in Figure 8. You may view or download the image or the image …
This paper focused on image libraries. However, I did compare the object for a PhD education dissertation to see the difference in the metadata and object property information collected. They did in fact differ, and the latter appeared more comprehensive.
Phil Cryer, in his SlideShare presentation “Using Fedora Commons To Create A Persistent Archive”, provides the following advantages of the repository:
Online collection publication
The digital collection is sustainable
Best practices for storage and sharing are ensured due to …
No new methodologies need to be adopted
The support and development community is active
The software is open source
He states the following disadvantages of the repository:
The learning curve is steep
It can be difficult to import existing …
Initial development took longer than documentation
Difficult to get the site up quickly since there was no simple web-based front end.
(Note: this analysis may be pre
Fedora is a robust repository. It does take a bit of work to install and configure. I did not test the ability to “ingest” (place an object in the repository). It appears to take a lot of time to perform a custom installation and to define content models. If your organization has the technical expertise to install, configure, and maintain (administer) a Fedora installation, I think its scalability, attention to preservation, ability to create custom metadata and define objects, and remote/distributed capability make it a viable digital repository solution. That said, since it is an open-source product, what you save in software expense, you may spend in …
Next steps for me are to attempt a Fedora installation.
References
Access and Management Stories from the Fedora Community: Digital Libraries and Collections. (n.d.). Retrieved January 25, 2009, from the Fedora Commons Web site
Arms, W. (2000). Digital Libraries: Chapter 1. Retrieved January 17, 2009, from the Cornell University Computer Science Web site
Cryer, P. (n.d.). Using Fedora Commons To Create A Persistent Archive. Retrieved February 2, 2009, from SlideShare
eDocs. (n.d.). Retrieved January 29, 2009, from the University of Hull Web site
Fedora Commons Web site. (n.d.). Retrieved January 25, 2009
Fedora Tutorial #1: Introduction to Fedora. (2008, July 23). Retrieved January 25, 2009
Green, R. (2006, February 13). RepoMMan Project. Retrieved February 2, 2009, from the University of Hull Web site
History of the “Fedora Project” and “Fedora Commons” Names. (n.d.). Retrieved January 25, 2009
Installation and Configuration Guide. (2009). Retrieved February 2, 2009
Lesk, M. (2005). Understanding Digital Libraries. San Francisco, CA: Morgan Kaufmann Publishers.
Rinehart, R. (2008, July 10). Digital art lasts forever or 5 years, whichever comes first. Retrieved January 29, 2009, from the Digital Culture Web site
The Fedora™ Phase 2 Andrew W. Mellon Foundation Grant. (2008, June 2). Retrieved … 29, 2009, from the Digital Initiative Web site
Use Cases. (n.d.). Retrieved January 29, 2009
Appendix A
Fedora Development Goals
Provision for persistent
; unique names
for all resources without respect to
Support for inter
based normalization (
) of heterogeneous content and metadata
by administrators of data, metadata,
supporting programs, and services and tools which make data and metadata presentat
by means of a standard protocol to information about
objects and access to object content
It provides support for more than 10 million objects
provisions for flexible authentication and policy enforcement
provision for longevity and archival support, including XML
object serialization and content versioning
Content (object) repurpose and reuse
including object content being present in any number of contexts within the repository; object repurposing allows dynamic transformations to fit new presentation requirements
Ability of digital objects to disseminate launch pads or tools (…)
Appendix B
Fedora Object Datastream Basic Properties
Datastream Identifier: an identifier for the Datastream that is unique within the digital object (but not necessarily globally unique)
State: the Datastream state of Active, Inactive, or Deleted
Created Date: the date/time that the Datastream was created (assigned by the repository service)
Modified Date: the date/time that the Datastream was modified (assigned by the repository service)
Versionable: an indicator (true/false) as to whether the repository service should version the Datastream. By default the repository versions all Datastreams.
Label: a descriptive label for the Datastream
MIME Type: the MIME type of the Datastream (required)
Format Identifier: an optional format identifier for the Datastream. Examples of emerging schemes are PRONOM and the Global Digital Format Registry (GDFR).
Alternate Identifiers: one or more alternate identifiers for the Datastream. Such identifiers could be local identifiers or global identifiers such as Handles or DOI.
Checksum: an integrity stamp for the Datastream, which can be calculated using one of many standard algorithms (such as MD5 or SHA-1)
Bytestream Content: the "stuff" the Datastream is about (such as a document, digital image, video, metadata record)
Control Group: pertaining to the bytestream content, a new Datastream can be defined as one of four types, or control groups, as follows:
Internal XML Metadata
Managed Content
External Referenced Content
Redirect Referenced Content
Internal XML Metadata
In this case, the Datastream will be stored as XML that is actually stored inline within the digital object XML file.
Managed Content
In this case, the Datastream content will be stored in the Fedora repository and the digital object XML file will store an internal identifier to that Datastream.
External Referenced Content
In this case, the Datastream content will be stored outside of the Fedora repository, and the digital object will store a URL to that Datastream.
Redirect Referenced Content
In this case, the Datastream content is also stored outside the repository and the digital object points to its URL ("by-reference").
Source: Fedora Tutorial #1, page 13
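The control groups and the checksum property can be modelled together, as in this hedged Python sketch. The single-letter codes X, M, E, and R are assumed to follow Fedora 3.x FOXML usage; the class names and validation rules are hypothetical. Checksums are computed with Python’s `hashlib` and only for content held locally (X and M), since external content would have to be fetched first.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ControlGroup(Enum):
    """Four control groups; letter codes assumed from Fedora 3.x FOXML usage."""
    INLINE_XML = "X"   # stored inline in the digital object XML file
    MANAGED = "M"      # stored in the repository; object keeps an internal identifier
    EXTERNAL = "E"     # stored outside the repository; object stores its URL
    REDIRECT = "R"     # stored outside the repository; object points to its URL


def compute_checksum(content: bytes, algorithm: str = "MD5") -> str:
    """Hex digest via hashlib; "SHA-1" and "sha1" are treated alike."""
    return hashlib.new(algorithm.replace("-", "").lower(), content).hexdigest()


@dataclass
class DatastreamRecord:
    ds_id: str
    mime_type: str
    control_group: ControlGroup
    content: Optional[bytes] = None   # required for X and M
    location: Optional[str] = None    # required for E and R
    checksum_type: str = "MD5"
    checksum: Optional[str] = None

    def __post_init__(self) -> None:
        local = self.control_group in (ControlGroup.INLINE_XML, ControlGroup.MANAGED)
        if local:
            if self.content is None:
                raise ValueError(f"{self.ds_id}: group {self.control_group.value} needs content")
            # Only locally held bytes can be stamped without fetching anything.
            self.checksum = compute_checksum(self.content, self.checksum_type)
        elif not self.location:
            raise ValueError(f"{self.ds_id}: group {self.control_group.value} needs a URL")

    def verify(self) -> bool:
        """Recompute the checksum and compare it with the stored integrity stamp."""
        if self.content is None or self.checksum is None:
            return False
        return compute_checksum(self.content, self.checksum_type) == self.checksum


# Hypothetical example datastreams.
dc = DatastreamRecord("DC", "text/xml", ControlGroup.INLINE_XML, content=b"<oai_dc:dc/>")
scan = DatastreamRecord("TIFF", "image/tiff", ControlGroup.EXTERNAL,
                        location="http://example.org/scan.tif")
print(dc.checksum, dc.verify(), scan.checksum)
```

Recomputing and comparing the digest is what makes the checksum useful as an integrity stamp: any change to the stored bytes makes `verify` return False.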
Appendix C
National Science Digital Library
The RepoMMan Project
Project of the Max Planck Society and FIZ Karlsruhe
Cornell University Library
Digital Collections at the University of Maryland
Irish Virtual Research Library and Archive
RUcore (Rutgers Community Repository)
Tufts University: The Digital Collections and Archives Department
University of Virginia Library
The Encyclopedia of Chicago
Australian Research Repositories Online to the World (ARROW)
University of Hull
Tibetan and Himalayan Digital Library
Public Library of Science's PLoS ONE
Timeline (Adapted from