60639_S04 - Kent State University


Dec 14, 2013


Models, Architectures, and Technologies of Digital Libraries (2)
Session 4
LIS 60639 Implementation of Digital Libraries
Dr. Yin Zhang
1. Important protocols for digital libraries
Rhyno (2004): Ch. 2 Important protocols for digital libraries and OSS options for using them
What is a protocol and why?

Digital libraries usually are called on to communicate with many
different external systems.

These duties can range from delivering Web-based interfaces for remote
users to exposing content to third-party applications.

Certain interactions are so common or have so many requirements that a
protocol has been established for standardizing and streamlining the
process.

A protocol is a set of ground rules for how systems carry out specific activities.

Protocols often define which format and syntax systems use for
exchanging information and what one system must indicate to
another before any data is made available.
Core protocols for DL projects (1)

The Hypertext Transfer Protocol (HTTP) powers the Web and is the protocol that most Web users interact with when using a Web browser.

HTTP's ability to be plugged into many different types of technologies is shown in
Figure 2.3.

Most Web users are unaware of how many hoops the content delivered to their browsers has been through. With the use of a gateway, HTTP also can be the basis for interacting with many other types of protocols.

A gateway takes the results of one protocol and translates them to fit the requirements of a different protocol or application; for example, taking the results of an HTML form and using the values to formulate a query to a remote database.

One such gateway is CGI (Common Gateway Interface), a specification introduced in 1994 to allow HTML content to be created dynamically.
The ubiquitous nature of HTTP is a testimony to both its
simplicity and extensibility. A more complex protocol
would be harder to map to other applications.
As a result, HTTP became firmly entrenched in the
toolkits of application developers at an early stage of the
Web's development and remains there today.
http://www.w3.org/Protocols/
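The gateway idea described above (HTML form values in, database query out) can be sketched in a few lines of Python. This is a minimal illustration, not CGI itself: it assumes the form values arrive as a URL-encoded query string, as they would in a GET request, and it uses an in-memory SQLite table as a stand-in for the remote database. The table layout, the sample rows, and the `make_catalogue` and `gateway` names are all hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs


def make_catalogue() -> sqlite3.Connection:
    """Hypothetical stand-in for the remote database behind the gateway."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (title TEXT, creator TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("Introduction to digital libraries", "Author A", 2003),
            ("Digital preservation primer", "Author B", 2004),
            ("Metadata handbook", "Author C", 2008),
        ],
    )
    return conn


def gateway(query_string: str, conn: sqlite3.Connection) -> list:
    """Translate HTML form values (HTTP side) into a SQL query (database side)."""
    form = parse_qs(query_string)          # e.g. "title=digital&year=2004"
    clauses, params = [], []
    if "title" in form:
        clauses.append("title LIKE ?")
        params.append(f"%{form['title'][0]}%")
    if "year" in form:
        clauses.append("year = ?")
        params.append(int(form["year"][0]))
    sql = "SELECT title, creator, year FROM records"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Form values are bound as parameters, never pasted into the SQL text.
    return conn.execute(sql + " ORDER BY year", params).fetchall()
```

For example, `gateway("title=digital", make_catalogue())` returns the 2003 and 2004 rows. Binding the form values as query parameters, rather than splicing them into the SQL string, is the usual defence against injection when a gateway forwards user input to a database.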
HTTP Software Examples

Web server software guide:
http://webdesign.about.com/cs/webservers/bb/abwebservers.htm

Free web server software
http://en.wikipedia.org/wiki/Category:Free_web_server_software

Apache:

Apache exists to provide a robust and commercial-grade reference implementation of the HTTP protocol.

Apache dominates the Web server world.
Core protocols for DL projects (2)

OAI-PMH
- Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH)

It has been called the "HTTP of digital libraries" even though the protocol actually
uses HTTP as a transport mechanism between digital collections.

OAI-PMH is several years younger than HTTP, with origins in a 1999 meeting in
Santa Fe, New Mexico, to address a series of problems that were occurring in
the e-print server world.

As disciplinary e-print servers became more common, it was difficult to support
searching across multiple repositories.

Repositories needed greater capabilities to automatically identify and copy
papers that had been deposited in other repositories.

The solution was the definition of an interface to permit an e-print server to expose metadata for the papers it held. This would allow the metadata to be picked up by programs on the Web called harvesters.

Harvesting programs travel around a network gathering, or harvesting, content by copying it to a central site.


More in Reading 4.
Core protocols for DL projects (3)

Z39.50 has roots that stretch back to the early 1970s and the Linked Systems Project for searching bibliographic databases and transferring records among the major library institutions (e.g., Library of Congress, OCLC, etc.).

Z39.50 is a protocol that allows a client machine (called an origin) to search a server machine (called a target).

Despite its close association with the library community, Z39.50 is a relatively generic protocol with a rich set of functions for search and retrieval, including the ability to sort result sets and registries of objects such as attribute sets that specify search points.

These search points can be mapped onto the indexes and search capabilities of the underlying server.

Perhaps the best-known attribute set is Bib-1, originally designed for bibliographic resources but now commonly used for a wide range of applications.
Z39.50 (cont. 1)

Bib-1 Attribute Set:

http://www.loc.gov/z3950/agency/defns/bib1.html

http://www.loc.gov/z3950/agency/bib1.html

Bib-1 comprises six types of groupings of attributes, or attribute types, that define a deep level of precision in putting together queries:

Use attributes (type = 1) define the access point:
1 Personal-name, 2 Corporate-name, 3 Conference-name, 4 Title, 5 Title-series, 6 Title-uniform, 7 ISBN, 8 ISSN, …

Relation attributes (type = 2) define the relation of the search term to the values in the database:
1 Less than, 2 Less than or equal, 3 Equal, 4 Greater or equal, 5 Greater than, 6 Not equal, …

Position attributes (type = 3) specify the location of the search term within the field or subfield in which it appears:
1 First in field, 2 First in subfield, 3 Any position in field

Structure attributes (type = 4) specify the type of search term:
1 Phrase, 2 Word, 3 Key, 4 Year, 5 Date (normalized), 6 Word list, 100 Date (un-normalized)

Truncation attributes (type = 5) specify whether one or more characters may be omitted in matching the search term in the target system at the position specified by the Truncation attribute:
1 Right truncation, 2 Left truncation, 3 Left and right truncation, 100 Do not truncate, …

Completeness attributes (type = 6) specify whether the search term represents an incomplete subfield, a complete subfield, or a complete field:
1 Incomplete subfield, 2 Complete subfield, 3 Complete field
Z39.50 (cont. 2)

Because these attributes correspond to numbers in the standard, Z39.50-compliant systems can use them to deconstruct queries.

For a search query:

FIND TITLE PROGRAM* OR SUBJECT UNIX
Use attributes (type = 1):
1 Personal-name, 2 Corporate-name, 3 Conference-name, 4 Title, 5 Title-series, 6 Title-uniform, …, 21 Subject heading

Relation attributes (type = 2):
1 Less than, 2 Less than or equal, 3 Equal, 4 Greater or equal, 5 Greater than, 6 Not equal, …

Position attributes (type = 3):
1 First in field, 2 First in subfield, 3 Any position in field

Structure attributes (type = 4):
1 Phrase, 2 Word, 3 Key, 4 Year, 5 Date (normalized), 6 Word list, 100 Date (un-normalized), …

Truncation attributes (type = 5):
1 Right truncation, 2 Left truncation, 3 Left and right truncation, 100 Do not truncate, …

Completeness attributes (type = 6):
1 Incomplete subfield, 2 Complete subfield, 3 Complete field
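To make the decomposition concrete, the sketch below turns the example query into Prefix Query Format (PQF), the textual notation the YAZ toolkit uses for Type 1 queries. It is an assumption-laden illustration: it understands only FIND with TITLE and SUBJECT access points joined by AND/OR, maps a trailing `*` to right truncation (type 5, value 1), and leaves relation, position, structure, and completeness unspecified (how a target treats omitted attributes is server-dependent). The function names are hypothetical.

```python
# Illustrative subset of Bib-1 values taken from the tables above.
USE_ATTRIBUTES = {"TITLE": 4, "SUBJECT": 21}   # attribute type 1 (Use)
RIGHT_TRUNCATION = 1                           # attribute type 5 (Truncation), value 1


def term_to_pqf(field: str, term: str) -> str:
    """Render one search term with its Bib-1 attributes in PQF."""
    attrs = [(1, USE_ATTRIBUTES[field.upper()])]
    if term.endswith("*"):                     # PROGRAM* -> right truncation
        attrs.append((5, RIGHT_TRUNCATION))
        term = term[:-1]
    return " ".join(f"@attr {t}={v}" for t, v in attrs) + " " + term


def find_to_pqf(command: str) -> str:
    """Convert 'FIND field term [AND|OR field term]...' into a PQF string."""
    tokens = command.split()
    if not tokens or tokens[0].upper() != "FIND":
        raise ValueError("expected a FIND command")
    tokens = tokens[1:]
    if len(tokens) < 2 or len(tokens) % 3 != 2:
        raise ValueError("expected: FIELD TERM [(AND|OR) FIELD TERM]...")
    expr = term_to_pqf(tokens[0], tokens[1])
    for i in range(2, len(tokens), 3):
        op, field, term = tokens[i:i + 3]
        if op.upper() not in ("AND", "OR"):
            raise ValueError(f"unsupported operator {op!r}")
        # PQF is prefix notation; operators are applied left to right.
        expr = f"@{op.lower()} {expr} {term_to_pqf(field, term)}"
    return expr
```

For the query above, `find_to_pqf("FIND TITLE PROGRAM* OR SUBJECT UNIX")` yields `@or @attr 1=4 @attr 5=1 PROGRAM @attr 1=21 UNIX`: Use 4 (Title) with right truncation, OR'ed with Use 21 (Subject heading).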
Z39.50 (cont. 3)

Z39.50 is more complex than either HTTP or OAI and is an important
protocol for digital libraries because it is designed to meet the very real
complexities of information retrieval.

It also can be used as a tool to build distributed search services, also known as federated search systems:

The client in a federated system sends a search to all of the servers comprising the
federation.

It can then gather the results and attempt to eliminate duplicates or perform value-added services such as clustering the results under topics. This differs from the harvesting approach used with OAI, which copies entire sets of records (see Figure 2.6).
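A federated client of the kind just described can be sketched as follows. The "servers" here are ordinary Python functions standing in for Z39.50 targets (a real client would open a Z39.50 connection to each), and the rule that two records are duplicates when they share an ISBN is an illustrative assumption; real de-duplication usually needs fuzzier matching. `federated_search`, `library_a`, and `library_b` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Server = Callable[[str], List[Record]]


def federated_search(query: str, servers: Dict[str, Server]) -> List[Record]:
    """Broadcast one query to every server, then merge and de-duplicate."""
    with ThreadPoolExecutor(max_workers=max(len(servers), 1)) as pool:
        futures = {name: pool.submit(search, query) for name, search in servers.items()}
        results = {name: future.result() for name, future in futures.items()}

    merged: Dict[str, Record] = {}
    for name, records in results.items():
        for record in records:
            key = record["isbn"]               # assumption: same ISBN => duplicate
            if key in merged:
                merged[key]["sources"] += "," + name
            else:
                merged[key] = {**record, "sources": name}
    return list(merged.values())


# Hypothetical targets standing in for remote Z39.50 servers.
def library_a(query: str) -> List[Record]:
    return [{"isbn": "111", "title": f"{query} basics"},
            {"isbn": "222", "title": f"Advanced {query}"}]


def library_b(query: str) -> List[Record]:
    return [{"isbn": "222", "title": f"Advanced {query}"}]
```

`federated_search("unix", {"A": library_a, "B": library_b})` returns two records; the shared one is reported once with `"sources": "A,B"`. Note the contrast with harvesting: nothing is copied in advance, so every search waits on every server.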
Z39.50 Software

Z39.50 is an abstract layer on top of an existing system, so it isn't surprising
that most Z39.50 tools are architected to work on top of other applications.

Suggested by Library of Congress:
http://www.loc.gov/z3950/agency/resources/software.html

Free Software

Commercial Software

A few open source applications are suggested in this chapter (see Table 2.5).
Other protocols for DL projects (4)

Some protocols are widely supported outside the digital library community:

SOAP: Simple Object Access Protocol (SOAP)

It combines XML with HTTP for accessing services, objects, and servers.

It is a lynchpin of a suite of technologies called Web Services that leverages the Web for delivering application functions in a well-defined manner.

SOAP allows a great deal of information to be passed to an application, and it leverages
XML for laying out the data that goes between DL applications.

RSS: RDF Site Summary (RSS)

It is an XML-based format that allows simultaneous publication, or syndication, of lists of hyperlinks, along with other information or metadata, that help viewers decide whether they want to follow a link.

Shibboleth: http://shibboleth.internet2.edu/

Shibboleth is an authentication and authorization project under the auspices of Internet2, a consortium of universities working in partnership with industry vendors and government agencies to develop and deploy advanced network applications and technologies.

The Shibboleth System is a standards-based, open source software package for web single sign-on across or within organizational boundaries. It allows sites to make informed authorization decisions for individual access of protected online resources in a privacy-preserving manner.
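Because RSS 1.0 (RDF Site Summary) is plain XML, a consumer needs only an XML parser to pull out the list of hyperlinks and the descriptive text that helps a reader decide whether to follow each one. The sketch below assumes an RSS 1.0 feed, whose elements live in the `http://purl.org/rss/1.0/` namespace inside an `rdf:RDF` root; the feed content, the example.org URLs, and the `list_items` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Made-up RSS 1.0 feed; in RSS 1.0 the <item> elements are siblings of <channel>.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rdf">
    <title>Example DL news</title>
    <link>http://example.org/</link>
    <description>New items in an example digital library</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/items/1"/>
        <rdf:li rdf:resource="http://example.org/items/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/items/1">
    <title>Digitized map collection</title>
    <link>http://example.org/items/1</link>
    <description>Scanned maps, with Dublin Core metadata.</description>
  </item>
  <item rdf:about="http://example.org/items/2">
    <title>Oral history recordings</title>
    <link>http://example.org/items/2</link>
  </item>
</rdf:RDF>"""


def list_items(feed_xml: str) -> list:
    """Return (title, link, description) for each syndicated item."""
    root = ET.fromstring(feed_xml)
    return [
        (
            item.findtext("rss:title", default="", namespaces=NS),
            item.findtext("rss:link", default="", namespaces=NS),
            item.findtext("rss:description", default="", namespaces=NS),
        )
        for item in root.findall("rss:item", NS)
    ]
```

Items without a description simply yield an empty string, so a display layer can decide how to present them.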
Discussion and Reflection

Summary:

Protocols make network systems work together and are the basis of
many formal communications.

Digital libraries depend on protocols, particularly HTTP, OAI-PMH, and
Z39.50, to provide services. Think of

HTTP as the highway between digital libraries, with

OAI as a friendly but comprehensive census taker that periodically turns up on the
highway for updates on changes in the collection, and

Z39.50 as a sometimes more demanding visitor asking for less predictable and more
specific information on the collection.

SOAP, RSS, and Shibboleth promise to further enhance and expand the boundaries of
digital library services.

Issues raised in this reading

How such issues are addressed in your DL case
2. Interoperability: Standards and protocols
Witten & Bainbridge (2003): 8.5-8.7 in Ch. 8 Interoperability: Standards and protocols
Interoperability

Interoperability is the name of the game for libraries. An important part of traditional library culture is the ability to locate copies of information in other libraries and receive them on loan (interlibrary loan). Libraries work together to provide a truly universal international information service. The degree of cooperation is enormous and laudable.

For digital libraries to communicate with one another, standards are needed for representing documents, metadata, and queries.

The components are in place. What we need are protocols that put them all together to achieve effective and widespread communication.

Different protocols have sprung from the two different cultures upon which digital libraries are founded. Two principal ones:

the Z39.50 protocol developed by the library community and maintained by the Library of Congress, and

the Open Archives Initiative (OAI) protocol, developed by members of various communities concerned with electronic documents.
Supporting the Z39.50 protocol

A particular Z39.50 system need not implement all parts of the protocol. The
protocol is so complex that full implementation is a daunting undertaking and
may in any case be inappropriate for a particular digital library site.

For this reason the standard specifies a minimal implementation, which
comprises the

Initialization Facility,

Search Facility,

Present Service (part of the Retrieval Facility), and

Type 1 Queries (part of the registry).

Using this baseline implementation, a typical client-server exchange works as
follows:

First the client uses the Initialization Facility to establish contact with the server and negotiate values for certain resource limits.

This puts the client in a position to transmit a Type 1 query using the Search Facility.

The number of matching documents is returned, and the client then interacts with the Present Service to access the contents of desired documents.

Greenstone DL software supports Z39.50
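The ordering of the baseline exchange described above (Init, then a Type 1 Search, then Present) can be illustrated with a toy in-memory target. This is not a Z39.50 implementation: real Z39.50 messages are ASN.1/BER-encoded and travel over a network connection. The `ToyTarget` class, its `index` dictionary, and the `max_records_per_present` limit are hypothetical simplifications of the negotiated resource limits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTarget:
    """In-memory stand-in for a Z39.50 target (no ASN.1/BER, no network)."""
    index: Dict[str, List[str]]                # hypothetical: term -> document ids
    max_records_per_present: int = 3           # hypothetical resource limit
    initialized: bool = False
    result_sets: Dict[str, List[str]] = field(default_factory=dict)

    def init(self, requested_limit: int) -> int:
        """Initialization: agree on a limit no larger than the target allows."""
        self.max_records_per_present = min(requested_limit, self.max_records_per_present)
        self.initialized = True
        return self.max_records_per_present

    def search(self, term: str, result_set: str = "default") -> int:
        """Search: build a named result set on the target and return only the hit count."""
        if not self.initialized:
            raise RuntimeError("Search sent before Init")
        self.result_sets[result_set] = list(self.index.get(term, []))
        return len(self.result_sets[result_set])

    def present(self, start: int, count: int, result_set: str = "default") -> List[str]:
        """Present: return records from the result set (1-based start position)."""
        if result_set not in self.result_sets:
            raise RuntimeError("Present sent before Search")
        count = min(count, self.max_records_per_present)
        return self.result_sets[result_set][start - 1:start - 1 + count]
```

A session might run `target.init(requested_limit=2)`, then `target.search("unix")` (which reports only the number of hits), then `target.present(start=1, count=10)`, which returns at most two records because of the negotiated limit. Keeping the result set on the target is what lets the client fetch records in pages.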
Supporting the Open Archives Initiative (OAI)

For a given digital library site to become an OAI data provider, software needs to be
written that can respond to CGI requests and access the database system that stores
the documents.

Many programming languages have library support for implementing CGI scripts (Perl, Python, Java, and C++, among others), although the database itself will probably dictate the most suitable choice.

Greenstone can support the construction of a digital library collection based on OAI-exported data by the following two steps:
1.
obtaining the raw material from a data provider and configuring a suitable collection
2.
augmenting the collection configuration file with a built-in OAI plugin

With the issuing of the appropriate import.pl and buildcol.pl commands, the end result of these two stages is a searchable, browsable Greenstone collection based on the exported content.

Further configuration of indexes and classifiers is possible depending on the metadata
available.
Research protocols – (1) Dienst


Dienst and SDLIP are two long-standing digital library protocols from the research community that are designed to promote interoperability.

The trouble with interoperability though is that the purpose is
defeated if several groups promote different interoperability
schemes.

Dienst
- Dienst, at Cornell University, is one of the longest-running
digital library projects in the research community: its origins stretch
back to 1992. It has three facets:

a conceptual architecture for distributed digital libraries,

an open protocol for service communication, and

a software system that implements the protocol.
Research protocols – (1) Dienst (cont.)

The protocol supports

search and retrieval of documents,

browsing documents,

adding new documents, and

registering users. Each of these is an independent service.

There are six categories of DL collection services:

repository services store digital documents and associated metadata;

index services accept queries and return lists of document identifiers;

query mediator services dispatch queries to the relevant index servers;

info services return information about the state of a server;

collection services provide information on how a set of services interact;

registry services store user information.

(2) Simple digital library interoperability protocol (SDLIP)

Interoperation among distributed objects has been a central plank of Stanford University's digital library project, the InfoBus.

Many InfoBus objects are in fact proxies to established information sources and services.

The original Digital Library Interoperation Protocol (DLIP) has since been superseded by the Simple Digital Library Interoperability Protocol (SDLIP), designed in collaboration with other U.S. research projects.

SDLIP places emphasis on a design that is scalable, permitting the development of digital library applications that run on handheld devices (such as Palm Pilots) as well as workstation- and mainframe-based systems.

There are four parts (called interfaces) to the protocol: searching, accessing results, metadata, and delivery.
Translating between protocols

The Stanford research group provides a Java-based software development kit to support SDLIP.

The translator runs as a server in its own right.

For example, the translator server implements the intersection of the Greenstone protocol and SDLIP's search and source metadata interfaces.
Discussion and Reflection

Summary:

Four digital library protocols: Z39.50, Open Archives Initiative (OAI), Dienst, and SDLIP

all support browsing and document retrieval, and all but OAI support searching

Text searching is relatively well understood: the protocols that support searching all handle ranked and Boolean queries, with a rich array of options such as fielded search, stemming, case matching, and so forth.

Issues raised in this reading

How such issues are addressed in your DL case
3. General purpose technologies useful for digital repositories
Reese & Banerjee (2008): Ch. 4 General purpose technologies useful for digital repositories
The Changing Face of Metadata

The foundation of any digital repository is the underlying metadata structures that
provide meaning to the information objects that it stores.

Libraries have traditionally treated the creation and maintenance of bibliographic
metadata as one of the core values of the profession.

For libraries to truly integrate their digital content, their bibliographic infrastructure
must change dramatically. This change must include both the metadata creation and
delivery methods of bibliographic content.

The days of a homogenous bibliographic standard for all content are coming to an
end as more specialized descriptive formats are needed to describe the various types
of materials being produced today and into the future.

This chapter will focus on the technologies that make up today's digital
repository systems:

XML (eXtensible Markup Language), and

SOAP (Simple Object Access Protocol)
XML in Libraries

The library community has been one of the early
implementers of XML-based descriptive schemas.

Issues of document delivery, indexing, and display have pushed the library community to consider XML-based markup languages as a method of preserving digital and bibliographical information.

Today, libraries make use of XML nearly every day. We
can find XML in the ILS systems, in image management
tools, and in many other facets of the library.
XML in digital repositories

The ability to provide XML-formatted data from one's digital
repository is a valuable access method.

When making decisions regarding a digital repository, one must look
at how well the digital repository supports XML and XML-related
technologies.

One should ask the following questions:

Does the digital repository support XML-structured bibliographic and administrative metadata? Does the digital repository support structural XML-based metadata schemas like METS (Metadata Encoding and Transmission Standard)?

Can the metadata be harvested or extracted? And can the data be extracted in
XML?

Does the digital repository support SOAP or other XML query syntaxes?

Can my digital repository support multiple metadata formats?
Why Use XML-based Metadata?

XML is human readable

One of the primary benefits associated with XML is that the generated metadata
is human readable.

This characteristic of XML (1) makes data more transparent, (2) makes the data
less susceptible to data corruption, and (3) reduces the likelihood of data lockup.

XML offers a quicker cataloging strategy

In many cases, XML-based metadata schemas will lower many of the barriers
organizations currently face when creating bibliographic metadata.

XML can represent multi-formatted and embedded documents

One of XML's strengths is its ability to represent hierarchical data structures and
relationships.

An XML record could be generated that contains information on a single
document available in multiple physical formats with the unique features of each
item captured within the XML data structure.
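As a small illustration of one XML record describing a single document available in several physical formats, the sketch below parses such a record and collects the details recorded for each format. The element and attribute names (`work`, `manifestation`, `format`) and the `describe` function are invented for the example and do not come from MODS, METS, or any other published schema.

```python
import xml.etree.ElementTree as ET

# Invented element names for illustration; not MODS, METS, or any published schema.
RECORD = """<work id="w1">
  <title>Annual report</title>
  <manifestation format="print"><pages>120</pages></manifestation>
  <manifestation format="pdf"><fileSize unit="MB">4.2</fileSize></manifestation>
  <manifestation format="audio"><duration>PT2H</duration></manifestation>
</work>"""


def describe(record_xml: str) -> dict:
    """Return the shared title plus the details recorded for each physical format."""
    root = ET.fromstring(record_xml)
    return {
        "title": root.findtext("title"),
        "formats": {
            m.get("format"): {child.tag: child.text for child in m}
            for m in root.findall("manifestation")
        },
    }
```

The shared description (the title) appears once, while each format keeps its own fields, which is the hierarchical structure a flat record would struggle to express.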
Why Use XML-based Metadata? (continued)

XML metadata becomes “smarter”

In an XML document, metadata fields can have attributes and
properties that can be acted upon.

Data can be manipulated and reordered without having to
rework the source XML document.

The ability to illustrate relationships and interlinks between documents, including the ability to store content, or links to content, within the metadata

XML is not just a library standard

While the LIS community has created XML-based schemas like
MODS, METS, and Dublin Core, the fact that these schemas are
in XML allows libraries to look outside the traditional library
vendors to a broader development community.
Web Services and SOAP

SOAP: the Simple Object Access Protocol

SOAP is a standard method for generating APIs for Web-based
applications.

As a digital repository's content and traffic grow, users of the
repository may want to access the repository's content outside the
traditional user interface.

A digital repository that lacks Web services support greatly reduces
the amount of integration that an organization can accomplish with
its content.

Technologies like SOAP hold the keys to opening a digital repository beyond the "walls" of the application platform, allowing other services like search engines or users to search, harvest, or integrate data from one digital repository into their own context or workflow.
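A SOAP request is an XML envelope whose Body carries the operation being invoked. The sketch below builds a SOAP 1.1 envelope (namespace `http://schemas.xmlsoap.org/soap/envelope/`) for a search call against a repository. The `urn:example:repository` namespace, the `search`, `query`, and `maxResults` element names, and the `build_search_request` function are hypothetical, since each Web service defines its own operations (usually in a WSDL description).

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:repository"                   # hypothetical service namespace


def build_search_request(query: str, max_results: int) -> bytes:
    """Wrap a hypothetical repository 'search' call in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("repo", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}search")           # operation name
    ET.SubElement(call, f"{{{SERVICE_NS}}}query").text = query       # parameters
    ET.SubElement(call, f"{{{SERVICE_NS}}}maxResults").text = str(max_results)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

In SOAP 1.1 such an envelope is typically sent as the body of an HTTP POST with `Content-Type: text/xml` and a `SOAPAction` header, which is how SOAP combines XML with HTTP.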
Discussion and Reflection

Issues raised in this reading

How such issues are addressed in your DL case
4. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

http://www.oaforum.org/tutorial/english/intro.htm

Rhyno (2004): Ch. 2 Important protocols for digital libraries and OSS options for using them
As one of the core protocols for DL projects

OAI-PMH
- Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH)

It has been called the "HTTP of digital libraries" even though the protocol actually
uses HTTP as a transport mechanism between digital collections.

OAI-PMH originated in a 1999 meeting in Santa Fe, New Mexico, to address a
series of problems that were occurring in the e-print server world.

As disciplinary e-print servers became more common, it was difficult to support
searching across multiple repositories.

Repositories needed greater capabilities to automatically identify and copy
papers that had been deposited in other repositories.

The solution was the definition of an interface to permit an e-print server to expose metadata for the papers it held. This would allow the metadata to be picked up by programs on the Web called harvesters.

Harvesting programs travel around a network gathering, or harvesting, content by copying it to a central site.
OAI-PMH (continued 1)

OAI-PMH divides the world into data providers and service providers.

Registered OAI-PMH data providers:

http://www.openarchives.org/Register/BrowseSites

Data providers who support the OAI-PMH may choose to list their repository in the OAI registry, which serves to:

Provide a publicly accessible list of OAI-conformant repositories, making it easy for service providers to discover repositories from which metadata can be harvested. Repositories may also wish to expose a friends container as part of their Identify response as a parallel means for guiding service providers towards repositories from which metadata can be harvested.

Provide a mechanism for data providers to ensure their conformance with the
OAI-PMH specification.

Provide a means for the OAI to monitor use of the protocol and plan future
activities and strategies.
OAI-PMH (continued 2)

Registered OAI-PMH service providers: http://www.openarchives.org/Register/BrowseSites

As of Feb 9, 2009, there were 959 OAI-conforming repositories.

The concept is that service providers add value to the data they harvest by defining search engines and other applications.

Although other metadata schemes can be specified, OAI-PMH mandates that Dublin Core be available.

OAI is purposely designed to be "low barrier" to developers. Relatively
simple criteria are used for harvesting:

date stamps, which identify when resources have last been modified, and

sets, which group together records based on criteria defined by the data provider.
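Selective harvesting by datestamp and set is expressed entirely in the request URL. The sketch below builds OAI-PMH 2.0 request URLs and checks the arguments against the verb; the argument table summarises the OAI-PMH 2.0 rules (including that `resumptionToken` is an exclusive argument), while the base URL, the `physics` set name, and the `oai_request_url` function are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Arguments each OAI-PMH 2.0 verb may carry.
VERB_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
}


def oai_request_url(base_url: str, verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL after basic argument checks."""
    args = dict(args or {})                    # "from" is a Python keyword, so use a dict
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown verb {verb!r}")
    illegal = set(args) - VERB_ARGS[verb]
    if illegal:
        raise ValueError(f"{verb} does not accept {sorted(illegal)}")
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must be the only argument")
    if verb == "GetRecord" and set(args) != {"identifier", "metadataPrefix"}:
        raise ValueError("GetRecord needs identifier and metadataPrefix")
    if verb in ("ListIdentifiers", "ListRecords") and not (
        "metadataPrefix" in args or "resumptionToken" in args
    ):
        raise ValueError(f"{verb} needs metadataPrefix (or a resumptionToken)")
    return base_url + "?" + urlencode({"verb": verb, **args})
```

For example, asking for Dublin Core records changed since 2002-11-01 in one set gives `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-11-01&set=physics`.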
Main Technical Ideas of OAI-PMH (1)

The main ideas of OAI

world-wide consolidation of scholarly archives

free access to the archives (at least: metadata)

consistent interfaces for archives and service providers

low barrier protocol / effortless implementation (e.g., because based on HTTP, XML, DC)

Basic functioning of OAI-PMH

Data Providers (open archives, repositories) provide free access to metadata, and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the harvested data via OAI-PMH.

Service Providers may select certain subsets from Data Providers (e.g., by set hierarchy or datestamp).

Service Providers offer (value-added) services on the basis of the metadata harvested, and they may
enrich the harvested metadata in order to do so.
Main Technical Ideas of OAI-PMH (2)

OAI-PMH: overview and structure model


OAI-PMH supports six request types (known as "verbs"), e.g.,
http://archive.org?verb=ListRecords&metadataPrefix=oai_dc&from=2002-11-01.

Responses are encoded in XML syntax. OAI-PMH supports any metadata format encoded
in XML. Dublin Core is the minimal format specified for basic interoperability.
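A harvester has to pull identifiers, datestamps, and Dublin Core fields out of such XML responses. The sketch below parses a hand-written ListRecords response using the OAI-PMH 2.0, oai_dc, and Dublin Core namespaces, and also recognises deleted records and protocol errors; the sample records, their identifiers, and the `parse_list_records` name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; identifiers and values are made up.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-11-02T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc" from="2002-11-01">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
          <dc:creator>Author A</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:2</identifier>
        <datestamp>2002-11-01</datestamp>
      </header>
    </record>
    <resumptionToken>token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text: str) -> Tuple[List[Dict], Optional[str]]:
    """Return the records on one ListRecords page and its resumptionToken (None if last)."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:                      # e.g. noRecordsMatch, badResumptionToken
        raise RuntimeError(f"{error.get('code')}: {error.text}")
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "deleted": header.get("status") == "deleted",
            "titles": [t.text for t in rec.findall("oai:metadata/oai_dc:dc/dc:title", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)
```

An empty or absent resumptionToken is treated as the end of the list, so the caller knows when to stop requesting further pages.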
Data Provider: prerequisites
These are the things you must, should, or may have in place in order to implement
OAI-PMH as a Data Provider:

metadata on resources ("items")
These should be stored in a database (such as an SQL database). A file system may be
necessary. It is necessary to have a unique identifier for each item.

Web server, accessible via the Internet, e.g. Apache, IIS

programming interface / API (e.g. Perl, PHP, Java-Servlet)

web server extension

access to database (or filesystem)

not needed: session management

archive identifier / base URL


unique identifier for each item


metadata format (one or more; at least: unqualified Dublin Core)

datestamps for metadata (created / last modified)

logical set hierarchy (may have)
This is most useful when agreed within communities, especially subject communities.

flow control by implementation of resumption tokens (optional, but 'larger' repositories should have it)
Data Provider: components and architecture

Components:

Argument Parser validates OAI requests.

Error Generator creates XML responses with encoded error messages.

Database Query / Local Metadata Extraction retrieves metadata from the repository, according to the required metadata format.

XML Generator / Response Creation creates XML responses with encoded metadata information.

Flow Control realises incomplete list sequences for 'larger' repositories. It uses resumption tokens as the control mechanism.
[Figure: example architecture for a Data Provider]
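The Argument Parser and Error Generator components can be sketched together. The verb/argument table and the error codes `badVerb` and `badArgument` follow OAI-PMH 2.0, but the function names, error messages, and base URL are hypothetical, and a real provider would also check `from`/`until` date formats, supported metadata formats, and resumption-token validity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, Tuple
from urllib.parse import parse_qs

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# verb -> (required arguments, optional arguments), per OAI-PMH 2.0
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


class OAIError(Exception):
    """An OAI-PMH protocol error carrying its spec-defined code."""
    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code
        self.message = message


def parse_arguments(query_string: str) -> Tuple[str, Dict[str, str]]:
    """Argument Parser: validate a request or raise OAIError."""
    params = parse_qs(query_string)
    verbs = params.pop("verb", [])
    if len(verbs) != 1 or verbs[0] not in VERBS:
        raise OAIError("badVerb", "Illegal or missing verb")
    verb = verbs[0]
    if any(len(values) != 1 for values in params.values()):
        raise OAIError("badArgument", "Repeated argument")
    args = {name: values[0] for name, values in params.items()}
    if "resumptionToken" in args:
        # resumptionToken is exclusive and only meaningful for list requests
        if verb not in RESUMABLE or len(args) > 1:
            raise OAIError("badArgument", "resumptionToken must be the only argument")
        return verb, args
    required, optional = VERBS[verb]
    missing = required - set(args)
    illegal = set(args) - required - optional
    if missing or illegal:
        raise OAIError("badArgument", f"missing {sorted(missing)}, illegal {sorted(illegal)}")
    return verb, args


def error_response(err: OAIError, base_url: str) -> str:
    """Error Generator: encode an OAIError as an OAI-PMH XML response."""
    ET.register_namespace("", OAI_NS)
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    ET.SubElement(root, f"{{{OAI_NS}}}responseDate").text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    ET.SubElement(root, f"{{{OAI_NS}}}request").text = base_url
    ET.SubElement(root, f"{{{OAI_NS}}}error", code=err.code).text = err.message
    return ET.tostring(root, encoding="unicode")
```

A request handler would call `parse_arguments` on the incoming query string and, if it raises `OAIError`, return `error_response(...)` instead of querying the database.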
Service Provider: prerequisites

There are three technical infrastructure prerequisites for implementing an OAI-PMH Service Provider that will harvest metadata from Data Providers via OAI-PMH:

an Internet-connected server

a database system (relational or XML)

a programming environment.
(The programming environment must be one that can issue HTTP
requests to web servers, can issue database requests, and includes an
XML parser.)
Service Provider: components and architecture


Archive management involves the selection of repositories to be harvested. Entries to your list of repositories to be harvested may be made manually, or you can automatically add or remove archives using the official registry.

Request Component creates HTTP requests and sends them to OAI repositories (Data Providers). It requests metadata using the allowed verbs of the OAI-PMH. It may do selective harvesting using the set parameter.

Scheduler realises timed and regular retrieval of the associated archives. The simplest case would be manual initiation of the jobs, but this can be automated, e.g., as a cron job.

Flow Control is implemented via resumption tokens: the result list is partitioned into incomplete sections, and a new request is made to retrieve each further section. An HTTP 503 (Service Unavailable) response can be analysed to extract a “Retry-After” period.

Update Mechanism realises the consolidation of metadata which have been harvested earlier (merge old and new data). The easiest case would be to delete all ‘old’ metadata from each repository before harvesting it again. A reasonable alternative is to do an incremental update (from parameter): insert new metadata and overwrite changed / deleted metadata (assignment using the unique identifiers).

XML Parser analyses the responses received from the repositories, with validation using the XML schema, and transforms the metadata encoded in XML into the internal data structure.

Normaliser transforms data in different metadata formats into a homogeneous structure. It harmonises representation of, for example, date, author, and language code. It may map between or translate different languages.

Database receives the output of the normaliser, mapping the XML structure of the metadata into a relational database that will handle multiple values of elements. An alternative is to use an XML database.

Duplication Checker merges identical records from different data providers. One possibility for implementing this is by the unique identifier for each item (for example, by URN). However, this solution is often not easily practicable and is not risk- or error-free.

Service Module provides the actual service to the 'public'. The basis for a service provided is the harvested and stored records of the associated archives. That is, it uses only the local database for requests etc., and thus it does not make calls on the Data Providers during operation.
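The Flow Control and Update Mechanism components can be combined in one harvesting loop. In this sketch the network layer is abstracted into a hypothetical `fetch` callable that returns parsed records plus the resumption token (an empty token marks the last page), and that raises a hypothetical `RetryAfter` exception when the provider answers 503 with a Retry-After delay. Records are merged into a dictionary keyed by OAI identifier, standing in for the Database component; `harvest` and its parameters are likewise illustrative.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]      # (parsed records, resumptionToken)


class RetryAfter(Exception):
    """Hypothetical: raised by `fetch` when a provider answers 503 with Retry-After."""
    def __init__(self, seconds: float):
        super().__init__(seconds)
        self.seconds = seconds


def harvest(fetch: Callable[[Dict[str, str]], Page], store: Dict[str, dict],
            since: Optional[str] = None, sleep: Callable[[float], None] = time.sleep,
            max_retries: int = 3) -> None:
    """Incrementally harvest oai_dc records into `store`, keyed by OAI identifier."""
    args = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if since:
        args["from"] = since                 # only records changed since the last run
    retries = 0
    while True:
        try:
            records, token = fetch(args)
        except RetryAfter as busy:           # Flow Control: back off as instructed
            retries += 1
            if retries > max_retries:
                raise
            sleep(busy.seconds)
            continue
        retries = 0
        for rec in records:                  # Update Mechanism, matched on identifier
            if rec["deleted"]:
                store.pop(rec["identifier"], None)
            else:
                store[rec["identifier"]] = rec   # insert new or overwrite changed
        if not token:                        # empty or missing token: last page
            return
        args = {"verb": "ListRecords", "resumptionToken": token}  # exclusive argument
```

In practice `fetch` would issue the HTTP request and parse the XML response, and the Scheduler would call `harvest` with the date of the previous successful run as `since`. Injecting `sleep` keeps the back-off testable without real waiting.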
Basics of XML schemas for OAI-PMH

OAI-PMH uses XML Schemas to define record formats.

OAI-PMH allows for any metadata format, so long as it is encoded in XML
with an XML Schema.

You can exchange any metadata you like using OAI-PMH as long as you
can encode it as XML and define an XML Schema for it.

OAI-PMH mandates the oai_dc schema as a minimum standard for interoperability.

All repositories must support oai_dc for a minimum level of interoperability.

If oai_dc does not have enough elements, you can extend it.

If oai_dc is not precise enough, a qualified Dublin Core schema can be used.

If oai_dc is not the right schema for your community or purpose, then use
something else as well.
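Producing an oai_dc record amounts to wrapping unqualified Dublin Core elements in the `oai_dc:dc` container. The sketch below uses the oai_dc and Dublin Core namespaces and the oai_dc schema location published by the OAI; the input dictionary format, the `oai_dc_record` name, and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The 15 unqualified Dublin Core elements allowed inside oai_dc:dc.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def oai_dc_record(fields: Dict[str, List[str]]) -> str:
    """Serialize a simple field -> values mapping as an oai_dc metadata record."""
    for prefix, uri in (("oai_dc", OAI_DC), ("dc", DC), ("xsi", XSI)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{OAI_DC}}}dc", {
        f"{{{XSI}}}schemaLocation":
            f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    })
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not an unqualified Dublin Core element")
        for value in values:                 # every element is optional and repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `oai_dc_record({"title": ["A sample e-print"], "creator": ["Author A", "Author B"]})` yields one `dc:title` and two `dc:creator` elements. Anything richer than these fifteen elements would be offered under an additional metadataPrefix rather than squeezed into oai_dc.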
OAI Software and Tools

There are many OAI tools available. The following tools were implemented by members of the Open Archives Initiative community (listed at http://www.openarchives.org/pmh/tools/tools.php):

DSpace (HP Labs and MIT Libraries): DSpace is an open source digital asset management software platform that enables institutions to capture and describe digital content. It runs on a variety of hardware platforms and supports OAI-PMH version 2.0.

eprints.org (University of Southampton): Software to run centralised, discipline-based as well as distributed, institution-based archives of scholarly publications. The software is OAI compliant, i.e. metadata can be harvested from repositories running the software using the OAI metadata harvesting protocol.

Fedora (Cornell University): An open source digital repository architecture that allows packaging of content and distributed services associated with that content. Fedora supports OAI-PMH requests on content in the repository.

MARCXML framework (Library of Congress): A suite of tools, stylesheets, guidelines and XML documents to support MARC21 records in the XML environment. Includes tools to support transformation/migration from oai_marc to MARCXML, including an XML schema for MARC21 records.
OAI Software and Tools (cont)

The tools you choose will depend on such considerations as the
type of repository or service you are implementing and the technical
skills available to you in-house:

if you are setting up an e-print archive, you may want to consider using the EPrints software package,

DSpace provides a digital asset management framework that includes preservation considerations, and

the advantage offered by PHP OAI Data Provider is support for on-the-fly output compression aiming at a significant reduction in data transfer load.

In addition, about thirty OAI-related tools are described in the OA-Forum Final Report on Technical Issues (download from http://www.oaforum.org/documents/). This report also includes a detailed comparison of GNU EPrints and DSpace.
Discussion and Reflection

Issues raised in this reading

How such issues are addressed in your DL case