Copying Archives Project Proposal









Ngoni Munyaradzi (MNYNGO001)

Mushashu Lumpa (LMPMUS001)


1. Project Description

The Copying Archives project aims to develop a mechanism for interconnecting various digital systems. The digital systems to be considered are DSpace, EPrints and LOCKSS. DSpace and EPrints are primarily repository systems, and LOCKSS is a preservation system. Interconnecting them would make it possible for institutions to take advantage of the different capabilities that each system offers, with little management overhead.

The objective is to develop a common, generic interchange package that will serve as the means of transferring data from one system to another. The interchange of this package among the systems will, in part, be built on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standards. Plug-ins to ingest this common format into the systems will be developed. These plug-ins should facilitate the update of the digital content of both remotely and locally located systems. The plug-ins will also be a means to export digital content to the common package.
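As an illustration of how OAI-PMH could support both one-off and incremental transfers, the sketch below builds `ListRecords` request URLs. The verb and the `metadataPrefix`, `from` and `set` arguments come from the OAI-PMH specification; the helper name `list_records_url` and the endpoint address are assumptions made for this example only, not part of the proposed design.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `from_date` (a UTC datestamp such as "2010-05-01") limits the harvest
    to records created, changed or deleted since that date, which is what
    an incremental update needs.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
BASE = "http://repository.example.org/oai/request"

full_harvest = list_records_url(BASE)                         # one-off full migration
incremental = list_records_url(BASE, from_date="2010-05-01")  # changes since a date
```

A full migration would issue the first request once, while an incremental update would repeat the second with the datestamp of the last successful transfer.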

2. Problem Statement

2.1 Research Questions

The main research questions to be answered by the end of the project are:

• Is it possible to use a generic package format for batch and incremental import/export from various repository systems that is accurate, efficient and secure?

• Is it possible to create an accurate and efficient batch and incremental import and export interface for LOCKSS, utilising the generic package format?

• Is it possible to create an accurate and efficient batch and incremental import and export interface for DSpace and EPrints, using the generic package format?

2.2 Motivation

Projects like StoneD [1] and TIPR [2] highlight the need for interoperable heterogeneous systems. The development of OAI-PMH is another indicator of the need for interoperability among digital systems. Achieving such interoperability contributes to the preservation of digital content [9], an important aspect of managing digital systems.

Enabling interoperability of heterogeneous systems allows institutions to take advantage of the varying capabilities of the digital systems available.

Research proposals to interoperate digital systems [8][10][11][12] are another indication of the need, and a motivation for this project, as its success could be a notable contribution to the digital libraries community.

2.3 Requirements

The Copying Archives project is both a software engineering project and a research project: it seeks to provide a solution for Stellenbosch University's library department and, at the same time, to answer the research questions listed above. The system requirements are:

• Develop a mechanism that makes LOCKSS, EPrints and DSpace interoperate in a distributed environment. Meeting this requirement will mean that digital content can be migrated from one of the digital systems into another.

• Migrated digital content must maintain its properties in the target system. This will ensure that the rights attached to digital objects remain the same and are treated as in the original system; that is, embargoed resources will remain embargoed from system to system.

• Digital content interchange among the systems should require minimal to no user intervention. This is to make the developed mechanism user friendly and less laborious to manage, since librarians with little technical knowledge will be managing the system.


3. Procedures and Methods

The digital systems will be made interoperable through the use of a generic interchange package. The digital systems will export their data into a format that conforms to the predefined generic interchange package. For reasons based on the architectures of DSpace, EPrints and LOCKSS, the project will be divided into two components, namely LOCKSS to a Common Format, and EPrints and DSpace to a Common Format.

3.1 LOCKSS to Common Format

A plug-in for the LOCKSS system will be developed that will export data to the common format. It will also handle the ingestion of the same package back into LOCKSS. The plug-in will also handle communication between remotely and locally placed systems (DSpace to and from LOCKSS, and EPrints to and from LOCKSS).

3.2 DSpace and EPrints to a Common Format

As in 3.1 above, plug-ins will be developed for DSpace and EPrints that will export data into the common predefined package, and also ingest a similar package back into DSpace and EPrints.

Overall, the relationship between the different systems will be as shown in Figure 1: DSpace, EPrints and LOCKSS will export and ingest data based on a common interchange package.


Figure 1. Copying Archives components and their relationship
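The relationship in Figure 1 can be sketched in code as a common package type plus one adapter per system. Everything named here (`InterchangePackage`, `RepositoryAdapter`, `InMemoryRepository`, `migrate`) is hypothetical and chosen for illustration; the actual package format is still to be designed, and real plug-ins would talk to DSpace, EPrints and LOCKSS rather than to an in-memory stand-in.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InterchangePackage:
    """Hypothetical common package: one digital object plus its properties."""
    identifier: str
    metadata: Dict[str, str]
    content: bytes
    embargoed: bool = False


class RepositoryAdapter(ABC):
    """Interface that each plug-in (DSpace, EPrints, LOCKSS) would implement."""

    @abstractmethod
    def export_packages(self) -> List[InterchangePackage]:
        ...

    @abstractmethod
    def ingest(self, package: InterchangePackage) -> None:
        ...


class InMemoryRepository(RepositoryAdapter):
    """Stand-in for a real system, so the sketch runs without one."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: Dict[str, InterchangePackage] = {}

    def export_packages(self) -> List[InterchangePackage]:
        return list(self.items.values())

    def ingest(self, package: InterchangePackage) -> None:
        self.items[package.identifier] = package


def migrate(source: RepositoryAdapter, target: RepositoryAdapter) -> int:
    """Copy every package from source to target and return how many were moved."""
    packages = source.export_packages()
    for package in packages:
        target.ingest(package)
    return len(packages)


dspace = InMemoryRepository("dspace")
lockss = InMemoryRepository("lockss")
dspace.ingest(InterchangePackage("item-1", {"title": "Thesis"}, b"pdf bytes", embargoed=True))
moved = migrate(dspace, lockss)
```

Because every system exchanges the same package type, adding a further system would only require one more adapter rather than a converter for each pair of systems.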


3.3 Evaluation

The system's main objective is to transfer digital objects from one system to another. Tests to be carried out on the system will include:

• Data migration consistency and integrity tests: these will verify whether any attributes of a digital object have been lost. They will also check that embargoed items remain embargoed across all systems.

• Efficiency tests: the time taken to migrate digital content from one system to another will be measured. Since content will be migrated in two modes, online incremental mode and one-off full data migration mode, tests will be carried out to ascertain that neither takes unacceptably long to complete.

• System usability tests: the requirement that the system be easily manageable by less technical personnel will be tested by observing and interviewing testers about their experiences with using the system.

Success in these tests will mean that the developed solution not only solves the problem but is also acceptable to the clients.
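A consistency and integrity test of the kind described above could look roughly like the following sketch. It assumes, purely for illustration, that a digital object can be represented as a dictionary with `content`, `metadata` and `embargoed` keys; the function names are hypothetical, and SHA-256 is one possible choice of checksum rather than a decision made in this proposal.

```python
import hashlib


def checksum(content: bytes) -> str:
    """SHA-256 digest of an object's content, as a hex string."""
    return hashlib.sha256(content).hexdigest()


def compare_objects(source: dict, target: dict) -> list:
    """Return a list of problems found; an empty list means the copy is consistent."""
    problems = []
    if checksum(source["content"]) != checksum(target["content"]):
        problems.append("content checksum differs")
    for key, value in source["metadata"].items():
        if target["metadata"].get(key) != value:
            problems.append(f"metadata field lost or changed: {key}")
    if source["embargoed"] and not target["embargoed"]:
        problems.append("embargo was dropped")
    return problems


original = {
    "content": b"thesis",
    "metadata": {"title": "Thesis", "rights": "restricted"},
    "embargoed": True,
}
good_copy = dict(original)
bad_copy = {"content": b"thesis", "metadata": {"title": "Thesis"}, "embargoed": False}

good_report = compare_objects(original, good_copy)
bad_report = compare_objects(original, bad_copy)
```

Here `good_report` is empty, while `bad_report` flags the missing `rights` field and the dropped embargo, the two kinds of loss the integrity tests are meant to catch.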


4. Ethical, Professional and Legal Issues

Usability testing of the Copying Archives project will be done with users. Ethical clearance for these tests will be sought from the relevant authorities.

As the project entails working with resources obtained from Stellenbosch University, there is a need to manage the privacy of their data. The data stored on the LOCKSS system will only be viewed by people directly involved in the Copying Archives project. Some of the data will be embargoed content, that is, content not ready for public viewing. Details about the embargoed material should be kept private, so that intellectual property rights are not violated. Professional conduct has to be maintained throughout the project life cycle, as with all other software projects.


5. Related Work

The need for interoperability among heterogeneous digital repositories has been of importance to a number of institutions. The University of Wales Aberystwyth implemented a system to enable interoperability among four systems. The systems run on DSpace and Fedora. Both of these systems implement the OAI-PMH [7] protocol. Their main challenge was in handling embargoed digital objects, a challenge they did not meet at all [6].

In [8], a protocol is proposed that interoperates between heterogeneous and homogeneous digital systems. It is suited to systems that are compliant with the OAI protocol. This protocol, however, does not provide a way to interoperate with LOCKSS, a system that is not fully OAI compliant.

Packaging formats, including METS, PREMIS and MPEG-21 DIDL, have been proposed in the quest to ease the interoperability of repositories [13]. These are referred to as complex metadata formats, as they are more expressive and make it easy to represent digital content.


Manual approaches have also been explored, using techniques that utilise the built-in import and export functions of the digital systems. This has made migration of data a lot more laborious and complex [14].





6. Anticipated Outcomes

6.1 System Components and Design Challenges

Below is a list of the key features that should be contained in the overall project design. These key features will be implemented in the form of plug-ins to DSpace, EPrints and LOCKSS. This is illustrated in Figure 1.




• Implementation of export functionality on EPrints
• Implementation of export functionality on DSpace
• Implementation of export functionality on LOCKSS
• Implementation of ingest/import functionality on EPrints
• Implementation of ingest/import functionality on DSpace
• Implementation of ingest/import functionality on LOCKSS

The design of the common interchange format will most likely be a challenging part of the project, because no existing format is readily available for use in the project. Hence there is a need to determine a feasible mechanism that can be implemented for LOCKSS, DSpace and EPrints.


6.2 Expected Impact of Project

On completion of the Copying Archives project, the team expects to have implemented a system that provides the capability of performing repository-to-repository content transfers. With interoperability with LOCKSS enabled, organisations running DSpace or EPrints can take advantage of the preservation capabilities of LOCKSS. The capability to interchange content between repositories will also provide assurance that data is safe. Overall, the project's success will contribute greatly to the field of digital libraries in the area of preservation and migration, and will allow more institutions to preserve their content more reliably.


6.3 Key Success Factors

The key success factors are based on whether the research questions stated in section 2.1 have been answered, specifically the success of:

• import and export plug-ins on DSpace, EPrints and LOCKSS
• implementation of incremental updates of the content
• design of a generic transfer format



7. Project Plan

7.1 Risks and Risk Mitigation

Risk has to be taken into consideration in any project. Risk management plans have to be drafted to prevent the noted risks from occurring. Risk mitigation is the continual process of taking steps to reduce or eliminate the identified risks. Some of these risks and their mitigation strategies are outlined in Appendix A.


7.2 Timeline, including Gantt Chart

See Appendix B.


7.3 Resources Required for the Copying Archives Project



• A running LOCKSS system instance
• Dedicated personal computers for running EPrints and DSpace
• Running DSpace instances
• Running EPrints instances
• Dependency software, e.g. Apache, Tomcat, PostgreSQL, MySQL and Perl

The LOCKSS network system has been provided by Stellenbosch University. The dedicated personal computers will be made available for development by the Digital Libraries Laboratory, through Hussein Suleman.


7.4 Deliverables and Milestones

More detail on the task breakdown is provided in the Gantt chart.

1. Project group formed and preferences indicated: 01 April
2. Projects Allocated: 12 April
3. Team Roles determined: 16 April
4. Literature Survey Due: 3 May
5. Project Proposal due, including project plan: 12 May
6. Presentation of Project Proposals: 17 May
7. Installation of development software: 30 May
8. Revised Proposal Finalised: 31 May
9. Project Web Presence (Proposal, timeline/plan): 01 June
10. Feasibility project prototype and testing: 30 May to 04 June
11. Design of prototype plug-ins: 30 May
12. Implementation: 31 May to 2 June
13. Testing and Completion: 2 June to 3 June
14. Initial feasibility demonstration: 4 June
15. Semester break development: 13 July to 25 July
16. Export functionality design: 13 July to 15 July
17. Implementation: 17 July to 19 July
18. Testing: 19 July to 20 July
19. Iteration #2: 20 July to 22 July
20. Testing and Completion: 23 July to 25 July
21. First project prototype: 26 July
22. Second project prototyping and testing: 27 July to 27 September
23. Import functionality design: 27 July to 29 July
24. Implementation: 5 August to 13 August
25. Iteration #3: 13 August to 16 August
26. Component integration and Usability testing: 9 to 19 September
27. First Implementation/Experimentation/Performance Test: 20 September
28. Final Prototype/Experiment/Performance: 29 September
29. Project Report Final Hand-in: 01 November
30. Poster Due: 04 November
31. Web Page: 08 November
32. Reflection Paper: 12 November
33. Project Demonstrations and Open Day: 03 to 08 November
34. Final Project Presentations: 18 to 19 November


7.5 Work Allocation to Team Members


This project is broken down into two parts:

1. Part one is the development of components for the LOCKSS system network (refer to Figure 1).
2. Part two is the development of components for EPrints and DSpace (refer to Figure 1).


Mushashu Lumpa will focus on the development of the components needed for EPrints and DSpace to export and import the common format. Ngoni Munyaradzi will focus on the development of components for the LOCKSS network to generate and ingest the common format. The individual parts developed by both project members will then be integrated to produce a functional system.










References

[1] Witten IH, Bainbridge D, Tansley R, Huang CY, Don K, and Hamilton NZ. A bridge between Greenstone and DSpace. D-Lib Magazine 2005; 11: 1082-9873.

[2] Gutteridge C. GNU EPrints 2 overview. 2002.

[3] Magazine DL. Repository to repository transfer of enriched archival information packages. D-Lib Magazine 2008; 14: 1082-9873.

[4] Reich V, Rosenthal DSH. LOCKSS (Lots Of Copies Keep Stuff Safe). New Review of Academic Librarianship 2000; 6: 155-161.

[5] Smith MK, Barton M, Bass M, Branschofsky M, McClellan G, Stuve D, Tansley R, and Walker JH. An open source dynamic digital repository. D-Lib Magazine 2003; 9: 1082-9873.

[6] Jonathan Bell and Stuart Lewis, 'Using OAI-PMH and METS for exporting metadata and digital objects'. http://www.emeraldinsight.com/Insight/ViewContentServlet?contentType=Article&Filename=/published/emeraldfulltextarticle/pdf/2800400307.pdf (accessed on 06 May 2010).

[7] Lagoze, C. and De Sompel, H. V. 2001. The Open Archives Initiative. In Proceedings of the 1st Joint Conference on Digital Libraries (JCDL'2001) (Roanoke, Va.). 54-62.

[8] Yang Z and Airong J, 'A Digital Resource Harvesting Approach for Distributed Heterogeneous Repositories', 9th International Conference on Asian Digital Libraries, ICADL 2006, Kyoto, Japan, November 2006 proceedings.

[9] Andrew W, Ross W, Brendan H and Jon Dell'oro. 2000. Preserving Digital Information Forever. In Proceedings of the 5th ACM conference on Digital libraries.














Appendix A: Risks and Risk Management


Risk: Project implementation going out of scope
Likelihood: low
Impact: Can possibly delay the delivery time of the project
Mitigation: Frequently meet with supervisors and the clients to determine progress.
Management: Quickly redirect project development to the required path.

Risk: Failure to meet project deadlines
Likelihood: low
Impact: Failure to complete project
Mitigation: Accountability of each project member, frequent progress checks.
Management: Discuss the way forward with project supervisors.

Risk: Loss of work due to system crashes
Likelihood: low
Impact: Drastically delay project pace
Mitigation and Management: Acquire back-up hardware resources. Continue development of project on back-up hardware resources.

Risk: Inadequate knowledge
Likelihood: low
Impact: Slow development rate
Mitigation: Frequent group meetings to check that each project partner knows what they have to develop.
Management: Seek help from other people knowledgeable in the field.

Risk: Outside interruptions (e.g. ill health, dropping out of honours)
Likelihood: medium
Impact: Possibility of failing to complete the whole project
Mitigation: There are no means of avoiding such a risk.
Management: Each team member should design their own stubs of the other team member's systems.

Risk: Lack of resources
Likelihood: low
Impact: Could potentially slow project pace
Mitigation: Project team should make sure that they have acquired all required resources before development time.
Management: If the risk becomes reality, then we will probably have to rethink overall project outcomes or implement partial functionality.