DSpace: Technical Basics


31 October 2013



Iryna Kuchma

Open Access Programme Manager



www.eifl.net

Creative Commons Attribution 3.0 Unported

Application Architecture

The DSpace system is organised into three tiers, each of which consists of a number of components








Each layer only invokes the layer below it, i.e. the application layer
may not use the storage layer directly
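The layering rule can be illustrated with a minimal Python sketch. The class and method names below are hypothetical and are not part of DSpace's API (DSpace itself is written in Java); the only point is that each layer is constructed with a reference to the layer directly below it, so the application layer cannot reach storage except through the business logic layer.

```python
class StorageLayer:
    """Bottom tier: physical storage of metadata and content (a dict here)."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def save(self, item_id: str, record: dict) -> None:
        self._records[item_id] = record

    def load(self, item_id: str) -> dict:
        return self._records[item_id]


class BusinessLogicLayer:
    """Middle tier: content, e-people, authorization and workflow."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage  # the only layer this tier calls

    def deposit(self, item_id: str, title: str, submitter: str) -> None:
        # A real system would check authorization and start a workflow here.
        self._storage.save(item_id, {"title": title, "submitter": submitter})

    def get_title(self, item_id: str) -> str:
        return self._storage.load(item_id)["title"]


class ApplicationLayer:
    """Top tier: faces the outside world, e.g. a web user interface."""

    def __init__(self, logic: BusinessLogicLayer) -> None:
        self._logic = logic  # holds no reference to StorageLayer

    def submit(self, item_id: str, title: str, submitter: str) -> None:
        self._logic.deposit(item_id, title, submitter)

    def render_item(self, item_id: str) -> str:
        return f"<h1>{self._logic.get_title(item_id)}</h1>"


app = ApplicationLayer(BusinessLogicLayer(StorageLayer()))
app.submit("1", "My thesis", "researcher@example.com")
print(app.render_item("1"))  # <h1>My thesis</h1>
```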

The Storage Layer




The storage layer is responsible for physical storage of metadata and content


DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows.
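To make that list concrete, here is a deliberately simplified relational sketch using Python's built-in sqlite3 module. The table and column names are assumptions made for illustration and do not reproduce DSpace's actual database schema; each table simply corresponds to one kind of information listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- organisation of content: items belong to collections
    CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        collection_id INTEGER NOT NULL REFERENCES collection(id)
    );
    -- metadata about the content, one row per field value
    CREATE TABLE metadata_value (
        item_id INTEGER NOT NULL REFERENCES item(id),
        field TEXT NOT NULL,
        value TEXT NOT NULL
    );
    -- e-people and authorization
    CREATE TABLE eperson (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
    CREATE TABLE policy (
        item_id INTEGER NOT NULL REFERENCES item(id),
        eperson_id INTEGER NOT NULL REFERENCES eperson(id),
        action TEXT NOT NULL
    );
    -- state of currently-running workflows
    CREATE TABLE workflow_item (
        item_id INTEGER PRIMARY KEY REFERENCES item(id),
        step INTEGER NOT NULL
    );
    """
)
conn.execute("INSERT INTO collection VALUES (1, 'Theses')")
conn.execute("INSERT INTO item VALUES (10, 1)")
conn.execute("INSERT INTO metadata_value VALUES (10, 'dc.title', 'My thesis')")
conn.execute("INSERT INTO workflow_item VALUES (10, 1)")

# Which collection is the item in, what is its title, which workflow step is it at?
row = conn.execute(
    """SELECT c.name, m.value, w.step
       FROM item i
       JOIN collection c ON c.id = i.collection_id
       JOIN metadata_value m ON m.item_id = i.id AND m.field = 'dc.title'
       JOIN workflow_item w ON w.item_id = i.id"""
).fetchone()
print(row)  # ('Theses', 'My thesis', 1)
```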

The Business Logic Layer






The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow

The Application Layer



The application layer contains components that communicate with the world outside the individual DSpace installation, for example the Web user interface and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service
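OAI-PMH requests are ordinary HTTP GET requests whose query string names a verb such as Identify or ListRecords. The sketch below builds such URLs in Python; the base URL is hypothetical, since the real endpoint path depends on the DSpace version and on how the installation was deployed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; check your own installation for the real path.
BASE_URL = "http://dspace.example.com/oai/request"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL; every request carries a 'verb' argument."""
    query = urlencode({"verb": verb, **params})
    return f"{BASE_URL}?{query}"


# Ask the repository to describe itself.
print(oai_request_url("Identify"))
# Harvest all records as unqualified Dublin Core.
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```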

The DSpace Web UI is the largest and most-used component in the application layer. It comes in two versions:

1. JSPUI: built on Java Servlet and JavaServer Pages technology

2. XMLUI (Manakin): built on XML and Apache Cocoon technology


Server Architecture

[Figure: server architecture, showing the Web Application Server and the User Interface]
These systems may reside on a single server or be hosted
separately on dedicated servers


Structural Overview

DSpace is split into three directory trees:

Source Directory [dspace-src]


Surprisingly, this is where the source code resides

Install Directory [dspace]


Populated during install & during normal operation


Contains:


Configuration files


Command line tools


Libraries


DSpace archive (depending on configuration)

Web Deployment Directory [tomcat]/webapps/dspace


Contains the JSPs and Java classes and libraries necessary to run
DSpace

Persistent Identifiers

The use of location-based identifiers such as the Uniform
Resource Locator (URL) often leads to resources becoming
inaccessible over time

Often when accessing a resource via a hyperlink, users receive a
“404 - page not found” error

Persistent identifiers are an attempt to solve the issues
surrounding resource identification and long-term
preservation

A persistent identifier allows the resource to be uniquely
identified in a way that will not change if the resource is
renamed or relocated



This means that a resource can be reliably referenced for future
access by humans and software


Caveat: persistence is heavily dependent on organisational policy,
i.e. persistence of an object is only effective if an organisation
maintains and manages this persistence


Different systems are in use for persistent identifiers:


Persistent Uniform Resource Locators (PURLs)


Digital Object Identifiers (DOIs)


Handle (used by DSpace)




The Handle


In the Handle System, a resource is identified by a unique handle
assigned by a common registration service








Registration service: http://hdl.handle.net

Handle prefix: 2160

Local identifier: 568

Resulting handle URL: http://hdl.handle.net/2160/568
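The structure above can be expressed as two small Python helpers with hypothetical names. They assume the resolver URL form shown in the example and split on the first slash only, because a prefix never contains a slash while a local identifier may.

```python
from urllib.parse import urlparse

RESOLVER = "http://hdl.handle.net"  # the registration service from the example


def handle_url(prefix: str, local_id: str) -> str:
    """Join a handle prefix and a local identifier into a resolver URL."""
    return f"{RESOLVER}/{prefix}/{local_id}"


def split_handle_url(url: str) -> tuple[str, str]:
    """Return (prefix, local_id) from a resolver URL such as .../2160/568."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, local_id = path.partition("/")  # split on the first "/" only
    if not prefix or not local_id:
        raise ValueError(f"not a handle URL: {url}")
    return prefix, local_id


print(handle_url("2160", "568"))  # http://hdl.handle.net/2160/568
print(split_handle_url("http://hdl.handle.net/2160/568"))  # ('2160', '568')
```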

Practical: Using a Handle


Navigate to Cadair, Aberystwyth’s DSpace repository


Select an item from a collection and note the handle address



Open this address in a new browser window




The handle will resolve and redirect back to your original item
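The same resolution can be checked from a script. This Python sketch assumes the JSON interface offered by the hdl.handle.net proxy (a GET request on /api/handles/ followed by the handle), which lists the values stored for a handle; the target address is taken to be the value whose type is URL. The network call is kept separate from the parsing so the parsing can be tried offline, and the sample record is made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_handle_record(handle: str) -> dict:
    """Query the proxy's JSON API (assumed endpoint; needs network access)."""
    with urlopen(f"https://hdl.handle.net/api/handles/{handle}") as resp:
        return json.load(resp)


def target_url(record: dict) -> str:
    """Return the URL value stored for the handle."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    raise LookupError(f"no URL value for handle {record.get('handle')}")


# Made-up record in the assumed response shape, for offline use.
sample = {
    "responseCode": 1,
    "handle": "2160/568",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "http://repository.example.org/handle/2160/568"}},
    ],
}
print(target_url(sample))
```

Online, target_url(fetch_handle_record("2160/568")) would follow the same path as the browser test above.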

Configuring the Handles service

Out of the box, a DSpace installation uses the handle prefix:

hdl:123456789

These aren't really Handles, since the global Handle System
doesn't actually know about them


3 Steps to handle configuration





To use handles in DSpace, you must register a prefix with the
Corporation for National Research Initiatives (CNRI)


How do you register with CNRI?


Complete the registration form on the CNRI website


Create the sitebndl.zip and upload it to CNRI


Pay a small annual fee


http://www.handle.net/service_agreement.html


Generating the sitebndl.zip

The Site Bundle is an archive which contains information about
your DSpace installation and is used to generate your handle

To generate the sitebndl.zip, run the command:


[dspace]/bin/dsrun net.handle.server.SimpleSetup [dspace]/handle-server

You will be asked to answer a series of questions

Once completed, the sitebndl.zip can be found at:


[dspace]/handle-server/sitebndl.zip

Complete the registration and upload the sitebndl.zip
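Before uploading, it may be worth confirming that the bundle was written completely. A minimal Python sketch using the standard zipfile module; the function name is hypothetical, and the path in the comment must be replaced with your own install directory.

```python
import zipfile
from pathlib import Path


def check_site_bundle(path: str) -> list[str]:
    """Confirm the bundle exists and is a readable zip; return its entry names."""
    bundle = Path(path)
    if not bundle.is_file():
        raise FileNotFoundError(f"{bundle} not found; has SimpleSetup been run?")
    if not zipfile.is_zipfile(bundle):
        raise ValueError(f"{bundle} is not a valid zip archive")
    with zipfile.ZipFile(bundle) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt entry in {bundle}: {bad}")
        return zf.namelist()


# Replace with your real path, i.e. [dspace]/handle-server/sitebndl.zip
# print(check_site_bundle("/path/to/dspace/handle-server/sitebndl.zip"))
```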

Configuring the Handle Server

Once registration is complete, CNRI should send you your handle
prefix


Edit [dspace]/handle-server/config.dct to include the following lines in the server_config clause:

"storage_type" = "CUSTOM"

"storage_class" = "org.dspace.handle.HandlePlugin"


Update all references to YOUR_NAMING_AUTHORITY to your assigned handle prefix:

300:0.NA/YOUR_NAMING_AUTHORITY -> 300:0.NA/2097
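The two config.dct edits can be checked with a small script. This hypothetical Python helper treats the file as plain text: it verifies that the storage_type and storage_class entries are present (allowing for varying whitespace) and then substitutes the assigned prefix for YOUR_NAMING_AUTHORITY. It does not parse the file's structure or insert the entries for you, and the sample fragment is made up.

```python
import re

# Entries the server_config clause must contain for the DSpace handle plugin.
REQUIRED = {
    "storage_type": "CUSTOM",
    "storage_class": "org.dspace.handle.HandlePlugin",
}


def has_entry(text: str, key: str, value: str) -> bool:
    """True if a line like "key" = "value" appears, ignoring spacing."""
    pattern = rf'"{re.escape(key)}"\s*=\s*"{re.escape(value)}"'
    return re.search(pattern, text) is not None


def update_config_dct(text: str, prefix: str) -> str:
    """Check the storage entries, then replace YOUR_NAMING_AUTHORITY."""
    missing = [k for k, v in REQUIRED.items() if not has_entry(text, k, v)]
    if missing:
        raise ValueError(f"add these entries to server_config first: {missing}")
    return text.replace("YOUR_NAMING_AUTHORITY", prefix)


# Made-up fragment for illustration; a real config.dct is much longer.
fragment = '''"server_admins" = (
  "300:0.NA/YOUR_NAMING_AUTHORITY"
)
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
'''
print(update_config_dct(fragment, "2097"))
```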


Updating the Handle Prefix

Edit [dspace]/config/dspace.cfg and update the handle prefix
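This edit can also be scripted. The hypothetical Python helper below assumes the property is named handle.prefix and written as a simple key = value line, as in the stock dspace.cfg of DSpace 1.x releases; check your own file before relying on it. Commented-out lines are left untouched, and the helper refuses to guess if it finds zero or several matching lines.

```python
import re


def set_handle_prefix(cfg_text: str, new_prefix: str) -> str:
    """Rewrite the value of the handle.prefix line in dspace.cfg text."""
    pattern = re.compile(r"^([ \t]*handle\.prefix[ \t]*=[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backreference surprises with digit prefixes.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_prefix, cfg_text)
    if count != 1:
        raise ValueError(f"expected one handle.prefix line, found {count}")
    return new_text


cfg = "# Handle settings\nhandle.prefix = 123456789\n"
print(set_handle_prefix(cfg, "2097"), end="")
# # Handle settings
# handle.prefix = 2097
```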




A restart of Tomcat will be required

If items have already been deposited into DSpace, their handles
will need updating:

[dspace]/bin/update-handle-prefix 123456789 YourHandle
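Conceptually, updating the prefix keeps each item's local identifier and swaps only the prefix in front of it. The Python sketch below shows just that string transformation on a list of handles; the function is hypothetical and is not how the update-handle-prefix script is implemented, since the real script works on the repository's stored data rather than on a Python list.

```python
def rewrite_prefix(handles: list[str], old: str, new: str) -> list[str]:
    """Replace the prefix of each handle of the form '<old>/<local id>'.

    Handles with any other prefix are returned unchanged.
    """
    result = []
    for handle in handles:
        prefix, sep, local_id = handle.partition("/")
        result.append(f"{new}/{local_id}" if sep and prefix == old else handle)
    return result


existing = ["123456789/1", "123456789/2", "2160/568"]
print(rewrite_prefix(existing, "123456789", "2097"))
# ['2097/1', '2097/2', '2160/568']
```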


Starting the Handle Server

Finally, start the handle server:


[dspace]/bin/start-handle-server


A startup script will be required to start the handle server
automatically when the server boots


Once configured, the handles should resolve as demonstrated in the
practical earlier in this module
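If handles do not resolve, a first check is whether the handle server process is accepting connections at all. A minimal Python sketch; the port is an assumption (2641 is the Handle System's usual default, but use whatever you entered during SimpleSetup), and the function name is hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


# Assumed default handle server port; adjust to your configuration.
print(is_listening("localhost", 2641))
```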

Workflow scenarios

Scenario 1: Head of research

I want to be able to see everything my
researchers deposit for quality control purposes


Scenario 2: Repository manager

I want to approve everything that goes into the
repository to make sure there are no copyright
issues or bad metadata


Scenario 3: Cataloguer

I want to be able to see everything my
researchers deposit for quality control purposes

The three workflows

DSpace has three workflow steps

1. Accept/Reject Step

2. Accept/Reject/Edit Metadata Step

3. Edit Metadata Step


You can use any combination of the three


Steps are worked through in order
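These rules (any combination of the three steps, always worked through in order) can be modelled in a few lines of Python. Everything here is hypothetical: the step names are plain strings, the reviewer is a callback standing in for a person, and the assumption that the edit-only step cannot reject belongs to this sketch rather than being a statement about DSpace internals.

```python
from typing import Callable

# The three workflow steps, in the order they are worked through.
STEPS = ("accept/reject", "accept/reject/edit metadata", "edit metadata")
CAN_REJECT = {"accept/reject", "accept/reject/edit metadata"}

Reviewer = Callable[[str, dict], bool]  # returns True to accept


def run_workflow(item: dict, enabled: set[str], review: Reviewer) -> str:
    """Pass a submission through the enabled steps, always in STEPS order.

    'review' may change the item's metadata in place; its verdict is
    ignored on the edit-only step, which is assumed unable to reject.
    """
    for step in STEPS:
        if step not in enabled:
            continue  # steps not configured for this collection are skipped
        accepted = review(step, item)
        if step in CAN_REJECT and not accepted:
            return f"rejected at {step}"
    return "archived"


# Example: a collection that uses steps 1 and 3 only.
def reviewer(step: str, item: dict) -> bool:
    if step == "edit metadata":
        item["title"] = item["title"].strip().capitalize()
    return True


submission = {"title": "  my thesis "}
print(run_workflow(submission, {"accept/reject", "edit metadata"}, reviewer))
print(submission["title"])
# archived
# My thesis
```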

Which might be used in each of the previous
scenarios?

RSS feeds



Site level (all new items)


Community level (new items in all contained
collections)


Collection level (new items in that collection)

Can be read in modern web browsers

Can be subscribed to in news reader
software
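Because a feed is plain XML, new items can also be read by a script. A minimal Python sketch with xml.etree.ElementTree, assuming the feed is in RSS 2.0 format; the feed text below is made up, and in practice it would be downloaded from the feed link offered at site, community or collection level.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a collection feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Theses collection</title>
    <item>
      <title>My thesis</title>
      <link>http://hdl.handle.net/2160/568</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) for every <item> in an RSS 2.0 channel."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document")
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]


print(feed_items(FEED))  # [('My thesis', 'http://hdl.handle.net/2160/568')]
```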


Alerts



Created by users


Created for a collection


Emails sent each day for new items


Script must run daily:


[dspace]/bin/sub-daily
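The daily job essentially matches each user's subscriptions against the items added since the previous run. The Python sketch below illustrates only that matching; the function and data structures are hypothetical and say nothing about how sub-daily is actually implemented. In practice the script itself is usually scheduled with a tool such as cron.

```python
def build_daily_digests(subscriptions: dict[str, set[str]],
                        new_items: dict[str, list[str]]) -> dict[str, str]:
    """Map each subscriber's email to the body of their daily alert.

    subscriptions: email -> collections the user subscribed to
    new_items:     collection -> titles of items added since the last run
    Users with nothing new get no email.
    """
    digests = {}
    for email, collections in subscriptions.items():
        lines = [
            f"{collection}: {title}"
            for collection in sorted(collections)
            for title in new_items.get(collection, [])
        ]
        if lines:
            digests[email] = "New items:\n" + "\n".join(lines)
    return digests


subs = {"a@example.com": {"Theses"}, "b@example.com": {"Papers"}}
new = {"Theses": ["My thesis"]}
print(build_daily_digests(subs, new))
# {'a@example.com': 'New items:\nTheses: My thesis'}
```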

DSpace statistics

DSpace statistics:


Collated from DSpace log files


Reports generated daily (daily and monthly
reports)


http://dspace.example.com/dspace/statistics


Or via the Administer menu


Can be private (must be logged in) or public


In dspace.cfg:


report.public = [true|false]
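Since the reports are collated from log files, the basic idea can be sketched as counting matching log lines. This Python example assumes a simplified log-line layout ending in user:session_id=...:ip_addr=...:action:detail, loosely modelled on DSpace 1.x logs; the sample lines are made up, and the real statistics scripts do considerably more.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed, simplified layout: "... @ <user>:session_id=<id>:ip_addr=<ip>:<action>:<detail>"
VIEW_ITEM = re.compile(r":view_item:handle=(\S+)")


def item_view_counts(log_lines: Iterable[str]) -> Counter:
    """Count 'view_item' actions per handle across the given log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = VIEW_ITEM.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up lines in the assumed layout.
sample_log = [
    "2013-10-31 10:15:02,123 INFO @ anonymous:session_id=A1:ip_addr=192.0.2.1:view_item:handle=2160/568",
    "2013-10-31 10:16:40,456 INFO @ anonymous:session_id=B2:ip_addr=192.0.2.7:view_item:handle=2160/568",
    "2013-10-31 10:17:05,789 INFO @ user@example.com:session_id=C3:ip_addr=192.0.2.9:login:type=explicit",
]
print(item_view_counts(sample_log).most_common())  # [('2160/568', 2)]
```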

Statistics collected

The following statistics are collected


General overview (e.g. number of items
archived / number of item views / user logins)


Archive Information (numbers of each type of
item)


Item view counts


Actions performed


Search terms used

Google Analytics

Google Analytics allows a richer and more
detailed suite of statistics, including:


Time visitors spent on the site


Where they came from


Terms they used in search engines to find items


The geographic location of visitors


How many pages they looked at


Which pages they started and ended their visit on


JSPUI requires a small code change; Manakin (XMLUI)
has a configuration option.

Credits

These slides have been produced by re-using
The DSpace Course by:


Stuart Lewis & Chris Yates



Repository Support Project

http://www.rsp.ac.uk/



Part of the RepositoryNet



Funded by JISC

http://www.jisc.ac.uk/

Thank you! Questions?