TR10 - DigiTool 3.0

clappingknave · Software & Software Engineering

14 Dec 2013 (3 years and 10 months ago)

Repository

DigiTool Version 3.0

Repository

Main characteristics:

Stand-alone component with standard API (Web Services)

File storage: local (NFS), remote (URL)

Metadata storage in “XML boxes” based on the metadata type (e.g. descriptive, technical)

Decomposition of compound objects (e.g. METS) to facilitate long-term preservation

Local storage is controlled by storage rules

Responsible for storage and management of objects

DigiTool Modules

[Diagram: Repository; Deposit; Approval; Search & Index; Dispatcher & Viewers; Single & Bulk Web Services]

An Object


An Object in a DigiTool repository consists of several components:

A persistent ID (PID) of the object

Metadata of various types, connected to the object

A DataStream, which points to the contents of the object

[Diagram: an object made up of a PID (Persistent ID), Metadata (Technical MD, Rights MD, Preservation MD) and a DataStream]
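The three components listed above can be pictured as a small data structure. The following is a minimal Python sketch for illustration only; the class and field names are assumptions and do not reflect DigiTool's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """One metadata record attached to an object (illustrative names)."""
    md_type: str   # e.g. "technical", "rights", "preservation"
    content: str   # the record body, typically XML


@dataclass
class DigitalObject:
    """An object: a persistent ID, metadata records, and a DataStream."""
    pid: str
    datastream: str  # where the content lives (file path or URL)
    metadata: List[MetadataRecord] = field(default_factory=list)

    def metadata_of_type(self, md_type: str) -> List[MetadataRecord]:
        """Return every attached record of the requested type."""
        return [m for m in self.metadata if m.md_type == md_type]


# Usage: one object with two metadata records (hypothetical values).
obj = DigitalObject(pid="1001", datastream="/storage/spring.tif")
obj.metadata.append(MetadataRecord("technical", "<md>...</md>"))
obj.metadata.append(MetadataRecord("rights", "<md>...</md>"))
```

Keeping metadata as a list of typed records mirrors the slide's picture of several metadata types hanging off one PID.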

Persistent ID - PID

Each object in the repository has a unique, persistent identifier.

Metadata - MD

Descriptive metadata facilitates discovery, identification, and selection (for example: MARC, Dublin Core).

Administrative metadata aids in the management of resources and includes Access Rights MD, Preservation MD, and Technical MD, describing the physical characteristics of a resource.

Structural metadata describes the internal structure of complex objects.

Access Rights

Determines who is allowed to use the objects and under what
conditions.

DigiTool allows the storing of license information as well as the
management and enforcement of common prevailing rights (access
rights).

The relevant editing format for rights metadata in DigiTool is
rights_md metadata in the XML format, attached or linked to the
object’s PID.

Each rights_md consists of:

Copyrights statement

Access rights conditions, logically combined with “OR”; each condition consists of
expressions, logically combined with “AND”.

The access rights and copyrights conditions of the digital entity are
defined during the deposit or ingest procedures, according to the
options presented in the administrative unit, or are edited in the
DigiTool Meditor GUI while editing the object.

Copyrights

A copyright statement is presented to the user prior to
the viewing of any object.



The institution is able to maintain different types of
copyright notices and to specify the notice that will be
displayed for a given object.


Most institutions have a coherent set of policies
governing intellectual property. The Copyright check
routine gives the institution a tool to support its
copyright policy.

Deposit Access Rights and Copyright

Copyright Files

All copyright files are located under j_home in the /profile/conf/copyrights directory.

Copyright availability and options are global in the system.

Access Rights Routines

Access rights pre-definitions for the Deposit module are located under j_home in the /profile/units/ADM01/conf/accessrights directory and are activated in the unit_configuration.xml file located in the administrative unit’s /profile/units/ADM01/conf/ directory.

Each administrative unit in DigiTool may have a different set of access rights routines that can be used when building deposit workflows.

When building deposit workflows, only the access rights of the administrative unit that the staff user represents are available for use.

Access Rights in Deposit


The access rights and copyright templates are associated
with a Material Flow in the Deposit Management
Interface.

Creating Access Rights Metadata


The access rights conditions of the digital entity can
be defined in Meditor.

Access Rights and Copyrights Forms

The access rights and copyrights conditions of the
digital entity can be defined using Meditor Access
Rights Forms.

Access Rights (rights_md) XML

<ar:access_right_md
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ar="http://com/exlibris/digitool/repository/api/xmlbeans"
    targetNamespace="http://com/exlibris/digitool/repository/api/xmlbeans">
  <ar_copyrights required="true">
    <text_file>copyrights1</text_file>
  </ar_copyrights>
  <ar_conditions>
    <ar_condition>
      <ar_expressions>
        <ar_expression ar_operation="eq">
          <key>group</key>
          <val1>6</val1>
        </ar_expression>
      </ar_expressions>
    </ar_condition>
  </ar_conditions>
</ar:access_right_md>
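To illustrate how a rights_md record of this shape could be evaluated, here is a minimal Python sketch; it is not DigiTool's implementation. It assumes the semantics described on the Access Rights slide (conditions combined with OR, expressions inside a condition combined with AND), handles only the "eq" operation that appears in the example, and uses a hypothetical dictionary of user attributes. The xmlns:xs and targetNamespace attributes are left out of the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

RIGHTS_MD = """\
<ar:access_right_md xmlns:ar="http://com/exlibris/digitool/repository/api/xmlbeans">
  <ar_copyrights required="true">
    <text_file>copyrights1</text_file>
  </ar_copyrights>
  <ar_conditions>
    <ar_condition>
      <ar_expressions>
        <ar_expression ar_operation="eq">
          <key>group</key>
          <val1>6</val1>
        </ar_expression>
      </ar_expressions>
    </ar_condition>
  </ar_conditions>
</ar:access_right_md>
"""


def expression_holds(expr, user):
    """Evaluate one expression; only the "eq" operation is sketched here."""
    if expr.get("ar_operation") != "eq":
        raise ValueError("operation not covered by this sketch")
    key = expr.findtext("key")
    expected = expr.findtext("val1").strip()
    return str(user.get(key)) == expected


def access_allowed(rights_xml, user):
    """OR over conditions, AND over the expressions inside each condition."""
    root = ET.fromstring(rights_xml)
    conditions = root.findall("ar_conditions/ar_condition")
    # Assumption: a record without any condition grants nothing in this sketch.
    return any(
        all(expression_holds(e, user)
            for e in c.findall("ar_expressions/ar_expression"))
        for c in conditions
    )


# Usage: a user in group 6 passes, a user in group 7 does not.
allowed = access_allowed(RIGHTS_MD, {"group": 6})
denied = access_allowed(RIGHTS_MD, {"group": 7})
```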

Preservation Metadata


The types of information enumerated above address two functional objectives:

1) providing preservation managers with sufficient knowledge to take appropriate actions in order to maintain a digital object’s bit stream over the long term.

2) ensuring that the content of an archived object can be rendered and interpreted, in spite of future changes in access technologies.

Preservation metadata is intended to support and facilitate the long-term retention of digital information.


Preservation Metadata

This schema addresses pure preservation metadata elements such as hardware and software environments, installation requirements, access inhibitors and facilitators.

Since no firm standards have yet evolved for preservation metadata, DigiTool uses a proprietary schema based on the metadata elements that were published by the PREMIS working group as a part of the initial completion of the PREMIS data dictionary.

The preservation metadata covers the elements that are described in the data dictionary as part of the Object entity.

Technical Metadata

Technical Metadata includes information about the
physical description of the digital object.


Four types of technical information are available in
DigiTool 3.0: image, text, video and audio metadata.


Technical Metadata extraction upon ingest.


Documented technical information for preservation
management.
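As a rough illustration of what extracting technical metadata at ingest can involve, the Python sketch below gathers a few basic file facts. It is an assumption-laden example, not DigiTool's extractor: the field names are invented, and SHA-256 is chosen only for illustration because the slides do not say which checksum algorithm is used.

```python
import hashlib
import mimetypes
import os
import tempfile


def extract_technical_md(path):
    """Collect basic file facts; the field names are illustrative."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "file_size_bytes": os.path.getsize(path),
        "mime_type": mime_type,  # None when the extension is unknown
        "checksum_sha256": digest.hexdigest(),
    }


# Usage: write a small throwaway file and extract its metadata.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "spring.jpg")
    with open(sample, "wb") as fh:
        fh.write(b"not a real image")
    technical_md = extract_technical_md(sample)
```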

Structural Metadata


Structural Metadata is the information that ties the components of
a complex or compound object together or holds relations among
objects’ manifestations and supports navigation among them.


For example: turning pages of a book, jumping to a particular
chapter or page, or switching between images and corresponding
text.

Object Relations

The repository stores the includes, part_of and manifestation relations in the object relations table for three main purposes:

Provide data completeness in the system. The system does not allow objects with relations to be deleted unless the relations are removed first.

Support basic relation management:

For a given object ID, return its members.

For a given object ID, return its manifestations.

Provide metadata linking.
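The relation-management behaviour described above can be sketched with a small in-memory stand-in for the relations table. This is a hedged Python illustration only: the class, its method names and the tuple layout are assumptions, and only the part_of and manifestation relations are exercised.

```python
class RelationsTable:
    """Toy stand-in for the object relations table (illustrative names)."""

    def __init__(self):
        self._rows = set()  # rows are (source_pid, relation, target_pid)

    def add(self, source, relation, target):
        self._rows.add((source, relation, target))

    def remove(self, source, relation, target):
        self._rows.discard((source, relation, target))

    def members(self, pid):
        """PIDs recorded as part_of the given object."""
        return sorted(s for s, r, t in self._rows
                      if r == "part_of" and t == pid)

    def manifestations(self, pid):
        """PIDs recorded as a manifestation of the given object."""
        return sorted(s for s, r, t in self._rows
                      if r == "manifestation" and t == pid)

    def can_delete(self, pid):
        """Deletion is refused while any relation still mentions the object."""
        return not any(pid in (s, t) for s, _, t in self._rows)


# Usage: two pages belong to book 1000, so the book cannot be deleted yet.
relations = RelationsTable()
relations.add("1001", "part_of", "1000")
relations.add("1002", "part_of", "1000")
book_members = relations.members("1000")
deletable_before = relations.can_delete("1000")
relations.remove("1001", "part_of", "1000")
relations.remove("1002", "part_of", "1000")
deletable_after = relations.can_delete("1000")
```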

DataStream

DataStream points to the contents of an object:


For a simple object (jpg, txt file), the DataStream is the file itself.


For a compound object (a book, represented by a METS file, an
EAD file, etc.), the DataStream would be an XML file pointing to
other objects in the repository.

[Diagram: a simple object has a PID, Metadata and a DataStream (tiff); a compound object, PID 1000, has Metadata and a DataStream (XML) that points to PID 1001 and PID 1002, each with its own Metadata and DataStream (tiff)]

Simple Object

Descriptive Metadata

Title: Spring
Creator: Jones, John
Subject: Scenery
Subject: Flowers
Subject: Seasons
Description: Kids in a flowering field

Administrative Metadata

Rights

Rights holder:
Contact address:
Permission type: Display

Preservation

Software environments:
Installation requirements:
Access facilitators:
Process type: Migration
Process purpose: Format obsolete

Technical

File name: spring981182.jpg
File size: 780 KB
File type: Image
File format: jpeg
Checksum: 9811821243567643

Simple Object Representation

[Diagram: OBJECT PID (Persistent ID) linked to File, Technical NISO, Preservation, Descriptive DC and Descriptive MARC]

Metadata records can be shared between objects: no data duplication

XML representation of an object

Manifestations


DigiTool manages manifestation relations among objects.

All manifestations can share the same descriptive metadata but have different access rights, or vice versa.

Manifestations are usually different representations of the same object, but this is not compulsory.

Manifestations

[Diagram: three manifestations (Digital Master TIFF, Hi Resolution JPEG 2000, Low Resolution JPEG), each shown with a Thumbnail, Descriptive metadata and Access Rights]

Compound Objects

Decomposition of compound objects (e.g. a book) into atomic
units (pages).

The Repository maintains clear records of such
relationships as part of the compound object
decomposition process.

The Repository does not “understand” the structure of
the compound object (e.g. the order of the pages in the
book); this information is stored in the Repository as
part of the compound object metadata. This approach
is essential to ensuring that the handling of compound
objects by the Repository remains generic.

METS objects can be decomposed and managed in the
Repository in the same way, regardless of their internal
structure.

Management of Compound Objects

METS multi-page book example: a METS file includes 2 Tiff files for each page of the book

[Diagram: OBJECT PID 1000 (METS File), OBJECT PID 1001 (Tiff File) and OBJECT PID 1002 (Tiff File), each with Descriptive DC, Descriptive MARC, Technical and Access Rights metadata]

PID 1001 = Cover
PID 1002 = Page 1
PID N = Page N

1001 is part of 1000
1002 is part of 1000

Advantage: simple and precise preservation of single pages

Metadata Linking

[Diagram: OBJECT PID 1000 (METS File), OBJECT PID 1001 (Tiff File) and OBJECT PID 1002 (Tiff File), linked to Descriptive, Technical and Access Rights metadata records]

In most cases multiple digital entities can “point” to the same Descriptive, Preservation or Access Rights metadata for the whole book

Advantage: easy metadata creation and maintenance

Repository Architecture



The repository architecture enables sufficient control of the information provided, to the level needed to ensure long-term preservation.

The repository provides access and management services for digital objects.

The repository consists of:

Storage of the files on the file system that objects point to.

The objects representation: Oracle tables holding the information.

Relations table: an auxiliary Oracle table used for fast retrieval of relations and for deletion purposes. The table contents can be derived from the objects at any time.

The Repository uses Web Services technology:

Web Services are defined as distributed applications that run over the internet.

Web Services are typically configured to use HTTP as a transport protocol for sending messages between the different parts of the application.

The use of XML is a key feature of Web Services applications.

Repository: Tiered Design

[Diagram of tiers: Standard Web Services - SOAP; Repository Services (Repository Indexes, Export in METS, etc.); Relations/Constraints (A is part of B, C is manifestation of B); Digital Entities/Metadata (Objects, Metadata); storage (NFS, URL)]

Repository Management Access

Access the Repository in the DigiTool 3.0 Management Interface:

Using the following URL format: http://hostname:port/mng

Using the Management tab in Meditor (PC application).

Repository Staff Privileges

Staff members obtain Access Rights for Repository Search
Options and Maintenance under Meditor Staff Privileges:

Staff members who have access rights in multiple
Administrative Units may be given the option to choose
the Unit they wish to work with.

Management Staff Privileges

Only Admin Staff users may be authorized to manage Repository Storages.

The Admin Staff user should be granted all management sub-functions, or the Repository-specific sub-functions, of the REP00 Repository System Unit.

Repository Search


The Repository contains a built-in search index, defined at the Repository level in repository_indexing_schema.xml.

All updates in the Repository trigger a search index update.

Indexed data includes control elements as well as metadata.

Allows harvesting to target specific sets of entities.

Repository Index

Repository Search

The staff user can search in any or all of the fields
defined in the repository index.

Repository Search


Users can search across Administrative Units by checking the Cross Admin Unit check box, or search only in the home Administrative Unit.

The query results appear on the screen. Users can view the object’s details, open the object’s stream, and navigate among the object’s manifestations and related objects using the ellipsis button.

Viewing Formatted Digital Entity

Viewing an Object

The delivery system is activated for the digital object in accordance with the access rights and delivery rules defined in the system, in the same manner as for viewing an object in the Resource Discovery Module.

Users can view the object-related metadata by clicking the title link.

Viewing Descriptive Metadata

Metadata Search


Staff users can perform a metadata search based on a metadata type using the drop-down lists of the Metadata Search.

Storage of Data Streams

Data streams can be stored remotely (i.e. URL) or
locally inside the Network File System (NFS).

The Repository allows system administrators to
define flexible storage rules that enable them to
better control where the objects are stored within
the file system.

For example, a system administrator can set a storage rule by
which all data streams loaded to the Repository by
administrative unit “A” that have a “tiff” extension,
are larger than 2 MB, and have a “high” preservation
level are automatically assigned to storage group “X”,
which is in turn automatically mapped to a certain
location in the NFS.
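The storage rule example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DigiTool's rule engine: the rule fields, the first-match-wins ordering, the default group and the NFS paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageRule:
    """A rule that sends matching data streams to a storage group.

    A field left as None is not used as a condition.
    """
    storage_group: str
    admin_unit: Optional[str] = None
    extension: Optional[str] = None
    larger_than_bytes: Optional[int] = None
    preservation_level: Optional[str] = None

    def matches(self, stream):
        if self.admin_unit is not None and stream["admin_unit"] != self.admin_unit:
            return False
        if self.extension is not None and stream["extension"] != self.extension:
            return False
        if (self.larger_than_bytes is not None
                and stream["size_bytes"] <= self.larger_than_bytes):
            return False
        if (self.preservation_level is not None
                and stream["preservation_level"] != self.preservation_level):
            return False
        return True


def assign_storage_group(stream, rules, default_group="DEFAULT"):
    """Return the group of the first matching rule (first match is an assumption)."""
    for rule in rules:
        if rule.matches(stream):
            return rule.storage_group
    return default_group


# Hypothetical mapping from storage group to a location in the NFS.
STORAGE_ROOTS = {"X": "/nfs/storage/x", "DEFAULT": "/nfs/storage/default"}

rules = [StorageRule("X", admin_unit="A", extension="tiff",
                     larger_than_bytes=2 * 1024 * 1024,
                     preservation_level="high")]

big_tiff = {"admin_unit": "A", "extension": "tiff",
            "size_bytes": 5 * 1024 * 1024, "preservation_level": "high"}
small_tiff = dict(big_tiff, size_bytes=1024)

group_big = assign_storage_group(big_tiff, rules)
group_small = assign_storage_group(small_tiff, rules)
```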

Storage of Metadata


All object-related metadata is stored in “XML boxes”
in the Oracle database. DigiTool stores all metadata
in Unicode (UTF-8).

The Repository is metadata adaptable, which means
that it can be easily configured to store any type of
metadata in XML (e.g. MARC, DC, MIX) or any other
format.

When the metadata is in XML format, the Repository
allows staff to attach XSD files (XML Schema
Definition) to specific metadata types and validate
metadata records of this type loaded to the
Repository.

The Repository allows different digital entities to
share the same metadata record enabling the reuse
and easy management of metadata.

Storage Management


Storage Groups determine the storage type, location
root and folder break.


Storage Rules: determine to which Storage Group a
certain object is directed. Storage rules can be based
on: ingest application, preservation level,
administrative unit, file size and extension, MIME
type or combination of these elements.


Storage Groups and Storage Rules are controlled by
the repository administrator.

Creating Storage Groups


The list of storage groups is created using the Repository
Management System and holds a list of storages.

Creating Storage Rules


The configurable list of storage rules is based on the list
of the existing Storage Groups.

The Repository administrator can add a new storage rule, or edit or delete an existing one, from the Repository Module by selecting the relevant icon from the left tool bar near the rule number.

Repository Maintenance


The Repository administrator can use this tab to manage repository administrative or maintenance jobs, which can be viewed from the “Reports” tab once submitted.

Maintenance jobs are designed as two-step processes, allowing an administrator to confirm or cancel a submission after reviewing its summary.

Viewing Reports


Reports can be viewed and printed in HTML and XML formats.

All reports are stored under the profile/reports directory of the DigiTool Java tree.

Repository Maintenance

Reload Repository Configuration is another maintenance
procedure available from the main menu. This service
enables a staff user to apply changes related to
repository indexing, delivery or storage.

Repository APIs

The interaction between the Repository and other DigiTool modules (and, optionally, third-party systems) is handled via standard Web Services.

The Repository has 3 main APIs:

Single Object API: supports SOAP (Simple Object Access Protocol) and RMI (Remote Method Invocation) calls for the creation, retrieval, updating and deletion of a single Digital Entity.

Search API: SOAP and RMI queries which return result sets.

Stream Gate API: supports HTTP GET/POST requests, designed for the transfer of data streams in and out of the Repository.

The Repository is equipped with an access rights module which verifies that the call to the Repository is made by an authorized user (usually an application and not a real person).

Repository APIs

[Diagram: a Third Party Application reaches the Repository through the Single Object API (Create, Retrieve, Update, Delete) and the Search API over SOAP (XML over HTTP) or RMI, and through the Stream Gate API over HTTP GET/POST]

Delivery System Main Components

The DigiTool 3.0 Delivery Module consists of the following internal components:

Dispatcher

Access Rights and Copyrights Checks

Delivery rules

Viewer Pre-processors

Viewers

Delivery System Workflow

[Diagram: the Delivery Manager interacting with the Repository and with the Delivery System Components (Access Rights Checker, Delivery Rules, Viewer Pre-processor)]

1. Get Digital Entity with PID.

2. Return full Digital Entity.

3. Call the access rights checker.

4. Return access checker result.

5. Look for the appropriate Pre-processor, based on rules defined in a configuration table.

6. Return the Pre-processor name.

7. Prepare the object for display.

8. Return URL for Viewer.
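The eight workflow steps can be read as a single function that passes the request from one component to the next. The Python sketch below is only an illustration of that sequence: every name, the in-memory repository and the viewer URL layout are hypothetical stand-ins and not DigiTool's API.

```python
def deliver(pid, user, repository, access_checker, delivery_rules, preprocessors):
    """Walk through workflow steps 1-8; returns a viewer URL, or None if denied."""
    entity = repository[pid]                # steps 1-2: fetch the full Digital Entity
    if not access_checker(entity, user):    # steps 3-4: ask the access rights checker
        return None                         # request denied
    name = delivery_rules(entity)           # steps 5-6: rules pick a pre-processor name
    return preprocessors[name](entity)      # steps 7-8: prepare object, return viewer URL


# Hypothetical stand-ins for the real components.
repository = {"1001": {"pid": "1001", "mime_type": "image/tiff", "group": 6}}


def access_checker(entity, user):
    return user.get("group") == entity["group"]


def delivery_rules(entity):
    return "image" if entity["mime_type"].startswith("image/") else "default"


preprocessors = {
    "image": lambda e: f"/viewer/image?pid={e['pid']}",      # made-up URL layout
    "default": lambda e: f"/viewer/default?pid={e['pid']}",  # made-up URL layout
}

url = deliver("1001", {"group": 6}, repository,
              access_checker, delivery_rules, preprocessors)
refused = deliver("1001", {"group": 7}, repository,
                  access_checker, delivery_rules, preprocessors)
```

Passing the components in as arguments keeps the sketch close to the diagram, where the Delivery Manager only coordinates and each check lives in its own component.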

Digital Entity View Flow

[Diagram: the user’s browser interacting with the Delivery Manager]

1. User requests full view of the Digital Entity.

2. Display Copyrights if needed.

3. User accepts the copyrights.

4. Redirect the user to the appropriate Viewer.

Delivery Dispatcher

The Dispatcher receives requests to display a particular
object.

The Dispatcher is responsible for processing the request
and activating the appropriate viewer or denying the
request.

For viewing purposes, the Dispatcher retrieves the
following information:

User’s attributes (IP, group, course enrollment, etc.)

Object’s attributes (MIME type, object type, access rights information)

The Dispatcher checks the user attributes against the
object’s access rights information and chooses the most
appropriate viewer for the specific object.

The Dispatcher is fully integrated with the ExLibris PDS
(Patron Directory Services) application.

Delivery Check Routines

Access Rights Check

The Delivery Access Rights Check is a mechanism that restricts access to objects in the DigiTool system based on patron registration and attribute information. The Access Rights Check is driven by an algorithm and configuration tables.

Copyrights Check

The copyright mechanism of the Delivery system assures an end user’s commitment to act in accordance with the copyright statement required for the object. The copyright statement is presented to the end user at the final stage of the delivery process, when the object requires copyright terms.

Delivery Viewers and Pre-processors

Viewer Pre-processors

The DigiTool 3.0 Delivery system provides Viewer Pre-processors: Java classes that prepare objects for viewing via specific Viewers, taking into account their technical attributes.

Viewers

The DigiTool 3.0 Delivery system provides built-in Viewers, implemented as Java classes, that are activated by Viewer Pre-processors according to the delivery rules. Viewers allow the viewing of objects in the most appropriate way.

Delivery Rules


Delivery rules define the relationship
between the delivery system components
and determine how end users view objects.

Definitions for delivery rules are managed
by the system administrator in the Delivery
Management Interface.


Delivery Rules Management


The configurable list of delivery rules is based on the list
of the existing viewers.

The Delivery administrator can add a new delivery rule using the “+” icon, or edit or delete an existing rule by selecting the inline “Pencil” or “x” icons from the left tool bar.

Delivery Rules Attributes

Delivery rules are based on three types of attributes:

General Attributes.

Digital Entity (Object) Attributes.

Request Attributes.

Each attribute is perceived by the system as a
condition for the delivery rule. When more than one
condition exists, they are logically combined with
“AND”.

All attributes are set to “ANY” value by default and
should be changed only when used as conditions.

The list of values must be separated by commas and
must not contain spaces. For example:
jpeg,tif,tiff,gif.
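A minimal Python sketch of how such a rule could be matched against a request follows. It assumes only what the slide states (comma-separated value lists without spaces, “ANY” as the default, conditions combined with AND); the attribute names and the request dictionary are hypothetical.

```python
def allowed_values(raw):
    """Split a comma-separated value list; "ANY" means no restriction."""
    return None if raw == "ANY" else set(raw.split(","))


def rule_matches(rule, request):
    """All non-ANY attributes must match: conditions are combined with AND."""
    for attribute, raw in rule.items():
        allowed = allowed_values(raw)
        if allowed is not None and request.get(attribute) not in allowed:
            return False
    return True


# Hypothetical rule: only the file extension is used as a condition.
rule = {
    "file_extension": "jpeg,tif,tiff,gif",
    "mime_type": "ANY",
    "user_group": "ANY",
}

matches_tif = rule_matches(rule, {"file_extension": "tif", "mime_type": "image/tiff"})
matches_pdf = rule_matches(rule, {"file_extension": "pdf", "mime_type": "application/pdf"})
```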

Types of Viewers


Simple Viewer


J2K Viewer


Image Viewer


METS Viewer


Sid Viewer


Text Viewer


EAD Viewer


Message Viewer


Webarchive Viewer


Default Viewer

Viewing Objects Side by Side

Viewing Compound Objects - METS

Viewing Compound Objects - EAD

www.exlibrisgroup.com

Thank you!