Digital preservation: identifiers and rights aspects - DOIs


13 Dec 2013


Preparing for Digital Preservation

What is being preserved:

Identification and Rights Management issues


Norman Paskin

International DOI Foundation


doi>


Preservation Management of Digital Materials


The Handbook



N. Beagrie/M. Jones/DPC


www.dpconline.org/graphics/handbook/


3.4 Rights management


4.4 Metadata and Documentation


4.5 Access



Digital preservation: an introduction to the standards issues surrounding the deposit of non-print publications


M Bide/E J Potter/A Watkinson Sept 1999


www.bic.org.uk/digpres.doc

Recommended background material


1. Identifiers


1.1 Identifiers and metadata


1.2 Interoperability


1.3 Different meanings of “identifier”


1.4 Persistence



2. “Keep a copy” - ?


3. Rights


3.1 Accessing “definitive copy”


3.2 Rights framework

Outline of presentation



An identifier = an unambiguous string denoting an entity


1.1 Identifiers and metadata

0550 10234 5


An item of metadata = “a relationship that someone claims to exist between two entities” (indecs), each of which may have an identifier:


1.1 Identifiers and metadata

0550 10234 5

[BookData says] the cover of this book is red

Pantone 4567


To be useful, an identifier requires some metadata:

1.1 Identifiers and metadata

0550 10234 5

[Books in print says] The title of this identified book is….

Chambers Dictionary
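
The two example claims on these slides can be written down as data. This is a minimal sketch, assuming a simple record with four fields; the class name `Claim` and its field names are illustrative and are not taken from the indecs framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One item of metadata: a relationship someone claims between two entities."""
    claimant: str  # who makes the claim (illustrative field name)
    subject: str   # identifier of the first entity
    relation: str  # the claimed relationship
    obj: str       # identifier (or label) of the second entity


# The two claims shown on the slides, attached to the same identifier.
claims = [
    Claim("BookData", "0550 10234 5", "has cover colour", "Pantone 4567"),
    Claim("Books in print", "0550 10234 5", "has title", "Chambers Dictionary"),
]


def claims_about(identifier, all_claims):
    """Return every claim whose subject is the given identifier."""
    return [claim for claim in all_claims if claim.subject == identifier]


for claim in claims_about("0550 10234 5", claims):
    print(f"[{claim.claimant} says] {claim.subject} {claim.relation} {claim.obj}")
```

Keeping the claimant inside each record reflects the point that metadata is something "someone claims", so different sources can make different claims about the same identifier.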



entity: something that is identified

“Nothing exists until it is identified”

Entities may include:

Abstractions (red); technical means (MP3 player); labels (title); things (book) etc.


ontology: structured relationships between entities

“an explicit formal specification of how to represent the entities that are assumed to exist in some area of interest and the relationships that hold among them”

(such as: “page” is component of “book”)

Examples: indecs framework; ONIX; FRBR

1.1 Identifiers and metadata



In a distributed environment, there is no one central physical archive

A distributed virtual archive requires that all the players and components interoperate


1.2 Interoperability



Across media


books, serials, audio, audiovisual, software, abstract works, visual material, etc


Across functions


cataloguing, discovery, workflow, rights management, archiving


Across levels of metadata


Simple, complex


Across linguistic and semantic barriers


Across territorial barriers


Across technology platforms


1.2 Interoperability



Preservation:

“How do we interoperate with the future?”

Preservation issues (identifiers, metadata, rights) are the same as any other interoperability problem


1.2 Interoperability

[1] Labels: the output of “numbering schemes”


1.3 Meanings of “identifier”




ISBN: ISO 2108:1992, International Standard Book Numbering

ISSN: ISO 3297:1998, International Standard Serial Number

ISRC: ISO 3901:2001, International Standard Recording Code

ISRN: ISO 10444:1997, International Standard Technical Report Number

ISMN: ISO 10957:1993, International Standard Music Number

ISWC: ISO 15707:2001, International Standard Musical Work Code

ISAN: Draft ISO 15706, International Standard Audiovisual Number

V-ISAN: Draft ISO 20925, Version Identifier for audiovisual works

ISTC: Draft ISO 21047, International Standard Text Code

PII: Publisher Item Identifier

etc



1.3 Meanings of “identifier”



[2] “infrastructure specifications”: specifying how to make labels actionable

Do not generate a label, but if you have one, specify how to use it in some particular context

URN: Uniform Resource Name

URI: Uniform [Universal] Resource Identifier

PURL: Persistent Uniform Resource Locator

e.g. ISBN as URN

Note: the same concept also exists in other, non-digital contexts

e.g. ISBN as EAN (978….) bar code or RFID
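
As an illustration of one label being used under two different specifications, the sketch below takes the ISBN shown on the earlier slides (0550 10234 5), checks its ISBN-10 check digit, and re-expresses it as a 978-prefixed EAN-13 number and as a URN in the `isbn` namespace. The function names are illustrative assumptions; the check-digit arithmetic is the standard ISBN-10 and EAN-13 rule.

```python
def _strip(isbn: str) -> str:
    """Drop the spaces and hyphens used only for display."""
    return "".join(ch for ch in isbn.upper() if ch not in " -")


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10 rule: digits weighted 10..1 must sum to a multiple of 11."""
    chars = _strip(isbn)
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # 'X' stands for 10, allowed only as the check digit
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn10_to_ean13(isbn: str) -> str:
    """Prefix 978, drop the old check digit, append the EAN-13 check digit."""
    if not isbn10_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    core = "978" + _strip(isbn)[:9]
    weighted = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - weighted % 10) % 10)


def isbn_to_urn(isbn: str) -> str:
    """Express the same label as a URN in the isbn namespace."""
    return "urn:isbn:" + _strip(isbn)


print(isbn10_is_valid("0550 10234 5"))  # True
print(isbn10_to_ean13("0550 10234 5"))  # 9780550102348
print(isbn_to_urn("0550 10234 5"))      # urn:isbn:0550102345
```

None of these functions generates a new label; each only specifies how to carry an existing label in another context, which is the distinction between sense [1] and sense [2].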



1.3 Meanings of “identifier”



[3] “implemented systems”

Implement labels, through actionable specification, in a managed way

EAN/UPC: physical product codes: implement ISO bar codes, RFIDs in the supply chain

DOI: digital object identifiers: implement URN/URIs in intellectual property (+metadata, policy)



“For use on the Internet, an ISBN label can become a URN specification; an ISBN label can be incorporated into a DOI, which is an implemented identifier system following the URI specification.”

Is clearer than

“an ISBN identifier can become a URN identifier; an ISBN identifier can be incorporated into a Digital Object identifier, which is an implemented URI identifier” (?)

1.3 Meanings of “identifier”

A particular use of the word may be a mix of meanings [1], [2] & [3]

1.4 Persistence

[Diagram: many URLs (printed identifiers, bookmarks, etc) point directly at the Content]

[Diagram: the same URLs now return “404 File not found”; the Content is no longer reached]

"Linkrot": recent estimates 16% in 6 months

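To give a feel for what a 16% loss in 6 months implies over longer periods, here is a small worked calculation. It assumes, purely for illustration, that links keep breaking at the same constant rate; the slide gives only the single 6-month estimate, and the function name is hypothetical.

```python
def surviving_fraction(months: float, loss: float = 0.16,
                       period_months: float = 6.0) -> float:
    """Fraction of links still working after `months`.

    Assumption (not from the slide): the same fraction `loss` of the
    remaining links breaks in every period of `period_months`.
    """
    return (1.0 - loss) ** (months / period_months)


for months in (6, 12, 24):
    print(months, "months:", round(surviving_fraction(months), 3))
```

Under that assumption roughly half of all links (0.84 to the fourth power, about 0.498) would still work after two years.
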
[Diagram: many DOIs point to a DOI directory, which points to the URL of the Content; an Assigner is shown alongside the DOI directory]

Redirection (resolution) e.g. DOI

[Diagram: many DOIs point to a DOI directory, shown with the Assigner and the Content, leading to a Response Page]

Response Page

purchase content

view free excerpt

get related items

get archive copy

request permissions

More than just "locate"

[Diagram: as in the previous slide, with an Archive added alongside the Response Page options, the Assigner and the DOI directory]


Persistent identifier


Resolution (redirection)


Persistence of the associated metadata


Persistence of the resolution system


Persistence of the identified copy


digital preservation: migration, emulation, encapsulation



Persistence is a matter of social infrastructure


Technology can help but not guarantee
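
The redirection idea behind these slides can be sketched as a small in-memory directory: the identifier stays fixed while the assigner updates where it points, and one identifier can offer several services rather than just "locate". This is a toy model under stated assumptions; the class, the service names and the DOI `10.9999/example` are hypothetical, and it is not the actual DOI resolution system.

```python
class Directory:
    """Toy resolution directory: identifier -> {service name: URL}."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, **services):
        """The assigner registers an identifier with one or more services."""
        self._records[identifier] = dict(services)

    def update(self, identifier, service, url):
        """The assigner changes a target; the identifier itself never changes."""
        self._records[identifier][service] = url

    def resolve(self, identifier, service="locate"):
        """Redirect a request for the identifier to the current target."""
        return self._records[identifier][service]


directory = Directory()
directory.register(
    "10.9999/example",
    locate="http://publisher.example.org/old/item",
    archive="http://archive.example.org/item",
)

# The content moves: only the directory entry is updated.
directory.update("10.9999/example", "locate",
                 "http://publisher.example.org/new/item")

print(directory.resolve("10.9999/example"))
print(directory.resolve("10.9999/example", service="archive"))
```

The sketch only works while someone keeps the directory up to date, which is why persistence is a matter of social infrastructure and not of technology alone.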

1.4 Persistence



Distinguish two issues:

1. The technical specification of “what is” a URN and a URI etc.

identifiers in sense [2]

2. What this means for practical implementation

identifiers in sense [3]


Internet: DOI, URN, URL, PURL



See DOI Handbook


4.9 DOI as a URI


4.10 DOI as a URN


6.10 DOI and PURL


Aim: persistent across time and unique across network space; useful and implemented

PURLs are tied to http and are single redirect etc.

URI/URNs are intended to be abstract names independent of protocols (approx)

DOIs are URIs (formal specification)

DOIs are URNs (in effect)

URN and URI proponents disagree

(& there are other proposed specs e.g. ARK)


Internet persistent id specs


[Diagram: URI, URN and URL, with Resolution (N2L); schemes shown: urn:, ftp:, gopher:, http:]

http://www.w3.org/addressing

(But largely from IETF; W3C did not see need for URN)

Internet persistent id specs


IETF formal spec “URI scheme for Digital Object Identifier”

Paskin, Norman; Neylon, Eamonn; Hammond, Tony; Sun, Sam; Uniform Resource Identifier (URI) scheme for Digital Object Identifiers (DOIs); An abstract specification (uri:doi:)

Would be doi: (like tel:) [uri: is not part of the URI spec, unlike urn:]

May be a pure name or de-referenced by any service

The namespace provides its own mechanism (“Bootstrapping”)

On its own, it’s just a specification!

Requires code distribution for any implementation
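
To illustrate "a pure name or de-referenced by any service", the sketch below treats a `doi:` string as a plain name, splits it into prefix and suffix, and only then hands it to one possible resolution service. The function names are illustrative; the HTTP proxy `https://doi.org/` is one service that can de-reference a DOI today and is an assumption added here, not part of the slide.

```python
from urllib.parse import quote


def parse_doi_uri(uri: str):
    """Split 'doi:<prefix>/<suffix>' into (prefix, suffix)."""
    scheme, _, name = uri.partition(":")
    if scheme.lower() != "doi" or "/" not in name:
        raise ValueError(f"not a doi: URI: {uri!r}")
    prefix, _, suffix = name.partition("/")
    return prefix, suffix


def to_proxy_url(uri: str, proxy: str = "https://doi.org/") -> str:
    """De-reference the name through one particular service (an HTTP proxy).

    The proxy is a parameter because the name does not depend on it.
    """
    prefix, suffix = parse_doi_uri(uri)
    return proxy + prefix + "/" + quote(suffix, safe="/")


print(parse_doi_uri("doi:10.1000/182"))  # ('10.1000', '182')
print(to_proxy_url("doi:10.1000/182"))   # https://doi.org/10.1000/182
```

Keeping parsing separate from de-referencing mirrors the slide: the scheme only defines the name, and a working service still has to be implemented and distributed.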

DOI as URI



URN is less clear:

Higher level situation muddy

Set of IETF drafts that define URN

Set of registered namespaces (e.g. isbn)

DOI could be but isn’t - no advantage

Unlike URI, provides a specific DNS-based middle layer (RDS) to find the appropriate resolution service

Scalability and security questioned; and:

Little or no resolution implementation

urn:isbn:123456789 can be defined; but what does it do over and above isbn:123456789?

neither has a readily available, well known, global resolution

A DOI is more than URN or URI

Adds policy, business rules, business model

Adds metadata specifications (cf ISBN, EAN, Visa)


DOI as URN



Digital preservation is “keeping a copy”

What is it you are archiving? (or managing, or counting)

What’s a copy? Something that is “the same as”

Is A the same as B?

Consider a photocopy…. text; author; work; paper; spatial location….

2. “Keep a copy”



“Is A the same as B?” is meaningless

Can only say “Is A the same as B for the purpose of…?”

“the same” for some is “two different things” for others

Purpose is defined by attributes

“Nothing exists until it is identified”

…and its relevant attributes identified

Structured metadata is needed (e.g. ONIX for digital preservation?)
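
The point that sameness depends on purpose can be made concrete: two items are compared only on the attributes that matter for a given purpose. This is a minimal sketch using the photocopy example; the attribute names and values are invented for illustration.

```python
def same_for(a: dict, b: dict, relevant_attributes) -> bool:
    """A and B are 'the same' for a purpose if they agree on every
    attribute that the purpose cares about."""
    return all(a.get(name) == b.get(name) for name in relevant_attributes)


original = {
    "text": "It was a dark and stormy night",
    "author": "A. Writer",
    "work": "work-001",
    "paper": "printed sheet",
    "location": "library shelf",
}
photocopy = dict(original, paper="copier paper", location="reader's desk")

# For a reader, only the intellectual content matters.
print(same_for(original, photocopy, ["text", "author", "work"]))  # True

# For someone managing physical stock, these are two different things.
print(same_for(original, photocopy, ["paper", "location"]))       # False
```

Asking the question without a list of relevant attributes has no answer in this model, which is why the attributes themselves need to be identified in structured metadata.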

2. “Keep a copy”



“How can an identifier be used to locate a specific local copy, which may have different access rights?” [see www.doi.org, FAQ 26]

Resolution of identifiers to global services.

Contextualization of requests to those services to local requirements.

Split this into separate global and subsequent delegated local resolution steps e.g. OpenURL

A globally-maintained database is clearly the wrong place to hold information on every local collection.

("Linking to the Appropriate Copy: Report of a DOI-Based Prototype"; O. Beit-Arie et al.; D-Lib Magazine, www.dlib.org, September 2001)
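
The split into a global step followed by a delegated local step can be sketched as two lookups. This is a toy model only: the identifiers, URLs and function names are hypothetical, and it does not reproduce OpenURL syntax or the prototype described in the D-Lib report.

```python
# Global directory: one default target per identifier. It holds no
# information about any local collection.
GLOBAL_DIRECTORY = {
    "10.9999/article-1": "http://publisher.example.org/article-1",
    "10.9999/article-2": "http://publisher.example.org/article-2",
}


def resolve_global(doi: str) -> str:
    """Step 1: global resolution of the identifier."""
    return GLOBAL_DIRECTORY[doi]


def resolve_appropriate_copy(doi: str, local_holdings: dict) -> str:
    """Step 2: a local service, which knows its own collection,
    contextualizes the request and may substitute a local copy."""
    default_target = resolve_global(doi)
    return local_holdings.get(doi, default_target)


# Each institution maintains its own holdings table locally.
library_holdings = {
    "10.9999/article-1": "http://library.example.edu/copies/article-1",
}

print(resolve_appropriate_copy("10.9999/article-1", library_holdings))
print(resolve_appropriate_copy("10.9999/article-2", library_holdings))
```

The holdings table lives with the library and not in the global directory, so the global database never needs to know about every local collection.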


A definitive archive copy could be separately identified (with its own DOI)

a matter of policy

Functional granularity

3.1 Accessing the definitive copy


ISO/IEC MPEG-21 as exemplar

Digital item: a structured digital object with a standard representation, identification and metadata

The fundamental unit of distribution & transaction in the MPEG-21 framework

Maps to “Digital Object” (DOI, Digital Object Architecture) or “Resource” (IETF)

"Digital objects provide a means of organizing and identifying content for purposes of storage, access or distribution… …metadata may include restrictions on access to digital objects, notices of ownership, and licensing agreements…"

(www.xiwt.org/documents/ManagAccess.html)



3.2 Rights framework

[Diagram: layers of a rights framework]

Vocabulary layer: Rights metadata; Data Dictionary; Metadata set 1, Metadata set 2

Expression layer: Rights Expression Language; machine-capable interpretation of rights: XrML etc

Application layer: Use; enforcement of rights & permissions; DRM; Technology Platform; rendering, environment etc.

3.2 Rights framework


Standards infrastructure must accommodate many different components (MPEG 21 standard is many parts)

But a structured digital object with a standard representation, identification and metadata is “The fundamental unit”

Must be interoperable with existing metadata standards - e.g. ONIX, SMPTE so need Dictionaries

MPEG 21 Rights Data Dictionary & Rights Expression Language

Purpose: "To achieve the goal of expressing rights for all Users of MPEG-21’s Digital Items"

3.2 Rights framework

Pieces of "rights metadata" used in each semantic structure

Describing rights using (meta)data

Primary rights events (claims, deals) are described using pieces of data:

Rights Statement (“claim”)

[party] owns [right] in [creation] in [time] and [place]

Rights Agreement (“deal”)

[party] agreed with [party] in [time] and [place] that [event]

Creations typically have standard identifiers, which may have associated structured data, or which may act as keys to get this data

Other pieces of data also need standard identifiers (time, party..)
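
The two templates for primary rights events can be read as records whose slots are filled with identifiers. This is a minimal sketch assuming one field per bracketed slot; the class names follow the slide headings, while the field names and every example value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsStatement:
    """[party] owns [right] in [creation] in [time] and [place]"""
    party: str
    right: str
    creation: str  # typically a standard identifier, e.g. a DOI or ISBN
    time: str
    place: str

    def describe(self) -> str:
        return (f"{self.party} owns {self.right} in {self.creation} "
                f"in {self.time} and {self.place}")


@dataclass(frozen=True)
class RightsAgreement:
    """[party] agreed with [party] in [time] and [place] that [event]"""
    party: str
    other_party: str
    time: str
    place: str
    event: str

    def describe(self) -> str:
        return (f"{self.party} agreed with {self.other_party} "
                f"in {self.time} and {self.place} that {self.event}")


claim = RightsStatement("party:publisher-1", "reproduction right",
                        "0550 10234 5", "2002", "UK")
deal = RightsAgreement("party:publisher-1", "party:library-7", "2002", "UK",
                       "party:library-7 may keep an archive copy")

print(claim.describe())
print(deal.describe())
```

Using identifiers in the party, time and place slots as well as for the creation keeps each slot usable as a key to further data.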


What is “rights metadata”?

A mix of data from many sources:

Rights “events”: statements, agreements, transfers, permissions, prohibitions, requirements, assertions, approvals

Descriptive metadata: creations, creation types, contributor roles, user roles, tools, classifications, measures

Legal metadata: rights, persons, intellectual property

Financial metadata: terms, conventions

These sets of “rights metadata” are standardized and maintained in different places.

This mix of data from many sources is used in many different places by different people in chains of rights events:

Distributed rights management

agreement, transfer, statement, agreement, permission, prohibition, permission, assertion, agreement, requirement, etc

[party] can [verb] [amount] to [creation] at [time] in [place].

Each entity can be expanded to reveal more data
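
"Each entity can be expanded" suggests that every slot in the permission template holds an identifier acting as a key into data kept elsewhere. The sketch below models that with a plain dictionary lookup; the registry contents, the identifier strings and the function name are all hypothetical.

```python
# Hypothetical registries, which in practice would be maintained in
# different places by different organisations.
REGISTRY = {
    "party:library-7": {"name": "Example Library", "type": "institution"},
    "creation:10.9999/example": {"title": "Example Work", "format": "text"},
    "place:GB": {"name": "United Kingdom"},
}

# [party] can [verb] [amount] to [creation] at [time] in [place].
permission = {
    "party": "party:library-7",
    "verb": "copy",
    "amount": "1",
    "creation": "creation:10.9999/example",
    "time": "2002",
    "place": "place:GB",
}


def expand(event: dict, registry: dict) -> dict:
    """Replace each identifier that the registry knows with its fuller
    record; values without a registry entry are left as they are."""
    return {slot: registry.get(value, value) for slot, value in event.items()}


expanded = expand(permission, REGISTRY)
print(expanded["party"])     # the full party record
print(expanded["creation"])  # the full creation record
print(expanded["verb"])      # 'copy' (no further data in this sketch)
```

The expansion only works if every source uses identifiers and terms that the others can interpret.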


Each of these rights events is an information object - an entity - which may need to link to or use information objects in other databases.

The information used by each must therefore be standardised/interoperable


Is there a way of getting to this "interoperation of data from many sources"?

Yes: work already done which shows how

3.2 Rights framework


Interoperability of Data in E-Commerce Systems

Produced principles for structured metadata and basis for a data dictionary for interoperability

Principles used by DOI, ONIX, etc

Applicable to other structured approaches e.g. SMPTE (and creates means of interoperability with them)

Now extended to rights transactions:

<indecs>2rdd Consortium (includes IDF)

Accepted as basis of MPEG-21 Rights Data Dictionary


indecs (www.indecs.org)



A data dictionary is a place where the process of semantics definitions meets technology

MPEG standards have traditionally been about engineering solutions

MPEG-21 is a multimedia and a lifecycle framework: its rights terminology does not exist in a vacuum

Interacts with a large number of existing and developing schemes and systems

The number of terms involved is likely to grow steadily and significantly

MPEG-21 is taking the lead in establishing an RDD; it is likely to be widely supported if it is flexible and interoperable

The MPEG-21 RDD


“Rights” metadata describes what people can (or can’t) do with assets, and when, where, how and with what they can do it.

“Descriptive” metadata describes what people did with assets: the same thing, but in the past. The majority of terms are common.

Any descriptive term may be relevant to the conditions of an agreement

When new works are created through derivation, aggregation or copying, new descriptions are needed which rely on both descriptive & rights metadata

Rights & description are interdependent (1)


Ownership changes and changes of law or jurisdiction often require querying of descriptive metadata for implementation in systems

“Requirements” can be dependent on description in complex (and unfamiliar) ways

Terms from descriptive schemes such as ONIX, Mi3P, DOI-NS, PRISM, MPEG7 Descriptor Schemes, DC and SCORM (and many others) will need to be integrated with any effective RDD

Rights & description are interdependent (2)


Many content metadata schemes are in use and development and there will be many more

These all impact on rights descriptions. Users will be reluctant (or unable) to adopt separate terms for “rights” descriptions

automated interoperability into and out of RDD terms needed

Users need to describe “non-digital” rights in tandem with digital

The meaning of terms in external schemes must be fully mapped to RDD terms so that they form a part of the available data dictionary and enable users to automate their participation
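
Mapping "into and out of RDD terms" can be pictured as a hub-and-spoke table: each external scheme maps its terms once to a shared dictionary term, and translation between any two schemes goes through that hub. The sketch is illustrative only; the scheme names, terms and `rdd:` labels are invented and are not real ONIX, MPEG-21 RDD or other scheme terms.

```python
# Hypothetical mapping table: (scheme, term) -> shared dictionary term.
TO_RDD = {
    ("scheme_a", "Author"): "rdd:creator",
    ("scheme_b", "Writer"): "rdd:creator",
    ("scheme_a", "Publication date"): "rdd:date_of_issue",
    ("scheme_b", "Issued"): "rdd:date_of_issue",
}


def to_rdd(scheme: str, term: str) -> str:
    """Map an external term into the shared dictionary."""
    return TO_RDD[(scheme, term)]


def from_rdd(rdd_term: str, scheme: str) -> str:
    """Map a shared dictionary term out into a given scheme."""
    for (source_scheme, term), mapped in TO_RDD.items():
        if source_scheme == scheme and mapped == rdd_term:
            return term
    raise KeyError(f"{scheme} has no term mapped to {rdd_term}")


def translate(term: str, source: str, target: str) -> str:
    """Translate between two schemes by way of the shared dictionary."""
    return from_rdd(to_rdd(source, term), target)


print(translate("Author", "scheme_a", "scheme_b"))  # Writer
print(translate("Issued", "scheme_b", "scheme_a"))  # Publication date
```

With a hub, each new scheme needs one mapping to the dictionary instead of a separate mapping to every other scheme, which matters when many more schemes are expected.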

Relationship with other metadata schemes



To provide a method for generating a set of clear, consistent, structured and integrated terms and definitions, to the required level of granularity, for an MPEG Rights Data Dictionary

To provide a comprehensive methodology for the interoperability of terms from different schemes and systems used in the management of rights and permissions through mapping.

Will be used by DOI Application Profiles

DOIs can deliver this required interoperability

To describe but in no way prescribe how rights and permissions operate

To provide a framework for future governance.

<indecs> Data Dictionary



“DRM Technology: Identification and Metadata”
Norman Paskin
In: Digital Rights Management: Technical, Economic, Juridical and Political Aspects (ed. Becker et al)
Springer Lecture Notes in Computer Science series
In press

"Towards a Rights Data Dictionary - Identifiers and Semantics at work on the net".
Norman Paskin
imi insights, June 2002
http://www.epsltd.com/IMI/IMI.htm (subscription access)

Copies available from author on request (n.paskin@doi.org)

Additional material


Norman Paskin, International DOI Foundation

n.paskin@doi.org