WHAT I HAVE FOUND OUT FROM AN
ATTEMPT TO BUILD AN RDF MODEL
OF FRBR-IZED CATALOGING RULES

BY

Martha M. Yee

Cataloging Supervisor

UCLA Film & Television Archive

myee@ucla.edu

http://myee.bol.ucla.edu

INTRODUCTION

1. Some definitions

2. The vision

3. The experiment

4. Some problems?

SOME DEFINITIONS

The semantic web: a way to represent
knowledge; a knowledge
representation language that provides
ways of expressing meaning that are
amenable to computation; a means of
constructing maps of domains of
knowledge consisting of class and
property axioms with a formal
semantics

SOME DEFINITIONS

The semantic web

The web as huge shared database

Hyperdata replacing hypertext

SOME DEFINITIONS

RDF (Resource Description Framework):
a family of specifications for methods
of modeling information that
underpins the semantic web through a
variety of syntax formats

SOME DEFINITIONS

RDF (Resource Description Framework)

Data encoded as:

the subject of a triple (New York)

the predicate of a triple (has the postal
abbreviation)

the object of a triple (NY)
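
The New York example above can be sketched in a few lines of Python. This is only an illustration of the subject/predicate/object shape: the class name Triple and the helper describe are made up for this sketch, and real RDF would normally identify the subject and predicate with URIs rather than plain labels.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Plain labels are used here for readability only.
triple = Triple("New York", "has the postal abbreviation", "NY")


def describe(t: Triple) -> str:
    """Read the triple back as a simple sentence."""
    return f"{t.subject} {t.predicate} {t.obj}."


print(describe(triple))
```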

SOME DEFINITIONS

RDF (Resource Description Framework)

XML is commonly used to express RDF,
but is not a necessity
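
As a rough illustration that XML is not required, the same statement can be written as one line of N-Triples, a plain-text RDF syntax. The example.org URIs and the function name to_ntriples are assumptions invented for this sketch; they are not part of any published vocabulary.

```python
def to_ntriples(subject_uri: str, predicate_uri: str, literal: str) -> str:
    """Format one triple with a literal object as an N-Triples line."""
    # Backslashes and double quotes must be escaped inside a literal.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{predicate_uri}> "{escaped}" .'


# Hypothetical URIs standing in for "New York" and "has the postal abbreviation".
line = to_ntriples(
    "http://example.org/place/NewYork",
    "http://example.org/vocab/postalAbbreviation",
    "NY",
)
print(line)
```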

SOME DEFINITIONS

RDF (Resource Description Framework)

RDFS or RDF Schema is an extensible
knowledge representation language
providing basic elements for the
description of ontologies, AKA RDF
vocabularies

SOME DEFINITIONS

RDF (Resource Description Framework)

RDFS data encoded as:

Class (= Entity); the subject of a triple

Class relationship (semantic linkage); the
predicate of a triple

Class property (= Attribute); the object of
a triple

SOME DEFINITIONS

RDF (Resource Description Framework)

OWL (Web Ontology Language): a family
of knowledge representation
languages for authoring ontologies
compatible with RDF

SOME DEFINITIONS

RDF (Resource Description Framework)

SKOS (Simple Knowledge Organization
System): a family of formal languages built
upon RDF and designed for representation
of thesauri, classification schemes,
taxonomies or subject-heading systems

THE VISION

The Web as shared database instead of
shared document store


THE VISION

Instead of records, URIs (Uniform
Resource Identifiers) for entities:


URI for work containing all work
attributes, including preferred name,
variant names, but also much more
data about work than our current
authority records do

THE VISION

URI for expression, containing all
expression attributes, and linked back
to work


THE VISION

URI for manifestation, containing all
manifestation attributes, and linked
back to expression
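
The chain of links described in these slides (manifestation back to expression, expression back to work) might be sketched as follows. Everything here is hypothetical: the example.org URIs, the attribute names, the sample values, and the key linked_to are placeholders chosen for the illustration, not the names used in the actual model.

```python
# Hypothetical entity descriptions, keyed by URI.
ENTITIES = {
    "http://example.org/work/1": {
        "type": "Work",
        "preferred_name": "Hamlet",
    },
    "http://example.org/expression/1": {
        "type": "Expression",
        "language": "English",
        "linked_to": "http://example.org/work/1",
    },
    "http://example.org/manifestation/1": {
        "type": "Manifestation",
        "carrier": "volume",
        "linked_to": "http://example.org/expression/1",
    },
}


def chain(uri: str) -> list[str]:
    """Follow the 'linked_to' links upward and list the entity types visited."""
    path = []
    current = uri
    while current is not None:
        entity = ENTITIES[current]
        path.append(entity["type"])
        current = entity.get("linked_to")  # None once the work is reached
    return path


print(chain("http://example.org/manifestation/1"))
```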

THE VISION

URIs for persons, corporate bodies,
places, subjects, etc., including
preferred name, variant names, but
also much more data about person,
corporate body, place or subject
(concept or object) than our current
authority records do

THE VISION

If any data about a particular entity
needed to be changed, it would be
changed once at the URI and
immediately accessible to all users,
libraries and library staff by means of
links down to local data such as
circulation, acquisitions, and binding
data
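
A minimal sketch of the "change it once" idea, assuming that local records store only a URI plus local fields and resolve the shared description at display time. The dictionaries SHARED and LOCAL, the example.org URI, and all the values are invented for illustration.

```python
# Hypothetical shared description, published once at a URI.
SHARED = {
    "http://example.org/person/1": {"preferred_name": "Example, Author"},
}

# Hypothetical local data: each library keeps the URI and its own fields only.
LOCAL = [
    {"library": "Library A", "entity": "http://example.org/person/1", "status": "on shelf"},
    {"library": "Library B", "entity": "http://example.org/person/1", "status": "at bindery"},
]


def display(local_record: dict) -> str:
    """Resolve the URI at display time, so no name is copied into local data."""
    name = SHARED[local_record["entity"]]["preferred_name"]
    return f"{name} | {local_record['library']}: {local_record['status']}"


# One change at the URI ...
SHARED["http://example.org/person/1"]["preferred_name"] = "Example, Author, 1900-"

# ... is visible through every local record that links to it.
for record in LOCAL:
    print(display(record))
```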

THE EXPERIMENT

A set of cataloging rules that are more FRBR-ized
than RDA in that they more clearly
differentiate between:

data applying to the expression

vs.

data applying to the manifestation

THE EXPERIMENT

You can find these rules at:


http://myee.bol.ucla.edu

THE EXPERIMENT

I am now in the process of trying to
model my cataloging rules in the form
of an RDF/RDFS/OWL/SKOS model

THE EXPERIMENT

I don’t seriously expect anyone to adopt
these rules!


THE EXPERIMENT

My research questions:

1. Is it possible for catalogers to tell in all
cases whether a piece of data pertains to
the expression or the manifestation?



THE EXPERIMENT

My research questions:

2. Is it possible to fit our data into
RDF/RDFS/OWL/SKOS?


THE EXPERIMENT

My research questions:

3. If it is, is it possible to use that data to design
indexes and displays that meet the objectives of
the catalog (providing an efficient instrument to
allow a user to find a particular work of which
the author and title are known, a particular
expression of a work, all of the works of an
author, all of the works in a given genre or
form, or all of the works on a particular
subject)?


THE EXPERIMENT

You can find my RDF/RDFS/OWL/SKOS
model at:


http://myee.bol.ucla.edu

SOME PROBLEMS?

Can we do what we need to do within the
context of the semantic web?

SOME PROBLEMS?

More granularity, or data parsing by
catalogers

Those familiar with RDA, FRBR, and
FRAD development will recognize
that much of that development is
directed at increasing granularity in
cataloger-produced data

SOME PROBLEMS?

Granularity issues:

More structure and more granularity make
possible more powerful indexing and
more sophisticated display,

but is more complex and expensive to apply
and less likely to be adopted in a
standard fashion across all communities,
i.e. less likely to produce interoperable
data.


SOME PROBLEMS?

Granularity issues:

Currently, we demarcate a surname from a
forename by putting the surname first,
followed by a comma and then the forename.

Even that amount of granularity can sometimes
pose a problem for a cataloger who does not
necessarily know which part of the name is
"surname" and which part is "forename" in a
culture unfamiliar to the cataloger.
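
The comma convention is easy for software to exploit, which is part of its appeal. The sketch below (the function name split_inverted_name is made up) shows how little code is needed, and also that the code can only trust the comma: it cannot tell whether the cataloger put the right part of the name first.

```python
from typing import Optional, Tuple


def split_inverted_name(heading: str) -> Tuple[Optional[str], str]:
    """Split 'Surname, Forename' at the first comma.

    Returns (surname, forename). If there is no comma, the surname is
    reported as None because the parts cannot be told apart mechanically.
    """
    surname, comma, forename = heading.partition(",")
    if not comma:
        return None, heading.strip()
    return surname.strip(), forename.strip()


print(split_inverted_name("Yee, Martha M."))
print(split_inverted_name("Martha M. Yee"))
```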


SOME PROBLEMS?

Granularity issues:

Currently we do not collect information
about gender.

If we were to increase the granularity of our
data in order to gather that information,
we would encounter situations in which
the cataloger would not necessarily know
whether a given creator was female, male,
or of some other gender.


SOME PROBLEMS?

Granularity issues:

Currently, if we are adding a birth and/or death
date, whatever dates we use are all together in a
$d subfield, without any separate coding to
indicate which date is birthdate and which is
death date (although an occasional b. or d. will
tell us this kind of information).

We could certainly provide more granularity for
dates, but that would make the MARC format
just that much more complex and difficult to
learn.
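
To make the trade-off concrete, here is a rough sketch of what software has to do today to recover birth and death dates from an undifferentiated $d string. The patterns cover only a few simple shapes (a year range, an open range, and the b. or d. prefixes mentioned above); the sample strings are invented, and real $d data contains many more forms than this handles.

```python
import re
from typing import Dict, Optional


def parse_dates(subfield_d: str) -> Dict[str, Optional[str]]:
    """Guess birth and death years from the text of a $d subfield."""
    text = subfield_d.strip().rstrip(".,").strip()

    match = re.fullmatch(r"b\.\s*(\d{4})", text)      # e.g. "b. 1950"
    if match:
        return {"birth": match.group(1), "death": None}

    match = re.fullmatch(r"d\.\s*(\d{4})", text)      # e.g. "d. 1985"
    if match:
        return {"birth": None, "death": match.group(1)}

    match = re.fullmatch(r"(\d{4})-(\d{4})?", text)   # e.g. "1901-1985" or "1950-"
    if match:
        return {"birth": match.group(1), "death": match.group(2)}

    # Anything else is left unparsed rather than guessed at.
    return {"birth": None, "death": None}


for sample in ["1901-1985", "1950-", "b. 1950", "d. 1985.", "ca. 1800"]:
    print(sample, "->", parse_dates(sample))
```

Separate coding for each date would remove the guesswork in code like this, at the cost of more coding work for the cataloger.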

SOME PROBLEMS?

Granularity issues:

People who dislike the MARC format already
argue that it is too granular and therefore
has too steep a learning curve for the
people who encode data in it.

How much of the granularity already in MARC is
actually used in existing records and, where it
is present, actually exploited by indexing and
display software?

SOME PROBLEMS?

Granularity issues:

Granularity costs money and libraries and
archives are already starving for resources.

Granularity can only be provided by people,
and people are expensive.

One frightening thing about the Internet is that
it seems to be based on an economy of free
intellectual labor. Only the programmers
get paid. Everyone else is a volunteer.

SOME PROBLEMS?

Other issues:

Potentially every piece of data describing a
particular entity could be represented by a
URI leading out to a SKOS list of data
values. Is the Internet really fast enough to
assemble a record from hundreds of URIs
in a reasonable amount of time?

SOME PROBLEMS?

If the work is represented by a URI and
the author of the work is represented
by a linked URI,

how would it be possible to guarantee
success for a user that searched on

a variant of the author name

in combination with a variant of the title?
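
One way to think about this question is that the search has to resolve each term against all the names held at the relevant URI and then follow the link between the two entities. The sketch below assumes both descriptions are available to the same index; the example.org URIs, the dictionaries, and the function names are hypothetical.

```python
# Hypothetical author description: preferred name first, then variants.
PERSONS = {
    "http://example.org/person/twain": ["Twain, Mark", "Clemens, Samuel Langhorne"],
}

# Hypothetical work description, linked to its author by URI.
WORKS = {
    "http://example.org/work/huck": {
        "names": ["Adventures of Huckleberry Finn", "Huckleberry Finn"],
        "author": "http://example.org/person/twain",
    },
}


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.casefold().split())


def uris_matching(names_by_uri: dict, query: str) -> set:
    """Return the URIs whose preferred or variant names match the query."""
    wanted = normalize(query)
    return {
        uri for uri, names in names_by_uri.items()
        if wanted in (normalize(name) for name in names)
    }


def find_works(author_query: str, title_query: str) -> list:
    """Match each term against all names, then keep works linked to a matched author."""
    authors = uris_matching(PERSONS, author_query)
    titles = uris_matching({uri: w["names"] for uri, w in WORKS.items()}, title_query)
    return sorted(uri for uri in titles if WORKS[uri]["author"] in authors)


print(find_works("Clemens, Samuel Langhorne", "Huckleberry Finn"))
```

The sketch succeeds only because both sets of names sit in one place; whether that can be guaranteed when the URIs live on different servers is exactly the open question.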

SOME PROBLEMS?

There is a cross reference from FBI to United
States. Federal Bureau of Investigation,
but not from FBI Counterterrorism
Division to United States. Federal Bureau
of Investigation. Counterterrorism
Division. For that reason, a search in any
OPAC name index for FBI
Counterterrorism Division will fail.


SOME PROBLEMS?

The solution to this problem is to
define a transitive or inheritance
relationship between a corporate
body and its corporate
subdivisions.
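
A minimal sketch of such an inheritance relationship, using the FBI example: the subdivision records only its own name and a link to its parent, and every name of the parent (preferred or variant) is inherited when the index is built. The example.org URIs, the dictionary layout, and the function names are assumptions made for the illustration.

```python
import re

# Hypothetical corporate-body descriptions keyed by URI.
BODIES = {
    "http://example.org/body/fbi": {
        "name": "United States. Federal Bureau of Investigation",
        "variants": ["FBI"],
        "parent": None,
    },
    "http://example.org/body/fbi-ctd": {
        "name": "Counterterrorism Division",
        "variants": [],
        "parent": "http://example.org/body/fbi",
    },
}


def normalize(text: str) -> str:
    """Drop punctuation, case-fold, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text).casefold().split())


def all_names(uri: str) -> list:
    """Combine a body's own names with every name inherited from its ancestors."""
    body = BODIES[uri]
    own = [body["name"]] + body["variants"]
    if body["parent"] is None:
        return own
    return [f"{parent_name}. {name}"
            for parent_name in all_names(body["parent"])
            for name in own]


def lookup(query: str) -> list:
    """Return the URIs of bodies for which the query is a known or inherited name."""
    wanted = normalize(query)
    return [uri for uri in BODIES
            if wanted in {normalize(name) for name in all_names(uri)}]


print(lookup("FBI Counterterrorism Division"))
```

Because the recursion walks up the parent links, a variant recorded once on the parent body applies to subdivisions at any depth.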

SOME PROBLEMS?

Unfortunately, RDF seems to resist
hierarchical relationships.

It assumes that you just need to
connect everything to everything
else without needing to express
any hierarchy.

SOME PROBLEMS?

This is bad news for bibliographic
data, which is rife with
hierarchical relationships.

Hierarchy is one of our major tools
for expressing meaning to our
users.

SOME PROBLEMS?

Can all bibliographic data be reduced to
either a class or a property with a finite
list of values? Another way to put this is
to ask whether all that catalogers do could be
reduced to a set of pull-down menus.


SOME PROBLEMS?

Is there an assumption on the part of semantic
web developers that a given type of data,
such as publisher name, would be
EITHER “literal” (i.e. transcribed or
composed) OR represented by a URI
(controlled)?

SOME PROBLEMS?

Cataloging is rooted in humanistic practices
that require careful recording of
evidence. There will always be a value
in distinguishing (and labelling as such)
the following types of data:

copied as is from an artifact (transcribed)

supplied by a cataloger

categorized by a cataloger (controlled)

SOME PROBLEMS?

I notice that Tim Berners-Lee, the inventor of
the World Wide Web and originator of the
Semantic Web himself, emphasizes the
importance of recording not just data, but
where the data came from, for the sake of
authenticity (see the February 7, 2008
interview of Sir Tim Berners-Lee by Talis,
http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)

SOME PROBLEMS?

For many data elements, therefore, it will be
important to be able to record BOTH a
literal (transcribed and/or composed
form) AND a URI (controlled form)


Is this a problem in RDF?
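
One possible shape for recording both forms, sketched without any RDF library: two statements about the same manifestation, one whose object is a literal labelled with its origin, and one whose object is a URI. The predicate names, the example.org URIs, the publisher name, and the helper types are all hypothetical; this is not a claim about how the actual model handles the question.

```python
from typing import List, NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """Text recorded as found or as composed, with a label saying which."""
    text: str
    origin: str  # "transcribed" or "supplied"


class Controlled(NamedTuple):
    """A reference to a controlled entity by URI."""
    uri: str


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: Union[Literal, Controlled]


MANIFESTATION = "http://example.org/manifestation/1"

# Two statements for one data element: the literal form and the controlled form.
STATEMENTS = [
    Statement(MANIFESTATION, "publisherAsTranscribed",
              Literal("Example Press, Inc.", "transcribed")),
    Statement(MANIFESTATION, "publisher",
              Controlled("http://example.org/body/example-press")),
]


def forms_for(subject: str) -> Tuple[List[Literal], List[Controlled]]:
    """Separate the literal forms from the controlled forms for one subject."""
    objects = [s.obj for s in STATEMENTS if s.subject == subject]
    literals = [o for o in objects if isinstance(o, Literal)]
    controlled = [o for o in objects if isinstance(o, Controlled)]
    return literals, controlled


literals, controlled = forms_for(MANIFESTATION)
print(literals)
print(controlled)
```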