Integrated Access to Cultural Heritage Resources through Representation and Alignment of Controlled Vocabularies





Antoine Isaac

Antoine Isaac (http://www.few.vu.nl/~aisaac/) works as a postdoc at the Vrije Universiteit in Amsterdam and the Koninklijke Bibliotheek in The Hague, in the context of the STITCH and TELplus projects. His research interests include different aspects of the use of Semantic Web languages and technologies within the Cultural Heritage field, focusing on the representation and the interoperability of collections and their vocabularies.

Address: Antoine Isaac, Vrije Universiteit Amsterdam, Department of Computer Science, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.

Email: aisaac@few.vu.nl


Abstract:

Purpose of this paper

To show how Semantic Web techniques can help address semantic interoperability issues in the broad cultural heritage domain, allowing users integrated and seamless access to heterogeneous collections.

Design/methodology/approach

This paper presents the heterogeneity problems to be solved. It introduces Semantic Web techniques that can help solve them, focusing on the representation of controlled vocabularies and their semantic alignment. It gives pointers to some previous projects and experiments that have tried to address the problems discussed.

Findings

Semantic Web research provides practical technical and methodological approaches to tackle the different issues. Two contributions of interest are the Simple Knowledge Organization System (SKOS) model and automatic vocabulary alignment methods and tools. These contributions were demonstrated to be usable for enabling semantic search and navigation across collections.

Originality/value

This article provides a general and practical introduction to relevant Semantic Web techniques. It is of specific value for practitioners in the Cultural Heritage and Digital Libraries domain who are interested in applying these methods in practice.

Research limitations/implications

Our research aims at choreographing different representation and alignment methods for solving interoperability problems in the context of controlled subject vocabularies. Given the variety and technical richness of current research in the Semantic Web field, it is impossible to provide an in-depth account or an exhaustive list of references. Every aspect is, however, given one or several pointers for further reading.

Keywords

Semantic Web, integrated access, thesaurus alignment, SKOS, semantic interoperability



1. Introduction: the Semantic Interoperability Problem

In the digital age, cultural heritage (CH) institutions have the opportunity, and face the challenge, to use the World Wide Web to make accessible the digital artefacts of their collections, together with their metadata. Web-based access to digitized images and their descriptions, at any time, from anywhere, lowers the barriers for access to information resources. Once there is digital access to the content of museums, libraries and archives, there is also the tremendous opportunity to merge collections from different locations into virtual, federated institutions, thus increasing access across collections and institutional boundaries.


In stark contrast to the vast amount of existing digital resources on the World Wide Web, cultural heritage assets from libraries, museums, and archives are very well described. Over many generations, librarians, curators and archivists have developed knowledge organization systems (KOSs), among which controlled vocabularies such as thesauri, classification schemes and ontologies, to organize and manage their collections. Organizing cultural heritage, and giving access to it in a way that matches the human capacity to deal with information and knowledge, is a valuable human achievement in itself. It helps us to grasp our past and present, and this understanding must be exploited to facilitate access at a grander scale.


The move toward cross-institutional CH portals is well under way as, for instance, The European Library¹ and the Memory of the Netherlands² testify. In this paper, we describe how CH expertise can be combined with knowledge and technology from the Semantic Web (SW) community to deliver portals that provide seamless and unified access to different collections via semantic search and navigation.


Fig. 1 illustrates the problem that needs to be solved in a networked environment. Consider two collections, each of which is indexed by its dedicated knowledge organization system. Instead of using one single conceptual vocabulary for querying or browsing the objects of both collections simultaneously, users are expected and required to use the terminology of the first KOS to identify objects of the first collection, and the second KOS to identify those of the second collection.



Figure 1. Semantic heterogeneity hampers collection access

¹ See http://www.theeuropeanlibrary.org

² See http://www.geheugenvannederland.nl


We say that these two knowledge organization systems are not interoperable at the semantic level. In the given example, when searching for objects showing a “Madonna” one will only retrieve objects that were indexed using this specific subject description (the statue in the upper right), and will not find the manuscript illumination (in the lower right) that was indexed as “Virgin Mary”, which is clearly a conceptually similar subject description, but stems from another controlled vocabulary.


Ignoring the semantic heterogeneity of the respective KOSs when merging collections clearly hampers accessibility. The burden of search is indeed transferred to users, who then need to perform two well-formulated queries (each using the correct terminology) to obtain the desired objects from the two collections.


Two heterogeneity problems must be solved to enhance the interoperability of controlled vocabularies and, hence, of the systems and collections that use them:

- Representation heterogeneity: vocabularies often come in different formats: some will be encoded in XML while others will come as plain text. Beyond this, the models guiding their design might not be directly compatible, for instance because they mirror different general information needs (e.g., thesauri contain “terms” while classification schemes contain “classes”), and different KOSs might have different kinds of notes and labels attached to conceptual entities.

- Conceptual heterogeneity: any two vocabularies will usually contain concepts that have identical or similar meanings but different labels or names (e.g., “Virgin Mary” and “Madonna”). Also, there will be concepts that are more general than others (e.g., “Mother” and “Virgin Mary”). Such similarity and subsumption links have to be determined and exploited so that an integrated system can provide users with seamless access to joint content described by several vocabularies.


In this paper we show how these two problems can be addressed using techniques that are currently being investigated in the Semantic Web research domain. In section 2 we describe the basic elements of the Semantic Web infrastructure, and illustrate how the Simple Knowledge Organization System (SKOS) standard model can be used to represent different vocabularies (KOSs) homogeneously. In section 3, we show how the different vocabularies, once commonly represented in the SKOS format, can be semantically aligned to enable a semantic integration of different collections. Finally, in section 4, we demonstrate how we solved a real-life problem with a combination of SW techniques, and briefly describe the resulting prototype.



2. Semantic Web techniques and controlled vocabulary representation


The Semantic Web (Berners-Lee, Hendler & Lassila, 2001) is a proposed extension of the existing Web, where information found on the web is augmented with machine-accessible knowledge³. The basic building blocks of the Semantic Web, as introduced by the Resource Description Framework (RDF)⁴, are resources, which denote any element that can be identified on (or even outside) the Web. These resources are described by three-part statements that link them together. Each statement has a subject resource which is linked to an object resource via a property resource. Together, several such triples form a graph, such as the one represented in Fig. 2⁵. These graphs can contain:

- factual knowledge: the third paragraph of the described document is about “Amsterdam”; the type of the described document is “Article”; and the selected paragraph “par3” is part of a (larger) file called “file1”.

- ontological knowledge: the Semantic Web is concerned with the way resources can be grouped in conceptual classes. These classes are introduced in ontologies that contain formally expressed knowledge about them. Here, “Article” is a class more specific than (or a subclass of) “Document”.

³ The following is a simplified introduction to the Semantic Web. For further detail, the reader is encouraged to read the Semantic Web Primer (Antoniou & Harmelen, 2004).





Figure 2. A Semantic Web RDF graph.⁶


The information contained in ontologies is important, since it provides material for automated reasoning on the resources which populate the classes. For example, from the information found in Figure 2 for “file1”, “Article” and “Document”, an automated reasoning engine can infer that “file1” is also an instance of the “Document” class, which will yield more answers for queries containing “Document”.
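The inference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples rather than with an RDF library, and the local names under myVoc1: and myVoc2: (about, partOf, and the placement of Article and Document in myVoc1:) are hypothetical stand-ins for what Figure 2 shows.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Each statement is a (subject, property, object) triple.
# The myVoc1:/myVoc2: local names are hypothetical stand-ins for Figure 2.
graph = {
    ("myVoc1:par3", "myVoc1:about", "myVoc2:Amsterdam"),
    ("myVoc1:par3", "myVoc1:partOf", "myVoc1:file1"),
    ("myVoc1:file1", RDF_TYPE, "myVoc1:Article"),
    ("myVoc1:Article", SUBCLASS_OF, "myVoc1:Document"),
}


def infer_types(triples):
    """Add every rdf:type statement entailed by rdfs:subClassOf links."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until nothing new appears (handles class chains)
        changed = False
        subclass_links = [(s, o) for (s, p, o) in closed if p == SUBCLASS_OF]
        typings = [(s, o) for (s, p, o) in closed if p == RDF_TYPE]
        for resource, cls in typings:
            for sub, sup in subclass_links:
                new = (resource, RDF_TYPE, sup)
                if sub == cls and new not in closed:
                    closed.add(new)
                    changed = True
    return closed


def instances_of(triples, cls):
    """Return the resources typed with the given class in this set of triples."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == cls)


before = instances_of(graph, "myVoc1:Document")
after = instances_of(infer_types(graph), "myVoc1:Document")
print(before, after)
```

Before inference a query for documents finds nothing; after it, file1 is returned because it is an Article and Article is a subclass of Document.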


It should be noted that the RDF framework is designed to allow different sources of knowledge to co-exist with each other, inhabiting the same space. This means that Semantic Web data can merge and operate with resources coming from different information spaces. In our example, the objects and links in figure 2 come from different namespaces, either user-defined (myVoc1:, myVoc2:) or predefined (rdf:). The resource “Amsterdam” in myVoc2: may indeed refer to the capital of The Netherlands (as the RDF graphs in which it occurs would show), while some other resource with the same name, but from a different vocabulary space, may refer to a city in the state of New York, USA. In any case, both resources stem from different namespaces and can both inhabit different contexts further defining and constraining their intended meaning.

⁴ See http://www.w3.org/RDF.

⁵ Figure 2 is an abstract representation of an RDF graph. Such a graph will usually be serialized in the form of an XML file, according to the RDF/XML syntax specified by the World Wide Web Consortium (W3C).

⁶ Nodes in the graph are RDF resources; labelled edges represent assertions of a property between the linked elements. The “rdf:” namespace stands for “http://www.w3.org/1999/02/22-rdf-syntax-ns#”, “rdfs:” for “http://www.w3.org/2000/01/rdf-schema#”, “myVoc1:” for “http://example.org/voc1#”, and “myVoc2:” for “http://example.org/voc2#”.


RDF triples are the basic building blocks for translating KOSs into a homogeneous format. Also, in order to mirror a KOS’ modelling elements (e.g., the “broader than” or “narrower than” relation types of thesauri), additional constructs are necessary. RDF-Schema (Brickley & Guha, 2004), in short RDF-S, is a simple representation language that allows users to define their models, introducing different types for RDF resources and links. One can also express, for instance, that the source and target of a relation are of a specific type, e.g., that the relation type “has painted” requires a subject of type “painter” (or “artist”), and an object of type “painting” (or “drawing”). The current standard web ontology language is called OWL (McGuinness & Harmelen, 2004). This formal language is more expressive than RDF-S, allowing users to define a variety of different properties of classes and relations between them. A more detailed discussion of OWL is beyond the scope of this paper.
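To make the “has painted” example concrete, the following sketch shows how a declared source and target type for a relation lets a reasoner type the resources it connects. It is a simplified illustration, not an RDF-S implementation: the ex: names and the sample statement are invented for the example.

```python
RDF_TYPE = "rdf:type"

# Schema: property -> (required subject type, required object type).
# All ex: names are illustrative assumptions.
schema = {"ex:hasPainted": ("ex:Painter", "ex:Painting")}


def apply_domain_range(triples, schema):
    """Type subjects and objects according to the declared domain and range."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        if prop in schema:
            domain, range_ = schema[prop]
            inferred.add((subject, RDF_TYPE, domain))
            inferred.add((obj, RDF_TYPE, range_))
    return inferred


data = {("ex:Rembrandt", "ex:hasPainted", "ex:NightWatch")}
result = apply_domain_range(data, schema)
```

From the single asserted statement, the sketch derives that the subject is a painter and the object a painting, which is the kind of typing the schema declaration licenses.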


To support experts in converting their KOSs into RDF-based formats, but also to facilitate the future exchange of such formats, the World Wide Web Consortium (W3C) has initiated the development of SKOS⁷, a standard model that allows CH practitioners (and other terminologists) to homogeneously represent the basic features of knowledge organization systems. SKOS introduces a set of constructs for RDF, which mainly allow for the description of concepts and concept schemes (Miles & Brickley, 2005).



Concept description


SKOS has chosen a concept-based approach for the representation of controlled vocabularies. As opposed to a term-based approach, where terms from natural language are the first-order elements of a KOS, SKOS describes abstract concepts that may have different materializations in language (lexicalizations). SKOS introduces a special construct, skos:Concept⁸, to properly characterize the (web) resources that denote such KOS elements. To further specify these conceptual resources, SKOS features:

- Labelling properties, e.g. skos:prefLabel and skos:altLabel, to link a concept to the terms that represent it in language. The prefLabel value shall be a non-ambiguous term that uniquely identifies the concept, and can be used as a descriptor in an indexing system. The altLabel property is used to introduce alternative entries (synonyms, abbreviations, etc.). SKOS allows concepts to be linked to prefLabels and altLabels in different languages. SKOS concepts can thus be used seamlessly in multilingual environments.

- Semantic properties are used to represent the structural relationships between concepts, which are usually at the core of controlled vocabularies like thesauri. The construct skos:broader denotes the generalization link (BT in standard thesauri), while skos:narrower denotes its reciprocal link (NT), and skos:related the associative relationship (RT).

- Documentation properties. Often, informal documentation plays an important role in a KOS. SKOS introduces explanatory notes (skos:scopeNote, skos:definition, skos:example) and management notes (skos:changeNote, skos:historyNote, etc.).

⁷ SKOS stands for Simple Knowledge Organisation System. It is currently under scrutiny by the W3C Semantic Web Deployment Working Group and is planned to be published as a W3C Proposed Recommendation in 2008. See http://www.w3.org/2004/02/skos.

⁸ In the following, “skos:” stands for http://www.w3.org/2004/02/skos/core#.


Concept scheme description


A KOS as a whole also has to be represented and described. SKOS coins a skos:ConceptScheme construct for this. It also introduces specific properties to represent the links between different KOSs and the concepts they contain. The property skos:inScheme asserts that a given concept is part of a given concept scheme, while skos:hasTopConcept states that a KOS contains a concept as the root of (one of) its constituent hierarchical tree(s), i.e., a concept without a broader concept.
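The concept and concept scheme constructs described in this section can be summarized in a small data model. The sketch below is an assumption-laden illustration in Python: the Concept class, the scheme_triples function and the ex: vocabulary are invented here, and only the skos: and rdf: property names come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: dict                                # language tag -> preferred label
    alt_labels: dict = field(default_factory=dict)  # language tag -> list of variants
    broader: list = field(default_factory=list)     # URIs of more general concepts
    scope_note: str = ""


def scheme_triples(scheme_uri, concepts):
    """Serialize the concepts of one scheme as (subject, property, object) triples."""
    triples = [(scheme_uri, "rdf:type", "skos:ConceptScheme")]
    for c in concepts:
        triples.append((c.uri, "rdf:type", "skos:Concept"))
        triples.append((c.uri, "skos:inScheme", scheme_uri))
        for lang, label in sorted(c.pref_label.items()):
            triples.append((c.uri, "skos:prefLabel", (label, lang)))
        for lang, labels in sorted(c.alt_labels.items()):
            for label in labels:
                triples.append((c.uri, "skos:altLabel", (label, lang)))
        for parent in c.broader:
            triples.append((c.uri, "skos:broader", parent))
            triples.append((parent, "skos:narrower", c.uri))  # reciprocal link
        if not c.broader:
            # a concept without a broader concept is a root of the scheme
            triples.append((scheme_uri, "skos:hasTopConcept", c.uri))
        if c.scope_note:
            triples.append((c.uri, "skos:scopeNote", (c.scope_note, "en")))
    return triples


# Hypothetical two-concept vocabulary.
animals = Concept("ex:animals", {"en": "animals", "fr": "animaux"})
birds = Concept("ex:birds", {"en": "birds"}, alt_labels={"en": ["fowl"]},
                broader=["ex:animals"])
triples = scheme_triples("ex:scheme", [animals, birds])
```

Literals are kept as (text, language) pairs so that multilingual labels stay distinguishable; a real serialization would write them in RDF/XML or another RDF syntax.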


Conversion from a KOS native representation to SKOS RDF data requires the analysis of the original model of the KOS, and the linking of the elements of this model to the SKOS ones that fit them best (Assem et al., 2006). One can, for instance, decide to represent a “class” in a classification scheme as a resource of type skos:Concept. Based on such a specification, it is then possible to implement an appropriate conversion program (e.g., an XSL stylesheet when the vocabulary is natively encoded in XML) to automatically convert the initial representation to a SKOS one.

As an example, a subject 11F coming from the Iconclass concept scheme⁹, “the Virgin Mary”, identified by the (as yet fictive) resource http://www.iconclass.nl/s_11F, could be partly represented by the graph in figure 3.





Figure 3. A SKOS graph partly representing the Iconclass subject 11F.

Quoted strings are plain literals. “@” specifies the language of a literal: “en” is the tag
for “English”, “fr” for “French” and “zxx” stands for any “artificial language”.
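A conversion program of the kind mentioned above can be sketched as follows, here in Python rather than XSL. Everything about the native record is hypothetical: the XML layout, the French label and the choice to keep the notation as an alternative label tagged “zxx” are assumptions made for illustration and do not reproduce the actual Iconclass data or the content of Figure 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical native record; real Iconclass data is structured differently.
NATIVE_XML = """
<class notation="11F" parent="11">
  <text lang="en">the Virgin Mary</text>
  <text lang="fr">la Vierge Marie</text>
</class>
"""

BASE_URI = "http://www.iconclass.nl/s_"  # the "as yet fictive" URI pattern


def class_to_skos(xml_text, base_uri=BASE_URI):
    """Map one native <class> element to SKOS triples."""
    node = ET.fromstring(xml_text)
    notation = node.get("notation")
    uri = base_uri + notation
    triples = [
        (uri, "rdf:type", "skos:Concept"),
        # Assumption: the notation is kept as a label tagged "zxx".
        (uri, "skos:altLabel", (notation, "zxx")),
    ]
    for text in node.findall("text"):
        label = (text.text.strip(), text.get("lang"))
        triples.append((uri, "skos:prefLabel", label))
    parent = node.get("parent")
    if parent:
        triples.append((uri, "skos:broader", base_uri + parent))
    return triples


skos_triples = class_to_skos(NATIVE_XML)
```

The mapping decisions (class to skos:Concept, parent to skos:broader) are exactly the kind of specification that the analysis of the original KOS model is meant to produce before any code is written.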


3. Vocabulary alignment as a solution to the interoperability problem

Having unified and linkable representations of the concepts contained in different collections’ vocabularies helps in managing them in a single framework. However, this is not sufficient for solving the semantic interoperability problem. One still has to determine semantic similarity links between the elements of the different vocabularies to align¹⁰ them (Doerr, 2001). Fig. 4 illustrates that if a search engine “knew” that a SKOS concept C from a thesaurus T1 is semantically equivalent to a SKOS concept D from thesaurus T2, then it could return all the objects that were indexed against D for a query for objects described using C.

⁹ See http://www.iconclass.nl.

The objective is therefore to align as many concepts of one thesaurus as possible to their semantic equivalents in the other thesaurus. Where such equivalence cannot be established, it may be possible to establish links between concepts of one thesaurus and concepts of the second thesaurus that are either more specific or more general, and to exploit such “narrower than” and “broader than” relations for query processing.
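How a search engine might exploit such links can be sketched as a simple query expansion. The index, the concept identifiers and the alignment triples below are hypothetical; the sketch follows alignment links one step only and does not chain them.

```python
# Hypothetical index: concept -> identifiers of the objects described with it.
index = {
    "kos1:Madonna": {"statue-1"},
    "kos2:VirginMary": {"illumination-7"},
    "kos2:Mother": {"painting-3"},
}

# Alignment as (source concept, relation, target concept).
alignment = [
    ("kos1:Madonna", "equivalent", "kos2:VirginMary"),
    ("kos1:Madonna", "narrower", "kos2:Mother"),  # Madonna is narrower than Mother
]


def expand(concept, alignment):
    """Return the query concept plus aligned concepts that are equivalent or more specific."""
    concepts = {concept}
    for source, relation, target in alignment:
        if relation == "equivalent":
            if source == concept:
                concepts.add(target)
            elif target == concept:
                concepts.add(source)
        elif relation == "narrower" and target == concept:
            concepts.add(source)
    return concepts


def search(concept, index, alignment):
    """Collect the objects indexed with the concept or with any expanded concept."""
    results = set()
    for c in expand(concept, alignment):
        results |= index.get(c, set())
    return results
```

With the alignment in place, a query for kos1:Madonna also returns the illumination indexed as kos2:VirginMary; with an empty alignment it returns the statue only, which is the situation depicted in Fig. 1.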





Figure 4. Using vocabulary alignment for integrated access to different collections


Such an approach has been investigated for subject vocabularies in projects such as HILT (Macgregor et al., 2007). The alignment of these vocabularies is, however, a labour-intensive task that requires considerable expertise in the thesauri concerned. Manual alignment has been approached by several projects, notably CARMEN (Krause, 2003), Renardus (Day et al., 2005), KoMoHe¹¹, AOS (Liang & Sini, 2006) or the ongoing CRISSCROSS¹², MACS¹³ (Landry, 2004) and MSAC (Balikova, 2005). These projects have yielded very interesting results, such as the development of tools to support manual alignment, the deployment of search engines that exploit the resulting alignments, and the contribution of initial methodological ideas. However, they also demonstrated the complexity, difficulty and cost of manually aligning large vocabularies (usually containing many thousands of concepts) in realistically sized collections and settings. Given that manual labour is expensive and that vocabularies evolve over time, it is clear that the construction and maintenance of alignments constitutes an important issue that needs to be addressed. There is a need for advanced, computer-based tools that can identify candidate mappings between two vocabularies and then propose them to the human expert for consideration. Alignment would thus become a semi-automatic task in which thesaurus experts’ work would be assisted, and the integration of collections would become more cost-efficient.

¹⁰ Alignment refers in this paper to the creation of semantic relationships (e.g. equivalence) between concepts coming from different KOSs in order to solve interoperability problems. This notion approximates what is referred to in the KOS community as vocabulary mapping, crosswalk or reconciliation, and in the Semantic Web community as ontology alignment, mapping or matching.

¹¹ See http://www.gesis.org/en/research/information_technology/komohe.htm

¹² See http://www.d-nb.de/wir/projekte/crisscross.htm

¹³ See http://macs.cenl.org.



Recently, the Semantic Web community has produced alignment tools that address the specific problem of formal ontology matching (Shvaiko & Euzenat, 2005). However, the techniques they employ and the goals they advertise make them deployable in a more general context, including thesauri and other similar KOSs.


Although most of the existing ontology alignment tools rely on sophisticated methods (Euzenat & Shvaiko, 2007), they can be classified and described by the basic techniques they build upon and the different sources of information they exploit: the lexical information attached to the concepts of the vocabularies, the structure of the vocabularies, the collection objects described by the vocabularies, or other (external) knowledge sources.



Lexical alignment techniques


In these techniques the lexical materializations of the concepts are compared to each other. If a significant similarity is found, then we can establish a semantic link between the concerned concepts. A straightforward example is when two concepts have the same label. But one can also search for string inclusion patterns or use more complex techniques relying, for instance, on lemmatizers (getting normalized forms of labels, e.g. “tree” for “trees”) and syntactical analysis tools. A concept labelled “(map of) the North Pole” can be detected as a narrower concept of another which is labelled “Charts, maps”. These lexical methods exploit the preferred labels of concepts, but they can also turn to their lexical variants or their associated definitions and scope notes.

Of course, such approaches encounter the same problems as humans when dealing with words taken out of context. Polysemy and homonymy, for instance, are common sources of errors. This has to be compensated for with contextual information.
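A very rough lexical matcher along these lines is sketched below. It is a toy under explicit assumptions: the normalization is a crude plural-stripping rule rather than a real lemmatizer, the stopword list is ad hoc, and the inclusion heuristic (a term of one label contained in a longer label suggests a narrower concept) is only one possible reading of the string inclusion idea.

```python
import re

STOPWORDS = {"the", "of", "a", "an"}


def normalize(label):
    """Lowercase, drop punctuation and stopwords, crudely strip plural 's'."""
    words = re.findall(r"[a-z]+", label.lower())
    stems = {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}
    return stems - STOPWORDS


def lexical_match(label_a, label_b):
    """Return 'equivalent', 'narrower' (a is narrower than b) or None."""
    a = normalize(label_a)
    if a == normalize(label_b):
        return "equivalent"
    # Each comma-separated term of label_b is tested for inclusion in label_a.
    for term in label_b.split(","):
        t = normalize(term)
        if t and t < a:  # proper subset: label_a says more, so it is more specific
            return "narrower"
    return None
```

On the examples from the text, “Trees” and “tree” come out as equivalent, and “(map of) the North Pole” comes out as narrower than “Charts, maps”. The same rule would happily match the two senses of “bank”, which is the polysemy problem noted above.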




Structural alignment techniques


The first kind of context is provided by the vocabulary itself, as it contains hierarchical and associative links between concepts. These links, especially those concerning hierarchical generalization and specialization, are useful to constrain a concept’s natural interpretation: “bank” will be understood differently if it is a narrower term of “finance” or “geography”. Some tools will analyze this semantic context, either to check similarities obtained by other techniques or to derive new similarities from existing ones. If two concepts from different vocabularies are semantically equivalent, this equivalence will positively influence the alignment tool when it examines the children of these concepts to find similarities between them.
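The influence of already aligned parents on their children can be illustrated with a small scoring sketch. The function name, the fixed bonus of 0.3 and the concept identifiers are all assumptions made for the example; actual tools use considerably more elaborate propagation schemes.

```python
def structural_boost(scores, parent_a, parent_b, aligned_parents, bonus=0.3):
    """Raise the score of a candidate pair whose parents are already aligned."""
    boosted = {}
    for (a, b), score in scores.items():
        if (parent_a.get(a), parent_b.get(b)) in aligned_parents:
            score = min(1.0, score + bonus)
        boosted[(a, b)] = score
    return boosted


# Hypothetical thesauri: T1 has one "bank", T2 has two with different parents.
parent_a = {"t1:bank": "t1:finance"}
parent_b = {"t2:bank_1": "t2:finance", "t2:bank_2": "t2:geography"}

# Lexical comparison alone cannot tell the two candidates apart.
scores = {("t1:bank", "t2:bank_1"): 0.5, ("t1:bank", "t2:bank_2"): 0.5}
aligned_parents = {("t1:finance", "t2:finance")}

boosted = structural_boost(scores, parent_a, parent_b, aligned_parents)
best = max(boosted, key=boosted.get)
```

Because the two “finance” concepts are already aligned, the financial sense of “bank” is preferred over the geographical one.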



Extensional alignment techniques


The second kind of context comes from the actual usage of the concepts in real-life applications. For instance, a class from a classification scheme will be used to categorize a number of objects (e.g., books) in a collection. Accessing this information will provide an “extensional” characterization of the class’ intended meaning, akin to its literary warrant. When documents are described using two different vocabularies¹⁴, statistical techniques can be employed to compare the sets of documents described by the concepts from these vocabularies (Figure 5). A high degree of overlap between these sets will yield a high similarity between corresponding concepts. Several such techniques have already been experimented with in the KOS field, as in (Zhang, 2006) or (Isaac et al., 2007).
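One simple way to quantify the overlap between two sets of documents is the Jaccard coefficient. It is used below only as an example of such a statistical measure, not as the measure used in the cited studies, and the indexes, identifiers and threshold are hypothetical.

```python
def jaccard(docs_a, docs_b):
    """Size of the intersection divided by size of the union."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def extensional_candidates(index_a, index_b, threshold=0.5):
    """Propose concept pairs whose document sets overlap strongly enough."""
    candidates = []
    for concept_a, docs_a in index_a.items():
        for concept_b, docs_b in index_b.items():
            score = jaccard(docs_a, docs_b)
            if score >= threshold:
                candidates.append((concept_a, concept_b, score))
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical collection whose documents d1..d6 carry concepts from both vocabularies.
index_a = {"a:Madonna": {"d1", "d2", "d3"}, "a:Landscape": {"d4"}}
index_b = {"b:VirginMary": {"d2", "d3", "d5"}, "b:Animals": {"d6"}}
candidates = extensional_candidates(index_a, index_b)
```

Here two of the four documents involved carry both a:Madonna and b:VirginMary, giving a score of 0.5; such a pair would then be proposed to a human expert as a candidate mapping.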




Figure 5. Using object-level information to align vocabularies [adapted from (Harmelen, 2005)]




Background knowledge-based alignment techniques


A final group of alignment methods relies on knowledge sources that are external to the application and the vocabularies being considered. These sources can be of a different nature, for instance existing general-purpose ontologies like CYC¹⁵ or semantic networks like Wordnet (Miller, 1995). These sources can contribute KOS-external knowledge to compensate for the lack of KOS-internal lexical or structural information. For example, a concept “calendar” from one thesaurus can be aligned to the more general concept “publication” from another thesaurus, using the hypernymy relation that holds between the two corresponding terms in Wordnet (Fig. 6).
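The “calendar” example can be mimicked with a toy background resource. The hypernym table below is a hand-written stand-in and does not reproduce actual Wordnet data or any Wordnet API; only the calendar/publication pair comes from the text.

```python
# Toy hypernym table standing in for a lexical resource such as Wordnet.
HYPERNYM = {
    "calendar": "publication",
    "almanac": "publication",
    "publication": "work",
}


def ancestors(term):
    """Follow hypernym links upwards and return the more general terms met."""
    seen = []
    while term in HYPERNYM and HYPERNYM[term] not in seen:
        term = HYPERNYM[term]
        seen.append(term)
    return seen


def relation(label_a, label_b):
    """Relate two labels using the background table only."""
    a, b = label_a.lower(), label_b.lower()
    if a == b:
        return "equivalent"
    if b in ancestors(a):
        return "narrower"   # a is more specific than b
    if a in ancestors(b):
        return "broader"    # a is more general than b
    return None
```

Neither label comparison nor vocabulary structure would link “calendar” to “publication”; the external table supplies the missing subsumption link.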






¹⁴ This also applies to the more general case when the similarity between objects from two collections described by their own vocabularies can be assessed, applying for example text similarity measures on textual documents.

¹⁵ See http://www.opencyc.org.



Figure 6. Using background knowledge to align vocabularies [adapted from (Harmelen, 2005)]

4. Integrated Collections Access: an Example


To illustrate the potential of the described technology, we used it to create integrated access to two collections in two different Dutch cultural heritage institutions, the Rijksmuseum and the National Library of the Netherlands (Gendt et al., 2006). The Manuscripts collection contains 10,000 medieval illuminations which are annotated by subject indices describing the content of the image. These indices come from the Iconclass classification scheme, a vocabulary of 25,000 elements designed for iconographical analysis. The Masterpieces collection contains 700 objects such as paintings and sculptures, and its subjects are indexed using the ARIA “catalogue”, a vocabulary conceived mainly as a resource for hierarchical browsing.


Both vocabularies were translated into SKOS, and mappings between them were calculated with existing state-of-the-art mapping tools, namely Falcon (Jian et al., 2005) and S-Match (Giunchiglia et al., 2005). Falcon uses a mixture of lexical and structural techniques. In addition to lexical techniques, S-Match uses Wordnet as background knowledge, and exploits semantic reasoning using a logical interpretation of the concepts based on the structure of the vocabularies.


We implemented a faceted browser, in which the mappings and the vocabularies’ Semantic Web representations are exploited to provide integrated access to the collections, offering three different views: single, combined, and merged view.


The Single View presents the integrated collections from the perspective of just one of the vocabularies. In the screen capture (Fig. 7) the first four pictures come from the Rijksmuseum, the others are illuminated manuscripts. Browsing is done solely using the ARIA catalogue, i.e. these illuminations have been selected by exploiting the mapping between the currently selected ARIA concept “Animal Pieces” and the Iconclass concept “25F:animals”.


Figure 7. Single View: Using the ARIA thesaurus to browse the two collections [from (Gendt et al., 2006)]


The Combined View provides simultaneous access to the collections through their respective vocabularies in parallel. This allows us to browse through the integrated collections as if they were a single collection indexed against two vocabularies. In figure 8, we made a subject refinement to ARIA “Animal pieces”, and narrowed down our search with Iconclass to the subject “Classical Mythology and Ancient History”.


Figure 8. Combined View: Using ARIA and Iconclass to browse the two collections [from (Gendt et al., 2006)]


Finally, the Merged View gives access to the collections through a merged thesaurus combining both original vocabularies into a single facet, based on the links found between them in the automatic mapping process. If we select the ARIA concept “Animal pieces”, the view provides both ARIA concepts such as “Birds” and Iconclass concepts such as “29A:animals acting as human beings” for further refining our search.



5. Discussion and conclusion


Existing alignment tools have been reported to perform relatively poorly on real-life cases such as cultural heritage thesaurus alignment (Gendt et al., 2006). In fact, alignment is still an open research problem, as no single technique is universally applicable or will always return satisfactory results. In practice, different techniques have to be carefully selected and combined, depending on the characteristics of the case at hand, such as the richness of the semantic structures of the vocabularies, their lexical coverage and the existence of collections simultaneously described by several vocabularies. It should be noted, however, that continuous work on techniques and tools can lead to significant improvements, as witnessed in the regular evaluation campaigns organized by the research community (Euzenat et al., 2006).


The Semantic Web-inspired methods and tools described in this paper still require further experimentation in practical applications, and a greater availability of vocabularies. Nevertheless, current representation and alignment techniques already allow the creation of demonstrators showing their potential value for integrating collections at the semantic level, leading from separate islands of cultural heritage knowledge to better connected networks of collections and vocabularies.


One such demonstrator is described in Section 4; this faceted browser gives unified access to the two collections (illuminated manuscripts and museum masterpieces) via any of their respective metadata descriptions. Other examples of Web portals that illustrate the use of Semantic Web techniques in the cultural heritage domain can be seen on the websites of the MuseumFinland¹⁶ and eCulture¹⁷ projects. These projects, even if not focusing on semantic alignment, demonstrate the possible benefits of using Semantic Web technologies: the use of the SKOS representation format, the development of innovative interfaces to access Cultural Heritage collections, and the exploitation of automated reasoning techniques over RDF-based metadata.


Other portals are being created with enhanced functionality and usability, as the synergy between the CH and SW communities increases: one example is the ongoing eCulture project, which was given the Semantic Web Challenge¹⁸ award in 2006. In fact, the richness and high quality of cultural heritage data is very attractive to researchers of the Semantic Web community, as they have many tools but little real-life metadata to show their true potential. On the other hand, the CH domain (including digital libraries) could profit from techniques and tools developed by the SW community in creating a Web of cultural heritage that delivers high quality content via easily accessible semantic search and navigation. Semantic search (matching meanings) represents a huge advance in relation to current web searching techniques that are based on full text search (matching strings).



¹⁶ See http://www.museosuomi.fi.

¹⁷ See http://e-culture.multimedian.nl.

¹⁸ See http://challenge.semanticweb.org.

References

Antoniou, G., Harmelen, F. van (2004) Semantic Web Primer. Cambridge, MA: MIT Press.

Assem, M. van, Malaise, V., Miles, A., Schreiber, G. (2006) A Method to Convert Thesauri to SKOS. 3rd European Semantic Web Conference, Budva, Montenegro, 2006.

Balikova, M. (2005) Multilingual Subject Access to Catalogues of National Libraries (MSAC) Czech Republic’s collaboration with Slovakia, Slovenia, Croatia, Macedonia, Lithuania and Latvia. Paper presented at the 71th IFLA General Conference and Council "Libraries - A voyage of discovery", August 14th - 18th 2005, Oslo, Norway. Available at: http://www.ifla.org/IV/ifla71/papers/044e-Balikova.pdf.

Berners-Lee, T., Hendler, J., Lassila, O. (2001) The Semantic Web. Scientific American, 284 (5), 34-43. Available at: http://www.sciam.com/2001/0501issue/0501berners-lee.html.

Brickley, D., Guha, R. V., Editors (2004) RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation, 10 February 2004. Latest version available at http://www.w3.org/TR/rdf-schema/

Day, M., Koch, T., Neuroth, H. (2005) Searching and browsing multiple subject gateways in the Renardus service. In Proceedings of the Sixth International Conference on Social Science Methodology, Amsterdam, 2005.

Doerr, M. (2001) Semantic Problems of Thesaurus Mapping. Journal of Digital Information, 1 (8). Article No. 52. Available at: http://jodi.tamu.edu/Articles/v01/i08/Doerr/.


Euzenat, J. et al. (2006) Results of the Ontology Alignment Evaluation Initiative 2006. International Workshop on Ontology Matching, 5th International Semantic Web Conference (ISWC 2006), Athens, Georgia, USA. Available at: http://www.dit.unitn.it/~p2p/OM-2006/7-oaei2006.pdf.

Euzenat, J., Shvaiko, P. (2007) Ontology Matching. Berlin, Heidelberg: Springer.

Gendt, M. van et al. (2006) Semantic Web Techniques for Multiple Views on Heterogeneous Collections: a Case Study. In: Julio Gonzalo et al. (Eds.). Research and advanced technology for digital libraries: proceedings of the 10th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2006), Alicante, Spain, September 17-22 2006. Berlin, Heidelberg: Springer. (Lecture Notes in Computer Science, 4172), 426-437.

Giunchiglia, F., Shvaiko, P., Yatskevich, M. (2005) Semantic Schema Matching. 13th International Conference on Cooperative Information Systems (CoopIS 2005).

Harmelen, F. van (2005) Ontology Mapping: A Way Out of the Medical Tower of Babel? In: S. Miksch, J. Hunter and E. Keravnou (Eds.). Artificial Intelligence in Medicine: proceedings of the 10th Conference on Artificial Intelligence in Medicine, AIME 2005, Aberdeen, UK. Berlin, Heidelberg: Springer. (Lecture Notes in Computer Science, 3581), 3-6.

Isaac, A., Meij, L. van der, Schlobach, S., Wang, S. (2007) An empirical study of instance-based ontology matching. Proceedings of the 6th International Semantic Web Conference (ISWC 2007). Busan, Korea.


Jian, N., Hu, W., Cheng, G., and Qu, Y. (2005) Falcon-AO: Aligning Ontologies with Falcon. K-CAP Workshop on Integrating Ontologies, Banff, Canada, 2005.

Krause, J. (2003) Standardization, heterogeneity and the quality of content indexing: a key conflict of digital libraries and its solution. World library and information congress: 69th IFLA general conference and council.

Landry, P. (2004) Multilingual Subject Access: The Linking Approach of MACS. Cataloging & Classification Quarterly, 37 (3-4), 177-191.

Liang, A., Sini, M. (2006) Mapping AGROVOC and the Chinese Agricultural Thesaurus: Definitions, tools, procedures. New Review in Hypermedia and Multimedia, 12 (1), 51-62.

McGuinness, D. L., Harmelen F. van, Editors (2004) OWL Web Ontology Language Overview. W3C Recommendation, 10 February 2004. Latest version available at http://www.w3.org/TR/owl-features/

Miles, A., Brickley, D. (2005) SKOS Core Guide. W3C Working Draft. Work in progress, latest version available at http://www.w3.org/TR/swbp-skos-core-guide/

Miller, G. (1995) Wordnet: a lexical database for English. Communications of the ACM, 38 (11), 39-41.

Macgregor, G., McCulloch, E., Nicholson, D. (2007) A DDC spine based terminology server for improved resource discovery, 2nd International Conference on Metadata and Semantics Research. Corfu, Greece, 2007.

Shvaiko, P., Euzenat, J. (2005) Ontology Matching. D-Lib Magazine, 11 (12), In Brief. Available at: http://www.dlib.org/dlib/december05/12inbrief.html

Zhang, X. (2006) Concept integration of document databases using different indexing languages. Information Processing and Management, 42, 121-135.