E-Culture: Challenging Use Cases for the Semantic Web

Nov 5, 2013 (3 years and 11 months ago)

Principles and pragmatics of a Semantic Culture Web

Tearing down walls and building bridges

Overview


Virtual collections and Semantic Web

Semantic collection-search demonstrator

For cultural heritage objects

Metadata & vocabulary representation and enrichment

Principles for knowledge engineering on the Web


Acknowledgements

Part of the large Dutch knowledge-economy project MultimediaN

Partners: VU, CWI, UvA, DEN, ICN

People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga

Artchive.com, Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)

Hypothesis


Semantic Web technology is particularly useful in knowledge-rich domains

or, formulated differently:

If we cannot show added value in knowledge-rich domains, then it may have no value at all

The Web: resources and links

[Diagram: URL, Web link, URL]

The Semantic Web: typed resources and links

[Diagram: Painting “Woman with hat” (SFMOMA), Dublin Core creator, ULAN Henri Matisse]
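The difference between plain and typed links can be sketched with a few triples. This is only an illustration in plain Python: the URIs are made-up placeholders, and the assumption (read off the diagram) is that the painting is linked to Henri Matisse through the Dublin Core property creator.

```python
# Each statement is a (subject, property, object) triple.
# All URIs below are invented placeholders, not real identifiers.
PAINTING = "http://example.org/sfmoma/woman-with-hat"
MATISSE = "http://example.org/ulan/henri-matisse"

triples = {
    (PAINTING, "rdf:type", "ex:Painting"),
    (PAINTING, "dc:title", "Woman with hat"),
    (PAINTING, "dc:creator", MATISSE),
    (MATISSE, "rdfs:label", "Henri Matisse"),
}

def objects(subject, prop):
    """Return all objects linked from `subject` by the typed link `prop`."""
    return {o for s, p, o in triples if s == subject and p == prop}

# Because the link is typed, we can ask specifically for the creator.
creators = objects(PAINTING, "dc:creator")
creator_names = {name for c in creators for name in objects(c, "rdfs:label")}
print(creator_names)
```

On the plain Web the same two pages would be connected by an untyped link, so a program could not tell a creator from any other related page.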



Principle 1: semantic annotation

Description of web objects with “concepts” from a shared vocabulary

Principle 2: semantic search

Search for objects which are linked via concepts (semantic link)

Use the type of semantic link to provide meaningful presentation of the search results

[Diagram: query “Paris”; Montmartre PartOf Paris]
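A minimal sketch of this idea, assuming a tiny hand-made data set: works annotated with Montmartre are found for the query “Paris” because of the PartOf link, and the link type is kept so the result can be explained to the user. The work identifiers and the function names are hypothetical.

```python
# Hypothetical data: places linked by partOf, works annotated with one place each.
part_of = {"Montmartre": "Paris", "Paris": "France"}
annotations = {
    "work:1": "Montmartre",
    "work:2": "Paris",
    "work:3": "Leiden",
}

def is_within(place, target):
    """True if `place` equals `target` or reaches it by following partOf links."""
    seen = set()
    while place is not None and place not in seen:
        if place == target:
            return True
        seen.add(place)
        place = part_of.get(place)
    return False

def semantic_search(query):
    """Return {work: explanation} for works annotated with the query place or a part of it."""
    results = {}
    for work, place in annotations.items():
        if is_within(place, query):
            if place == query:
                results[work] = "direct match"
            else:
                results[work] = f"{place} partOf {query}"
    return results

hits = semantic_search("Paris")
print(hits)
```

The explanation string is what allows results to be grouped and presented by the kind of semantic link that produced them.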

The myth of a unified vocabulary


In large virtual collections there are always multiple vocabularies


In multiple languages


Every vocabulary has its own perspective


You can’t just merge them


But you can use vocabularies jointly by defining a limited set of links


“Vocabulary alignment”


It is surprising what you can do with just a few links

Principle 3: vocabulary alignment

[Diagram: query “Tokugawa”; SVCN period: Edo; AAT style/period: Edo (Japanese period), Tokugawa]

SVCN is a local in-house ethnology thesaurus

AAT is Getty’s Art & Architecture Thesaurus

A link between two thesauri
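How a single alignment link helps can be sketched as follows. The labels come from the slide; the identifiers (svcn:edo, aat:edo) and the function name are invented for illustration, and the real demonstrator may store alignments differently.

```python
# Each vocabulary keeps its own terms and labels.
svcn_labels = {"svcn:edo": ["Edo"]}
aat_labels = {"aat:edo": ["Edo (Japanese period)", "Tokugawa"]}

# One alignment link between the two thesauri (identifiers are hypothetical).
alignment = {("svcn:edo", "aat:edo")}

def concepts_for(keyword):
    """Find concepts whose label equals the keyword, then add aligned concepts."""
    keyword = keyword.lower()
    found = set()
    for vocab in (svcn_labels, aat_labels):
        for concept, labels in vocab.items():
            if any(keyword == label.lower() for label in labels):
                found.add(concept)
    # Follow alignment links in both directions.
    for a, b in alignment:
        if a in found or b in found:
            found.update((a, b))
    return found

matched = concepts_for("Tokugawa")
print(matched)
```

A query for “Tokugawa” matches only the AAT concept by label, but the alignment link also brings in objects annotated with the SVCN term “Edo”.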

Levels of interoperability


Syntactic interoperability


using data formats that you can share


XML family is the preferred option


Semantic interoperability


How to share meaning / concepts


Technology for finding and representing semantic links

Distributed vs. centralized collection data

Minimal requirement: collection object has image URI

Preference for external metadata, accessed through a protocol such as OAI

In practice, external metadata access is still cumbersome

http://e-culture.multimedian.nl/demo/search

Search strategies


Basic search: keyword-oriented

Advanced search:

Tweaking default search parameters

Time-related queries


Faceted search


Relation search


How are two URIs related?
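Relation search can be pictured as path finding in the graph. The sketch below uses breadth-first search over a small hypothetical graph; the identifiers and the ex:relatedTo property are illustrative assumptions, and nothing here is taken from the demonstrator's actual implementation.

```python
from collections import deque

# Hypothetical graph; edges are (subject, property, object).
edges = [
    ("ulan:derain", "ex:relatedTo", "aat:fauve"),
    ("ulan:matisse", "ex:relatedTo", "aat:fauve"),
    ("work:woman-with-hat", "dc:creator", "ulan:matisse"),
]

def find_relation(start, goal):
    """Breadth-first search for a shortest chain of links, ignoring link direction."""
    neighbours = {}
    for s, p, o in edges:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((f"inverse of {p}", s))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, prop, nxt)]))
    return None  # no connection found

path = find_relation("work:woman-with-hat", "ulan:derain")
for step in path:
    print(step)
```

Each step of the returned path names the property that was followed, which is what makes the relation explainable to the user.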

Keyword search with semantic clustering

1. B-tree of literals plus Porter stem and metaphone index

2. Find resources with matching labels

Default resources are “Work”s

3. Find related resources by one-way graph traversal

owl:inverseOf is used

Threshold used for constraining search

4. Cluster results (group instances)
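The four steps above can be sketched roughly as follows. This is a simplification under stated assumptions: a naive whole-word label match stands in for the B-tree, Porter stemming and metaphone index; the step weight and threshold values are invented; owl:inverseOf handling is left out; and all identifiers are hypothetical.

```python
from collections import defaultdict

labels = {
    "tgn:paris": "Paris",
    "tgn:montmartre": "Montmartre",
    "work:1": "Street in Montmartre",
    "work:2": "View of Paris",
}
works = {"work:1", "work:2", "work:3"}  # the default target resources
links = [
    ("work:3", "dc:subject", "tgn:paris"),
    ("work:1", "dc:subject", "tgn:montmartre"),
    ("tgn:montmartre", "ex:partOf", "tgn:paris"),
]
STEP_WEIGHT = 0.5  # assumed: every traversal step halves the score
THRESHOLD = 0.2    # assumed: stop expanding below this score

def keyword_search(keyword):
    # Steps 1-2: find resources whose label contains the keyword as a word.
    kw = keyword.lower()
    starts = [r for r, label in labels.items() if kw in label.lower().split()]
    clusters = defaultdict(set)
    for start in starts:
        frontier = [(start, 1.0, ())]
        while frontier:
            node, score, path = frontier.pop()
            if node in works:
                # Step 4: cluster works by the chain of properties that led to them.
                clusters[path or ("label match",)].add(node)
                continue
            # Step 3: one-way traversal, from object back to subject only.
            for s, p, o in links:
                if o == node and score * STEP_WEIGHT >= THRESHOLD:
                    frontier.append((s, score * STEP_WEIGHT, path + (p,)))
    return dict(clusters)

clusters = keyword_search("Paris")
for path, found in clusters.items():
    print(path, sorted(found))
```

Because the score shrinks at every step, the threshold also bounds the traversal depth, which keeps the search from wandering through the whole graph.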

Search: WordNet patterns that increase recall without sacrificing precision

Term disambiguation is a key issue in semantic search

Post-query

Sort search results based on different meanings of the search term

Mimics Google-type search

Pre-query

Ask user to disambiguate by displaying list of possible meanings

Interface is more complex, but more search functionality can be offered

Faceted search


Use Dublin Core scheme to formulate complex queries


Navigate through relevant metadata


What do you need to do to make your collection part of a Semantic Culture Web?







Four activities

From metadata to semantic metadata

1. Make vocabulary interoperable

2. Align metadata schema

3. Enrich metadata

4. Align vocabulary

Activity 1: syntactic vocabulary interoperability

Making vocabularies available in the Web standard RDF

Many organizations already do this

W3C provides the SKOS template to make this almost straightforward

Effort required: at most a few days

Multi-lingual labels for concepts

Semantic relation: broader and narrower

No subclass semantics assumed!
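A rough picture of what a SKOS-style vocabulary carries, modelled as plain Python dictionaries rather than RDF. The concept identifiers and the Dutch labels are illustrative assumptions, and the key names merely echo the SKOS properties prefLabel and broader.

```python
# Hypothetical concepts with one preferred label per language.
concepts = {
    "ex:paintings": {
        "prefLabel": {"en": "paintings", "nl": "schilderijen"},
        "broader": ["ex:visual-works"],
    },
    "ex:visual-works": {
        "prefLabel": {"en": "visual works", "nl": "visuele werken"},
        "broader": [],
    },
}

def label(concept, lang, fallback="en"):
    """Preferred label in `lang`, falling back to another language if missing."""
    labels = concepts[concept]["prefLabel"]
    return labels.get(lang, labels[fallback])

def narrower(concept):
    """Concepts that name `concept` as broader (the inverse direction)."""
    return sorted(c for c, data in concepts.items() if concept in data["broader"])

print(label("ex:paintings", "nl"))
print(narrower("ex:visual-works"))
```

Note that broader is only a thesaurus hierarchy link here: nothing in the sketch treats a narrower concept as a subclass of its broader concept, in line with the remark above.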

Activity 2: aligning the metadata schema

Specify your collection metadata scheme as a specialization of Dublin Core

With RDF/OWL this is easy/trivial!

Cf. DC Application Profiles

Aligning VRA with Dublin Core

VRA is a specialization of Dublin Core for visual resources

VRA properties “material.medium” and “material.support” are specializations of Dublin Core property “format”

vra:material.medium rdfs:subPropertyOf dc:format .

vra:material.support rdfs:subPropertyOf dc:format .
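What the rdfs:subPropertyOf statements buy you can be sketched in a few lines: a query phrased in Dublin Core terms also returns values recorded with the more specific VRA properties. The record and its values are hypothetical, and the lookup below is a hand-written stand-in for what an RDFS-aware store would infer.

```python
# Specific property -> more general property.
sub_property_of = {
    "vra:material.medium": "dc:format",
    "vra:material.support": "dc:format",
}

# Hypothetical collection record.
triples = [
    ("work:1", "vra:material.medium", "oil paint"),
    ("work:1", "vra:material.support", "canvas"),
    ("work:1", "dc:creator", "ulan:matisse"),
]

def generalizes(prop, target):
    """True if `prop` is `target` or a (transitive) subproperty of it."""
    while prop is not None:
        if prop == target:
            return True
        prop = sub_property_of.get(prop)
    return False

def values(subject, prop):
    """All values of `prop` for `subject`, including values of its subproperties."""
    return sorted(o for s, p, o in triples if s == subject and generalizes(p, prop))

formats = values("work:1", "dc:format")
print(formats)
```

The collection keeps its precise VRA metadata, while generic Dublin Core search still works across collections.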

Activity 3: enriching the metadata


Extracting additional concepts from an annotation

Matching the string “Paris” to a vocabulary term

Information-extraction techniques exist (and continue to be developed)

Effort required can be up to a few weeks

The more concepts, the better, but no need to be perfect!
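The simplest form of enrichment is matching strings in a free-text annotation against vocabulary labels. The sketch below assumes a tiny hypothetical label table and exact whole-word matching; real information-extraction techniques also handle spelling variants and ambiguity, which are ignored here.

```python
import re

# Hypothetical vocabulary: lower-cased label -> concept identifier.
vocabulary = {
    "paris": "tgn:paris",
    "montmartre": "tgn:montmartre",
    "henri matisse": "ulan:matisse",
}

def extract_concepts(text):
    """Return concepts whose label occurs as whole words in the text."""
    # Normalize to lower-case words separated by single spaces.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    found = set()
    for label, concept in vocabulary.items():
        if re.search(rf"\b{re.escape(label)}\b", normalized):
            found.add(concept)
    return found

found = extract_concepts("Painted by Henri Matisse in Paris, 1905.")
print(found)
```

Even this crude matcher turns a plain string into links to shared concepts, which fits the point that enrichment does not need to be perfect to be useful.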

[Figure: example textual annotation]

[Figure: resulting semantic annotation (rendered as HTML with RDFa)]

RDFa: embedding RDF in (X)HTML

[Figure: regular HTML; HTML with RDFa; resulting RDF statements]

Activity 4: aligning the vocabulary


Find semantic links between vocabulary terms

Derain (ULAN) related-to Fauve (AAT)

Automatic techniques exist, but performance varies

Often a combination of automatic and manual alignment

Effort strongly dependent on vocabularies

But “a little semantics goes a long way” (Hendler)

Learning alignments


Learning relations between art styles in AAT and artists in ULAN through NLP of art historic texts

“Who are Impressionist painters?”

Extracting additional knowledge from scope notes

Principles for knowledge engineering on the Web

Principle 1: Be modest!


Ontology engineers should refrain from developing their own idiosyncratic ontologies

Instead, they should make the existing rich vocabularies, thesauri and databases available in web format

Initially, only add the originally intended semantics

Principle 2: Think large!

"Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing."

Doug Lenat

Principle 3: Develop and use patterns!

Don’t try to be (too) creative

Ontology engineering should not be an art but a discipline

Patterns play a key role in methodology for ontology engineering

See for example patterns developed by the W3C Semantic Web Best Practices group

http://www.w3.org/2001/sw/BestPractices/


SKOS can also be considered a pattern


Principle 4: Don’t recreate, but enrich and align


Techniques:


Learning ontology relations/mappings


Semantic analysis, e.g. OntoClean


Processing of scope notes in thesauri



Principle 5: Beware of ontological over-commitment!

Principle 6: Specifying a data model in OWL does not make it an ontology!

Papers about your own idiosyncratic “university ontology” should be rejected at SW conferences

The quality of an ontology does not depend on the number of OWL constructs used



Principle 7: Required level of formal semantics depends on the domain!

In our semantic search we use three OWL constructs:

owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty

But cultural heritage is very different from medicine and bioinformatics

Don’t over-generalize on requirements for e.g. OWL
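To give a feel for how little reasoning these three constructs require, here is a minimal sketch that expands a set of links under symmetry and transitivity. The facts are hypothetical, owl:sameAs is treated only as a symmetric and transitive link (its full OWL meaning is richer), and this is not the reasoner used in the demonstrator.

```python
def closure(pairs, symmetric=False, transitive=False):
    """Expand a set of (a, b) links under symmetry and/or transitivity."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        new = set()
        if symmetric:
            new |= {(b, a) for a, b in result}
        if transitive:
            new |= {(a, d) for a, b in result for c, d in result if b == c}
        if not new <= result:
            result |= new
            changed = True
    return result

# A transitive property: partOf.
part_of = closure({("Montmartre", "Paris"), ("Paris", "France")}, transitive=True)

# A symmetric property: relatedTo.
related = closure({("ulan:derain", "aat:fauve")}, symmetric=True)

# sameAs behaves as both symmetric and transitive.
same_as = closure({("svcn:edo", "aat:edo")}, symmetric=True, transitive=True)

print(sorted(part_of))
print(sorted(related))
```

A fixed-point loop like this is enough for small graphs; a large collection would need indexing, which is one reason scalability remains a research theme.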

Perspectives


Basic Semantic Web technology is ready for deployment

Research themes:

Scalability, vocabulary alignment, metadata extraction


Web 2.0 facilities fit well:


Involving community experts in annotation


Personalization


Social barriers have to be overcome!