Introduction to Semantic Web

22 Oct 2013



What? Why? How? So far? Next?

Frank van Harmelen

AI Department

Vrije Universiteit Amsterdam

Creative Commons License:
allowed to share & remix, but must attribute & non-commercial

Who am I


Frank van Harmelen

Prof in AI at Vrije Universiteit Amsterdam

Knowledge Representation

Early Semantic Web Projects (> 1999)

Co-designed OWL

Tech advisor of Aduna (Sesame)

Scientific Director of LarKC (Large Knowledge Collider)

I know nothing about image analysis…

Who are you?


who knows roughly what the Semantic Web is?
who has heard of RDF & OWL?
who has studied RDF & OWL?
who has used RDF & OWL?
who expects ever to use RDF & OWL?

who is a logician
who is a KR researcher
who is a Web researcher
who is an image researcher


General idea of the Semantic Web

Make the current web more machine accessible
(currently all the intelligence is in the user)

Motivating use-cases


search


personalisation


semantic linking


data integration


web services


...

General idea of Semantic Web

Make the current web more machine accessible
(currently all the intelligence is in the user)

Do this by:

1. Making data and meta-data available on the Web in machine-understandable form (formalised)

2. Structuring the data and meta-data in ontologies

These are non-trivial design decisions.

Alternative would be:

What’s wrong with the Web?

[Figure: linked web pages: a web page in English about Frank, another web page about Frank, a page about LarKC, a page about Stefano, and a page about the Vrije Universiteit]

linked web-pages,
written by people,
written for people,
used only by people...

Many of these pages already come from data, usable by computers!

But we can’t link the data....

linked data,
usable by computers!
useful for people!

"Web of Data" (TBL)



1. expose data on the web (“facts”) in interoperable form (RDF)

2. expose knowledge on the web with interoperable semantics (ontologies, RDF Schema, OWL)

3. apply lightweight inference for

Interoperability
Query answering
Search
Unexpected reuse




Semantic Web

Not just data, also knowledge

All of this:

Low expressivity logic (RDF/RDFS)
That allows some inference:
Property inheritance, domain/range inference

Some of this:

Medium expressive logic (OWL)
That allows more inference:
(in)equality, number restrictions, datatypes

Desideratum:

On the Web of Data, anyone can say anything about anything

Need for total decoupling of

data
vocabulary
meta-data

[Figure: x, T (<village>) and the statement [<x> IsOfType <T>], with different owners & locations]

Two versions of Semantic Web story:

V1: Semantic Web = annotated Web;
1 & 2 are embedded in text & images on the Web

V2: Semantic Web = Web of Data;
1 & 2 live in dedicated repositories (triple stores)







Why is this hard?

machine accessible meaning
(What it’s like to be a machine)

[Figure: a page marked up with the tags <name>, <symptoms>, <drug>, <drug administration>, <disease> and <treatment>, with the relations IS-A and alleviates; labelled META-DATA]

What is meta-data?

it's just data
it's data describing other data
it's meant for machine consumption

[Figure labels: disease, name, symptoms, drug, administration]

What is required?

Required are:

1. one or more standard vocabularies,
so search engines, producers and consumers all speak the same language

2. a standard syntax,
so meta-data can be recognised as such

3. lots of resources with meta-data attached


Bluffer’s Guide to RDF & RDF Schema

Bluffer’s Guide to RDF

Express relations between things:

Results in labelled network (“graph”)

All labels are actually web-addresses (URIs)

You can “ping” any label and find out more

Bits of the graph can live at physically different locations & have different owners

[Figure: example graph with nodes Frank, x, y and MIT and edges AuthorOf and publishedBy; the parts of a triple are labelled Subject, Predicate, Object]
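The graph idea above can be sketched in a few lines of Python. This is only an illustration: the `http://example.org/` namespace, the name `book1` for the node drawn as x, and the split into two sources are assumptions made for the example, not taken from the slides.

```python
# Hypothetical namespace for illustration only; example.org is a placeholder.
EX = "http://example.org/"

# Source A (one owner) states who wrote the book.
source_a = {
    (EX + "Frank", EX + "AuthorOf", EX + "book1"),
}

# Source B (a different owner and location) states who published it.
source_b = {
    (EX + "book1", EX + "publishedBy", EX + "MIT"),
}

# Both sources use the same URI for the book, so merging is plain set union.
graph = source_a | source_b


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def publishers_of_author(graph, author):
    """Follow an AuthorOf edge and then a publishedBy edge from an author."""
    result = set()
    for book in objects(graph, author, EX + "AuthorOf"):
        result |= objects(graph, book, EX + "publishedBy")
    return result
```

Neither source alone can answer "who publishes Frank?"; the merged graph can, because the shared URI joins the two triples.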

Bluffer’s Guide to RDF Schema


types for subjects & objects & predicates


Types organised in a hierarchy


Inheritance of properties


[Figure: the same graph (Frank, x, y, MIT, AuthorOf, publishedBy) annotated with the types author, book, publisher, person, artifact and man]

So what’s special about RDF(S)?


statements about an identifier can be distributed


<owl:Individual ID="CENTRAL-COAST" />

<owl:Individual rdf:about="CENTRAL-COAST">
  <type rdf:resource="#CALIFORNIA-REGION"/>
</owl:Individual>




no unique name assumption


no closed world assumption
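A minimal sketch of what these two assumptions mean for a program that reads the data. The function names and the string labels are hypothetical, chosen only for this example; real RDF tools expose this differently.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    UNKNOWN = "unknown"


def ask_open_world(graph, triple):
    """Open world: a missing triple is 'unknown', never 'false'.

    Someone else on the Web may still state it.
    """
    return Answer.TRUE if triple in graph else Answer.UNKNOWN


def ask_closed_world(graph, triple):
    """Closed world (database style): a missing triple counts as false."""
    return triple in graph


def may_be_same(name_a, name_b, stated_different):
    """No unique name assumption: two names may denote the same thing
    unless they have been explicitly stated to be different."""
    return frozenset((name_a, name_b)) not in stated_different


# Illustrative data reusing the identifiers from the example above.
graph = {("CENTRAL-COAST", "type", "CALIFORNIA-REGION")}
stated_different = {frozenset(("CENTRAL-COAST", "NORTH-COAST"))}
```

The design point is that absence of information is not treated as negative information, which is what allows statements about one identifier to be spread over many sources.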

Remember web-style decoupling

Need for total decoupling of

data
vocabulary
meta-data

(different owners & locations)

RDF(S) has a (very small) formal semantics

Defines what other statements are implied by a given set of RDF(S) statements

Ensures mutual agreement on minimal content between parties without further contact

In the form of “entailment rules”

Very simple to compute
(and not explosive in practice)

RDF(S) semantics: examples


Aspirin isOfType Painkiller
Painkiller subClassOf Drug
⇒ Aspirin isOfType Drug

aspirin alleviates headache
alleviates range symptom
⇒ headache isOfType symptom






RDF(S) semantics


X R Y + R domain T
⇒ X IsOfType T

X R Y + R range T
⇒ Y IsOfType T

T1 SubClassOf T2 + T2 SubClassOf T3
⇒ T1 SubClassOf T3

X IsOfType T1 + T1 SubClassOf T2
⇒ X IsOfType T2

Semantics = predictable inference
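The four rules can be applied mechanically until nothing new follows. The sketch below is one simple way to do that (naive forward chaining to a fixpoint); it is not how any particular triple store implements RDFS, and the plain-string predicate names are just the ones used on the slide.

```python
def rdfs_closure(triples):
    """Apply the four rules repeatedly until no new triple can be derived."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # X R Y + R domain T  =>  X IsOfType T
                if p2 == "domain" and s2 == p:
                    new.add((s, "IsOfType", o2))
                # X R Y + R range T  =>  Y IsOfType T
                if p2 == "range" and s2 == p:
                    new.add((o, "IsOfType", o2))
                # T1 SubClassOf T2 + T2 SubClassOf T3  =>  T1 SubClassOf T3
                if p == "SubClassOf" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "SubClassOf", o2))
                # X IsOfType T1 + T1 SubClassOf T2  =>  X IsOfType T2
                if p == "IsOfType" and p2 == "SubClassOf" and o == s2:
                    new.add((s, "IsOfType", o2))
        if new <= closed:
            # Fixpoint reached: every derivable triple is already present.
            return closed
        closed |= new


facts = {
    ("aspirin", "IsOfType", "Painkiller"),
    ("Painkiller", "SubClassOf", "Drug"),
    ("aspirin", "alleviates", "headache"),
    ("alleviates", "range", "symptom"),
}
inferred = rdfs_closure(facts) - facts
```

Here `inferred` holds exactly the two conclusions from the earlier examples: aspirin is a Drug, and headache is a symptom. The loop always terminates because the rules only recombine terms that are already present.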

Bluffer’s Guide to OWL

OWL: things RDF Schema can’t do

equality
enumeration
number restrictions
  Single-valued/multi-valued
  Optional/required values
inverse, symmetric, transitive
boolean algebra
  Union, complement

Layered language

OWL Lite:
Classification hierarchy
Simple constraints

OWL DL:
Maximal expressiveness
While maintaining decidability
Standard formalisation

OWL Full:
Very high expressiveness
Losing decidability
Non-standard formalisation
All syntactic freedom of RDF (self-modifying)

Syntactic layering
Semantic layering

[Figure: nested language layers Lite, DL, Full]

Language Layers

OWL Full
Allow meta-classes etc

OWL DL
Negation
Disjunction
Full Cardinality
Enumerated types

OWL Lite
(sub)classes, individuals
(sub)properties, domain, range
conjunction
(in)equality
cardinality 0/1
datatypes
inverse, transitive, symmetric
hasValue
someValuesFrom
allValuesFrom

[Figure label: RDF Schema]

Backward compatibility with RDF

<owl:Class rdf:ID="City">
  <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#ruler"/>
      <owl:allValuesFrom rdf:resource="#Mayor"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>


OWL agents understand everything…

… others still understand the most important aspects

OWL also has a formal semantics

Defines what other statements are implied by a given set of statements

Ensures mutual agreement on content (both minimal and maximal) between parties without further contact

Can be used for integrity/consistency checking

Hard to compute
(and rarely/sometimes/always explosive in practice)

OWL semantics: minimal


vanGogh isOfType Impressionist
Impressionist subClassOf Painter
⇒ vanGogh isOfType Painter

vanGogh painter-of sunflowers
painter-of domain painter
⇒ vanGogh isOfType painter

OWL semantics: maximal

vanGogh isOfType Impressionist
Impressionist disjointFrom Cubist
⇒ NOT: vanGogh isOfType Cubist

painted-by has-cardinality 1
sun-flowers painted-by vanGogh
Picasso different-individual-from vanGogh
⇒ NOT: sun-flowers painted-by Picasso
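The "maximal" side can be read as a consistency check: some statements may not be added without contradiction. The sketch below hand-codes only the two constraints from the examples above, with hypothetical camel-case predicate names; a real OWL reasoner is far more general.

```python
def find_violations(triples):
    """Return readable descriptions of constraint violations in a set of triples."""
    types = {(s, o) for (s, p, o) in triples if p == "isOfType"}
    disjoint = {(s, o) for (s, p, o) in triples if p == "disjointFrom"}
    different = {frozenset((s, o)) for (s, p, o) in triples
                 if p == "differentIndividualFrom"}
    single_valued = {s for (s, p, o) in triples
                     if p == "hasCardinality" and o == 1}
    violations = []

    # Disjoint classes cannot share an instance.
    for (c1, c2) in sorted(disjoint):
        for (x, t) in sorted(types):
            if t == c1 and (x, c2) in types:
                violations.append(f"{x} belongs to disjoint classes {c1} and {c2}")

    # A cardinality-1 property cannot have two values known to be different.
    for prop in sorted(single_valued):
        values = {}
        for (s, p, o) in triples:
            if p == prop:
                values.setdefault(s, set()).add(o)
        for subject in sorted(values):
            objs = sorted(values[subject])
            for i, a in enumerate(objs):
                for b in objs[i + 1:]:
                    if frozenset((a, b)) in different:
                        violations.append(
                            f"{subject} has two distinct values for {prop}: {a} and {b}")
    return violations


facts = {
    ("vanGogh", "isOfType", "Impressionist"),
    ("Impressionist", "disjointFrom", "Cubist"),
    ("paintedBy", "hasCardinality", 1),
    ("sunflowers", "paintedBy", "vanGogh"),
    ("Picasso", "differentIndividualFrom", "vanGogh"),
}
```

Note that two values for `paintedBy` are only flagged when the individuals are stated to be different. Without that statement, and with no unique name assumption, the two names would simply be taken to denote the same painter.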

Remember:

Required are

1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached



Ontologies: real life examples



handcrafted

music: CDnow (2410/5), MusicMoz (1073/7)

biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k), Systems biology

ranging from lightweight (Yahoo, UNSPSC, Open directory (400k))
to heavyweight (Cyc (300k))

ranging from small (METAR)
to large (UNSPSC)

Biomedical ontologies (a few..)

MeSH
Medical Subject Headings, National Library of Medicine
22,000 descriptions

EMTREE
Commercial Elsevier, Drugs and diseases
45,000 terms, 190,000 synonyms

UMLS
Integrates 100 different vocabularies

SNOMED
200,000 concepts, College of American Pathologists

Gene Ontology
15,000 terms in molecular biology

NCI Cancer Ontology:
17,000 classes (about 1M definitions)

Remember:

Required are

1. standard vocabularies
2. a standard syntax
3. lots of resources with meta-data attached



Who makes the meta-data?

Don’t throw away what we already have:
Databases (Amazon.com)
Navigation structures
meta-data in documents
Office, Acrobat, MP3, jpg

As a spin-off of what we already do
MIT Media Lab photo annotator

Automated analysis
Text, Images, Video

Summary so far

Linked Data/Semantic Web


Identification
Uniform Resource Identifier (URI)
Global identifier (NB: persistent!)
Looks like a URL, is often an Internationalized Resource Identifier (IRI)

Description
Resource Description Framework (RDF)
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
Web Ontology Language (OWL)

Querying
RDF Triple stores
SPARQL Query Language
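To give a feel for what querying a triple store involves, here is a rough Python sketch of matching triple patterns with variables, which is the basic idea underneath SPARQL. It is not SPARQL syntax and not a real engine; the `?name` variable convention is borrowed from SPARQL, and the data and function names are made up for the example.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Illustrative data; "book1" is a hypothetical identifier.
graph = {
    ("Frank", "AuthorOf", "book1"),
    ("book1", "publishedBy", "MIT"),
}

# "Which author ?a wrote a book ?b that is published by ?p?"
results = query(graph, [("?a", "AuthorOf", "?b"), ("?b", "publishedBy", "?p")])
```

The shared variable `?b` is what joins the two patterns, in the same way a shared URI joins two triples in the graph.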



What does RDF look like?

The data model is a (directed) graph

Every data item is a ‘resource’ with a URI as identifier

Every property is a binary relation: a ‘triple’

Between resources:
<subjectURI, predicateURI, objectURI>

Between a resource and a ‘literal’:
<subjectURI, predicateURI, “literal value”>
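The two triple shapes can be written down as a small typed data model. This is a sketch under the simplification used on the slide (only URIs and literals, so blank nodes and datatyped literals are left out); the class names and the `example.org` URIs are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self):
        # Subjects and predicates are always resources identified by a URI.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        # Only the object position may hold a literal value.
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a Literal")


# Hypothetical URIs for illustration only.
amsterdam = URI("http://example.org/Amsterdam")
between_resources = Triple(amsterdam, URI("http://example.org/type"),
                           URI("http://example.org/City"))
with_literal = Triple(amsterdam, URI("http://example.org/name"),
                      Literal("Amsterdam"))
```

Making the classes frozen keeps triples hashable, so a graph can be stored as an ordinary Python set of `Triple` values.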

Why is this a Web of data?

Global unique identifiers

Reuse of identifiers in other datasets

For data:
(two sources say something about ‘Amsterdam’)

For schema:
(two sources each use the same concept ‘City’)


This reuse builds “links” between datasets


Does this work in practice?

already many billions of facts & rules

Linked Open Data cloud

May ‘09 estimate: > 4.2 billion triples + 140 million interlinks

It gets bigger every month

And remember: not just data



All of this:


Low expressivity logic (RDF/RDFS)


That allows some inference:

Property inheritance, domain/range inference



Some of this:


Medium expressive logic (OWL)


That allows more inference:

(in)equality, number restrictions, datatypes

Nice in the lab, but are you getting anywhere in practice?

Semantic Web News Quiz

Google
Reuters
New York Times
Microsoft
Zemanta
Obama Government
BBC (music, worldcup, wildlife)
BestBuy.com
Facebook

Challenges

What to do when success is becoming a problem?

Heterogeneity
ontology mapping, instance identification

Scale (10^10 statements)

Dynamics, versioning
(Flickr: 3000 pictures/minute, Wikipedia: 100 edits/minute)

Trust, attribution, provenance

Multimedia
In both directions