Bluffers Guide to The Semantic Web

Oct 21, 2013

1

Frank van Harmelen

CS Department

Vrije Universiteit Amsterdam

2

Semantics as your saviour?

3

4

Outline


The general idea: a Web of Data


What must be done to realise this


How far away is this


Next steps, do’s, don’ts


5

The Scientist’s Problem

Too much unintegrated data:


from a variety of incompatible sources


no standard naming convention


each with a custom browsing and querying mechanism (no common interface)


and poor interaction with other data sources

6

What are the Data Sources?


Flat Files


URLs


Proprietary Databases


Public Databases


Spreadsheets


Emails





7

In which disciplines?


Archeology


Chemistry


Genomics, proteomics, ... (bio/life sciences)


Communication science


Social history


Linguistics


Biodiversity


Environmental sciences (climate studies)


....


Libraries (KB), archives (Sound & Vision)

One dataset per site

a new database each month

historical data

laymen data

international data

(for their first time)

8

The Current Web of text and pictures

[Figure: linked web pages, labelled "a web page in English about Frank", "and another web page about Frank", "this page is about LarKC", "this page is about Stefano", "this page is about the Vrije Universiteit"]


linked web pages,


written by people,


written for people,


used only by people...

Many of these pages already come from data that is usable by computers!

But we can’t link the data...

The Future Web of Data

linked data,

usable by computers!

useful for people!

10

Which Semantic Web?


Version 1: “Enrichment of the current Web”

recipe: Annotate and classify web content

enable better search & browse, ...

11

Which Semantic Web?


Version 2: "Semantic Web as Web of Data" (TBL)

recipe: expose databases on the web, use RDF, integrate

meta-data from: expressing DB schema semantics in machine-interpretable ways

enable integration and unexpected re-use
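
The "expose databases, use RDF" recipe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular tool: the table is held as a list of Python dicts, the namespace http://example.org/ is made up, and the row is invented. Each non-key column becomes one triple, printed as an N-Triples line.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def literal(value):
    """Render a value as a plain N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def rows_to_ntriples(table, key, rows):
    """Turn each non-key column of each row into one triple."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{table}/{row[key]}>"
        for column, value in row.items():
            if column == key or value is None:
                continue  # the key is already in the subject URL; skip missing values
            predicate = f"<{BASE}{table}#{column}>"
            lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Invented example row, for illustration only.
rows = [{"id": "d1", "name": "drug A", "alleviates": "disease B"}]
ntriples = rows_to_ntriples("drug", "id", rows)
print("\n".join(ntriples))
```

Because the column names end up as predicate URLs, the schema itself becomes something other data sets can refer to, which is what makes later integration possible.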

12

13

machine accessible meaning


(What it’s like to be a machine)

[Figure: tags <name>, <symptoms>, <drug>, <drug administration>, <disease>, <treatment>, connected by IS-A and alleviates relations; labelled META-DATA]

14

What is meta-data?


it's just data


it's data describing other data


it's meant for machine consumption

disease

name

symptoms

drug

administration

15

Required are:

1. a standard syntax

so meta-data can be recognised as such

2. one or more shared vocabularies

so data producers and data consumers all speak the same language

3. lots of resources with meta-data attached

mechanisms for attribution and trust


1. A standard syntax


things & relations between things

Semantic Web data model: RDF
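
"Things & relations between things" maps directly onto triples of (subject, predicate, object). The sketch below is only an illustration of that data model in plain Python, not an RDF library: the `ex:` names are invented placeholders, and `match` is a hypothetical helper that treats `None` as a wildcard.

```python
# A graph is just a collection of (subject, predicate, object) triples.
graph = [
    ("ex:drugA", "rdf:type", "ex:Drug"),
    ("ex:diseaseB", "rdf:type", "ex:Disease"),
    ("ex:drugA", "ex:alleviates", "ex:diseaseB"),
]


def match(triples, s=None, p=None, o=None):
    """Yield the triples that agree with every position that is not None."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


# "What does drug A alleviate?"
alleviated = [o for _, _, o in match(graph, s="ex:drugA", p="ex:alleviates")]
print(alleviated)
```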

17

RDF Triples in Life Sciences

18

RDF Triples in Geo

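<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>

[Figure: a blank node of type geo:Point with a geo:lat arc to 55.701 and a geo:long arc to 12.552]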
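
To see how the XML and the graph relate, the snippet can be read back into triples. This is a rough sketch using only Python's standard XML parser, not a general RDF/XML parser: the namespace declarations are added here as an assumption (the W3C Basic Geo vocabulary), and `point_triples` is a hypothetical helper that handles only this one shape of document.

```python
import xml.etree.ElementTree as ET

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # assumed: W3C Basic Geo namespace

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>"""


def point_triples(xml_text):
    """Return one rdf:type triple plus one triple per geo: property of each Point."""
    root = ET.fromstring(xml_text)
    prefix = "{" + GEO + "}"  # ElementTree writes qualified names as {namespace}local
    triples = []
    for i, point in enumerate(root.findall(prefix + "Point")):
        node = f"_:p{i}"  # the point has no URL of its own, so use a blank node
        triples.append((node, "rdf:type", "geo:Point"))
        for prop in point:
            if prop.tag.startswith(prefix):
                local = prop.tag[len(prefix):]
                triples.append((node, "geo:" + local, prop.text.strip()))
    return triples


for triple in point_triples(RDF_XML):
    print(triple)
```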

19

RDF Schema: vocabulary for data types

Classes + subclass hierarchy

rivers are waterways

Properties + subproperty hierarchy

father-of implies parent-of

Domain of properties

X capital-of Y implies X has-type city

Range of properties

X capital-of Y implies Y has-type country
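
The four RDF Schema constructs above each license one kind of inference. The following is a minimal forward-chaining sketch of just those four rules, assuming triples are plain Python tuples; it is not a complete RDFS reasoner, and the `ex:` names and the function `rdfs_closure` are made up for the example.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply the subclass, subproperty, domain and range rules until nothing new appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))   # instance of a class is instance of its superclass
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))      # father-of implies parent-of
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))   # the subject gets the domain type
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))   # the object gets the range type
        if new <= facts:
            return facts
        facts |= new


schema = [
    ("ex:River", SUBCLASS, "ex:Waterway"),
    ("ex:fatherOf", SUBPROP, "ex:parentOf"),
    ("ex:capitalOf", DOMAIN, "ex:City"),
    ("ex:capitalOf", RANGE, "ex:Country"),
]
data = [
    ("ex:Rhine", TYPE, "ex:River"),
    ("ex:a", "ex:fatherOf", "ex:b"),  # placeholder individuals
    ("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands"),
]
inferred = rdfs_closure(schema + data) - set(schema + data)
for triple in sorted(inferred):
    print(triple)
```

Note that domain and range do not reject data; they add type information to whatever appears as subject or object.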

20

OWL: richer vocabulary for data types

Things RDF Schema cannot express:

Description Logic SHOIN(D)

equality, disjunction, negation,

min/max number restrictions

inverse, symmetric, transitive properties

and much more…

Example:

Every country has precisely one capital:

Inference

TheHague ≠ A’dam & A’dam = capital implies TheHague ≠ capital

Integrity checks after data-merging
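
The integrity check after merging can be illustrated without an OWL reasoner. The sketch below assumes triples as Python tuples, an invented property name `ex:hasCapital`, and an explicit set of pairs known to be different; it covers only the "at most one" half of "precisely one". Under OWL semantics, two values that are not known to be different would be inferred equal rather than reported, which is why the check only fires on declared-different pairs.

```python
def check_single_valued(triples, prop, different):
    """Report subjects that have two values for `prop` which are known to be different."""
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, set()).add(o)
    conflicts = []
    for s in sorted(values):
        objs = sorted(values[s])
        for i, a in enumerate(objs):
            for b in objs[i + 1:]:
                if frozenset((a, b)) in different:
                    conflicts.append((s, a, b))
    return conflicts


# Two hypothetical sources that disagree, merged into one graph.
source_1 = [("ex:Netherlands", "ex:hasCapital", "ex:Amsterdam")]
source_2 = [("ex:Netherlands", "ex:hasCapital", "ex:TheHague")]
known_different = {frozenset(("ex:Amsterdam", "ex:TheHague"))}

conflicts = check_single_valued(source_1 + source_2, "ex:hasCapital", known_different)
print(conflicts)
```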

Web of Data: anybody can say anything about anything

All identifiers are URLs (= on the Web)

Allows total decoupling of

data

vocabulary

meta-data

[Figure: a data item x, a vocabulary term T and the meta-data statement [<x> IsOfType <T>], each with different owners & locations; example tag: <prince>]

22

2. Shared vocabularies


MeSH

Medical Subject Headings, National Library of Medicine

22,000 descriptions

EMTREE

Commercial (Elsevier), drugs and diseases

45,000 terms, 190,000 synonyms

UMLS

Integrates 100 different vocabularies

SNOMED

200,000 concepts, College of American Pathologists

Gene Ontology

15,000 terms in molecular biology


NCI Cancer Ontology:

17,000 classes (about 1M definitions)

23

24

How far away is this?


Stable data formats & standardised inferences


Lots of shared vocabularies

(+ ways to convert them)


Lots of data sources

(+ ways to convert them)


Lots of tools


convert, construct, edit (data, vocabularies)


store, search, query, reason


interlink


visualise


...

already many billions of facts & rules

How far away is this?

Not very far away!

rapidly growing Linked Open Data cloud.

It gets bigger every month

26

Example use-case:

bbc.co.uk/music/artists


Content is BBC + LOD


Use an ontology as basis for the site


Serve data back out as RDF





“The Web is becoming our content management platform”

27

28

Next steps

1. hunt for shared vocabularies

try to avoid building them

2. wrap legacy data sources

your own

from others

3. link wrapped sources

4. publish linked data on the web

make noise

5. reconstruct some old results

6. produce new results

7. get famous
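
Step 3 in the list above, linking wrapped sources, is the least obvious one, so here is a rough sketch of one naive way to do it. It assumes each wrapped source is a dict from URL to label and proposes `owl:sameAs` links wherever the normalised labels are equal; the URLs, labels and the helper `link_sources` are all invented, and matching on labels alone is far too crude for real data.

```python
def normalise(label):
    """Lower-case the label and collapse whitespace."""
    return " ".join(label.lower().split())


def link_sources(source_a, source_b):
    """Propose owl:sameAs triples between entries whose normalised labels match."""
    index = {}
    for url, label in source_b.items():
        index.setdefault(normalise(label), []).append(url)
    links = []
    for url, label in sorted(source_a.items()):
        for other in index.get(normalise(label), []):
            links.append((url, "owl:sameAs", other))
    return links


# Two hypothetical wrapped sources with their own URLs for the same thing.
source_a = {"http://example.org/a/1": "Drug A", "http://example.org/a/2": "Drug C"}
source_b = {"http://example.com/b/9": "  drug a "}

links = link_sources(source_a, source_b)
print(links)
```

The resulting link triples can be published alongside the wrapped data (step 4), so that others can follow them from one source into the other.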

papers in oncology, in communication science

dedicated conferences in chemistry, earth sciences, life sciences, humanities

funding opportunities in humanities, social sciences, life sciences

learn / get access to some basic technology

in-use systems in communication science, KB, Beeld & Geluid, Europeana

29

Questions & discussion


Frank.van.Harmelen@cs.vu.nl

http://www.cs.vu.nl/~frankh/popularising.html