Semantic Web: the Story So Far

Oct 22, 2013


Ian Horrocks <ian.horrocks@comlab.ox.ac.uk>

Oxford University Computing Laboratory

Semantic Web

According to W3C: "an evolving extension of the World Wide Web in which web content can be … read and used by software agents, thus permitting them to find, share and integrate information more easily"

Data will use a uniform syntactic structure (RDF)

(OWL) ontologies will provide:

Schemas for data

Vocabulary for annotations

Ultimate goal is a "more intelligent web"


Web Ontology Language OWL

Semantic Web led to requirement for a "web ontology language"

W3C set up Web-Ontology (WebOnt) Working Group

WebOnt developed OWL language

OWL based on earlier languages RDF, OIL and DAML+OIL

OWL now a W3C recommendation (i.e., a standard)

OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full

OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics

Has facilitated development of wide range of high quality tools & infrastructure

OWL now language of choice in many applications

What Are Description Logics?

A family of logic-based Knowledge Representation formalisms

Descendants of semantic networks and KL-ONE

Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals

Operators allow for composition of complex concepts

Names can be given to complex concepts, e.g.:

$\mathit{HappyParent} \equiv \mathit{Parent} \sqcap \forall \mathit{hasChild}.(\mathit{Intelligent} \sqcup \mathit{Athletic})$
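To make the model-theoretic reading of such a definition concrete, here is a minimal sketch that computes which elements of one small finite interpretation belong to HappyParent. The tuple encoding of concepts, the Interpretation class and the individuals ann, bob, etc. are hypothetical and chosen only for illustration. DL reasoners do not enumerate interpretations like this. They reason about all models of an ontology.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical concept encoding: a class name is a str; complex concepts are
# tuples ("and", C, D), ("or", C, D), ("not", C), ("all", R, C), ("some", R, C).
Concept = Union[str, tuple]


class Interpretation:
    """A finite interpretation: a domain plus extensions of classes and roles."""

    def __init__(self, domain: Set[str], classes: Dict[str, Set[str]],
                 roles: Dict[str, Set[Tuple[str, str]]]):
        self.domain = domain
        self.classes = classes
        self.roles = roles

    def successors(self, role: str, x: str) -> Set[str]:
        return {y for (a, y) in self.roles.get(role, set()) if a == x}

    def ext(self, c: Concept) -> Set[str]:
        """Return the extension of concept c in this interpretation."""
        if isinstance(c, str):
            return set(self.classes.get(c, set()))
        op = c[0]
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op == "or":
            return self.ext(c[1]) | self.ext(c[2])
        if op == "not":
            return self.domain - self.ext(c[1])
        if op == "all":   # every R-successor is in C (vacuously true if none)
            inner = self.ext(c[2])
            return {x for x in self.domain if self.successors(c[1], x) <= inner}
        if op == "some":  # at least one R-successor is in C
            inner = self.ext(c[2])
            return {x for x in self.domain if self.successors(c[1], x) & inner}
        raise ValueError(f"unknown constructor: {op}")


HAPPY_PARENT = ("and", "Parent", ("all", "hasChild", ("or", "Intelligent", "Athletic")))

I = Interpretation(
    domain={"ann", "bob", "cat", "dan", "eve"},
    classes={"Parent": {"ann", "bob"}, "Intelligent": {"cat"}, "Athletic": {"dan"}},
    roles={"hasChild": {("ann", "cat"), ("ann", "dan"), ("bob", "eve")}},
)

print(sorted(I.ext(HAPPY_PARENT)))  # ['ann']
```

The universal restriction is satisfied vacuously. cat, dan and eve have no children, so they fall under the "all" part. Only the conjunction with Parent excludes them.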

Why (Description) Logic?

OWL exploits results of 15+ years of DL research

Well-defined (model-theoretic) semantics

Most DLs are subsets of $C^2$ (two-variable FOL with counting quantifiers), i.e., decidable fragments of FOL

Formal properties well understood (complexity, decidability)

Cartoon: "I can't find an efficient algorithm, but neither can all these famous people." [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]

Known reasoning algorithms

Implemented systems (highly optimised), e.g., Pellet, KAON2, CEL

Class/Concept Constructors



Concept can be thought of as a FOL formula with one free variable
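The correspondence with first-order logic can be sketched as a recursive translation. Each concept becomes a formula whose only free variable is x, and each ∀/∃ restriction introduces a bound variable. The tuple encoding and the name to_fol are hypothetical, introduced only for this illustration.

```python
import itertools
from typing import Union

# Hypothetical encoding: class names are str; ("and", C, D), ("or", C, D),
# ("not", C), ("all", R, C) and ("some", R, C) build complex concepts.
Concept = Union[str, tuple]


def to_fol(c: Concept, var: str = "x", fresh=None) -> str:
    """Translate concept c into a FOL formula whose only free variable is `var`."""
    if fresh is None:
        fresh = (f"y{i}" for i in itertools.count(1))  # supply of bound variables
    if isinstance(c, str):
        return f"{c}({var})"
    op = c[0]
    if op == "and":
        return f"({to_fol(c[1], var, fresh)} ∧ {to_fol(c[2], var, fresh)})"
    if op == "or":
        return f"({to_fol(c[1], var, fresh)} ∨ {to_fol(c[2], var, fresh)})"
    if op == "not":
        return f"¬{to_fol(c[1], var, fresh)}"
    if op in ("all", "some"):
        y = next(fresh)                 # variable standing for the R-successor
        body = to_fol(c[2], y, fresh)
        if op == "all":
            return f"∀{y}.({c[1]}({var},{y}) → {body})"
        return f"∃{y}.({c[1]}({var},{y}) ∧ {body})"
    raise ValueError(f"unknown constructor: {op}")


HAPPY_PARENT = ("and", "Parent", ("all", "hasChild", ("or", "Intelligent", "Athletic")))
print(to_fol(HAPPY_PARENT))
# (Parent(x) ∧ ∀y1.(hasChild(x,y1) → (Intelligent(y1) ∨ Athletic(y1))))
```

This sketch always picks a fresh variable. Alternating between just two variables would also work, and that variable reuse is what places such concepts in a two-variable fragment.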

Knowledge Base / Ontology Axioms

OWL RDF/XML Exchange Syntax

E.g., $\mathit{Parent} \sqcap \forall \mathit{hasChild}.(\mathit{Intelligent} \sqcup \mathit{Athletic})$:

    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Parent"/>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasChild"/>
          <owl:allValuesFrom>
            <owl:Class>
              <owl:unionOf rdf:parseType="Collection">
                <owl:Class rdf:about="#Intelligent"/>
                <owl:Class rdf:about="#Athletic"/>
              </owl:unionOf>
            </owl:Class>
          </owl:allValuesFrom>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>

Ontology-based Information Systems

Similar to relational databases

Ontology $\approx$ schema; instances $\approx$ data

Some important (dis)advantages

+ (Relatively) easy to maintain and update schema

Both schema and data are "self organising"

+ Query answers reflect both schema and data

+ Able to answer both intensional and extensional queries

- Semantics may be counter-intuitive or even inappropriate

Open vs. closed world; axioms vs. constraints

- Query answering (logical entailment) much more difficult

Can lead to scalability problems

Very useful, but no miracles!
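The open- vs. closed-world difference can be seen in a toy sketch. Under the database reading, a fact that is not recorded is false. Under the ontology reading, it is merely unknown unless its negation is stated or entailed. The facts, the Answer enum and both function names are hypothetical. A real OWL reasoner would derive negative facts by entailment, for example from disjointness axioms, rather than look them up in a set.

```python
from enum import Enum
from typing import Set, Tuple

Fact = Tuple[str, str]  # hypothetical encoding: (ClassName, individual)


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def closed_world(facts: Set[Fact], cls: str, ind: str) -> Answer:
    # Database reading: anything not recorded is false.
    return Answer.YES if (cls, ind) in facts else Answer.NO


def open_world(facts: Set[Fact], negated: Set[Fact], cls: str, ind: str) -> Answer:
    # Ontology reading: a missing fact only means "not known".
    # NO requires a negative fact (here simply looked up in `negated`).
    if (cls, ind) in facts:
        return Answer.YES
    if (cls, ind) in negated:
        return Answer.NO
    return Answer.UNKNOWN


facts = {("Doctor", "ian")}
negated = {("Doctor", "mary")}  # e.g., an asserted ¬Doctor(mary)

for who in ("ian", "mary", "sam"):
    print(who, closed_world(facts, "Doctor", who).value,
          open_world(facts, negated, "Doctor", who).value)
# ian yes yes
# mary no no
# sam no unknown
```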

Ontologies and Reasoning

Support for Ontology Engineering

Developing and maintaining quality ontologies is very challenging

Users need tools and services, e.g., to help check if ontology is:

Meaningful: all named classes can have instances

Correct: captures intuitions of domain experts

Minimally redundant: no unintended synonyms (e.g., "Banana split" vs. "Banana sundae")
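The "meaningful" and "minimally redundant" checks can be illustrated with a deliberately small sketch. It handles only named classes, told SubClassOf axioms and pairwise disjointness. The class names are hypothetical. A real OWL reasoner decides satisfiability and equivalence for arbitrary class expressions and finds problems this simple closure would miss. Neither approach can tell whether a synonym is *unintended*. It can only surface candidates for a domain expert to review.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Axioms = Set[Tuple[str, str]]  # (A, B) stands for SubClassOf(A B) / DisjointClasses(A B)


def superclasses(subclass_of: Axioms) -> Dict[str, Set[str]]:
    """Reflexive-transitive closure of the told SubClassOf axioms."""
    names = {n for pair in subclass_of for n in pair}
    sups = {n: {n} for n in names}
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for n in names:
            for a, b in subclass_of:
                if a in sups[n] and b not in sups[n]:
                    sups[n].add(b)
                    changed = True
    return sups


def unsatisfiable(sups: Dict[str, Set[str]], disjoint: Axioms) -> Set[str]:
    """Named classes under two disjoint classes: they can have no instances."""
    return {n for n, s in sups.items() if any(a in s and b in s for a, b in disjoint)}


def synonyms(sups: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Distinct named classes that subsume each other, i.e. are equivalent."""
    return {(a, b) for a, b in combinations(sorted(sups), 2)
            if a in sups[b] and b in sups[a]}


subclass_of = {
    ("MeatyVegetarianPizza", "VegetarianPizza"),
    ("MeatyVegetarianPizza", "MeatPizza"),
    ("BananaSplit", "BananaSundae"),
    ("BananaSundae", "BananaSplit"),
    ("BananaSundae", "Dessert"),
}
disjoint = {("VegetarianPizza", "MeatPizza")}

sups = superclasses(subclass_of)
print(unsatisfiable(sups, disjoint))  # {'MeatyVegetarianPizza'}
print(synonyms(sups))                 # {('BananaSplit', 'BananaSundae')}
```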

Support for Query Answering

In an Ontology-based Information System (OIS), query answering $\approx$ computing logical entailment

Reasoner needed in order to answer queries, e.g.:

$C$ is a subclass of $D$ iff $\mathcal{O} \models \forall x.\, C(x) \rightarrow D(x)$

$a$ is an instance of $C$ iff $\mathcal{O} \models C(a)$

OIS with no reasoner $\approx$ DBMS with no query engine

Example Applications

e-Science

E.g., for "in silico" investigations and "hypothesis testing"

Comparing data (e.g., on proteins) to (model of) biological knowledge

Characteristics of proteins captured in an ontology $\mathcal{O}$

Goal is to identify protein instances based on characteristics

Equivalent to answering queries of the form: $\mathcal{O} \models P(i)$? for protein $P$ and instance $i$

Result may be discovery of new kinds of protein

And these may be potential drug targets if unique to a pathogen

Result may also be discovery of errors in model

Which may reflect gaps/errors in existing knowledge

Healthcare

UK NHS has a £6.2 billion "Connecting for Health" IT programme

Key component is Care Records Service (CRS)

"Live, interactive patient record service accessible 24/7"

Patient data distributed across local centres in 5 regional clusters, and a national DB

Detailed records held by local service providers

Diverse applications support radiology, pharmacy, etc.

Applications exchange messages containing "semantically rich clinical information"

Summaries sent to national database

SNOMED-CT ontology provides common vocabulary for data

Clinical data uses terms drawn from ontology

SNOMED

Over 400,000 concepts

Schema only: no instances

Language used is a (well known) fragment of OWL

NHS version extended with 1,000s of additional classes

OWL reasoner (FaCT++) used to classify and check ontology

Currently takes $\approx$ 4 hours

180 missing subClass relationships were found, e.g.:

Periocular_dermatitis subClassOf Disease_of_face

Fibrin_measurement subClassOf Coagulation_factor_assay
SNOMED

Vocabulary is extensible at point of use: "post coordination"

Users (e.g., clinicians) may add/define new vocabulary

Terminology service (reasoner) used to insert in ontology

Typical new term:

almond_allergy $\equiv$ "allergy caused_by almond"

OWL reasoner (FaCT++) used to classify new term

Takes <10 ms

Classified as a kind of "nut allergy"

Clearly of crucial importance to recognise patients with allergy caused by almond as kinds of patient with nut allergy
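Post-coordination can be mimicked for a very restricted case. Here a new term is a named class conjoined with existential restrictions whose fillers are named classes. For that case, subsumption can be tested structurally. The test compares base classes, then matches each restriction of the candidate superclass against a restriction of the new term with a more specific filler. This is a sketch only. The told hierarchy (Almond under TreeNut under Nut), the Definition helper and subsumed_by are hypothetical. They are not SNOMED-CT content or the FaCT++ algorithm.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical told hierarchy of named classes: class -> direct superclasses.
TOLD: Dict[str, Set[str]] = {
    "Almond": {"TreeNut"},
    "TreeNut": {"Nut"},
    "Nut": {"Food"},
    "Allergy": {"Disorder"},
}


def ancestors(name: str) -> Set[str]:
    """All told superclasses of `name`, including `name` itself."""
    seen: Set[str] = set()
    todo = [name]
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(TOLD.get(n, ()))
    return seen


@dataclass(frozen=True)
class Definition:
    """Base ⊓ ∃r1.F1 ⊓ ... ⊓ ∃rk.Fk with a named Base and named fillers Fi."""
    base: str
    exists: FrozenSet[Tuple[str, str]] = frozenset()


def subsumed_by(c: Definition, d: Definition) -> bool:
    """Structural test for c ⊑ d, valid only for this restricted fragment."""
    if d.base not in ancestors(c.base):
        return False
    # every ∃r.F required by d must be matched by some ∃r.G in c with G ⊑ F
    return all(
        any(r == s and f in ancestors(g) for (s, g) in c.exists)
        for (r, f) in d.exists
    )


nut_allergy = Definition("Allergy", frozenset({("caused_by", "Nut")}))
almond_allergy = Definition("Allergy", frozenset({("caused_by", "Almond")}))

print(subsumed_by(almond_allergy, nut_allergy))  # True
print(subsumed_by(nut_allergy, almond_allergy))  # False
```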

Recent Developments

OWL 1.1

Is an extension of OWL

Addresses deficiencies identified by users and developers (at OWLED workshop)

Is based on more expressive DL: SROIQ

(OWL is based on SHOIN)

W3C working group now chartered

Will develop recommendation based on existing member submission

Already supported by popular OWL tools

Protégé, Swoop, TopBraid, FaCT++, Pellet

What’s New in OWL 1.1?

Four kinds of features:


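More expressive logic (SROIQ)

Qualified cardinality restrictions ($\geq n\, R.C$) and ($\leq n\, R.C$), e.g.:

$\mathit{Person} \sqsubseteq \mathit{Animal} \sqcap {=}2\, \mathit{hasPart}.\mathit{Legs}$

$\mathit{Car} \sqsubseteq {=}4\, \mathit{hasComponent}.\mathit{Wheel}$

$\mathit{Person} \sqsubseteq {\leq}1\, \mathit{bioParent}.\mathit{Male}$

(OWL/SHOIN only allows for concepts ($\geq n\, R$) and ($\leq n\, R$))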
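The difference between qualified and unqualified restrictions shows up when counting successors in one fixed finite interpretation. A car with four wheels and an engine satisfies =4 hasComponent.Wheel but not the unqualified =4 hasComponent. The helper names and data below are hypothetical. The sketch also treats distinct names as distinct elements, which OWL itself does not assume, since it has no unique name assumption.

```python
from typing import Dict, Set, Tuple

Roles = Dict[str, Set[Tuple[str, str]]]


def count_in(roles: Roles, role: str, x: str, filler: Set[str]) -> int:
    """How many R-successors of x belong to the class extension `filler`."""
    return sum(1 for (a, b) in roles.get(role, set()) if a == x and b in filler)


def at_least(roles: Roles, role: str, x: str, n: int, filler: Set[str]) -> bool:
    return count_in(roles, role, x, filler) >= n  # x ∈ (≥ n R.C)


def at_most(roles: Roles, role: str, x: str, n: int, filler: Set[str]) -> bool:
    return count_in(roles, role, x, filler) <= n  # x ∈ (≤ n R.C)


def exactly(roles: Roles, role: str, x: str, n: int, filler: Set[str]) -> bool:
    return at_least(roles, role, x, n, filler) and at_most(roles, role, x, n, filler)


roles = {"hasComponent": {("car1", "w1"), ("car1", "w2"), ("car1", "w3"),
                          ("car1", "w4"), ("car1", "engine1")}}
wheel = {"w1", "w2", "w3", "w4"}
top = wheel | {"car1", "engine1"}  # the whole toy domain, i.e. ⊤ (unqualified)

print(exactly(roles, "hasComponent", "car1", 4, wheel))  # True:  =4 hasComponent.Wheel
print(exactly(roles, "hasComponent", "car1", 4, top))    # False: =4 hasComponent
```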



What’s New in OWL 1.1?

Four kinds of features:


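More expressive logic (SROIQ)

Expressive role axioms (R), e.g., complex role inclusions:

$R_1 \circ \dots \circ R_n \sqsubseteq S$

$R_1 \circ \dots \circ R_n \circ S \sqsubseteq S$

$S \circ R_1 \circ \dots \circ R_n \sqsubseteq S$

(with some restrictions on cycles)

Useful, e.g., for:

$\mathit{owns} \circ \mathit{hasPart} \sqsubseteq \mathit{owns} \Rightarrow \exists \mathit{owns}.\mathit{Bicycle} \sqsubseteq \exists \mathit{owns}.\mathit{Wheels}$ (given $\mathit{Bicycle} \sqsubseteq \exists \mathit{hasPart}.\mathit{Wheels}$)

$\mathit{locatedIn} \circ \mathit{partOf} \sqsubseteq \mathit{locatedIn} \Rightarrow \mathit{Fracture} \sqcap \exists \mathit{locatedIn}.\mathit{FemurShaft} \sqsubseteq \mathit{Fracture} \sqcap \exists \mathit{locatedIn}.\mathit{Femur}$ (given $\mathit{FemurShaft} \sqsubseteq \exists \mathit{partOf}.\mathit{Femur}$)

$\mathit{hasParent} \circ \mathit{hasBrother} \sqsubseteq \mathit{hasUncle}$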
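At the level of individual facts, a role inclusion such as owns ∘ hasPart ⊑ owns says: if x owns y and y hasPart z, then x owns z. The sketch below saturates a set of hypothetical role assertions under such axioms until nothing new follows. The function name and data are assumptions for illustration only. The restrictions on cycles matter for decidability of reasoning with these axioms in general. This ground saturation terminates regardless because it never invents new individuals.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
Chain = Tuple[Tuple[str, ...], str]  # ((R1, ..., Rn), S) stands for R1 o ... o Rn ⊑ S


def apply_chains(edges: Dict[str, Set[Edge]], chains: List[Chain]) -> Dict[str, Set[Edge]]:
    """Close role assertions under the given role inclusion axioms."""
    result = {r: set(es) for r, es in edges.items()}
    changed = True
    while changed:
        changed = False
        for chain, sup in chains:
            # All pairs (x, z) connected by an R1-edge followed by R2, ..., Rn edges.
            paths = set(result.get(chain[0], set()))
            for role in chain[1:]:
                step = result.get(role, set())
                paths = {(x, z) for (x, y) in paths for (y2, z) in step if y == y2}
            new = paths - result.get(sup, set())
            if new:
                result.setdefault(sup, set()).update(new)
                changed = True
    return result


edges = {
    "owns": {("ian", "bike1")},
    "hasPart": {("bike1", "wheel1"), ("wheel1", "spoke1")},
    "hasParent": {("ann", "bob")},
    "hasBrother": {("bob", "carl")},
}
chains = [(("owns", "hasPart"), "owns"), (("hasParent", "hasBrother"), "hasUncle")]

closed = apply_chains(edges, chains)
print(sorted(closed["owns"]))  # [('ian', 'bike1'), ('ian', 'spoke1'), ('ian', 'wheel1')]
print(closed["hasUncle"])      # {('ann', 'carl')}
```

Because owns reappears on the right-hand side, ownership propagates through any depth of parts (bike, wheel, spoke). This is the recursive S ∘ R ⊑ S pattern listed above.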

What’s New in OWL 1.1?

Four kinds of features:


More expressive logic (SROIQ)

Expressive role axioms (R), e.g., asymmetry, reflexivity, etc.:

Tra(R) (supported by SHOIN)

Asy(R), e.g., Asy(properPartOf), Asy(hasParent)

Sym(R) (supported by SHOIN)

Refl(R), e.g., Refl(knows)

Irrefl(R), e.g., Irrefl(properPartOf), Irrefl(hasParent)

Disj(R S), e.g., Disj(hasParent hasSibling)

ObjectExistsSelf(likes) [for narcissists]
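The role characteristics above can be read as conditions on a role's extension. A minimal sketch checks whether one given finite relation satisfies them and computes the individuals in ObjectExistsSelf(likes). The function names and data are hypothetical. A reasoner uses these axioms to draw inferences rather than merely testing one interpretation.

```python
from typing import Set, Tuple

Rel = Set[Tuple[str, str]]


def is_asymmetric(r: Rel) -> bool:
    return all((y, x) not in r for (x, y) in r)  # also rules out (x, x)


def is_irreflexive(r: Rel) -> bool:
    return all(x != y for (x, y) in r)


def is_reflexive(r: Rel, domain: Set[str]) -> bool:
    return all((x, x) in r for x in domain)  # Refl holds over the whole domain


def are_disjoint(r: Rel, s: Rel) -> bool:
    return not (r & s)


def self_restriction(r: Rel) -> Set[str]:
    """Individuals related to themselves by r, i.e. ObjectExistsSelf(r)."""
    return {x for (x, y) in r if x == y}


has_parent = {("cat", "ann"), ("cat", "bob")}
has_sibling = {("cat", "dan")}
knows = {("ann", "ann"), ("bob", "bob"), ("ann", "bob")}
likes = {("narcissus", "narcissus"), ("ann", "bob")}

print(is_asymmetric(has_parent), is_irreflexive(has_parent),
      are_disjoint(has_parent, has_sibling))     # True True True
print(is_reflexive(knows, {"ann", "bob"}))      # True
print(self_restriction(likes))                  # {'narcissus'}
```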

What’s New in OWL 1.1?

Four kinds of features:


More expressive datatypes

OWL 1.1 allows for user-defined datatypes:

over18 $\equiv$ base(xsd:integer) minInclusive("18"^^xsd:integer)

$\mathit{Adult} \equiv \mathit{Person} \sqcap \exists \mathit{age}.\mathit{over18}$

and n-ary datatype predicates:

$\mathit{Spendthrift} \equiv \exists \mathit{spends},\mathit{earns}.{>}$

BUT, still cannot:

define complex relationships between data properties on different individuals, e.g., women who earn more than their husbands

declare a datatype property as inverse-functional (keys)
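The over18 datatype is an integer range restricted by a minInclusive facet, and Adult needs some age value in that range. The sketch below models both. The IntegerRange class and the adults helper are hypothetical names, not an OWL library API. Note the open-world reading: an individual without a recorded age is simply not shown to be an Adult, which is not the same as being shown not to be one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class IntegerRange:
    """xsd:integer restricted by optional minInclusive/maxInclusive facets."""
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None

    def contains(self, value: object) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            return False
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        return True


over18 = IntegerRange(min_inclusive=18)  # base(xsd:integer) minInclusive 18


def adults(persons: Set[str], age: Dict[str, Set[object]]) -> Set[str]:
    """Person ⊓ ∃age.over18, evaluated over explicitly known age values."""
    return {p for p in persons if any(over18.contains(v) for v in age.get(p, set()))}


persons = {"ian", "mary", "sam"}
age: Dict[str, Set[object]] = {"ian": {42}, "mary": {17}}  # sam's age is unrecorded
print(sorted(adults(persons, age)))  # ['ian']
```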

What’s New in OWL 1.1?

Four kinds of features:


Metamodelling and annotations


Names can be used as any or all of an individual, a class, or a property

Allows for a restricted form of metamodelling ("punning"), e.g.:

SubClassOf(SnowLeopard BigCat)


ClassAssertion(SnowLeopard EndangeredSpecies)


Annotations of axioms as well as entities


ClassAssertion(Comment("source: WWF") SnowLeopard EndangeredSpecies)

What’s New in OWL 1.1?

Four kinds of features:


Syntactic sugar (make things easier to say)


Disjoint unions, e.g.:


DisjointUnion(Element Earth Wind Fire Water)


Negative assertions, e.g.:


NegativeObjectPropertyAssertion(Ian hasChild Mary)


NegativeDataPropertyAssertion(Ian hasAge 21)
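To see what the disjoint-union shorthand saves, the sketch below rewrites it into the axioms it abbreviates. This follows how the OWL 2 structural specification defines DisjointUnion: an equivalence with the union plus mutual disjointness of the parts. The functional-syntax names follow OWL 2 and may differ in detail from the OWL 1.1 submission. The helper names are hypothetical. Stating the same disjointness one pair at a time, as with owl:disjointWith, needs n(n-1)/2 axioms.

```python
from itertools import combinations
from typing import List


def expand_disjoint_union(cls: str, parts: List[str]) -> List[str]:
    """Rewrite DisjointUnion(C C1 ... Cn) as the two axioms it abbreviates."""
    if len(parts) < 2:
        raise ValueError("a disjoint union needs at least two classes")
    members = " ".join(parts)
    return [f"EquivalentClasses({cls} ObjectUnionOf({members}))",
            f"DisjointClasses({members})"]


def pairwise_disjointness(parts: List[str]) -> List[str]:
    """The same disjointness stated one pair at a time."""
    return [f"DisjointClasses({a} {b})" for a, b in combinations(parts, 2)]


elements = ["Earth", "Wind", "Fire", "Water"]
for axiom in expand_disjoint_union("Element", elements):
    print(axiom)
# EquivalentClasses(Element ObjectUnionOf(Earth Wind Fire Water))
# DisjointClasses(Earth Wind Fire Water)
print(len(pairwise_disjointness(elements)))  # 6
```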

Tractable Fragments


OWL defines only one fragment (OWL Lite)

And it isn't very tractable!

OWL 1.1 defines several different fragments with useful computational properties

E.g., reasoning complexity in range LOGSPACE to PTIME

Smaller fragments implementable using RDBs


Tools and Methodologies


OWL 1.1 support already added to several tools:

Protégé, Swoop, TopBraid Composer, FaCT++, Pellet

New features available (soon) in OWL tools:

Diagnosis and semi-automatic repair of errors

Support for integration and modular design

Incremental classification (addition and retraction)

Support for bottom-up design

Diagnosis


Editing tools use reasoner to identify inconsistent classes


May not be very useful without some explanation facility

Modularity in Ontology Engineering

Benefits of a modular ontology design: to simplify

ontology refinement/update: modifying a module should not lead to modifications in parts of the ontology that are not conceptually related

understanding: relationships between different modules in an ontology controlled and well-understood

integration with other ontologies: no unexpected consequences

partial reuse: reuse only the relevant part/module of an ontology

Tool Support for Modular Design


Check when integration of modules is "safe"

Interface between modules via exported vocabulary

Information flows from imported to importing ontology

No information flows back the other way

Formalised using conservative extensions

What is the effect of merging $\mathcal{O}_2$ into $\mathcal{O}_1$?

In general, check that $\mathcal{O}_1 \cup \mathcal{O}_2 \models C$ iff $\mathcal{O}_1 \models C$ for any concept $C$ constructed using vocabulary occurring in $\mathcal{O}_1$

[Cuenca Grau & Kazakov, IJCAI-07 and WWW-07]

Tool Support for Modular Design


Extract smaller modules from large ontologies

E.g., starting with FMA, extract module for "Heart"

Tool should ensure that module

Is as small as possible, but

Still contains all relevant knowledge

More formally:

Extract a (small) module from $\mathcal{O}$ capturing all "relevant" information about some vocabulary $V$

In general, find $\mathcal{O}' \subseteq \mathcal{O}$ s.t. $\mathcal{O}' \models C$ iff $\mathcal{O} \models C$ for any concept $C$ constructed using terms from $V$
Incremental Reasoning


Modules can also be used to support incremental addition and retraction of axioms, e.g.:

When retracting $C \sqsubseteq D$, reclassify only concepts whose module includes this axiom

Typically this is only a very small subset of all concepts

Prototype now implemented in Swoop editor

Tool Support for Bottom-up Design

Bottom-up design

Find a (small and specific) concept describing a set of individuals

In general, find most specific $C$ s.t. $\mathcal{O} \models C(i_1) \wedge \dots \wedge C(i_n)$

Where $C$ may be "small" and/or in a sub-language (of $\mathcal{O}$)

Prototype: SONIC system [Turhan et al]
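When the sub-language is restricted to conjunctions of named classes, the most specific concept covering a set of individuals is just the intersection of their entailed named types. The sketch below shows that simplest case. It assumes the types have already been computed by a reasoner and closed under SubClassOf. The names and data are hypothetical. Systems such as SONIC target richer DLs with restrictions on roles, where the computation is considerably harder, and this is not their algorithm.

```python
from functools import reduce
from typing import Dict, List, Set


def msc_named(types: Dict[str, Set[str]], individuals: List[str]) -> Set[str]:
    """Most specific conjunction of named classes that all individuals belong to."""
    if not individuals:
        raise ValueError("need at least one individual")
    return reduce(lambda acc, i: acc & types[i], individuals[1:],
                  set(types[individuals[0]]))


types = {  # hypothetical entailed types, already closed under SubClassOf
    "i1": {"Protein", "Enzyme", "Kinase"},
    "i2": {"Protein", "Enzyme", "Phosphatase"},
    "i3": {"Protein", "Enzyme", "Kinase", "MembraneBound"},
}
print(" ⊓ ".join(sorted(msc_named(types, ["i1", "i2", "i3"]))))  # Enzyme ⊓ Protein
print(" ⊓ ".join(sorted(msc_named(types, ["i1", "i3"]))))        # Enzyme ⊓ Kinase ⊓ Protein
```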

Extending Expressive Power

Database-style keys [Lutz et al, JAIR 2004]

E.g., make + model + chassis-number is a key for Vehicles

Rule language extensions

W3C RIF WG (see http://www.w3.org/2005/rules/)

First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]

Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005]

LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]

Other extensions

Temporal

Fuzzy

Extended annotation framework

Macro language
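The vehicle key can be read in two ways. Under a logical reading (as in the HasKey axiom later added in OWL 2), two named individuals that agree on all key values are inferred to be the same vehicle. A database would instead reject the second row as a key violation. The sketch below finds such groups. The property names make, model and chassis_number and the data are hypothetical. Keys in Lutz et al's setting are more general than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

KEY = ("make", "model", "chassis_number")  # hypothetical property names


def same_by_key(vehicles: Dict[str, Dict[str, str]],
                key: Tuple[str, ...] = KEY) -> List[List[str]]:
    """Groups of named vehicles that the key forces to be the same individual."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, props in vehicles.items():
        if all(k in props for k in key):  # the key only applies if all values are known
            groups[tuple(props[k] for k in key)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


vehicles = {
    "v1": {"make": "Ford", "model": "Focus", "chassis_number": "X123"},
    "car42": {"make": "Ford", "model": "Focus", "chassis_number": "X123"},
    "v2": {"make": "Ford", "model": "Focus", "chassis_number": "Y999"},
    "v3": {"make": "Ford", "model": "Focus"},  # chassis number unknown
}
print(same_by_key(vehicles))  # [['car42', 'v1']]
```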




Improving Scalability

Optimisation techniques

Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, ]

New reasoning techniques

Reduction to disjunctive Datalog [Motik et al, KR-04]

Transform SHOIN ontology to $\text{Datalog}^{\vee}$ rules

Use LP techniques to deal with large numbers of ground facts

Hybrid DL-DB systems [Horrocks et al, CADE-05]

Use DB to store "ABox" (individual) axioms

Cache inferences and use DB queries to answer/scope logical queries

Hypertableau-based algorithms [Motik et al, CADE-07]

Prototypical implementation in HermiT system

Polynomial time algorithms for sub-ALC logics

Graph-based techniques for EL+ [Baader et al, IJCAI-05]

Database techniques for DL-Lite [Calvanese et al, AAAI-05]

Developing Tools and Infrastructure

Editors/environments

OilEd, Protégé, Swoop, TopBraid, Ontotrack, …

Reasoning systems

Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, …

Design methodologies

Foundational ontologies, etc.

[Figure: fragment of a foundational ontology taxonomy: Entity, Endurant, Perdurant, Substantial, Quality, Event, Stative, Accomplishment, Achievement]

Summary


Semantic Web aims to make web content more accessible to automated processes

Adds semantic annotations to web resources

OWL ontologies provide vocabulary for annotations

Terms have well-defined meaning

OWL now being used in a wide range of applications

e-Science, medicine, geography, geology, …

Reasoning-enabled tools are of crucial importance

For both design and deployment of ontologies

Active research area

Expressive power, scalability, methodologies, tools, …

Thank you for listening

Any questions?

FRAZZ: © Jeff Mallett/Dist. by United Feature Syndicate, Inc.