Next Generation Semantic Web Applications

Oct 22, 2013

Prof. Enrico Motta

Director, Knowledge Media Institute

The Open University

Milton Keynes, UK

Structure of the Talk


Quick Recap: What is the Semantic Web?


State of the art: 1st Generation SW Applications


Emphasis on ontology-driven data aggregation


Limited in their ability to exploit large-scale, heterogeneous semantic markup


Key research issues


What needs to be done to enable the effective development of the next generation of SW applications


Need for a different approach to some key research areas


How the SW itself can be exploited to address such key research issues

Quick Recap: What is the Semantic Web?

The Semantic Web

A large-scale, heterogeneous collection of formal, machine-processable, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in an XML-based syntax

[Figure: three layers labelled Ontology, Metadata, and UoD (universe of discourse); the Metadata layer is drawn as a collection of RDF triples]

[Figure: ontology fragment. Classes: Person, Organization, Organization-Unit, String. Relations: partOf, hasAffiliation, worksInOrgUnit, hasJobTitle]

<akt:Person rdf:about="akt:EnricoMotta">
  <rdfs:label>Enrico Motta</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
  <akt:hasJobTitle>kmi director</akt:hasJobTitle>
  <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  <akt:hasGivenName>enrico</akt:hasGivenName>
  <akt:hasFamilyName>motta</akt:hasFamilyName>
  <akt:worksInProject rdf:resource="akt:Neon"/>
  <akt:worksInProject rdf:resource="akt:X-Media"/>
  <akt:hasPrettyName>Enrico Motta</akt:hasPrettyName>
  <akt:hasPostalAddress rdf:resource="akt:KmiPostalAddress"/>
  <akt:hasEmailAddress>e.motta@open.ac.uk</akt:hasEmailAddress>
  <akt:hasHomePage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
</akt:Person>
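
The description above can also be read as a flat set of (subject, predicate, object) statements. The following is a minimal sketch of that reading, not part of the original slides: the tuple representation and the helper `objects_of` are illustrative assumptions, and only some of the properties are transcribed.

```python
# Statements transcribed by hand from the RDF/XML description above.
# The tuple layout (subject, predicate, object) is an illustrative choice.
triples = [
    ("akt:EnricoMotta", "rdf:type", "akt:Person"),
    ("akt:EnricoMotta", "rdfs:label", "Enrico Motta"),
    ("akt:EnricoMotta", "akt:hasAffiliation", "akt:TheOpenUniversity"),
    ("akt:EnricoMotta", "akt:hasJobTitle", "kmi director"),
    ("akt:EnricoMotta", "akt:worksInOrgUnit", "akt:KnowledgeMediaInstitute"),
    ("akt:EnricoMotta", "akt:worksInProject", "akt:Neon"),
    ("akt:EnricoMotta", "akt:worksInProject", "akt:X-Media"),
]


def objects_of(subject, predicate, store):
    """Hypothetical helper: all objects stated for a subject/predicate pair."""
    return [o for s, p, o in store if s == subject and p == predicate]


projects = objects_of("akt:EnricoMotta", "akt:worksInProject", triples)
```

A property with several values, such as akt:worksInProject, simply appears as several statements about the same subject.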

SW = A Conceptual Layer
over the web

SW is Heterogeneous!

Generating semantic markup

[Figure: a large collection of RDF triples]

Key aspects of the SW


Size (= Huge)


Semantic markup is expected eventually to reach the same order of magnitude as the web


Conceptual Heterogeneity (= Big)


Semantic markup is based on many different ontologies


Rate of change (= Very High)


Data are generated all the time by human and artificial agents...


Provenance (= Very Heterogeneous)


...hence provenance itself is extremely heterogeneous


Trust (= very variable and subjective)


A side-effect of heterogeneous provenance


Data Quality (= very variable)


No guarantee of correctness


Intelligence (= by-product of size and heterogeneity)


Rather than a by-product of sophisticated problem solving

Compare with traditional KBS


Size (= Small or Medium)


KBS normally small to medium size


Conceptual Heterogeneity (= Not an issue)


KBS normally based on a single conceptual model


Rate of change (= Very Low)


Change rate under developers' control (hence, low)


Provenance (= Not an issue)


KBS are normally created ad hoc for an application by a centralised team of developers


Trust (= not a major issue)


Centralisation of the development process implies no significant trust issues


Data Quality (= not a major issue)


Again, centralisation guarantees data quality across the board


Intelligence (= by-product of complex, task-centric reasoning)


E.g., sophisticated diagnostic, planning systems…

The Semantic Web today

1st Generation SW Applications

[Figure labels: CS Dept Data, AKT Reference Ontology, RDF Data, Bibliographic Data]


Typically use a single ontology


Usually providing a homogeneous view over heterogeneous data sources


Limited use of existing SW data


Closed to semantic resources


Limited interactivity


In contrast with typical web 2.0 applications

Features of 1st generation
SW Applications

Hence: current SW applications are far more similar
to traditional KBS (closed semantic systems) than to
'real' SW applications (open semantic systems)

[Figure: two images labelled 1895 and 2006]

It is still early days...

Next Generation SW Applications

Next generation SW
applications

NG SW Application


Able to exploit the SW at large


Hence: Multi-Ontology


Supporting interactivity


E.g., allowing users to add semantic data


Hence, open with respect to SW resources


Ideally also able to exploit non-SW data


E.g., folksonomies


Hence, embedding powerful information
extraction engines

Two systems we have built

Magpie

AquaLog

Magpie Components

[Figure: Magpie architecture. Components shown: Web Page, Ontology based Proxy Server, Ontology cache (Lexicon), Problem Domain & Resources, Enriched Web Page, Magpie Hub, Jabber Server, Semantic Log]

Semantic Log entries shown in the figure:

(found-item 3275578832 localhost #u"http://localhost/people/motta/" john-domingue john-domingue)

(found-item 3275578832 localhost

AquaLog: Ontology-Driven Question Answering

[Figure: AquaLog pipeline]

NL SENTENCE INPUT: Which is the capital of Spain?

Linguistic Analysis

QUERY TRIPLES: (?, capital, Spain)

Mapping Engine

RESULT TRIPLES: <Spain, has-capital-city, Madrid>

NL Generation

ANSWER: Madrid
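
The mapping step in the pipeline above, from a query triple to result triples, can be illustrated with a small sketch. This is not AquaLog's actual implementation: the toy knowledge base `KB`, the token-based `relation_matches` test, and the function names are all assumptions made for illustration.

```python
# Toy knowledge base; only the first triple comes from the slide.
KB = [
    ("Spain", "has-capital-city", "Madrid"),
    ("France", "has-capital-city", "Paris"),
    ("Spain", "has-currency", "Euro"),
]


def relation_matches(user_term, ontology_relation):
    """Crude lexical test (assumption): the user's term is one of the
    hyphen-separated tokens of the ontology relation name."""
    return user_term.lower() in ontology_relation.lower().split("-")


def map_query(query, kb):
    """Return the KB triples compatible with a (?, relation, known-term) query."""
    unknown, relation, known = query
    if unknown != "?":
        raise ValueError("this sketch only handles queries with one unknown")
    return [t for t in kb
            if relation_matches(relation, t[1]) and known in (t[0], t[2])]


def generate_answer(result_triples, known):
    """The answer is whichever argument of each triple is not the known term."""
    return [o if s == known else s for s, _, o in result_triples]


results = map_query(("?", "capital", "Spain"), KB)
answers = generate_answer(results, "Spain")
```

Note that the query triple and the ontology triple need not list their arguments in the same order, which is why the sketch checks both positions for the known term.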

Need for mechanisms for automatically identifying semantic markup relevant to the current page, user, browsing session, etc.

PowerMagpie: Semantic
browsing on the 'open' SW

Need for mechanisms for automatically locating ontologies relevant to the current query, mapping user terminology to ontologies, integrating information from different ontologies, etc.

PowerAqua: QA on the 'open'
semantic web

What needs to be done to facilitate the
development of such 2nd generation SW
applications?

Dynamic Ontology Selection


First: powerful support for ontology selection


Both PowerAqua and PowerMagpie heavily rely
on ontology selection to locate possibly
relevant knowledge in response to


User queries (PowerAqua)


Accessing web pages (PowerMagpie)


Hence, ontology selection is a crucial task for
both systems

Current support for ontology
selection

Limitations of Swoogle


Query/Search


Only keyword search is supported; we need more powerful query methods (e.g., the ability to pose formal queries)


Repository structure


Very weak in Swoogle: not even duplicates are dealt with


Need for automatic derivation of relations between ontologies


E.g., same-ontology-as, ontology-extends, ontology-incompatible-with, etc.


We need these relations to structure the repository and to support more powerful ranking methods (see next bullet point)


Ontology ranking


Swoogle only uses a 'popularity-based' method; we need other methods as well
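
As an illustration of what automatically deriving relations between ontologies might look like, here is a minimal sketch. The definitions used are assumptions made for this example only: two ontologies are treated as same-ontology-as when their statement sets are identical, and one is treated as ontology-extends of the other when it strictly contains the other's statements. The slides do not define these relations, and a real system would need a more careful notion of equivalence.

```python
def relate(onto_a, onto_b):
    """Derive a coarse relation between two ontologies given as sets of triples.

    Assumed definitions (not from the slides):
      same-ontology-as : identical statement sets (i.e. duplicates)
      ontology-extends : onto_a strictly contains onto_b
    Returns None when neither holds.
    """
    a, b = frozenset(onto_a), frozenset(onto_b)
    if a == b:
        return "same-ontology-as"
    if a > b:
        return "ontology-extends"
    return None


# Hypothetical toy ontologies.
onto_v1 = {("Researcher", "subClassOf", "Person")}
onto_v1_copy = {("Researcher", "subClassOf", "Person")}
onto_v2 = onto_v1 | {("Professor", "subClassOf", "Researcher")}
```

With relations of this kind in place, a repository could collapse duplicates before ranking and prefer the most complete version of an ontology.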

We also need:


Methods for fast extraction of ontology
modules


Typically we only want the part of the ontology
relevant to our current needs


Methods for the integration of information
derived from different ontologies


In the context of QA this problem typically reduces
to that of deciding whether two instances denote the
same entity
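
Deciding whether two instances denote the same entity can be sketched as follows. This is a deliberately crude heuristic, assumed for illustration and not taken from the slides: the dictionary layout and the property names `email` and `homepage` are hypothetical, an identifying property present on both sides settles the question, and otherwise a shared normalised label counts as weak evidence.

```python
def normalise_label(text):
    """Lower-case a label and treat hyphens/underscores as spaces."""
    return " ".join(text.lower().replace("-", " ").replace("_", " ").split())


def same_entity(inst_a, inst_b, identifying=("email", "homepage")):
    """Return True, False, or None (undecided) for two instance descriptions."""
    for prop in identifying:
        value_a, value_b = inst_a.get(prop), inst_b.get(prop)
        if value_a and value_b:
            # An identifying value present on both sides settles the question.
            return value_a.strip().lower() == value_b.strip().lower()
    labels_a = {normalise_label(l) for l in inst_a.get("labels", [])}
    labels_b = {normalise_label(l) for l in inst_b.get("labels", [])}
    return True if labels_a & labels_b else None


a = {"labels": ["Enrico Motta"], "email": "e.motta@open.ac.uk"}
b = {"labels": ["enrico-motta"], "email": "E.Motta@open.ac.uk"}
c = {"labels": ["John Domingue"]}
d = {"labels": ["Enrico Motta"], "email": "someone.else@example.org"}
```

Returning None rather than False when there is no evidence keeps "unknown" separate from "known to differ", which matters when results from several ontologies are merged.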



Even more importantly...


Need to look at a number of key research issues in the context provided by NG-SW applications


Example: Ontology Mapping


Current work focuses on design-time mapping of complete ontologies


Example: Ontology Selection


Current work focuses on user-mediated ontology selection


Example: Ontology Modularization


Current work by and large assumes that the user is in
the loop

A new application scenario



NG-SW applications require algorithms able to perform tasks such as selecting, modularizing, and mapping ontologies at run time


Moreover, in such a context, mapping is
concerned with mapping ontology fragments,
rather than complete ontologies

So What?


Time to go beyond 1st generation applications


2nd generation SW applications will exploit
much more fully the large scale semantic
markup provided by the SW


Many issues to be addressed:


Better ontology crawling, indexing, retrieving and
ranking support


Mapping, selection, and modularization methods appropriate for NG-SW applications


Further acceleration needed in the generation of
semantic markup

Exploiting the SW itself to
tackle its heterogeneity


Interestingly, an NG-SW-based approach can also be used to tackle key SW tasks, such as Ontology Mapping


Based on the use of the SW itself as background knowledge


Exploiting Large-Scale Semantics

Case Study:


Using the Semantic Web as background
knowledge in Ontology Mapping

Ontology Mapping: State of
the Art


State-of-the-art methods rely on a combination of:


Label similarity methods


e.g., Full_Professor = FullProfessor


Structure similarity methods


Using taxonomic information or information about
domain and range of associated properties


However, as pointed out by Aleksovski et al. (EKAW, 2006):


In many cases there is insufficient lexical overlap


In many cases the source and target ontologies do not have sufficient structure to allow effective structure-based mapping
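
The label similarity example above (Full_Professor = FullProfessor) can be made concrete with a small sketch. The normalisation rules used here (splitting on underscores, hyphens, whitespace, and camel-case boundaries, then lower-casing) are an assumption about what a simple label matcher might do, not a description of any particular mapping tool.

```python
import re


def normalise_label(label):
    """Split a label into lower-case tokens.

    A space is inserted at camel-case boundaries, then the label is split on
    whitespace, underscores, and hyphens.
    """
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    tokens = re.split(r"[\s_\-]+", spaced)
    return tuple(t.lower() for t in tokens if t)


def labels_match(label_a, label_b):
    """Two labels match when they normalise to the same token sequence."""
    return normalise_label(label_a) == normalise_label(label_b)
```

A matcher of this kind finds nothing for a pair such as Researcher and AcademicStaff, which is exactly the lack of lexical overlap noted above.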

Use of background knowledge for ontology mapping

[Figure: terms A and B, with the unknown relation between them (?) to be derived from Background Knowledge]

External Source = One Ontology

Aleksovski et al. EKAW’06



Map candidate terms into concepts from a richly axiomatized domain
ontology (anchors)



Derive a mapping based on the relation of the anchor terms

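[Figure: A and B are anchored to A’ and B’ in the background ontology (A = A’, B = B’); the relation (rel) between A’ and B’ gives the relation (rel) between A and B]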

Advantages:




Handles dissimilar ontologies



Returns semantic mappings

Disadvantages:



Assumes that a suitable domain
ontology is available.


Approach only suitable for closed
domains

External Source = Web

van Hage et al. ISWC’05



Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate mappings using IR techniques

[Figure: the relation (rel) between A and B is derived from the web + OnlineDictionary using IR Methods]

Advantages:



General purpose

Disadvantages:



IR Methods introduce noise

External Source = WordNet

Lopez et al. ESWC’05


Use WordNet to map queries expressed in the user's terminology to a domain ontology to support question answering

[Figure: the relation (rel) between A and B is derived from WordNet]

Advantages:



General purpose

Disadvantages:



Knowledge sparseness


Works best with concepts, not
so useful with relations


WordNet is not an ontology!!!

Knowledge-poor ontology mapping


Actually, isn’t it a bit strange that such complex and knowledge-poor methods are devised, when the SW already provides so much background knowledge?

Proposal:



rely on online ontologies (Semantic Web) to derive mappings



ontologies are dynamically discovered and combined

[Figure: terms A and B linked by a derived relation (rel)]

Advantages:



General purpose


Does not introduce noise


Works with any kind of domain
entities (concepts, relations,
instances)

Semantic Web

External Source = SW

Strategy 1 - Definition

Find ontologies that contain equivalent classes to A and B and use their
relationship in the ontologies to derive the mapping.

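[Figure: A and B are matched to equivalent concepts A1/B1, A2/B2, ..., An/Bn in online ontologies O1, O2, ..., On on the Semantic Web; the relations found there give the relation (rel) between A and B]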

For each ontology use these rules:



These rules can be extended to take into
account indirect relations between A’ and
B’, e.g., between parents of A’ and B’:
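
The rules themselves did not survive in this transcript. One plausible reading, assumed here purely for illustration, is that whatever relation holds between A’ and B’ inside an ontology (equivalence, subclass, superclass, disjointness) is carried over to A and B, with indirect relations found by walking up the class hierarchy. The sketch below follows that assumption; the dictionary layout, the relation names, and the toy ontology are hypothetical.

```python
def ancestors(cls, subclass_of):
    """All transitive superclasses of cls, given (child, parent) pairs."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in subclass_of:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def derive_relation(a, b, onto):
    """Relation between classes a and b inside one ontology, or None."""
    sub = onto.get("subclass_of", set())
    if frozenset((a, b)) in onto.get("equivalent", set()):
        return "equivalent"
    if b in ancestors(a, sub):
        return "subclass-of"
    if a in ancestors(b, sub):
        return "superclass-of"
    # Indirect rule: classes are disjoint if they, or any of their
    # ancestors, are declared disjoint.
    disjoint = onto.get("disjoint", set())
    a_up = ancestors(a, sub) | {a}
    b_up = ancestors(b, sub) | {b}
    if any(frozenset((x, y)) in disjoint for x in a_up for y in b_up):
        return "disjoint"
    return None


# Hypothetical toy ontology; not a reproduction of any real one.
toy = {
    "subclass_of": {("Beef", "RedMeat"), ("RedMeat", "MeatOrPoultry"),
                    ("MeatOrPoultry", "Food"), ("Apple", "Fruit")},
    "disjoint": {frozenset(("MeatOrPoultry", "Fruit"))},
}
```

The quick and precise variants then differ only in whether this function is called on the first suitable ontology or on all of them.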

Strategy 1 - Variants

Quick variant:

Stop as soon as a relation is found

[Figure: A and B matched to A1 and B1 in a single ontology O1 on the Semantic Web]

Strategy 1 - Variants

Precise variant:

Derive all possible mappings from all ontologies and combine them into a final mapping.

[Figure: A and B matched to A1/B1 in ontology O1 and to A2/B2 in ontology O2 on the Semantic Web]

Dealing with Contradictions:


Return all mappings even if contradictory


Return a mapping only when there is no
contradiction


Return the most frequent mapping (i.e., the
mapping derived from most ontologies)


Return the mappings with 'higher authority'
(based on metrics of ontology evaluation or
trust)


Try to combine mappings
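
Three of the options listed above can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: a contradiction is assumed to mean simply that more than one distinct relation was derived for the same pair of terms, and the policy names are hypothetical.

```python
from collections import Counter


def resolve(candidates, policy="most-frequent"):
    """Combine mappings derived from several ontologies for one pair of terms.

    candidates: list of (relation, source_ontology) pairs.
    Returns a sorted list of the relations kept under the chosen policy.
    """
    relations = [rel for rel, _ in candidates]
    if not relations:
        return []
    if policy == "all":
        # Return all mappings, even if contradictory.
        return sorted(set(relations))
    if policy == "no-contradiction":
        # Return a mapping only when every ontology agrees.
        return [relations[0]] if len(set(relations)) == 1 else []
    if policy == "most-frequent":
        # Return the mapping(s) derived from the most ontologies.
        counts = Counter(relations)
        top = max(counts.values())
        return sorted(r for r, n in counts.items() if n == top)
    raise ValueError(f"unknown policy: {policy}")


derived = [("subclass-of", "O1"), ("subclass-of", "O2"), ("disjoint", "O3")]
```

The 'higher authority' option would need a score per source ontology, which is why each candidate carries its source even though this sketch does not use it.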

Strategy 1 - Examples

[Figure: mapping Beef to Food via the Semantic Web. Concepts shown: Beef, RedMeat, Food, MeatOrPoultry. Ontologies shown: Tap, SR-16, FAO_Agrovoc]

[Figure: mapping Researcher to AcademicStaff via the Semantic Web. Ontologies shown: ka2.rdf, ISWC, SWRC]

Strategy 2 - Definition

Principle:

If no ontologies are found that contain the two terms then
combine information from multiple ontologies to find a mapping.

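[Figure: A is matched to A’ in one ontology on the Semantic Web, where A’ is related (rel) to C; C is matched to C’ in another ontology, where C’ is related (rel) to B’, the match for B; combining the two gives the relation (rel) between A and B]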

Details:



(1) Select all ontologies containing A’, equivalent to A


(2) For each ontology containing A’:



(a) if [...], find a relation between C and B.


(b) if [...], find a relation between C and B.
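
The conditions in steps (a) and (b) are missing from this transcript. The sketch below assumes, for illustration only, that they distinguish the case where A’ is a subclass of some concept C from the case where it is a superclass, and that the relation between C and B is then looked up in a different ontology and composed with the first. The data layout, relation names, composition table, and toy ontologies are all hypothetical.

```python
# Sound compositions of "A rel1 C" and "C rel2 B" into "A rel B".
COMPOSE = {
    ("subclass-of", "subclass-of"): "subclass-of",
    ("subclass-of", "equivalent"): "subclass-of",
    ("subclass-of", "disjoint"): "disjoint",
    ("superclass-of", "superclass-of"): "superclass-of",
    ("superclass-of", "equivalent"): "superclass-of",
}


def direct_relation(x, y, onto):
    """Relation between x and y asserted directly in onto, or None."""
    if (x, y) in onto["subclass_of"]:
        return "subclass-of"
    if (y, x) in onto["subclass_of"]:
        return "superclass-of"
    if frozenset((x, y)) in onto["disjoint"]:
        return "disjoint"
    return None


def strategy2(a, b, ontologies):
    """Combine two ontologies: A rel1 C from one, C rel2 B from another."""
    mappings = []
    for name1, onto1 in ontologies.items():
        for child, parent in sorted(onto1["subclass_of"]):
            if child == a:
                c, rel1 = parent, "subclass-of"
            elif parent == a:
                c, rel1 = child, "superclass-of"
            else:
                continue
            for name2, onto2 in ontologies.items():
                if name2 == name1:
                    continue
                combined = COMPOSE.get((rel1, direct_relation(c, b, onto2)))
                if combined:
                    mappings.append((combined, name1, name2))
    return mappings


# Hypothetical toy ontologies; neither contains both Ham and Food.
ontologies = {
    "O1": {"subclass_of": {("Ham", "Meat")}, "disjoint": set()},
    "O2": {"subclass_of": {("Meat", "Food")}, "disjoint": set()},
}
```

Only compositions that are logically safe are listed in the table; for example, a subclass relation followed by a superclass relation says nothing definite about A and B, so it is left out.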

Strategy 2 - Examples

[Figure: three examples, each combining two ontologies]

Ex1: (midlevel-onto) vs. (Tap)

Ex2: (pizza-to-go) vs. (SUMO) (Same results for Duck, Goose, Turkey)

Ex3: (pizza-to-go) vs. (wine.owl)

Rule labels shown in the figure: (r1), (r1), (r3)

Conclusions


Using the SW as background knowledge for
ontology mapping has several benefits


Suitable for our NG-SW scenario, as there is no need for design-time selection of a background knowledge source


Even when design-time selection is feasible, it is useful in those cases where a suitable domain ontology cannot be found


Reduces noise by exploiting only ontologies


Can be tailored to handle multiple solutions


Can be integrated with other approaches, based on
lexical and structural analysis

If you would like to find out more...


'Vision' papers


Motta, E., Sabou, M. (2006). "Next Generation Semantic Web Applications". 1st Asian Semantic Web Conference, Beijing.


Motta, E., Sabou, M. (2006). "Language Technologies and the Evolution of the Semantic Web". LREC 2006, Genoa, Italy.


Motta, E. (2006). "Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis". IEEE Intelligent Systems, Vol. 21, 3, (88-90).


Ontology Modularization


D'Aquin, M., Sabou, M., Motta, E. (2006). "Modularization: A key for the dynamic selection of relevant knowledge components". ISWC 2006 Workshop on Ontology Modularization.

If you would like to find out more...


Ontology Mapping


Lopez, V., Sabou, M., Motta, E. (2006). "Mapping the real semantic web on the fly". International Semantic Web Conference, Athens, Georgia, USA.


Sabou, M., D'Aquin, M., Motta, E. (2006). "Using the semantic web as background knowledge for ontology mapping". ISWC 2006 Workshop on Ontology Mapping.


Ontology Selection


Sabou, M., Lopez, V., Motta, E. (2006). "Ontology Selection for the Real Semantic Web: How to Cover the Queen’s Birthday Dinner?". Proceedings of EKAW 2006, Podebrady, Czech Republic.