Recent European Developments in the Semantic Web

Dr. Mark Greaves

Vulcan Inc.

markg@vulcan.com

(206) 342-2276

OIC-2007

Ontology for the Intelligence Community

2

Roots of this Talk: Agile Computing meets KR&R


DARPA PM: 5/2001 to 5/2004


A vision of agile computing

a robust distributed infrastructure for dynamic reliable computing

Oriented towards responsiveness rather than prespecified optimality

Provides “-ility” and QoS arguments

Supports adaptive, survivable workflows

Leverages local rules over global ones

Is a step beyond interoperability

Agile computing requires goals, plans, and other semantic/intentional notions



Currently with Vulcan (Seattle, WA)


Vulcan (www.vulcan.com) is the corporate vehicle through which Paul Allen manages his assets

Areas include music, movies, sports teams, aerospace, philanthropy, personal tech, energy and greentech, cable TV, venture capital, genetic research, AI…

I am responsible for the AI/KR&R research portfolio, including Project Halo and KR technology for Vulcan Ventures

Work with McDonald-Bradley on IC-related matters


[Diagram: programs and seedlings and their core technologies, arranged along three dimensions: Data, Process, Representation]

Programs and Seedlings: DARPA Agent Markup Language (DAML); UltraLog; Semantic Enabling and Exploitation (SEE); Network-Centric Logistics (NCL); Fog Light; OMNI

Core Technologies: Description Logics; KR&R Languages; Rule Systems; Web Languages; Ontologies, Taxonomies; Software Agents; Semantic Web Services; Peer-to-Peer Algorithms; Distributed Planning; Complex Adaptive Systems and Control; Situation Theory; Task Planning Systems; Problem Isomorphism

3

At the End of the 90s: Traditional KR and the Google Property


We seek KR systems that have the “Google Property”: they get (much) better as they get bigger

Google PageRank™ yields better relevance judgments as it indexes more pages


Current KR&R systems have the antithesis of this property



So what are the components of a scalable KR&R system?


Distributed, robust, reliable infrastructure


Multiple linked ontologies and points of view


Single ontologies are feasible only at the program/agency level


Mixture of deep and shallow knowledge repositories


Simulations and procedural knowledge components


“Knowing how” and “knowing that”


Embrace uncertainty, defaults, context, and nonmonotonicity in all components

Uncertainty in the KB: you don’t know what you know, things go away, contradiction is rampant, computing must be resource-aware, surveying the KB is not possible


[Chart with axes “KR&R System Scale (Number of Assertions, Number of Ontologies/POVs, Number of Rules, Linkages to other KBs, Reasoning Engine Types …)” and “Quality of Answers”; curves labeled Ideal KR&R, KR&R now, and KR&R Goals]

Scalable KR&R Systems should look just like the Web!! (coupled with great question-answering technology)

4

The Beginnings of the SemWeb: DARPA’s DAML Program



Solution:

Augment the web to link machine-readable knowledge to web pages

Extend RDF with Description Logic

Use a frame-based language design

Create the first fully distributed web-scale knowledge base out of networks of hyperlinked facts and data

Approach:

Design a family of new web languages

Basic knowledge representation (OWL)

Reasoning (SWRL, OWL/P, OWL/T)

Process representation (OWL/S)

Build definition and markup tools

Link new knowledge to existing web page elements

Test design approach in the IC and others

Standardize the new web languages

People use implicit knowledge to reason with web pages

Computers require explicit knowledge to reason with web pages

Links via URLs

Problem:

Computers cannot process most of the information stored on web pages

5


Technical Problem


Representation of ontological (type-class-relation) metadata coupled to web data

Agent-based data integration and tractable reasoning across multiple www servers



Early semantic web pilots with various members of the IC


SWIG coordination and data sharing group within the IC

DAML Operational Problem

[Diagram: the World Wide Web of XML and HTML pages, overlaid with OWL ontologies (Facility Ontology, Threat Ontology, Geo-Spatial Ontology, Sensor Ontology), SWRL Inference, and OWL/Trust]

Existing Web: Hand-coded HTML and XML pages

No Automatic Semantic Integration of (Intelligence) Data Sources on the Web

Semantic Web: Knowledge Integration Layer

6

DAML Program Elements


Web Ontology Language (OWL) (2/10/04)


Enables knowledge representation and tractable inference in a web standard format

Based on Description Logics and RDF

OWL Reasoning Languages

SWRL and SWRL-FOL: Supports business rules, policies, and linking between distinct OWL ontologies

OWL/P Proof Language: Allows software components to exchange chains of reasoning

OWL/T Trust Language: Represents trust that OWL and SWRL inferences are valid

Semantic Web Services (OWL/S)

Allows discovery, matching, and execution of web services based on action descriptions

Unifies semantic data models (OWL) with process models (Agent) and shows how to dynamically compose web services

OWL Tools

www.semwebcentral.org and www.daml.org


[Diagram: DAML Program Technical Flow, showing Web Ontology Language (OWL); SWRL: Rules; OWL/P: Proof; OWL/T: Trust; OWL/S: Semantic Web Services. Legend: Completed standards process; Started standards process; Unfinished]

Each DAML Program Element includes specifications, software tools, coordination teams, and use cases

7

Impact

[Screenshot: Google search results for “darpa” on 10/21/04, with results #2 and #3 marked]

8

The Semantic Web in 2007

[Figure: “The Famous Semantic Web Technology Stack”, with layers marked Mature, Cutting Edge, and Still Research]

9

The Semantic Web in 2007

[Figure: “The Famous Semantic Web Technology Stack”, with layers marked Commercial, Mature, Cutting Edge, and Active Research and Standards Activity]

12

Completing the Semantic Web Picture

Other Technologies Impact the Semantic Web

More Ontologies

Tag Systems

Microformats

Social Authorship

Combined RDF/OWL and RDBMS Systems

Better Reasoning Systems

A Huge Base of RDF data

[Figure legend: Mature; Commercial; Cutting Edge; Active Research and Standards Activity]

13

Where is the Current US Semantic Web Action?


Some Venture Capital


Vulcan, Crosslink, In-Q-Tel



A modest amount of Federal funding



Interesting corporate developments


Startup: Radar, Metaweb, Evri...


Mature: Yahoo!, Oracle, Lilly...



Focus is mostly on the Database dimension of Semweb

RDBMS scale and orientation, powerful analytics (= powerful logics and inference engines)


Centralized workflows for ontology definition and management


Use cases surrounding data integration


Emerging microformats and structured blogging (e.g., Twine)


... But mainly enterprise concerns

14

Where is the Current European Semantic Web Action?


Follow the money


Currently >50M/year public funding from the European Commission (Mark’s estimate)

Framework 6 (2002-6)

17 separate semantics IT programs

Framework 7 (2007-13)

1B/year for information and communications technologies

Two Dedicated Multi-site R&D Institutes

Semantic Technology Institute International

DERI: 100+ people, major sites in Galway, Innsbruck, Korea

Focus is the Social and Web Dimensions of Semweb

Web-scale, social networks, simple scalable imperfect inference

Ontology and data dynamism, imperfections, versioning

Semantically-boosted collaboration with limited knowledge engineer involvement

A base of socially-curated semantic data


Explicit European vs. US competitiveness theme

15

Talk Outline: European Work Beyond RDF and OWL


Web-Scale Semantics

Semantic MediaWiki

DBpedia and Linking Open Data

Networked Ontologies (NeOn)

Web-Scale Inference

Shallow reasoning and the Large Knowledge Collider

Social and Web Dimensions of Semantic Web

16

Semantic Wikis


The Main Idea


Wikis are tools for Publication and Consensus



MediaWiki (software for Wikipedia, Wikimedia, Wikinews, Wikibooks, etc.)


Most successful Wiki software


High performance: 10K pages/sec served, scalability demonstrated


LAMP web server architecture, GPL license


Publication: simple distributed authoring model


Wikipedia: >2M articles, >180M edits, 750K media files, #8 most popular web site in October


Consensus achieved by global editing and rollback


Fixpoint hypothesis (2:1 discussion/content ratio), consensus is not static


Gardener/admin role for contentious cases



Semantic Wikis apply the wiki idea to basic (typically RDFS) structured information


Authoring includes instances, data types, vocabularies, classes


Natural language text for explanations


Automatic list generation from structured data, basic analytics


Searching replaces category proliferation


Reuse of wiki knowledge

Semantic Wiki Hypotheses:

(1) Significant interesting non-RDBMS Semantic Data can be collected cheaply

(2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes

17

Semantic MediaWiki


Knowledge Authoring Capabilities (SMW 1.0 plus Halo Extension)


Syntax highlighting when editing a page


Semantic toolbar in edit mode


Displays annotations present on the page that is edited


Allows changing annotation values without locating the annotation in the wiki text


Autocompletion for all instances, properties, categories and templates


Increased expressivity through n-ary relations (available with the SMW 1.0 release)

18

Semantic MediaWiki


Semantic Navigation Capabilities (SMW 1.0 plus Halo Extension)


GUI-based ontology browser, enables browsing of the wiki's taxonomy and lookup of instance and property information

Linklist in edit mode, enables quick access to pages that are within the context of the page currently being edited

Search input field with autocompletion, to prevent typing errors and give a fast overview of relevant content

19

Semantic MediaWiki


Knowledge Retrieval Capabilities (SMW 1.0 plus Halo Extension)


Combined text-based and semantic search

Basic reasoning in ask queries with sub-/super-category/-property reasoning and resolution of redirects (equality reasoning)

GUI-based query formulation interface for intuitive assembly and output generation of ASK queries (no SQL/MQL/SPARQL)

Fully open source under GPL

Extensive formal user testing

Download at: http://www.ontoworld.org/wiki/Halo_Extension
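
The “basic reasoning” in ask queries mentioned above (subcategory reasoning plus redirect resolution) can be illustrated with a minimal Python sketch. This is not Semantic MediaWiki code: the data, the dictionaries, and the function names `resolve_redirect`, `descendants`, and `ask_category` are all hypothetical, chosen only to show the idea.

```python
from collections import deque

# Hypothetical wiki data: category -> direct subcategories.
SUBCATEGORIES = {
    "Chemical substance": ["Acid", "Base"],
    "Acid": ["Carboxylic acid"],
}

# Hypothetical data: page -> categories it is directly tagged with.
PAGE_CATEGORIES = {
    "Acetic acid": ["Carboxylic acid"],
    "Sodium hydroxide": ["Base"],
    "Water": ["Solvent"],
}

# Hypothetical redirects: alias page -> target page.
REDIRECTS = {"Ethanoic acid": "Acetic acid"}


def resolve_redirect(page):
    """Follow redirects to the canonical page (a simple form of equality reasoning)."""
    seen = set()
    while page in REDIRECTS and page not in seen:
        seen.add(page)  # guard against redirect cycles
        page = REDIRECTS[page]
    return page


def descendants(category):
    """Return the category together with all of its transitive subcategories."""
    found = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for child in SUBCATEGORIES.get(current, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def ask_category(category):
    """Return the sorted pages tagged with the category or any subcategory."""
    wanted = descendants(category)
    return sorted(
        page for page, cats in PAGE_CATEGORIES.items()
        if wanted.intersection(cats)
    )
```

With this toy data, asking for “Chemical substance” also returns pages that are only tagged with a subcategory such as “Carboxylic acid”, and a lookup of “Ethanoic acid” resolves to the “Acetic acid” page.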


20

Cool Stuff... But Does it Work?


User tests were performed in Chemistry


20 graduate students were each paid for 20 hours (over 1 month) to collaborate on semantic annotation for chemistry

~700 Wikipedia base articles

US high-school AP exams were provided as content guidance



Results


Sparse: 1164 pages (entities), average 5 assertions per entity

226 relations (1123 relation-statements) and 281 attributes (4721 attribute-statements)


Many bizarre attributes and relations


Very difficult to use with a reasoner



Ongoing Vulcan-sponsored work on Semantic MediaWiki

Higher-quality authoring: Phase II Halo Wiki extensions done by February

Higher-quality editing: support for Semantic Gardeners (RKF lesson learned)

Very little US-based awareness of these issues, let alone their solutions

Semantic Wiki Hypotheses:

(1) Significant interesting non-RDBMS Semantic Data can be collected cheaply

(2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes

21

DBpedia: Populating the Semantic Web


Mine Wikipedia for assertions


Scrape Wikipedia Factboxes


~15M triples


High-confidence shallow English parsing

DBpedia dataset

~2M things, ~100M triples

Classifications via Wikipedia categories and WordNet synsets

One of the largest broad knowledge bases in the world



Simple queries over extracted data


Public SPARQL endpoint


“Sitcoms set in NYC”


“Soccer players from team with stadium with >40000 seats, who were born in a country with more than 10M inhabitants”

We created a Semantic MediaWiki instance augmented by DBpedia data
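
To make the “simple queries over extracted data” concrete, here is a small Python sketch of conjunctive triple-pattern matching, the core idea behind a query like “Sitcoms set in NYC”. It does not call the DBpedia SPARQL endpoint; the triples, the predicate names `type` and `setIn`, and the helper functions are invented for illustration.

```python
# Toy triples; resource and predicate names are made up for this example
# and are not actual DBpedia identifiers.
TRIPLES = [
    ("Seinfeld", "type", "Sitcom"),
    ("Seinfeld", "setIn", "New_York_City"),
    ("Friends", "type", "Sitcom"),
    ("Friends", "setIn", "New_York_City"),
    ("Cheers", "type", "Sitcom"),
    ("Cheers", "setIn", "Boston"),
    ("Law_and_Order", "type", "Drama"),
    ("Law_and_Order", "setIn", "New_York_City"),
]


def match(pattern, bindings):
    """Yield extended variable bindings for each triple matching one pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in TRIPLES:
        new = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must agree with any earlier binding.
                if new.get(term, value) != value:
                    ok = False
                    break
                new[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def query(patterns):
    """Evaluate a conjunction of triple patterns, joining on shared variables."""
    results = [{}]
    for pattern in patterns:
        results = [b for bindings in results for b in match(pattern, bindings)]
    return results


sitcoms_in_nyc = sorted(
    b["?show"]
    for b in query([
        ("?show", "type", "Sitcom"),
        ("?show", "setIn", "New_York_City"),
    ])
)
```

The second example query on the slide has the same shape, only with more patterns joined through shared variables (player, team, stadium, country) plus numeric filters.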

22

Linking Open Data


W3C Project primarily carried out in Europe

Goals

Create a single, simple access mechanism for web RDF data

Build a data commons by making open data sources available on the Web as RDF

Set RDF links between data items from different data sources

Total dataset

~2B triples, and ~3B RDF links


Growing all the time
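
One way to picture what “RDF links between data items from different data sources” buy you: once two identifiers are linked as describing the same thing, their descriptions can be merged. The Python sketch below assumes equality-style links (in the spirit of `owl:sameAs`) and uses a small union-find; the identifiers and the function name `merge_linked` are hypothetical, and real linked-data tooling is considerably more involved.

```python
def merge_linked(descriptions, same_as_links):
    """Merge property dicts of identifiers connected by equality-style links.

    descriptions: identifier -> dict of properties
    same_as_links: iterable of (identifier, identifier) pairs
    Returns: canonical identifier -> merged dict of properties.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root as the canonical one.
            parent[max(ra, rb)] = min(ra, rb)

    for a, b in same_as_links:
        union(a, b)

    merged = {}
    for ident, props in descriptions.items():
        merged.setdefault(find(ident), {}).update(props)
    return merged


# Hypothetical identifiers from two made-up sources "a:" and "b:".
descriptions = {
    "a:Berlin": {"label": "Berlin"},
    "b:city/42": {"country": "Germany"},
    "a:Paris": {"label": "Paris"},
}
links = [("a:Berlin", "b:city/42")]
merged = merge_linked(descriptions, links)
```

Here `merged` holds one combined description for the linked pair and leaves the unlinked item untouched.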

23

Networked Ontology Project (NeOn)


Ever try to use 3-4 networked ontologies?


Location and characterization of ontology resources


Version control under multiple revisions


SOA and mapping management


Lifecycle issues



NeOn is an EC Framework 6 Program (2006-2009)

~15M, 14 partners including UN FAO, pharmaceutical distribution

Goals:

To create the first ever service-oriented, open infrastructure, and associated methodology

To support the overall development life-cycle of a new generation of large scale, complex, semantic applications

To handle multiple networked ontologies in a particular context, which are highly dynamic and constantly evolving.

Outputs: The open source (GPL) NeOn toolkit: http://www.neon-toolkit.org/

25

The Larger KR Environment: The Evolving Web


[Diagram: evolution of the Web toward “The Intelligent Web”. Labels: Smart Browsing; Blogosphere; Search Evolution; Semantic Interconnect; Analytics/Software agents/Reasoning; Inference/Intelligent discovery]

Capabilities shown: Blog proliferation; Meme propagation; Intelligent filtering; Trust networks; Social networks; Intelligent data aggregation; Personalized search; Email/desktop search; Enterprise intelligence; Context extraction; Data integration

Examples shown: Radar Networks; LinkedIn; RSS feeds; Furl; Pluck; Onfolio; UCMore; A9 (search history); A9 (discovery); Datalator; PAL/CALO; Halo; Wikipedia; Ontoprise; Schemalogic; Blinkx; Google sets

Web-Scale Reasoning: Scalable, Tolerant, and Dynamic

26

Semantics at Web Scale


Semantics are always changing


Per minute, there are:


100 edits in Wikipedia


200 tags in del.icio.us


270 image uploads to Flickr


1100 blog entries


Will the Semantic Web be less dynamic?



There is no “right ontology”


Ontologies are abstractions


Different applications lead to different ontologies


Ontology authors make design choices all the time


Google Base: >100K schemas



Intentionally false material (Spam)


Lesson of the HTML <META> tag.

Material from Denny Vrandečić, AIFB

27

Consequences


Semantic Technologies at Web Scale?


Sindice (www.sindice.com) is now reporting over 3B triples

20% of 30 billion pages @ 1000 triples per page = 6 trillion triples

30 billion and 1000 are underestimates; imagine 6 years from now…



Classical reasoning approaches to Semantic Web will not scale


Examples of current attempts at scalability

Identify subsets of OWL (OWL-DL, OWL-DLP)

“Reducing the expressive power of a logic does not solve any problems faster; its only effect is to make some problems impossible to state.” (John Sowa)

Identify alternative semantics for OWL

e.g. LP-style semantics

Scalability by muscle-power


What do we know about distributed, heuristic, approximate, probabilistic inference?

What do we know about average complexity over web-authored data?


Classical (worst case) complexity is a poor guide for usefulness

Gartner (May 2007, G00148725): “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”

Material from Frank van Harmelen, Vrije Universiteit Amsterdam
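
The back-of-envelope estimate on this slide (20% of 30 billion pages at 1000 triples per page) can be checked in a few lines of Python. The variable names are ours; the inputs are the slide's own figures, and the comparison against Sindice's reported 3B triples is just a ratio of the two numbers quoted above.

```python
# Inputs taken from the slide; integer arithmetic avoids float rounding.
total_pages = 30_000_000_000        # 30 billion web pages
percent_with_ontologies = 20        # Gartner's 20% figure
triples_per_page = 1000             # assumed triples per marked-up page

semantic_pages = total_pages * percent_with_ontologies // 100
estimated_triples = semantic_pages * triples_per_page

# Sindice's reported index size, as quoted on the slide.
sindice_triples = 3_000_000_000
growth_factor = estimated_triples // sindice_triples
```

Under these assumptions the estimate comes to 6 trillion triples, about 2000 times the size of the index quoted for Sindice.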

28

More Consequences


Sloppy Ontologies and the need for sloppy reasoning


OWL has no support for “almost”, “yes, except for a few”, etc.


This was OK, as long as ontologies were well-designed, carefully populated, well maintained over definite problem spaces

Increasingly, ontologies are

Made by non-experts

Made by automatic scraping from file directories, mail folders, todo lists and contact lists

Made by machine learning from examples

Example: “post-doc” ≈ “young-researcher”

Need for any time, any cost answers

Current inference systems are abrupt and expensive

Want to select quality and timeliness

Completeness will be unachievable in practice


Data sources will be partial


Insufficient time to wait for an answer


(Courtesy Dieter Fensel)

29

The Large Knowledge Collider (LarKC)


EC Framework 7 Program



Goals of LarKC


Scaling to infinity


Give up soundness & completeness


Combine reasoning/retrieval and search


Heavy emphasis on probability, decision theory, anytime algorithms



Reasoning pipeline


Plugin architecture, with sampling


Explicit cost models


Public releases of LarKC platform

Public APIs enabling others to develop plug-ins

Encourage participation through Thinking@home

Kind of like SETI@Home



Start in April 2008
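
The combination of anytime algorithms and explicit cost models listed above can be sketched in Python as budgeted forward chaining: each rule application has a cost, and the reasoner returns whatever it has derived when the budget runs out. This is only an illustration of the general idea, not the LarKC platform or its plug-in API; the rule format, the costs, and the name `anytime_closure` are assumptions. Note that this sketch gives up completeness only; its partial answers are still sound.

```python
def anytime_closure(facts, rules, budget):
    """Forward-chain over rules until a fixpoint or until the budget is spent.

    rules: list of (premises, conclusion, cost) tuples.
    Returns (known_facts, complete), where complete is True only when a
    fixpoint was reached. The loop stops at the first applicable rule it
    cannot afford, so a partial answer may miss derivable facts.
    """
    known = set(facts)
    remaining = budget
    changed = True
    while changed:
        changed = False
        for premises, conclusion, cost in rules:
            if conclusion in known or not set(premises) <= known:
                continue  # nothing new to derive from this rule
            if cost > remaining:
                return known, False  # out of budget: return a partial answer
            remaining -= cost
            known.add(conclusion)
            changed = True
    return known, True


# Hypothetical rules with made-up costs.
RULES = [
    (("postdoc(alice)",), "young_researcher(alice)", 1),
    (("young_researcher(alice)",), "researcher(alice)", 2),
    (("researcher(alice)",), "person(alice)", 5),
]

partial, partial_done = anytime_closure({"postdoc(alice)"}, RULES, budget=3)
full, full_done = anytime_closure({"postdoc(alice)"}, RULES, budget=10)
```

With a budget of 3 the caller gets the cheap conclusions and a flag saying the answer is incomplete; with a budget of 10 the full closure is returned. Letting the caller pick the budget is one way to “select quality and timeliness”.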

30

Thank You