Knowledge Streams: Stream Processing of Semantic Web Content


31 Oct 2013 (3 years and 9 months ago)



Mike Dean

Principal Engineer

Raytheon BBN Technologies

mdean@bbn.com


Assumptions


Technology


Intermediate


Familiarity with RDF and OWL


Interest in


Stream processing


Scalability


Presenter Background



Principal Engineer at Raytheon BBN Technologies (1984-present)


Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)


Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL


Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)


Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups


Co-editor of the W3C OWL Reference


Local co-chair for ISWC2009


Other SemTech presentations


Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)


Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)


Use of SWRL for Ontology Translation (2008)


Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)


How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)


Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)




Outline


Motivation


Vision


Building Blocks


Demonstration



Motivations


Timeliness


Performance


Timeliness


Streaming minimizes latency


Processing elements see events as they occur


Resources are expended only when an event occurs


This is in contrast to polling


Latency averages half the polling interval


Resources are expended on every poll


Popular web syndication mechanisms such as RSS and Atom involve polling
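
The half-interval figure follows if events are assumed to arrive uniformly at random relative to the polling schedule. A minimal simulation sketch under that assumption (the function name and the 60-second interval are illustrative, not from the talk):

```python
import random


def average_polling_latency(poll_interval, n_events=100_000, seed=0):
    """Estimate the mean delay between an event and the next poll that sees it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        # Assumption: the event occurs at a uniformly random offset in a polling cycle.
        offset = rng.uniform(0.0, poll_interval)
        # The event is only noticed when the next poll runs at the end of the cycle.
        total += poll_interval - offset
    return total / n_events


latency = average_polling_latency(poll_interval=60.0)
```

With a 60-second interval the estimate comes out close to 30 seconds, and every poll costs a request whether or not anything changed.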




Performance


Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access


Analogous to XML SAX vs. DOM


For suitable applications, this can be 10x faster than loading all statements into memory or a KB
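
The SAX vs. DOM analogy can be illustrated with Python's standard library. This is only a sketch of the two access styles on a made-up XML document; it does not measure the speedup quoted above, and the element names are assumptions:

```python
import xml.sax
from xml.dom import minidom

# Made-up input: 1000 empty <item> elements.
XML = "<items>" + "".join(f"<item id='{i}'/>" for i in range(1000)) + "</items>"


class ItemCounter(xml.sax.ContentHandler):
    """Streaming handler: sees each element as it is parsed and keeps only a count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_streaming(text):
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_in_memory(text):
    # Builds the complete document tree before anything can be counted.
    doc = minidom.parseString(text)
    return len(doc.getElementsByTagName("item"))
```

Both functions return the same count; the difference is that the streaming version never holds more than the current element, which is the property that streaming RDF parsers share.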



2 Streaming Stories


dumpont of OpenCyc (circa 2003)


HTML-based ontology visualization tool periodically bogged down daml.org server


Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements


Billion Triples Challenge 2009


Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk


Compare to loading 10-20K statements/second on a server
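
A streaming analysis of this kind can be sketched as a single pass over a line-oriented serialization, keeping only running statistics. This is not the code used for the Billion Triples Challenge analysis; the simplified pattern below is an assumption and handles only a subset of N-Triples:

```python
import io
import re
from collections import Counter

# Simplified pattern: subject and predicate as <IRI>, object as the rest of the line.
# Assumption: no blank-node subjects and no comment lines; real data needs a full parser.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def stream_statements(lines):
    """Yield (subject, predicate, object) one statement at a time."""
    for line in lines:
        match = TRIPLE.match(line)
        if match:
            yield match.groups()


def predicate_counts(lines):
    """Count predicate usage without retaining any statements."""
    counts = Counter()
    for _, predicate, _ in stream_statements(lines):
        counts[predicate] += 1
    return counts


# Made-up sample data standing in for a large file opened for reading.
sample = io.StringIO(
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .\n'
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .\n'
)
counts = predicate_counts(sample)
```

Memory use depends on the number of distinct predicates rather than the number of statements, which is what makes this style practical on a laptop.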



Stream Processing Examples


Unix pipes


Dataflow architectures


StreamBase


IBM System S/InfoSphere Streams





Vision: Knowledge Streams

[Diagram: data sources feed a network of distribution and processing elements that deliver results to users]

Data Sources (diagram labels): Sensor Network, Imagery, RSS, IM, Gazetteer, Sensor, Semantic Web, Database

Distribution And Processing Elements (diagram labels): aggregation, persistent queries, augmentation, context, filter, alerts, correlation, translation, inference, distribution, CEP, NLP

Users (diagram labels): User 1, User 2, User 3, Community of Interest 1, Community of Interest 2, Archive


Persistent pipelines


Streams of statements comprising object subgraphs


URI naming allows drill-down


Provenance, timestamps


Processing elements


Consume and produce subgraphs


Multiple functions may be combined
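
One way to picture processing elements that consume and produce subgraphs is as composable generator functions. The sketch below is hypothetical: the `Subgraph` type, the element names, and the `http://example.org/` URIs are assumptions made for illustration, not part of the talk:

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Subgraph:
    uri: str                      # names the object so consumers can drill down later
    statements: FrozenSet[Triple]
    timestamp: float              # when the subgraph was produced
    provenance: Tuple[str, ...]   # elements that have handled it so far


Element = Callable[[Iterable[Subgraph]], Iterator[Subgraph]]


def filter_by_predicate(predicate: str) -> Element:
    """Pass on only subgraphs that contain a statement with the given predicate."""
    def element(stream):
        for g in stream:
            if any(p == predicate for _, p, _ in g.statements):
                yield replace(g, provenance=g.provenance + ("filter",))
    return element


def augment(extra: Callable[[Subgraph], Iterable[Triple]]) -> Element:
    """Add statements computed from each subgraph."""
    def element(stream):
        for g in stream:
            yield replace(
                g,
                statements=g.statements | frozenset(extra(g)),
                provenance=g.provenance + ("augment",),
            )
    return element


def combine(*elements: Element) -> Element:
    """Chain several elements into one."""
    def element(stream):
        for e in elements:
            stream = e(stream)
        return stream
    return element


EX = "http://example.org/"
source = [
    Subgraph(EX + "img1", frozenset({(EX + "img1", EX + "type", EX + "Image")}), 1.0, ("sensor",)),
    Subgraph(EX + "msg1", frozenset({(EX + "msg1", EX + "text", "hello")}), 2.0, ("im",)),
]
pipeline = combine(
    filter_by_predicate(EX + "type"),
    augment(lambda g: [(g.uri, EX + "seenBy", EX + "pipeline")]),
)
results = list(pipeline(source))
```

Because each element has the same shape, a combined element can be used anywhere a single one can, and the provenance tuple records which elements touched each subgraph.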


Goals


Web-scale


Decentralized among multiple sites


Heterogeneous implementations


Long-lived, persistent connections


User accountability


Introspection over the processing network for control and optimization


E.g. aggregating subscriptions


Balance with security, privacy, and autonomy concerns



Building Blocks


RDF Content


Existing stream processing frameworks


Workflow systems


Publish/subscribe message oriented middleware



RDF Payloads


Malleable data


Standards-based graph structure


Can easily add, remove, and transform statements


Self-describing


Unique naming via URIs


References to vocabularies and ontologies


Potential for inference
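
The malleability point can be shown with plain Python sets of (subject, predicate, object) tuples standing in for an RDF graph. This is a sketch only: a real system would use an RDF library, and the `http://example.org/` data is made up:

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/"

graph = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "mbox", "mailto:alice@example.org"),
}

# Add: attach a new statement to the payload.
graph.add((EX + "alice", FOAF + "knows", EX + "bob"))

# Remove: drop the mailbox before passing the payload downstream.
graph = {t for t in graph if t[1] != FOAF + "mbox"}


def rename_predicate(triples, old, new):
    """Transform: rewrite one predicate into another vocabulary."""
    return {(s, new if p == old else p, o) for s, p, o in triples}


graph = rename_predicate(graph, FOAF + "name", RDFS + "label")
```

None of these steps requires a schema change, because each statement carries its own fully qualified names.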



Workflow Systems


Graphical environments for developing processing pipelines


Yahoo Pipes, DERI Pipes, SPARQLMotion


Nice user interfaces for development and execution


http://pipes.deri.org

Semantic Complex Event Processing


Complex Event Processing


One of the leading edges of rules technology


Formal specification of higher-level events in terms of lower-level events


E.g. alert if the moving average increases 15% within a 10 minute window


Engine can be compiled/optimized for a specific rule set


High-volume deployments in finance and other industries


Most implementations focus on self-contained tuples


Semantic Complex Event Processing


Enrich CEP using Semantic Web technology


Emerging topic at recent conferences


Early implementations


Wrappers around open source CEP engines


Native implementation


Provides a powerful set of operators and engines for Knowledge Streams
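
The moving-average rule given as an example above can be sketched in a few lines. The slide does not say how the average is computed, so this version makes assumptions: the average covers the last three readings, values are positive, and the rule fires when the current average is at least 15% above the lowest average seen in the previous 10 minutes. The class and parameter names are hypothetical, and a real CEP engine would express this declaratively:

```python
from collections import deque


class MovingAverageRise:
    def __init__(self, avg_samples=3, window_seconds=600.0, threshold=0.15):
        self.values = deque(maxlen=avg_samples)  # raw readings for the moving average
        self.history = deque()                   # (timestamp, moving average) pairs
        self.window_seconds = window_seconds
        self.threshold = threshold

    def on_event(self, timestamp, value):
        """Feed one reading; return True if the rule fires."""
        self.values.append(value)
        average = sum(self.values) / len(self.values)
        # Expire averages that have fallen out of the time window.
        while self.history and timestamp - self.history[0][0] > self.window_seconds:
            self.history.popleft()
        if self.history:
            baseline = min(avg for _, avg in self.history)
            fired = baseline > 0 and average >= baseline * (1.0 + self.threshold)
        else:
            fired = False  # nothing in the window to compare against
        self.history.append((timestamp, average))
        return fired


detector = MovingAverageRise()
readings = [(0, 100), (60, 100), (120, 100), (180, 130), (240, 150)]
fired = [detector.on_event(t, v) for t, v in readings]
```

In this made-up series the rule fires only on the last reading, when the three-sample average first reaches 15% above the 100 baseline.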



Implementation Approach


Well-defined APIs for implementing operators


Operator execution containers


Could encapsulate existing engines


Start with manual processing network configuration, then automate
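
As a rough illustration of this approach, an operator API and an execution container with a manually configured network might look like the following. Every name here (`Operator`, `Container`, `DropEmpty`, `Tag`) is hypothetical, and the routing assumes the network has no cycles:

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


class Operator(ABC):
    """Hypothetical operator API: one subgraph in, zero or more subgraphs out."""

    @abstractmethod
    def process(self, graph: Graph) -> Iterable[Graph]:
        ...


class DropEmpty(Operator):
    def process(self, graph):
        if graph:
            yield graph


class Tag(Operator):
    """Adds one statement per subject in the incoming subgraph."""

    def __init__(self, predicate, value):
        self.predicate, self.value = predicate, value

    def process(self, graph):
        subjects = {s for s, _, _ in graph}
        yield graph | {(s, self.predicate, self.value) for s in subjects}


class Container:
    """Hosts operators and routes their output along manually configured links."""

    def __init__(self):
        self.operators: Dict[str, Operator] = {}
        self.links: Dict[str, List[str]] = {}
        self.delivered: List[Tuple[str, Graph]] = []

    def add(self, name, operator):
        self.operators[name] = operator
        self.links.setdefault(name, [])

    def connect(self, upstream, downstream):
        self.links[upstream].append(downstream)

    def submit(self, name, graph):
        for out in self.operators[name].process(graph):
            targets = self.links[name]
            if not targets:
                self.delivered.append((name, out))  # no downstream link: final output
            for target in targets:
                self.submit(target, out)


EX = "http://example.org/"
net = Container()
net.add("clean", DropEmpty())
net.add("tag", Tag(EX + "source", "demo"))
net.connect("clean", "tag")
net.submit("clean", {(EX + "a", EX + "p", "1")})
net.submit("clean", set())
```

An existing engine could be wrapped by implementing `process` as a call into that engine, and the explicit `connect` calls are the piece that later automation would replace.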


Use Cases


Dissemination of metadata for new satellite imagery


Social network changes


Alerting of friends’ new publications





Demo


Processing using DERI Pipes with new operators


Ingest of #SemTechBiz tweets using Twitter Streaming API


Conversion of JSON to RDF


Mapping to SIOC vocabulary using SWRL rules


Enrich by matching Twitter @handles with contacts


Persistent buffering using Java Message Service


Monitoring
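
The JSON-to-RDF, SIOC mapping, and enrichment steps can be approximated in plain Python. The demo itself used DERI Pipes operators and SWRL rules; the direct mapping below is a stand-in for those, not a reproduction. The tweet content, the contact table, the choice of `dcterms:created` and `foaf:account`, and the URI patterns are all assumptions:

```python
import json

SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def tweet_to_triples(raw):
    """Map one tweet JSON object onto SIOC statements (stand-in for the SWRL mapping)."""
    tweet = json.loads(raw)
    handle = tweet["user"]["screen_name"]
    account = f"https://twitter.com/{handle}"
    post = f"{account}/status/{tweet['id_str']}"
    return {
        (post, RDF_TYPE, SIOC + "Post"),
        (post, SIOC + "content", tweet["text"]),
        (post, DCTERMS + "created", tweet["created_at"]),
        (post, SIOC + "has_creator", account),
        (account, RDF_TYPE, SIOC + "UserAccount"),
    }


def enrich_with_contacts(triples, contacts):
    """Link accounts to known people. `contacts` maps lowercase handle -> person URI."""
    extra = set()
    for _, predicate, obj in triples:
        if predicate == SIOC + "has_creator":
            handle = obj.rsplit("/", 1)[-1].lower()
            if handle in contacts:
                extra.add((contacts[handle], FOAF + "account", obj))
    return triples | extra


# Made-up tweet and contact list for illustration.
raw = json.dumps({
    "id_str": "1",
    "text": "Knowledge Streams talk at #SemTechBiz",
    "created_at": "Wed Jan 01 00:00:00 +0000 2014",
    "user": {"screen_name": "example_user"},
})
contacts = {"example_user": "http://example.org/people/example"}
triples = enrich_with_contacts(tweet_to_triples(raw), contacts)
```

Keeping the conversion and the enrichment as separate functions mirrors the pipeline structure, where each step could run as its own operator.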
