PPT - OpenLink Virtuoso - OpenLink Software

© 2008 OpenLink Software, All rights reserved.

Mapping Relational Databases to RDF with OpenLink Virtuoso

Orri Erling, Lead Developer, Virtuoso Team

Who Wants to Map?

Semantic Web Scalers
  Expose whatever there is as RDF; the next person will unify terms and build search and applications

Data Warehouse Keepers
  Data is spread out, with implicit semantics, complex schemas, heterogeneous sources, and ambiguous terms, but we must make it join and aggregate cleanly

Present State

SPARQL-to-SQL mapping exists, but complex integrations are still built as data warehouses
We'd really like to map, but...
Can it be otherwise?

Why RDF Data Warehouse?

Pros
  Even query performance across all data
  Possibility of forward-chaining inference
  Some SPARQL features may be better supported, e.g. unspecified predicates

Cons
  Keeping data up-to-date
  Complex setup, needs dedicated servers: you don't build them on a whim

Why Map?

No copying, no timeliness issues
RDBMS outperforms RDF for analytics workloads
Agile reconfiguration without reloading data

Virtuoso

Mapping of SPARQL to SQL against any existing schema, whether stored in Virtuoso or elsewhere
Physical quad store
Federated/local RDBMS

For Mapping to Deliver...

Tackle any SQL analytics workload in SPARQL without extra cost
Deal with arbitrary SQL schema
Produce single SQL statements, optimizable by the target RDBMS
Have intelligence for cases where one RDF entity can come from many relational sources

The Cases of Integration

Bring similar but heterogeneous schemas into a unified ontology - Union View
Translate FKs of one schema to PKs in another - Distributed Join
Hide differences in normalization - Views for hiding joins
Unit/Terminology conversions
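
The union-view and unit-conversion cases above can be sketched with plain SQL views. The following is a minimal illustration using Python's built-in sqlite3 module, not Virtuoso's own mapping syntax; the tables eu_product and us_item, their columns, and the view name product_unified are all hypothetical.

```python
import sqlite3

# Hypothetical schemas: two sources store the same kind of data differently.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eu_product (id INTEGER PRIMARY KEY, title TEXT, weight_kg REAL);
    CREATE TABLE us_item    (item_no INTEGER PRIMARY KEY, name TEXT, weight_lb REAL);
    INSERT INTO eu_product VALUES (1, 'anvil', 10.0);
    INSERT INTO us_item    VALUES (1, 'hammer', 2.0);

    -- Union view: one logical shape over both sources.
    -- Keys are prefixed with their source so ids from different tables
    -- never collide, and pounds are converted to kilograms.
    CREATE VIEW product_unified AS
        SELECT 'eu/' || id AS product_key, title AS label, weight_kg AS weight_kg
          FROM eu_product
        UNION ALL
        SELECT 'us/' || item_no, name, weight_lb * 0.45359237
          FROM us_item;
""")

rows = conn.execute(
    "SELECT product_key, label, weight_kg FROM product_unified ORDER BY product_key"
).fetchall()
```

Both source tables use the key value 1, which is why the sketch prefixes the key with its source before the union.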

Defining a Mapping

Use SPARQL/SQL to:
  Define URI formats and their subclass relations
  Define which key-column-value combinations make a triple

Arbitrary SQL is allowed for mapping values and filtering
A single RDF node can be a composite of many columns, e.g. multipart key
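
To make the idea of URI formats and key-column-value triples concrete, here is a small Python sketch. It is an illustration only and does not reproduce Virtuoso's mapping language; the example.com URIs, the predicate names, and the column names (o_id, line_no, qty, price) are assumptions made up for the example.

```python
# Hypothetical URI formats: one per kind of entity, filled from key columns.
URI_FORMATS = {
    "order":    "http://example.com/order/{o_id}",
    "lineitem": "http://example.com/lineitem/{o_id}/{line_no}",  # composite key
}

# Hypothetical mapping: (subject format, predicate, column holding the value).
LINEITEM_MAP = [
    ("lineitem", "http://example.com/schema#quantity", "qty"),
    ("lineitem", "http://example.com/schema#price", "price"),
]

def row_to_triples(row, mapping):
    """Yield one (subject, predicate, object) triple per mapped column."""
    for fmt_name, predicate, column in mapping:
        # The subject URI is built from the key columns named in the format.
        subject = URI_FORMATS[fmt_name].format(**row)
        yield (subject, predicate, row[column])

row = {"o_id": 7, "line_no": 2, "qty": 5, "price": 19.5}
triples = list(row_to_triples(row, LINEITEM_MAP))
```

The lineitem format shows the multipart-key case: a single RDF node is composed from two columns.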

The TPC-H Case

The 22 queries as extended SPARQL
Each generates a single SQL statement, executable by Virtuoso, Oracle, and others
Next: make several TPC-H databases on different servers and run the queries against the union

http://demo.openlinksw.com/tpc-h/

Where Problems Begin

In OpenLink Data Spaces, 6 collaborative apps are all mapped to SIOC:

select * from <ods>
where { ?s ?p ?o . ?s has_comment ?c .
        ?c has_author <xxx> }

Trivially becomes a union of everything, 1000+ lines of SQL

Intelligently becomes a union (once per app) of:

select post.* from post, comment, user
where c_post = p_id and
      c_author = u_id and
      u_name = f ('xxx')
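
The gap between the trivial and the intelligent expansion can be shown with a toy count. This Python sketch is not how Virtuoso generates SQL. It assumes each app contributes one mapping entry per predicate, and that URIs minted by different apps can never be equal. The app names and the predicates title and created are hypothetical; has_comment and has_author come from the query above.

```python
from itertools import product

# Hypothetical mapping entries: one (app, predicate) pair per backing table.
APPS = ["blog", "wiki", "news", "forum", "bookmarks", "feeds"]
PREDICATES = ["has_comment", "has_author", "title", "created"]
QUAD_MAPS = [(app, pred) for app in APPS for pred in PREDICATES]

# The query's three triple patterns; None means the predicate is a variable.
PATTERNS = [None, "has_comment", "has_author"]

def candidates(predicate):
    """Mapping entries that could produce a triple with this predicate."""
    return [m for m in QUAD_MAPS if predicate is None or m[1] == predicate]

# Naive expansion: every combination of candidates becomes one UNION branch.
naive = list(product(*(candidates(p) for p in PATTERNS)))

# Pruned expansion: keep only branches whose entries all come from one app,
# on the assumption that URIs from different apps can never be equal.
pruned = [combo for combo in naive if len({app for app, _ in combo}) == 1]

# Apps that still contribute at least one branch after pruning.
apps_left = sorted({combo[0][0] for combo in pruned})
```

With these made-up numbers the naive expansion has 864 branches and the pruned one has 24, spread over the 6 apps. A real system could collapse the branches of one app further, for example into one statement per app, as the slide's SQL suggests.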

What One Must Know

Mapping for integration is not trivial
Be careful when mapping multiple tables/columns to one class/property
Make URI schemes which encode type and source, so that senseless joins are not attempted when types are not specified in the query
Understand what the mapping logic can and cannot optimize
Understand what SQL can and cannot optimize
View the resulting SQL as a sanity check
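
One way to see why URI schemes should encode type and source: if two URI formats have incompatible fixed prefixes, no key values can make them equal, so a join between them can be dropped without looking at any data. The check below is a simplified sketch under that assumption, not the actual test a mapping engine applies; all URI formats are hypothetical.

```python
def literal_prefix(uri_format):
    """Return the fixed text before the first placeholder of a URI format."""
    return uri_format.split("{", 1)[0]

def may_join(format_a, format_b):
    """Conservatively decide whether two formats could ever yield the same URI.

    If neither fixed prefix is a prefix of the other, no values can make the
    URIs equal, so a join between the two sources can be skipped.
    """
    a, b = literal_prefix(format_a), literal_prefix(format_b)
    return a.startswith(b) or b.startswith(a)

# Hypothetical formats that encode source (blog, wiki) and type (post, user).
BLOG_POST = "http://example.com/blog/post/{id}"
WIKI_POST = "http://example.com/wiki/post/{id}"
BLOG_USER = "http://example.com/blog/user/{id}"
OPAQUE = "http://example.com/{id}"  # encodes neither type nor source
```

Here may_join(BLOG_POST, WIKI_POST) and may_join(BLOG_POST, BLOG_USER) are both false, so those joins can be pruned, while the OPAQUE format could match anything and therefore prunes nothing.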

SQL Extensions

Mapping must work against any RDBMS/Schema, as is
But there is Virtuoso SQL between the mapping and target RDBMS(s)
Location- and latency-conscious distributed cost model
Breakup for making a wide result set into a row per property
Inverse functions
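
The role of inverse functions can be illustrated as follows. When a query fixes a URI, applying the inverse of the URI-building function to the constant turns the condition into a plain comparison on the key column. This sketch uses Python's sqlite3 module rather than Virtuoso SQL; the URI prefix, table, and function names are hypothetical.

```python
import sqlite3

PREFIX = "http://example.com/user/"  # hypothetical URI scheme

def user_uri(u_id):
    """Forward function: key column value -> URI."""
    return f"{PREFIX}{u_id}"

def user_uri_inverse(uri):
    """Inverse function: URI -> key value, or None if the URI is not ours."""
    tail = uri[len(PREFIX):] if uri.startswith(PREFIX) else ""
    return int(tail) if tail.isascii() and tail.isdigit() else None

conn = sqlite3.connect(":memory:")
conn.create_function("user_uri", 1, user_uri)
conn.executescript("""
    CREATE TABLE user (u_id INTEGER PRIMARY KEY, u_name TEXT);
    INSERT INTO user VALUES (41, 'ann'), (42, 'bob');
""")

target = "http://example.com/user/42"

# Without the inverse, the forward function is evaluated for every row.
slow = conn.execute(
    "SELECT u_name FROM user WHERE user_uri(u_id) = ?", (target,)
).fetchall()

# With the inverse, the constant is translated once and the key can be used.
fast = conn.execute(
    "SELECT u_name FROM user WHERE u_id = ?", (user_uri_inverse(target),)
).fetchall()
```

Both queries return the same row. Only the second lets the database compare the key column directly, which matters once the statement is sent to a remote RDBMS.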

Use Cases

OpenLink Data Spaces - Blog, Wiki, News, Social Network, Feed Aggregation, Tag Clouds, Bookmarks, etc.
OpenLink's own MIS - “total information awareness”: URI for any CRM Object, Account, Product, Support Case, Email, etc.
MusicBrainz
phpBB, Drupal, MediaWiki, WordPress, Bugzilla, and others

OpenLink Software

Thank You!

http://virtuoso.openlinksw.com