Foundations of Semantic Web Databases Gutierrez, Hurtado and Mendelzon

Oct 21, 2013


Foundations of Semantic Web Databases

Gutierrez, Hurtado and Mendelzon

Summary by: Nir Zepkowitz


1 Background

Currently, the web is a huge collection of interconnected data. However, the web lacks semantic information, so managing and processing the data is hard.

The Semantic Web is the idea of building an infrastructure of machine-readable semantics for the data on the web.

In the words of Tim Berners-Lee: "If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database." (Tim Berners-Lee, Weaving the Web, 1999)

In 1998 the W3C proposed the language that would be the basis for that infrastructure: the Resource Description Framework (RDF). Query languages for RDF were developed side by side with RDF. Nevertheless, little research has been conducted on the foundations of RDF and its query languages. This research is necessary because of the new features that arise in querying RDF graphs (as opposed to standard DBs), and this is one of the motivations for the article.

2 Paper Goals



• Study formal aspects of querying DBs containing RDF data.
• Introduce a new notion of normal form for RDF graphs.
• Give a formal definition of a query language for RDF.
• Investigate theoretical and complexity aspects related to query processing and redundancy.

3 The RDF Model

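Notations:

• U: RDF URI references.
• B: blank nodes (similar to the variables that we saw in TATA).
• L: RDF literals.
• UBL: shorthand for the union U ∪ B ∪ L (similarly, UL = U ∪ L).
• RDF triple: (v1, v2, v3) in (U ∪ B) × U × (U ∪ B ∪ L), where v1 is the subject, v2 the predicate and v3 the object (the node an arc leaves, the arc itself, and the node it enters).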


Definitions:

• Graph: a set of triples.
• Universe(G): the set of elements of UBL that appear in a triple of G.
• Vocabulary of G: the elements of universe(G) that are URIs or literals, i.e. universe(G) ∩ UL.
• A graph is ground if it has no blank nodes.
• Map: a function μ: UBL -> UBL preserving URIs and literals (μ(u) = u for every u in UL).
• μ(G): the set of triples (μ(s), μ(p), μ(o)) s.t. (s,p,o) is in G.
• μ is consistent with G if μ(G) is an RDF graph. In this case we call μ(G) an instance of G.




• An instance μ(G) is proper if μ(G) has fewer blank nodes than G.
• G1, G2 are isomorphic (G1 ≅ G2) if there are maps μ1, μ2 s.t. μ1(G1) = G2 and μ2(G2) = G1.
• Union of graphs (G1 ∪ G2): the union of their triples.
• Merge of graphs (G1 + G2): G1 ∪ G2’, where G2’ is isomorphic to G2 and its blank nodes are disjoint from those of G1 (the merge assumes no relation between the blank nodes of the two graphs).
• G is lean if there is no map μ s.t. μ(G) is a proper subgraph of G. The intuition is that a lean graph cannot be “minimized”; the lean subgraph is the essence of the graph.
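The definitions above can be made concrete with a small sketch. It is not taken from the paper: it assumes triples are Python tuples of strings, that blank nodes are exactly the strings starting with "_:", and it does not distinguish URIs from literals (so it does not check whether a map is consistent). The names `apply_map` and `merge` and the sample data are illustrative only.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_blank(term: str) -> bool:
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")


def universe(g: Graph) -> Set[str]:
    # All elements of UBL that appear in some triple of g.
    return {term for triple in g for term in triple}


def is_ground(g: Graph) -> bool:
    # A graph is ground if it has no blank nodes.
    return not any(is_blank(term) for term in universe(g))


def apply_map(mu: Dict[str, str], g: Graph) -> Graph:
    # mu is only consulted for blank nodes, so URIs and literals are preserved.
    return frozenset(
        tuple(mu.get(t, t) if is_blank(t) else t for t in triple) for triple in g
    )


def merge(g1: Graph, g2: Graph) -> Graph:
    # Rename every blank node of g2 to a fresh name, then take the union.
    used = universe(g1) | universe(g2)
    renaming: Dict[str, str] = {}
    counter = 0
    for blank in sorted(t for t in universe(g2) if is_blank(t)):
        while f"_:r{counter}" in used:
            counter += 1
        renaming[blank] = f"_:r{counter}"
        used.add(renaming[blank])
    return g1 | apply_map(renaming, g2)


# Hypothetical sample data: both graphs happen to use the blank node _:x.
g1 = frozenset({("_:x", "paints", "guernica")})
g2 = frozenset({("_:x", "type", "painter")})

union_blanks = {t for t in universe(g1 | g2) if is_blank(t)}        # _:x is shared
merge_blanks = {t for t in universe(merge(g1, g2)) if is_blank(t)}  # kept apart
instance = apply_map({"_:x": "picasso"}, g1)                        # a proper instance
```

In the union the two graphs talk about the same unknown resource, while in the merge the blank node of the second graph is renamed first, which matches the remark that the merged graphs are unrelated.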

3.1 Example of RDF graph

[Figure omitted: example RDF graph]



3.2 RDFS

RDFS is an extended version of RDF. It defines classes and properties that may be used for describing groups of resources and relationships between resources. This model supports reification (making statements about statements), typing and inheritance.

For example, RDFS defines the predicate sc (subclass), and this property comes with some rules. For instance, if A is sc of B and B is sc of C, then A is sc of C.

3.3 Core(G)

Theorem: each RDF graph G contains a unique (up to isomorphism) lean subgraph which is an instance of G. We denote this unique subgraph by core(G).
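As an illustration of the theorem, a core can be computed by brute force for tiny graphs: keep applying maps μ for which μ(G) is a proper subgraph of G until none exists. This is only a sketch under assumptions that are not in the paper: blank nodes are strings starting with "_:", URIs and literals are not distinguished, and the search tries every assignment of blank nodes to terms of the graph, so it is exponential.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_blank(term: str) -> bool:
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")


def apply_map(mu: Dict[str, str], g: Graph) -> Graph:
    # mu has only blank nodes as keys, so URIs and literals are preserved.
    return frozenset(tuple(mu.get(t, t) for t in triple) for triple in g)


def shrinking_map(g: Graph) -> Optional[Dict[str, str]]:
    # Look for a map mu such that mu(g) is a proper subgraph of g.
    terms = sorted({t for triple in g for t in triple})
    blanks = [t for t in terms if is_blank(t)]
    for images in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, images))
        if apply_map(mu, g) < g:  # "<" is the proper-subset test on sets
            return mu
    return None


def core(g: Iterable[Triple]) -> Graph:
    current = frozenset(g)
    # A graph is lean exactly when no shrinking map exists.
    while (mu := shrinking_map(current)) is not None:
        current = apply_map(mu, current)
    return current


# Hypothetical data: the blank-node triple is subsumed by the ground triple.
redundant = {("_:x", "paints", "_:y"), ("picasso", "paints", "guernica")}
lean = {("_:x", "paints", "guernica")}
core_of_redundant = core(redundant)
core_of_lean = core(lean)
```

Each step replaces the graph by an instance of itself that is also a subgraph, so the loop ends with a lean subgraph that is an instance of the input.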

4 Semantics of RDF graphs

Theorem*: Let G1, G2 be simple graphs (graphs that do not use predefined semantics such as RDFS classes and predefined properties). G1 entails G2 (G1 ⊨ G2) iff there is a map G2 -> G1, that is, a map μ s.t. μ(G2) is a subgraph of G1.
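Theorem* turns simple entailment into a search problem: find an assignment of the blank nodes of G2 such that every triple of G2 lands on a triple of G1. The following backtracking sketch shows one way to do this. It assumes (this is not from the paper) that triples are tuples of strings and that blank nodes start with "_:"; the function names and the sample graphs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumed convention: blank nodes are written "_:name".
    return term.startswith("_:")


def find_map(g2: Iterable[Triple], g1: Iterable[Triple]) -> Optional[Dict[str, str]]:
    """Return a map mu with mu(g2) contained in g1, or None if there is none."""
    pending: List[Triple] = list(g2)
    targets: List[Triple] = list(g1)

    def extend(i: int, mu: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(pending):
            return mu
        for candidate in targets:
            trial = dict(mu)
            # Blank nodes may be bound (consistently); other terms must be equal.
            if all(
                (trial.setdefault(term, image) == image) if is_blank(term) else (term == image)
                for term, image in zip(pending[i], candidate)
            ):
                found = extend(i + 1, trial)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def simple_entails(g1: Iterable[Triple], g2: Iterable[Triple]) -> bool:
    # G1 entails G2 iff there is a map from G2 to G1 (simple graphs only).
    return find_map(g2, g1) is not None


# Hypothetical sample graphs.
g1 = {("picasso", "paints", "guernica"), ("guernica", "exhibited", "reina_sofia")}
g2 = {("_:x", "paints", "_:y"), ("_:y", "exhibited", "reina_sofia")}
forward = simple_entails(g1, g2)
backward = simple_entails(g2, g1)
```

The search is exponential in the worst case, which is consistent with the complexity results summarized later in the document.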


For example, the graph presented in Section 3.1 entails this graph:

[Figure omitted: a graph entailed by the example graph]





G1 and G2 are equivalent (G1 ≡ G2) if G1 ⊨ G2 and G2 ⊨ G1.

Theorem: if G is simple (RDF model), then core(G) is the unique (up to isomorphism) minimal (w.r.t. number of triples) graph equivalent to G.


4.1 RDFS Model

There is a sound and complete set of rules for ⊨ in graphs with RDFS-vocabulary. For example: (a,sc,b), (b,sc,c) -> (a,sc,c).

In non-simple graphs we cannot use Theorem* because of issues like transitivity. To avoid the problem we “close” the graph with all the triples that are entailed by the existing ones.

A closure of G is a maximal set of triples G’ over universe(G) plus the RDFS-vocabulary s.t. G’ contains G and is equivalent to G.

There is another kind of closure, the RDFS-closure: the closure of G under the set of RDFS rules.

Using this definition we can prove that G1 ⊨ G2 iff there is a map from G2 to the RDFS-closure of G1.
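To illustrate what closing a graph under rules means, here is a minimal sketch that applies only the single rule quoted above, (a,sc,b), (b,sc,c) -> (a,sc,c), until nothing new can be derived. It is not the full RDFS rule set of the paper, and the function name `sc_closure`, the string "sc" as predicate name, and the sample classes are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]


def sc_closure(g: Iterable[Triple]) -> FrozenSet[Triple]:
    # Fixpoint of the transitivity rule for sc; all other RDFS rules are omitted.
    closed = set(g)
    changed = True
    while changed:
        changed = False
        sc_pairs = [(s, o) for s, p, o in closed if p == "sc"]
        for a, b in sc_pairs:
            for b2, c in sc_pairs:
                if b == b2 and (a, "sc", c) not in closed:
                    closed.add((a, "sc", c))
                    changed = True
    return frozenset(closed)


# Hypothetical class hierarchy.
g = {
    ("cubist", "sc", "painter"),
    ("painter", "sc", "artist"),
    ("artist", "sc", "person"),
}
closed = sc_closure(g)
derived = closed - g
```

With the closure in hand, a ground triple such as (cubist, sc, person) is entailed exactly when it occurs in the closed graph; for graphs with blank nodes one would look for a map into the closure, as stated above.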

Notice that from the data representation point of view, “closure” and “RDFS-closure” may contain redundancies, so they are not the best choice to work with.


5 Normal forms

G’s normal form, nf(G), is core(G’), where G’ is the closure of G.

In the figure below, the normal form of the right graph is the left one.

[Figure omitted: an RDF graph (right) and its normal form (left)]

If G, G1 and G2 are RDF graphs:

1. nf(G) is unique.
2. G1 ⊨ G2 iff nf(G2) -> nf(G1).
3. G1 ≡ G2 iff nf(G1) ≅ nf(G2).



• Normal forms are not the most compact representation.
• A reduction of a graph G is a minimal graph Gr equivalent to G and contained in G.
• The authors of the article present an algorithm to get the reduction of a graph. The basic idea is to delete triples that can be deduced by RDFS rules (for example, (a,sc,b), (b,sc,c) -> (a,sc,c)).
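The basic idea of deleting deducible triples can be sketched for the one rule quoted above. This is not the authors' algorithm: it is a greedy illustration that handles only sc transitivity, removing an sc triple whenever it can still be derived from the triples that remain. The helper names and the sample data are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def sc_reachable(g: Set[Triple], start: str, goal: str) -> bool:
    # Is there a chain of sc triples in g leading from start to goal?
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for s, p, o in g:
            if p == "sc" and s == node and o not in seen:
                if o == goal:
                    return True
                seen.add(o)
                stack.append(o)
    return False


def reduce_sc(g: Iterable[Triple]) -> FrozenSet[Triple]:
    reduced = set(g)
    for triple in sorted(reduced):
        s, p, o = triple
        # Drop the triple only if transitivity re-derives it from the rest.
        if p == "sc" and sc_reachable(reduced - {triple}, s, o):
            reduced.discard(triple)
    return frozenset(reduced)


# Hypothetical data: (a, sc, c) follows from the other two sc triples.
g = {("a", "sc", "b"), ("b", "sc", "c"), ("a", "sc", "c"), ("x", "type", "a")}
reduced = reduce_sc(g)
```

Every removed triple is derivable from what is kept, so under this single rule the result is equivalent to the input and contained in it; with the full RDFS rule set and blank nodes the situation is more involved.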



6 Querying RDF Databases

An RDF graph can be viewed as a standard relational database with a single table: each tuple in the table is a triple with the attributes subject, predicate and object.

• Variables (disjoint from UBL) will be denoted ?X, ?Y, ?person.
• The query language will be similar to Datalog:

(?A,creates,?Y) <- (?A,type,Flemish), (?A,paints,?Y), (?Y,exhibited,?Gordon)

We define a tableau as a pair (H,B), where H and B are RDF graphs in which some elements may be replaced by variables.

A query is a tableau (H,B) plus a set of premises P and a set of constraints C, where P is a graph over UBL and C is a subset of the variables occurring in H.

6.1 Constraints

Constraints allow discriminating between blank and ground nodes in an answer (similar to IS NOT NULL in SQL). Adding the constraint {?A} means that the variable ?A must be bound to a non-blank element in each answer to the query.

6.2 Premises

The premise represents information that the user supplies to the database in order to answer the query. It allows hypothetical analysis.

6.3 Answering a query

6.3.1 Valuation and Matching



• Valuation: a function v: V -> UBL, where V is the set of variables. For a set C of variables, the valuation v satisfies the constraint C if v(x) is not blank for all x in C. We denote by v(B) the graph obtained after replacing every occurrence of a variable x in B with v(x).



• Matching of a graph B in DB D: a valuation v s.t. v(B) ⊆ nf(D).


6.3.2 Single answer

Let q = (H,B,P,C) be a query and D a DB. The pre-answer of q over D is:

o preans(q,D) = {v(H) : v is a matching of B in D+P and v satisfies C}.

A graph v(H) in preans(q,D) is called a single answer of query q over D.
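The definition of preans can be traced with a small sketch. Several simplifications here are assumptions and not part of the paper: variables are strings starting with "?", blank nodes start with "_:", the premise P is ground (so the merge D+P is a plain union), D+P is taken to be already in normal form, and every variable in H or C also occurs in B. The sample database is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def matchings(body: Iterable[Triple], data: Iterable[Triple]) -> Iterator[Dict[str, str]]:
    # Enumerate valuations v with v(body) contained in data (backtracking).
    pending: List[Triple] = list(body)
    triples: List[Triple] = list(data)

    def extend(i: int, v: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(pending):
            yield v
            return
        for candidate in triples:
            trial = dict(v)
            if all(
                (trial.setdefault(t, c) == c) if is_var(t) else (t == c)
                for t, c in zip(pending[i], candidate)
            ):
                yield from extend(i + 1, trial)

    yield from extend(0, {})


def substitute(v: Dict[str, str], graph: Iterable[Triple]) -> Graph:
    # v(H): replace every variable occurrence by its value.
    return frozenset(tuple(v.get(t, t) for t in triple) for triple in graph)


def preans(head, body, premise, constraints, db) -> Set[Graph]:
    data = frozenset(db) | frozenset(premise)  # D + P, assuming P is ground
    return {
        substitute(v, head)
        for v in matchings(body, data)
        if all(not is_blank(v[x]) for x in constraints)  # v satisfies C
    }


# The query from the text, over hypothetical data with a blank-node painting.
head = {("?A", "creates", "?Y")}
body = {("?A", "type", "Flemish"), ("?A", "paints", "?Y"), ("?Y", "exhibited", "?Gordon")}
db = {("rubens", "paints", "_:p"), ("_:p", "exhibited", "uffizi")}
premise = {("rubens", "type", "Flemish")}

unconstrained = preans(head, body, premise, set(), db)
constrained = preans(head, body, premise, {"?Y"}, db)
without_premise = preans(head, body, set(), set(), db)
```

The premise supplies the missing typing fact, so the query has one single answer; adding the constraint {?Y} rejects it because ?Y is bound to a blank node.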

6.3.3 Complex queries

We would like complex queries to be composed from simple ones. There are two options for combining the single answers:

• ans∪(q,D): the (set) union of the triples of the single answers. This option is good when we want blank nodes to play the role of bridges between two queries.
• ans+(q,D): the merge of the single answers, i.e. blank nodes are renamed to avoid name clashes before taking the union of the triples. This option is good when querying several unrelated DBs.
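The difference between the two options shows up only when single answers share blank nodes. The sketch below assumes, beyond what the summary states, that single answers are frozensets of string triples, that blank nodes start with "_:", and that appending "#i" to a blank node name is an acceptable way to rename it apart; the function names and data are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def ans_union(single_answers: Iterable[Graph]) -> Graph:
    # Plain set union: a blank node shared by two answers stays shared.
    return frozenset().union(*single_answers)


def ans_merge(single_answers: Iterable[Graph]) -> Graph:
    # Rename blank nodes per answer (suffix "#i") before taking the union.
    merged: Set[Triple] = set()
    for i, answer in enumerate(sorted(single_answers, key=sorted)):
        for triple in answer:
            merged.add(tuple(f"{t}#{i}" if is_blank(t) else t for t in triple))
    return frozenset(merged)


def blanks(g: Graph) -> Set[str]:
    return {t for triple in g for t in triple if is_blank(t)}


# Two hypothetical single answers that both mention the blank node _:p.
answers = {
    frozenset({("rubens", "creates", "_:p")}),
    frozenset({("_:p", "exhibited", "uffizi")}),
}
bridged = ans_union(answers)    # _:p links the two triples
separated = ans_merge(answers)  # two unrelated blank nodes
```

In `bridged` the shared blank node lets one conclude that the thing rubens creates is exhibited at uffizi; in `separated` that link is deliberately cut.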

7 Query complexity

We consider two simplified versions of the query evaluation problem:


o Query complexity version: for a fixed DB D, given a query q, is q(D) non-empty?
o Data complexity version: for a fixed query q, given a DB D, is q(D) non-empty?

Theorem: the evaluation problem is NP-complete for the query complexity version and polynomial for the data complexity version.

We can show that the size of the set of answers of a query q over a DB D is bounded by |D|^|q|, where |D| is the size of the normal form of D and |q| is the number of symbols in the query.