Foundations of Semantic Web Databases
Gutierrez, Hurtado and Mendelzon
Summary by: Nir Zepkowitz
1 Background
Currently, the web is a huge collection of interconnected data. However, the web lacks semantic information, so managing and processing the data is hard. The idea of the Semantic Web is to build an infrastructure of machine-readable semantics for the data on the web.
In the words of others: "If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database." Tim Berners-Lee, Weaving the Web, 1999.
In 1998 the W3C offered the language that will be the basis for that infrastructure: the Resource Description Framework (RDF). Query languages for RDF were developed side by side with RDF. Nevertheless, little research about the foundations of RDF and its query languages has been conducted. This research is necessary because of the new features that arise in querying RDF graphs (as opposed to standard DBs), and this is one of the reasons for this article.
2 Paper Goals
Study formal aspects of querying DBs containing RDF data.
Introduce a new notion of normal form for RDF graphs.
Give a formal definition of a query language for RDF.
Investigate theoretical and complexity aspects related to query processing and redundancy.
3 The RDF Model
Notations:
U: RDF URI references.
B: blank nodes (similar to the variables that we saw in TATA).
L: RDF literals.
RDF triple: (v1, v2, v3) in (U ∪ B) × U × (U ∪ B ∪ L), where v1 is the subject, v2 the predicate and v3 the object (the head of an arc, the arc and the tail).
Definitions:
A graph is a set of RDF triples.
Universe(G): the set of elements of UBL (U ∪ B ∪ L) that appear in a triple of G.
Vocabulary of G: universe(G) ∩ (U ∪ L), i.e. the URIs and literals that appear in G.
A graph is ground if it has no blank nodes.
Map: a function μ: UBL -> UBL preserving URIs and literals (μ(u) = u for every u in U ∪ L).
μ(G): the set of triples (μ(s), μ(p), μ(o)) s.t. (s,p,o) is in G.
μ is consistent with G if μ(G) is an RDF graph. In this case we call μ(G) an instance of G.
An instance is proper if μ(G) has fewer blank nodes than G.
G1, G2 are isomorphic (G1 ≅ G2) if there are maps μ1, μ2 s.t. μ1(G1) = G2 and μ2(G2) = G1.
Union of graphs (G1 ∪ G2) is the union of their triples.
Merge of graphs (G1 + G2) is G1 ∪ G2', where G2' is isomorphic to G2 and its blank nodes are disjoint from those of G1 (there is no relation between the graphs).
G is lean if there is no map μ s.t. μ(G) is a proper subgraph of G. The intuition is that a lean graph cannot be "minimized"; the lean subgraph is the essence of the graph.
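The definitions of map, instance and leanness above can be made concrete with a minimal Python sketch. It assumes a representation that is not part of the summary: triples are tuples of strings, and blank nodes are the strings starting with "_:". The helper names (apply_map, is_lean) are hypothetical, and the leanness test is a brute-force search that is exponential in the number of blank nodes.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are written "_:name"; all other terms are URIs or literals.
    return term.startswith("_:")


def apply_map(mu, graph):
    # mu is a dict defined on blank nodes only, so URIs and literals are preserved.
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}


def is_lean(graph):
    # G is lean if no map sends G onto a proper subgraph of G.
    graph = set(graph)
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    terms = sorted({t for triple in graph for t in triple})
    for images in product(terms, repeat=len(blanks)):
        image = apply_map(dict(zip(blanks, images)), graph)
        if image < graph:  # proper subset
            return False
    return True


g = {("_:x", "paints", "_:y"), ("Rembrandt", "paints", "_:y")}
print(is_lean(g))  # False: mapping _:x to Rembrandt yields a proper subgraph
```

Only targets inside universe(G) are tried, because a map that sends a blank node outside the graph cannot produce a subgraph of G.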
3.1 Example of RDF graph
3.2 RDFS
RDFS is an extended version of RDF. It defines classes and properties that may be used for describing groups of resources and relationships between resources. This model supports reification (making statements about statements), typing and inheritance.
For example, RDFS defines the predicate sc (subclass), and this property has some rules that come with it. For instance, if A is sc of B and B is sc of C, then A is sc of C.
3.3 Core(G)
Theorem: each RDF graph G contains a unique (up to isomorphism) lean subgraph which is an instance of G. We will denote this subgraph core(G).
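A naive way to reach core(G) is to keep replacing the graph by a proper subgraph that is its image under some map, until no such map exists. The sketch below is an illustration under assumptions and not an algorithm from the paper: triples are tuples of strings, blank nodes start with "_:", the function names are hypothetical, and the search is exponential.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are the strings starting with "_:".
    return term.startswith("_:")


def apply_map(mu, graph):
    return {tuple(mu.get(t, t) for t in triple) for triple in graph}


def shrink(graph):
    """Return mu(graph) for some map mu that makes it a proper subgraph, else None."""
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    terms = sorted({t for triple in graph for t in triple})
    for images in product(terms, repeat=len(blanks)):
        image = apply_map(dict(zip(blanks, images)), graph)
        if image < graph:
            return image
    return None


def core(graph):
    # Each step strictly removes triples, so the loop terminates in a lean graph.
    current = set(graph)
    while True:
        smaller = shrink(current)
        if smaller is None:
            return current
        current = smaller


g = {
    ("_:x", "paints", "_:y"),
    ("Rembrandt", "paints", "_:y"),
    ("Rembrandt", "paints", "NightWatch"),
}
print(core(g))  # {('Rembrandt', 'paints', 'NightWatch')}
```

The result is an instance of G because the maps applied along the way compose into a single map.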
4 Semantics of RDF graphs
Theorem*: Let G1, G2 be simple graphs (graphs that do not use predefined semantics like RDFS classes and predefined properties). G1 entails G2 (G1 ╞ G2) iff there is a map G2 -> G1 (there is a map μ s.t. μ(G2) is a subgraph of G1).
For example, the graph that was presented before entails this graph:
G1 and G2 are equivalent (G1 ≡ G2) if G1 ╞ G2 and G2 ╞ G1.
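For simple graphs, Theorem* turns entailment into a search for a map, and equivalence into two such searches. The following sketch illustrates this under assumptions that are not in the summary: triples are tuples of strings, blank nodes start with "_:", and the names find_map, entails and equivalent are hypothetical. The search is brute force, in line with the hardness of the problem.

```python
from itertools import product


def is_blank(term):
    # Assumption: blank nodes are the strings starting with "_:".
    return term.startswith("_:")


def find_map(g2, g1):
    """Return a map mu (dict on the blank nodes of g2) with mu(g2) a subgraph of g1, or None."""
    blanks = sorted({t for triple in g2 for t in triple if is_blank(t)})
    terms = sorted({t for triple in g1 for t in triple})
    for images in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, images))
        if {tuple(mu.get(t, t) for t in triple) for triple in g2} <= g1:
            return mu
    return None


def entails(g1, g2):
    # Theorem*: for simple graphs, G1 entails G2 iff there is a map G2 -> G1.
    return find_map(g2, g1) is not None


def equivalent(g1, g2):
    return entails(g1, g2) and entails(g2, g1)


g1 = {("Rembrandt", "paints", "NightWatch")}
g2 = {("_:a", "paints", "_:b")}
print(entails(g1, g2))  # True: someone paints something
print(entails(g2, g1))  # False
```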
Theorem: if G is simple (RDF model), then core(G) is the unique (up to isomorphism) minimal (w.r.t. number of triples) graph equivalent to G.
4.1 RDFS Model
There is a sound and complete set of rules for ╞ in graphs with RDFS vocabulary. For example: (a,sc,b), (b,sc,c) -> (a,sc,c).
In non-simple graphs we cannot use Theorem* because of issues like transitivity. To avoid the problem we will "close" the graph with all possible triples that are entailed by the existing ones.
A closure of G is a maximal set of triples G' over universe(G) plus the RDFS vocabulary s.t. G' contains G and is equivalent to G.
There is another kind of closure, the RDFS-closure: the closure of G under the set of RDFS rules.
By using this definition we can prove that G1 ╞ G2 iff there is a map from G2 to the RDFS-closure of G1.
Notice that, from the data representation point of view, "closure" and "RDFS-closure" may have redundancies. They are not the best choice to work with.
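To illustrate what closing a graph under rules means, here is a minimal sketch that applies only the transitivity rule for sc quoted above. The full RDFS rule set is larger and is not reproduced here; the function name sc_closure and the tuple-of-strings representation are assumptions made for the example.

```python
def sc_closure(graph):
    # Apply (a,sc,b), (b,sc,c) -> (a,sc,c) until no new triple can be added.
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        sc_edges = [(s, o) for (s, p, o) in closed if p == "sc"]
        for a, b in sc_edges:
            for b2, c in sc_edges:
                if b == b2 and (a, "sc", c) not in closed:
                    closed.add((a, "sc", c))
                    changed = True
    return closed


g = {("a", "sc", "b"), ("b", "sc", "c"), ("c", "sc", "d")}
print(len(sc_closure(g)))  # 6: the three original triples plus three derived ones
```

The output also shows the redundancy mentioned above: the derived triples carry no information beyond what the original three already entail.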
5 Normal forms
G's normal form (nf(G)) is core(G'), where G' is a closure of G.
In the figure below, the normal form of the right graph is the left one.
If G, G1 and G2 are RDF graphs:
1. nf(G) is unique.
2. G1 ╞ G2 iff nf(G2) -> nf(G1).
3. G1 ≡ G2 iff nf(G1) ≅ nf(G2).
Normal forms are not the most compact representation.
A reduction of a graph G is a minimal graph Gr equivalent to G and contained in G.
The authors of the article present an algorithm to get the reduction of a graph. The basic idea is to delete triples deduced by RDFS rules ((a,sc,b), (b,sc,c) -> (a,sc,c)).
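The idea of deleting deducible triples can be sketched for the single transitivity rule for sc. This is not the authors' algorithm, only a greedy illustration under assumptions: triples are tuples of strings, only the sc rule is considered, and the names derivable and reduce_sc are hypothetical.

```python
def derivable(triple, graph):
    # Can (a, sc, c) be deduced from the sc triples of `graph` by transitivity?
    a, p, c = triple
    if p != "sc":
        return False
    seen, stack = set(), [a]
    while stack:
        node = stack.pop()
        for (s, p2, o) in graph:
            if p2 == "sc" and s == node and o not in seen:
                if o == c:
                    return True
                seen.add(o)
                stack.append(o)
    return False


def reduce_sc(graph):
    # Greedily drop every triple that the remaining triples still entail.
    reduced = set(graph)
    for triple in sorted(graph):
        rest = reduced - {triple}
        if derivable(triple, rest):
            reduced = rest
    return reduced


g = {("a", "sc", "b"), ("b", "sc", "c"), ("a", "sc", "c"), ("a", "type", "x")}
print(reduce_sc(g) == g - {("a", "sc", "c")})  # True
```

Each deleted triple remains deducible from what is kept, so the result stays equivalent to the input with respect to this rule. When the sc triples form a cycle, the result may depend on the order in which triples are visited.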
6 Querying RDF Databases
An RDF graph can be viewed as a standard relational database. Each tuple in the table is a triple with the attributes subject, predicate and object.
Variables (disjoint from UBL) will be denoted ?X, ?Y, ?person.
The query language will be similar to Datalog:
(?A,creates,?Y) <- (?A,type,Flemish), (?A,paints,?Y), (?Y,exhibited,?Gordon)
We will define a tableau as a pair (H,B), where H and B are RDF graphs in which some elements may be replaced by variables.
Now we can say that a query is a tableau (H,B) plus a set of premises P and a set of constraints C, where P is a graph over UBL and C is a subset of the variables occurring in H.
6.1 Constraints
Constraints allow discriminating between blank and ground nodes in an answer (similar to IS NOT NULL). If we add the constraint {?A}, this means that the variable ?A must be bound to a non-blank element in each answer to the query.
6.2 Premises
The premise represents information that the user supplies to the database in order to answer the query. It allows hypothetical analysis.
6.3 Answering a query
6.3.1 Valuation and Matching
A valuation is a function v: V -> UBL, where V is the set of variables. For a set C of variables, the valuation v satisfies the constraint C if for all x in C, v(x) is not blank.
We denote by v(B) the graph obtained after replacing every occurrence of a variable x in B with v(x).
A matching of a graph B in DB D is a valuation v s.t. v(B) ⊆ nf(D).
6.3.2 Single answer
Let q = (H,B,P,C) be a query and D a DB.
The pre-answer of q over D is the set:
o preans(q,D) = {v(H) : v is a matching of B in D+P and v satisfies C}.
A graph v(H) in preans(q,D) is called a single answer of query q over D.
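The definitions of valuation, matching, constraint and pre-answer can be sketched as a small evaluator. The sketch rests on several assumptions that are not in the summary: the set db passed in already stands for the normal form of D+P, triples are tuples of strings, variables start with "?", blank nodes start with "_:", every variable of H also occurs in B, and the function names and sample data are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def is_blank(term):
    return term.startswith("_:")


def matchings(body, db, valuation=None):
    """Yield every valuation v (a dict) such that v(body) is a subset of db."""
    valuation = valuation or {}
    if not body:
        yield dict(valuation)
        return
    first, rest = body[0], body[1:]
    for triple in db:
        v = dict(valuation)
        ok = True
        for pattern, value in zip(first, triple):
            if is_var(pattern):
                # Bind the variable, or check it against an earlier binding.
                if v.setdefault(pattern, value) != value:
                    ok = False
                    break
            elif pattern != value:
                ok = False
                break
        if ok:
            yield from matchings(rest, db, v)


def preans(head, body, db, constraints=()):
    # Each single answer v(H) is stored as a frozenset of triples.
    answers = set()
    for v in matchings(body, db):
        if all(not is_blank(v[x]) for x in constraints):
            answers.add(frozenset(tuple(v.get(t, t) for t in tr) for tr in head))
    return answers


db = {
    ("Rubens", "type", "Flemish"),
    ("Rubens", "paints", "Descent"),
    ("Descent", "exhibited", "Antwerp"),
    ("_:b1", "type", "Flemish"),
    ("_:b1", "paints", "_:w"),
    ("_:w", "exhibited", "Louvre"),
}
head = [("?A", "creates", "?Y")]
body = [("?A", "type", "Flemish"), ("?A", "paints", "?Y"), ("?Y", "exhibited", "?G")]
print(len(preans(head, body, db)))                      # 2 single answers
print(len(preans(head, body, db, constraints={"?A"})))  # 1: the blank painter is excluded
```

With the constraint {?A}, the single answer that binds ?A to the blank node _:b1 is dropped, which is the IS NOT NULL behaviour described in section 6.1.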
6.3.3 Complex queries
We would like complex queries to be composed from simple ones. There are two options for combining the single answers:
ans_u(q,D): the union (as a set) of the triples of the single answers. This option is good when we want blank nodes to play the role of bridges between two queries.
ans_+(q,D): the merge of the single answers, i.e. blank nodes are renamed to avoid name clashes before taking the union of the triples. This option is good when querying several unrelated DBs.
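The difference between the two options is only in how blank nodes are treated, which a short sketch can show. The representation (tuples of strings, blank nodes starting with "_:"), the function names and the renaming scheme (appending the index of the single answer) are assumptions made for the illustration.

```python
def is_blank(term):
    return term.startswith("_:")


def ans_union(single_answers):
    # Plain set union: equal blank node names denote the same node.
    result = set()
    for graph in single_answers:
        result |= set(graph)
    return result


def ans_merge(single_answers):
    # Rename blank nodes apart per single answer, then take the union.
    result = set()
    for i, graph in enumerate(single_answers):
        result |= {
            tuple(f"{t}_{i}" if is_blank(t) else t for t in triple)
            for triple in graph
        }
    return result


answers = [{("_:x", "paints", "NightWatch")}, {("_:x", "type", "Flemish")}]
print(ans_union(answers))  # both triples share the blank node _:x
print(ans_merge(answers))  # the triples use _:x_0 and _:x_1
```

In the union, _:x links the two triples (the same unknown resource paints and is Flemish); in the merge that link is deliberately cut.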
7 Query complexity
We consider two simpler versions of the evaluation problem when calculating the query complexity:
o Query complexity version: fixed DB D, given a query q, is q(D) non-empty?
o Data complexity version: fixed query q, given a DB D, is q(D) non-empty?
Theorem: the evaluation problem is NP-complete for the query complexity version and polynomial for the data complexity version.
We can show that the size of the set of answers of a query q over a DB D is at most |D|^|q|, where |D| is the size of the normal form of D and |q| is the number of symbols in the query.
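One way to see where a bound of this shape can come from is to count matchings. This is a hedged sketch, not the authors' proof. It assumes that |D| counts the triples of nf(D) and is at least 1, that the body B has b triples, that every variable of H occurs in B, and that premises are ignored.

```latex
% Each of the b triples of B must be sent to one of the |D| triples of nf(D),
% and that choice fixes the valuation on every variable of B.
\[
  \#\{\text{single answers}\}
  \;\le\; \#\{\text{matchings of } B \text{ in } D\}
  \;\le\; |D|^{\,b}
  \;\le\; |D|^{\,|q|},
  \qquad \text{since } b \le |q| .
\]
```

For a fixed query the exponent is a constant, which fits the polynomial bound stated for the data complexity version.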