Storing and Accessing Semantic Data - IKS Project

Oct 31, 2013

Co-funded by the European Union

Semantic CMS Community

Semantic Data Access

Lecturer
Organization
Date of presentation

Copyright IKS Consortium

www.iks-project.eu

Part I: Foundations (lectures 1 and 2)
  Introduction of Content Management
  Foundations of Semantic Web Technologies

Part II: Semantic Content Management (lectures 3 to 6)
  Knowledge Representation and Reasoning
  Semantic Lifting
  Storing and Accessing Semantic Data
  Knowledge Interaction and Presentation

Part III: Methodologies (lectures 7 to 10)
  Requirements Engineering for Semantic CMS
  Designing Semantic CMS
  Semantifying your CMS
  Designing Interactive Ubiquitous IS

What is this Lecture about?

We have learned ...
  ... which languages can be used to model knowledge.
  ... how to extract knowledge from content in an automatic way (semantic lifting).

We need a way ...
  ... to store the extracted knowledge technically in an accessible way.

Part II: Semantic Content Management (lectures 3 to 6)
  Knowledge Representation and Reasoning
  Semantic Lifting
  Storing and Accessing Semantic Data
  Knowledge Interaction and Presentation

Outline

Semantic Data
  Semantic Web
  RDF
Semantic Data Storage
  Triple Stores
Semantic Data Access
  SPARQL
  RQL
  API Calls

Semantic Data

Stands for machine-understandable information
Allows computers to interpret the data without user intervention
Allows computers to act intelligently without being programmed for each task

Semantic Data

Provides an infrastructure for obtaining practical results
  Applications derive further information from previously stated relations (e.g. Eiffel Tower -> Paris -> France)
Allows reasoning capabilities
  Enables the extraction of related information that is not directly linked
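
The Eiffel Tower example can be sketched in a few lines of plain Java. This is only an illustration of following stated relations to reach information that is not stated directly; the class name, the map and the "located in" relation are assumptions made for the example and do not come from any semantic web library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: following "located in" relations to reach facts
// that are not stated directly (Eiffel Tower -> Paris -> France).
class LocationChainSketch {

    // Directly stated relations: each place maps to the place that contains it.
    static final Map<String, String> LOCATED_IN = Map.of(
            "Eiffel Tower", "Paris",
            "Paris", "France");

    // Returns every place that directly or indirectly contains the start place.
    static List<String> containers(String start) {
        List<String> result = new ArrayList<>();
        String current = LOCATED_IN.get(start);
        // The contains() check guards against cycles in the stated relations.
        while (current != null && !result.contains(current)) {
            result.add(current);
            current = LOCATED_IN.get(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // "Eiffel Tower is in France" is never stated, but it can be derived.
        System.out.println(containers("Eiffel Tower")); // [Paris, France]
    }
}
```

A real reasoner would derive such facts from declared property semantics (for example transitivity) rather than from hand-written traversal code.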




Semantic Web

A classical generic description: “Web of data”
Extends the World Wide Web by encouraging:
  A common language for representing data that can be transformed to and from disparate sources such as relational databases, XML, etc. (RDF)
  A common, reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.)
  Rules that enable applications to reason over the information (SWRL)

Semantic Web Layer Cake

Semantic Web Layer Cake, Image source: http://www.w3.org/2007/03/layerCake.svg


Semantic Web

Many organizations publish their data in different domains
  Media
  Geographic
  Government
The whole set of published data contains approximately 30 billion triples
One of the largest collections is DBpedia
  Semantified version of Wikipedia
Example:
  Obtain the cities of China that have a population over 20 million
This requires efficient storage and querying of semantic data

Representation of Semantic Data

RDF
  The common data format
  An abstract model with several serialization formats
  Consists of statements, referred to as triples, of the form (subject, predicate, object), where
    Subject: any resource identifier
    Predicate: the resource identifier of a property
    Object: either a resource identifier or a literal value
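
The triple structure described above might be modelled as follows. This is a minimal sketch using only the JDK; the type names (Node, Resource, Literal, Triple) and the example.org URIs are assumptions chosen for illustration and are not the API of Jena, Sesame or any other RDF library.

```java
import java.util.List;

// Illustrative types only; these names are assumptions, not part of any RDF library.
class RdfTripleSketch {

    // Anything that can appear in the object position of a triple.
    interface Node {}

    // A resource identified by a URI.
    record Resource(String uri) implements Node {}

    // A literal value, kept here in its lexical (string) form.
    record Literal(String value) implements Node {}

    // Subject and predicate are resources; the object may be either kind of node.
    record Triple(Resource subject, Resource predicate, Node object) {}

    // Two sample statements: one with a resource object, one with a literal object.
    static List<Triple> sample() {
        Resource tower = new Resource("http://example.org/EiffelTower");
        return List.of(
                new Triple(tower, new Resource("http://example.org/locatedIn"),
                        new Resource("http://example.org/Paris")),
                new Triple(tower, new Resource("http://example.org/name"),
                        new Literal("Eiffel Tower")));
    }

    public static void main(String[] args) {
        for (Triple t : sample()) {
            String kind = (t.object() instanceof Literal) ? "literal" : "resource";
            System.out.println(t.predicate().uri() + " has a " + kind + " object");
        }
    }
}
```

Restricting the subject and predicate to Resource while allowing any Node as object mirrors the rule that only the object position may hold a literal.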




Storing Semantic Data

Triple collections need specialized storage designs
Two modalities:
  Relational databases
  Triple stores
    Mostly used for storage
    Lots of implementations
    They can also be RDB based.

Triple Store

A purpose-built database for the storage and retrieval of RDF data.
Optimized for adding, removing and querying triples.
Each triple in the triple store complies with the form (subject, predicate, object)
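
The three operations named above (add, remove, query) can be illustrated with a toy in-memory store. This is a sketch under simplifying assumptions: all terms are plain strings, a null argument acts as a wildcard, and matching is a linear scan. Real triple stores use indexes over the subject, predicate and object positions instead; the class and method names here are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory triple store; names and design are illustrative assumptions.
class InMemoryTripleStoreSketch {

    record Triple(String subject, String predicate, String object) {}

    // A set, because an RDF graph does not contain duplicate triples.
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    // Returns all triples matching the pattern; null means "any value".
    List<Triple> match(String s, String p, String o) {
        return triples.stream()
                .filter(t -> s == null || t.subject().equals(s))
                .filter(t -> p == null || t.predicate().equals(p))
                .filter(t -> o == null || t.object().equals(o))
                .toList();
    }

    public static void main(String[] args) {
        InMemoryTripleStoreSketch store = new InMemoryTripleStoreSketch();
        store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris");
        store.add("ex:Paris", "ex:locatedIn", "ex:France");
        store.add("ex:Paris", "ex:name", "Paris");
        System.out.println(store.match(null, "ex:locatedIn", null).size()); // 2
        store.remove("ex:Paris", "ex:name", "Paris");
        System.out.println(store.match("ex:Paris", null, null).size()); // 1
    }
}
```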




Considering XML Databases

XML databases are existing storage systems for semi-structured data
Idea: Transform RDF to XML and store it in XML databases
Yet the XML data model is not exactly the same as the semantic data model
  The XML data model is a tree-like structure
  RDF data is represented as a graph without a hierarchy

Considering XML Databases

XML databases are not suitable for storing and querying RDF
  Only simple manipulations can be handled through XML query languages
  RDF Schema processing and inference are not possible
  The standard RDF/XML mapping is unsuitable

Monolithic approach for DB Based Triple Stores

Generic representation for all RDF schemas
Only two tables are used
  Resources table
  Triples table

Monolithic approach for DB Based Triple Stores

Triples table:

predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale

Resources table:

id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
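
The two-table layout can be mimicked in memory to show how the ids tie the tables together. The sketch below is an assumption-laden illustration rather than the schema of any particular store: the resources table becomes a map from URI to integer id, the triples table becomes a list of rows with the same four columns as above, and the prefixed names used in main (ex:r1 and so on) are invented stand-ins for full URIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory imitation of the "resources" and "triples" tables; illustrative only.
class TwoTableStoreSketch {

    // One row of the triples table: objId is set for resource objects,
    // objValue is set for literal objects, and the other one stays null.
    record TripleRow(int predId, int subId, Integer objId, String objValue) {}

    private final Map<String, Integer> resources = new LinkedHashMap<>(); // uri -> id
    private final List<TripleRow> triples = new ArrayList<>();

    // Returns the id of a URI, adding a row to the resources table if needed.
    int idOf(String uri) {
        Integer id = resources.get(uri);
        if (id == null) {
            id = resources.size() + 1;
            resources.put(uri, id);
        }
        return id;
    }

    void addResourceTriple(String s, String p, String o) {
        triples.add(new TripleRow(idOf(p), idOf(s), idOf(o), null));
    }

    void addLiteralTriple(String s, String p, String value) {
        triples.add(new TripleRow(idOf(p), idOf(s), null, value));
    }

    // Literal values of property p for subject s, resolved through the ids.
    List<String> literals(String s, String p) {
        Integer subId = resources.get(s);
        Integer predId = resources.get(p);
        if (subId == null || predId == null) {
            return List.of();
        }
        return triples.stream()
                .filter(r -> r.subId() == subId && r.predId() == predId && r.objValue() != null)
                .map(TripleRow::objValue)
                .toList();
    }

    public static void main(String[] args) {
        TwoTableStoreSketch store = new TwoTableStoreSketch();
        store.addResourceTriple("ex:r1", "rdf:type", "ex:HotelDirections");
        store.addLiteralTriple("ex:r1", "dc:title", "Sunscale");
        System.out.println(store.literals("ex:r1", "dc:title")); // [Sunscale]
    }
}
```

Because every URI is stored once and referenced by id, the schema stays the same for any RDF vocabulary, at the cost of a lookup (a join in a real database) for every term of every triple.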


Triple Stores

Can be categorized into three categories:
  In-memory triple stores
    Used for certain operations like benchmarking, caching, etc.
  Native triple stores
    Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
  Non-memory, non-native triple stores
    Are built on third-party databases (Jena SDB, Kaon, …)

Functionalities provided by Triple Stores

RDBMS support
General RDF model access
Query language support in the store, such as RQL and SPARQL
Some stores provide:
  Provenance: tracking of who-said-what
  APIs for accessing the triple store over the network
Very few stores provide:
  Full text search
  Inference and rule languages

Example Triple Store implementations

RDF Suite
  Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
  Based on an ORDBMS model
Sesame
  http://www.openrdf.org/
  Relational databases (MySQL, PostgreSQL, Oracle)
Jena
  http://www.hpl.hp.com/semweb/jena2.htm
  Relational databases (MySQL, PostgreSQL, Oracle)
Virtuoso
  http://virtuoso.openlinksw.com/
  Native RDF Quad Storage (Physical Quads)

RDFSuite (ICS-FORTH)*

* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses

How triples are stored and accessed in RDF Suite

Separate tables are created to store resources
  Properties, subClasses, subProperties and instances
Indices on attributes like URI, source and target
Querying is possible through RQL

How triples are stored and accessed in RDF Suite

[Figure from *]

* Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001

Sesame Architecture

DBMS-independent API for accessing triple repositories
SAIL API
  A set of Java interfaces between the other modules and the repository
  Abstracts from the actual storage mechanism
Query Module
  RQL support
Different ways to communicate with clients
  Through protocol handlers
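
The idea of a storage-independent layer can be sketched as an interface that the query side depends on, with the concrete store hidden behind it. The code below is a hypothetical illustration of that design idea only: the interface, its methods and the in-memory implementation are invented for this example and are not Sesame's actual SAIL interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage abstraction; NOT the real Sesame SAIL API.
class StorageLayerSketch {

    record Statement(String subject, String predicate, String object) {}

    // The only thing higher-level modules know about the storage.
    interface StatementStore {
        void add(Statement st);
        List<Statement> withPredicate(String predicate);
    }

    // One possible backend; a database-backed class could implement the same interface.
    static class MemoryStore implements StatementStore {
        private final List<Statement> statements = new ArrayList<>();

        public void add(Statement st) {
            statements.add(st);
        }

        public List<Statement> withPredicate(String predicate) {
            return statements.stream()
                    .filter(s -> s.predicate().equals(predicate))
                    .toList();
        }
    }

    // Stand-in for a query module: it works with any StatementStore.
    static int countTyped(StatementStore store) {
        return store.withPredicate("rdf:type").size();
    }

    public static void main(String[] args) {
        StatementStore store = new MemoryStore();
        store.add(new Statement("ex:r1", "rdf:type", "ex:Hotel"));
        store.add(new Statement("ex:r1", "dc:title", "Sunscale"));
        System.out.println(countTyped(store)); // 1
    }
}
```

Swapping MemoryStore for another implementation would not require any change to countTyped, which is the benefit the slide attributes to abstracting from the storage mechanism.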



* Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

SAIL API over PostgreSQL

PostgreSQL
  Object-relational DBMS
  Supports sub-table relations between its tables, which are used to provide RDF Schema class and property subsumption
  Individuals are represented in separate tables created for resources
  Adding tables is difficult

* Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

SAIL API over MySQL

MySQL
  The database schema does not change when the RDFS changes
  This is an advantage where the RDFS is unstable

* Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

Jena2 Architecture

* Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases

Jena2

Jena2
  Denormalized schema
    Avoids unnecessary joins by merging URIs and literals into the statements table
  Multiple statement tables
    Better locality and caching
  Property tables

Normalized vs Denormalized Tables

Property Tables

Triple Store Only

Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male

Triple Store

Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8

Person Property Table

ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male

* Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
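
The split shown in the tables above can be reproduced with a short sketch. It assumes, for illustration only, that name, age and gender are the columns chosen for the person property table; every triple with one of these properties is folded into a row keyed by subject, and all other triples stay in the generic triple table. The class and method names are hypothetical and this is not Jena2's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative split of triples into a property table plus a residual triple table.
class PropertyTableSketch {

    record Triple(String subject, String property, String object) {}

    // Assumed columns of the person property table.
    static final Set<String> PERSON_COLUMNS = Set.of("name", "age", "gender");

    // Builds the property table: subject -> (column -> value).
    static Map<String, Map<String, String>> propertyTable(List<Triple> triples) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (PERSON_COLUMNS.contains(t.property())) {
                rows.computeIfAbsent(t.subject(), s -> new LinkedHashMap<>())
                        .put(t.property(), t.object());
            }
        }
        return rows;
    }

    // Triples that are not covered by the property table.
    static List<Triple> remaining(List<Triple> triples) {
        return triples.stream()
                .filter(t -> !PERSON_COLUMNS.contains(t.property()))
                .toList();
    }

    // The ten triples from the "Triple Store Only" table.
    static List<Triple> sample() {
        return List.of(
                new Triple("person1", "name", "Alice"),
                new Triple("person1", "age", "32"),
                new Triple("person1", "twinOf", "person2"),
                new Triple("person1", "faxPhone", "x1234"),
                new Triple("person1", "adminPh", "x5678"),
                new Triple("person2", "name", "Bob"),
                new Triple("person2", "age", "35"),
                new Triple("person2", "adopteeOf", "person6"),
                new Triple("person2", "friendOf", "person8"),
                new Triple("person2", "gender", "male"));
    }

    public static void main(String[] args) {
        System.out.println(propertyTable(sample()));
        System.out.println(remaining(sample()).size() + " triples stay in the triple table");
    }
}
```

Reading all column values of one person then touches a single row instead of several triple rows, which is the locality benefit associated with property tables.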


Jena Persistence Options

SDB
  Scalable storage and query for RDF
  Specifically designed for SPARQL support
  Supports: MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
  Scales to graphs of 100 million triples

Jena Persistence Options

TDB
  Provides large-scale storage and query of RDF datasets using a pure Java engine
  Supports SPARQL
  A non-transactional, faster database solution for use by a single system
  It scales well beyond SDB and is simpler to set up

Virtuoso

General-purpose RDBMS with extensive RDF adaptations
RDF data is stored as RDF quads, so it supports RDF with named graphs
  i.e. (graph, subject, predicate, object) tuples
  The columns are G for graph, P for predicate, S for subject and O for object
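
A quad is simply a triple with one extra component naming the graph it belongs to. The following sketch illustrates that idea with field names borrowed from the G, S, P, O column names above; the sample graph names and data are invented, and nothing here reflects Virtuoso's internal storage format.

```java
import java.util.List;

// Illustrative quad model with named graphs; not Virtuoso's actual storage layout.
class QuadStoreSketch {

    // g = graph, s = subject, p = predicate, o = object.
    record Quad(String g, String s, String p, String o) {}

    // Selects the statements that belong to one named graph.
    static List<Quad> inGraph(List<Quad> quads, String graph) {
        return quads.stream()
                .filter(q -> q.g().equals(graph))
                .toList();
    }

    static List<Quad> sample() {
        return List.of(
                new Quad("ex:geoGraph", "ex:Paris", "ex:locatedIn", "ex:France"),
                new Quad("ex:geoGraph", "ex:EiffelTower", "ex:locatedIn", "ex:Paris"),
                new Quad("ex:mediaGraph", "ex:Paris", "ex:name", "Paris"));
    }

    public static void main(String[] args) {
        // The same subject can be described in several graphs.
        System.out.println(inGraph(sample(), "ex:geoGraph").size());   // 2
        System.out.println(inGraph(sample(), "ex:mediaGraph").size()); // 1
    }
}
```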



Querying Semantic Data

Semantic data can be queried from triple stores by
  Various query languages
    SPARQL
      Different endpoints provided
    RQL
    RDQL
    SeRQL
  API Calls
    Through proprietary APIs of different projects
  Linked Data

SPARQL

Is an RDF query language
Standardized by the W3C
Plays a role similar to that of SQL for relational databases
  Syntactically resembles SQL
  Queries RDF graphs instead of databases

SPARQL Endpoints

Provide functionality to query the knowledge base via the SPARQL language
Accept queries and return results over the HTTP protocol
Query results can be in different formats such as
  RDF
  XML
  HTML
  JSON
  CSV
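
As a rough sketch of what talking to an endpoint involves, the code below builds (but does not send) an HTTP GET request that carries a SPARQL query in the query parameter and asks for JSON results through the Accept header. The endpoint URL is the DBpedia endpoint listed later in these slides. The query itself is an assumption: the dbo:City, dbo:country and dbo:populationTotal terms are taken to be DBpedia ontology names and may not match the data exactly, and the class and method names are made up for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Builds a SPARQL-over-HTTP request without sending it; names are illustrative.
class SparqlEndpointRequestSketch {

    // Assumed vocabulary: dbo:City, dbo:country and dbo:populationTotal.
    static final String QUERY =
            "PREFIX dbo: <http://dbpedia.org/ontology/> "
            + "PREFIX dbr: <http://dbpedia.org/resource/> "
            + "SELECT ?city WHERE { "
            + "?city a dbo:City ; dbo:country dbr:China ; dbo:populationTotal ?pop . "
            + "FILTER(?pop > 20000000) }";

    static HttpRequest buildRequest(String endpoint, String sparql, String acceptType) {
        // The query text must be URL-encoded before it is placed in the request URI.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", acceptType) // requested result format
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "http://dbpedia.org/sparql", QUERY, "application/sparql-results+json");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient would return the result set in the requested format, provided the endpoint is reachable and supports that format.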


Semantic Data Access With API Calls

Open source projects provide APIs to manipulate RDF data
  Jena
  Apache Clerezza
  Sesame
  JRDF

Jena

Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
  Model to represent graphs
  CRUD methods for triples
  Querying methods for existing resources
See the next slide for the code snippet…

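Jena Code Snippet

    // Package names are those of Apache Jena 3 and later;
    // older releases used com.hp.hpl.jena instead of org.apache.jena.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    public class JenaSnippet {
        public static void main(String[] args) {
            String personURI = "http://somewhere/JohnSmith";
            String givenName = "John";
            String familyName = "Smith";
            String fullName = givenName + " " + familyName;

            // create an empty Model which represents an RDF graph
            Model model = ModelFactory.createDefaultModel();

            // create the resource which will produce the triples in the next slide
            Resource johnSmith = model.createResource(personURI)
                    .addProperty(VCARD.FN, fullName)
                    .addProperty(VCARD.N,
                            model.createResource() // blank node holding the name parts
                                    .addProperty(VCARD.Given, givenName)
                                    .addProperty(VCARD.Family, familyName));
        }
    }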

Jena

Triples created by the code snippet on the previous slide:

(<http://somewhere/JohnSmith>, VCARD.FN, “John Smith”)
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, “John”)
(_, VCARD.Family, “Smith”)

Note that the _ symbol represents a blank node

Apache Clerezza

Provides a single API that is independent of the different triple stores it supports
Its API provides a model to represent RDF graphs and to manipulate those graphs
Also provides a SPARQL endpoint to query the stored knowledge

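Apache Clerezza Code Snippet

Simple code snippet adding two triples to the graph:

    import org.apache.clerezza.rdf.core.LiteralFactory;
    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.core.impl.TripleImpl;
    import org.apache.clerezza.rdf.ontologies.FOAF;
    import org.apache.clerezza.rdf.ontologies.RDF;

    String base = "http://www.example.org#";
    MGraph g = new SimpleMGraph();

    // first triple: JohnSmith rdf:type foaf:Person
    g.add(new TripleImpl(
            new UriRef(base + "JohnSmith"),
            RDF.type,
            FOAF.Person));

    // second triple: JohnSmith vcard:FN "John"
    g.add(new TripleImpl(
            new UriRef(base + "JohnSmith"),
            new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
            LiteralFactory.getInstance().createTypedLiteral("John")));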

Linked Data

Interrelated datasets on the Web that computers can explore
Accessed and managed through a standard format
Enables integration of, and reasoning over, a huge amount of data on the Web

Linked Data

Four well-known principles of linked data, presented by Tim Berners-Lee
  Use URIs as names of things
  Use HTTP URIs so that people can dereference them
  When a URI is dereferenced, provide useful information using standards (RDF, SPARQL)
  Provide links to other URIs to make the discovery of related data possible
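
Dereferencing a URI in the sense of the second and third principles usually means an HTTP GET with an Accept header that asks for an RDF serialization. The sketch below only constructs such a request and does not send it. The example URI is a DBpedia resource chosen for illustration, and it is an assumption that a given server honours the requested media types; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds a request that asks for RDF when dereferencing a "thing" URI; illustrative only.
class DereferenceRequestSketch {

    // Prefer Turtle, fall back to RDF/XML (media types registered for RDF).
    static final String RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9";

    static HttpRequest rdfRequest(String thingUri) {
        return HttpRequest.newBuilder(URI.create(thingUri))
                .header("Accept", RDF_ACCEPT)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = rdfRequest("http://dbpedia.org/resource/Paris");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

The RDF returned for one URI typically contains further URIs, and repeating the same request for those is how a client follows links to discover related data.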



Linking Open Data Project

Is a W3C SWEO project
Aims to make data freely available to everyone
Aims to publish open data sets as RDF and to set semantic relationships between them
  Serves information in a machine-readable format
  Enriches content
  Reduces duplication
The number of linked datasets is increasing rapidly
  A large number of datasets are linked already

Linked Datasets As of October 2008

Linked Datasets As of September 2010

Linked Datasets As of 2011

Access Data In The Cloud

Follow the RDF links representing the “things”
SPARQL Endpoints
Ready-to-use software to discover linked data (See the next slide)

Linked Data Applications

Many applications are built on top of linked data
  Tabulator
  Marbles
  OpenLink RDF Browser
Just google
  RDF Crawlers
  RDF Browsers
Also see the following link containing a number of linked data applications:
  http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications

Available SPARQL Endpoints

http://dbpedia.org/sparql
http://www4.wiwiss.fu-berlin.de/dblp/
To find SPARQL endpoints that provide a certain URI, see
  http://void.rkbexplorer.com/endpoint-search/
See also a list of live SPARQL endpoints
  http://www.w3.org/wiki/SparqlEndpoints

References

http://www.w3.org/TR/rdf-sparql-query
http://jena.sourceforge.net/tutorial/RDF_API/index.html
http://www.slideshare.net/ldodds/sparql-tutorial
http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
http://www.cambridgesemantics.com/2008/09/sparql-by-example
http://linkeddata-specs.info/
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
http://jena.sourceforge.net/DB/index.html
http://virtuoso.openlinksw.com/