# An RDF and XML Database


Nov 15, 2013


John Snelson, Lead Engineer

23rd October 2013

## MarkLogic

- SEARCH
- DATABASE
- APPLICATION SERVICES

## Data ≠ Information

## Data + Context = Information

## Dynamic Semantic Publishing: BBC Sports

The challenge and goals:

Size and complexity:

- Number of athletes
- Number of teams
- Number of assets (match reports, statistics, etc.)
- Number of relations (facts)

Rich user experience:

- See information in context
- Personalize content
- Intelligently (outside of UK)

Manageable:

- Static pages? Too many, changing too fast
- Limited number of journalists
- Automate as much as possible

## Dynamic Semantic Publishing: A Solution

XML Database:

- Store, manage documents: stories, blogs, feeds, profiles
- Store, manage values: statistics
- Full-text search
- Performance, scalability
- Robustness

Triple Store:

- documents
- Tagged by journalists
- -)automatically
- Inferred
- Facts reported by journalists
- Linked Open Data for real-world facts

## Dynamic Semantic Publishing: Understanding Data

## Dynamic Semantic Publishing: Scaling Up

## What is RDF?

[Graph diagram. Nodes: :person20, :person5, :person4, :place5, "John". Edge labels: :birth-place (twice), :has-child, :has-parent, :first-name.]

## What is RDF?

RDF:

- Schema-less
- Triple granularity
- Open world assumption
- Joins: the cost of granularity
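To make "triple granularity" and "joins: the cost of granularity" concrete, here is a minimal Python sketch. It is not how any particular product stores RDF. The identifiers come from the graph diagram, but which nodes each edge connects is an assumption made for illustration.

```python
# Each fact is one (subject, predicate, object) triple.
# The facts below are assumed for illustration only.
triples = {
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":has-child", ":person4"),
    (":person20", ":birth-place", ":place5"),
}

def objects(subject, predicate):
    """Return every object stored for a subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def birth_places_of(first_name):
    """Answer a two-property question by joining two triple patterns."""
    people = {s for s, p, o in triples if p == ":first-name" and o == first_name}
    return {place for person in people for place in objects(person, ":birth-place")}

result = birth_places_of("John")
```

Because every property is its own triple, even this small question needs a join on the shared subject. A missing triple, such as a first name for `:person20`, means "not known" rather than "false" under the open world assumption.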

## What is Semantics?

Data is stored in triples, expressed as Subject : Predicate : Object.

Example:

    "John Smith" : livesIn : "London"
    "London" : isIn : "England"

Rules tell us something about the triples.

Example:

    If (A livesIn X) AND (X isIn Y) then (A livesIn Y)

Inference: "John Smith" : livesIn : "England"
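The rule above can be sketched as a small forward-chaining loop in Python. This is only an illustration of the idea under the assumption of a single hard-coded rule; the function name `infer_lives_in` is hypothetical and nothing here reflects how a real triple store evaluates rules.

```python
def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new appears."""
    facts = set(triples)
    while True:
        new = {
            (a, "livesIn", y)
            for a, p1, x in facts if p1 == "livesIn"
            for x2, p2, y in facts if p2 == "isIn" and x2 == x
        } - facts
        if not new:
            return facts
        facts |= new

facts = infer_lives_in({
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
})
```

After the loop, `facts` holds the two stated triples plus the inferred `("John Smith", "livesIn", "England")`.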

## What is Semantics?

[Graph diagram. Nodes: "John Smith", "London", "England". Edge labels: livesIn (twice), isIn.]

## Semantics Architecture

[Architecture diagram. Labels: TRIPLE, XQY, XSLT, SQL, SPARQL, GRAPH, SPARQL.]

## Triple Index

- 3 triple orders
- Cached for performance
- Works seamlessly with other indexes
- Security
- 150 bytes per triple on disk
- Billions of triples per host
- Scaling out horizontally
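Why keep several triple orders? A rough Python sketch follows, assuming the three orders are subject-predicate-object, predicate-object-subject, and object-subject-predicate (the slide does not name them). The class `TripleIndex` and its `scan` method are hypothetical names. The point is that any pattern with a known leading component becomes a range scan over one sorted copy.

```python
from bisect import bisect_left

# Assumed orderings; the slide only says that there are three.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

class TripleIndex:
    def __init__(self, triples):
        # Keep one sorted copy of the data per ordering.
        self.rows = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in ORDERS.items()
        }

    def scan(self, order, *prefix):
        """Yield (s, p, o) triples whose leading components in this order equal prefix."""
        rows, perm = self.rows[order], ORDERS[order]
        for row in rows[bisect_left(rows, prefix):]:
            if row[:len(prefix)] != prefix:
                break
            triple = [None] * 3
            for pos, i in enumerate(perm):
                triple[i] = row[pos]  # undo the permutation
            yield tuple(triple)

index = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person22", ":birth-place", ":place13"),
])
born_in_place5 = list(index.scan("pos", ":birth-place", ":place5"))
```

The trade-off is storage: each triple is held once per ordering in exchange for avoiding full scans.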

## RDF

## Triples Embedded in Documents

    <sem:triple>
      <sem:subject>http://example.org/kennedy/person12</sem:subject>
      <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
      <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
    </sem:triple>
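To show what "embedded" means in practice, the following Python sketch pulls such triples out of a larger XML document using only the standard library. The wrapper element `article`, its text, and the helper `extract_triples` are made up for illustration, and the binding of the `sem` prefix to `http://marklogic.com/semantics` is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: the sem prefix is bound to this namespace URI.
SEM = "http://marklogic.com/semantics"

# Hypothetical document: ordinary content with one triple embedded in it.
doc = f"""
<article xmlns:sem="{SEM}">
  <p>Free text of the document.</p>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</article>
""".strip()

def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for every embedded triple."""
    ns = {"sem": SEM}
    found = []
    for t in ET.fromstring(xml_text).iter(f"{{{SEM}}}triple"):
        obj = t.find("sem:object", ns)
        found.append((
            t.findtext("sem:subject", namespaces=ns),
            t.findtext("sem:predicate", namespaces=ns),
            obj.text,
            obj.get("datatype"),
        ))
    return found

embedded = extract_triples(doc)
```

The surrounding document stays an ordinary XML document; the triple is just another element that a triple-aware index can pick up.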

## Content, Data, and Semantics

    <SAR>
      <title>Suspicious vehicle near airport</title>
      <date>2012-11-12Z</date>
      <type>observation/surveillance</type>
      <threat>
        <type>suspicious activity</type>
        <category>suspicious vehicle</category>
      </threat>
      <location>
        <lat>37.497075</lat>
        <long>-122.363319</long>
      </location>
      <triple>
        <subject>IRIID</subject>
        <predicate>isa</predicate>
        <object>-plate</object>
      </triple>
      <triple>
        <subject>IRIID</subject>
        <predicate>value</predicate>
        <object>ABC 123</object>
      </triple>
      <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
    </SAR>

## RDF Values

    <http://example.org/kennedy/person4>
    "string value"^^xs:string
    "987"^^xs:double
    "2013-04-09"^^xs:date
    "bonjour"@fr
    _:blank1
    "simple"

## Datatype Mapping

| Datatype | SPARQL | XQuery |
| --- | --- | --- |
| Typed Literal | "2013-04-09"^^xs:date | xs:date("2013-04-09") |
| IRI | <http://example.com> | sem:iri("http://example.com") |
| Blank Node | _:blank1 | sem:blank("…") |
| Simple Literal | "simple" | xs:string("simple") |
| Language Tagged Literal | "bonjour"@fr | rdf:langString("bonjour", "fr") |
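The same kind of mapping can be sketched for another host language. The Python function below converts the term syntax shown on these slides into tagged Python values. The function name `to_python` and the tag names ("iri", "blank", "literal", "lang-string") are hypothetical, and only the `xs:double` and `xs:date` datatypes from the slides are given native types.

```python
import re
from datetime import date

def to_python(term):
    """Convert one RDF term, written as on the slides, into a tagged Python value."""
    if term.startswith("<") and term.endswith(">"):
        return ("iri", term[1:-1])
    if term.startswith("_:"):
        return ("blank", term[2:])
    m = re.fullmatch(r'"(.*)"(?:\^\^xs:(\w+)|@([\w-]+))?', term)
    if not m:
        raise ValueError(f"unrecognised term: {term}")
    text, datatype, lang = m.groups()
    if lang:
        return ("lang-string", text, lang)
    if datatype == "double":
        return ("literal", float(text))
    if datatype == "date":
        return ("literal", date.fromisoformat(text))
    # Simple literals and xs:string both stay plain strings.
    return ("literal", text)

parsed_date = to_python('"2013-04-09"^^xs:date')
```

The design point mirrors the table: typed literals become native typed values, while IRIs, blank nodes, and language-tagged literals need their own wrapper because the host language has no built-in equivalent.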

## SPARQL

- Executed using the triple index
- SPARQL 1.0 + much of SPARQL 1.1
- Cost-based optimization
- Join ordering and algorithms

Example:

    select * where {
      ?person :birth-place ?place;
              :first-name "John"
    }

## Executing SPARQL

    sem:sparql("
      prefix : <http://example.org/kennedy/>
      select * {
        ?person :first-name ?first;
                :last-name ?last;
                :alma-mater [:ivy-league :true]
      }",
      map:entry("first", "John"),
      (),
      cts:collection-query("mycollection")
    )

## Returning Binding Solutions

    select * where {
      ?person :birth-place :place5
    }

    select * where {
      ?person :birth-place ?place;
              :first-name "John"
    }

## Solution Results

| person | place |
| --- | --- |
| :person22 | :place13 |
| :person4 | :place5 |
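How do triple patterns turn into rows of bindings like these? The Python sketch below is a naive matcher with no indexing or optimization, so it only illustrates the semantics. The function name `match` is hypothetical, and the data is invented so that the two rows above come out; `:person5` and "Rose" are made up.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the same value everywhere it occurs.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

# Invented data for illustration.
triples = [
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", "John"),
    (":person5", ":birth-place", ":place5"),
    (":person5", ":first-name", "Rose"),
]
query = [("?person", ":birth-place", "?place"), ("?person", ":first-name", "John")]
solutions = sorted(match(query, triples), key=lambda b: b["?person"])
```

Each solution is one row of the table: a mapping from variable name to the value it was bound to.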

## SPARQL Query Results XML Format

    sem:query-result-serialize(
      sem:sparql("select * { … }"),
      "xml"
    )
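For readers unfamiliar with the target format, this Python sketch builds a document in the shape of the W3C SPARQL Query Results XML Format from a list of solutions. It shows the format only, under the assumption that each bound value is supplied as a (kind, text) pair with kind being `uri`, `literal`, or `bnode`. The helper `to_sparql_xml` is hypothetical, and the sketch makes no claim about the exact output of the serializer shown above.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SR = "http://www.w3.org/2005/sparql-results#"

def to_sparql_xml(variables, solutions):
    """Serialize binding solutions as a <sparql> results document."""
    ET.register_namespace("", SR)
    root = ET.Element(f"{{{SR}}}sparql")
    head = ET.SubElement(root, f"{{{SR}}}head")
    for name in variables:
        ET.SubElement(head, f"{{{SR}}}variable", name=name)
    results = ET.SubElement(root, f"{{{SR}}}results")
    for solution in solutions:
        result = ET.SubElement(results, f"{{{SR}}}result")
        for name, (kind, text) in solution.items():
            binding = ET.SubElement(result, f"{{{SR}}}binding", name=name)
            ET.SubElement(binding, f"{{{SR}}}{kind}").text = text
    return ET.tostring(root, encoding="unicode")

xml_text = to_sparql_xml(
    ["person", "place"],
    [{
        "person": ("uri", "http://example.org/kennedy/person4"),
        "place": ("uri", "http://example.org/kennedy/place5"),
    }],
)
```

The head lists the variables once; each result then carries one binding element per bound variable.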

## Returning Triples

    describe :person4

    construct {
      ?bp :uses-name ?fn
    } where {
      ?person :birth-place ?bp;
              :first-name ?fn
    }

## Triple Results

    :place0 :uses-name "Ethel", "Jeffrey", "Kara" .
    :place1 :uses-name "Edward", "James" .
    :place10 :uses-name "Robert", "Sheila", "Stephen" .
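A construct query can be thought of as filling a triple template once per binding solution. The Python sketch below assumes the solutions have already been computed; the function name `construct` and the person identifiers are invented for illustration.

```python
def construct(template, solutions):
    """Instantiate the triple template once per solution, dropping duplicates."""
    return {
        tuple(solution.get(term, term) for term in template)
        for solution in solutions
    }

# Invented solutions: two people with the same first name and birth place.
solutions = [
    {"?person": ":personA", "?bp": ":place1", "?fn": "Edward"},
    {"?person": ":personB", "?bp": ":place1", "?fn": "James"},
    {"?person": ":personC", "?bp": ":place1", "?fn": "James"},
]
built = construct(("?bp", ":uses-name", "?fn"), solutions)
```

Because the result is a set of triples, the two solutions that yield the same `:place1 :uses-name "James"` triple collapse into one, which is why each name appears once per place in the results above.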

## Querying Named Graphs

    select * from <http://my_graph> where { ?s ?p ?o }

    select * where {
      graph <http://my_graph> { ?s ?p ?o }
    }
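One simple way to picture named graphs is as a fourth component on every triple. In the Python sketch below, both queries above correspond to filtering on that component. The second graph name and all the data are invented, and what an unrestricted query sees depends on the store, so the `graph=None` case is an assumption.

```python
# (graph, subject, predicate, object); data invented for illustration.
quads = [
    ("http://my_graph", ":person4", ":first-name", "John"),
    ("http://my_graph", ":person4", ":birth-place", ":place5"),
    ("http://other_graph", ":person5", ":first-name", "Rose"),
]

def select_all(quads, graph=None):
    """Return (s, p, o) rows, optionally restricted to one named graph."""
    return [(s, p, o) for g, s, p, o in quads if graph is None or g == graph]

rows = select_all(quads, graph="http://my_graph")
```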

## Restricting The Datasets

    let $options := "properties"
    let $query := cts:and-query((
      cts:directory-query("/triples/"),
      cts:element-range-query(xs:QName("date"), ">", $date)
    ))
    return sem:sparql("…", (), (), $options, $query)

## Creating Triples

Returning sem:triple values:

- sem:triple()
- sem:rdf-parse()
- sem:rdf-get()
- sem:rdf-builder()

Inserting to a database:

- sem:rdf-
- sem:rdf-insert()

## Graph Store API

    declare function graph-insert(
      $graphname as sem:iri,
      $triples as sem:triple*,
      [$permissions as element(sec:permission)*,
       $collections as xs:string*,
       $quality as xs:int?,
       $forest-ids as xs:unsignedLong*]
    ) as xs:string*;

    declare function graph-delete(
      $graphname as sem:iri
    ) as empty-sequence();

## Conclusion

- Semantics can enhance your data-oriented and search applications.
- XQuery and SPARQL work well together.
- A combined RDF and XML database simplifies working with the technologies together.

Try MarkLogic 7: http://www.marklogic.com/early-access/
