An RDF and XML Database

John Snelson, Lead Engineer

23rd October 2013

Slide 2

Copyright © 2013 MarkLogic® Corporation. All rights reserved.

MarkLogic

SEARCH

DATABASE

APPLICATION SERVICES

Slide 3

Data ≠ Information

Slide 4

Data + Context = Information

Slide 5

Dynamic Semantic Publishing

BBC Sports

Size and Complexity:
- # of athletes
- # of teams
- # of assets (match reports, statistics, etc.)
- # of relations (facts)

Rich user experience
- See information in context
- Personalize content
- Easy navigation
- Intelligently serve ads (outside of UK)

Manageable
- Static pages? Too many, changing too fast
- Limited number of journalists
- Automate as much as possible


The Challenge

Goals

Slide 6

Dynamic Semantic Publishing

A Solution


XML Database
- Store, manage documents
  - Stories
  - Blogs
  - Feeds
  - Profiles
- Store, manage values
  - Statistics
- Full-Text search
- Performance, scalability
- Robustness

Triple Store
- Metadata about documents
  - Tagged by journalists
  - Added (semi-)automatically
  - Inferred
- Facts reported by journalists
- Linked Open Data for real-world facts

Slide 7

Dynamic Semantic Publishing

Understanding Data

Slide 8

Dynamic Semantic Publishing

Scaling Up

Slide 9

What is RDF?

[Figure: RDF graph with nodes :person20, :person5, :person4, :place5 and the literal "John", connected by edges labeled :birth-place (twice), :has-child, :has-parent, and :first-name]

Slide 10

What is RDF?

- Schema-less
- Triple granularity
- Open world assumption
- Joins: the cost of granularity

Slide 11

What is Semantics?

Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"

Slide 12

What is Semantics?

Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"

Rules tell us something about the triples
Example:
    If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
    Inference: "John Smith" : livesIn : "England"
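The rule above can be applied mechanically. The following is a minimal sketch in Python, not MarkLogic's inference machinery: triples are plain tuples, and the function name `infer_lives_in` is made up for this illustration. It applies the rule repeatedly until no new triple appears.

```python
def infer_lives_in(triples):
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new is found."""
    facts = set(triples)
    while True:
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in facts if p1 == "livesIn"
            for (x2, p2, y) in facts if p2 == "isIn" and x2 == x
        } - facts
        if not new:
            return facts
        facts |= new


data = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}
# Keep only the triples that were not asserted explicitly.
inferred = infer_lives_in(data) - data
print(inferred)
```

Looping to a fixed point means chains longer than two hops (for example, adding "England" isIn "UK") would also be followed.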

Slide 13

What is Semantics?

Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
    "John Smith" : livesIn : "London"
    "London" : isIn : "England"

Rules tell us something about the triples

[Figure: graph with "John Smith" -livesIn-> "London" -isIn-> "England", plus a second livesIn edge from "John Smith" to "England"]

Slide 15

Semantics Architecture

[Figure: architecture diagram with components labeled TRIPLE, XQY, XSLT, SQL, SPARQL, GRAPH, and SPARQL]

Slide 16

Triple Index

- 3 triple orders
- Cached for performance
- Works seamlessly with other indexes
- Security
- 150 bytes per triple on disk
- Billions of triples per host
  - Scaling out horizontally
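To make "3 triple orders" concrete, here is a small Python sketch. The slide does not say which three orders MarkLogic keeps, so the choice of SPO, POS, and OSP below is an assumption, as are the class and method names. The idea is that any pattern with some positions bound can be answered from a sorted list in which the bound positions form a prefix.

```python
from bisect import bisect_left

# Assumed permutations: index order -> positions within an (s, p, o) triple.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    def __init__(self, triples):
        # Keep one sorted copy of the data per order.
        self.sorted = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None means 'any value'."""
        pattern = (s, p, o)

        def prefix_len(perm):
            # Number of leading positions of this order that are bound.
            n = 0
            for i in perm:
                if pattern[i] is None:
                    break
                n += 1
            return n

        # Pick the order with the longest bound prefix.
        name, perm = max(ORDERS.items(), key=lambda kv: prefix_len(kv[1]))
        prefix = tuple(pattern[i] for i in perm[:prefix_len(perm)])
        rows = self.sorted[name]
        out = []
        # Binary search to the first row with this prefix, then scan forward.
        for row in rows[bisect_left(rows, prefix):]:
            if row[:len(prefix)] != prefix:
                break
            triple = [None] * 3
            for pos, value in zip(perm, row):
                triple[pos] = value
            # Safety check; with these three orders the prefix already covers
            # every bound position.
            if all(b is None or b == v for b, v in zip(pattern, triple)):
                out.append(tuple(triple))
        return out


index = TripleIndex([
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
])
print(index.match(p=":birth-place"))
```

Storing the data three times costs space but turns every lookup into a binary search followed by a short scan, which is the usual trade-off behind permuted triple indexes.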

Slide 17

RDF Loading

RDF

Slide 18

Triples Embedded in Documents

    <sem:triple>
      <sem:subject>http://example.org/kennedy/person12</sem:subject>
      <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
      <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
    </sem:triple>
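As an illustration of how such an embedded triple can be read by ordinary XML tooling, the sketch below parses the element with Python's standard library. The slide omits the namespace declaration; binding the `sem` prefix to `http://marklogic.com/semantics` is added here as an assumption so that the snippet is well-formed, and `read_triple` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SEM = "http://marklogic.com/semantics"  # assumed binding for the sem: prefix

doc = f"""
<sem:triple xmlns:sem="{SEM}">
  <sem:subject>http://example.org/kennedy/person12</sem:subject>
  <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
</sem:triple>
"""


def read_triple(xml_text):
    """Return (subject, predicate, object, datatype) from one sem:triple element."""
    root = ET.fromstring(xml_text.strip())
    ns = {"sem": SEM}
    obj = root.find("sem:object", ns)
    return (
        root.findtext("sem:subject", namespaces=ns).strip(),
        root.findtext("sem:predicate", namespaces=ns).strip(),
        obj.text.strip(),
        obj.get("datatype"),
    )


triple = read_triple(doc)
print(triple)
```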



Slide 19

Content, Data, and Semantics

    <SAR>
      <title>Suspicious vehicle near airport</title>
      <date>2012-11-12Z</date>
      <type>observation/surveillance</type>
      <threat>
        <type>suspicious activity</type>
        <category>suspicious vehicle</category>
      </threat>
      <location>
        <lat>37.497075</lat>
        <long>-122.363319</long>
      </location>
      <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
      <triple>
        <subject>IRIID</subject>
        <predicate>isa</predicate>
        <object>license-plate</object>
      </triple>
      <triple>
        <subject>IRIID</subject>
        <predicate>value</predicate>
        <object>ABC 123</object>
      </triple>
    </SAR>

Slide 20

Content, Data, and Semantics

[Figure: diagram of the same SAR document (title, date, type, threat, location, description) and its two triples: IRIID isa license-plate, IRIID value ABC 123]

Slide 21

RDF Values

    <http://example.org/kennedy/person4>
    "string value"^^xs:string
    "987"^^xs:double
    "2013-04-09"^^xs:date
    "bonjour"@fr
    _:blank1
    "simple"

Slide 22

Datatype Mapping

    Datatype                 SPARQL                  XQuery
    Typed Literal            "2013-04-09"^^xs:date   xs:date("2013-04-09")
    IRI                      <http://example.com>    sem:iri("http://example.com")
    Blank Node               _:blank1                sem:blank("…")
    Simple Literal           "simple"                xs:string("simple")
    Language Tagged Literal  "bonjour"@fr            rdf:langString("bonjour", "fr")
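The five categories in the table can be told apart by their surface syntax. The following Python sketch classifies a term written in the SPARQL notation used above; the function name `classify` and the regular expressions are illustrative assumptions and cover only these simple forms (no escape sequences, no prefixed names other than the datatype).

```python
import re

# Order matters: the typed and language-tagged forms are tried before the
# plain quoted form.
PATTERNS = [
    ("IRI", re.compile(r'^<([^<>\s]+)>$')),
    ("Blank Node", re.compile(r'^_:(\w+)$')),
    ("Typed Literal", re.compile(r'^"([^"]*)"\^\^(\S+)$')),
    ("Language Tagged Literal",
     re.compile(r'^"([^"]*)"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)$')),
    ("Simple Literal", re.compile(r'^"([^"]*)"$')),
]


def classify(term):
    """Return (category, captured parts) for a term, or raise ValueError."""
    for name, pattern in PATTERNS:
        m = pattern.match(term)
        if m:
            return name, m.groups()
    raise ValueError(f"unrecognized term: {term!r}")


for t in ['<http://example.com>', '_:blank1', '"2013-04-09"^^xs:date',
          '"bonjour"@fr', '"simple"']:
    print(classify(t))
```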

Slide 23

SPARQL

- Executed using the triple index
- SPARQL 1.0 + much of SPARQL 1.1
- Cost-based optimization
- Join ordering and algorithms

    select * where {
      ?person :birth-place ?place;
              :first-name "John"
    }
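Join ordering matters because the two triple patterns in the query above can be evaluated in either order. As a rough illustration only (the slides do not describe MarkLogic's cost model), the Python sketch below uses a greedy heuristic: count how many triples each pattern matches and evaluate the most selective one first. The function names and the sample data are assumptions.

```python
def count_matches(pattern, triples):
    """Count triples matching a pattern; terms starting with '?' are variables."""
    return sum(
        all(term.startswith("?") or term == value
            for term, value in zip(pattern, triple))
        for triple in triples
    )


def order_patterns(patterns, triples):
    """Greedy heuristic: put the pattern with the fewest matches first."""
    return sorted(patterns, key=lambda p: count_matches(p, triples))


triples = [
    (":person4", ":birth-place", ":place5"),
    (":person5", ":birth-place", ":place5"),
    (":person22", ":birth-place", ":place13"),
    (":person4", ":first-name", '"John"'),
]
patterns = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", '"John"'),
]
ordered = order_patterns(patterns, triples)
print(ordered[0])
```

Starting from the `:first-name "John"` pattern leaves one candidate `?person` to look up, instead of three birth-place rows to filter afterwards.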

Slide 24

Executing SPARQL

    sem:sparql("
      prefix : <http://example.org/kennedy/>
      select * {
        ?person :first-name ?first;
                :last-name ?last;
                :alma-mater [:ivy-league :true]
      }",
      map:entry("first", "John"),
      (),
      cts:collection-query("mycollection")
    )

Slide 25

Returning Binding Solutions

    select * where {
      ?person :birth-place :place5
    }

    select * where {
      ?person :birth-place ?place;
              :first-name "John"
    }

Slide 26

Solution Results

    person      place
    :person22   :place13
    :person4    :place5
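To show how a table like this arises, here is a minimal Python sketch of basic graph pattern matching: each triple pattern is matched against the data, and compatible variable bindings are joined. It is a naive nested-loop evaluation for illustration, not how MarkLogic executes SPARQL, and the sample triples (including :person5 and "Mary") are invented to be consistent with the results above.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Join the patterns left to right, returning a list of variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


triples = [
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", '"John"'),
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", '"John"'),
    (":person5", ":birth-place", ":place5"),
    (":person5", ":first-name", '"Mary"'),
]
query = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", '"John"'),
]
for row in solve(query, triples):
    print(row["?person"], row["?place"])
```

The shared variable `?person` is what turns two independent pattern matches into a join: a binding survives only if both patterns agree on its value.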

Slide 27

SPARQL Query Results XML Format

    sem:query-results-serialize(
      sem:sparql("select * { … }"),
      "xml"
    )
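For orientation, the W3C SPARQL Query Results XML Format wraps each solution in a `result` element with one `binding` per variable. The Python sketch below builds such a document from a list of bindings. It illustrates the format only; it is not the output of the serialize call above, which the slides do not show. The function name `to_results_xml` and the full IRIs are assumptions, and for brevity every value is treated as an IRI.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/sparql-results#"


def to_results_xml(variables, solutions):
    """Serialize bindings whose values are all IRIs to SPARQL Results XML."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}sparql")
    head = ET.SubElement(root, f"{{{NS}}}head")
    for name in variables:
        ET.SubElement(head, f"{{{NS}}}variable", name=name)
    results = ET.SubElement(root, f"{{{NS}}}results")
    for solution in solutions:
        result = ET.SubElement(results, f"{{{NS}}}result")
        for name in variables:
            if name in solution:  # unbound variables are simply omitted
                binding = ET.SubElement(result, f"{{{NS}}}binding", name=name)
                ET.SubElement(binding, f"{{{NS}}}uri").text = solution[name]
    return ET.tostring(root, encoding="unicode")


xml_text = to_results_xml(
    ["person", "place"],
    [{"person": "http://example.org/kennedy/person4",
      "place": "http://example.org/kennedy/place5"}],
)
print(xml_text)
```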

Slide 28

Returning Triples

    describe :person4

    construct {
      ?bp :uses-name ?fn
    }
    where {
      ?person :birth-place ?bp;
              :first-name ?fn
    }

Slide 29

Triple Results

    :place0 :uses-name "Ethel", "Jeffrey", "Kara" .
    :place1 :uses-name "Edward", "James" .
    :place10 :uses-name "Robert", "Sheila", "Stephen" .
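Output of this shape is what a CONSTRUCT template yields once each solution is substituted into it and the resulting triples are written with a shared subject. A rough Python sketch of that substitution step follows. The input bindings are invented for illustration (only :place1, "Edward", and "James" come from the results above), and `construct` is a hypothetical helper, not a MarkLogic function.

```python
from collections import defaultdict


def construct(template, solutions):
    """Instantiate an (s, p, o) template once per solution; duplicates collapse."""
    return {
        tuple(solution.get(term, term) for term in template)
        for solution in solutions
    }


# Bindings as a WHERE clause might return them (illustrative values).
solutions = [
    {"?person": ":personA", "?bp": ":place1", "?fn": '"Edward"'},
    {"?person": ":personB", "?bp": ":place1", "?fn": '"James"'},
    {"?person": ":personC", "?bp": ":place1", "?fn": '"James"'},
]
triples = construct(("?bp", ":uses-name", "?fn"), solutions)

# Group objects by subject and predicate, in the comma-separated style above.
grouped = defaultdict(list)
for s, p, o in sorted(triples):
    grouped[(s, p)].append(o)
for (s, p), objects in grouped.items():
    print(s, p, ", ".join(objects), ".")
```

Because the result is a set of triples, two people named "James" born in the same place contribute a single `:uses-name "James"` triple.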

Slide 30

Querying Named Graphs

    select * from <http://my_graph> where { ?s ?p ?o }

    select * where {
      graph <http://my_graph> {
        ?s ?p ?o
      }
    }
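Both forms restrict matching to triples stored in one graph. One way to picture this is to store quads, that is, triples tagged with a graph name, and filter on the fourth component. The Python sketch below does exactly that; the second graph name and all of the data are made up for illustration.

```python
quads = [
    (":person4", ":first-name", '"John"', "http://my_graph"),
    (":person4", ":birth-place", ":place5", "http://my_graph"),
    (":person5", ":first-name", '"Mary"', "http://other_graph"),
]


def triples_in_graph(quads, graph):
    """Return the (s, p, o) triples that belong to the named graph."""
    return [(s, p, o) for (s, p, o, g) in quads if g == graph]


for triple in triples_in_graph(quads, "http://my_graph"):
    print(*triple)
```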

Slide 31

Restricting The Datasets

    let $options := "properties"
    let $query := cts:and-query((
      cts:directory-query("/triples/"),
      cts:element-range-query(xs:QName("date"), ">", $date)
    ))
    return sem:sparql("…", (), (), $options, $query)

Slide 32

Creating Triples

Returning sem:triple values
- sem:triple()
- sem:rdf-parse()
- sem:rdf-get()
- sem:rdf-builder()

Inserting to a database
- sem:rdf-load()
- sem:rdf-insert()

Slide 33

Graph Store API

    declare function graph-insert(
      $graphname as sem:iri,
      $triples as sem:triple*,
      [$permissions as element(sec:permission)*,
       $collections as xs:string*,
       $quality as xs:int?,
       $forest-ids as xs:unsignedLong*]
    ) as xs:string*;

    declare function graph-delete(
      $graphname as sem:iri
    ) as empty-sequence();


Slide 34

Conclusion

- Semantics can enhance your data-oriented and search applications.
- XQuery and SPARQL work well together.
- A combined RDF and XML database simplifies working with the technologies together.
- Try MarkLogic 7: http://www.marklogic.com/early-access/


Slide 35

Any Questions?