
Oct 21, 2013

SW-STORE: A VERTICALLY PARTITIONED DBMS FOR SEMANTIC WEB DATA MANAGEMENT













Surabhi Mithal
Nipun Garg

Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. The VLDB Journal.


Group 4

Surabhi Mithal 4282643

Nipun Garg 4282567

http://www-users.cs.umn.edu/~smithal/

OUTLINE


Introduction to Semantic Web


Motivation


Problem Statement


Challenges


Major Contributions


Related Work


Key Concepts


Assumptions


Validation Methodology


Results


Improvements







INTRODUCTION TO SEMANTIC WEB: AN EXAMPLE

Simplified bookstore data (dataset “A”)

ISBN | Author | Title | Publisher | Year
0006511409X | id_xyz | The Glass Palace | id_qpr | 2000

ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

ID | Publisher’s name | City
id_qpr | Harper Collins | London

Source: http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/
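To make the step from tables to RDF concrete, here is a minimal Python sketch that turns the three dataset “A” tables into (subject, property, object) triples. The URI prefix `http://example.org/isbn/` and the property names `a:title`, `a:publisher`, `a:year`, `a:p_name` and `a:city` are hypothetical; only `a:author`, `a:name` and `a:homepage` appear on the slides.

```python
# Dataset "A" rows, copied from the three tables above.
books = [{"ISBN": "0006511409X", "Author": "id_xyz", "Title": "The Glass Palace",
          "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Homepage": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "Publisher's name": "Harper Collins", "City": "London"}]

# Hypothetical URI prefix standing in for the elided "http://…isbn/".
ISBN_PREFIX = "http://example.org/isbn/"


def to_triples():
    """Turn every non-key cell into one (subject, property, object) triple.

    Foreign-key cells (id_xyz, id_qpr) become links to other subjects, which is
    what turns the three tables into one connected graph.
    """
    triples = []
    for b in books:
        s = ISBN_PREFIX + b["ISBN"]
        triples.append((s, "a:author", b["Author"]))
        triples.append((s, "a:title", b["Title"]))         # hypothetical property
        triples.append((s, "a:publisher", b["Publisher"]))  # hypothetical property
        triples.append((s, "a:year", b["Year"]))            # hypothetical property
    for a in authors:
        triples.append((a["ID"], "a:name", a["Name"]))
        triples.append((a["ID"], "a:homepage", a["Homepage"]))
    for p in publishers:
        triples.append((p["ID"], "a:p_name", p["Publisher's name"]))  # hypothetical
        triples.append((p["ID"], "a:city", p["City"]))                # hypothetical
    return triples


for triple in to_triples():
    print(triple)
```

Each table row fans out into several triples, and the schema itself disappears into the property column.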

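EXAMPLE CONT.: GRAPH REPRESENTATION

[Figure: dataset “A” as a graph. The node http://…isbn/000651409X has an a:author edge to a node carrying a:name “Ghosh, Amitav” and a:homepage http://www.amitavghosh.com; the other nodes are “The Glass Palace”, “2000”, “London” and “Harper Collins”.]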

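ANOTHER BOOKSTORE DATA (DATASET “F”)

Row | A | B | C | D
1 | ID | Titre (Title) | Traducteur (Translator) | Original
2 | ISBN 2020286682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-6511409-X
6 | ID | Auteur (Author) | |
7 | ISBN 0-00-6511409-X | $A11$ | |
10 | Nom (Name) | | |
11 | Ghosh, Amitav | | |
12 | Besse, Christianne | | |

(Rows 3-5 and 8-9 are empty. $A11$ and $A12$ refer to cells A11 and A12.)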

EXAMPLE CONT.: GRAPH REPRESENTATION

[Figure: dataset “F” as a graph, with nodes http://…isbn/000651409X and http://…isbn/2020386682, literals “Ghosh, Amitav”, “Besse, Christianne” and “Le palais des miroirs”, and edges labelled f:auteur, f:traducteur and f:nom.]

DATA INTEGRATION ACROSS THE TWO DATASETS: SEMANTIC WEB

[Figure: the dataset “F” graph (nodes http://…isbn/2020386682 and http://…isbn/000651409X; edges f:auteur, f:traducteur, f:nom) shown next to the dataset “A” graph (node http://…isbn/000651409X; edges a:author, a:name, a:homepage).]

[Figure: the same two graphs, with the node http://…isbn/000651409X marked “SAME URI”: both datasets use the same URI for the book.]

[Figure: the merged graph. The two graphs are joined at the shared node http://…isbn/000651409X, so the f:original edge from http://…isbn/2020386682 now leads to the dataset “A” data: a:author (a:name “Ghosh, Amitav”, a:homepage http://www.amitavghosh.com), “The Glass Palace”, “2000”, “London” and “Harper Collins”.]

A user of dataset “F” can now ask queries such as “give me the title of the original”.
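The integration step above can be sketched in a few lines of Python, assuming triples are plain (subject, property, object) tuples. The URI prefix, the blank-node labels `_:ghosh` and `_:besse`, and the properties `f:titre` and `a:title` are hypothetical; the slides only show f:auteur, f:traducteur, f:nom, f:original, a:author, a:name and a:homepage. The point is that merging two RDF graphs amounts to a set union in which identical URIs automatically become one node.

```python
# Hypothetical stand-in for the elided "http://…isbn/" prefix.
P = "http://example.org/isbn/"

# Dataset "F" as triples (f:titre and the blank-node labels are assumptions).
dataset_f = {
    (P + "2020386682", "f:titre", "Le palais des miroirs"),
    (P + "2020386682", "f:original", P + "000651409X"),
    (P + "2020386682", "f:auteur", "_:ghosh"),
    (P + "2020386682", "f:traducteur", "_:besse"),
    ("_:ghosh", "f:nom", "Ghosh, Amitav"),
    ("_:besse", "f:nom", "Besse, Christianne"),
}

# Dataset "A" uses the SAME URI for the English edition (a:title is an assumption).
dataset_a = {
    (P + "000651409X", "a:title", "The Glass Palace"),
    (P + "000651409X", "a:year", "2000"),
}

# Merging: a set union; equal URIs collapse into a single node.
merged = dataset_f | dataset_a


def objects(graph, subject, prop):
    """All objects o such that (subject, prop, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == prop}


def title_of_original(graph, book):
    """Follow f:original (vocabulary F), then read a:title (vocabulary A)."""
    return {title
            for original in objects(graph, book, "f:original")
            for title in objects(graph, original, "a:title")}


print(title_of_original(merged, P + "2020386682"))     # {'The Glass Palace'}
print(title_of_original(dataset_f, P + "2020386682"))  # set(): F alone cannot answer
```

The query crosses vocabularies (f:original, then a:title) only because both datasets named the book with the same URI.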


MOTIVATION

Integration and sharing of data across different applications and organizations.

The Semantic Web’s logical data model is the Resource Description Framework (RDF).

The Semantic Web has scalability and performance issues that stem from the nature of RDF data: current data management solutions for RDF scale poorly.

PROBLEM STATEMENT

Input: RDF data in the form of triples <subject, property, object>

e.g. <The Glass Palace, hasAuthor, Amitav Ghosh>

Output: An efficient storage system for RDF data.

Objective: Improve query performance for complex real-world queries.







CHALLENGES

Find all authors of books whose title contains the word “Transaction”.

On a single triple table, this requires a 5-way self-join!
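Why a triple table needs so many self-joins can be seen with Python’s built-in `sqlite3` module. This is a minimal sketch on hypothetical toy data; the property names (`type`, `title`, `author`, `name`) are assumptions, not the schema of the paper’s dataset. Every extra condition adds another copy of the same `triples` table to the join: this toy version needs four references (three self-joins), while the slide’s query over the real data needs five.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")

# Hypothetical toy data: two books, each with a type, title and author.
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("book1", "type", "Book"),
    ("book1", "title", "Transaction Processing Basics"),
    ("book1", "author", "person1"),
    ("person1", "name", "Author One"),
    ("book2", "type", "Book"),
    ("book2", "title", "The Glass Palace"),
    ("book2", "author", "person2"),
    ("person2", "name", "Ghosh, Amitav"),
])

# Each alias (ty, t, a, n) is another scan of the same triples table,
# joined on subject or on subject = object.
QUERY = """
SELECT n.obj
FROM triples AS ty
JOIN triples AS t ON t.subj = ty.subj AND t.prop = 'title'
JOIN triples AS a ON a.subj = ty.subj AND a.prop = 'author'
JOIN triples AS n ON n.subj = a.obj AND n.prop = 'name'
WHERE ty.prop = 'type' AND ty.obj = 'Book'
  AND t.obj LIKE '%Transaction%'
"""

print(conn.execute(QUERY).fetchall())  # [('Author One',)]
```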

MAJOR CONTRIBUTIONS AND NOVELTY

Introduction of a new concept: vertically partitioning RDF data and storing it in a column-oriented database, to improve performance and simplicity.

Performance evaluation of the new and existing techniques on a real-world example.

A new column-oriented database, SW-Store, is proposed based on the above approach.


RELATED WORK

PROPERTY TABLES: HP LABORATORIES - JENA

Property Clustered Tables and Property Class Tables

Approach 1 (property clustered tables): a data clustering approach.

Approach 2 (property class tables): creates clusters based on the subject’s type.

Limitations:

Accuracy of clustering algorithms.

NULLs in data.


Multivalued attributes.
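The NULL and multivalued-attribute limitations can be reproduced with a small Python sketch that packs hypothetical triples into one clustered property table (one row per subject, one column per property). This is not Jena’s clustering algorithm; the chosen column set `PROPS` and the data are assumptions made for illustration.

```python
# Hypothetical triples: subjects share some properties but not others,
# and book1 has two authors (a multivalued attribute).
triples = [
    ("book1", "title", "T1"), ("book1", "author", "A"), ("book1", "author", "B"),
    ("book2", "title", "T2"), ("book2", "year", "2000"),
    ("cd1", "title", "T3"), ("cd1", "artist", "C"),
]


def build_property_table(triples, props):
    """One row per subject, one column per clustered property (None = NULL)."""
    rows, spill = {}, []
    for s, p, o in triples:
        if p not in props:
            continue  # would belong to some other table
        row = rows.setdefault(s, dict.fromkeys(props))  # every column starts as NULL
        if row[p] is None:
            row[p] = o
        else:
            spill.append((s, p, o))  # a second value does not fit in the cell
    return rows, spill


PROPS = ("title", "author", "year", "artist")  # hypothetical cluster
table, spill = build_property_table(triples, PROPS)
nulls = sum(v is None for row in table.values() for v in row.values())
print(nulls, "NULLs out of", len(table) * len(PROPS), "cells")  # 6 NULLs out of 12 cells
print(spill)  # [('book1', 'author', 'B')]
```

Half the cells are NULL, and the second author has to be stored somewhere outside the table.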


SAMPLE DATABASE

[Figure: sample property table with too many NULLs]

Source: SW-Store: a vertically partitioned DBMS for Semantic Web data management

KEY CONCEPTS: VERTICAL PARTITIONING AND COLUMN-ORIENTED STORE

Vertically partition the data, then store the vertically partitioned data in a column-oriented database.


Vertical partitioning: one two-column (subject, object) table for each property. Advantages:

Effective handling of multivalued attributes.

Elimination of NULLs.

Fewer unions.
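Vertical partitioning itself is simple to sketch in Python. Under the assumption of hypothetical toy triples, each property gets its own two-column (subject, object) table sorted by subject (which is what allows merge joins between property tables). A multivalued attribute becomes two rows, no NULL is ever stored, and the “Transaction” query reads only the `type`, `title`, `author` and `name` tables.

```python
from collections import defaultdict

# Hypothetical toy triples; book1 has two authors.
triples = [
    ("book1", "type", "Book"),
    ("book1", "title", "Transaction Processing Basics"),
    ("book1", "author", "person1"),
    ("book1", "author", "person3"),
    ("person1", "name", "Author One"),
    ("person3", "name", "Author Three"),
    ("book2", "type", "Book"),
    ("book2", "title", "The Glass Palace"),
]


def vertically_partition(triples):
    """One two-column (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


vp = vertically_partition(triples)

# Multivalued attribute: simply two rows, no NULL placeholders.
print(vp["author"])  # [('book1', 'person1'), ('book1', 'person3')]

# "Authors of books whose title contains 'Transaction'", one table per condition.
books = {s for s, o in vp["type"] if o == "Book"}
matching = {s for s, o in vp["title"] if "Transaction" in o} & books
names = dict(vp["name"])  # toy assumption: each person has exactly one name
authors = sorted(names[a] for s, a in vp["author"] if s in matching)
print(authors)  # ['Author One', 'Author Three']
```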



Column-oriented storage. Advantages:

No wasted bandwidth, as projections on the data happen before it is pulled into main memory (only the columns a query needs are read).

Record headers are stored in separate columns, which reduces tuple width; and a different compression technique can be chosen for each column.
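Per-column compression can be illustrated with run-length encoding, one common column-store scheme; the slides do not say which schemes SW-Store uses, so this choice is an assumption. In a property table sorted by subject, a subject with several values (a multivalued attribute) forms a run, which run-length encoding stores as a single (value, count) pair. The sample column is hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical subject column of an 'author' table, sorted by subject.
subjects = ["book1", "book1", "book1", "book2", "book3", "book3"]
encoded = rle_encode(subjects)
print(encoded)  # [('book1', 3), ('book2', 1), ('book3', 2)]
assert rle_decode(encoded) == subjects
```

The object column of the same table could use a different scheme, since each column is stored and compressed independently.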





KEY CONCEPTS: SW-STORE

SW-Store is a column-oriented DBMS optimized for storing RDF.

Single-column table for subjects.

Representing sparse data

Overflow tables
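How overflow tables interact with queries and batch updates can be modelled in a short Python sketch. This is a hypothetical model, not SW-Store’s code: the class `OverflowStore`, the `merge_threshold` parameter and the merge policy are assumptions. New triples land in a triple-format overflow table, every query must read both the partitioned tables and the overflow table, and a batch merge moves the overflow into the per-property tables.

```python
from collections import defaultdict


class OverflowStore:
    """Hypothetical model: per-property tables plus an overflow triple table."""

    def __init__(self, merge_threshold=3):
        self.partitions = defaultdict(list)     # property -> sorted (subj, obj) rows
        self.overflow = []                      # new triples, kept in triple form
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.merges = 0

    def insert(self, s, p, o):
        self.overflow.append((s, p, o))
        if len(self.overflow) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Batch operation: move overflow triples into the per-property tables."""
        for s, p, o in self.overflow:
            self.partitions[p].append((s, o))
        for p in self.partitions:
            self.partitions[p].sort()
        self.overflow.clear()
        self.merges += 1

    def objects(self, s, p):
        """A query must read the partitioned table AND the overflow table."""
        hits = [o for subj, o in self.partitions.get(p, []) if subj == s]
        hits += [o for subj, prop, o in self.overflow if subj == s and prop == p]
        return hits


store = OverflowStore(merge_threshold=3)
store.insert("book1", "author", "person1")
store.insert("book1", "title", "The Glass Palace")
print(store.objects("book1", "author"), store.merges)  # ['person1'] 0
store.insert("book1", "author", "person3")              # third insert triggers a merge
print(store.objects("book1", "author"), store.merges)  # ['person1', 'person3'] 1
```

With a high insert rate the threshold is reached more often, so merges (the expensive batch step) run more frequently.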

ASSUMPTIONS

Postgres is assumed to be the best available choice of row-oriented RDBMS because it handles NULLs effectively.

Queries that do not restrict the property value are very rare in RDF applications.

Only a moderate amount of inserts/updates is made to the RDF store.

Critique of assumption: limited inserts/updates

If the overflow tables fill up rapidly, the batch operation that updates the column-oriented store runs more often, degrading overall performance.

VALIDATION METHODOLOGY

Barton Libraries dataset, provided by the Simile Project at MIT (http://simile.mit.edu/rdf-test-data/barton).

The benchmark is a set of 7 queries based on a browsing session in Longwell, a UI built by the Simile group for querying the library dataset. These queries are executed on:

Triple store (a single subject, property, object table on Postgres, with no further optimizations).

Property tables (on Postgres).

Vertically partitioned data in a row-oriented store (Postgres).

Vertically partitioned data in a column-oriented store (C-Store).



Strengths:

Real-world data and query scenarios.

Comparison of the proposed technique against all the existing techniques.

Weaknesses:

Queries involving the unrestricted-property problem are avoided, even though this problem is particularly costly for vertically partitioned schemes.

Accuracy of clustering for property tables.

Performance may differ with different underlying databases.
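The unrestricted-property weakness can be illustrated with a short Python sketch over hypothetical vertically partitioned tables. When the property is known, one table is read; when the property is a variable (e.g., “everything known about book1”), every property table must be scanned and the results unioned, so the cost grows with the number of properties in the dataset. The table contents are assumptions.

```python
# Hypothetical vertically partitioned tables: property -> (subject, object) rows.
vp = {
    "title":  [("book1", "The Glass Palace")],
    "author": [("book1", "person1")],
    "year":   [("book1", "2000")],
    "name":   [("person1", "Ghosh, Amitav")],
}


def restricted(vp, prop, subject):
    """Property is known: read exactly one table."""
    return [o for s, o in vp.get(prop, []) if s == subject]


def unrestricted(vp, subject):
    """Property is a variable: scan every table and union the results."""
    results, tables_scanned = [], 0
    for prop, rows in vp.items():
        tables_scanned += 1
        results += [(prop, o) for s, o in rows if s == subject]
    return results, tables_scanned


print(restricted(vp, "year", "book1"))  # ['2000']
print(unrestricted(vp, "book1"))
# ([('title', 'The Glass Palace'), ('author', 'person1'), ('year', '2000')], 4)
```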




RESULTS

The results show that the proposed storage scheme outperforms the existing methods in query time.





IMPROVEMENTS

SPATIAL PERSPECTIVE

Schema design: queries are run on the vertically partitioned tables as well as the overflow tables. Because spatial data is heavy, a spatial index such as an R*-tree or a grid should be used to make these queries faster.
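Of the two suggested indexes, the grid is the simpler one to sketch. A minimal Python sketch of a uniform grid index, assuming each spatial subject (e.g., a landmark) is a single (x, y) point; `GridIndex`, `cell_size` and the sample coordinates are hypothetical. A range query visits only the cells that overlap the query box and then filters exactly. An R*-tree adapts better to skewed data but is too long to show here.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid: each point is bucketed by the cell that contains it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(subject, x, y)]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, subject, x, y):
        self.cells[self._cell(x, y)].append((subject, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Visit only cells overlapping the query box, then filter exactly."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for s, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(s)
        return sorted(hits)


idx = GridIndex(cell_size=10.0)
idx.insert("landmark:a", 3.0, 4.0)
idx.insert("landmark:b", 12.0, 15.0)
idx.insert("landmark:c", 55.0, 60.0)
print(idx.range_query(0.0, 0.0, 20.0, 20.0))  # ['landmark:a', 'landmark:b']
```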


Restrictive nature: spatial queries are not restricted to specific “properties”, yet restriction on properties is an important assumption of the paper.

E.g., landmarks.

Tables should be partitioned in a better way than just one property per table!

E.g., grouping similar properties together based on domain knowledge.