Oct 21, 2013

Efficient Processing of Semantic Information on the Web


Georg Lausen

Technische Fakultät

Universität Freiburg


The amount of information available on the Web is still increasing rapidly.

(Semi-)Automatic Data Extraction.

Resource Description Framework (RDF).

SPARQL is the standard query language for RDF.

Efficiency and Scalability of query processing.
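
To make the RDF and SPARQL items above concrete, here is a minimal sketch in plain Python. It models an RDF graph as a set of (subject, predicate, object) triples and evaluates a basic graph pattern, the core of a SPARQL query, by naive nested matching. The `ex:` names, the data, and the helper functions `match_pattern` and `evaluate_bgp` are illustrative assumptions, not part of the talk or of any RDF library.

```python
# Toy RDF graph: a set of (subject, predicate, object) triples.
# All identifiers and data here are made up for illustration.
TRIPLES = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", '"Alice"'),
}


def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`.

    Returns the extended binding, or None if they do not match.
    """
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.get(p, t) != t:  # variable already bound to another value
                return None
            new[p] = t
        elif p != t:  # constant term must match exactly
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Evaluate a basic graph pattern (a list of triple patterns)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                ext = match_pattern(pattern, t, b)
                if ext is not None:
                    extended.append(ext)
        bindings = extended
    return bindings


# Roughly: SELECT * WHERE { ?x ex:knows ?y . ?y ex:knows ?z }
result = evaluate_bgp(
    [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")], TRIPLES
)
```

Each pattern here scans every triple for every partial binding, which is exactly the cost that indexes and parallel join processing are meant to avoid on Web-scale data.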

Processing of Semantic Information on the Web

Efficiency and Scalability: A Variety of Approaches


Single machine RDF stores

Parallel Database Approach: Vertica and others

Approaches based on Hadoop (MapReduce Paradigm)

    Hadoop

    Hadoop++

    Integration of databases: HadoopDB

    Language translation

        Mapping SPARQL to Hadoop/HBase directly

        Mapping SPARQL to Pig Latin

Non-Hadoop clusters

Cluster-based Parallelism vs Parallel Database/Single Machine RDF-Store


Each technology has its own advantages and problems.


Rough characterization:

                                               Querying   Loading
Parallel Database / Single Machine RDF-Store      +          -
Cluster-based Parallelism                         -          +

Loading in the context of Web research: Extract-Transform-Load (ETL) schema.


SPARQL provides a declarative way of specifying both the transformation and the querying.
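
The Extract-Transform-Load schema can be sketched as three small Python stages. This is an illustrative sketch under stated assumptions: the document structure, the `raw:` and `ex:` predicate names, and the functions `extract`, `transform`, and `load` are all hypothetical. The transform step is written imperatively here; in SPARQL the same rule could be stated declaratively, roughly as `CONSTRUCT { ?t ex:linkedFrom ?s } WHERE { ?s raw:linksTo ?t }`.

```python
from collections import defaultdict

# Toy "Web documents"; structure and URLs are made up for illustration.
DOCUMENTS = [
    {"url": "http://example.org/a", "title": "Page A",
     "links": ["http://example.org/b"]},
    {"url": "http://example.org/b", "title": "Page B", "links": []},
]


def extract(documents):
    """Extract: turn raw documents into an initial RDF-like triple stream."""
    for doc in documents:
        yield (doc["url"], "raw:title", doc["title"])
        for target in doc["links"]:
            yield (doc["url"], "raw:linksTo", target)


def transform(triples):
    """Transform: rewrite the initial triples into the target vocabulary."""
    for s, p, o in triples:
        if p == "raw:linksTo":
            yield (o, "ex:linkedFrom", s)   # invert the link direction
        elif p == "raw:title":
            yield (s, "ex:title", o)        # rename the predicate


def load(triples):
    """Load: index (subject, object) pairs by predicate for later querying."""
    store = defaultdict(set)
    for s, p, o in triples:
        store[p].add((s, o))
    return store


store = load(transform(extract(DOCUMENTS)))
```

Because each stage is a generator over triples, the stages can be chained without materializing the intermediate graph, which mirrors why loading is a natural fit for cluster-based, streaming-style processing.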

ETL and Querying in the context of Web research

[Diagram: Web documents, Initial RDF graph, RDF store; steps labeled E, T, L; annotations: Efficient Loading, Efficient querying, SPARQL]

PigSPARQL: Mapping SPARQL to Pig Latin; to appear in Semantic Web Information Management (SWIM 2011)