Effective querying of web-distributed ontology repositories

Proceedings of the 1st International Conference on Emerging Databases (EDB2009)



Roshan Maharjan
Department of Computer Engineering
Kyung Hee University
Yongin, Gyeonggi, Korea, 446-701
E-mail: roshan@oslab.khu.ac.kr

Young-Koo Lee, Sung Young Lee
Department of Computer Engineering
Kyung Hee University
Yongin, Gyeonggi, Korea, 446-701
E-mail: yklee@khu.ac.kr, sylee@oslab.khu.ac.kr





Abstract


The ontology data in Semantic Web applications is distributed in nature, located in multiple data sources. Integrated access to these data, including storing and querying, is essential in distributed environments. Popular ontology repositories such as Jena, Sesame and YARS are in the early stages of development with respect to storing and querying distributed ontology repositories. They still lack reasoning capabilities over distributed data, which results in poor effectiveness. In this paper, we present a method for effective querying of web-distributed ontology repositories using the SPARQL protocol. Taking reasoning capabilities for OWL-Lite into consideration, we describe how completeness of results can be achieved by careful formulation of ontology data from different repositories. We also present a detailed analysis of the reasoning strategy and some optimization techniques for achieving complete results while querying distributed repositories.


I. INTRODUCTION

Ontologies are formal representations of real-world concepts and the relationships among them. To provide semantics to web resources, instances of such concepts and their relationships are used to annotate them. Recently, ontologies have also been accepted as an effective solution for representing ubiquitous contexts. W3C recommendations for knowledge representation and exchange are based on the Resource Description Framework (RDF) [8]. RDF represents an ontology in the form of subject-predicate-object triples. RDF is the basic data model for knowledge representation, and RDFSchema extends RDF with more vocabulary. Even with RDFSchema, the expressiveness of RDF is limited. Still, it provides a good foundation for interchanging data and for layering true Semantic Web languages on top of it.

Recent systems use OWL, a W3C standard ontology definition language for the web. OWL adds more vocabulary and rules, providing more expressiveness than RDFSchema. It is divided into three sub-languages, OWL-Lite, OWL-DL and OWL-Full, according to their expressiveness. SPARQL [14] is a query language for ontology data represented in RDF. It provides facilities to extract information in the form of resources, blank nodes or literals, and to extract subgraphs based on queries.

Since ontology data in Semantic Web applications are scattered over different sources, information integration is a problem. Reasoning means deriving facts that are not expressed explicitly in the ontology or knowledge base. Information integration becomes harder when reasoning about the vocabularies of ontology data is considered, since results (combinations of triples) from one repository affect the evaluation of results from another repository; together they form the distributed knowledge. This leads to incomplete results when reasoning is taken into account with richer vocabularies such as OWL-Lite. The problem is illustrated by Figures 3 and 4 in Section III. In this paper we present a method for achieving completeness of results by careful formulation of triples from different distributed repositories.

The paper is organized as follows. Section II briefly reviews related work. Section III presents the problem statement and a motivating example. Section IV explains the preliminaries of the OWL data model. The requirements for reasoning completeness and the query answering method are described in Sections V and VI, respectively. Section VII illustrates the proposed method with experimental results on a popular dataset. Finally, conclusions and future enhancements are presented in Section VIII.


II. RELATED WORKS

Although data distribution has long been a research topic in the field of database systems, systems such as Sesame and Jena are in the early stages of development in terms of distributing ontology data across repositories. Stuckenschmidt et al. [13] describe the development of a storage and query infrastructure on top of Sesame. Their system extends Sesame with a mediator component that provides centralized access to a collection of local and remote data sources. They proposed index structures and algorithms for optimized querying of distributed repositories without having to retrieve and store the data in a single model. Several of the query algorithms and optimizations were derived from approaches and techniques in the database field. However, their mechanism does not focus on reasoning capabilities, and inference completeness is not achieved. Quilitz et al. [11] presented DARQ, an engine for distributed SPARQL queries. It provides transparent query access to multiple SPARQL endpoints. It decomposes a query into sub-queries, each of which can be answered at an individual repository. It also uses query rewriting and cost-based query optimization to speed up query execution. Harth et al. [1] studied distributed index methods for graph-structured data and parallel query evaluation methods on a cluster of computers. They focused on data scalability and devised distributed hash tables as data structures, optimized network transfer and multi-threaded query processing to achieve acceptable query performance on large datasets in a distributed system. Nevertheless, reasoning is still an open issue there, and for large distributed repositories accurate and complete query answering is an important aspect.


Georgia et al. [6] proposed a mechanism that facilitates querying of distributed RDF repositories using the SPARQL protocol. They claim to retrieve data by exploiting knowledge of reasoning capabilities but do not present any detailed analysis of vocabularies. Their work still lacks support for much of the important expressive vocabulary of OWL, its requirements for reasoning completeness, and a description of how reasoning is performed. We propose to implement RDFSchema concepts and OWL-Lite features to achieve more complete results. We also present some analysis of the overheads and optimization of query answering.


III. PROBLEM STATEMENT

As stated earlier, reasoning means deriving facts that are not expressed explicitly in the ontology or knowledge base. OWL consists of different vocabularies, such as RDFSchema features (Class, rdfs:subClassOf, Property, rdfs:subPropertyOf, etc.), OWL equality and inequality features (equivalentClass, equivalentProperty, etc.), OWL property characteristics (inverseOf, TransitiveProperty, SymmetricProperty) and others [15]. When reasoning about the vocabularies of ontology data is considered, information integration is more complicated because results (combinations of triples) from one repository affect the evaluation of results from another repository; together they form the distributed knowledge. This leads to incomplete results when reasoning is taken into account with richer vocabularies such as OWL-Lite. In other words, when we collect the inferred statements from each individual repository and merely combine the results, some results are left out. Using reasoning analysis, we therefore define the requirements for obtaining complete results for the OWL-Lite vocabulary in Section V.

Let us illustrate the problem with a motivating example based on Figures 3 and 4. When we query for the super classes of sub11 in Figure 3 by evaluating the query against each repository separately and merging the results, we get only Obj11 and Obj22. However, by carefully formulating the triples in each distributed repository that take part in the final results and by meeting the requirements for reasoning completeness, we actually find four results: Obj11, Obj21, Obj31 and Obj32. As Figure 4 shows, our query answering method obtains the complete set of results. We try to cover all OWL-Lite vocabularies through a detailed study of their requirements for reasoning completeness.
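The effect described above can be reproduced with a minimal Python sketch. The three repositories and their triples below are hypothetical stand-ins (they are not the data behind Figures 3 and 4), and only the transitivity of rdfs:subClassOf is modelled. The sketch compares reasoning inside each repository and merging the answers with merging the triples first and reasoning afterwards.

```python
from itertools import chain

SUBCLASS = "rdfs:subClassOf"

def superclasses(triples, cls):
    """Return every class reachable from cls through rdfs:subClassOf edges."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {o for (s, p, o) in triples
                    if p == SUBCLASS and s in frontier} - found
        found |= frontier
    return found

# Hypothetical repository contents, chosen only for illustration.
repo1 = {("sub11", SUBCLASS, "Obj11"), ("sub11", SUBCLASS, "Obj21")}
repo2 = {("Obj21", SUBCLASS, "Obj31")}
repo3 = {("Obj31", SUBCLASS, "Obj32")}
repos = [repo1, repo2, repo3]

# Strategy 1: reason inside each repository, then merge the answers.
per_repository = set(chain.from_iterable(superclasses(r, "sub11") for r in repos))

# Strategy 2: merge the triples first, then reason over the union.
merged_first = superclasses(set().union(*repos), "sub11")

print(sorted(per_repository))
print(sorted(merged_first))
```

With these assumed triples, the first strategy misses every super class whose derivation chain crosses a repository boundary, which is the incompleteness the example describes.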



Figure 3: Example of objects retrieved during query answering from three different repositories.

Figure 4: Example of objects retrieved after formulation of the necessary data for evaluation and after introducing reasoning/inferencing capabilities.



IV. OWL DATA MODEL

RDF, the Resource Description Framework, is the first ontology definition language; it defines a model for describing relationships among concepts and provides a data model for describing and manipulating metadata about web resources. It provides a simple tuple model of subject, predicate and object, written as a statement <S, P, O>, to express knowledge.







Figure 1: OWL data model example


RDFSchema is an extension of RDF that provides additional vocabulary for defining classes, subclasses, and range and domain relations. RDF together with RDFSchema can be used to define simple ontologies in the Semantic Web. These statements can be expressed in XML together with XML namespaces; one such format is the RDF/XML notation.

The Web Ontology Language (OWL) [12] was developed after RDFSchema, allows for more complex ontologies, and is now becoming the standard. OWL supports all RDFSchema concepts and adds restrictions such as property cardinality and class intersections/unions of other classes. Figure 1 shows an example of the OWL data model. As suggested by Hstar [4], we can formalize the data model of OWL as shown in Figure 2. It summarizes most of the characteristics of the OWL-Lite vocabulary. Based on this model, we can analyze reasoning completeness, that is, the completeness of results.

This data model divides OWL data into three categories. The first consists of $C$, $P$ and $I$, which respectively represent OWL classes, OWL properties and individual resources. The second consists of $R_C$, $R_P$, $R_I$, $R_{CP}$ and $R_{CI}$, which respectively represent relations among elements of $C$, relations among elements of $P$, relations among elements of $I$, relations between elements of $C$ and elements of $P$, and relations between elements of $C$ and elements of $I$. The last is $T_P$, which represents the characteristics defined on OWL properties, including transitive ($P_T$), symmetric ($P_S$), functional ($P_F$) and inverse functional ($P_{IF}$). $C$, $P$, $R_C$, $R_P$, $R_{CP}$ and $T_P$ are used to define the ontology. Most OWL data fall into $R_I$, which uses the ontology to describe the type and relation information of elements in $I$. Completeness of inference includes two aspects: one is to get the complete relations $R_C$ and $R_P$; the other is to get the complete relations $R_I$ and $R_{CI}$. The former represents the complete ontology and the latter represents the complete ontology instances.


V. REQUIREMENTS FOR REASONING COMPLETENESS

In Section IV, we discussed the OWL data model and the relationships that occur between classes, properties and instances. Following that discussion, we analyze the requirements for reasoning completeness for all the relationships found in the OWL-Lite vocabulary. Our goal is to obtain the complete relationships $R_C$, $R_P$, $R_I$, $R_{CP}$ and $R_{CI}$. We discuss these relationships in turn.

Basically, three categories of OWL-Lite vocabulary lead to incompleteness: relationships between classes, relationships between properties and relationships between instances. We have already seen an example in which the subClassOf relation between classes causes incompleteness. For the relationship $R_C$ to be complete, the equivalentClass and subClassOf relations must be computed completely. The relationships $R_P$, $R_I$, $R_{CP}$ and $R_{CI}$ can be analyzed similarly.



Reasoning completeness of $R_C$, $R_P$


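Elements in $R_C$ are characterized by the equivalence relation $C_i \equiv C_j$ and the inheritance relation $C_i \sqsupseteq C_j$, where $C_i \sqsupseteq C_j$ denotes that $C_j$ is a subclass of $C_i$. Inheritance is transitive, while equivalence is both transitive and symmetric. The complete relationship $R_C$ is therefore obtained when the following requirements are satisfied:

$\forall C_i \in C$, we can get all $\{C_j \mid C_j \in C \wedge C_i \equiv C_j\}$
$\forall C_i \in C$, we can get all $\{C_j \mid C_j \in C \wedge C_j \sqsupseteq C_i\}$
$\forall C_i \in C$, we can get all $\{C_j \mid C_j \in C \wedge C_i \sqsupseteq C_j\}$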

Figure 2: OWL-Lite data model



From the first requirement, we can get all explicit and implicit data that satisfy the equivalence relationship $C_i \equiv C_j$. The other two requirements concern the inheritance relationship. For the inheritance relation $C_i \sqsupseteq C_j$ ($C_j$ is a subclass of $C_i$), transitivity gives $C_i \sqsupseteq C_j \sqsupseteq C_k \Rightarrow C_i \sqsupseteq C_k$. The inheritance and equivalence relations defined in $R_P$ can be obtained in the same way as those in $R_C$. In addition, for the relation $\{P_i \leftrightarrow P_j \mid \langle P_i\ \text{owl:inverseOf}\ P_j \rangle \in D\}$ in $R_P$, we can get data using the rule

$\langle P_i\ \text{owl:inverseOf}\ P_j \rangle \wedge \langle URI_x\ P_i\ URI_y \rangle \Rightarrow \langle URI_y\ P_j\ URI_x \rangle$

That is, when two properties $P_i$ and $P_j$ are defined as inverses of each other in the OWL-Lite vocabulary and any $URI_x$ has property $P_i$ with $URI_y$, then $URI_y$ has property $P_j$ with $URI_x$.
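The inverse-property rule can be sketched in a few lines of Python. This is an illustrative sketch only: triples are modelled as plain tuples, and the `ex:` property and resource names are hypothetical.

```python
INVERSE_OF = "owl:inverseOf"

def apply_inverse_of(triples):
    """Add <y Pj x> for every <x Pi y> where Pi and Pj are declared inverses."""
    inverse = {}
    for s, p, o in triples:
        if p == INVERSE_OF:
            # owl:inverseOf holds in both directions, so record both.
            inverse.setdefault(s, set()).add(o)
            inverse.setdefault(o, set()).add(s)
    inferred = {(o, q, s)
                for (s, p, o) in triples
                for q in inverse.get(p, ())}
    return set(triples) | inferred

data = {
    ("ex:hasMember", INVERSE_OF, "ex:memberOf"),  # hypothetical schema triple
    ("ex:dept0", "ex:hasMember", "ex:alice"),     # hypothetical instance triple
}
closed = apply_inverse_of(data)
print(("ex:alice", "ex:memberOf", "ex:dept0") in closed)
```

Note that the schema triple and the instance triple may live in different repositories, in which case the rule can only fire once both have been brought together.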








Reasoning completeness of $R_I$

The relationship $R_I$ is affected by the two sub-relations defined below.

$R_{I1} = \{[URI_i, URI_j] \mid \exists \langle URI_i\ P_x\ URI_j \rangle \in D\}$
$R_{I2} = \{URI_i \equiv URI_j \mid \exists \langle URI_i\ \text{owl:sameAs}\ URI_j \rangle \in D\}$


Like the equivalence relations in $R_C$ and $R_P$, $R_{I2}$ is an equivalence relation, and it affects $R_{I1}$.


We can also compute the equivalence relation among instances from the functional properties ($P_F$) and inverse functional properties ($P_{IF}$) by using the following rules.

$\langle URI_i\ P_x\ URI_k \rangle \wedge \langle URI_j\ P_x\ URI_k \rangle \wedge P_x \in P_{IF} \Rightarrow URI_i \equiv URI_j$
$\langle URI_k\ P_x\ URI_i \rangle \wedge \langle URI_k\ P_x\ URI_j \rangle \wedge P_x \in P_F \Rightarrow URI_i \equiv URI_j$
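As a rough illustration of these two rules, the following Python sketch groups triples by (subject, property) for functional properties and by (property, object) for inverse functional properties, and emits an owl:sameAs pair for every two members of a group. The function name, the way the property sets are passed in, and all `ex:` names are assumptions made for the example.

```python
from itertools import combinations

def same_as_pairs(triples, functional=(), inverse_functional=()):
    """Derive owl:sameAs pairs from functional / inverse functional properties."""
    by_subject, by_object = {}, {}
    for s, p, o in triples:
        if p in functional:
            # Same subject and functional property: the objects must coincide.
            by_subject.setdefault((s, p), set()).add(o)
        if p in inverse_functional:
            # Same object and inverse functional property: the subjects coincide.
            by_object.setdefault((p, o), set()).add(s)
    pairs = set()
    for group in list(by_subject.values()) + list(by_object.values()):
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, "owl:sameAs", b))
    return pairs

triples = {                                    # hypothetical data
    ("ex:a", "ex:email", "mailto:x@example.org"),
    ("ex:b", "ex:email", "mailto:x@example.org"),
    ("ex:c", "ex:hasMother", "ex:m1"),
    ("ex:c", "ex:hasMother", "ex:m2"),
}
pairs = same_as_pairs(triples,
                      functional={"ex:hasMother"},
                      inverse_functional={"ex:email"})
print(sorted(pairs))
```

The sketch emits each pair in one direction only; symmetry and transitivity of the resulting equivalence would be handled by the corresponding rules.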


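In addition, we need to consider three other relations that affect $R_{I1}$, namely $P_T$, $P_S$ and $R_P$.

1. $P_T$ is the transitive characteristic, which can be computed in the same way as the transitive relations in $R_C$ and $R_P$.
2. $P_S$, the symmetry relation, is computed by using the rule $P_x \in P_S \wedge \langle URI_i\ P_x\ URI_j \rangle \Rightarrow \langle URI_j\ P_x\ URI_i \rangle$.
3. $R_P$, as discussed above, affects the $R_I$ relation. For instance, for the inheritance relation between properties we use the rule $P_i \sqsupseteq P_j \wedge \langle URI_x\ P_j\ URI_y \rangle \Rightarrow \langle URI_x\ P_i\ URI_y \rangle$, where $P_i \sqsupseteq P_j$ denotes that $P_j$ is a sub-property of $P_i$. We first compute the inheritance relation and then derive the instances with the rule above.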


Reasoning completeness of $R_{CI}$

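The $R_{CI}$ relation is affected by $R_C$ and $R_{CP}$, which involve class, property and instance relationships. We can compute the complete $R_{CI}$ by using the following rules, where $C_i \sqsupseteq C_j$ denotes that $C_j$ is a subclass of $C_i$.

$\langle URI_x\ \text{rdf:type}\ C_j \rangle \wedge C_i \sqsupseteq C_j \Rightarrow \langle URI_x\ \text{rdf:type}\ C_i \rangle$
$\langle P_x\ \text{rdfs:domain}\ C_y \rangle \wedge \langle URI_i\ P_x\ URI_j \rangle \Rightarrow \langle URI_i\ \text{rdf:type}\ C_y \rangle$
$\langle P_x\ \text{rdfs:range}\ C_y \rangle \wedge \langle URI_i\ P_x\ URI_j \rangle \Rightarrow \langle URI_j\ \text{rdf:type}\ C_y \rangle$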
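A minimal Python sketch of these three rules follows. It applies them repeatedly until no new rdf:type triple appears. The tuple representation and the `ex:` names are assumptions for illustration, and the quadratic scans are kept for readability rather than efficiency.

```python
def infer_types(triples):
    """Apply the three R_CI rules until no new rdf:type triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "rdf:type":
                # <x rdf:type Cj> and Cj subClassOf Ci  =>  <x rdf:type Ci>
                new |= {(s, "rdf:type", sup) for (sub, q, sup) in facts
                        if q == "rdfs:subClassOf" and sub == o}
            # <Px rdfs:domain Cy> and <x Px y>  =>  <x rdf:type Cy>
            new |= {(s, "rdf:type", c) for (prop, q, c) in facts
                    if q == "rdfs:domain" and prop == p}
            # <Px rdfs:range Cy> and <x Px y>  =>  <y rdf:type Cy>
            new |= {(o, "rdf:type", c) for (prop, q, c) in facts
                    if q == "rdfs:range" and prop == p}
        if new <= facts:
            return facts
        facts |= new

schema_and_data = {                                   # hypothetical triples
    ("ex:teaches", "rdfs:domain", "ex:Professor"),
    ("ex:teaches", "rdfs:range", "ex:Course"),
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:bob", "ex:teaches", "ex:db101"),
}
typed = infer_types(schema_and_data)
print(sorted(t for t in typed if t[1] == "rdf:type"))
```

In this example the type ex:Person for ex:bob is only reachable after the domain rule has fired, which shows why the rules have to be iterated rather than applied once.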



VI. QUERY ANSWERING


Our query answering mechanism has two distinct features. First, our method queries each distributed ontology repository and collects all the statements (triples) that can be combined to infer new statements. Second, it performs reasoning in the mediator repository, which infers new statements using the required rules and thereby achieves reasoning completeness in the final results. After all the triples participating in the reasoning process have been accumulated, the initial query is evaluated. Each of these features is explained in detail by the algorithms below.

Sesame, unlike our system, executes parts of the initial query in each distributed repository to locate relevant information, retrieves it and combines it in the mediator repository. It relies on sophisticated index structures for deciding which part of the query to direct to which information source. Moreover, it does not focus on reasoning capabilities and therefore does not emphasize retrieving the data from the repositories that is useful for inference.

We present our query answering method according to the analysis of the requirements for reasoning completeness. Our infrastructure is based on the popular Sesame architecture. It has a mediator component that provides centralized access to a collection of local and remote sources. Some algorithms for distributed querying are derived from the Sesame architecture itself. SPARQL [14] is a query language for accessing such RDF graphs. It provides facilities to extract information in the form of URIs, blank nodes, and plain and typed literals, to extract RDF subgraphs, and to construct new RDF graphs based on information in the queried graphs.

We now analyze the different parts of a SPARQL query. Query patterns are the patterns that occur after the WHERE clause of a SPARQL query, as shown in Table 1.

To ensure that reasoning completeness is achieved, we need to retrieve all the participating triples and store them in a mediator repository. Algorithm 1 accomplishes this goal.

Here we treat each query as a simple string, and our method searches for the different fields of the query based on the SPARQL specification. Triple patterns occur after the WHERE clause of a SPARQL query and are written as a whitespace-separated list of subject, predicate and object. In this way we extract the subjects, predicates and objects of the original query and search for matching triples in each distributed repository. The algorithm collects all the triples that match any of the subjects, predicates or objects of the query. This avoids feeding unnecessary triples into query answering, since the retrieval process admits only triples that satisfy these term restrictions. For example, in Figure 4, four triples are collected by this algorithm, and they give the answers for the given query.

Let us further illustrate the algorithm with some example queries from Table 1. For Query 5, our method retrieves all the triples that match the individual components of the triple patterns, namely rdf:type, ub:Person, ub:memberOf and <http://www.Department0.University0.edu>. Here, ?X is ignored since it matches all triples. Sesame, in contrast, first considers the original query

    SELECT ?X WHERE
    {?X rdf:type ub:Person .
     ?X ub:memberOf <http://www.Department0.University0.edu>}

and compares the original query expression with the index in each repository. If it finds results there, it does not need to consider joining answers from sub-queries in the same repository. It then decomposes the original query into the sub-queries

    SELECT ?X WHERE {?X rdf:type ub:Person}
    SELECT ?X WHERE {?X ub:memberOf <http://www.Department0.University0.edu>}

and determines whether each repository contains results for these sub-queries, so that the answers can be joined to compute the results.

Similarly, for Query 12, our method finds all the triples in each repository that match rdf:type, ub:Chair, ub:Department, ub:worksFor, ub:subOrganizationOf and <http://www.University0.edu>. Sesame, on the other hand, first checks whether the original query

    SELECT ?X, ?Y WHERE
    {?X rdf:type ub:Chair .
     ?Y rdf:type ub:Department .
     ?X ub:worksFor ?Y .
     ?Y ub:subOrganizationOf <http://www.University0.edu>}

can give results in each repository. If it finds results, it does not need to consider joining answers from sub-queries in the same repository. It further decomposes the query into the sub-queries

    SELECT ?X, ?Y WHERE
    {?X rdf:type ub:Chair .
     ?Y rdf:type ub:Department .
     ?X ub:worksFor ?Y}

    SELECT ?X, ?Y WHERE
    {?Y rdf:type ub:Department .
     ?X ub:worksFor ?Y .
     ?Y ub:subOrganizationOf <http://www.University0.edu>}

which are executed in each repository. It then decomposes these into the smaller sub-queries

    SELECT ?X, ?Y WHERE
    {?X rdf:type ub:Chair .
     ?Y rdf:type ub:Department}

    SELECT ?X, ?Y WHERE
    {?Y rdf:type ub:Department .
     ?X ub:worksFor ?Y}

    SELECT ?X, ?Y WHERE
    {?X ub:worksFor ?Y .
     ?Y ub:subOrganizationOf <http://www.University0.edu>}

and finally into individual triple queries:

    SELECT ?X, ?Y WHERE {?X rdf:type ub:Chair}
    SELECT ?X, ?Y WHERE {?Y rdf:type ub:Department}
    SELECT ?X, ?Y WHERE {?X ub:worksFor ?Y}
    SELECT ?X, ?Y WHERE {?Y ub:subOrganizationOf <http://www.University0.edu>}

All these sub-queries are executed in each repository until the answers are obtained, and the answers are combined to get the complete results. Note that subsequent sub-queries do not need to be executed on a repository for which answers have already been found, as these answers are already covered by the longer queries.
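The decomposition order described above (the full query first, then ever shorter runs of consecutive triple patterns) can be sketched in Python as follows. This only mirrors the enumeration in the text; it is not taken from Sesame's implementation, and the function name is hypothetical.

```python
def contiguous_subqueries(patterns):
    """Yield windows of consecutive triple patterns, longest first."""
    n = len(patterns)
    for size in range(n, 0, -1):
        for start in range(n - size + 1):
            yield patterns[start:start + size]

query12 = [
    "?X rdf:type ub:Chair",
    "?Y rdf:type ub:Department",
    "?X ub:worksFor ?Y",
    "?Y ub:subOrganizationOf <http://www.University0.edu>",
]
subqueries = list(contiguous_subqueries(query12))
for sub in subqueries:
    print("SELECT ?X ?Y WHERE { " + " . ".join(sub) + " }")
```

For the four patterns of Query 12 this produces one query of length four, two of length three, three of length two and four single-pattern queries, matching the listing above.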



Algorithm 1: Formulation of triples

S = {}: subjects, P = {}: predicates, O = {}: objects
Q: query
M: RDF model in the mediator repository
N: set of statements st
Input: Q
Output: st

 1: extractQuery(Q)
 2: Start
 3:   While not end_of_String(query string Q)
 4:     S ← findSubject()
 5:     P ← findPredicate()
 6:     O ← findObject()
 7:   Return S, P, O
 8: End
 9: For each repository
10:   For each statement st in N
11:     Find statement st such that
12:       (S = st.Subject or P = st.Predicate or O = st.Object)
13:     Add st to M
14:   End for
15: End for
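Algorithm 1 can be approximated by the following Python sketch. It assumes a deliberately simple query shape (a single WHERE block whose triple patterns are separated by " . " and contain no literals with spaces), models repositories as in-memory sets of tuples, and uses hypothetical function names; a real system would send requests to SPARQL endpoints instead.

```python
import re

def extract_terms(sparql):
    """Collect the constant subjects, predicates and objects of a query."""
    body = sparql[sparql.index("{") + 1:sparql.rindex("}")]
    subjects, predicates, objects = set(), set(), set()
    for pattern in re.split(r"\s+\.(?:\s+|$)", body.strip()):
        if not pattern:
            continue
        s, p, o = pattern.split()
        for term, bucket in ((s, subjects), (p, predicates), (o, objects)):
            if not term.startswith("?"):   # variables match everything
                bucket.add(term)
    return subjects, predicates, objects

def formulate(sparql, repositories):
    """Copy into the mediator every triple sharing a constant with the query."""
    S, P, O = extract_terms(sparql)
    mediator = set()
    for repo in repositories:
        for st in repo:
            if st[0] in S or st[1] in P or st[2] in O:
                mediator.add(st)
    return mediator

query5 = ("SELECT ?X WHERE { ?X rdf:type ub:Person . "
          "?X ub:memberOf <http://www.Department0.University0.edu> }")
repo_a = {("ex:alice", "rdf:type", "ub:GraduateStudent"),   # hypothetical data
          ("ex:alice", "ub:name", '"Alice"')}
repo_b = {("ub:GraduateStudent", "rdfs:subClassOf", "ub:Person")}
S, P, O = extract_terms(query5)
mediator = formulate(query5, [repo_a, repo_b])
print(sorted(mediator))
```

For Query 5 the extracted constants are rdf:type, ub:memberOf, ub:Person and the department URI, as in the discussion of that query; the subclass triple from the second repository is collected because its object matches ub:Person.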


After that, the reasoning algorithm derived from our OWL-Lite completeness analysis is executed as part of the query evaluation process. Part of this algorithm is shown in Algorithm 2.


Algorithm 2: Reasoning algorithm

Q: query
M: RDF model in the mediator repository
N: set of statements st
K: set of new inferred statements
Ki: the set of statements inferred during iteration i
Input: rule r
Output: res_r

 1: let K0 = K
 2: let i = 0
 3: while (Ki ≠ ∅):
 4:   let i = i + 1
 5:   for each rule r:
 6:     if (triggered(r)) then
 7:       res_r ← applyRule(r, Ki−1)
 8:       add res_r to Ki
 9:     endif
10:   endfor
11: endwhile



Algorithm for reasoning, based on Sesame and the Rete algorithm [5]


It consists of a simple loop that iterates over the set of entailment rules and terminates when no new statements have been derived in the last iteration. Lines 1-11 of Algorithm 2 summarize the application of the entailment rules. A particular entailment rule is applied in iteration i only when it has been triggered, that is, when a fact newly derived in iteration i−1 matches a premise of the entailment rule.
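The loop can be sketched in Python as below. This is a simplified reading of Algorithm 2 under stated assumptions: each rule is a hypothetical function that receives the statements derived in the previous iteration together with all known facts, and a rule that finds no matching premise among the new statements simply returns nothing, which plays the role of the triggered(r) test. Only the class transitivity rule is shown.

```python
SC = "rdfs:subClassOf"

def transitive_subclass(delta, facts):
    """(a subClassOf b) and (b subClassOf c)  =>  (a subClassOf c)."""
    out = set()
    for a, p, b in delta:
        if p != SC:
            continue
        # The new statement is the first premise ...
        out |= {(a, SC, c) for (b2, q, c) in facts if q == SC and b2 == b}
        # ... or the second premise.
        out |= {(z, SC, b) for (z, q, a2) in facts if q == SC and a2 == a}
    return out

def closure(explicit, rules):
    """Iterate the rules until an iteration derives nothing new."""
    facts = set(explicit)
    delta = set(explicit)           # K_0: statements new in the last iteration
    while delta:
        derived = set()
        for rule in rules:
            derived |= rule(delta, facts)
        delta = derived - facts     # K_i: keep only genuinely new statements
        facts |= delta
    return facts

explicit = {("sub11", SC, "Obj21"), ("Obj21", SC, "Obj31")}
closed = closure(explicit, [transitive_subclass])
print(sorted(closed - explicit))
```

Because each iteration only looks at statements that are new, and the set of derivable statements is finite, the loop stops once an iteration adds nothing.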


Procedure applyRule(i, K):

 1: let K0 = ∅;
 2: if i = 1-1 then // case 1-1
 3:   executeRule(1-1, K);
 4: elseif i = 1-2 then // case 1-2
 5:   executeRule(1-2, K);
 6: elseif i = 2 then // case 2
 7:   executeRule(2, K);
 8: elseif i = 3 then // case 3
 9:   executeRule(3, K);
10: elseif i = 4 then // case 4
11:   for each statement st in K do:
12:     if (st.subject owl:equivalentClass st.object)
13:       executeRule(4, K) // symmetric property
14:       executeRule(5, K) // transitive property
15:     endif
16:   endfor
17: elseif i = 5 then // case 5
18:   for each statement st in K do:
19:     if (st.subject rdfs:subClassOf st.object)
20:       executeRule(5, K) // transitive property
21:     endif
22:   endfor
23: elseif i = 6 then // case 6
24:   executeRule(6, K)
25: elseif i = 7 then // case 7
26:   for each statement st in K do:
27:     if (st.subject owl:equivalentProperty st.object)
28:       executeRule(4, K) // symmetric property
29:       executeRule(7, K) // transitive property
30:     endif
31:   endfor
32: elseif i = 8 then // case 8
33:   for each statement st in K do:
34:     if (st.subject rdfs:subPropertyOf st.object)
35:       executeRule(7, K) // transitive property
36:     endif
37:   endfor
38: elseif i = 9 then // case 9
39:   executeRule(4, K)
40: elseif i = 10 then // case 10
41:   executeRule(5, K)
42: endif

// Similar for other rules, including the instance relationships $R_I$, $R_{CP}$


Procedure executeRule(r, K):

 1: if r = 1-1 then // rule 1-1
 2:   for each statement st in K do:
 3:     if (st.subject rdf:type rdfs:Class)
 4:       create new statement t:
 5:         (t.subject = st.subject, t.predicate = rdf:type, t.object = rdfs:Resource)
 6:       add t to M
 7:       add t to K'
 8:     endif
 9:   endfor
10: elseif r = 1-2 then // rule 1-2
11:   for each statement st in K do:
12:     if (st.object rdf:type rdfs:Class)
13:       create new statement t:
14:         (t.subject = st.object, t.predicate = rdf:type, t.object = rdfs:Resource)
15:       add t to M
16:       add t to K'
17:     endif
18:   endfor
19: …………………………
20: …… // similar to the above for rules 2-3 ……
21: elseif r = 4 then // rule 4
22:   for each statement st in K do:
23:     if (st.predicate is symmetric)
24:       create new statement t:
25:         (t.subject = st.object, t.predicate = st.predicate,
26:          t.object = st.subject)
27:       add t to M
28:       add t to K'
29:     endif
30:   endfor
31: elseif r = 5 then // rule 5
32:   for each statement st in K do:
33:     find statement m in M such that
34:       if (st.predicate = rdfs:subClassOf and m.predicate = rdfs:subClassOf and m.subject = st.object)
35:         create new statement t:
36:           (t.subject = st.subject, t.predicate = rdfs:subClassOf, t.object = m.object)
37:         add t to M
38:         add t to K'
39:       endif
40:   endfor
41: endif
// Similar for the other rules; the rules are explained below.


For each rule, the procedure that matches the rule type is executed and infers new statements. In procedure applyRule(i, K), the first case (Lines 2 and 3) is executed by sub-procedure executeRule(r, K). It states that for every statement whose subject is of type Class, we can also infer that this subject is of type Resource. Similarly, Rule 1-2 states that for every statement whose object is of type Class, we can also infer that this object is of type Resource. Lines 10-18 of sub-procedure executeRule(r, K) apply this rule. The other rules are executed in a similar way. An OWL-Lite-specific example is case 4, which states that class equivalence is symmetric and transitive. Lines 10-16 of procedure applyRule(i, K) apply this case, which in turn executes the two rules for the symmetric and transitive properties. Rules 4 and 5 in sub-procedure executeRule(r, K) are responsible for inferring the new statements that follow from these rules. Rule 4 (Lines 21-30) infers, for any statement with a symmetric property, a new statement whose subject is the object of that statement and whose object is its subject. Similarly, Rule 5 (Lines 31-41) infers new statements from statements that conform to the transitive property. Since equivalence affects inheritance, the complete equivalence relation should be computed before inheritance. The premises and conclusions of some rules are given below. Since a description of all the rules is beyond the scope of this paper, we show only some specific rules as examples.


As stated earlier, we want to execute all the rules related to rdfs:Class in order to achieve the completeness of $R_C$ (the relationships between classes) and the completeness of $R_P$. In the rules below, the premise (condition) and the result are separated by a single dotted line.


Rule 1-1: Every subject of a statement is of type Resource. In other words, if there exists a statement with any subject Subject, then that subject is of type Resource.

    (Subject Predicate Object)
    ..........................
    (Subject type Resource)

Rule 1-2: Every object of a statement is of type Resource. In other words, if there exists a statement with any object Object, then that object is of type Resource.

    (Subject Predicate Object)
    ..........................
    (Object type Resource)

Rule 2: If any subject is of type Class, then that subject is a subClassOf Resource.

    (Subject type Class)
    ..........................
    (Subject subClassOf Resource)

Rule 3: Every class is a subclass of itself (reflexivity).

    (Subject type Class)
    ..........................
    (Subject subClassOf Subject)

Rule 4: Symmetric property. For any statement whose predicate is a symmetric property, a new statement is derived in which the subject becomes the object and the object becomes the subject.

    (Subject Predicate Object)
    ..........................
    (Object Predicate Subject)

Rule 5: Class transitive property.

    (Subj1 subClassOf Obj1) ^ (Obj1 subClassOf Obj2)
    ..........................
    (Subj1 subClassOf Obj2)

Rule 6: Inverse property. If property Pi is defined as the inverse of property Pj and some URIx has predicate Pi with URIy, then URIy has predicate Pj with URIx.

    <Pi owl:inverseOf Pj> ^ <URIx Pi URIy>
    ..........................
    <URIy Pj URIx>

Rule 7: Predicate transitive property. If property Pj is defined as a sub-property of property Pi (written $P_i \sqsupseteq P_j$) and some URIx has predicate Pj with URIy, then URIx has predicate Pi with URIy.

    $P_i \sqsupseteq P_j$ ^ <URIx Pj URIy>
    ..........................
    <URIx Pi URIy>



In the example of Figures 3 and 4, Algorithm 1 collects in the mediator repository all the triples that will participate in the reasoning process, as in Figure 4. We then use the reasoning algorithm, which derives new facts from the OWL vocabulary and the explicit facts. This example involves the rdfs:subClassOf property. From the statements <sub11> <rdfs:subClassOf> <Obj21> and <Obj21> <rdfs:subClassOf> <Obj31>, the reasoning algorithm derives the new statement <sub11> <rdfs:subClassOf> <Obj31>. In this way, we obtain hidden statements from distributed repositories that could otherwise never have been retrieved, which would have led to incomplete results. Finally, we get the complete results from the mediator repository.


The first algorithm brings in all the triples that take part in the formulation of the results. The second algorithm infers the new statements that follow from the OWL-Lite vocabulary rules. This algorithm is guaranteed to terminate: each new iteration is applied only to statements newly derived in the previous iteration, and since the total set of statements in the closure (the combination of explicit and inferred, or implicit, statements) is finite, the algorithm terminates when no new statements can be derived, that is, when the complete closure has been computed. Furthermore, as a proof of concept, this method applies all the rules specified for OWL-Lite as defined by Horst et al. [7]. This shows that our method works for OWL-Lite semantics and assures the completeness of results when reasoning about these semantics is taken into account.


Optimizations



Some optimizations can be made to prune the search space of the algorithm. Although some rules depend on other rules, certain rules can be executed on the remote repositories well before a query is executed. For instance, rule 1 affects the premise pattern of virtually every rule, but it does not assert any new predicates and statements. As a consequence, it does not affect the execution of other rules in other repositories. We can identify rules of this kind and execute them at load time in each remote repository. Other optimizations, such as query rewriting and duplicate elimination, are also possible, and we try to incorporate them in our mechanism.


VII. EXPERIMENTAL RESULTS

We used two servers and four different repositories to test our method. For our experiments, we chose the Lehigh University Benchmark (LUBM) [15], containing 2.8 million triples, as it is the most widely used benchmark for testing ontology repositories. We created a university dataset, LUBM (1, 0), whose file size in RDF/XML format is 392 MB. Each query was evaluated 10 times and the results were averaged to obtain the degree of completeness. We split the dataset randomly into multiple parts located at different endpoints, i.e., distributed over the web.
Degree of completeness
can be calculated as:




Degree of completeness = (Number of results retrieved / Number of actual results in LUBM dataset) * 100%.
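As a minimal sketch of this measure, the following computes the degree of completeness for one run and its average over repeated runs. The function names are hypothetical, and the counts used in any call are placeholders rather than measured values.

```python
from statistics import mean
from typing import Iterable


def degree_of_completeness(retrieved: int, actual: int) -> float:
    """Percentage of the actual results that were retrieved."""
    if actual <= 0:
        raise ValueError("the number of actual results must be positive")
    return 100.0 * retrieved / actual


def averaged_completeness(retrieved_per_run: Iterable[int], actual: int) -> float:
    """Average the degree of completeness over repeated evaluations of one query."""
    return mean(degree_of_completeness(r, actual) for r in retrieved_per_run)
```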

We repeated the experiments for the LUBM (5, 0) and LUBM (10, 0) datasets for further analysis. Here we show the results of only the first dataset, since there are only minor or no differences among the other datasets.

As we can see in the test results, both Sesame and our system can answer Queries 1 through 5 completely, since these queries do not need reasoning analysis and do not assume any inference.


Table 1: Test Queries

Query#   Queries

1        {?X rdf:type ub:GraduateStudent .
          ?X ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0>}

2        {?X rdf:type ub:GraduateStudent .
          ?Y rdf:type ub:University .
          ?Z rdf:type ub:Department .
          ?X ub:memberOf ?Z .
          ?Z ub:subOrganizationOf ?Y .
          ?X ub:undergraduateDegreeFrom ?Y}

3        {?X rdf:type ub:Publication .
          ?X ub:publicationAuthor <http://www.Department0.University0.edu/AssistantProfessor0>}

4        {?X rdf:type ub:Professor .
          ?X ub:worksFor <http://www.Department0.University0.edu> .
          ?X ub:name ?Y1 .
          ?X ub:emailAddress ?Y2 .
          ?X ub:telephone ?Y3}

5        {?X rdf:type ub:Person .
          ?X ub:memberOf <http://www.Department0.University0.edu>}

6        {?X rdf:type ub:Student}

7        {?X rdf:type ub:Student .
          ?Y rdf:type ub:Course .
          ?X ub:takesCourse ?Y .
          <http://www.Department0.University0.edu/AssociateProfessor0> ub:teacherOf ?Y}

8        {?X rdf:type ub:Student .
          ?Y rdf:type ub:Department .
          ?X ub:memberOf ?Y .
          ?Y ub:subOrganizationOf <http://www.University0.edu> .
          ?X ub:emailAddress ?Z}

9        {?X rdf:type ub:Student .
          ?Y rdf:type ub:Faculty .
          ?Z rdf:type ub:Course .
          ?X ub:advisor ?Y .
          ?Y ub:teacherOf ?Z .
          ?X ub:takesCourse ?Z}

10       {?X rdf:type ub:Student .
          ?X ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0>}

11       {?X rdf:type ub:ResearchGroup .
          ?X ub:subOrganizationOf <http://www.University0.edu>}

12       {?X rdf:type ub:Chair .
          ?Y rdf:type ub:Department .
          ?X ub:worksFor ?Y .
          ?Y ub:subOrganizationOf <http://www.University0.edu>}

13       {?X rdf:type ub:Person .
          <http://www.University0.edu> ub:hasAlumnus ?X}
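The graph patterns in Table 1 can be wrapped into complete SPARQL queries and sent to a remote repository through the SPARQL protocol. The sketch below only builds the query text and the request URL; it does not contact any server. The helper names and any endpoint address passed to `protocol_url` are hypothetical, and the `ub:` namespace IRI is assumed to be the one used by the LUBM distribution.

```python
from typing import List
from urllib.parse import urlencode

# Assumed namespace declarations; the ub: IRI is taken from the LUBM distribution.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>\n"
)


def build_select(variables: List[str], patterns: List[str]) -> str:
    """Wrap triple patterns (as in Table 1) into a SELECT query."""
    head = "SELECT " + " ".join(variables)
    body = " .\n  ".join(patterns)
    return f"{PREFIXES}{head}\nWHERE {{\n  {body}\n}}"


def protocol_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


# Query 11 from Table 1.
QUERY_11 = build_select(
    ["?X"],
    [
        "?X rdf:type ub:ResearchGroup",
        "?X ub:subOrganizationOf <http://www.University0.edu>",
    ],
)
```

The same query text can then be sent unchanged to every endpoint that holds a part of the dataset.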


For Query 6, our system returns results not only through the explicit subClassOf relationship between UndergraduateStudent and Student but also through the implicit one between GraduateStudent and Student. Since our system uses proper reasoning analysis among the existing repositories and proper formulation of triples, we are able to get 100% of the results. Sesame, however, returns fewer results than our system, as shown in Figure 5. Queries 7, 8, and 9 are similar in that they only add some more classes or properties. Query 9 brings only the implicit subClassOf relationship between GraduateStudent and Student. Queries 11, 12, and 13 are more closely related to the OWL-Lite vocabulary. For instance, in Query 11, the property subOrganizationOf is defined as transitive. Since, in the benchmark data, instances of ResearchGroup are stated as a suborganization of a Department individual and the latter as a suborganization of a University individual, inference about the subOrganizationOf relationship between instances of ResearchGroup and University is required to answer this query. Our system successfully provides all the results for this query. In Query 12, the benchmark data do not produce any instances of class Chair. Instead, each Department individual is linked to the chair professor of that department by the property headOf. Hence this query requires realization, i.e., the inference that the professor is an instance of class Chair because he or she is the head of a department.
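The two inferences described for Queries 11 and 12 can be reproduced on a toy dataset. The sketch below is a naive fixpoint over tuples, not our reasoning module. It assumes that realization of Chair is expressed as the rule "headOf a Department implies type Chair", and all individual names (`rg0`, `dept0`, `univ0`, `prof0`) are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def infer(triples: Iterable[Triple]) -> Set[Triple]:
    """Naive fixpoint: transitive subOrganizationOf plus realization of Chair."""
    facts: Set[Triple] = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in facts:
            if p == "ub:subOrganizationOf":
                # (s subOrg o), (o subOrg o2) => (s subOrg o2)
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
            if p == "ub:headOf" and (o, "rdf:type", "ub:Department") in facts:
                # Assumed rule: the head of a Department is an instance of Chair.
                new.add((s, "rdf:type", "ub:Chair"))
        if new <= facts:
            return facts
        facts |= new


def query_11(facts: Set[Triple], university: str) -> List[str]:
    """Research groups that are (possibly indirect) suborganizations of a university."""
    return sorted(
        s for s, p, o in facts
        if p == "rdf:type" and o == "ub:ResearchGroup"
        and (s, "ub:subOrganizationOf", university) in facts
    )


DATA = [
    ("rg0", "rdf:type", "ub:ResearchGroup"),
    ("rg0", "ub:subOrganizationOf", "dept0"),
    ("dept0", "rdf:type", "ub:Department"),
    ("dept0", "ub:subOrganizationOf", "univ0"),
    ("prof0", "ub:headOf", "dept0"),
]
```

On the explicit triples alone, `query_11(set(DATA), "univ0")` finds nothing, while on `infer(DATA)` it returns the research group, and `prof0` becomes an instance of `ub:Chair`.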




Figure 5: LUBM (1, 0)


VIII. CONCLUSIONS

Because of the heterogeneous nature of ontology data, distributed querying and information integration from various sources are essential. Through the experiments, we showed how we can achieve effective querying of web-distributed ontology repositories. With the OWL-Lite vocabulary and the SPARQL protocol in mind, we showed how our method achieves completeness of results through a proper study of the requirements of reasoning completeness and careful formulation of triples from distributed repositories. We also used some optimization techniques to increase effectiveness, which invites further improvement through various query processing optimization techniques such as statistical join-order analysis and distributed joins. Besides, further improvements in query efficiency can be made by studying the overheads in the querying process. Furthermore, more expressive OWL vocabularies such as OWL Full can be implemented for more mature and larger distributed ontology repositories where reasoning over Semantic Web languages and completeness of results are crucial.



ACKNOWLEDGEMENT

This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea Government (MEST) (No. R0A-2007-000-20101-1).


REFERENCES


[1] A. Harth and S. Decker. Optimized Index Structures for Querying RDF from the Web. In Third Latin American Web Congress (LA-WEB'2005), pp. 71-80, 2005.

[2] S. Barton. Designing Indexing Structure for Discovering Relationships in RDF Graphs. In Proceedings of the Dateso 2004 Annual International Workshop on Databases, Texts, Specifications and Objects. Ostrava: VSB - Technical University of Ostrava, 2004.

[3] J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: a Generic Architecture for Storing and Querying RDF and RDF Schema. In 1st International Semantic Web Conference (ISWC2002), 2002.

[4] Y. Chen, J. Ou, Y. Jiang, and X. Meng. HStar - A Semantic Repository for Large Scale OWL Documents, 2006.

[5] C. Forgy. Rete: A Fast Algorithm for the Many Pattern / Many Object Pattern Match Problem. Artificial Intelligence, 19:17-37, 1982.

[6] G. D. Solomou, D. A. Koutsomitropoulos, and T. S. Papatheodorou. Semantics Aware Querying of Web-Distributed RDF(S) Repositories. In Proc. of 1st Workshop on Semantic Interoperability in the European Digital Library (SIEDL 2008), 5th European Semantic Web Conference (ESWC 08), pp. 39-50, 2008.

[7] H. J. ter Horst. Combining RDF and Part of OWL with Rules: Semantics, Decidability, Complexity. In Proc. of ISWC 2005, Galway, Ireland, 2005.

[8] G. Klyne and J. J. Carroll (eds.). Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/

[9] K. Wilkinson, C. Sayers, and H. Kuno. Efficient RDF Storage and Retrieval in Jena2. In Int. Conf. on Semantic Web and Databases, 2003.

[10] Z. Pan and J. Heflin. DLDB: Extending Relational Databases to Support Semantic Web Queries. In Workshop on Practical and Scalable Semantic Web Systems, ISWC 2003.

[11] B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In The Semantic Web: Research and Applications, Springer Berlin / Heidelberg, pp. 524-538, 2008.

[12] OWL Web Ontology Language Reference. W3C Working Draft, 10 February 2004. http://www.w3.org/TR/owl-ref/, 2004.

[13] H. Stuckenschmidt, A. de Waard, R. Bhogal, C. Fluit, A. Kampman, J. van Buel, E. van Mulligen, J. Broekstra, I. Crowlesmith, F. van Harmelen, and T. Scerri. Exploring large document repositories with rdf technology - the dope project. IEEE Intelligent Systems, 2004.

[14] W3C. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query

[15] Y. Guo, Z. Pan, and J. Heflin. An Evaluation of Knowledge Base Systems for Large OWL Datasets. In Proceedings of the 3rd International Semantic Web Conference, Hiroshima, pp. 274-288. LNCS 3298, Springer, 2004.