Comparison of Ontology Reasoning Systems Using Custom Rules


Hui Shi, Kurt Maly, Steven Zeil, and Mohammad Zubair
Contact: maly@cs.odu.edu

WIMS 2011, Sogndal, Norway

Outline

- Introduction
  - What are we evaluating?
  - What is the approach we are taking?
- Background
  - Existing benchmarks
  - Ontology systems supporting custom rules
- Experimental design
  - Data and custom rules
  - Metrics and evaluation procedure
- Results
  - Setup time
  - Query processing
  - Transitive rule
  - Caching
- Conclusions


Introduction - Problem

- Problem
  - Scalability issues in the context of a question/answer system (called ScienceWeb) over a knowledge base of science information harvested from the web
  - ScienceWeb is being built using ontologies, reasoning systems, and custom rules for the reasoning system
- Approach
  - Use existing benchmarks extended with custom inference rules
  - Generate more realistic data
  - Evaluate in the ScienceWeb environment


Background

- Existing semantic applications: question/answer systems
  - AquaLog, QuestIO, QUICK - natural language input
- Semantic Web
  - Resource Description Framework (RDF)
  - RDF schemas
  - Web Ontology Language (OWL) for specific knowledge domains
  - SPARQL query language for RDF
  - SWRL web rule language
- Reasoning systems
  - Jena (proprietary Jena rules)
  - Pellet and KAON2 (supporting SWRL)
  - Oracle 11g
  - OWLIM



Background

- Existing performance studies on OWL-based reasoning systems cover only native rule sets
  - Varying complexity of ABox and TBox
    - TBox: the axioms defining the classes and relations in an ontology
    - ABox: the assertions about the individuals in the domain (see the sketch after this list)
- Existing benchmarks to generate ontologies
  - Lehigh University Benchmark (LUBM)
  - University Ontology Benchmark (UOBM), an extension of LUBM

WIMS 2011, Sogndal, Norway

5

Background

- Ontology systems with custom rule support
  - Jena: in-memory and persistent store, SPARQL, forward and backward chaining (see the sketch after this list)
  - Pellet: open source, description logic, SWRL
  - KAON2: free, SWRL, F-logic, SPARQL
  - Oracle 11g: native inference using the database, forward chaining, OWL
  - OWLIM: OWL, rules and axiomatic triples


Table: General comparison among ontology reasoning systems

Experimental Design - Ontology

- Baseline: LUBM
- ScienceWeb: uses its own generator (UnivGenerator) for ontology instance data
  - Classes are more detailed
  - Data are more realistic (e.g., faculty with advisors at different universities, co-authors at different universities)

Figure: Class tree of research community ontology


Table: Size range of datasets (in triples)

Experimental Design - Rule Set


Rule set 1: Co-author

  authorOf(?x, ?p), authorOf(?y, ?p) -> coAuthor(?x, ?y)

Rule set 2: Validated co-author

  authorOf(?x, ?p), authorOf(?y, ?p), notEqual(?x, ?y) -> coAuthor(?x, ?y)

Rule set 3: Research ancestor (transitive)

  advisorOf(?x, ?y) -> researchAncestor(?x, ?y)
  researchAncestor(?x, ?y), researchAncestor(?y, ?z) -> researchAncestor(?x, ?z)


Rule set 4: Distinguished advisor (recursive)

  advisorOf(?x, ?y), advisorOf(?x, ?z), notEqual(?y, ?z), worksFor(?x, ?u) -> distinguishAdvisor(?x, ?u)
  advisorOf(?x, ?y), distinguishAdvisor(?y, ?u), worksFor(?x, ?d) -> distinguishAdvisor(?x, ?d)

  (An advisor of two distinct students is distinguished; the property also propagates to anyone who advised a distinguished advisor.)

Rule set 5: Combination of the above four rule sets.




Jena encoding: @include <OWL>.

[rule1: (?x uni:authorOf ?p) (?y uni:authorOf ?p) notEqual(?x, ?y)
        -> (?x uni:coAuthor ?y)]

[rule2: (?x uni:advisorOf ?y) -> (?x uni:researchAncestor ?y)]

[rule3: (?x uni:researchAncestor ?y) (?y uni:researchAncestor ?z)
        -> (?x uni:researchAncestor ?z)]

[rule4: (?x uni:advisorOf ?y) (?x uni:advisorOf ?z) notEqual(?y, ?z) (?x uni:worksFor ?u)
        -> (?x uni:distinguishAdvisor ?u)]

[rule5: (?x uni:advisorOf ?y) (?y uni:distinguishAdvisor ?u) (?x uni:worksFor ?d)
        -> (?x uni:distinguishAdvisor ?d)]
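
A minimal sketch of how a rule file like this might be loaded and bound to a dataset with Jena's rule engine (the file names are hypothetical placeholders):

import java.util.List;
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleSetupSketch {
    public static void main(String[] args) {
        // Load the ontology plus instance data (file name is a placeholder)
        Model data = ModelFactory.createDefaultModel();
        data.read("file:OntologyUniversityResearchModel.owl");

        // Parse the custom rules shown above
        List<Rule> rules = Rule.rulesFromURL("file:custom.rules");

        // Bind rules and data into an inference model
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.prepare(); // forces inference: the "setup" stage measured later
    }
}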






In SWRL these rules are less compact. Rule 1:

<swrl:Variable rdf:about="#x"/>
<swrl:Variable rdf:about="#y"/>
<swrl:Variable rdf:about="#p"/>

<swrl:Imp rdf:about="rule1">
  <swrl:head rdf:parseType="Collection">
    <swrl:IndividualPropertyAtom>
      <swrl:propertyPredicate rdf:resource="#coAuthor"/>
      <swrl:argument1 rdf:resource="#x"/>
      <swrl:argument2 rdf:resource="#y"/>
    </swrl:IndividualPropertyAtom>
  </swrl:head> ……


Queries in SPARQL notation:

Query 1: Co-author

PREFIX uni: <http://www.owl-ontologies.com/OntologyUniversityResearchModel.owl#>
SELECT ?x ?y
WHERE { ?x uni:coAuthor ?y. ?x uni:hasName "FullProfessor0_d0_u0" }


Query 2: Research ancestor

PREFIX uni: <http://www.owl-ontologies.com/OntologyUniversityResearchModel.owl#>
SELECT ?x ?y
WHERE { ?x uni:researchAncestor ?y. ?x uni:hasName "FullProfessor0_d0_u0" }


Query 3: Distinguished advisor

PREFIX uni: <http://www.owl-ontologies.com/OntologyUniversityResearchModel.owl#>
SELECT ?x ?y
WHERE { ?x uni:distinguishAdvisor ?y. ?y uni:hasTitle "department0u0" }

Experimental Design - Metrics

- Setup time
  - Includes loading and preprocessing time before any query can be made
- Query processing time
  - Starts with parsing and executing the query and ends when all results have been saved in the result set (see the timing sketch after this list)
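
A minimal sketch of how these two intervals might be measured, continuing the hypothetical Jena setup shown earlier (inf is the inference model from that sketch, which already covers loading; sparql is one of the query strings above):

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.InfModel;

public class TimingSketch {
    // Measures the two metrics as defined above; inf and sparql
    // are assumed to come from the earlier setup sketch.
    static void measure(InfModel inf, String sparql) {
        long t0 = System.nanoTime();
        inf.prepare(); // setup: preprocessing/inference before any query
        long setupMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        Query query = QueryFactory.create(sparql); // parse the query
        try (QueryExecution qe = QueryExecutionFactory.create(query, inf)) {
            ResultSetFormatter.consume(qe.execSelect()); // drain all results
        }
        long queryMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.printf("setup: %d ms, query: %d ms%n", setupMs, queryMs);
    }
}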


Experimental Design - Procedure

- Scale with the size of the instance data
- Scale with respect to the complexity of reasoning
- Transitive chain
- Caching effect
- Realism of model (ScienceWeb vs. LUBM)


Results - Setup time

Figure: Setup time of rule set 1 for LUBM dataset

Figure: Setup time of rule set 4 for LUBM dataset

Figure: Setup time of rule set 1 for ScienceWeb dataset

Figure: Setup time of rule set 4 for ScienceWeb dataset

Results - Setup time

- Some systems have no data points because of the size of the ABox (Jena, Pellet, and KAON2 load it into memory for inferencing)
- For small datasets (< 2 million triples) KAON2 is best
- Oracle and OWLIM scale to 4 million triples with no problem; Oracle scales best
- Great variation across rule sets
  - OWLIM is not as good as Oracle on rule set 4 for ScienceWeb (more triples in the ABox than LUBM)
  - Oracle is not good on rule set 2, as it needs to set up a "filter" to implement "notEqual"


Results - Query Processing

Figure: Query processing time of query 1 for ScienceWeb dataset

Results - Query Processing

- OWLIM is best for all but the largest (4 million) triple set
- Oracle is best for the largest set
- Queries return in seconds
- Setup can take hours


Results - Caching Effect

- Caching ratio: first query processing time / average over the next ten identical queries (written out below)
- OWLIM shows little effect
- In the other systems the effect becomes weaker as the size of the dataset grows
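
Written out, with $t_1$ the processing time of the first query and $t_2, \dots, t_{11}$ the times of the ten identical queries issued afterwards (this indexing is our reading of the definition above):

\[ \text{caching ratio} \;=\; \frac{t_1}{\tfrac{1}{10}\sum_{i=2}^{11} t_i} \]

A ratio above 1 indicates that the repeated queries benefit from caching.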


Figure: Caching ratios between the processing time of a single query and the average processing time on the ScienceWeb ontology for query 1



Results - Transitive Rule

- Created a group of separate instance files containing different numbers of individuals that are related via the transitive rule in rule set 3 (see the sketch below)
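
A sketch of how such a chain of related individuals might be generated with Jena (the namespace matches the queries above; the person names and the use of an in-memory model are hypothetical). For a chain of n individuals, rule set 3 entails n(n-1)/2 researchAncestor pairs, which is what makes long chains expensive:

import org.apache.jena.rdf.model.*;

public class ChainGeneratorSketch {
    static final String NS =
        "http://www.owl-ontologies.com/OntologyUniversityResearchModel.owl#";

    // Builds advisorOf(person0, person1), advisorOf(person1, person2), ...,
    // a chain of n individuals; rule set 3 then entails
    // n*(n-1)/2 researchAncestor pairs.
    static Model buildChain(int n) {
        Model m = ModelFactory.createDefaultModel();
        Property advisorOf = m.createProperty(NS + "advisorOf");
        for (int i = 0; i < n - 1; i++) {
            Resource a = m.createResource(NS + "person" + i);
            Resource b = m.createResource(NS + "person" + (i + 1));
            a.addProperty(advisorOf, b);
        }
        return m;
    }
}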

Figure: Setup time for transitive rule

Figure: Query processing time after inference over transitive rule

Results - Transitive Rule

- Pellet only provides results before time-out when the length of the transitive chain is 100
- Jena's performance degrades badly when the length exceeds 200
- Only KAON2, OWLIM, and Oracle 11g could complete inference and querying on long transitive chains


Conclusions

- When models more realistic than LUBM are used (ScienceWeb), serious issues arise as the size approaches millions of triples
- OWLIM and Oracle offer the best scalability for the kinds of datasets anticipated for ScienceWeb
  - Heavy front-loading of the inferencing costs by pre-computing the entailed relationships at setup time
  - Negative implications for evolving systems
- Real-time queries over large triple spaces will have to be limited in their scope
  - How can we specify what can be asked within a real-time system? We do not know yet.