Semantos: XML-based Query Enhancement of RDF for Agents in the Semantic Web



Theo Crous and Judith Bishop

Department of Computer Science, University of Pretoria

Pretoria, South Africa

{tcrous, jbishop}@cs.up.ac.za



Abstract


In this paper, we propose a framework to enable automated agents to search the Semantic Web. It is the position of this paper that the World Wide Web has seen such tremendous success because of the search engines that allow people to find web pages relevant to them. For the Semantic Web to enjoy the same success, a new kind of search engine needs to be introduced. A major difference is that this search engine would be used not by people but by software agents. This paper illustrates the use of the RDF query language Semantos and introduces a Query Enhancement Service (QES) which allows software agents loaded with a Semantos query to improve query results. Our approach to searching on the Semantic Web is to amend the triple elements contained inside the query with more commonly used triple elements from the QES. Early test results show that this approach may yield better results than an unoptimized query would in a heterogeneous knowledge environment like the World Wide Web.


Keywords: Semantic Web, Semantos, Software agents, Query enhancement


1. Introduction


The World Wide Web is a very large, information-rich data store that can be described as the “library of humanity”. The key problem with all very large libraries is how to get to the right information at the right time. Considerable effort has gone into building very complex information retrieval systems for the Web, most notably the modern search engine, which stands out as the “librarian” of the digital age. Without this digital librarian there would be no hope of finding pertinent information at the required time. So far, these “librarians” have done a good job of keeping track of the individual “books” or documents on the Web, but as we are now starting to realize, accessing the books alone will not satisfy our information retrieval needs. We need access to the “paragraphs” of information contained inside the “books”. We need to be able to search, aggregate, categorize and analyze the details or data of these Web documents.

This key problem is the main reason for the invention of the Semantic Web. The Semantic Web hopes to realize the full potential of the enormous digital library that is the Web. By tagging individual elements of data, the Semantic Web opens the door for very refined searches on the information contained inside web documents. These tags are different from the traditional HTML mark-up tags used on the Web today. In the first place, they have nothing to do with document layout and visual styles. More importantly, they describe the semantic relationships between data elements, and so provide a reasonable conceptual framework for automated software agents to “think” about the information contained within the document.

Tim Berners-Lee’s initial vision for the Semantic Web in 2001 [1] saw information avatars or software agents crawling the web looking for information that would be relevant to our personal needs. This is one of the foundations of the Semantic Web: making information available to automated software systems. Such a system would allow us to manage the vast quantity of information on the web by allowing machines to do the searching, categorization and analysis for us. Now, several years later, this agent-assisted framework has not realized its full global potential and is still relegated to academic implementations (with a few exceptions) [22].

The Semantic Web is built up of Resource Description Framework (RDF) [3] tags that associate data by building a semantic or concept graph. The RDF tags build this graph by specifying subject-predicate-object triples. Anyone publishing information in RDF format is free to use any subjects, predicates or objects that they wish. There is no “master list” of standard elements that may be used, as it is simply impossible to formulate such a standard list. Everyone is therefore free to create their own RDF vocabulary or ontology. The publishing community has endless freedom, but the consumers of this published information have endless problems! Without a detailed knowledge of the ontology that the information is expressed in, it is difficult in general to construct a query against the information.

When considering small, established communities in the Web, the problem is manageable: arguably everyone writing about tropical fish could agree on a single ontology that describes their little corner of the digital library very well. However, it would most certainly not be possible to construct a single all-encompassing ontology for the entire World Wide Web. The diverse set of ontologies existing and yet to be created is therefore an expressive necessity, and it is the work of the information retrieval system to make do with what it has at hand.

Modern search engines use links and references to sort the best results for a given search. There is a need to distil this approach and apply it to the Semantic Web and its growing list of RDF data sources so as to search and index RDF triples in a similar manner. If, for example, we created an RDF triple equivalent of the search engine, it should be possible for a software agent to query this RDF search engine and find the “most used” RDF triple describing the concept it wishes to search for. It would then get far more search results by using the “most used” triple element instead of its own.

The outline of the paper is as follows: Section 2 provides a brief introduction to the Semantos information query language. Section 3 covers the ideas behind the query enhancement process and Section 4 details a physical implementation of the theory provided in the preceding section. Section 5 highlights some future directions for the work presented in this paper.


2. Semantos


Semantos (from the Greek word for “marked”) is an information query language expressed in XML that enables consumers to build semantically enriched information queries across heterogeneous information sources. The main purpose of Semantos is to serve as a fully fledged information integration query language in Enterprise Information Integration (EII) systems [6]. Semantos is implemented in C# 3.0 on the Microsoft .NET 3.5 framework and makes heavy use of the Language Integrated Query (LINQ) technology, which was introduced with C# 3.0 and VB.NET 9.0 [17]. Semantos is also capable of exploiting ontologies expressed in the Resource Description Framework Schema format (RDFS). Fig. 1 shows a complete Semantos query that will query a data source on the Web and return the names of women and the names of their spouses. This is achieved by looking for all people that are of type human:Woman and that have a human:hasSpouse relationship.


2.1 RDF/RDFS


At its core, the RDF model defines subject-predicate-object triples that are used to tag pieces of data and align them to a bigger picture or RDF graph [5]. This graph then represents all the information in the data source. RDFS defines additional relationships between these triples and provides the ability to create rich ontologies or namespaces that define the objects, terminologies and semantics that we use in an RDF graph.


<semantos:fetch>
    <semantos:source name="rdf source" uri="http://www.retrorabbit.co.za/data/example1.rdf"/>
    <semantos:namespace name="rdf" value="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
    <semantos:namespace name="human" value="http://www.retrorabbit.co.za/data/humans.rdfs#"/>
    <semantos:entity name="firstnames">
        <semantos:attribute name="name"/>
        <semantos:attribute name="spousename"/>
        <semantos:graph>
            <semantos:triple subject="person" predicate="human:name" object="@name"/>
            <semantos:triple subject="person" predicate="rdf:type" object="human:Woman"/>
            <semantos:triple subject="person" predicate="human:hasSpouse" object="spouse"/>
            <semantos:triple subject="spouse" predicate="human:name" object="@spousename"/>
        </semantos:graph>
    </semantos:entity>
</semantos:fetch>

Figure 1. Example of a complete Semantos query


With RDFS we can query two related data sources, even if they use different triple assignments, as long as there is a unifying ontology that maps triples from the one graph to the other. Semantos is an RDF/RDFS query language based on XML.

Although RDF/RDFS has been commonly acknowledged as the language to describe metadata on the Semantic Web, the question of which query language to use to query the RDF/RDFS metadata has been hotly contested for the last couple of years. The candidate languages include RQL [12], RDQL [19], SPARQL [21] and even purely mathematical languages [9].

Semantos is the latest addition to this group of languages. Formal definitions of RDF query languages have also been compiled [11]. Although comparisons of these various languages have been attempted [11], we will not reproduce those results here. For the purposes of understanding Semantos it is pertinent to note that it is the first query language to be fully represented in XML. Unlike XQueryX [18], which is simply a mapping of XQuery syntax to an XML representation, Semantos is a 100% XML-native construct.


2.2 Elements


Each of the XML elements in the Semantos query language serves a specific goal. In order to express the syntax and structure of the language we will describe the main elements of the XML structures here, referring to the example in Figure 1.


Fetch. The fetch element is the outermost container for the language. It may be embedded inside another XML document (e.g. an XHTML page) in order to stream the results directly into the document’s format.


<semantos:fetch>… </semantos:fetch>


Source. The source element identifies the source of the data. For integration purposes, multiple data sources may be declared. Currently Semantos only supports RDF-compatible data sources. Each fetch element requires at least one source element.


<semantos:source name="rdf source"
    uri="http://www.retrorabbit.co.za/data/example1.rdf"/>


Namespace. To avoid identifying node elements through long fully qualified names, Semantos supports namespaces, declared with the namespace tag. Each fetch element may have zero or more namespaces.


<semantos:namespace name="rdf"
    value="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>


Entity. The result of a Semantos query is an entity list. An entity is a logical processing unit, similar to a class structure in object-oriented programming. The entity element helps to define the structure of the results. Each fetch element has exactly one entity element.


<semantos:entity name="a">…</semantos:entity>


Attribute. In order to structure results fully, the entity requires projected attributes. Each attribute maps to a variable node in the query graph. Each entity element may have one or more attribute elements.


<semantos:attribute name="name"/>


Graph. Creating a semantic query requires the creation of a context graph. The graph identifies the structure of the data being queried. A graph is a set of triples [10], which is responsible for mapping the tree/graph structure of information into a structured tuple data set that can be used in our result set. Each entity element must have exactly one graph element.



<semantos:graph>…</semantos:graph>


Triple. The triple tag is used to create the triples that
build the context graph. A graph element may contain
one or more triple elements.


<semantos:triple subject="person"
    predicate="human:name" object="@name"/>
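The cardinality rules stated for the elements above (at least one source, exactly one entity, at least one attribute, exactly one graph, at least one triple) can be checked mechanically. The following is a minimal Python sketch of such a check, not part of Semantos itself. The namespace URI `urn:example:semantos` and the function name `check_query` are placeholders introduced here, since the paper does not state the real namespace URI.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the paper does not give the real one.
SEMANTOS_NS = "urn:example:semantos"
NS = {"semantos": SEMANTOS_NS}


def check_query(xml_text):
    """Return a list of violations of the cardinality rules in section 2.2."""
    fetch = ET.fromstring(xml_text.strip())
    if fetch.tag != f"{{{SEMANTOS_NS}}}fetch":
        return ["root element must be semantos:fetch"]
    problems = []
    if len(fetch.findall("semantos:source", NS)) < 1:
        problems.append("fetch needs at least one source")
    entities = fetch.findall("semantos:entity", NS)
    if len(entities) != 1:
        problems.append("fetch needs exactly one entity")
    for entity in entities:
        if len(entity.findall("semantos:attribute", NS)) < 1:
            problems.append("entity needs at least one attribute")
        graphs = entity.findall("semantos:graph", NS)
        if len(graphs) != 1:
            problems.append("entity needs exactly one graph")
        for graph in graphs:
            if len(graph.findall("semantos:triple", NS)) < 1:
                problems.append("graph needs at least one triple")
    return problems


QUERY = """
<semantos:fetch xmlns:semantos="urn:example:semantos">
  <semantos:source name="rdf source" uri="http://www.retrorabbit.co.za/data/example1.rdf"/>
  <semantos:entity name="firstnames">
    <semantos:attribute name="name"/>
    <semantos:graph>
      <semantos:triple subject="person" predicate="human:name" object="@name"/>
    </semantos:graph>
  </semantos:entity>
</semantos:fetch>
"""

print(check_query(QUERY))
```

Because the query is plain XML, a check like this needs only a generic XML parser, which is the accessibility argument the paper makes for an XML-based query language.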


2.3 Language Integrated Query


The main philosophy behind Language Integrated Query (LINQ) is the integration it provides between object, relational and semi-structured data models [8]. It achieves this integration by way of generalization, rather than by ad-hoc specializations. Semantos uses the XLinq branch of LINQ, which strives to make XML documents or document fragments first-class citizens. In this way, XML values can be constructed, loaded, passed, transformed and updated in a type-safe manner, as in Figure 2.


XElement contacts = new XElement("contacts",
    from c in customers
    where c.Country == "USA"
    select new XElement("contact",
        new XElement("name", c.CompanyName),
        new XElement("phone", c.Phone)
    )
);

Figure 2. XLinq in action (C# excerpt)

// semantos and whoru are assumed to be XNamespace values declared elsewhere.
XDocument query = new XDocument(
    new XDeclaration("1.0", null, null),
    new XElement(semantos + "fetch",
        new XElement(semantos + "source",
            new XAttribute(semantos + "name", "WhoRU Full User Profile"),
            new XAttribute(semantos + "uri", @"http://localhost/whoru/fullprofile.aspx")),
        new XElement(semantos + "entity",
            new XAttribute(semantos + "name", "knownpeople"),
            new XElement(semantos + "attribute",
                new XAttribute(semantos + "name", "@name")),
            // The graph is nested inside the entity, as section 2.2 requires.
            new XElement(semantos + "graph",
                new XElement(semantos + "triple",
                    new XAttribute(semantos + "subject", whoru + "person"),
                    new XAttribute(semantos + "predicate", whoru + "myname"),
                    new XAttribute(semantos + "object", "@name"))
            )
        )
    )
);

Figure 3. Semantos query in C#


XLinq is a great improvement over the existing DOM and SAX-based approaches to reading and writing XML. This improvement in ease of use is due to the functional style that XLinq possesses [15]. Figure 3 gives a more pertinent example: it combines Semantos XML constructs with the LINQ XML construction methods.


2.4 Language Comparison


There are several languages available that query RDF data sources. Although they all support different subsets of features, they also have a lot in common. All the query languages provide some syntax with which to

• project the attribute result set;
• define the data graph that needs to be queried;
• limit the result set;
• include external namespaces.




What these languages lack is the ability for people, software and, more importantly, agents to manipulate these queries in a simple manner. This can be achieved by using a query language built from XML structures, instead of a language built on custom syntax constructs. We provide here a short list of common RDF query languages. There is an intended similarity between the language properties of Semantos and those of other XML query languages. Characteristics that the languages should have are defined in [2].


RQL. RQL is a query language for RDF and RDF Schema, which is loosely based on OQL [12]. A powerful feature of RQL is its ability to address RDF Schema semantics in the language itself. Specific language constructs cater for class instance relationships, class property subsumption, domains and ranges. An example of a typical RQL query is shown in Figure 4.


SELECT name, spousename
FROM   {person}, human:name, {name},
       {person}, human:hasSpouse, {spouse},
       {spouse}, human:name, {spousename},
       {person}, rdf:type, {X : human:Woman}
WHERE  name = "Theo"
USING NAMESPACE
       rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#,
       human = http://www.inria.fr/2006/12/05/humans.rdfs#

Figure 4. RQL Query


RDQL. The RDF Data Query Language or RDQL is a query language for RDF based on SquishQL [19]. The syntax for RDQL follows an SQL-like select pattern, where the “from” clause is omitted [11]. RDQL does not support the incorporation of RDF Schema information. A typical RDQL query is shown in Figure 5.


SELECT ?name, ?spousename
WHERE  (?person, human:name, ?name),
       (?person, human:hasSpouse, ?spouse),
       (?spouse, human:name, ?spousename),
       (?person, rdf:type, human:Woman)
AND    ?name = "Theo"
USING  rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       human FOR <http://www.inria.fr/2006/12/05/humans.rdfs#>

Figure 5. RDQL Query


SPARQL. SPARQL (pronounced "sparkle") is an RDF query language. The name is a recursive acronym standing for SPARQL Protocol and RDF Query Language. SPARQL is being designed and standardized by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium [21]. A SPARQL query example is shown in Figure 6.

Table 1 compares Semantos with three other query languages: SQL, XQuery and SPARQL.


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX human: <http://www.inria.fr/2006/12/05/humans.rdfs#>
SELECT ?name ?spousename
WHERE {
    ?person human:name ?name .
    ?person human:hasSpouse ?spouse .
    ?spouse human:name ?spousename .
    ?person rdf:type human:Woman .
    FILTER (?name = "Theo")
}

Figure 6. SPARQL Query


As with any technology, there are strengths and weaknesses that may limit or expand the pool of applications to which the technology is suited. Semantos is no exception to this rule. By providing a critique of both the strengths and weaknesses of the language here, we hope to highlight the language’s most prominent features.


2.5 Strengths


The Semantos language possesses the following strengths, which make it a strong candidate as an RDF query language:


Built on XML. This makes it a simple matter to parse, extend and work with the language. The XML base of Semantos makes it a very open and accessible query language. With more advanced XML processing technologies (like XLinq) becoming available, working with XML becomes a logical choice.

Serializable. For any web language it is critical for the query to be serializable. This gives a language the strength to travel effortlessly over the web, as is very often required. It provides a double benefit when considering the language’s role as an EII query language, as it is very often necessary to save and later analyze the results of a query.

Human and machine readable. In the modern age man and machine are collaborating more than ever to achieve goals or tasks. This virtual “symbiosis” will benefit from the flexibility of Semantos to be interpretable by both people and computers.

Supports semantics. In an information-rich environment, there is often a mismatch in meaning between two or more different data sources. This is more apparent on the World Wide Web than anywhere else. If there is to be any hope of achieving some heterogeneous analysis of these data sources, it becomes critical for the semantics of these data sources to be made concrete and available.


2.6 Weaknesses

For all its strengths, Semantos also suffers from a few shortcomings. These shortcomings are a direct result of the strengths listed above, and they therefore need to be managed, as they cannot be mitigated altogether:


Bulky. As a direct result of the XML nature of the language, queries tend to be longer than in languages that make use of custom syntax. This weakness can be overcome by using tools to process the language instead of hand-crafting queries, which may become confusing as a query grows longer and more complex.

Query optimization. Given that a Semantos query may be split up into different segments and distributed over the web for processing in different domains to yield a single result, it is possible for the queries to be too complex for any form of useful optimization to take place. This is more a result of the distributed and heterogeneous nature of the data sources than of the query language, but as this is the only environment in which the language will operate, it becomes an innate part of the language itself.


Table 1. Comparison between RDF Query Languages

Language   Data Model     Ontologies      Construction      Abstraction level
SQL        Relational     Not supported   Standard syntax   Syntactic
XQuery     Hierarchical   Not supported   Custom syntax     Syntactic
SPARQL     Graph          Supported       Custom syntax     Semantic
Semantos   Graph          Supported       XML syntax        Semantic




Figure 7. Agent enhancement through query


Standards. In a global landscape it becomes essential for any new technology to be backed by a standard for it to achieve any success. Languages like XQuery and SPARQL are backed by the World Wide Web Consortium, and that gives these languages a distinct advantage over Semantos. This is a weakness that may hopefully be overcome with time.


3. Query Enhancement


When querying information on a network as large as the web, query enhancement could fall into one of two categories, viz.

• optimizing the efficiency of the query in terms of speed and resource utilization, or
• enhancing the quality of the results retrieved by the query.

Both goals can only be achieved by altering the structure or makeup of the query, as it would be extremely challenging to change the structure of the data across a large distributed network of indeterminate nodes. This paper is concerned with improving the quality of the results. Specifically, it attempts to increase the volume of results by altering elements of the query to be in line with what is most commonly used on the web. In Semantic Web terms, we are looking to replace context graph edges with edges that are used more regularly on the Semantic Web.


3.1 Agent Enablement


The process of query enhancement is relatively simple. On the user end of the process the following steps occur, as can be followed from Figure 7. The query originator, which may be a person or a piece of software, builds a Semantos query that would retrieve the desired results. This query is then loaded into a software agent capable of traversing the Semantic Web. The agent starts looking for RDF data sources and executes its query against those sources. At some indeterminate stage, the agent may visit a query enhancement service. This may be a deliberate response to not obtaining a satisfactory number of results from the first few data sources, or it may be by happenstance, as the agent happened to pass by a service in any case. Regardless of when the agent visits the service, the QES modifies the agent’s query so as to yield better results. After its query has been enhanced, the agent moves along to the following data sources, and perhaps even some more query enhancement services. When the agent has completed its run it returns to the query originator with its payload.
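The agent's run can be pictured as a simple loop. The Python sketch below is purely illustrative and makes several assumptions that are not in the paper: a query is reduced to a single predicate string, each data source is a callable that returns matching results, the QES is a callable named `enhance`, and the agent consults the QES at most once, deliberately, when it has collected fewer than `min_results` results.

```python
def run_agent(query, sources, enhance, min_results=1):
    """Run `query` against each source in turn and collect a payload.

    sources: callables that take a query and return a list of results
    enhance: callable standing in for a Query Enhancement Service
    """
    payload = []
    enhanced = False
    for source in sources:
        payload.extend(source(query))
        # Deliberate visit to the QES when too few results were gathered.
        if not enhanced and len(payload) < min_results:
            query = enhance(query)
            enhanced = True
    return query, payload


def make_source(data):
    """Build a toy data source that maps a predicate to the people using it."""
    return lambda q: data.get(q, [])


sources = [
    make_source({"like": ["alice"]}),
    make_source({"like": ["bob", "carol"]}),
    make_source({"dig": ["dave"]}),
]
final_query, payload = run_agent("dig", sources, lambda q: "like")
print(final_query, payload)
```

In this toy run the first source yields nothing for "dig", so the agent has its query rewritten to "like" and then collects results from the second source. Sources visited before the enhancement are not revisited, which matches the best-effort character of the approach.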


3.2 Crawling the Semantic Web


Query enhancement attempts to replicate the successes of search engines on the web by reproducing key elements of these search engines. Prime amongst these is the ability of search engines to “crawl” the web and harvest information about websites. One of the most important pieces of information harvested in this fashion is the count of how many web links reference a particular page. This count makes it possible to determine which web pages are more popular than others and are therefore better search results. We duplicate this behaviour by crawling the Semantic Web and counting the number of times an individual predicate or subject occurs in any of the Semantic Web documents. By counting the occurrences of individual elements in the Semantic Web, it is possible to determine trends as to which mark-up elements are favoured. This information is then stored by the query enhancement service.
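The counting step can be sketched in a few lines of Python. This is an illustration only: it assumes each crawled document has already been parsed into a list of (subject, predicate, object) tuples, it counts predicates only, and the function name `count_predicates` and the sample data are invented for the example.

```python
from collections import Counter


def count_predicates(documents):
    """Count how often each predicate occurs across a collection of documents.

    Each document is a list of (subject, predicate, object) tuples.
    """
    counts = Counter()
    for triples in documents:
        counts.update(predicate for _subject, predicate, _obj in triples)
    return counts


docs = [
    [("alice", "like", "fish"), ("alice", "like", "chess")],
    [("bob", "dig", "fish")],
    [("carol", "like", "fish")],
]
counts = count_predicates(docs)
print(counts.most_common())
```

The resulting counts play the role of the dictionary that the query enhancement service stores.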

The next step is to determine the “likeness” or similarity of elements in our dictionary. Several algorithms and techniques have been proposed to solve this problem, all of them from the ontology mapping or alignment domain. Examples of techniques include Anchor-PROMPT [20], GLUE [16] and Quick Ontology Mapping (QOM). In this paper we use string similarity to measure the similarity of two elements on a scale from 0 to 1 [14], based on Levenshtein’s edit distance [13].
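A minimal Python sketch of such a similarity measure follows. The edit distance is the standard dynamic-programming algorithm; the normalisation used here, one minus the distance divided by the length of the longer string, is an assumption, since the exact formula from [14] is not reproduced in this text.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the edit distance onto [0, 1]; 1 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("dig", "digger"))
```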





Once we have determined the similarity between elements and their occurrence probability on the World Wide Web, we have all the information we need to enhance queries.


3.3 Enhancement


Once a query is submitted to the enhancement service, the service goes to work by replacing the predicate values of the triple elements. It is also possible to replace the subject and object elements, although experimentation shows this to be a somewhat untrustworthy enhancement. The predicate values are compared to the values contained in the dictionary that we have built up by crawling the web. Although it would be far more effective to use statistical methods to compare elements, we propose a simpler solution here. Replacement is proposed as a function of the occurrence ratio in-the-wild multiplied by the similarity index provided by Levenshtein’s edit distance. This gives a weighted “appropriateness” value for each value in the dictionary, which leaves us to select the one with the highest value.
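One way to write this selection rule down, using symbols introduced here for illustration (q is the predicate in the query, c a candidate from the dictionary, occ(c) its crawled occurrence count and sim(q, c) the similarity on the 0 to 1 scale), is given below. Reading "occurrence ratio" as the count divided by the total count is an assumption.

```latex
w(q, c) = \frac{\mathrm{occ}(c)}{\sum_{c'} \mathrm{occ}(c')} \cdot \mathrm{sim}(q, c),
\qquad
c^{*} = \arg\max_{c} \; w(q, c)
```

Since the denominator is the same for every candidate, using raw occurrence counts instead of ratios selects the same replacement.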




Using LINQ, it is then a simple matter to replace the value of the predicate attribute with the most suitable value, which translates into replacing the context graph edge with a more appropriate edge. This operation is repeated for each of the triple elements in the query. After the query has been modified, it is returned to the requestor, which is now free to execute the query with the knowledge that the predicates used in the query’s context graph occur with some regularity on the web.
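The replacement step is easy to mirror with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` rather than LINQ, so it is an analogue and not the authors' implementation. The namespace URI `urn:example:semantos`, the `choose` callable and the sample query are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SEMANTOS_NS = "urn:example:semantos"  # placeholder URI, not from the paper
ET.register_namespace("semantos", SEMANTOS_NS)


def enhance_query(xml_text, choose):
    """Rewrite the predicate of every triple using the `choose` callable."""
    root = ET.fromstring(xml_text)
    for triple in root.iter(f"{{{SEMANTOS_NS}}}triple"):
        triple.set("predicate", choose(triple.get("predicate")))
    return ET.tostring(root, encoding="unicode")


QUERY = (
    '<semantos:fetch xmlns:semantos="urn:example:semantos">'
    '<semantos:entity name="fans"><semantos:attribute name="name"/>'
    '<semantos:graph>'
    '<semantos:triple subject="person" predicate="whoru:dig" object="fish"/>'
    '<semantos:triple subject="person" predicate="whoru:myname" object="@name"/>'
    '</semantos:graph></semantos:entity></semantos:fetch>'
)

# A stand-in for the dictionary lookup: map "dig" to the more common "like".
replacements = {"whoru:dig": "whoru:like"}
enhanced = enhance_query(QUERY, lambda p: replacements.get(p, p))
print(enhanced)
```

Predicates without a better candidate are left untouched, so the query that is returned has the same structure as the one submitted.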


4. Implementation


The practical implementation of query enhancement requires a sandbox approach to test the ideas, as it would be rather unworkable to implement such an experiment across the entire web. The sandbox that we chose reflects the current trend of social networking on Web 2.0. The system is tested against a custom social network implementation called Who R U.


4.1 Who R U Application


The system allows a user to register and then provide details about themselves. This is achieved by making statements about oneself, such as “I like fish”, which translates to a subject (I), predicate (like) and object (fish). It is then possible for the user to further describe the object in question at various levels of detail; for example, “I like fish” becomes “I like (tropical) fish”. Who R U provides the user with a blank canvas to publish as much (or as little) information about themselves as they wish. It is then possible for other people to query this information, perhaps to find out who else likes “tropical fish” and start a chat group.




Figure 8. WhoRU Layer Architecture


As Figure 8 shows, there are four main elements to the application. At the very top of the layer architecture we have the Who R U website. This is the main graphical user interface that allows users to interact with the system. The website would also be described as the view component in a model-view-controller (MVC) architecture. The website interacts with the query enhancement service, which is a web service. Implementing the QES as a web service opens its functionality up for other applications to run on top of it without any difficult integration or glue code. Both the website and the web service make extensive use of the query engine library, which is a dynamically linked library (DLL). The query engine provides the routines, algorithms and internal data structures which enable Semantos to execute queries against RDF data sources. At the bottom of the stack are the RDF documents. Each RDF document represents a single person and all the statements they have made about themselves.

Although this test-bed application does not have a software agent implementation to execute the searches, it is simple to see the possibility of implementing one. Each RDF document would represent a different data source, which in this case just so happens to be in the same place. The Semantic Web crawler is omitted for the same reason: because the RDF documents are all in the same location, extracting and counting the edge information in the RDF context graphs is a simple matter of iterating through the documents in a folder. In a real-life application, however, these RDF documents would be distributed all over the Web, and an agent and a Web crawler would both be required in order to get to the information. Our test-bed application does, however, provide us with a suitable and interesting general implementation with which to experiment with our theoretical ideas.


4.2 Synonyms


As we are working with written statements or comments, it is prudent to modify our element matching function slightly to incorporate lexical synonyms for words. A dictionary would, for example, classify the words “love” and “adore” as synonyms, and they would therefore be considered a good match. In these instances we force the calculated similarity between the words to the maximum value of 1, which yields a modified weight function.
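A form of this weight function that is consistent with the values in Table 2 is given below. It is a reconstruction under stated assumptions: the symbols w, occ and Z(q, c) are introduced here, with occ(c) the occurrence count of candidate c and Z(q, c) the Levenshtein-based similarity between the query word q and c.

```latex
w(q, c) = \mathrm{occ}(c) \cdot
\begin{cases}
1 & \text{if } c \text{ is a synonym of } q \\
Z(q, c) & \text{otherwise}
\end{cases}
```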



Given that people have different vocabularies (read ontologies), and that we allow them complete freedom to use any word they like, it is quite likely that people will use different words that essentially mean the same thing. Someone else may have commented about themselves: “I dig tropical fish”, where “dig” is a synonym for “like”. If this particular user were to look for people who also like fish, he would probably unknowingly use the term “dig”, which would perhaps yield poor results, as he might have been the only person to phrase it in such a manner. This is where the QES comes into play, by changing the “dig” into “like”. This replacement would not yield all the possible results, as it is a best-effort approach: the users who stated that they “dig” fish will be omitted, but the users that “like” fish will be returned. This is a better result, as more people will be returned by a query looking for people who “like” fish.


Table 2. Possible replacement values for query using “dig”

Value    Occurrence   Distance   Synonym   Weight
Dig      320          Z          1         320
Like     1288         Z          1         1288
Enjoy    896          Z          1         896
Dug      2            Z          0         Z x 2
Digger   38           Z          0         Z x 38


Table 2 illustrates the use of the synonym function with respect to the distance function. From this table we can see that when a synonym is present, the weight of the value is determined solely by the occurrence column. Note that Z represents the Levenshtein-based similarity between the word “dig” and the word in the value column. It is also noteworthy that in our index table the word “dig” is a synonym of itself. What this boils down to is that we favour synonyms with high occurrence values over words that look similar.
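The ranking implied by Table 2 can be reproduced with a short Python sketch. The occurrence counts and synonym flags are taken from the table. The numeric Z values are assumptions, computed as one minus the edit distance to "dig" divided by the length of the longer word, and the function name `weight` is invented for the example.

```python
def weight(occurrence, z, is_synonym):
    """Occurrence count scaled by similarity; synonyms count as similarity 1."""
    return occurrence * (1.0 if is_synonym else z)


# (value, occurrence, z, is_synonym)
# Occurrences and synonym flags come from Table 2; z values are assumed.
candidates = [
    ("dig", 320, 1.0, True),
    ("like", 1288, 0.25, True),
    ("enjoy", 896, 0.0, True),
    ("dug", 2, 1 - 1 / 3, False),
    ("digger", 38, 1 - 3 / 6, False),
]

ranked = sorted(candidates,
                key=lambda row: weight(row[1], row[2], row[3]),
                reverse=True)
best = ranked[0][0]
print(best, [row[0] for row in ranked])
```

Without the synonym rule, "like" would score only 0.25 x 1288 = 322 and "enjoy" would score 0, so the rule is what lets frequently used synonyms outrank look-alike words such as "digger".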


4.3 Related Work


Some of the work on QES has been inspired by the directions taken with SWOOGLE [7], a crawler-based indexing and retrieval system for the Semantic Web. The key differences between SWOOGLE and QES are:




• QES measures the importance of an RDF triple, whereas SWOOGLE measures the importance of an entire semantic web document.
• SWOOGLE is meant to be used by people, whereas QES is exclusively used by software agents.
• SWOOGLE aims to aid Semantic Web document discovery, whereas QES has the goal of improving search performance by modifying the query predicates.
• QES is not as mature in its ability to measure the importance of relations between triple elements as SWOOGLE is at measuring the relations between documents.


All things considered, the two technologies may be found to complement each other well, in the sense that they cater for the two separate segments of the consumer market (man and machine) and that they may even aid each other in RDF/S document/triple discovery.

A key shortcoming of the matching process is that it
does not take context into account. The example of
looking for fish would be drastically different if
people usually say that they “dig” fish and not “like”
fish. In the context of talking about fish, the word
“dig” should carry a higher weight. This may be achieved
by using a matrix that stores the number of times one
element is used in relation to another. When looking up
the occurrence value, it would then be necessary to look
for the occurrence value relative to the element being
matched.
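
A minimal sketch of such a matrix is given below. It is not part of the current QES; the class name `ContextMatrix`, its methods and the sample counts are hypothetical and serve only to show how an occurrence value could be looked up relative to the element being matched.

```python
from collections import defaultdict


class ContextMatrix:
    """Hypothetical store of how often a predicate is used with a given element."""

    def __init__(self):
        # counts[context][predicate] -> number of observed uses
        self._counts = defaultdict(lambda: defaultdict(int))

    def record(self, predicate: str, context: str) -> None:
        """Note one use of `predicate` in relation to `context`."""
        self._counts[context][predicate] += 1

    def occurrence(self, predicate: str, context: str) -> int:
        """Occurrence value relative to `context`; 0 if never seen together."""
        return self._counts.get(context, {}).get(predicate, 0)

    def best(self, candidates, context: str) -> str:
        """Candidate predicate used most often in this context."""
        return max(candidates, key=lambda p: self.occurrence(p, context))


# Invented observations: (predicate, object it was used with).
matrix = ContextMatrix()
for predicate, context in [("dig", "fish"), ("dig", "fish"), ("dig", "fish"),
                           ("like", "fish"), ("like", "music"), ("like", "music")]:
    matrix.record(predicate, context)

print(matrix.best(["like", "dig"], "fish"))   # context-specific choice
print(matrix.best(["like", "dig"], "music"))
```

With context-relative counts, “dig” can outrank “like” when the object is fish even though “like” is more common overall.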


5 Conclusions


This paper introduced an XML-based RDF query language,
called Semantos, and illustrated its use. Semantos is
novel in that it relies on XML, as opposed to currently
popular languages that rely on structures alien to XML.
By means of this new query language we have also
established a framework for the enhancement of RDF
queries. This framework would primarily be used by
software agents to retrieve information from the
Semantic Web. Query enhancement is achieved by replacing
parts of the query with parts that have been found to
occur more often in the Semantic Web, thus yielding
better results. This process is akin to the way search
engines establish the relative importance of a web page
by looking at the number of links or references to it.
The concept of relative importance is implemented in the
Query Enhancement Service by crawling the Semantic Web
and storing the number of times a particular predicate
is used. When an agent requests the enhancement of a
query, it can then be shown that a particular predicate
is more widely used and would therefore be likely to
return better results. Early test results show that this
approach may yield better results than an un-optimized
query would in a heterogeneous knowledge environment
like the World Wide Web.
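
The counting and replacement steps summarised above can be illustrated as follows. This is a simplified sketch rather than the QES implementation: the crawled triples, the synonym table and the function names `count_predicates` and `enhance` are all assumptions made for the example.

```python
from collections import Counter


def count_predicates(triples):
    """Count how often each predicate occurs in the crawled (s, p, o) triples."""
    return Counter(predicate for _, predicate, _ in triples)


def enhance(query, synonyms, counts):
    """Swap the query predicate for its most widely used synonym, if any."""
    subject, predicate, obj = query
    candidates = [predicate] + synonyms.get(predicate, [])
    # max() keeps the original predicate when counts are tied.
    best = max(candidates, key=lambda p: counts[p])
    return (subject, best, obj)


# Invented crawl results and synonym table.
crawled = [
    ("anna", "like", "fish"),
    ("ben", "like", "chess"),
    ("carl", "enjoy", "fish"),
    ("dora", "dig", "jazz"),
    ("ed", "like", "jazz"),
]
synonyms = {"dig": ["like", "enjoy"]}

counts = count_predicates(crawled)
print(enhance(("?who", "dig", "fish"), synonyms, counts))
```

Here the rarely used predicate “dig” is replaced by “like”, the predicate that occurs most often in the crawled data.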


References

[1] Berners-Lee, T., Hendler, J., & Lassila, O.: The
Semantic Web. Scientific American, pp. 34-43 (2001)

[2] Bonifati, A., & Ceri, S.: Comparative analysis of five
XML query languages. ACM SIGMOD Record, 29(1) (2000)

[3] Brickley, D., & Guha, R.: Resource Description
Framework (RDF) Schema Specification 1.0. Retrieved
August 18, 2007, from Candidate recommendation,
World Wide Web Consortium:
http://www.w3.org/TR/2000/CR-rdf-schema-20000327 (2000)

[4] Broekstra, J., Kampman, A., & van Harmelen, F.:
Sesame: An Architecture for Storing and Querying RDF
Data and Schema Information. In D. Fensel, J. Hendler,
H. Lieberman, & W. Wahlster, Semantics for the
WWW. 197-222 MIT Press (2001)

[5] Carroll, J., & Stickler, P.: RDF Triples in XML. Proc.
Extreme Markup Languages (2004)

[6] Delen, D., Nikunj, D., & Perakath, B.: Integrated
modelling: the key to holistic understanding of the
enterprise. CACM 48 (4) 107-112 (2005)

[7] Ding, L., Finin, T., Joshi, A., Pan, R., Cost, S., Peng, Y.,
et al.: Swoogle: A Search and Metadata Engine for the
Semantic Web. Proceedings of the 13th ACM
international conference on Information and Knowledge
Management. Washington: ACM (2004)

[8] Doan, A., Domingos, P., Halevy, A.: Learning to match
the schemas of data sources: A multistrategy approach.
VLDB Journal 50, 279-301 (2003)

[9] Frasincar, F., Houben, G., Vdovjak, R., & Barna, P.:
RAL: An Algebra for Querying RDF. Proc. 3rd
International Conference On Web Information Systems
Engineering, WISE (2002)

[10] Gutierrez, C., Hurtado, C., & Mendelzon, A.: Formal
aspects of querying RDF databases. Proc. 1st
International Workshop on Semantic Web and
Databases. Berlin (2003)

[11] Haase, P., Broekstra, J., Eberhart, A., & Volz, R.: A
Comparison of RDF Query Languages. Retrieved
August 18, 2007, from
http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/rdfquery.pdf (2004)

[12] Karvounarakis, G., Alexaki, S., Christophides, V.,
Plexousakis, D., & Scholl, M.: RQL: A declarative
query language for RDF. Proc. 11th International World
Wide Web Conference, 592-603 (2002)

[13] Levenshtein, V.I.: Binary codes capable of correcting
deletions, insertions, and reversals. Cybernetics and
Control Theory 10(8):707-710 (1966)

[14] Maedche, A., Staab, S.: Measuring similarity between
ontologies. Proc. European Conference on Knowledge
Acquisition and Management (EKAW), 251-263,
Springer (2002)

[15] Meijer, E., Beckman, B.: XLinq: XML Programming
Refactored (The Return Of The Monoids). Proc. XML
2005 Conference. Atlanta (2005)

[16] Meijer, E., Schulte, W., & Bierman, G.: Unifying
Tables, Objects and Documents. Proc. DP-COOL 2003.
Uppsala (2003)

[17] Meijer, E., Torgersen, M., & Bierman, G.: Lost In
Translation: Formalizing Proposed Extensions to C#.
Proc. OOPSLA 479-498. Montreal (2007)

[18] Melton, J., & Muralidhar, S.: XML Syntax for XQuery
1.0 (XQueryX), W3C Working Draft. Retrieved August
18, 2007, from World Wide Web Consortium:
http://web4.w3.org/TR/2005/WD-xqueryx-20050404 (2005)

[19] Miller, L., Seaborne, A., & Reggiori, A.: Three
Implementations of SquishQL, a Simple RDF Query
Language. The Semantic Web - ISWC 423-435 (2002)

[20] Noy, N.F., Musen, M.A.: The PROMPT suite:
interactive tools for ontology merging and mapping.
International Journal of Human-Computer Studies 59,
983-1024 (2003)

[21] Prud'hommeaux, E., & Seaborne, A.: SPARQL Query
Language for RDF. Retrieved August 18, 2007, from
W3C Candidate Recommendation:
http://www.w3.org/TR/rdf-sparql-query (2007)

[22] Shadbolt, N., Berners-Lee, T., & Hall, W.: The
Semantic Web Revisited. IEEE Intelligent Systems 21
(3), 96-101 (2006)