SEMANTIC WEB APPLICATION:
ONTOLOGY-DRIVEN RECIPE QUERYING







A MASTER’S THESIS
in
Computer Engineering
Atılım University





by
GÜLER KALEM
JUNE 2005

SEMANTIC WEB APPLICATION:
ONTOLOGY-DRIVEN RECIPE QUERYING




A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES
OF
ATILIM UNIVERSITY
BY
GÜLER KALEM

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF
MASTER OF SCIENCE
IN
THE DEPARTMENT OF COMPUTER ENGINEERING




JUNE 2005

Approval of the Graduate School of Natural and Applied Sciences

_____________________
Prof. Dr. İbrahim Akman
Director
I certify that this thesis satisfies all the requirements as a thesis for the degree of
Master of Science.
_____________________
Prof. Dr. İbrahim Akman
Head of Department

This is to certify that I have read this thesis and that in my opinion it is fully
adequate, in scope and quality, as a thesis for the degree of Master of Science.

_____________________
Asst. Prof. Dr. Çiğdem Turhan
Supervisor


Examining Committee Members
Prof. Dr. Ali Yazıcı _____________________
Prof. Dr. İbrahim Akman _____________________
Assoc. Prof. Dr. Nazife Baykal _____________________
Asst. Prof. Dr. Çiğdem Turhan _____________________
Asst. Prof. Dr. Nevzat Sezer _____________________

ABSTRACT

SEMANTIC WEB APPLICATION:
ONTOLOGY-DRIVEN RECIPE QUERYING
Kalem, Güler
M.S., Computer Engineering Department
Supervisor: Asst. Prof. Dr. Çiğdem Turhan
June 2005, 102 pages

Currently, the information presented on the Internet consists of static content that carries meaning only in certain contexts, so these documents cannot be used effectively by different systems. However, presenting information with well-defined meaning will enable different computer systems to process and reason about it at the semantic level, whereas present systems process information only at the syntax level. The Semantic Web approach will drastically change the effectiveness of the Internet, enable the reuse of information, and increase its representative power. Since information is defined in a standard way, it will be possible to combine information from different locations and process it together.
In this thesis, concepts such as representing knowledge with a Semantic Web
language, ontology processing, reasoning and querying on ontologies have been
implemented to realize a Semantic Web application: Ontology-driven Recipe
Querying.
As the domain, a Web-based application dealing with food recipes has been
chosen. All the information and application logic have been moved into an OWL
(Web Ontology Language) ontology file, which controls the content and
structure of the application and makes it possible to reason on the provided
information to derive new facts from the given logic statements.
In the application, it is possible for the user to enter queries made up of arbitrary
elements when querying for available food recipes. The application is capable of
responding meaningfully no matter how the queries are constructed.


Keywords: Semantic Web, ontology, ontology querying, ontology management,
ontology-driven knowledge management, knowledge representation, Internet.

ÖZ

SEMANTIC WEB APPLICATION:
ONTOLOGY-DRIVEN RECIPE QUERYING
Kalem, Güler
M.S., Computer Engineering Department
Supervisor: Asst. Prof. Dr. Çiğdem Turhan
June 2005, 102 pages

Today, all the information on the Internet consists of static content, and it is quite difficult for these documents to be used effectively by different systems. Presenting information with a well-defined meaning, however, will enable different computer systems to process the information and reason about it at the semantic level, whereas existing systems process information only at the syntactic level. The Semantic Web approach will greatly increase the effectiveness of the Internet, enable the reuse of information, and increase its representative power. Since information is defined according to a standard, it will be possible to combine information from different locations and process it together.
In this thesis, the concepts of representing information with a Semantic Web language, ontology processing, and reasoning and querying over ontologies have been implemented in order to realize a Semantic Web application: Ontology-Driven Recipe Querying.
As the domain, a Web-based food recipe application has been chosen. All the information and application logic have been moved into an OWL (Web Ontology Language) ontology file that controls the content and structure of the application, and this file makes it possible to reason over the existing logical statements to derive new information.
In the application, the user can query the available food recipes by entering ingredients of his or her choice. The application also returns meaningful answers no matter how the queries are constructed.


Keywords: Semantic Web, ontology, ontology querying, ontology management,
ontology-driven knowledge management, knowledge representation, Internet.


ACKNOWLEDGMENTS


I express sincere appreciation to my supervisor Asst. Prof. Dr. Çiğdem Turhan for
sharing her knowledge with me and guiding me throughout my thesis. Without her
project proposal, support, encouragement, guidance and persistence this thesis would
never have happened.
I would also like to express my appreciation to the examining committee members Prof.
Dr. Ali Yazıcı, Prof. Dr. İbrahim Akman, Assoc. Prof. Dr. Nazife Baykal and Asst. Prof.
Dr. Nevzat Sezer for their valuable suggestions and comments.
In addition, I would like to thank my parents Nafiye and Recep Kalem and my sister
Ayşegül for their unlimited patience, support and love during the course of the study.
TABLE OF CONTENTS



ABSTRACT
ÖZ
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER
1. INTRODUCTION
2. OVERVIEW OF THE WEB
2.1 Web Languages
2.2 Information Management on the Web
2.3 Information Retrieval on the Web
3. SEMANTIC WEB
3.1 Overview of the Semantic Web
3.2 Information Retrieval with Semantic Web
3.3 Semantic Web Tools and Languages
3.3.1 SGML (Standard Generalized Markup Language)
3.3.2 XML (eXtensible Markup Language)
3.3.3 RDF (Resource Description Framework)
3.3.4 RDFS (RDF Schema)
3.3.5 OIL (Ontology Inference Layer)
3.3.6 DAML+OIL (DARPA Agent Markup Language - OIL)
3.3.7 OWL (Web Ontology Language)
4. ONTOLOGY, ONTOLOGY EDITORS AND QUERY LANGUAGES
4.1 Ontology Editors
4.2 Ontology Management System
4.3 Ontology Query Languages
5. DESIGN OF THE SEMANTIC WEB APPLICATION: ONTOLOGY-
DRIVEN RECIPE QUERYING
5.1 Overview of the System
5.2 System Domain
5.3 System Specifications
5.4 System Design
5.4.1 OWL Ontology Design
5.5 Technical Specification
6. IMPLEMENTATION
6.1 OWL Query Server
6.2 Web Interface
6.3 Implementing the Ontology with Protégé
7. CONCLUSION
REFERENCES
APPENDICES
A. Ontology Editor Survey Results
B. Sample Queries
C. OWL Model
D. Class Hierarchy for foodReceipts Project



LIST OF FIGURES



FIGURE
1. Structure of the System
2. Properties and Relations of the System
3. OWLQueryServer UML diagram
4. RequestHandler UML diagram
5. FoodEntry UML diagram
6. Search Interface 1
7. Sample Search (A)
8. Sample Search (B)
9. Sample Search (C)
10. Sample Search (D)
11. Ingredients and Recipe of Körili Pilav
12. Source of Körili Pilav
13. Search Interface 1
14. Selecting Ingredients from Categorized Box
15. Sample Search (E)
16. Sample Search (F)
17. Sample Search (G)
18. Category of Pilavlar
19. Help Interface of the System
20. Protégé Ontology Editor
CHAPTER 1

INTRODUCTION

Over the last fifteen years the Internet has become a permanent part of our
lives. Today, almost everything in our lives is connected to the ‘Web’ in
one way or another. The Internet has become one of the most important platforms for
e-commerce, communication, entertainment, business, education and knowledge
sharing. Looking at the range of fields and platforms involved, it is clear
that the Internet is not merely a modern way of doing things; it is a ‘de facto’
reality that will not fade away but will continue to grow into the way we live.
It is an entire concept surrounding our lives and reshaping our lifestyle.
While the Internet is changing our way of living, it is also changing and evolving
within itself. A new phase is needed in which information on the Internet is given
well-defined meaning, enabling computers and people to work in cooperation.
Currently, the information presented on the Internet consists of static content that
carries meaning only in certain target environments or contexts. The Internet contains
billions of such documents which, in general, cannot be used effectively by different
systems. However, presenting information in a well-defined format using shared
standards will enable computer systems to process it at the semantic level, whereas
present systems process information only at the syntax level. Presenting
information with well-defined meaning will enable different computer systems to
process and reason about it. This approach will drastically change the
effectiveness of the Internet, enable the reuse of information, and increase its
representative power. Since information is defined in a standard way, it will be
possible to combine information from different locations and process it together.
“The Semantic Web” is a new way of representing information, allowing it to be
defined and presented at the semantic level so that computer systems can process
it more effectively. A possible realization of this vision, if not the only one,
is to use Semantic Web languages that enable the semantic definition of
information.
In this thesis, the concepts involved in the Semantic Web have been studied, and
it has been shown how different solutions can be combined to realize such
applications. Concepts such as representing knowledge with a Semantic Web
language, ontology processing, and reasoning and querying on ontologies have
been applied successfully in the developed application.
The main purpose of the thesis is to investigate the Semantic Web concept and
gain a solid understanding of it, together with its difficulties, its problems,
and its suitability for real-world applications.
In developing the Semantic Web application, the following practical problems
arise:
• to process data defined with the Semantic Web language OWL (Web
Ontology Language) [34] [35] [37] [38] [50] [62] [73],
• to execute queries on OWL ontologies,
• to make use of the meaning captured in the data within applications,
• to combine and process, on a single system, information located on
different systems.
The implementation part of the thesis mostly deals with the problems in the
above list. As the domain, a Web-based application dealing with food recipes has
been chosen. Instead of building all the application logic into static standard HTML
with a scripting language, all the information and application logic have been moved
into an OWL ontology file. For a more effective system, all the data and application
logic should reside in the OWL Web ontology as much as possible.
Specifically, the OWL ontology controls all the content and the structure of the
application. It makes it possible to reason on the provided information and create
new facts from the given logic statements. The application provides functionality
so that an end-user can enter queries for food recipes through the Web interface.
All the data needed for user queries is provided from the information stored in
the ontology itself. It is also possible for the user to enter queries made up of
arbitrary elements when querying for available food recipes. The application is
able to respond meaningfully no matter how the queries are constructed.
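The ingredient-based querying described above can be sketched, in miniature and outside OWL, as a plain set-overlap match. The recipe names and ingredients below are invented for illustration and are not taken from the thesis ontology; the real application resolves such queries through reasoning over the OWL model.

```python
# Hypothetical sketch of ingredient-based recipe matching (illustrative data,
# not the thesis's actual OWL implementation).

RECIPES = {
    "rice pilaf": {"rice", "butter", "salt"},
    "lentil soup": {"lentil", "onion", "salt", "water"},
    "omelette": {"egg", "butter", "salt"},
}

def find_recipes(query_ingredients):
    """Return recipe names ordered by how many queried ingredients they use."""
    matches = []
    for name, ingredients in RECIPES.items():
        overlap = len(ingredients & set(query_ingredients))
        if overlap:
            matches.append((overlap, name))
    # Most relevant (largest overlap) first; ties broken alphabetically.
    return [name for overlap, name in sorted(matches, key=lambda m: (-m[0], m[1]))]

print(find_recipes(["salt", "butter", "rice"]))
# → ['rice pilaf', 'omelette', 'lentil soup']
```

Because every recipe with any overlap is returned in ranked order, a query made up of arbitrary ingredient combinations still yields a meaningful answer rather than failing on an exact match.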
This document has been divided into chapters, each dealing with a specific
part of the Semantic Web concept and the implemented application.
The following chapter presents background information about the Web, its
general problems, and information management on the current Web. Chapter 3
presents the history and domain of the Semantic Web together with Semantic Web
tools and languages. Chapter 4 is about ontologies, ontology editors and query
languages. Chapter 5 covers the system design and explains the system domain
and specifications. Chapter 6 presents the implemented application, discussing
the user interface and system structure in detail. Finally, the conclusion
chapter explains possible extensions to the thesis and future work on the
Semantic Web subject.
CHAPTER 2

OVERVIEW OF THE WEB

At the very beginning, when the Web first emerged, a number of computers were
connected to each other in order to work together and share data [13] [14] [42].
Over time, the Web started to grow, and intranets and LANs came onto the scene.
But the explosion of personal computers, mobile devices and major advances in
the field of telecommunications were the actual triggers of the Web as we know
it today.
The growth of the Web has been impressive over the past few years. It is a
phenomenon that cannot be fully defined or described over any single period of
time because of its potential to change and to fit into our lives. The interaction
between the Web and the way we live involves both parties equally: as the Web
changes according to our needs, the ways human beings work, study and communicate
with each other are also being reshaped. This interaction still has great
potential to move far beyond our imagination.
In its first stage, the Web was thought of as an exchange platform for
documents and data, and a communication medium for work collaboration. It was
meant to be a big network of workstations where programs and databases could
share their knowledge and work together in collaboration. But with the enormous
explosion of media programs, video games, films, music, pictures, etc., the
present Web is used almost exclusively by humans and not by machines. The content
is mainly targeted for human consumption. Information meant to be processed by
computing systems is generally defined by custom standards, which is a
handicap for a broader and more extended use of the provided information.
Specifically, the main problem with the present Web is that in most cases the
information is written only for human consumption. Machines cannot understand
the meaning of online information. Enormous amounts of pictures, drawings,
movies of all media types, and information presented in natural-language format
populate the actual Web. As a result, this meaningless information is not useful
at all to machines because they cannot process the data as a context-aware
system; they only present the data to the user in a specific format.
On the other hand, finding the right piece of information is often a nightmare on
the present Web. Search results are in most cases imprecise, often yielding matches
to thousands of pages. Searching by hand is a difficult task that takes too much
time and has several limitations. Moreover, users face the task of reading all the
retrieved documents in order to extract the information actually desired.
Today’s search engines are not context-aware; rather, they perform searches with
text-match-based methods. A related problem is that the maintenance of Web sources
has become very difficult. The burden on users to maintain consistency is often
overwhelming. This has resulted in a vast number of sites containing inconsistent
and contradictory information.
2.1 Web Languages
There are many languages used to publish data on the current Web [13] [15],
among them HTML, PHP, JSP and ASP, as well as media-oriented Web languages such
as Flash. However, these scripting and markup languages are only meant to handle
the business logic of applications and the visual presentation of the information
they deal with. Markup languages such as HTML do not care what the information
is; they only control the layout and appearance of the given information. Server-
side Web scripting languages such as PHP generally target the dynamic behavior
and business logic of Web applications. All of the above languages share a common
shortcoming: they cannot attach or process semantic meaning bound to information.
They treat data as plain text without any meaning; that is, such Web languages
are not “aware” of the information they are dealing with.
2.2 Information Management on the Web
The incredible progress of the Web is a direct consequence of a big explosion
of all kinds of online Web documents. Information on the Web is generally stored
in large databases kept on servers. Programs running on the servers generate the
requested Web documents “on the fly”, based on the data needed at a given state.
Most of these dynamically generated online documents are made only for human
consumption, and it is impossible for machines to understand their meaning. Such
Web documents are difficult to reuse and to make available to other parties
because they are not permanent but are generated for specific requests without
any well-defined meaning.
2.3 Information Retrieval on the Web
Information retrieval on the Web [15] refers to the act of recovering information
from the vast amount of online Web documents: getting the desired documents and
presenting them to the user. This is the classic and most widely used way of
obtaining information from the Web. With this approach, a user does not extract
any information from a document; the user merely picks some documents from among
all those available on the Web. The user will get a document or a set of
documents and will have to analyze them to find the desired information, if it
exists. In this approach, only a fraction of the computational power of the
computers involved is used to fetch the desired information. The computing
systems are responsible only for transferring the document and presenting it to
the user. No processing power is used to retrieve directly relevant information
through context-aware processes and methods.
The problems associated with retrieving quality information from the
Internet are many. We can consider the Internet as a connected undirected graph
with many nodes, where the connections are the edges. In this perspective, the
nodes are distributed across the world without regard for cultures or time zones;
the idea of a connected undirected graph captures the Internet elegantly. The
problems related to traversing the Internet to retrieve information are that the
data is distributed across the whole world and the nodes of the Internet are
spread worldwide. Moreover, the Internet is changing very fast and the data is
volatile: the Internet's nodes and connections double every six months in a
topology that is not predefined. The data is redundant and stored in an
unstructured way, and data on the Web is duplicated in many instances across
mirror sites. The quality of the data is often poor, and the volume of data to
be searched on the Web is growing at an exponential rate. Not all the data is in
the same language, because the Web is a reflection of the real world in that it
is multicultural and multilingual. New media types are appearing at a fast rate,
particularly audio-visual and multimedia files, and the content of many Web
pages is created dynamically on demand.
On the Web, unstructured markup languages make it difficult for humans, and
even more so for machines, to locate and acquire the desired information.
The current methods for retrieving information on the Web are browsing and
keyword-based searching. Both methods have several limitations.
Browsing: Browsing the Web refers to retrieving a Web document by
means of its URI (Uniform Resource Identifier) and displaying it in the local
client browser to view its content. The user often has to traverse from link to
link in order to reach the desired information, if it is reached at all.
Anybody familiar with the Web knows the drawbacks of looking for information
by browsing:
• It is very time consuming.
• It is not always possible to reach the desired information even though it exists
somewhere on the Web.
• It is very easy to get lost and disoriented following all the links the user
might find relevant, suffering from what is called the “lost-in-hyperspace”
syndrome.
Keyword Searching: Keyword searching is an easier way to retrieve
information compared with browsing Web documents through links. Keyword
searching on the Web refers to looking for information using some words to guide
the search. The keywords the user wants to search for are entered into a search
engine, which performs the search on the Web cache it has stored and indexed
locally. Beforehand, the search engines continually traverse all the links
available on the Web, caching and indexing all the Web documents they reach. The
search engines search this reduced copy of the Web, following the links and
trying to match the input words with the words found in their index tables. When
a match occurs, the links pointed to by the index tables are returned to the user.
Keyword searching is more useful than browsing when looking for
information, since the user does not need to know the exact URI of the desired Web
document; however, this approach still has some disadvantages:
• The user must be aware of the available search engines and choose the one
that fits his/her needs.
• The keywords entered by a user are the ones the user considers most relevant
for the information he/she wants, which is a very subjective decision.
• The entered keywords have to match exactly the words present in the Web
documents; even a slight variation is not tolerated.
• Keyword searching normally returns vast amounts of useless document
references/links that the user has to filter by hand.
“Although search engines index much of the Web's content, they have little
ability to select the pages that a user really wants or needs” [66].
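The crawl-index-match cycle described above can be illustrated with a minimal inverted index. The documents and words below are invented, and real engines add ranking, stemming and much more; the sketch simply shows why exact word matching is brittle.

```python
# Minimal sketch of keyword-based search: documents are indexed into
# word -> document tables, and a query returns documents whose indexed
# words exactly match every keyword. Document texts are invented.

from collections import defaultdict

DOCUMENTS = {
    "doc1": "semantic web languages enable machine processing",
    "doc2": "search engines index web documents",
    "doc3": "ontologies describe shared concepts",
}

def build_index(documents):
    """Map each word to the set of documents containing it (the index table)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(index, keywords):
    """Return documents containing every keyword (exact word match only)."""
    result = None
    for word in keywords:
        docs = index.get(word, set())
        result = docs if result is None else result & docs
    return sorted(result or [])

index = build_index(DOCUMENTS)
print(search(index, ["web", "semantic"]))   # exact match succeeds: ['doc1']
print(search(index, ["webs"]))              # slight variation finds nothing: []
```

The second query demonstrates the disadvantage noted above: “webs” never appears verbatim in the index tables, so a document about the Web is not found, since the match is purely syntactic rather than semantic.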
CHAPTER 3

SEMANTIC WEB

The Web has dramatically changed the accessibility of electronically available
information. The Web currently contains about 3 billion static documents,
accessed by over 500 million users from all around the world [5] [12] [67].
With this huge amount of data, and since the information content is presented
primarily in natural language, it has become increasingly difficult to find,
access, present and maintain relevant information. As a result, a wide gap has
opened between the information available to tools and the information maintained
in human-readable form.
In response to this problem, many new research initiatives and commercial
enterprises have been set up to enrich available information with machine-
processable semantics. One example of this recent research is the Semantic Web,
which aims to provide intelligent access to heterogeneous, distributed
information, enabling software products (agents) to mediate between user needs
and the available information resources. This support is essential for “bringing
the Web to its full potential.” Tim Berners-Lee [66], Director of the World Wide
Web Consortium and the inventor of the World Wide Web, foresees a number of ways
in which developers can use self-descriptions and other techniques so that
context-understanding programs can selectively find what users want. Berners-Lee
referred to the future of the current Web as the Semantic Web: an “extended Web
of machine-readable information and automated services that amplify the Web far
beyond current capabilities”.
The explicit representation of the semantics underlying data, programs, Web
documents, and all kinds of information-related Web resources will enable a
knowledge-based Web that provides a qualitatively new level of service and a new
way of processing data. Computing systems and automated services will improve in
their capacity to assist humans in achieving their goals by “understanding” more
of the information presented on the Web, and thus providing more accurate
filtering, categorizing and searching of the information sources available on
the Web. This process will ultimately lead to an extremely knowledgeable system
featuring various specialized reasoning services, thus extending the
representational power of the available information. As Berners-Lee summarized
[5] [67]: “The first step is putting data on the Web in a form that machines can
naturally understand, or converting it to that form. This creates what I call a
Semantic Web - a Web of data that can be processed directly or indirectly by
machines”.
3.1 Overview of the Semantic Web
The purpose of the new phase in Web technology is to make machines capable of
understanding the semantics of the information presented on the Web: to be able
to “read” and “understand” the Web as a human being does. For this purpose, many
different approaches have been formulated by a large number of researchers,
organizations and universities. Most of these methods are explained in detail in
this thesis.
The Semantic Web is not a separate Web [2] [11]; rather, it can be seen as an
extension of the current Web toward meaning. The main difference between the
Semantic Web and the Web is that the Semantic Web is supposed to provide
machine-accessible meaning for its constructs, whereas in the Web this meaning
is provided by external mechanisms. To determine the meaning of a collection of
documents, it is necessary to use only the meaning determined by the formal
language specifications of the Semantic Web, currently the RDF (Resource
Description Framework) model theory and the OWL model theory.
10The Semantic Web aims for meaningful and machine-understandable Web
resources, whose information can then be shared and processed both by automated
tools, such as search engines, and by human beings [5] [9]. The consumers of Web
resources, whether automated tools or human beings are referred to, as agents. This
sharing of information between different agents requires semantic mark-up, for
example, an annotation of the Web page with information on its content that is
understood by the agents searching the Web. This kind of an annotation will be given
in some standardized, expressive language (which, e.g., provides predicate logic and
some form of quantification) and will make use of certain terms or classes (like
“Human”, “Plant”, etc.). To make sure that different agents have a common
understanding of these terms, we need ontologies in which these terms are described,
and which thus establish a joint terminology between the agents. Basically, a Web
ontology is a collection of concept definitions; the shared understanding comes from
the fact that all agents interpret the concepts with respect to the same ontology.
Using the same standards will enable the reuse of the defined information. That is,
the information is not annotated for a specific system; rather, the annotation relies
on shared standards, which makes it possible for the information to be recognized
by different computer systems.
What the Semantic Web is NOT
The Semantic Web is not Artificial Intelligence: The concept of machine-
understandable documents does not imply some magical artificial intelligence which
allows machines to comprehend human words and fully understand them as human
beings do [16]. The Semantic Web only denotes a machine's ability to solve a well-
defined problem by performing well-defined operations on existing well-defined
data. Instead of asking machines to understand people's language, it involves asking
people to make the extra effort so that the machines are able to process the data in
some specific way.
Even though it is simple to define information with languages such as RDF, at
the level of power required for a Semantic Web these languages will be complete
languages, capable of expressing paradoxes and tautologies. It will also be possible
to phrase questions whose answers would normally require a machine to search the
entire Web and take an unpredictable amount of time to find. This should
not deter us from making these languages complete. Each mechanical
application relying on such languages will use a schema to restrict its use to an
intentionally limited language. However, when links are made between the webs of
information relying on such languages, the result will be an expression of a vast
amount of information. Because the Semantic Web must be able to include all
kinds of data to represent the world, the languages must be completely expressive.
A Semantic Web will not require every application to use expressions of arbitrary
complexity: Even though the languages used to define information allow expressions
of arbitrary complexity and computability, applications which generate semantically
defined information will in practice be limited to generating simple expressions such
as access control lists, privacy preferences, and search criteria.
A Semantic Web will not require proof generation to be useful: proof validation
will be enough: Although access control on Web sites involves validation of a
previously prepared proof, there is no requirement to answer an arbitrary
question, find the path to, and construct a valid proof. It is well known that
searching for an answer to an arbitrary question and generating a proof for it is
typically an intractable process, as are many other real-world problems, and a
Semantic Web language does not require this (unsolvable) problem to be solved in
order to be useful.
A Semantic Web is not an exact rerun of a previous failed experiment: Other
concerns have been raised against the Semantic Web concept, such as its relation
to Knowledge Representation Systems. Such systems have, more or less, tried to
achieve results similar to what the Semantic Web concept is attempting. Systems such as
KIF [97] and CYC [98] [99] are some examples of such Knowledge Representation
Systems. However, the success or failure of such systems should not be a threshold or
limit for the Semantic Web concept/project. A more constructive approach would be
to feed the Semantic Web with their design experience, and the Semantic Web may
provide a source of data for reasoning engines developed in similar projects, such as
those that utilize Knowledge Representation systems.
3.2 Information Retrieval with the Semantic Web
Machine to Human: The addition of semantic annotations to Web documents
would improve information retrieval in various ways yet unimagined. As Tim Bray
said, search engines "do the equivalent of going through the library, reading every
book, and allowing us to look things up based on the words found in some text" [66].
If more descriptive metadata were available, one would not, as when using Web
search engines, have to rely on the popularity of a resource as an assurance of its
relevancy. How can we be sure that pieces of information frequently accessed
against the same queries are relevant to each other? We cannot be sure that such
relations always hold.
Librarians, who often act as human mediators between the complex relations of
structured information and the often unformulated queries of the information seeker,
know that information retrieval is often incomplete even when information is
well organized. When it is organized badly or not at all, the consequence is failure to
retrieve information.
Human to Machine: Tim Berners-Lee discussed, as illustrated in [42],
how content-aware “agents” using semantic information could be used to
conduct research efforts into everyday tasks such as investigating health care
provider options, prescription treatments, or available appointment times. Each of
these tasks is now usually conducted by a human researcher assigned to it. If
one is left the task of arranging a trip, he/she must investigate the best price for an
airplane ticket (even though some of this information is already collected), and
match the information about available flights with available times from a personal
calendar. This sort of research is conducted daily and one takes for granted the
mental and representational systems needed to ask a question, investigate an answer,
pull related information together, select the information which is relevant to the
inquiry and initiate another set of actions based on this selection.
Researchers in artificial intelligence [68] [69] have been working on methods to
automate these kinds of tasks and processes for many years. Such researchers have
developed several approaches that in the future may be applicable to the Semantic
Web.
3.3 Semantic Web Tools and Languages
During the last few years, several ontology languages [4] [17] [21] [71] [72] have
been developed. Most of these languages are based on XML [23] syntax, such as XOL
[25] (Ontology Exchange Language), SHOE [26] (Simple HTML Ontology
Extension), which was previously based on HTML, and OML (Ontology Markup
Language), whereas RDF [27] [28] [29] (Resource Description Framework) and RDFS
[30] (RDF Schema) are languages created by W3C (World Wide Web Consortium)
group members. Two additional languages have been built on top of the union of RDF
and RDF Schema with the objective of improving its features; these are OIL
(Ontology Inference Layer) and DAML+OIL [32] (DARPA Agent Markup Language + OIL).
Semantic Web languages such as XML, RDF, RDFS, DAML+OIL, OWL [33,
34, 35] are used to organize, integrate and navigate the Web; at the same time
allowing content documents to be linked and grouped in a logical and relevant
manner. With the information environment that these standards can create, users can
search and browse information resources in an intuitive way with the help of content-
aware machines/computing systems.
All of these languages oriented toward creating the Semantic Web are structured
languages, and with this feature they can carry meaning besides giving structure to
the text. They also have different characteristics compared to each other. Some of
them are relatively new, and the newly available languages aim to make
progress over the previous ones, evolving and improving their characteristics to
support the Semantic Web concept.
The semantic power reached is at different levels: some languages provide
meaning to the text/information; others go further and also make assertions and the
inference of knowledge and facts possible.
Some important languages in chronological order [17] [18] [20]:
• Standard Generalized Markup Language (SGML)
• eXtensible Markup Language (XML)
• Resource Description Framework (RDF)
• DARPA Agent Markup Language - Ontology Inference Layer (DAML+OIL)
• Web Ontology Language (OWL)
In the context of the Semantic Web, a major effort is devoted to the realization of
machine processable semantic meaning, expressed in meta-models such as RDF,
OIL, OWL, DAML+OIL, and based on shared ontologies. Still, these approaches rely
on common ontologies that can be merged, to which existing information
sources can be related by proper annotation. This is an extremely important
development, but its success will rely heavily on wide standardization,
acceptance of the different languages, and adoption of common ontologies or schemas.
In the Semantic Web, all the necessary information resources (data, documents
and programs) will be made available along with various kinds of descriptive
information and annotations, i.e., metadata. Clearly defined knowledge about the
meaning, usage, accessibility or quality of Web resources will considerably facilitate
automated processing of all the available Web content/services. The Semantic Web
will allow both human beings and machines to query the Internet as if it were a huge
database.
To allow the realization of such a concept, besides the Web languages, different
tools also have to be developed in order to infer information from the Web. Inference
does not only depend on the languages but also on the different tools that are
currently being developed around the languages.
3.3.1 SGML (Standard Generalized Markup Language)
SGML is a system for organizing and tagging elements of a document. It was
developed and standardized by the International Organization for Standards (ISO) in
1986 [70].
3.3.2 XML (eXtensible Markup Language)
XML [14] [19] is a meta-language for defining application-specific markup
tags, and it is the universal format for structuring Web documents and data on the
Web, also proposed by the W3C. The main contribution of XML is
providing a common and communicable syntax for Web documents. XML itself is
not an ontology language, but XML Schemas [24], which define the structure,
constraints and the semantics of XML documents, can be used to specify ontologies.
But since the aim of XML Schema is the verification of XML
documents, and its modeling primitives and tasks are more application-oriented
than concept-oriented, XML Schema will not be considered an ontology
language.
The only reasonable interpretation is that XML code contains named entities with
sub-entities and values; that is, every XML document forms an ordered, labeled tree,
which is both XML’s strength and its weakness. It is possible to
encode all kinds of data structures in an unambiguous syntax, but XML does not
specify the data’s use and semantics. The groups that use XML for their data
exchange must agree beforehand on the vocabulary, its use and meaning.
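The point that an XML document is just an ordered, labeled tree whose semantics the parser does not know can be illustrated with a small sketch using Python's standard library; the recipe markup here is invented for illustration:

```python
# Parse a tiny (hypothetical) recipe document: XML gives us the tree
# structure, but nothing tells the machine what <ingredient> *means*.
import xml.etree.ElementTree as ET

doc = """
<recipe name="Omelette">
  <ingredient>egg</ingredient>
  <ingredient>butter</ingredient>
</recipe>
"""

root = ET.fromstring(doc)
print(root.tag, root.attrib["name"])   # recipe Omelette
for child in root:
    # The parser recovers labels and order, not meaning.
    print(child.tag, child.text)
```

Two parties exchanging such documents must still agree beforehand, outside of XML itself, that `ingredient` denotes a food item.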
Why Meta Data Is Not Enough: XML metadata is a form of description of
available data within some document or information. It describes the purpose or
meaning of raw data via a text format to more easily enable information exchange,
interoperability, and application/platform independence [5]. As a description, the
general rule is accepted as “more is better.” Meta data increases the usability and
granularity of the defined data. The way to think about the current state of metadata
is that words (or labels) are attached to data values in order to describe them. While
the evolution of metadata will not follow natural language descriptions, the analogy
is a good one: words alone are not enough. The motivation for
providing richer data description is to move data processing from being static and
mechanistic to dynamic and adaptive.
For example, we may be enabling our systems to respond in real time to a
location-aware cell phone customer who is walking in a store outlet. If a system
could match consumers’ needs or past buying habits to current sale merchandise, the
revenue would increase. Additionally, the computers should be able to support that
sale with just-in-time inventory by automating the supply chain with its partners. The
general rule is: The more computers understand, the more effectively they can handle
complex tasks.
All the possible ways a semantically aware computing system can drive new
business and decrease operating costs have not yet been invented. However, to
get there, we must push beyond simple metadata modeling to knowledge modeling and
standard knowledge processing. There are three emerging steps beyond simple
metadata: semantic levels, rule languages, and inference engines. These are the
backbones of the Semantic Web.
3.3.3 RDF (Resource Description Framework)
RDF is a document structure for the encoding, exchange and reuse of structured
metadata, also proposed by the W3C [14] [19]. RDF provides a standard form for
representing metadata in XML. The RDF data model consists of three object
types:
Resources: All things being described by RDF expressions are called resources.
A resource could be an entire Web document, such as the well-known HTML
document "http://www.w3.org/Overview.html". A resource may be a
part of a Web page; e.g. a specific element within the document source of an HTML
or XML Web document. A resource may also be a large collection of Web
documents; e.g. an entire Web site. A resource could also be a Web object not
directly presented on the Web; e.g. a printed book.
Properties: A property is a specific aspect, characteristic, attribute, or relation
used to describe a resource. Each property has a specific meaning, defines its
permitted values, the types of resources it can describe, and its relationship with
other properties. The RDF specification does not address how the characteristics of
properties are expressed; for such information, one should refer to the RDF Schema
specification.
Statements: A specific resource together with a named property and the value of
that property for that resource is defined as an RDF statement. These three individual
parts of a statement are called the subject, the predicate, and the object of that
statement respectively. The object of a statement (i.e., the property value) can be
another resource or it can be a literal value; i.e., a resource (specified by some URI)
or a simple string or any other primitive data type defined by XML. In RDF,
a literal may have content that is XML markup, but it is not further
evaluated by the RDF processor.
RDF does not have any specific mechanisms to define relationships between
these object types, but the RDF Schema (RDFS) Specification Language does.
Although the main intention of RDFS is not for ontology specification, RDFS can be
used directly to describe ontologies. RDFS provides a standard set of modeling
primitives for defining ontology (class, resource, property, “is a” and “element-of”
relationships etc.) and a standard way to encode them into XML. But, since axioms
cannot be defined directly, RDFS has rather limited expressive power. Also,
the relation between ontologies and RDF(S) is much closer than that between
ontologies and XML.
Basically, the RDF data model consists of statements about resources, encoded as
object-attribute-value triples. The objects are resources, the attributes are properties
and the values are resources or strings. For example, to state that “Zeynep” is the
author of the article at a specific URL (Uniform Resource Locator), one would use
the triple (http://www.somewhere.com/#article, has author, “Zeynep”). Attributes,
such as “has author” introduced in this example, are called properties.
3.3.4 RDFS (RDF Schema)
The important feature of RDFS when concerned with ontologies is that RDFS
expresses class-level relations describing acceptable instance-level relations.
RDF Schema is a language layered on top of the RDF language. This layered
approach has been presented by the W3C organization and Tim Berners-Lee as the
“Semantic Web Stack” of layers of different languages or concepts all related to each
other [30] [71] [72]. The base layer of the stack is the concepts of universal
identification (URI) and a universal character set (Unicode). Above those concepts,
the XML Syntax is layered (elements, attributes, and angle brackets) and namespaces
to avoid vocabulary conflicts so that every domain can identify names only required
to be unique within the local domain. The layers above XML are the triple-based
assertions of the RDF model and syntax discussed in the previous section. If a triple
is used to denote a class, class property, and value, it will be possible to create class
hierarchies for the classification and description of different objects. This is the goal
of RDF Schema.
The data model expressed by RDF Schema is the same data model used by
different object-oriented paradigms e.g. programming languages like Java. The data
model for RDF Schema allows creating classes of some information within a
domain. A class is defined as a group of things with distinct features and with some
common characteristics. In object-oriented programming (OOP), a class is defined as
a template or a type definition for an object (instance) composed of characteristics
(also called data members or fields) and behaviors (also called methods or functions).
An object is a single instance of a specific class. Object-oriented languages also
allow classes to inherit characteristics and behaviors from a parent class (also called
a super class). All these concepts are more or less very similar to the model used by
RDF Schema.
Above RDF Schema resides the ontologies layer. Above ontologies, logic
rules can be added about things defined in the ontology. A rule language will make it
possible to infer new knowledge and make decisions. Additionally, the rules layer
provides a standard way to query and filter data from RDF. The rules layer is a sort
of “introductory logic” capability, while the actual logic framework will be
“advanced logic.” The logic framework allows formal logic proofs to be shared.
Lastly, with such proofs, it will be possible to establish a trust layer for levels of
application-to-application trust. This “Web of trust” forms the third and final Web in
Tim Berners-Lee’s three-part vision expressed as collaborative Web, Semantic Web
and Web of trust.
3.3.5 OIL (Ontology Inference Layer)
OIL was developed in the OnToKnowledge Project [17] [19] [41], and is both a
representational and exchangeable language for creating Web ontologies. The
language combines primitive elements from frame-based languages with formal
semantics and reasoning services from description logics. To enable the use of OIL
on the Web, it is based on the W3C standards XML and RDF(S). The ontology
description is divided into three different layers: object level (concrete instances),
first meta-level (ontological definitions) and second meta-level (describing features
of the ontology). The OIL ontology language provides definitions for classes and
class relations, and a limited set of axioms enabling the representation of different
classes and their properties. Relations (also called slots) are treated like first-class
citizens, and can be represented in different hierarchies. Although it has some
limitations, OIL can provide precise semantic meaning which will enable reasoning
systems to process the defined information effectively.
As mentioned in the above paragraph, OIL is built on top of RDF(S), and has the
following layers: Core OIL groups the OIL elements/primitives that have a direct
mapping to RDF(S) elements/primitives; Standard OIL is the complete OIL model
with all its features, using more primitives than the ones defined in RDF(S); Instance
OIL adds instances of different concepts, classes and roles to the previous model;
and Heavy OIL has been designed as the layer for future extensions of the OIL
language. OILEd, Protégé-2000, and WebODE are some powerful ontology editors
that can be used to author OIL ontologies (as well as other Web ontologies). Another
feature of OIL is that its syntax can also be expressed in an ASCII format that is not
XML-compliant.
3.3.6 DAML+OIL (DARPA Agent Markup Language - OIL)
DAML+OIL combines two XML- and Web-based languages that support the
development of the Semantic Web.
DAML+OIL [11] is a descriptive semantic markup language for Web resources
which is built on top of earlier defined languages such as RDF and RDF Schema, and
extends these languages with richer modeling primitives enabling reasoning systems
to process it more effectively. DAML+OIL was developed by the Defense Advanced
Research Projects Agency (DARPA) [20] under the DARPA Agent Markup
Language (DAML) Program.
With DAML+OIL, in order to make the information/data more expressive
and powerful, it is possible to use description logic to describe the data, enabling it
to be processed by reasoning systems. In this way, not only will the explicitly given
data be available, but also new facts and conclusions about the data provided.
DAML+OIL is a suitable language for achieving this extra power because of its
expressiveness as a description logic: in essence, DAML+OIL is a description logic
language disguised in an XML format.
DAML extends RDFS in the following ways:
• Support of XML Schema data types rather than just string literals, including
primitive data types such as dates, integers, decimals, etc.
• Restrictions on properties like cardinality constraints.
• Definition of classes by enumerations of their instances.
• Definition of classes in terms of other classes and properties. To enable
definition from other classes, different expressions have been defined, such
as unionOf, intersectionOf, complementOf, hasClass and hasValue, some of
which have their roots in classic set theory.
• Ontology and instance mappings (sameClassAs, samePropertyAs,
sameIndividualAs, differentIndividualFrom), permitting translation between
ontologies.
• Additional hints to reasoning systems, such as disjointWith, inverseOf,
TransitiveProperty and UnambiguousProperty.
DAML is not completely developed yet. Even though it was the ontology
language recommended by the World Wide Web Consortium, a new
project called the Web Ontology Language (OWL) has been developed to replace
DAML. The OWL project has removed some of the requirements specified for the
DAML language, as rules, queries and services are still under development.
Description Logics
Description logics (DLs) [9] [39] are a family of knowledge representation
languages that can be used to represent the knowledge of an application domain and
are very well suited to providing structure to information. Description Logics are a
subset of First Order Logic that is non-functional and does not allow explicit
variables. They are less expressive in favor of having greater decidability when
processed by inference procedures. Description Logics differ from predecessors,
such as semantic networks and frames, in that they are equipped with a formal,
logic-based semantics.
High quality Web ontologies are necessary for the Semantic Web to be
successful, and their construction, integration, and evolution is greatly dependent on
the availability of a well-defined semantics and powerful reasoning systems. Since
DLs provide these aspects, they should be ideal candidates for creating and
developing ontology languages. That much was already clear ten years ago, but at
that time, there was a fundamental mismatch between the expressive power and the
efficiency of reasoning that DL systems provided, and the expressivity and the large
knowledge bases that ontologists needed. Through the basic research in DLs in the
last 10 to 15 years, the gap between the needs of ontologists and the systems that DL
researchers provide has finally become narrow enough to build stable bridges.
3.3.7 OWL (Web Ontology Language)
OWL Ontology: Ontology is a term borrowed from philosophy which refers to
the science of describing the kinds of entities in the world and how they are related to
each other. An ontology created with OWL may include descriptions of classes, their
instances and properties. Given such an ontology, the formal semantics of OWL
specifies how to derive its logical meaning not given explicitly, i.e. facts that are not
present in the ontology, but derived by the semantics. These derivations may be
based on a single OWL document or multiple distributed documents that have been
combined with OWL mechanisms allowing such extendable ontologies. The Web
Ontology Language is developed and produced by the W3C Web Ontology Working
Group (WebOnt).
The Web Ontology Language OWL [11] [38] is a semantic markup language for
publishing, extending and sharing ontologies through the Web. OWL is developed as
a vocabulary extension of RDF and is derived from the DAML+OIL Web ontology
language, adding some extra features and discarding some of the specifications
intended for DAML+OIL; it is a revision of DAML+OIL that incorporates lessons
learned from the design and application of that language.
OWL can be used to explicitly represent the exact semantics of classes within
some domain and the relationships between those classes (and instances). OWL has
more expressive semantic power than XML, RDF, and RDFS, and thus goes beyond
these languages’ ability to represent machine-readable content on the Web.
In comparing OWL to XML and XML Schema, two points must be
mentioned:
• An ontology differs from an XML Schema in that an ontology is a
knowledge representation, not a message format. Most industry-based Web
standards consist of a combination of different message formats and
protocol specifications. These formats have been given an
operational semantics, such as, "Upon receipt of this PurchaseOrder message,
transfer Amount dollars from AccountFrom to AccountTo and ship the
product purchased." That is, each of the steps in the semantics is precisely
defined. However, this kind of specification is not designed to support
reasoning outside the transaction context; it is fixed on the well-defined steps.
For example, in general it is not possible to have a mechanism to conclude
that because the Product is a type of Chardonnay it must also be a white
wine. Such kind of reasoning and conclusions are essential in Semantic Web.
• One advantage of OWL ontologies is the availability of different tools that
can reason about them (for example, Racer, which reasons over OWL
ontologies and derives new facts from given statements). Such tools will
provide generic support that is not specific to a particular domain, which
would definitely be the case if one were to build a system to reason about a
specific industry-standard XML Schema. Developing a useful reasoning
system is not a simple task to accomplish. Developing an ontology is much
more tractable and feasible.
The OWL language provides three increasingly expressive sublanguages
designed for different users in specific communities.
OWL Lite: OWL Lite is targeted at users needing only simple constraint
features and classification hierarchies. For example, even though OWL Lite supports
cardinality constraints, the cardinality values are restricted: only the values 0 and 1
are allowed. It is much simpler to provide tool support for OWL
Lite than it is for its more expressive relatives. This will allow easy migration to
OWL Lite from different ontology languages being used on the market.
OWL DL: OWL DL supports users who want maximum expressiveness
without sacrificing the computational completeness (all entailments are guaranteed
to be computed) and decidability (all computations will finish in finite time) of
reasoning systems. OWL DL includes all OWL language constructs, with restrictions
such as type separation (a class cannot also be an individual or property, a property
cannot also be an individual or class) that enforce distinct definitions. It is named
OWL DL because of its correspondence to Description Logic [39], a field of research
that has studied a decidable fragment of first order logic. OWL DL was designed so
that it has desirable computational properties for reasoning systems.
OWL Full: OWL Full is targeted for users who want maximum expressiveness
and the syntactic freedom of RDF with no computational guarantees. Decidability
and completeness properties have not been restricted as it is in OWL DL. Type
separation is not as strict as it is in OWL DL. For example, in OWL Full a defined
class can be treated as a collection of different individuals and, simultaneously, as an
individual in its own right. Another important difference from OWL DL is that in
OWL Full an owl:DatatypeProperty can be marked as an
owl:InverseFunctionalProperty. OWL Full allows an ontology to incorporate the meaning of a pre-defined
(RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to
support every feature supported by OWL Full.
Each of the sublanguages mentioned above is an extension of its simpler
predecessor, both in what can be legally expressed in the ontology and in what can
be validly concluded from it. The following relations hold, but their inverses do
not:
• Every legal OWL Lite ontology is a legal OWL DL ontology.
• Every legal OWL DL ontology is a legal OWL Full ontology.
• Every valid OWL Lite conclusion is a valid OWL DL conclusion.
• Every valid OWL DL conclusion is a valid OWL Full conclusion.
Ontology developers should consider which of the species best suits their needs
when choosing an OWL language. When choosing between OWL Lite and
OWL DL, the choice depends on whether users need the more expressive
restriction constructs provided by OWL DL. Reasoning systems for OWL Lite will
have desirable computational properties. Reasoners for OWL DL will be subject to
higher worst-case complexity because of its greater expressiveness compared to
OWL Lite. When considering OWL DL and OWL Full, the choice between them mainly
depends on the extent to which users require the meta-modeling facilities provided
by RDF Schema (i.e. defining classes of classes). Reasoning support is less
predictable when comparing OWL Full to OWL DL.
Moreover, OWL makes an open world assumption: descriptions of
resources are not bound to a single file or scope. While class C1 may be defined
originally in the ontology O1, it can also be extended in other ontologies. The
consequences of these additional propositions about C1 are monotonic: new
information cannot retract previous information. New information obtained from
reasoning can be contradictory, but facts and entailments can only be added,
never deleted.
It is the responsibility of the designer of the ontology to take into consideration
the possibility of such contradictions. It is expected that tool support will help in
detecting such cases.
In order to write an ontology that can be interpreted unambiguously and used by
software agents, a syntax and formal semantics for OWL is required. In addition,
OWL is a vocabulary extension of RDF [37].
CHAPTER 4
ONTOLOGY, ONTOLOGY EDITORS AND
QUERY LANGUAGES
In short, an ontology [44] [45] [47] is a conceptualization-based specification of
classes, or in other words “things”. An ontology is a detailed description of certain
concepts and the relations among them, where the concepts are defined within a
specific domain. The usage of an ontology is consistent with this definition, since it
is broken into simpler sets of such concept definitions and relations when being
processed. Even though the word “ontology” has its origin in philosophy, it is
understood here in a completely different sense.
Ontologies are designed for the purpose of defining knowledge and reusing and
sharing it effectively. An ontology is a set of definitions written using a formal
vocabulary; it is a formal expression of an ontological commitment, so that
different parties can participate while relying on the same definitions and
vocabularies. This is the main approach used to specify a conceptualization,
because it has properties enabling AI processing systems to share knowledge
among themselves. In other words, an ontological commitment is a kind of
agreement among different domain specifications to use a specific vocabulary when
defining concepts. Different processing systems are built so that they can
participate in such commitments; that is, they can be “connected” to some ontology
without any conflicts with respect to the definitions and vocabularies used. An
ontology is built so that such systems can participate in it and share knowledge
with other systems.
Given a specific domain, an ontology defined for that domain is the base for
representing the knowledge of that domain. An ontology enables the definition of a
vocabulary in order to express the knowledge for some domain. Without the
definition of such a vocabulary it is not possible to share knowledge among
different systems/agents; simply said, there would be no common ground for such
systems to exist and share knowledge. A domain is a specific area of a subject or an
area of knowledge, like medicine, economy, a specific research field, etc.
Ontologies are used by systems/agents such as databases, application programs or
anything else that needs to share knowledge. Ontologies are built up of basic
concepts and the different relations between them. The definitions of such concepts
and relations are computer-usable, so that computing systems can process these
concepts and relations.
Simply said, defining an ontology is similar to defining a set of data with all its
properties so that other programs can use this data. Different computing systems,
such as domain-independent applications and software agents, use ontologies and
knowledge bases built on top of a set of ontologies.
Class definitions are the most common approach when defining some domain in
an ontology. Class definitions are suitable to define and describe the different
concepts within a domain. For example, a class defining a pizza represents all the
different pizza instances that exist. Any pizza is an instance of the class defining and
describing a pizza. Classes can have inheritance relations between them enabling the
definition of more specific classes from a given class. Definition of more general
classes is also possible. For example, we can have subclasses of the class pizza such
as “spicy pizza” and “non-spicy pizza”, where the class pizza is a super-class of
these two classes.
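The pizza example above can be sketched in a few lines of plain Python (this is an illustration of the subclass and instance relations, not an OWL encoding; all names are invented):

```python
# Subclass links, read as "key is a subclass of value".
subclass_of = {
    "SpicyPizza": "Pizza",
    "NonSpicyPizza": "Pizza",
    "Pizza": "Food",
}

def is_subclass(cls, ancestor):
    """Walk the subclass chain upward until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

# Each individual is a direct instance of some class.
instances = {"diavola": "SpicyPizza", "margherita": "NonSpicyPizza"}

def is_instance_of(individual, cls):
    # An individual is an instance of every superclass of its direct class.
    return is_subclass(instances[individual], cls)

print(is_instance_of("diavola", "Pizza"))  # True
```

Because "SpicyPizza" is declared a subclass of "Pizza", any instance of the more specific class is automatically an instance of the more general one.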
An ontology supports software agents, or in general all computer systems
requiring to share and reuse domain knowledge. The important features of an
ontology are listed below [47]:
• Ability to reuse domain knowledge.
• Making domain assumptions explicit.
• Separation of operational knowledge and domain knowledge.
• Sharing the formal definitions and vocabularies when describing some
concept.
• Analysis of domain knowledge.
There are many contradicting definitions of ontologies, especially in the AI world.
An ontology is not directly a knowledge base; there is a thin line between the
definitions of these two concepts. Definitions of some knowledge for a domain, the
classes and the instances of these classes constitute a knowledge base. On the other
hand, an ontology is not much concerned with individual instances. For example,
for an ontology the number of spicy pizzas is not important; rather, the definition of
a pizza is essential. The definition of knowledge is what ontologies are more
concerned with.
What can ontologies be used for?
Below is a list of major use cases of different ontologies identified by the Web
Ontology Working Group at W3C [16] [33] [47].
• Controlled vocabulary
• Web site or document organization and navigation support
• Browsing support
• Search support (semantic search)
• Generalization or specialization of search
• Sense "disambiguation" support
• Consistency checking (use of restrictions)
• Auto-completion
• Interoperability support (information/process integration)
• Support validation and verification testing
• Configuration support
• Support for structured, comparative, and customized search.
How are ontologies different from relational databases?
Although databases and ontologies have some similarities, they differ in many
important features. First of all, an ontology is not storage for data but a model
defining the data, whereas a relational database is a data repository. An ontology can
be used as a filter or a framework to access and manipulate data, while a database
can be used to store the different data instances defined by the ontology. Another
important difference is querying. When making queries against a relational database
the returned data will be the same data stored previously, just matching some
conditions. However when making a query against an ontology, together with some
reasoning process, the returned data can be some inferred data which was not stored
previously but generated from some facts represented by the ontology. In ontologies,
queries can also be made for some specific relations while this is not possible with
ordinary relational databases.
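The querying difference can be sketched with a toy triple set (invented data, plain Python): a database-style lookup returns only what was stored, while an ontology-style query closes the answer under the subclass axioms and so returns inferred facts that were never stored.

```python
# Hypothetical triples; no triple says that m1 is of type "Food".
triples = {
    ("Margherita", "subClassOf", "Pizza"),
    ("Pizza", "subClassOf", "Food"),
    ("m1", "type", "Margherita"),
}

def stored_types(individual):
    # Database-style query: returns exactly the stored data.
    return {o for (s, p, o) in triples if s == individual and p == "type"}

def inferred_types(individual):
    # Ontology-style query: repeatedly apply the rule
    # type(x, C) and subClassOf(C, D)  =>  type(x, D)
    result = set(stored_types(individual))
    changed = True
    while changed:
        changed = False
        for (s, p, o) in triples:
            if p == "subClassOf" and s in result and o not in result:
                result.add(o)
                changed = True
    return result

print(stored_types("m1"))    # only the stored class
print(inferred_types("m1"))  # also the inferred superclasses
```

The inferred answer contains "Pizza" and "Food" even though neither fact was ever asserted, which is exactly the behavior a relational query cannot provide.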
How are ontologies different from object-oriented modeling?
An ontology is also different from the object-oriented paradigm, even though there
are a lot of things in common, especially when it comes to modeling real life with
class definitions. First of all, the whole concept of ontologies has its theoretical
roots in logic. Because of that, ontologies allow reasoning systems to perform
automated reasoning on the knowledge represented by the ontology. Another
important difference is the definition of properties. In an ontology, properties are
treated as first-class citizens, while in the object-oriented paradigm this is not true:
in the object-oriented world, properties are internal to class definitions. In an
ontology it is possible to define multiple inheritance, while this is generally not the
case in the object-oriented paradigm. In object-oriented modeling it is typically only
possible to use single inheritance between classes, because of overlapping method
signatures defined in different super-classes when participating in a multiple
inheritance relationship.
Ontologies allow property inheritance, while this is not directly possible with
object-oriented modeling. While ontologies allow user-defined relations between
different classes, object-oriented modeling restricts the relations to the class/sub-
class concept. However, because of the wide acceptance and use of object-oriented
modeling and UML, they are accepted as practical specifications when modeling
ontologies. But because of the lack of logic capabilities in the object-oriented
modeling approach, these two different concepts cannot be fully combined and be
productive as they are defined today. Currently there is an on-going effort to add
logic capability to object-oriented modeling, represented by OCL (Object Constraint
Language).
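Two of the differences above, properties as first-class citizens and multiple inheritance, can be sketched as follows (plain Python used only as notation; all class and property names are invented):

```python
# Properties exist independently of any class and carry their own
# domain and range, unlike attributes buried inside an OO class body.
properties = {
    "hasTopping": {"domain": "Pizza", "range": "Topping"},
    "hasIngredient": {"domain": "Food", "range": "Ingredient"},
}

# A class may have several superclasses (multiple inheritance).
links = {
    "VegetarianSpicyPizza": ["VegetarianPizza", "SpicyPizza"],
    "VegetarianPizza": ["Pizza"],
    "SpicyPizza": ["Pizza"],
    "Pizza": ["Food"],
}

def applicable_properties(cls, subclass_links):
    """Properties whose domain is cls or any of its (possibly many) ancestors."""
    ancestors, stack = {cls}, [cls]
    while stack:
        for parent in subclass_links.get(stack.pop(), []):
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return [p for p, d in properties.items() if d["domain"] in ancestors]

print(applicable_properties("VegetarianSpicyPizza", links))
```

Because "VegetarianSpicyPizza" inherits from both "VegetarianPizza" and "SpicyPizza", the traversal reaches "Pizza" and "Food" and both independently defined properties apply to it, illustrating property inheritance across a multiple-inheritance hierarchy.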
Some important aspects of ontologies are explained below [46]:
Kinds of Ontologies: Ontologies may differ with respect to different aspects
such as their implementation, content, level of description and the structure of
knowledge modeling.
Level of description: Ontologies can be built in several ways; the same
knowledge domain can be described in different ways. There is no unique perception
of a knowledge domain that results in a single specific description; it depends
entirely on the different practitioners. The vocabularies, terms and taxonomies used
can be given distinguishing properties, and these properties make it possible to
define new concepts that have named relationships with other concepts.
Conceptual scope: The scope and purpose of the concepts can also differ between
ontologies. The clearest difference is between ontologies that model specific fields
of knowledge, such as medicine, and more high-level ontologies that describe basic
concepts and relationships when domain knowledge is expressed in natural
language.
Instantiation: All ontologies have a terminological component, which is
analogous to the relationship between an XML document and its schema. This
terminological component defines the vocabulary and structure of the domain the
ontology is intended to model. The second part, called the asserted part, populates
the ontology with individual instances that are created on the ground established by
the vocabulary and structure of the ontology. This second part can be separated from
the ontology implementation and maintained in a knowledge base, where access to
this knowledge base is controlled by the ontology itself. However, whether an
instance is treated as an individual or as a concept is entirely determined by the
specific way the ontology is defined.
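The separation between the terminological and the asserted components can be sketched as two plain data structures (hypothetical names; plain Python, not any description-logic notation):

```python
# Terminological component: the vocabulary and structure of the domain.
tbox = {
    "classes": ["Pizza", "Topping"],
    "properties": {"hasTopping": ("Pizza", "Topping")},
}

# Asserted component: individual instances built on that vocabulary.
abox = [
    ("p1", "type", "Pizza"),
    ("p1", "hasTopping", "t1"),
    ("t1", "type", "Topping"),
]

def conforms(assertion, tbox):
    """Check that an assertion only uses vocabulary defined in the terminological part."""
    s, p, o = assertion
    if p == "type":
        return o in tbox["classes"]
    return p in tbox["properties"]

print(all(conforms(a, tbox) for a in abox))  # True
```

Keeping the two parts separate, as the text describes, means the instance data can live in an external knowledge base while the terminological part alone controls what assertions are admissible.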
Building Ontologies: An ontology can be built in several ways depending on the
practitioners and the domain to be modeled. Below is a list of the different ontology
building approaches.
1. Acquiring domain knowledge: Assembling all the information resources
that will define the consistency and terms used to formally describe the things
in a given domain. This information, with its concepts and relations, must be
collected so that it can be described by a chosen language.
2. Organization of the ontology: Designing the overall conceptual structure of
the domain, involving the identification of the domain’s specific concepts and
properties, the different relationships between the concepts, and all the
concepts that have individual instances.
3. Building detailed descriptions for the ontology: Adding concepts,
properties, relations and individuals according to the needs of the domain
being modeled.
4. Ontology verification: Checking for inconsistencies among the ontology
elements, such as their syntax, logic and semantic properties. This can also be
based on automatic classification that defines new concepts from existing
concepts, class relations and properties.
5. Ontology Commitment: Final verification of the ontology and later
commitment of the ontology by deploying it into a target environment.
Why are ontologies important in computing?
Building systems that rely on ontologies shows great potential to make software
more efficient, adaptive and intelligent. It is one of the most promising areas in Web
technology, and one that will enable the next breakthrough in the Web. It is still not
widely accepted and deployed, but it has already been adopted by some industries.
For example, parts of the medical industry are heavily using ontologies and
contributing to their development; the medical community has produced the
powerful ontology editor Protégé [50], which allows the management and
development of ontologies. However, ontologies are still not being used by the
majority of mainstream users, because it is not a straightforward process to apply
them to different software systems dealing with knowledge; there is no standard way
of doing things. It is only a matter of time before further techniques gain attention,
following the experience gained from the different subject fields using ontologies as
an information representation technology.
However, the Semantic Web community has a vision of the ontology landscape
that makes it a more widely applied technology. It puts great effort into developing
standard semantic markup languages based on XML, ontology management systems
and different ontology management tools, to make it easier to adopt ontologies and
integrate them into computer systems. The use of ontologies is newly being
discovered in important applications that deal heavily with information and the
integration of different processes with information. Ontology is slowly making its
way into the software world, as its usefulness becomes clearer as time passes.
Ontology Tools
Effective and efficient work with the Semantic Web must be supported by
advanced tools that enable the full power of this technology. The following list
contains the important elements needed to make use of the Semantic Web
efficiently and effectively:
• Ontology editors to easily create and manipulate ontologies.
• Annotation tools to link information sources together with different
structures.
• Reasoning services to enable advanced query services and to map between
ontologies with different terminologies.
• Ontology library systems and Ontology Environments to create and reuse
ontologies. Such systems should in general allow merging different
ontologies sharing the same terminology.
Inference engines can be used to reason about ontologies and the instances
defined by those ontologies, and to create new knowledge from existing knowledge.
Inference engines are similar to SQL (Structured Query Language) query engines
running against databases, but provide stronger support for rules which cannot be
represented in the relational databases known today.
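A minimal sketch of the kind of rule application such engines perform, here a single forward-chaining transitivity rule applied until no new facts appear (the rule and all facts are invented for illustration; real engines support far richer rule languages):

```python
# Starting facts; the conclusion "ankara is located in europe" is NOT stored.
facts = {("ankara", "locatedIn", "turkey"),
         ("turkey", "locatedIn", "europe")}

def transitivity_step(facts):
    """Rule: locatedIn(x, y) and locatedIn(y, z)  =>  locatedIn(x, z)."""
    new = set()
    for (x, p1, y) in facts:
        for (y2, p2, z) in facts:
            if p1 == p2 == "locatedIn" and y == y2:
                new.add((x, "locatedIn", z))
    return new - facts  # only genuinely new facts

# Forward chaining: apply the rule until a fixpoint is reached.
while True:
    derived = transitivity_step(facts)
    if not derived:
        break
    facts |= derived

print(("ankara", "locatedIn", "europe") in facts)  # True
```

This is exactly the sense in which an inference engine returns knowledge that was never stored: the final fact set contains a statement derived purely from the rule, something a plain SQL query over the two stored rows could not produce without explicit recursive machinery.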
An example inference engine is Ontobroker [57], which is now a commercial
product. Ontobroker can automatically derive new concepts in a given concept
hierarchy when reasoning over the concepts of an ontology. Another well-known
inference engine is Racer [49], which can be used to implement industrial-strength
projects that make use of ontologies created with OWL/RDF.
Ontology Libraries and Environments
If we assume that we have access to various well defined ontologies, creating a
new ontology is only a matter of merging the existing ontologies and adding new
concepts. Instead of building the ontologies from scratch it will be possible to reuse
the existing ontologies. In order to do this, mainly two types of tools are needed:
1. Tools to store and access existing ontologies.
2. Tools to manipulate and manage existing ontologies.
How to create and manage ontologies in order to make them reusable is far from
being easy. This is why ontology libraries are important. An ontology library makes
it easy to re-organize, group ontologies and merge them together so that they can be
reused, managed and integrated with existing systems.
In order to support ontology reuse, a system must support the following
properties:
• Ontology reuse by identification, versioning and open storage to enable
access to ontologies.
• Ontology reuse by providing support for specific task oriented fields to easily
adapt the stored ontologies.
• Ontology reuse by constructing ontologies which fully support the available
standards: providing access to high-level ontologies and standard
representation languages is an important issue if reuse is to be provided to
its full potential.
Some examples of existing ontology library systems are: WebOnto [74] [84],
Ontolingua [75] [85], DAML Ontology library system [76], SHOE [26] [86],
Ontology Server [77], IEEE Standard Upper Ontology [78], Sesame [79],
OntoServer [80], and ONIONS [81]. ONIONS, a methodology that enables the
integration of existing ontologies, has been implemented in several medical
ontology library systems [83]. Comparisons, together with detailed descriptions of
these library systems, can be found in the article by Ding & Fensel [82].
4.1 Ontology Editors
Most of the existing ontology editors [46] are sufficiently general-purpose to
allow the construction of ontologies targeting a specific domain. Some of these tools
lack useful ontology export capabilities because they make use of an object-oriented
specification language to model information in a domain. Currently, independent
tools to convert between different specifications such as UML and DAML+OIL are
being developed.
Tools for ontology design and management:
Today, there are more than 90 tools available for ontology development from
both non-commercial organizations and commercial software vendors [47] [87].
Most of them are tools for designing and editing ontology files. Some of them may
provide certain capabilities for analyzing, modifying, and maintaining ontologies
over time, in addition to the editing capabilities. One of the more popular editing
tools is Protégé, developed by the Stanford University School of Medicine [88].
Other tools are SemTalk [89], OilEd [90], Unicorn [91], Jena [92], and Snobase [93],
to name a few.
Some of the available tools can be integrated with each other, enabling a more
complete development environment. For example, the ontology editor Protégé
can communicate with an inference engine to perform reasoning and consistency
checking on the ontology being built. A detailed survey of different ontology
editors is provided in Appendix A.
Protégé: Protégé is a free, open-source, integrated and platform-independent
system for the development and maintenance of ontologies [19] [50]. The tool,
developed by Stanford Medical Informatics, is currently at version 3.0. Protégé
has a frame-based knowledge model which is completely compatible with OKBC
(the Open Knowledge-Base Connectivity protocol), enabling interoperability with
other knowledge-representation systems. Protégé provides a development
environment supported by a number of third-party plug-ins targeted to the
needs of specific knowledge domains. It is also an ontology development platform
which can easily be extended to include various graphical components such as
graphs and tables, media such as sound, images, and video, and various storage
formats such as OWL, RDF, XML, and HTML.
Ontolingua: The Ontolingua system [46] [75] provides users with the ability to
manage, share and reuse different ontologies stored on a remote ontology server.
The system was developed at the Knowledge Systems Laboratory at Stanford
University in the early 90s [19]. Ontolingua supports a wide range of translations,
while most ontology editors only support a limited range of translations. It can easily
import and export constructed ontologies with the newer languages like DAML+OIL
and OWL.
WebOnto: WebOnto [74] is a Web-based tool for browsing, editing and managing
ontologies constructed with OCML. It was developed at the Knowledge Media
Institute at the Open University, as part of several European research projects in the
late 90s [19]. It is basically a Java-based client application connected to a specific
Web server that has access to ontologies constructed with OCML.
WebODE: WebODE is a workbench for managing ontologies on the Web. It has
been developed by the Ontology and Knowledge Reuse Group at the Technical
University of Madrid [19]. It is built on a three-tier architecture: the user
interface, the application server and the database management system. The main
elements of the WebODE knowledge model are: concepts, groups of concepts,
relations, constants and instances of specific definitions.
OntoEdit: OntoEdit is developed by the Knowledge Management Group of the
University of Karlsruhe [19]. It is an ontology design and management tool whose
knowledge model is related to frame-based languages, and it supports multilingual
development.
OilEd: OilEd [90] is a development environment for ontologies constructed with
the OIL and DAML+OIL languages. It can be integrated with a reasoner (FaCT) and
can extend the expressiveness of frame-based tools. OilEd is a simple tool intended
for demonstrations, and does not aim at the full range of services and flexibility of
ontology environments.
4.2 Ontology Management System
An ontology management system for ontologies is similar to a database
management system for relational databases [47]. A DBMS allows an application to
access data stored in a database via a standard interface. The techniques for storing
and structuring the data are left to the DBMS itself, so that the application does not
have to consider these issues. The DBMS allows the application to access data
stored in the database with a query language (SQL), taking care of everything
related to data storage, indexing of data and data file management. An ontology
management system allows access to ontologies in a similar way that a DBMS does.
The application making queries on an ontology through an ontology management
system does not have to worry about how the underlying processes relate to data
storage, and how structuring of data is done. Ontology editing capabilities are not
the central part of an ontology management system; however, some systems may
provide capabilities to edit ontologies programmatically through a programming
interface. In cases where such editing capabilities are not provided, developers can
choose to use a graphical editing environment such as Protégé.
Snobase (Semantic Network Ontology Base) Ontology Management System
Snobase [47] [93] is an ontology management system that can load ontology files
locally or through any URL (Uniform Resource Locator) for files stored on some
Web server anywhere in the world. It is possible to create, modify and store locally
created ontologies. With Snobase, queries can be run against the loaded ontology
through a well-defined programming interface. Applications can access ontologies
in standard ontology languages such as RDF, DAML+OIL, and OWL. The system
makes use of a persistent storage for ontologies, a built-in inference engine, a local
ontology directory and source connectors to application programs. Snobase is a
Java package providing capabilities similar to JDBC (Java Database Connectivity),
and returns query results similar to the result set returned from queries made
against a relational database.
Snobase currently supports a variant of OWL Query Language (OWL-QL) [94]
when making queries against an ontology model loaded into the persistent storage of
Snobase. OWL-QL is an ontological equivalent of SQL for the Snobase ontology
management system.
Jena Semantic Web Framework
Jena [92] is a Java framework for building Semantic Web applications
programmatically. Jena provides a programmatic environment for RDF, RDFS and
OWL ontologies, including a rule-based inference engine. Given an ontology and a
model, Jena's inference engine can derive additional statements that the model does
not express explicitly. Jena provides several Reasoner types to work with different
types of ontologies.
Some important capabilities of Jena are listed below: