The application of Web Ontology Language for information sharing in the dairy industry


The application of Web Ontology Language for information
sharing in the dairy industry
by
Yongchun Gao
Department of Animal Science
McGill University, Montreal
August 2005
A thesis submitted to McGill University in partial fulfillment of requirements of
the degree of Master of Science
© Yongchun Gao 2005
NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.


ISBN: 978-0-494-24672-6
Abstract
In this thesis the Semantic Web and its core technology - the Web Ontology Language (OWL) - were studied. Considering the features of the different units involved in the dairy industry, OWL, in its capacity as an ontology description language, can be used to encode and thus exchange ontologies among the units in the dairy industry. After the creation of an OWL file using Protégé, an OWL parser was programmed to decode the ontology and data contained in the OWL file. Based on these investigations, it was determined that OWL can be used to encode, exchange, and decode data between farms and the units that interact with them, although large volumes of data among the service agencies pose certain challenges in terms of transfer size. A structure of Semantic Web services in the dairy industry is proposed for Semantic Web service registration, search and usage related to certain farm-management tasks. With the help of the Semantic Web and OWL, one can expect more efficient data processing in the future dairy industry.
Résumé
In this thesis the Semantic Web and its core technology - the Web Ontology Language (OWL) - were studied. Given the needs of the different groups involved in the dairy industry, OWL, as an ontology description language, can be used to encode and exchange ontologies among these groups. After the creation of OWL files using Protégé, an OWL parser was programmed to decode the ontology and the data contained in the OWL file. Based on these investigations, it was determined that OWL can be used to encode, exchange, and decode data between farms and the various stakeholders that interact with them, despite the very large volume of data held by the service agencies, which poses certain challenges in terms of transfer size. A structure of Semantic Web services in the dairy industry is proposed for the registration, search and use of Semantic Web services related to certain on-farm management tasks. With the help of the Semantic Web and OWL, more efficient data management can be expected within the dairy industry in the near future.
Acknowledgments
I would like to thank the many people who have supported and helped me in this research.
Firstly, I wish to thank my supervisor, Dr. Kevin M. Wade, for his guidance, support and encouragement. I also appreciate his help in correcting this manuscript. Secondly, I wish to thank Dr. René Lacroix for his great suggestions and support. Thirdly, I wish to thank Dr. Laurence Baker for his valuable time and support.
I also want to take this opportunity to thank Amy Wong for the fellowship she established to support students from China. Without this financial support, it would have been very difficult for me to come and pursue my studies at McGill University.
I wish to thank the staff of the Department of Animal Science for their support, especially my colleagues in the Dairy Information Systems Group: Diederik Pietersma, Annie St-Onge, Gayatri Boda, Jose Moro-Méndez, and Marie-Claude Ferland.
I wish to thank my family for their love and support. It is the love and support from my parents, my brother, and my sisters that sustained me throughout my studies.
Finally, I wish to thank my wife, Li Song, for her love, support, patience, and encouragement, which helped greatly in the completion of this thesis.
Table of Contents
Abstract
Résumé
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Literature Review
2.1 Information flow in the dairy industry
2.2 The integration of data resource on the farm
2.3 Ways of data encoding
2.3.1 Text file, Comma Separated Value (CSV) file
2.3.2 Fixed format file
2.3.3 eXtensible Markup Language (XML)
2.3.4 Document Type Definition (DTD)
2.3.5 XML Schema
2.4 Ways of data exchange
2.4.1 The use of Electronic Data Interchange (EDI) in the dairy industry
2.4.2 Agents and multi-agents systems
2.5 A promising new technology: the Semantic Web
2.6 Description of the Semantic Web
2.7 Ontology
3 Materials and Methods
3.1 The data used in the thesis
3.2 The OWL editor: Protégé with OWL plug-in
3.3 Visual Basic .Net
3.4 Database: Microsoft Access 2002
3.5 Methods frame
3.5.1 Description of data by RDF
3.5.2 Description of ontology and data by OWL
3.5.3 How to encode data in OWL message
4 Results and Discussion
4.1 OWL files and their parser
4.1.1 OWL file creation by Protégé
4.1.2 OWL file simple parser
4.1.3 OWL database
4.2 Comparison of different encoding data formats
4.2.1 XML's advantages over flat files for data storage and exchange
4.2.2 Differences between the XML and OWL format
4.2.3 Advantages of OWL
4.2.4 Disadvantages of OWL
4.3 Data exchange volume matters
4.4 Applying the Semantic Web to information systems in the dairy industry
4.4.1 The potential value of the Semantic Web in the dairy industry
4.5 Possible scenarios within a Dairy Semantic Web
5 Conclusion
6 References
7 URLs
8 Appendix 1. Structure of the ontology used in this thesis
9 Appendix 2. An example of OWL file created by Protégé
List of Figures
Figure 2-1 Some of the many information flows in the dairy industry
Figure 2-2 The structure of the Semantic Web (Berners-Lee, 2000)
Figure 3-1 Encoding/decoding and transferring of message in OWL message
Figure 4-1 Screen captures of structures of classes and properties in Protégé
Figure 4-2 A snapshot of the OWL parser - Parse OWL File
Figure 4-3 A snapshot of the OWL parser - Property View
Figure 4-4 A snapshot of the OWL parser - Class View
Figure 4-5 Units and their Semantic Web services in the dairy industry
List of Tables
Table 2-1 Client
Table 2-2 The start column and width of each field
Table 3-1 PATLQ (Programme d'analyse des troupeaux laitiers du Québec) data
Table 3-2 CIAQ (Centre d'insémination artificielle du Québec) data
Table 4-1 The table of Class
Table 4-2 The table of Property
Table 4-3 The table of Instance originally before modification
Table 4-4 The table of Instance after modification
Table 4-5 The table of namespace
Table 4-6 The table of ontology
1 Introduction
Agriculture has been practiced in one form or another since the beginning of human civilization and is destined to remain an integral part of our society. Within agriculture, the dairy sector has become heavily dependent on machinery, such as milking machines, automatic feeding systems, and mechanical vehicles (e.g., tractors), to release farmers from mundane or heavy physical work. Bookkeeping and data analysis have also become very important with regard to sustainability and competition within an area of production with relatively small profit margins. Governments, organizations and professional companies have developed specific methods for dealing with the necessary data processing so as to provide customized service to producers and their advisors.
The major goal for most dairy farmers is to produce the most milk from the least amount of inputs and investment. To achieve this, they often avail of various services to help them in their tasks, such as dairy-herd analysis services, feed evaluation and veterinary consultation. To maximize the benefit of all these services, there is a real need for data sharing among, and communication between, the various services.
The advance of science and technology has led, and will continue to lead, the advancement of all aspects of our life; this also applies to the dairy industry. Over the past fifteen years, the advance of information technology (IT) has dramatically changed every aspect of our life. Among the applications of IT in the dairy industry, there are management information systems which help farmers manage their farms; there are milking robots which help them milk cows; there are different kinds of machines which help farmers analyze the components of milk; and so on. Network technology represents one of the most explosive IT areas; it has influenced the dairy industry in several respects, such as data exchange among different units, publication of reports and searching for information on websites. There is still much work that might be done in such areas as e-commerce, or, with regard to the dairy industry, the improvement of the virtual network among all the major players and the producers themselves.
Since different units in the dairy industry have their own focus of interest and place different emphasis on the same aspect, the ontologies that they have are different from each other. They also have specific data structures and formats for their in-house computer data processing. This specificity, on the one hand, may accelerate information processing within the unit, but on the other hand, it might be a hindrance for information exchange among different units. The difficulty of reaching a mutual ontology in certain areas is obvious, because different groups of people have their own view of the situation, a different ontology to describe it, and their own interest in the investment. Researchers have looked at the possibility of uniting different ontologies but, as yet, little progress has been achieved overall (Archer, 2000). Rather than unite these various ontologies in one agreed ontology, there is another possible solution: leave each ontology untouched and find a new approach to allow each to communicate efficiently and effectively with the others. A promising technology is developing that may have considerable consequences in this regard - the Semantic Web.
The Semantic Web (Berners-Lee, 2001) currently represents one of the most active research areas, and holds much promise for the development of the current Internet and how it behaves. While based on the framework of the current Internet, it focuses on the building of a data exchange framework through the network. One can imagine that different units in the dairy industry could describe their information in the common framework provided by the Semantic Web, whereby not only human beings, but also machines could understand the information provided by other units. Although the Semantic Web promises to smooth information exchange, much work has to be done to achieve this goal. Within the set of Semantic Web technologies, the Web Ontology Language (OWL, http://www.w3.org/2001/sw/WebOnt) is especially useful for ontology description and exchange, which is fundamental for data exchange. Semantic Web services will most likely be built on it in the near future, and it should bring a more convenient service to all units in the dairy industry. The main goal of this research was to explore the possibility of using the Semantic Web, especially OWL, to make data exchange more fluent and efficient between sample units in the dairy industry.
2 Literature Review
The past decades have witnessed a rapid development in computers and networks as well as their applications to the dairy industry. Since the computer was introduced into the dairy industry for recording (see Tomaszewski et al., 2000), it has contributed much to the goal of producing more high-quality milk. On-farm computers, with appropriate management information systems, can help a farmer to record the performance of his/her cows compared with the average producer or over time. On the other hand, agencies already exist in the dairy industry with their own computer resources, such as dairy herd improvement agencies, breed associations, artificial insemination centers, veterinarians, and feed companies. They have their own reasons to exist independently, but they might all benefit, and indeed help their common clients, if their systems could communicate with each other. Such an act of cooperation is less obvious than that among different departments in the same unit. Inevitably, an important aspect of the cooperation comes down to information/data exchange.
2.1 Information flow in the dairy industry
It is known that information flow is, in some sense, more important than product transportation. Usually information flows in the same direction as the products. In the dairy industry, the product/information flow can be divided into two parts: one is the flow between companies and farms (e.g., feed/seed/fertilizer/pesticide/etc.), and is related to the input of the farm. The other is between a farm and various organizations that are concerned with either administration or provision of services (e.g., Agri-Traçabilité Québec inc. - http://www.agri-tracabilite.qc.ca, Programme d'analyse des troupeaux laitiers du Québec - http://www.patlq.com, or Canadian Dairy Network - http://www.cdn.ca). These flows are often more related to the output from the farm. An example of these connections can be seen in Figure 2-1.
The registration number of a cow works as an ID for data records. It is compulsory for dairy farmers to register every cow in their herds with the Canadian Cattle Identification Agency (CCIA, http://www.canadaid.com), which is a non-profit agency set up to establish a national cattle identification program. Through registration, it is easier for farmers, organizations and governments to record and retrieve each cow's information by its ID, for example in the case of a crisis related to Bovine Spongiform Encephalopathy (BSE) or foot-and-mouth disease.
Figure 2-1 Some of the many information flows in the dairy industry
Milk recording agencies help dairy farmers record the data associated with their cows and provide information/advice for improving milk quality and quantity. Dairy farms have a strong relationship with milk recording agencies, such as PATLQ in Québec and CanWest Dairy Herd Improvement across British Columbia, Alberta, Saskatchewan, Manitoba and Ontario (CanWest DHI, http://www.canwestdhi.com). These agencies visit farms anywhere from 4 to 12+ times a year and record/provide information about cows, milk production, herd management, etc. These agencies can then obtain data from their members for analysis and provision of advice on their current situation (e.g., benchmarking) or ways to improve their management practices. With such information, farmers can maintain or improve the health and productivity of their cows.
A cow can also have a different ID if the owner registers her with a breed association, such as Holstein Canada. Artificial insemination centres like the Centre d'insémination artificielle du Québec (CIAQ, http://www.ciaq.com) may use the breed association ID or yet another, such as the barn ID from the farm itself. Nowadays, almost every unit has its own rule for assigning an ID to a cow, instead of using one unique ID across the dairy industry, which is what CCIA is promoting. The fact that we have (and use) historical data for management purposes also implies that the need for saving and documenting these various IDs will continue. This is one of the encumbrances that make communication difficult between different units in the dairy industry. By getting genotypic and phenotypic data from CDN, CIAQ is able to recommend to farmers the best bulls for artificial insemination. Nowadays, farmers receive information from Artificial Insemination Centers (AI Centers) in which bulls' proofs are indicated for various traits. Farmers can choose bulls after balancing the traits against deficits in their own cows. Also, major AI centers have set up web sites, so farmers can have easy access to information on the traits of the bulls.
Data exchange from one unit to another by way of CDN is one direction of data exchange. For example, AI centers may obtain information from milk recording agencies through CDN. Genetic information comes from CDN after data have been processed by milk recording agencies (production) and breed associations (conformation). CDN's two main responsibilities are (1) the provision of genetic evaluations of dairy cattle and (2) the establishment of a national database for the dairy cattle improvement industry (CDN, http://www.cdn.ca). This organization does not collect data directly from farms, but from the organizations which collect information from farms, as indicated in Figure 2-1.
The quota system is also an important component of the dairy industry. Dairy farmers derive benefits from the quota system administered by the Canadian Dairy Commission (CDC), which "strives to balance and serve the interests of all dairy stakeholders - producers, processors, further processors, exporters, consumers and governments" (CDC, http://www.cdc.ca). The CDC collects all the milk from farmers, and sends it to milk processors. Every farmer should sell almost exactly the amount of milk designated in his/her quota. The quantity of milk produced, processed and consumed is balanced by the quota system, which decreases the competition between producers and processors.
More frequent data collection is possible with the advent of some advanced milking systems (e.g., milking robots). Farmers or milk recording agencies can record data every day instead of approximately once per month, which may help farmers to know the situation of their cows, but we cannot be sure that farmers can interpret all these data as precisely as experts or their advisors. Unfortunately, it is impossible for their advisors to be constantly available. An alternative might involve sending all of the related data to an advisor over the Internet for a solution.
Certainly there are other product and information flows in the dairy industry, but this brief introduction gives a very basic idea of the kinds of interactions between farmers and organizations. Also, while the data exchange network as indicated in Figure 2-1 refers to the dairy industry in Canada, its structure may also be relevant to other industries and other countries.
2.2 The integration of data resource on the farm
Farmers have many different sources of data that are very useful for farm management. Different sources mean different formats. For example, a farmer might check the milk meter for milk temperature and milk volume when milking a cow, and write the values down later for recording; a farmer might record a cow's disease history from the veterinarian; a farmer might obtain the best-matching semen for his/her cow. All these data could be written on paper, or could be held in different formats in electronic form.
The advancement of technology in many other fields, such as information technology, also has a positive influence on the dairy industry. There is an increasing use of sensors for the detection of milk components, urine, etc. (Jenkins and Delwiche, 2002; Claycomb and Delwiche, 1998; Eshkenazi, 2000). Compared with laboratory testing, sensors can respond more quickly and are less expensive, although in some circumstances they may be less precise. All these sensors help farmers monitor the situation of their cows and make corresponding decisions in a timely fashion. In most circumstances, farmers can collect the information, analyze the situation and make a decision themselves, while in other circumstances they need help/advice from a domain expert. The advent of automatic milking systems (AMS), or robots, helps not only with milking the cow, but also with the collection of data concerning the actual milk and the animal herself (Armstrong and Daugherty, 1997; Artmann, 1997; Mol, 2001). Also, the software associated with an AMS may support information management and decision making.
Communication problems arise when integrating data from different sources. Many farmers are using computers for farm management (Lacroix et al., 2001). Computers running Management Information Systems have been used in the dairy industry for different purposes, and have had positive effects on reaching their goals, directly or indirectly (Verstegen, 2001; Lewis, 1998; Asseldonk, 1999). Problems arise when farmers have different sources of data, because the management information systems in their computers may be unable to accept the data from different sensors in different formats, and the exchange of data among different software packages is also weak (Harsh, 1998). It costs a lot of resources and energy to input or convert data in different formats into one unique format for processing.
Decision making is based on a complete understanding of the situation of the cows, the farm and the overall industry. Players involved include processors, organizations, companies, and areas of government. Among them, dairy herd improvement agencies, breed associations and artificial insemination centres are the most important and also quite closely related.
Communication problems also exist among the different units in the dairy industry. Many producers, processors, organizations, companies, and areas of government are involved in the dairy industry. From a management and economic point of view, more efficient ways for data and products to flow among these units would seem useful. Both computer and network technology have accelerated the speed of data exchange and processing. However, due to the individuality of the units, data structures are designed specifically for the unit itself. Most often data in one unit are useful to another unit. But unfortunately the different data structures are a technical obstacle to information sharing. This problem even exists between different departments within the same unit or between different versions of the same software.
While one way to solve the communication problem would be to create a well-recognized standard or data dictionary for data exchange, it is difficult to reach such standards, because different units may insist on using their own format for historical reasons (e.g., data formats and/or data collection methods). In Canada, CDN began a project to promote the standardization of the data structure for the purpose of genetic evaluations once the importance of data exchange among these units was noticed. Most of its work is based on flat files and exchange of data in database backups, not on online messages among different applications or computers.
Suppose there are N different units that need to communicate with each other. One way to solve the problem is to create N*N documents, each describing the relation between one pair of units. Another way is to create N documents, each describing the relation between one unit and a single "standard". A third way is to create one document in which all the relationships among the different ontologies are described.
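The difference between the three options is easiest to see by counting documents. The following is a minimal sketch, not part of the original work: the function name is hypothetical, and it assumes that each pairwise document describes one direction of translation and that a unit needs no document for itself, which gives N*(N-1), slightly below the N*N upper bound quoted above.

```python
def mapping_documents(n_units):
    """Count the mapping documents each strategy needs for n_units units."""
    return {
        # every unit describes its relation to every other unit (assumption:
        # one document per direction, none for a unit with itself)
        "pairwise": n_units * (n_units - 1),
        # every unit describes its relation to one shared "standard"
        "shared_standard": n_units,
        # a single document describes all relationships among the ontologies
        "single_document": 1,
    }


for n in (3, 5, 10):
    print(n, mapping_documents(n))
```

The pairwise count grows quadratically with the number of units, whereas the other two strategies grow linearly or not at all; the cost of the latter two lies instead in agreeing on, or maintaining, the shared description.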
2.3 Ways of data encoding
The concept of data exchange implies that the sender and the receiver of the data must have a common understanding of the data encoding and must, subsequently, form an agreement. The following sections illustrate the data encoding methods typically used in recent times.
Before data can be exchanged, they have to be stored somewhere; typically, a database is a good choice, especially for large volumes of data. A database is a well-structured record-keeping system on a computer, in which data are stored in different tables, fields and records. With indexing capability, the common operations of insert, delete and change are done efficiently (Date, 2004). Here is an example of a database which has only one table. This table will be used as an example to retrieve and encode data in various ways.
Table 2-1 Client
ClientID  FamilyName  GivenName  Nationality  Birthday
101       Legrand     David      France       7/5/1981
102       Baker       Linda      Canada       2/6/1982
2.3.1 Text file, Comma Separated Value (CSV) file
Generally speaking, "text" refers to letters of the alphabet, numerals, punctuation, and a few common symbols. The ASCII format is commonly used to encode western text on computers. A text file can also be called an "ASCII text" or a "plain text" file.
A CSV file is a kind of text file in which data are listed in columns, each value is separated by a comma, and each new line represents a new set of data. Applications may use CSV files to store or exchange data. Here is an example of a CSV file (size: 120 bytes), which is another format for the same information as in Table 2-1.
ClientID,FamilyName,GivenName,Nationality,Birthday
101,Legrand,David,France,7/5/1981
102,Baker,Linda,Canada,2/6/1982
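As an illustration of how an application might write and read such a file, here is a minimal sketch using Python's standard `csv` module. It is not the code used in this thesis; the function names are hypothetical. The writer's default line terminator is a carriage return plus line feed, and with that terminator the three lines come to 120 bytes, which appears to account for the size quoted above.

```python
import csv
import io

FIELDS = ["ClientID", "FamilyName", "GivenName", "Nationality", "Birthday"]
ROWS = [
    ["101", "Legrand", "David", "France", "7/5/1981"],
    ["102", "Baker", "Linda", "Canada", "2/6/1982"],
]


def encode_csv(fields, rows):
    """Serialize a header line and the data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default line terminator is "\r\n"
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


def decode_csv(text):
    """Parse CSV text into a list of dicts keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = encode_csv(FIELDS, ROWS)
clients = decode_csv(csv_text)
print(len(csv_text.encode("ascii")), "bytes")
print(clients[0]["FamilyName"], clients[1]["Birthday"])
```

Note that every value comes back as a string: the file itself carries no type information, so the receiver has to know separately that, for example, Birthday is a date.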
2.3.2 Fixed format file
A fixed format file is a file in which each column has a constant width. Different fields are joined directly, without any delimiter. Any vacancy in the file must be filled by spaces or zeros, depending on the data type of the field. Each record is on one line. Data in Table 2-1 can be represented in fixed format like the following (size: 92 bytes):
101 Legrand   David     France    7/5/1981
102 Baker     Linda     Canada    2/6/1982
The start column and width of each field are listed below:
Table 2-2 The start column and width of each field
FieldName    start  width
ClientID     1      4
FamilyName   5      10
GivenName    15     10
Nationality  25     10
Birthday     35     10
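The layout in Table 2-2 is all a program needs to encode or decode a record. The sketch below is illustrative only: the function names are hypothetical, and it assumes that every field, including ClientID, is padded on the right with spaces (the zero-filling mentioned above for some data types is not shown).

```python
# (field name, start column counted from 1, width), taken from Table 2-2
LAYOUT = [
    ("ClientID", 1, 4),
    ("FamilyName", 5, 10),
    ("GivenName", 15, 10),
    ("Nationality", 25, 10),
    ("Birthday", 35, 10),
]


def encode_record(record):
    """Pad each value with spaces to its field width and join without delimiters.

    Values longer than the field width are truncated, which is one of the
    practical limits of a fixed format.
    """
    return "".join(
        str(record[name]).ljust(width)[:width] for name, _, width in LAYOUT
    )


def decode_record(line):
    """Slice a fixed-format line back into named fields and drop the padding."""
    return {
        name: line[start - 1:start - 1 + width].rstrip()
        for name, start, width in LAYOUT
    }


line = encode_record({
    "ClientID": 101,
    "FamilyName": "Legrand",
    "GivenName": "David",
    "Nationality": "France",
    "Birthday": "7/5/1981",
})
print(repr(line))
print(decode_record(line))
```

Each encoded record is 44 characters long (4 + 10 + 10 + 10 + 10); with a two-byte line ending per record, two records come to the 92 bytes quoted above.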
From these two examples, we can see that:
• The use of a text file to store or exchange data is space efficient, thus
saving bandwidth when transferred over the network.
• Field names can be written in the first line of a CSV file.
• There should be a description document that states the field name, field
type, field length, etc., because the two end points of communication must
know the format of the file.
• Different fields are joined with commas in CSV files (variation: commas can
be replaced by another suitable symbol, e.g. Tab).
• Each record is on one line in both files.
• Lines containing special symbols (e.g. commas, double quotes, etc.) must be
processed differently in CSV files.
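The last point can be illustrated with a short Python sketch. It assumes the common CSV convention, implemented by Python's csv module, of enclosing a field in double quotes when it contains a comma or a quote; the record shown is hypothetical and does not come from Table 2-1.

```python
import csv
import io

def write_row(values):
    """Serialise one record; the csv module quotes fields only when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()

def read_row(line):
    """Parse one CSV line back into its list of values."""
    return next(csv.reader(io.StringIO(line)))

# Hypothetical record: the family name holds a comma, the given name holds quotes.
tricky = ["103", "Smith, Jr.", 'Ann "Annie"', "Canada", "1/2/1983"]
line = write_row(tricky)
print(line, end="")
print(read_row(line))
```

Splitting such a line naively on every comma would yield six pieces instead of five fields, which is why both end points have to agree on the quoting rule as well as on the delimiter.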
2.3.3 eXtensible Markup Language (XML)
XML is already widely used for the exchange of data on the Web and between
different applications, although it has only come into existence in recent years
and was originally designed to describe data for large scale electronic publishing
(XML, http://www.w3.org/XML).
When integrated with databases, XML files usually package individual records in
a pair of closed tags. These tags are not predefined and must be defined by the
user. Usually field names in a database correspond to tags in an XML file. Here is
an example of an XML file (size: 404 bytes), in which the same information is
represented as in Table 2-1.
<?xml version="1.0"?>
<Client>
  <row>
    <ClientID>101</ClientID>
    <FamilyName>Legrand</FamilyName>
    <GivenName>David</GivenName>
    <Nationality>France</Nationality>
    <Birthday>7/5/1981</Birthday>
  </row>
  <row>
    <ClientID>102</ClientID>
    <FamilyName>Baker</FamilyName>
    <GivenName>Linda</GivenName>
    <Nationality>Canada</Nationality>
    <Birthday>2/6/1982</Birthday>
  </row>
</Client>
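A receiving program can recover the records from such a file with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module; it is an illustration only, and the name read_rows is an assumption introduced here.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<?xml version="1.0"?>
<Client>
  <row>
    <ClientID>101</ClientID>
    <FamilyName>Legrand</FamilyName>
    <GivenName>David</GivenName>
    <Nationality>France</Nationality>
    <Birthday>7/5/1981</Birthday>
  </row>
  <row>
    <ClientID>102</ClientID>
    <FamilyName>Baker</FamilyName>
    <GivenName>Linda</GivenName>
    <Nationality>Canada</Nationality>
    <Birthday>2/6/1982</Birthday>
  </row>
</Client>"""

def read_rows(xml_text):
    """Return one dict per <row>, mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in row}
        for row in root.findall("row")
    ]

records = read_rows(XML_TEXT)
for rec in records:
    print(rec)
```

The tag names play the role of the field names in the database, so the reader does not depend on the order or width of the columns.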
2.3.4 Document Type Definition (DTD)
A DTD, either internal or external, defines the elements and data structures of
XML documents. Programs can validate an XML document according to the rules
laid out by the DTD. Different applications can exchange data in XML format if
they agree on a DTD, since each of them can then parse and process the XML
document. Here is a DTD for the XML file in the last section (size: 258 bytes):
<!ELEMENT Client (row*)>
<!ELEMENT row (ClientID, FamilyName, GivenName, Nationality, Birthday)>
<!ELEMENT ClientID (#PCDATA)>
<!ELEMENT FamilyName (#PCDATA)>
<!ELEMENT GivenName (#PCDATA)>
<!ELEMENT Nationality (#PCDATA)>
<!ELEMENT Birthday (#PCDATA)>
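To make concrete what validation against these rules involves, the sketch below checks them by hand in Python. Python's standard xml.etree.ElementTree parser does not validate against a DTD, so this is a hand-coded stand-in for the rules above and not a general DTD validator; the function name conforms is an assumption.

```python
import xml.etree.ElementTree as ET

# Child elements that every <row> must contain, in this order (from the DTD).
ROW_FIELDS = ["ClientID", "FamilyName", "GivenName", "Nationality", "Birthday"]

def conforms(xml_text):
    """Check the element rules stated by the DTD; text content is not inspected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False                      # not even well-formed XML
    if root.tag != "Client":
        return False
    for row in root:
        if row.tag != "row":              # Client (row*)
            return False
        if [child.tag for child in row] != ROW_FIELDS:
            return False
        if any(len(child) for child in row):
            return False                  # (#PCDATA): no nested elements
    return True

good = (
    "<Client><row><ClientID>101</ClientID><FamilyName>Legrand</FamilyName>"
    "<GivenName>David</GivenName><Nationality>France</Nationality>"
    "<Birthday>7/5/1981</Birthday></row></Client>"
)
missing_field = "<Client><row><ClientID>101</ClientID></row></Client>"
print(conforms(good), conforms(missing_field))
```

Two applications that share the DTD can run such a check before processing a message, and reject files that do not follow the agreed structure.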
2.3.5 XML Schema
XML Schema is an XML-based alternative to DTD, since it also defines the
structure of an XML document. Unlike DTD, XML Schema is written in XML
itself, which means it is extensible. Also, its support for data types and
namespaces makes XML Schema a successor of DTD. Here is a part of the XML
Schema file associated with the XML file in section 2.3.3 (size: 721 bytes):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Client">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="row" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ClientID">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:maxLength value="50"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="FamilyName" type="xs:string"/>
              <xs:element name="GivenName" type="xs:string"/>
              <xs:element name="Nationality" type="xs:string"/>
              <xs:element name="Birthday" type="xs:date"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Because it lacks vocabulary for class and property, XML Schema is limited in
its ability to describe classes and properties, although it is well suited to
describing instances.
2.4 Ways of data exchange
Data are usually distributed across different applications, computers and
locations, making the need for data exchange inevitable. Traditional media for
data exchange include paper and disk, and they are still in use today. Since its
appearance, the computer network has become a very important medium for data
exchange, through which data are transferred in the form of message packages
or files. For large volumes of data, the best way is to transfer data
electronically instead of exchanging data on paper. While electronic transfer is
what currently happens among units in the dairy industry, it is simply a first
step: usually, data have to be processed, in a variety of manners, for their
subsequent use.
For small volumes of data exchange - which is the case with computer
management on individual farms - electronic transmission is only beginning to be
supported and used. One reason is that the percentage of farms with on-farm
computer management systems is not high and the large units in the dairy
industry do not provide enough service in this way; the other reason is that
software for farm management is less available than in some other industries.
A database is another way to exchange data. Usually a database management
system (DBMS) runs on an individual server. Some DBMSs provide clients for
remote data access, so direct access to the database is possible. Mostly,
databases are used by applications on the same computer. From a technical point
of view, a database could be used to exchange data among different applications
on different computers, although from an economic and security point of view,
this is often not the case. If data exchange is needed between two different
computers, applications may utilize the network facilities to access the
database. Another way is to develop services that retrieve data directly from a
database and send them to a client (e.g., over TCP/IP, HTTP, FTP, SOAP, etc.).
2.4.1 The use of Electronic Data Interchange (EDI) in the dairy industry
Electronic Data Interchange (EDI) has been used in large companies since the
mid-1960s as a means of exchanging business data in an agreed, structured and
standard message format between different computers (Blacker, 1991).
Originally, EDI was introduced in the logistics area for faster exchange of data
as well as a reduction of communication cost, errors and time. The advancement
of communication in recent decades has dramatically accelerated the increase of
EDI users. At the beginning, the physical channel for data exchange was based on
the Value Added Network (VAN). The message standard was gradually established
as more and more sectors joined the chain. Nowadays, the standard of Electronic
Data Interchange For Administration, Commerce and Transport (EDIFACT)
prevails over, and will soon replace, ANSI X.12 as the standard for North
America (Blacker, 1991).
Electronic Data Interchange was first introduced into the dairy industry in the
Netherlands in the 1980s (Koorn, 1996). Nowadays approximately 30% of Dutch
dairy farmers are using EDI for identification, registration, or studbook
(Buiten, et al., 2003), and more potential application areas are continuously in
development. Of course, those who use EDI must have a management information
system running on a computer to manage the farm, so that the potential of EDI
can be exploited.
According to research about the adoption and impact of EDI (Heck and Ribbers,
1999), one of the reasons that small businesses adopt EDI is external pressure
from suppliers or clients who are using EDI. This pressure is evident from the
fact that many of those using EDI are only exploiting a small amount of its full
potential. Dairy farms currently fall into this category - a situation which is
not likely to change in the near future. However, due to external circumstances
(lack of alternatives, etc.) the number of dairy farms that use EDI is
increasing, as is the practice of associated organizations providing them with
EDI data exchange services.
On the other hand, the cost of communication is not necessarily reduced in the
case of a dairy farm. If data exchange in the dairy industry happened as
frequently as in big companies, the adoption of EDI would be worthwhile. The
high initial costs require frequent use of EDI to recover the investment. But
the truth is that in the dairy industry, especially on the farms, data exchange
is not so frequent. This may be one of the main reasons why the adoption of EDI
has not increased so quickly in the dairy industry.
The extraordinary growth of the Internet has given new impetus to EDI
applications, due to this alternate method of transmission (as opposed to VAN),
leading to a reduction in the heretofore high cost of development and
communication (Fu, et al., 1999). Different models have been proposed, but, here
again, the implementation is not straightforward (Canon, S., 1996; Fu, S., et
al., 1999; Mak, 1999). Concerns about the security of data exchange increase as
the Internet is open to everyone while VAN is restricted to specified companies.
The Secure Sockets Layer (SSL) protocol or Transport Layer Security (TLS)
provides security during data transmission (Chou, 2002), while XML Signature
provides protection even after XML documents reach the end point (Han et al.,
2001). These two technologies may help to increase information security.
XML was developed by an XML Working Group of the World Wide Web
Consortium in 1996 (W3C, http://www.w3.org). It was originally designed to
describe data for large scale electronic publishing, but it has already been
used in the exchange of data on the Internet for different applications.
Although XML succeeds well in describing message content, it will not soon
replace EDI as the way of commercial data exchange, due to the amount of
investment in EDI and associated back-office integration solutions (Jones,
2001). Models combining XML and EDI have, therefore, been introduced [e.g.,
vocabulary-based mapping (Jones, 2001) and Web-EDI (Miyazawa, 2000)]. However,
the lack of standardization is one of the obstacles that still exist.
2.4.2 Agents and multi-agents systems
Due to the distributed storage of data in the dairy industry, some researchers
have adopted agent technology for data transfer within or among computers via
the Internet (Parrott et al., 2003). In this example, the program could request
certain data from an agent, and receive the reply from another agent by way of
querying data from a database over the Internet. Unfortunately, the lack of a
standard ontology across the industry hinders the potential implementation of
such programs, because of the different understanding and encoding of data
across individual units.
2.5 A promising new technology: the Semantic Web
The concept of the Semantic Web was introduced by Tim Berners-Lee at the
W3C (Berners-Lee, 1998). He described the Semantic Web as an extension of the
current web in which information is given well-defined meaning, better enabling
computers and people to work in cooperation (Berners-Lee, 2001). Although it is
still in its early stages, the idea of machines understanding information by
defining metadata may also be applied to the exchange of data between different
packages or applications.
The current World Wide Web is designed for the consumption of information by
human beings. Since the 1990s, it has greatly influenced methods of information
exchange, as can be seen from the explosion in the number of websites. But the
true potential of the data is often hidden in the sheer volume of information.
Information content is presented primarily in natural languages, and the format
in which documents are published mixes the content with presentation, which is
easy for humans to view. However, these documents are difficult for computers to
interpret. Some web pages have begun to use XML and XML Schema or DTD to
separate the content from the style, but the intent is to simplify the
publication of documents (Roy and Ramanujan, 2000; Roy and Ramanujan, 2001).
Therefore, it does not address the problem of information interpretation by
machines. This is one of the reasons why search engines, such as Google (Cheery
2002; Li, 2000), exist and are increasingly popular. The fact remains, however,
that many search engines use key words for searching, meaning that the
application needs to process the web pages each time a query is made so that an
index can be saved. The search-engine results are often diluted by unrelated
information. More often it depends on the user to discriminate and interpret the
result, which takes a lot of time. Searching through multimedia resources also
poses a significant challenge for traditional search engines.
In the Semantic Web, metadata are used to describe web pages, using the
Resource Description Framework (RDF, http://www.w3.org/RDF/) and OWL to
achieve this goal (Decker et al., 2000a; Zuo and Zhou, 2003). Knowing the
meaning of metadata helps computers to "understand" more of the web pages, and
so saves users' time by letting computers filter out some unrelated
information. The description of resources could also be a way to index
multimedia resources for searching.
Semantic Web technology also promises to overcome one disadvantage of the
current web, the limitation of Web-based e-commerce. For example, information
on features such as products' characteristics and prices can be displayed on the
current web. However, it can be arduous for potential customers to compare that
information among different providers of goods and services. A technology that
would facilitate, or even automate, such comparative tasks would be useful. This
is one of the potential applications of the Semantic Web: it will be designed to
help humans in accessing information, and machines in processing the data that
the Web provides. Also, the encoding of data in OWL among peers can help the
presentation and understanding of ontologies from different groups. The built-in
features of OWL can describe the relations among different ontologies through
the description of classes, properties and instances in an OWL file.
Semantic Web technology may also have a role to play in enhancing the utility
of Web Services. Currently, difficulty exists in Web Service advertising and
matching, because of a lack of prior agreement on the description of Web
Services. With the help of Semantic Web technology, especially RDF and OWL,
one can describe Web Services for easier publishing and location of needs.
Possible applications of Semantic Web technology include e-commerce, knowledge
management / knowledge representation, search engines, software agents, etc.
However, the Semantic Web will not be designed to process natural language. So,
the intended goal may be difficult to achieve, since the process involves
encoding information in a way that simplifies machine processing and does not
complicate human interpretation. This leads to a semi-structured web - somewhere
between the two extremes of i) the current state (sub-optimal structure), and
ii) the situation of uniform databases with no variation.
2.6 Description of the Semantic Web
The Semantic Web is an ongoing international initiative, based on current web
practices as well as artificial intelligence. The concepts, ideas and tools are
still in the development phase. It is based on evolving standards developed by
W3C, among which are the RDF recommendations, the OWL recommendation, and
others. Many researchers in universities and companies are collaborating in the
development of the necessary standards and applications. The proposed
architecture for the Semantic Web is illustrated in Figure 2-2. The first layer
includes Unicode and the Uniform Resource Identifier (URI,
http://www.ietf.org/rfc/rfc2396.txt). Unicode is a system that allows any
existing character in the world (e.g., English letters, Chinese characters,
etc.) to be mapped to a unique sequence of numbers (Davis and Collins, 1990). In
this way, any platform, program or language using Unicode should be able to
understand data provided from anywhere around the world. As Figure 2-2
indicates, the URI, which is the foundation of the current web, will also be
part of the basis of the future Semantic Web. A URI is used to identify
resources that may be accessed over the Internet (e.g. strings starting with
"http:" or "ftp:"). URIs are decentralized, and anyone can create them. They
have been successfully used in the current web, and will be inherited
automatically by the Semantic Web.
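The two building blocks of this first layer can be illustrated with a few lines of Python. The sketch below lists the Unicode code points of two characters and splits a URI into its parts with the standard urllib.parse module; the sample characters are an arbitrary choice made for this illustration.

```python
from urllib.parse import urlparse

def code_points(text):
    """List the Unicode code point assigned to each character."""
    return [f"U+{ord(ch):04X}" for ch in text]

sample = "A牛"  # a Latin letter and a Chinese character (illustrative choice)
print(code_points(sample))
print(sample.encode("utf-8"))   # UTF-8 is one byte encoding of those code points

uri = urlparse("http://www.w3.org/RDF/")
print(uri.scheme, uri.netloc, uri.path)
```

The scheme ("http") tells a program how the resource may be reached, while the remainder of the URI identifies the resource itself.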
Figure 2-2 The structure of the Semantic Web (Berners-Lee, 2000)
Above Unicode and URI is XML, which is used to define a standard way to add
markup to documents for information exchange. XML was originally designed for
large-scale electronic publication, but its increasingly important role in the
exchange of data on the current web and elsewhere makes it useful for data
exchange on the future Semantic Web (Zisman, 2000). An example of an XML
segment follows:
<HerdID>320</HerdID>
<NumOfCow>2</NumOfCow>
<Cows>
  <CowID>HOCANF999999999999</CowID>
  <CowID>HOCANF999999999998</CowID>
</Cows>
There are no preconceived semantics, so anyone can design his/her own document
structure and then write it in XML format. This markup is "machine-readable" if
the machine knows the format (Decker et al., 2000b). As a result, the semantics
of a document must be defined by the applications that process it. This means
that the different formats used by different people/programs currently prevent
any one format from being used more broadly. Since XML appeared several years
ago and many applications are available for building and parsing XML code, it
has been widely adopted on the Web and in various software packages.
RDF is designed to represent information by defining a general-purpose language
for describing relationships among resources. RDF Schema, an extension of RDF,
defines metadata that may be used to describe classes, properties and other
resources (Decker et al., 2000b; Broekstra et al., 2000). RDF and RDF Schema,
which are based on XML and are still at the recommendation stage, have been
developed in recent years by W3C. There are many commercial and free software
tools available for the setup of RDF (Golbeck et al., 2002), based on the
recommendation and on different languages like C and Java (RDF,
http://www.w3.org/RDF/), such as Jena (McBride, 2002), Protégé (Noy et al.,
2001) and SMORE (Kalyanpur).
An ontology is a set of knowledge terms which are shared in a domain. For the
purpose of knowledge representation, a set of classes, properties, relations and
other objects are created to describe the knowledge (Gruber, 1993). In order to
enable computers to deal with knowledge, W3C is developing OWL (OWL,
http://www.w3.org/2001/sw/WebOnt/), which serves as a metadata schema and is
used to describe classes and the relations among them. It extends the classes
and properties defined in RDF, so as to define a flexible language for
representing information. The Semantic Web's success depends on the rapid and
inexpensive construction of domain-specific ontologies, while the creation of
ontologies depends on experts in each specific area. Tools to automate the
creation, maintenance and merging of ontologies should be provided by
programmers, based on recommendations from researchers involved in W3C
(Golbeck et al., 2002; Noy, 2001).
The top layers in Figure 2-2 (Logic, Proof and Trust) are currently under
research by various groups, and simple application demonstrations are being
constructed. The simple classes and properties defined in OWL are helpful for
these three layers. The Logic layer enables the writing of rules while the Proof
layer executes the rules and, together with the Trust layer mechanism for
applications, evaluates whether to trust the given proof or not (Golbeck, 2003).
Since the models of logic, proof, and trust are not ready yet, few tools are
currently available. In the early stage of development of the World Wide Web,
few web pages existed, and it can be postulated that the Semantic Web will
undergo a similar experience. Integrated Development Environments (IDEs), such
as FrontPage and Visual InterDev, accelerated the creation of sites in their
present form; such tools will also be essential to set up the emerging Semantic
Web. Although many simple tools already exist, most of them are only concerned
with some parts of its architecture. IDEs will facilitate the creation of
Semantic Web sites which, in turn, will stimulate the development of new tools.
2.7 Ontology
The term ontology was borrowed from philosophy by the knowledge-engineering
community, and was then defined as the shared understanding of some domain,
which consists of classes, properties, instances and the relationships among
them (Gruber, 1993; Corcho et al., 2003). Categorized Internet search engines,
which arrange topics in categories and subcategories, are considered one kind
of ontology (Chandrasekaran, et al., 1999). The Yahoo search engine is one such
example. We can say that an ontology is a structured image of the real world in
the mind of a group of people. That means there is logic in the ontology domain.
Ontology is important because it is the basis of communication, knowledge
sharing and learning, thus facilitating data exchange among different
groups/programs (Chandrasekaran, et al., 1999). It is also very useful in
database design and Object-Oriented Programming, both at development time and
run time (Guarino, 1998). It can be thought of as different views of the
description of the same or related areas, and it helps to unify disparate
representations of knowledge. For the representation of ontology, the
object-oriented approach is crucial (Deloule and Roche, 1995). Ontology creation
may have some foundation in object-oriented programming (OOP), but there are
large differences: "method" is the center of the design in OOP, while "class" is
the focus of the design in ontology. Many different ways of generating
ontologies have been proposed in the past decades (Ding and Foo, 2002).
An ontology can be represented by an ontology language, which has been
categorized as description logics. There are several kinds of languages
available, such as KIF (Knowledge Interchange Format) (Genesereth and Fikes,
1992), SHOE (Simple HTML Ontology Extensions) (Heflin et al., 1999), Ontolingua
(Farquhar et al., 1996), OIL (Ontology Inference Layer) (Fensel et al., 2000),
DAML+OIL (Horrocks et al., 2002), OWL (Corcho, et al., 2003), etc. OWL, the
language used throughout this thesis, is a typical example; it is based on
description logic, formal logic and web standards (Ding and Foo, 2002; Corcho,
et al., 2003).
Ontology development tools are essential for building ontologies. Most of the
tools are written in Java, such as WebODE (Arpirez, 2001), OILEd, OntoEdit
(Sure et al., 2002), Protégé (Noy et al., 2001; Noy et al., 2000), etc.
Architecture, extensibility, interoperability, stability, etc. are the main
features to consider when choosing tools for developing an ontology. Protégé was
chosen as the development tool in this thesis for various reasons: it is open
source; it can import from/export to XML and RDF; it has an OWL plug-in; and it
supports reasoning tools.
3 Materials and Methods
In this section, part of the simple ontology that was used in this thesis will be
presented and illustrated by a simple database example. Then, using a
development tool (Protégé 2.1), the data will be encoded in OWL together with
the simple ontology. An OWL parser will then be programmed in Visual Basic .Net
to save the ontology and data in an Access database. The ontology used in this
thesis is illustrated in Appendix 1 and its partial expression as an OWL file is
shown in Appendix 2.
3.1 The data used in the thesis
Two simple tables from one database are used for simple illustration of the
methodology in this thesis.
Table 3-1 PATLQ (Programme d'analyse des troupeaux laitiers du Québec) Data
CowID     MilkYield  LactationNumber  BodyWeight
Marigold  15         3                465
Primrose  32         2                560
Table 3-2 CIAQ (Centre d'insémination artificielle du Québec) data
CowName   MilkYield  LactationNumber  LastServiceSire
Marigold  7492       3                H07118
Primrose  8325       2                H07442
A database structure is chosen as the example since it is widely used in
different industries for data storage, exchange and programming. While the
tables may appear trivial, illustration of the concepts does not benefit from
large examples; however, the results can easily be applied to multiple databases
containing multiple tables - the case in the dairy industry. It should also be
mentioned that the two tables above were created to provide a simple ontology
for the dairy industry, and were chosen for programming convenience.
3.2 The OWL editor: Protégé with OWL plug-in
Protégé (Stanford University) is a knowledge-based ontology editor written in
Java, and the OWL plug-in for Protégé customizes it as an OWL editor (Noy et
al., 2001; Knublauch et al., 2004). Protégé (with the OWL plug-in) is capable of
• Editing OWL for ontology;
• Visualizing OWL classes, properties and instances;
• Reasoning according to a specific ontology.
3.3 Visual Basic .Net
Visual Basic .Net is part of Visual Studio .Net, which is an integrated
development environment based on the Microsoft .Net Framework. VB.Net also
became an object-oriented programming language, which ended the debate over
whether VB versions 4 to 6 were object-oriented. The .Net platform - the base
of VB.Net - provides a common type system, in which all data types are classes
or structures that have been defined in the .Net Framework Class Library. This
feature allows components written in different languages in Visual Studio .Net,
such as C#, Java, VB.Net, etc., to work together seamlessly. In addition, the
.Net platform provides the Common Language Runtime, which includes a variety of
services, such as memory management and garbage collection (Roman et al.,
2002). Visual Basic .Net is an extremely powerful language and, as part of the
.Net Framework, VB.Net can be used to create a wide range of applications and
components, which include
• Standard Windows applications
• Windows/Web Services
• Windows/Web controls and control libraries
• Web (ASP.Net) applications
• Windows console mode applications
Its ability to build Internet applications - an area in which VB has
traditionally been weak - should also be mentioned.
In this research, VB.Net is used as a programming tool to write the parser,
which works with Microsoft Access 2002, as a standard Windows application.
Details are given in Chapter 4.
3.4 Database: Microsoft Access 2002
A database is a collection of data organized by related tables, columns and rows.
A database itself or its technology is used mostly for computer applications,
especially in the area of data processing. Microsoft Access is one popular
database management system (DBMS) for the PC, meaning that it is desktop-based.
Although Access is limited in the amount of data that it can store and in the
number of concurrent users, it is an excellent choice for illustration
(Petersen, 2002). One
can choose different kinds of database structure, such as Microsoft SQL Server,
but for the purpose of illustration, MS Access is adequate. In our program,
original data and the ontology parsed from the OWL file were stored in Access
for future reference. Details of the database structure are illustrated in Chapter 4.
3.5 Methods frame
Many commercial and professional software packages require a database to
function, and the database is designed according to an ontology in the specific
application area. Direct communication among different databases is technically
feasible, but due to security problems and various other reasons, it is seldom
used in industry applications. Instead, agents are used to retrieve and send
data as a method of communication. In addition, differences in ontologies (and
often in the databases as well) become a hindrance across programs. In this
thesis, OWL was used to describe a specific ontology and to encode data into a
file. After transmission of the file, it was decoded and saved into
another (potentially different) database.
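The final step, decoding a received file and saving it into another database, can be sketched as follows. This is a hedged illustration in Python, with the standard sqlite3 module standing in for Microsoft Access; the message content, the table layout ("facts") and the function name store are assumptions made for the sketch and do not reproduce the parser described in Chapter 4.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical received message holding two instances of a class "Cow".
MESSAGE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.sample.com/simpleCiaq#">
  <Cow rdf:ID="Marigold"><hasLactationNumber>3</hasLactationNumber></Cow>
  <Cow rdf:ID="Primrose"><hasLactationNumber>2</hasLactationNumber></Cow>
</rdf:RDF>"""

def store(message, connection):
    """Decode each instance into (subject, property, value) rows and save them."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS facts (subject TEXT, property TEXT, value TEXT)"
    )
    for instance in ET.fromstring(message):
        subject = instance.get(f"{{{RDF}}}ID")
        for prop in instance:
            name = prop.tag.split("}")[-1]   # drop the namespace part of the tag
            connection.execute(
                "INSERT INTO facts VALUES (?, ?, ?)", (subject, name, prop.text)
            )
    connection.commit()

connection = sqlite3.connect(":memory:")
store(MESSAGE, connection)
rows = connection.execute(
    "SELECT subject, value FROM facts "
    "WHERE property = 'hasLactationNumber' ORDER BY subject"
).fetchall()
print(rows)
```

Storing the decoded data as subject/property/value rows means the receiving database does not need the same table structure as the sending one, which is the situation the paragraph above describes.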
The proposed methodology sequence is shown in Figure 3-1. Given the available
data and a defined ontology (previous section), subsequent steps involve encoding
the data in an OWL message using Protégé 2.1 / VB.NET. This message is then
ready for transfer over the Internet (within a simple computer application),
before being decoded in a reverse process for further application or use.
[Figure 3-1 diagram: Original Message/Data + Ontology -> OWL Encode -> Message -> OWL Decode -> Ontology + Original Message/Data]
Figure 3-1 Encoding/decoding and transferring of a message in OWL
3.5.1 Description of data by RDF
RDF is a language for information representation. It is an extension of XML that
adds vocabulary to describe classes and properties, and the relationships
between them. It is a common framework, and can be exchanged among applications
without loss of meaning.
The following file is an RDF Schema file (size: 256 bytes) that describes the
classes "Cattle" and "Cows" and the property "HasCowID".
<rdfs:Class rdf:ID="Cattle"/>
<rdfs:Class rdf:ID="Cows">
  <rdfs:subClassOf rdf:resource="#Cattle"/>
</rdfs:Class>
<rdf:Property rdf:ID="HasCowID">
  <rdfs:domain rdf:resource="#Cattle"/>
  <rdfs:range rdf:resource="&xsd;string"/>
</rdf:Property>
After introducing the definitions of class and property, the instances of the
class can be stated as in the following (size: 68 bytes), in which an instance
of class "Cows" is defined. It has a CowID of "Primrose".
<Cows rdf:ID="PatlqCow1">
  <HasCowID>Primrose</HasCowID>
</Cows>
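The schema above can be read by a program to find out which class a property belongs to and which classes inherit it. The Python sketch below does this with the standard xml.etree.ElementTree module rather than a dedicated RDF library; it is an illustration under stated assumptions. The rdf:RDF wrapper, the namespace declarations and the fully written-out range URI (in place of the &xsd; entity) are added so that the fragment parses on its own, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

SCHEMA = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="Cattle"/>
  <rdfs:Class rdf:ID="Cows">
    <rdfs:subClassOf rdf:resource="#Cattle"/>
  </rdfs:Class>
  <rdf:Property rdf:ID="HasCowID">
    <rdfs:domain rdf:resource="#Cattle"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Property>
</rdf:RDF>"""

def read_schema(text):
    """Return ({class: parent class or None}, {property: domain class})."""
    root = ET.fromstring(text)
    parents, domains = {}, {}
    for cls in root.findall(f"{{{RDFS}}}Class"):
        sup = cls.find(f"{{{RDFS}}}subClassOf")
        parents[cls.get(f"{{{RDF}}}ID")] = (
            sup.get(f"{{{RDF}}}resource").lstrip("#") if sup is not None else None
        )
    for prop in root.findall(f"{{{RDF}}}Property"):
        dom = prop.find(f"{{{RDFS}}}domain")
        domains[prop.get(f"{{{RDF}}}ID")] = dom.get(f"{{{RDF}}}resource").lstrip("#")
    return parents, domains

def applies_to(prop, cls, parents, domains):
    """A property declared for a class also applies to its subclasses."""
    while cls is not None:
        if domains[prop] == cls:
            return True
        cls = parents.get(cls)
    return False

parents, domains = read_schema(SCHEMA)
print(parents, domains)
print(applies_to("HasCowID", "Cows", parents, domains))
```

This is why the instance of "Cows" may carry "HasCowID": the property is declared for "Cattle", and "Cows" is a subclass of "Cattle".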
3.5.2 Description of ontology and data by OWL
When OWL is used to describe data structure, the coupling among different
layers (data layer, business logic layer and presentation layer) (Schach, 2002)
or modules can be decreased. Usually the first step in the construction of a
program is the design of the data structure, both the database/table structure
and the data structure used in processing. Once the data structure is designed,
programmers can conceive and build applications based on the data structure and
the application logic. Any small change in data structure often leads to a major
change in the program. There is also what can be described as a "snowball
effect", whereby a change may have significant ramifications on the procedure
and logic of a previous sequence of events. If we use OWL to describe data, it
means that part of the data logic or application logic can be written in some
document, table or segment of program. When changes occur in the data structure,
the program itself can adjust according to the change, instead of requiring new
code to be produced by the programmer.
OWL is designed for computers to process the information content. It is based on
XML and RDF, adding some important vocabulary (e.g., owl:sameAs,
owl:equivalentClass, etc.) to describe relationships of classes, properties,
etc. In this way, OWL introduces a logic to the XML format.
In a relational database, the relationships among tables can be expressed by
connections between fields of different tables. Such a connection can also be
represented in OWL files by referencing the resource ID of the class, property
or instance. So data that are stored in a database can be extracted, encoded in
an OWL file, and exchanged without loss of integrity within a computer or across
the Internet.
The following is a segment of an OWL file (size: 547 bytes). In this example,
the classes "Cattle" and "Cows" are defined, with the latter being defined as a
subclass of the former. The property "HasCattleID" is defined as a functional
property of the class "Cattle".
<owl:Class rdf:ID="Cattle"/>
<owl:Class rdf:ID="Cows">
  <rdfs:subClassOf rdf:resource="#Cattle"/>
  <owl:equivalentClass
    rdf:resource="http://www.sample.com/simpleCiaq#Cattle"/>
  <owl:differentFrom
    rdf:resource="http://www.sample.com/simplePatlq#Bulls"/>
</owl:Class>
<owl:ObjectProperty rdf:ID="HasCattleID">
  <rdfs:domain rdf:resource="#Cattle"/>
  <rdf:type
    rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  <owl:equivalentProperty
    rdf:resource="http://www.sample.com/simpleCiaq#HasCattleName"/>
</owl:ObjectProperty>
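The equivalence statements in such a segment are what allow a program to translate terms from one ontology into another. The Python sketch below collects them with the standard xml.etree.ElementTree module and uses them to rename the fields of a record. It is a minimal illustration: the rdf:RDF wrapper and namespace declarations are added so the fragment parses on its own, and the function names and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

ONTOLOGY = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Cattle"/>
  <owl:Class rdf:ID="Cows">
    <rdfs:subClassOf rdf:resource="#Cattle"/>
    <owl:equivalentClass rdf:resource="http://www.sample.com/simpleCiaq#Cattle"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="HasCattleID">
    <rdfs:domain rdf:resource="#Cattle"/>
    <owl:equivalentProperty
      rdf:resource="http://www.sample.com/simpleCiaq#HasCattleName"/>
  </owl:ObjectProperty>
</rdf:RDF>"""

def equivalences(text):
    """Map each local term to the URI of the term it is declared equivalent to."""
    mapping = {}
    for term in ET.fromstring(text):
        for tag in ("equivalentClass", "equivalentProperty"):
            link = term.find(f"{{{OWL}}}{tag}")
            if link is not None:
                mapping[term.get(f"{{{RDF}}}ID")] = link.get(f"{{{RDF}}}resource")
    return mapping

def translate(record, mapping):
    """Rename the keys of a record to the local names of the equivalent terms."""
    return {mapping.get(key, key).split("#")[-1]: value for key, value in record.items()}

mapping = equivalences(ONTOLOGY)
print(mapping)
print(translate({"HasCattleID": "Primrose", "MilkYield": 32}, mapping))
```

Fields without a declared equivalent, such as "MilkYield" in the sample record, are passed through unchanged.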
After introducing the definitions of class and property, the instances of the
class can be stated as follows (size: 142 bytes), in which an instance of class
"Cows" is defined. It has a CowID of "Primrose", and shows how the same cow is
cross-referenced as "CiaqCow1" in another OWL file.
<Cows rdf:ID="PatlqCow1">
  <HasCowID>Primrose</HasCowID>
  <owl:sameAs
    rdf:resource="http://www.sample.com/simpleCiaq#CiaqCow1"/>
</Cows>
3.5.3 How to encode data in OWL message
Since an OWL file is also an XML file, the structure for encoding data in short
message-transferring formats is similar to the equivalent procedure for the XML
format. For example, the data previously shown in Table 3-2 could be expressed
in the following OWL format, which could, in turn, be part of an OWL
file/message:
<rdf:RDF>
  <Cow rdf:ID="Marigold">
    <hasLastServiceSire>
      <Bull rdf:ID="H07118"/>
    </hasLastServiceSire>
    <hasMilkYield
        rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >15</hasMilkYield>
    <hasLactationNumber
        rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >3</hasLactationNumber>
  </Cow>
  <Cow rdf:ID="Primrose">
    <hasMilkYield
        rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >32</hasMilkYield>
    <hasLactationNumber
        rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger"
        >2</hasLactationNumber>
    <hasLastServiceSire>
      <Bull rdf:ID="H07442"/>
    </hasLastServiceSire>
  </Cow>
</rdf:RDF>
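To show how the receiving end might decode such a message, the following is a minimal Python sketch. It is an illustration only: the namespace declarations and the default namespace URI (FARM) are assumptions added so that the message parses as standalone XML, and the function names local and decode are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
FARM = "http://www.sample.com/farm#"  # hypothetical default namespace

MESSAGE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{FARM}">
  <Cow rdf:ID="Marigold">
    <hasLastServiceSire><Bull rdf:ID="H07118"/></hasLastServiceSire>
    <hasMilkYield rdf:datatype="{XSD}positiveInteger">15</hasMilkYield>
    <hasLactationNumber rdf:datatype="{XSD}positiveInteger">3</hasLactationNumber>
  </Cow>
</rdf:RDF>"""


def local(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def decode(message):
    """Turn each individual in the message into a dictionary of property values."""
    records = {}
    for individual in ET.fromstring(message):
        record = {"class": local(individual.tag)}
        for prop in individual:
            datatype = prop.get(f"{{{RDF}}}datatype")
            if datatype == XSD + "positiveInteger":
                # Datatype property: the attribute says how to parse the text.
                value = int(prop.text.strip())
            elif len(prop):
                # Object property: the value is the ID of the nested individual.
                value = prop[0].get(f"{{{RDF}}}ID")
            else:
                value = (prop.text or "").strip()
            record[local(prop.tag)] = value
        records[individual.get(f"{{{RDF}}}ID")] = record
    return records


records = decode(MESSAGE)
print(records["Marigold"])
```

Only xsd:positiveInteger is converted here; a fuller decoder would map each XML Schema datatype it expects to a native type.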
4 Results and Discussion
4.1 OWL files and their parser
4.1.1 OWL file creation by Protégé
While, in theory, any text editor can create an OWL file, various dedicated
editors are available, for example SWOOP (Semantic Web Ontology Overview and
Perusal, University of Maryland; Kalyanpur et al., 2005) or Protégé.
The latter was chosen for this research because of its convenient editing
environment. As noted in the definition of OWL, it is based on XML and RDF,
which means that an OWL file can also be considered an XML file and an
RDF file. Thus, while the structure of an OWL file is similar to that of an XML
file, there are some differences:
• More vocabulary is added to describe the ontology in OWL, including
"owl:Class", "owl:equivalentProperty", "rdfs:subClassOf", etc.
• In OWL, all the nodes are embedded in the rdf:RDF node, whose attributes
also include the namespace declarations.
• OWL declares some information about the ontology itself, such as imports,
prior version, label, comment, etc.
Protégé offers the option of saving an OWL file in a format called RDF/XML,
which relies heavily on Restriction elements to describe the ontology. In this
format, the structure is not clear for editing work and is relatively difficult for
programs to parse. In this research, therefore, RDF/XML-ABBREV was used
as the output format (see the Appendix for greater detail).
The following three screen captures show the structures of classes, properties and
sample relationships in Protégé.
[Screen captures of the Protégé class and property trees. Classes visible:
ValuePartition, Sex, female, male, Animal, Cattle, Bull, Cow, Heifer, Goat, Sheep,
Machine, Human, Farmer, Land, Farm. Properties visible: hasArea, hasATQID,
hasBarnName, hasBirthday, hasBodyWeight, hasChild, hasGender, hasMilkYield,
hasOwner, hasParent, hasBirthFather, hasBirthMother, hasProperty, hasSex,
hasTelNo, isPartOf.]
Figure 4-1 Screen captures of structures of classes and properties in Protégé
Both Protégé 2.1 and SWOOP 2.1 have problems in defining namespaces and
importing ontologies, both of which are important in ontology merging. It is
proposed in this research, therefore, that a more correct way to define them would
be the following:
Step 1: define a namespace in the attributes of the rdf:RDF node, such as
xmlns:vbtest="http://www.sample.com/vbtest.OWL#"
Step 2: define the "imports" node
<owl:Ontology rdf:about="">
  <owl:imports
      rdf:resource="http://www.sample.com/vbtest.OWL"/>
</owl:Ontology>
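A program that merges ontologies needs to read both declarations back. The following Python sketch illustrates one way to do so with the standard library; the function names are hypothetical, and the check that every imported ontology has a matching namespace (the import URI followed by "#") is an assumption based on the two steps above, not a rule taken from the OWL specification.

```python
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}"
    xmlns:vbtest="http://www.sample.com/vbtest.OWL#">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://www.sample.com/vbtest.OWL"/>
  </owl:Ontology>
</rdf:RDF>"""


def namespaces_and_imports(text):
    """Collect the prefix -> URI declarations and the owl:imports targets."""
    namespaces, imports = {}, []
    source = io.BytesIO(text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = item  # one tuple per xmlns declaration
            namespaces[prefix] = uri
        elif item.tag == f"{{{OWL}}}imports":
            imports.append(item.get(f"{{{RDF}}}resource"))
    return namespaces, imports


def undeclared_imports(namespaces, imports):
    """Imports without a matching '<uri>#' namespace (assumed convention)."""
    declared = set(namespaces.values())
    return [uri for uri in imports if uri + "#" not in declared]


namespaces, imports = namespaces_and_imports(DOC)
print(namespaces["vbtest"], imports, undeclared_imports(namespaces, imports))
```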
4.1.2 OWL file simple parser
An OWL parser is a program that analyzes or separates an OWL file into more
easily processed components. Before the data can be understood, both ends of the
communication must know how the data are encoded, which is part of the ontology
carried in an OWL file. Only after the ontology has been obtained from the OWL
file can the data encoded in OWL be processed. OWL's capacity for self-description
implies a certain level of artificial intelligence, and this could be strengthened by
reasoning tools, such as Racer RICE (RACER Interactive Client Environment;
Haarslev et al., 2001). An OWL parser was programmed in VB.Net and embedded
in the sample program to read the ontology from an OWL file into a database.
This is done after the OWL file has been edited in Protégé or SWOOP.
OWL editors such as Protégé and SWOOP include their own OWL parsers, but
these are written in Java and cannot, therefore, be used directly in VB.Net.
The steps of the program are listed below:
Step 1: Read the whole OWL file once to obtain the names of all classes and
properties, since the relation descriptions refer to existing classes and
properties;
Step 2: Read the whole OWL file a second time to get the details of the
class/property descriptions and fill them into the database (some
descriptions may be stored as anonymous class descriptions in the database);
Step 3: Account for equivalent classes, which are IntersectionOf, UnionOf, etc.;
Step 4: Account for each property:
- add it to its domain class (PropertyList column in the Class table);
- extend it to the subclass(es) of its domain class;
- handle the SubPropertyOf value;
Step 5: Update the Instance table by adding property fields;
Step 6: Fill the Instance table by re-reading the complete OWL file or by
using another data file/message encoded in OWL;
Step 7: Deal with child class(es).
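Steps 1, 2 and 4 above can be sketched as follows. This is a simplified Python illustration, not the VB.Net program itself: it keeps the results in dictionaries rather than in the database, handles only named classes, subClassOf and domain, and the function name parse_ontology is hypothetical. The sample ontology deliberately defines "Cow" before "Cattle" to show why the names are collected in a first pass.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

ONTOLOGY = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Cow">
    <rdfs:subClassOf rdf:resource="#Cattle"/>
  </owl:Class>
  <owl:Class rdf:ID="Cattle"/>
  <owl:DatatypeProperty rdf:ID="hasMilkYield">
    <rdfs:domain rdf:resource="#Cattle"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""


def parse_ontology(text):
    root = ET.fromstring(text)
    class_tag = f"{{{OWL}}}Class"
    property_tags = {f"{{{OWL}}}ObjectProperty", f"{{{OWL}}}DatatypeProperty"}
    rdf_id, resource = f"{{{RDF}}}ID", f"{{{RDF}}}resource"

    # Step 1: first pass, collect the names of all classes and properties.
    classes = {e.get(rdf_id): {"SubClassOf": [], "PropertyList": []}
               for e in root if e.tag == class_tag}
    properties = {e.get(rdf_id): {"ProDomain": []}
                  for e in root if e.tag in property_tags}

    # Step 2: second pass, fill in details; references are kept only if
    # they point to a class found in the first pass.
    for e in root:
        name = e.get(rdf_id)
        if e.tag == class_tag:
            for sup in e.findall(f"{{{RDFS}}}subClassOf"):
                parent = sup.get(resource, "").lstrip("#")
                if parent in classes:
                    classes[name]["SubClassOf"].append(parent)
        elif e.tag in property_tags:
            for dom in e.findall(f"{{{RDFS}}}domain"):
                domain = dom.get(resource, "").lstrip("#")
                if domain in classes:
                    properties[name]["ProDomain"].append(domain)

    def descends_from(cls, ancestor, seen=()):
        """True if cls is the ancestor itself or one of its subclasses."""
        if cls == ancestor:
            return True
        return any(descends_from(p, ancestor, seen + (cls,))
                   for p in classes[cls]["SubClassOf"] if p not in seen)

    # Step 4: add each property to its domain class and to its subclasses.
    for prop, info in properties.items():
        for domain in info["ProDomain"]:
            for cls in classes:
                if descends_from(cls, domain):
                    classes[cls]["PropertyList"].append(prop)
    return classes, properties


classes, properties = parse_ontology(ONTOLOGY)
print(classes["Cow"])
```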
Once the OWL parser has been programmed, one can import the OWL file,
including ontology and data. The program shows the parsing result by listing
the classes, properties, and instances in a text box, and by listing the ontology
records in a data grid.
The following three figures are screen captures of the parser. They show that,
after the OWL file has been loaded and parsed, the data can be stored in
the database and shown in the window.
[Screen capture of the parser window, with the parsed ontology records listed in
a data grid.]
Figure 4-2 A snapshot of the OWL parser: Parse OWL File
[Screen capture of the Property View. Properties visible: hasBirthMother,
hasChild, hasLastServiceSire, hasProperty, hasLactationNumber, hasMilkYield,
hasBodyWeight, hasATQID, hasSex, hasBirthFather. Classes visible: Animal,
Human.]
Figure 4-3 A snapshot of the OWL parser: Property View
[Screen capture of the Class View. Classes visible: Land, Human, Animal, Sheep,
Chicken, Goat, Cattle, Bull, Heifer, Machine, ValuePartition. Properties visible:
hasBirthMother, hasChild, hasLastServiceSire, hasLactationNumber, hasBirthday,
hasMilkYield, hasBodyWeight, hasATQID, hasBirthFather. Instances visible:
Marigold, CowTest1, Primrose.]
Figure 4-4 A snapshot of the OWL parser: Class View
4.1.3 OWL database
To store the ontology parsed from the OWL files, a database consisting of
several tables was created in Microsoft Access:
• Class: stores the classes and their relations to other classes and
properties, as described in the OWL file;
• Property: stores the properties and their relations to classes and other
properties, as described in the OWL file;
• Instance: stores the instances described in the same or
different OWL files;
• Namespace: stores the namespaces used in the description
tags;
• Ontology: stores information regarding the OWL file (e.g., version,
comment, imports, label, etc.).
The details of those tables are listed below.
Table 4-1 The table of Class
Field Name            Data Type   Description and Comments
ClassName             Text, 50    • Unique class name
                                  • This field is the primary key of the table
                                  • It could be an anonymous class name, such as
                                    AnonymousClass001, etc.
                                  • The rule for naming the anonymous class is in
                                    the program.
EquivalentClass       Text, 255   • If more than one, they should be separated
                                    by ","
                                  • Could be an anonymous class (unnamed)
                                  • Could be a union, intersection, etc. of classes.
Label                 Text, 50    • The label of the class in different languages
                                    and its value in each language, separated
                                    by ","
PropertyList          Memo        • All the properties that this class has,
                                    including those inherited from its super class
                                  • If more than one, their names should be
                                    separated by ","
SubClassOf            Memo        • List of the names of its super classes
                                  • If more than one, the names should be
                                    separated by ","
InterSectionOf        Text, 255   • Class name list, separated by ","
                                  • Usually associated with an anonymous class
OneOf                 Memo        • List of all the members of that class
                                    (enumerated class)
DisjointWith          Memo        • Class name list, separated by ","
UnionOf               Memo        • Class name list, separated by ","
ComplementOf          Text, 255   • Class name
AnonymousOnProperty   Text, 50    • Restriction on which property
AnonymousOnField      Text, 50    • Could be "SomeValueFrom, AllValueFrom,
                                    MinCardinality, MaxCardinality, Cardinality,
                                    HasValue"
AnonymousOnContent    Text, 50    • The value of the restriction on the property
                                  • It could be a class (SomeValueFrom,
                                    AllValueFrom), or a data type value (2, -2,
                                    4.6, etc.)
SuperClassOf          Memo        • The class list of its subclasses
                                  • If they are recorded here, the processing of
                                    properties and instances will be efficient.
ChildClass            Text, 255   • The direct child of this class
Table 4-2 The table of Property
Field Name                  Data Type   Description and Comments
PropertyName                Text, 50    • Unique property name, so this field is
                                          the primary key of the table.
ProDomain                   Text, 255   • Class name list, separated by ","
ProRange                    Text, 255   • Class name list, separated by ","
InverseOf                   Text, 255   • Property name
SymmetricProperty           Yes/No      • Whether this property is a symmetric
                                          property or not
FunctionalProperty          Yes/No      • Whether this property is a functional
                                          property or not
TransitiveProperty          Yes/No      • Whether this property is a transitive
                                          property or not
InverseFunctionalProperty   Text, 255   • Inverse functional property of which
                                          property; could be a property list,
                                          separated by ","
EquivalentProperty          Text, 255   • Property name list, separated by ","
SubPropertyOf               Text, 50    • Property name
Table 4-3 The table of Instance before modification
Field Name      Data Type   Description and Comments
InstanceID      Text, 50    • The identification of the instance
FromClass       Text, 50    • Class name
SameAs          Text, 255   • Instance ID; if more than one, separated by ","
DifferentFrom   Text, 255   • Instance ID; if more than one, separated by ","
AllDifferent    Text, 255   • Instance ID; if more than one, separated by ","
NOTE: the fields listed above are the basic fields in the table of Instance. When
the table of Property is filled, the structure of Instance changes, i.e., several
fields are added to the table of Instance. In the example case from this
research, it changed to the following:
Table 4-4 The table of Instance after modification
Field Name           Data Type   Description and Comments
InstanceID           Text, 50    • The identification of the instance
FromClass            Text, 50    • Class name
SameAs               Text, 255   • Instance ID; if more than one, separated by ","
DifferentFrom        Text, 255   • Instance ID; if more than one, separated by ","
AllDifferent         Text, 255   • Instance ID; if more than one, separated by ","
hasBirthMother       Text, 255
hasChild             Text, 255
hasLastServiceSire   Text, 255
hasProperty          Text, 255
hasLactationNumber   Number
hasBirthday          Date/Time
hasMilkYield         Number
hasBodyWeight        Number
hasATQID             Text, 255
hasSex               Text, 255
hasBirthFather       Text, 255
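The change from Table 4-3 to Table 4-4 (Step 5 of the parser) can be illustrated with the following Python sketch. It uses an in-memory SQLite database as a stand-in for the Microsoft Access database used in this research, so the column types (TEXT, INTEGER, REAL) are SQLite substitutes for the Access types, and the function names are hypothetical.

```python
import sqlite3

BASE_COLUMNS = ["InstanceID", "FromClass", "SameAs", "DifferentFrom", "AllDifferent"]
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}  # SQLite stand-ins for Access types


def create_instance_table(conn):
    """Create the basic Instance table (the fields of Table 4-3)."""
    conn.execute(
        "CREATE TABLE Instance (InstanceID TEXT, FromClass TEXT, "
        "SameAs TEXT, DifferentFrom TEXT, AllDifferent TEXT)"
    )


def add_property_columns(conn, properties):
    """Add one field per property; `properties` maps name -> column type."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(Instance)")}
    for name, sql_type in properties.items():
        # Names and types are interpolated into SQL, so validate them first.
        if not name.isidentifier() or sql_type not in ALLOWED_TYPES:
            raise ValueError(f"unsafe column definition: {name!r} {sql_type!r}")
        if name not in existing:
            conn.execute(f'ALTER TABLE Instance ADD COLUMN "{name}" {sql_type}')


conn = sqlite3.connect(":memory:")
create_instance_table(conn)
add_property_columns(conn, {
    "hasLastServiceSire": "TEXT",
    "hasLactationNumber": "INTEGER",
    "hasMilkYield": "INTEGER",
})
conn.execute(
    "INSERT INTO Instance (InstanceID, FromClass, hasLastServiceSire, "
    "hasLactationNumber, hasMilkYield) VALUES (?, ?, ?, ?, ?)",
    ("Primrose", "Cow", "H07442", 2, 32),
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Instance)")]
row = conn.execute(
    "SELECT hasMilkYield FROM Instance WHERE InstanceID = ?", ("Primrose",)
).fetchone()
print(columns, row)
```

Calling add_property_columns a second time with the same properties leaves the table unchanged, so the step can be repeated safely when further OWL files are loaded.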
Table 4-5 The table of Namespace
Field Name   Data Type   Description and Comments
ShortName    Text, 50    • The short name used to represent the long name
                           in the OWL file
LongName     Text, 255   • The original URI of the resource
Table 4-6 The table of Ontology
Field Name   Data Type   Description and Comments
FName        Text, 50    • The field about imports, comments, labels,
                           previous versions, etc.
FContent     Memo        • The field content
4.2 Comparison of different encoding data formats
There are various ways of storing data, depending on needs related to size,
retrieval and function. A flat file is sequential: if a program needs
the last "item" listed in the file, it must read through the entire file before
retrieving it. CSV files are a class of flat file in which some ASCII character (e.g.,
",") is used to delimit the data (see Chapter 2). They can be used for data or file
exchange, or in messages among applications within a computer or network. The
format is simple and easily readable from a human point of view, although it may
lack some optimization for specific computer applications and needs (i.e., it is not
necessarily the most efficient format for data storage and exchange).
A database is a well-developed system for data storage and retrieval, which is
efficient, safe and consistent in saving and querying data. In a database, a
"pointer" indicates the address of a record and, unlike in the flat-file format, is
used to acquire the content directly. Compared with other types
of data storage formats/frames, a database is more efficient at both reading and
writing data. It is a good choice for data storage and exchange within a unit.
XML provides new ways to store and exchange data. It has been widely used in
computer applications. Some of its key advantages are discussed in the following
section.
4.2.1 XML's advantages over flat files for data storage and exchange
• XML allows for structure. With XML, the embedding of one element in
another declares the structure of the data, just as fields and tables do in a
database. Because elements can be nested more deeply than in a database,
the structure of a table can easily be expressed in XML. In an
XML Schema, the element sequence can be defined as ordered or not. Also,
the absence or addition of a field matters less than in a text file:
because of the use of tags, the interpretation is not easily subject to
mismatching of data or fields.
• XML tags are not predefined. One may define one's own tags based on
a specific application in the XML Schema, and they usually indicate the
content of the node. For exchange of data among different applications, a
common understanding should be reached regarding the tags, including
the name, data type, etc. In this way, it is easier for humans and computers
to parse the XML, despite the fact that an XML file is much bigger than a
CSV file.
• XML can easily encode different languages. With the special attribute
xml:lang, one can specify the language used in the contents and attribute
values of any element in an XML document.
• XML has the option of respecting element sequence. In a DTD or XML
Schema, one can decide whether an XML file is sensitive to
element sequence or not.
• XML parsers are freely available. With a flat file, a specific program is
needed to read the file before further data processing and transfer. This
program must know the record layout. Any change in the record layout
may lead to a change in the program. On the other hand, XML parsers are
freely available, and since the structure of the document is embedded in
the document itself, the parser is still functional even if the structure of the
XML document changes.
• An XML file has non-linear access to text/data. After being read into
memory, an XML file is expressed in a data structure provided by the
XML parser. Retrieval of the specific content of a tag in an XML file is
more efficient than in a flat file. Also, researchers are working on XML
query languages for XML documents, corresponding to SQL for databases
(Afonso de Sousa et al., 2002), which are expected to increase
retrieval performance even more.
• XML format is easy to change to other formats. An XML document
can be changed to other formats by using XSL Transformations (XSLT),
such as transformation to HTML (for a web browser). Also, the same XML
document may look different simply by applying different style sheets
(i.e., instructions on how to display the document on a web page).
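The point about record layout can be illustrated with a short Python sketch. The field names, the sample records and the two reader functions are hypothetical; they only show that a reader based on field position fails when a field is inserted, whereas a reader based on tags is unaffected.

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_flat(text):
    """Flat-file reader that assumes the layout: name, milk yield, lactation."""
    return [{"name": r[0], "milk_yield": int(r[1]), "lactation": int(r[2])}
            for r in csv.reader(io.StringIO(text))]


def read_xml(text):
    """XML reader that looks fields up by tag name, not by position."""
    return [{"name": cow.findtext("name"),
             "milk_yield": int(cow.findtext("milkYield")),
             "lactation": int(cow.findtext("lactationNumber"))}
            for cow in ET.fromstring(text).findall("cow")]


OLD_CSV = "Primrose,32,2\n"
NEW_CSV = "Primrose,H07442,32,2\n"  # a sire field was inserted
NEW_XML = """<herd><cow><name>Primrose</name><sire>H07442</sire>
<milkYield>32</milkYield><lactationNumber>2</lactationNumber></cow></herd>"""

try:
    read_flat(NEW_CSV)
    flat_ok = True
except ValueError:
    # The reader tried to parse the sire ID "H07442" as the milk yield.
    flat_ok = False

print(read_flat(OLD_CSV), flat_ok, read_xml(NEW_XML))
```

In this example the mismatch is at least detected because the sire ID is not a number; had the inserted field been numeric, the flat-file reader would have silently returned wrong values.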
Although XML allows data to be queried in a manner somewhat similar to
that of a database, querying it is more time consuming. Also, the size of XML files
or messages is more strictly limited than that of databases. For example, the size of
an XML file or message could be 64 MB, but not 64 GB (as could easily be the case
with a database).
As was seen in Chapter 2, an OWL file is also an XML file. By adding more
vocabulary, OWL can describe simple logic, such as "subClassOf", "sameAs",
etc. This helps greatly for information search on the web. However, most of the
logic still has to be programmed in the application. Some differences between XML
and OWL are highlighted in the following section.
4.2.2 Differences between the XML and OWL format
• tags
In XML, there are, generally speaking, no restrictions on the naming of
tags. However, the names of the columns in the database are often used as the
corresponding tag names. This depends on how the programmer applies
the ontology and logic in the program.
In OWL, the tags are named according to a mapping from the column
names in the database, but may include the prefix "has" or "is" for the
property name. Usually, programs are based on ontology and logic. In
OOP (Object-Oriented Programming), the first step is to design the class
models according to the ontology and logic. This is followed by database
design, if needed, and finally by programming. The database in the
second step is different from the database used in this thesis' example, which
stores the information contained in OWL. In this database, classes and
properties usually have the same names as in the OWL file, but slightly
modified by the prefix "has" or "is" for the property name.
• data type
In XML, the data type of a property (corresponding to a column name in a
database) might be indicated in a separate XML Schema or DTD file.
The relation between property and class is expressed only by way of
record embedding.
In OWL, the data type of a property is shown in the tag's attribute, and the
relation between the property and the class is defined in the property's
declaration.
• value
In XML, the value of a property is always in string format, and how
the program parses the value depends on the definition in the XML Schema.
In OWL, the value of a DatatypeProperty is expressed in string format, but
the tag's attribute clearly declares how to parse it. For an
ObjectProperty, the value is a reference to another object, which is
represented there by a child node.
• reasoning
In XML, there is little room for reasoning. In OWL, however, the
declaration of a property with "domain" and "range" makes the property fit
certain subjects and objects only, so instance records can be checked against the
declaration. Reasoning is also expressed explicitly in the class
declaration. For example, "cow" is defined as "female cattle" and as a subclass
of "cattle" (sufficient and necessary condition). In this case, if an instance
of "cattle" is defined and its sex is designated as female, the program
would "reason" that this was also an instance of "cow". In another
example, if "hasBirthday" is a property of the domain "cattle", and "cow"