Ontologies in Document Exchange

IN350 Summary and Overview

Judith Molka-Danielsen

Nov. 28, 2003

Major Topics

What is Content Management? What is Document Management and Information Steering? (2003)

What is the role of Markup Languages in Content Management? Document Properties and Markup Languages (Text + Multimedia Languages + Properties, Ch. 6, Baeza-Yates)

File Organization and Storage Structures (Connolly)

Text Properties, Zipf's Law, Heaps' Law (Text Operations, Ch. 7, Baeza-Yates)

Text Operations, Ch. 7 (doc)

Search Enhancements (older notes) (htm) and Compression of text (Compressing, Ch. 7, Cyganski)

Oracle Text Operations ... Creating an Index and Types of Indices

Major Topics

Retrieval Evaluation Measures, with Precision and Recall (Read: Retrieval Evaluation, Ch. 3, Baeza-Yates)

The role of taxonomies in content management

Searching the Web (Read: Searching on the Web, Ch. 13, Baeza-Yates)

Multimedia Management (Read: Image Compression, Ch. 8, Cyganski, and Digital Video, Ch. 9, Cyganski)

Data Warehousing (Read: Data Warehousing, Ch. 13)

Large Capacity Storage (Ch. 17, Cyganski)

Document Publishing and Distribution, and older notes on Online Publishing

B2B e-commerce standards for document exchange

Ontologies in Document Exchange

COEUR-SW program

Ontologies

From the IDI Lecture series

Reference: Jon Atle Gulla

Nicola Guarino: Formal Ontology and Information Systems.

Robert Jasper and Mike Uschold: A Framework for Understanding and Classifying Ontology Applications.

Ontology ABC


Ontology has recently attracted attention across many fields of computer science.

There is no consensus definition of ontology.

One of the most cited is “Ontology is an explicit representation of a conceptualization, the conceptualization includes a set of concepts, their definition and inter-relationships”.

In many cases, the term ontology is simply another name for the result of familiar activities such as conceptual analysis and domain modeling.

The roles of ontology range from knowledge management to semantic interoperability.

One important reason ontology has attracted so much attention recently is the Semantic Web, since ontology is considered its key enabler.



More terminology


Ontology: engineering artifact


Constituted by a vocabulary (concepts, relations)


Assumptions about intended meaning


Formalization:


Logical theory accounting for the intended meaning of a formal vocabulary

Committed to a particular conceptualization of the world

Ontology vs. conceptualization

Conceptualization is language-independent

Ontology is language-dependent

Example 1. Ontology of American Universities

SHOE ontology of university concepts

<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<!DOCTYPE ontology SYSTEM "http://…/onto.dtd">
<ontology id="university-ont" version="2.1" description="…">
  <def-category name="Department" isa="EducationalOrganization" short="university department" />
  <def-category name="Activity" isa="SHOEEntity" short="activity" />
  <def-category name="Work" isa="Activity" short="work" />
  <def-category name="Course" isa="Work" short="teaching course" />
  ….
</ontology>
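
The isa attributes in this excerpt form a chain from Course up to SHOEEntity. A minimal Python sketch of walking that chain, assuming the four categories shown are copied into a plain dictionary (the dictionary `ISA` and the function `ancestors` are illustrative names, not part of SHOE):

```python
# Parent of each category, copied from the isa attributes in the excerpt.
# EducationalOrganization and SHOEEntity have no parent in the excerpt,
# so the walk stops there.
ISA = {
    "Department": "EducationalOrganization",
    "Activity": "SHOEEntity",
    "Work": "Activity",
    "Course": "Work",
}


def ancestors(category):
    """Return the categories reached by following isa links upward."""
    chain = []
    while category in ISA:
        category = ISA[category]
        chain.append(category)
    return chain


print(ancestors("Course"))
```

Because the hierarchy is explicit, a program can conclude that every Course is also a Work and an Activity without that being stated for each course.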

Example 2. Business Process Ontology


MIT process handbook

Sell financial service

Sell savings & investment service

Sell loan

Sell management service

Sell account access services

Sell ATM access

Sell telephone access

Sell reserve credit

Sell credit card

Sell installment loan

Sell letter of credit

Sell mortgage

Sell credit line

Sell certificate of deposit

Sell account

Sell mutual funds

Sell retirement plan

Example 3. Hierarchical Categories?

Can hierarchical categories be ontologies?

Conceptualization of medical domain?

More Confusion


Differences and similarities: Ontology, Thesaurus, Dictionary, Categories?

The Semantic Web


Goal: Evolve the Web



From sites designed for human consumption


To sites also understandable and usable by computer programs.


What would that do for us?


Query answering rather than document retrieval


Services findable, usable, and composable by automated agents


Information exchange among independently designed programs


“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”


How do we get there from here?


For services



Service description


Ontologies to provide intended meaning of service items

For documents

Structure, à la XML

Ontologies to provide intended meaning of terms

XML Describes Document Structure


HTML


Language for describing how to display document content

E.g., tag a word to be displayed in bold or italic


XML


Language for describing the structure of document content

E.g., declare data to be a retail price, a sales tax, a book title, ...


Uniform method for describing and exchanging data using HTTP


Provides a “syntactic schema”


XML allows authors to create their own markup (e.g. <AUTHOR>), which seems to carry some semantics. However, from a computational perspective a tag like <AUTHOR> carries as much semantics as a tag like <H1>. A computer simply does not know what an author is, or how the concept author is related to, e.g., the concept person.

Bibliographic Entry in XML

<Publication URL="ftp://db.stanford … xml.ps">
  <Title> From Semistructured Data ... Language </Title>
  <Author> R. Goldman </Author>
  <Published> Proceedings of ... Databases </Published>
  <Location>
    <City> Philadelphia </City>
    <State> Pennsylvania </State>
  </Location>
  <Date>
    <Month> June </Month>
    <Year> 1999 </Year>
  </Date>
</Publication>

Location of what?

When in June?
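
A generic XML parser recovers the nesting of such an entry but nothing about what the tags mean. A minimal Python sketch, assuming a trimmed copy of the entry (the elided title, venue and URL are left out) and using the standard-library `xml.etree.ElementTree`; the helper `leaf_values` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliographic entry; elided fields are omitted.
ENTRY = """
<Publication>
  <Author> R. Goldman </Author>
  <Location>
    <City> Philadelphia </City>
    <State> Pennsylvania </State>
  </Location>
  <Date>
    <Month> June </Month>
    <Year> 1999 </Year>
  </Date>
</Publication>
"""


def leaf_values(element, prefix=""):
    """Return (path, text) pairs for every leaf element under `element`."""
    path = prefix + "/" + element.tag
    children = list(element)
    if not children:
        return [(path, element.text.strip())]
    pairs = []
    for child in children:
        pairs.extend(leaf_values(child, path))
    return pairs


leaves = leaf_values(ET.fromstring(ENTRY))
for path, text in leaves:
    print(path, "=", text)
```

The program sees `/Publication/Location/City` only as a path of names. Whether it is the location of the conference, the publisher or the author is not stated anywhere in the markup.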

XML Is Not Enough


Language for describing the structure of document content

E.g., declare data to be a retail price, a sales tax, a book title, ...


Uniform method for describing and exchanging data using HTTP



Provides a “syntactic schema”


Provides no means of specifying intended meaning of tags


Ontologies enable independently developed programs to exchange data


Ontologies specify intended meaning in a computer interpretable form

W3C Semantic Web Activity


Semantic Web Activity (http://www.w3.org/2001/sw/)


“Established to serve a leadership role, in both the design of enabling specifications and the open, collaborative development of technologies that support the automation, integration and reuse of data across various applications.”


Successor to the W3C Metadata Activity


RDF Core Working Group (http://www.w3.org/2001/sw/RDFCore/)


Responsible for the Resource Description Framework (RDF)


Web Ontology Working Group (http://www.w3.org/2001/sw/WebOnt/)


Charter: Build upon the RDF Core work a language for defining structured web based ontologies which will provide richer integration and interoperability of data among descriptive communities

Developing the Web Ontology Language (OWL)

Based on DAML+OIL, developed in DARPA’s Agent Markup Language program

Resource Description Framework


A simple representation language for describing Web resources


All sentences are triples of the form “(Property Subject Object)”


Property is a binary relation


Subject is a URI reference


Object is either a URI reference or a literal

E.g., (creatorOf http://www.w3.org/Lassila “Ora Lassila”)


XML external syntax


Model theoretic semantics


Includes a resource “Class” and properties “type”, “subClassOf”, etc.


Supports classes of resources and literals

E.g., (type Clyde Elephant)


Supports subclass hierarchies

E.g., (subClassOf Elephant Mammal)


Like a primitive frame representation language
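
A minimal Python sketch of the triple model, assuming statements are stored as (property, subject, object) tuples with Clyde read as the instance and Elephant as the class. The set `TRIPLES` and the functions `objects` and `is_instance` are illustrative names, not part of RDF:

```python
# Each statement is a (property, subject, object) tuple.
TRIPLES = {
    ("type", "Clyde", "Elephant"),
    ("subClassOf", "Elephant", "Mammal"),
}


def objects(prop, subject):
    """All objects o such that (prop, subject, o) is a stated triple."""
    return {o for (p, s, o) in TRIPLES if p == prop and s == subject}


def is_instance(resource, cls):
    """True if resource has type cls directly or via subClassOf links."""
    pending = list(objects("type", resource))
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(objects("subClassOf", current))
    return False


print(is_instance("Clyde", "Mammal"))
```

Note that a `False` result only means the membership cannot be derived from the stated triples; it is not a proof that the resource lies outside the class.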

RDF


Classes


Resource


Property


Literal


Statement


Container



Bag


Seq


Alt


Properties


type


subject



predicate



object


RDF Schema


Properties


subClassOf


subPropertyOf


seeAlso


isDefinedBy


comment


label


range


domain


member


Classes


Class


ContainerMembershipProperty

[Figure: class hierarchy relating Resource, Class, Property, ContainerMembershipProperty, Literal, Container (Bag, Seq, Alt) and Statement]

RDF-S Class and Property Definitions

<rdfs:Class ID="MotorVehicle">
  <rdfs:subClassOf rdf:resource="http.../PR-rdf-schema-19990303#Resource"/>
</rdfs:Class>

<rdfs:Class ID="PassengerVehicle">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdfs:Class>

<rdfs:Class ID="Van">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdfs:Class>

<rdfs:Class ID="MiniVan">
  <rdfs:subClassOf rdf:resource="#Van"/>
  <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
</rdfs:Class>

<rdf:Property ID="registeredTo">
  <rdfs:domain rdf:resource="#MotorVehicle" />
  <rdfs:range rdf:resource="#Person" />
</rdf:Property>

Christine is a passenger vehicle.

Is Christine a motor vehicle?

Yes.

Christine is registered to Arnie.

What is Arnie?

A person.
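
The two answers above follow from subClassOf, domain and range. A minimal Python sketch of that inference, assuming the schema is copied into dictionaries; `SUBCLASS_OF`, `DOMAIN`, `RANGE`, `superclasses` and `infer_types` are illustrative names, and this is not a full RDF-S reasoner:

```python
# Schema copied from the class and property definitions.
SUBCLASS_OF = {
    "PassengerVehicle": {"MotorVehicle"},
    "Van": {"MotorVehicle"},
    "MiniVan": {"Van", "PassengerVehicle"},
}
DOMAIN = {"registeredTo": "MotorVehicle"}
RANGE = {"registeredTo": "Person"}


def superclasses(cls):
    """All classes reachable from cls through subClassOf links."""
    found = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer_types(type_facts, property_facts):
    """Derive every class each resource belongs to."""
    types = {}

    def add(resource, cls):
        types.setdefault(resource, set()).update({cls} | superclasses(cls))

    for resource, cls in type_facts:
        add(resource, cls)
    for prop, subject, obj in property_facts:
        add(subject, DOMAIN[prop])  # the subject falls in the property's domain
        add(obj, RANGE[prop])       # the object falls in the property's range
    return types


types = infer_types(
    [("Christine", "PassengerVehicle")],
    [("registeredTo", "Christine", "Arnie")],
)
print(sorted(types["Christine"]))
print(sorted(types["Arnie"]))
```

Nothing in the input says that Arnie is a person; that conclusion comes only from the range of registeredTo.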

Comments on RDF and RDF-S


Severely lacking in expressive power


Domain and range constraints rather than Value-Type

E.g., can’t define class of people all of whose children are male


No cardinality constraints


Particularly important for “exactly 1” and “at most 1”


No decompositions


Particularly important for “disjoint” and “exhaustive”


No axioms


No negation (!)


Not useful for checking consistency

E.g., can’t prove an object is not an instance of a class


Basically a typing system


More powerful ontology representation languages are needed.

Ontology languages

The DAML Program


DAML: DARPA Agent Markup Language



Goal: achieve semantic interoperability between Web pages,
databases, programs, and sensors.


DAML+OIL:


This language gets its strange name because it was created by a Joint Committee of US and European researchers who were working on two different, but similar languages.

DAML stands for the DARPA Agent Markup Language, a project funded by the US Defense Advanced Research Projects Agency, the same organization that funded much of the original work on the Internet (which was then called the ARPAnet).

OIL stands for the Ontology Interchange Language and was developed by a number of researchers, primarily a group funded by the European Union's Information Society Technologies Program.

The joint committee created a new language with the best features of SHOE, DAML, OIL and several other markup approaches. At the time of this writing, DAML+OIL is the most advanced web ontology language, and it is expected to provide the basis for future web standards for ontologies (OWL).



Web site: http://www.daml.org/

DAML+OIL


A representation language for user-defined ontologies


An ontology added to RDF and RDF-Schema


Specification document: http://www.daml.org/2000/12/daml+oil-index.html


Expressive power analogous to:


Description logics (e.g., CLASSIC)


Monotonic frame languages (e.g., OKBC knowledge model)


Designed in collaboration with the European Community

Designers of the Ontology Inference Layer (OIL)


Basis for OWL, the candidate W3C standard

DAML+OIL Classes

Thing

Restriction

List

Ontology

AbstractProperty

TransitiveProperty

DatatypeProperty

UniqueProperty

UnambiguousProperty

Nothing

DAML+OIL Properties


Equivalence: equivalentTo, sameClassAs, samePropertyAs

Lists: first, rest, item

Properties: inverseOf

Ontologies: versionInfo, imports

Classes: disjointWith

Defining non-primitive classes: unionOf, disjointUnionOf, intersectionOf, complementOf, oneOf

Restrictions: onProperty, toClass, hasValue, hasClass, hasClassQ; minCardinality, maxCardinality, cardinality; minCardinalityQ, maxCardinalityQ, cardinalityQ

Property Restrictions on Classes

<Class ID="Person">
  <comment> Person is a subclass of objects whose parents are persons. </comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasParent" />
      <daml:toClass rdf:resource="#Person" />
    </daml:Restriction>
  </rdfs:subClassOf>

  <comment> Person is a subclass of resources that have one father. </comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasFather" />
      <daml:cardinality> 1 </daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</Class>

[Figure labels: All objects all of whose parents are persons; All objects that have exactly 1 father; Person]
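
A minimal Python sketch of what the two restrictions demand of an individual. It checks hypothetical instance data under a closed-world reading, which is an assumption made for illustration: a DAML+OIL reasoner treats the restrictions as axioms to reason with rather than as validation rules. The individuals (`alice`, `bob`, `carol`, `rex`, `fido`) and all function names are invented for the example:

```python
def values(facts, prop, individual):
    """Objects o such that (prop, individual, o) is a stated fact."""
    return {o for (p, s, o) in facts if p == prop and s == individual}


def satisfies_to_class(facts, types, individual, prop, cls):
    """toClass: every value of prop is an instance of cls (true if none)."""
    return all(cls in types.get(v, set())
               for v in values(facts, prop, individual))


def satisfies_cardinality(facts, individual, prop, n):
    """cardinality: the individual has exactly n distinct values for prop."""
    return len(values(facts, prop, individual)) == n


# Hypothetical instance data, not taken from the slides.
FACTS = {
    ("hasParent", "alice", "bob"),
    ("hasParent", "alice", "carol"),
    ("hasFather", "alice", "bob"),
    ("hasParent", "rex", "fido"),
}
TYPES = {"bob": {"Person"}, "carol": {"Person"}, "fido": {"Dog"}}


def could_be_person(individual):
    """Both restrictions from the Person definition hold for the data."""
    return (satisfies_to_class(FACTS, TYPES, individual, "hasParent", "Person")
            and satisfies_cardinality(FACTS, individual, "hasFather", 1))


print(could_be_person("alice"), could_be_person("rex"))
```

Note that toClass is a universal restriction: an individual with no stated parents satisfies it vacuously, which is why the cardinality restriction is needed to require that a father exists.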

Formal ontology and information systems


This paper offers a systematic account of the central role ontologies may play in information systems.

Ontologies may affect the three main components of an information system: information resources, user interfaces and application programs.

In AI, an ontology is an engineering artifact. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.

Kinds of ontologies, depending on level of generality

Top-level ontologies: general concepts like space, time, matter, object, event, etc., which are independent of a particular problem or domain.

Domain ontologies: the vocabulary related to a generic domain (e.g., medicine), obtained by specializing the terms in the top-level ontology.

Task ontologies: describe generic tasks or activities (e.g., diagnosing or selling).

Application ontologies: describe concepts depending both on a particular domain and task. An application ontology is a particular knowledge base, describing facts assumed to be always true by a community of users.


Ontology-driven information systems

An IS consists of components of three different types: application programs, information resources, and user interfaces. Ontologies can play a central role here.

Two dimensions for analysis:

Temporal dimension: using ontologies at development time or run time.

Structural dimension: impact of ontologies on different IS components.


The structural dimension: impact of ontologies on IS components

Using an ontology for the database component.

An ontology can be compared with the schema component of a database.

At development time, the conceptual model resulting from requirements analysis can be represented as a computer-processable ontology and from there mapped to concrete target platforms.

Another main use of ontologies at development time is information integration.

At run time, explicit ontologies (run-time accessible database schemas) are at the core of the mediation-based approach to information integration.


Using an ontology for the user interface component.

Allow the user to query and browse the ontology.

The user can browse the ontology in order to better understand the vocabulary used by the IS, and can therefore formulate queries at the desired level of specificity.

Another usage is vocabulary detaching: users can use their own natural language terms, which are mapped to the IS vocabulary with the help of the ontology.
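
A minimal Python sketch of vocabulary detaching, assuming the ontology supplies a table from everyday terms to IS concepts. The table `USER_TERMS`, its entries and the function `to_is_vocabulary` are hypothetical; a real system would derive the table from the ontology's labels and synonyms:

```python
# Hypothetical mapping from a user's everyday terms to IS concept names.
USER_TERMS = {
    "car": "PassengerVehicle",
    "automobile": "PassengerVehicle",
    "minivan": "MiniVan",
    "owner": "registeredTo",
}


def to_is_vocabulary(query):
    """Replace each known user term by the IS concept it maps to."""
    return [USER_TERMS.get(word.lower(), word) for word in query.split()]


print(to_is_vocabulary("Owner of red car"))
```

Words with no entry in the table pass through unchanged, so the user is never forced to learn the IS vocabulary before asking a question.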


Using an ontology for the application program component.

Application programs encode knowledge in the form of type or class declarations and procedures.

The ontological commitment of the program should be made explicit using ontologies.

Further, for ease of maintenance and flexibility, the program can be turned into a knowledge-based system.

Conclusion


Ontology driven information system.


Different types of ontologies.


The role of ontology in IS


Time dimension


Development time vs. run time


Structural dimension.


Information resources, user interfaces, and application programs.

Common access to information

Use ontologies to enable multiple target applications (or humans) to have access to heterogeneous sources of information (ontology-based information integration).


Four categories.


Human communication


Data access via shared ontology


Data access via mapped ontology


Shared services

Human communication


Promote common understanding among knowledge workers.

Supporting technologies include ontology editors and browsers.

Example: the Workflow Management Coalition reference documents.

Maturity: library classification skills have a long history (knowledge workers sharing an ontology in the form of a glossary).

Data access via shared ontology


An ontology can be used as an interchange format to enable common access to operational data.

Example: Process Interchange Format (PIF) and EcoCyc

Maturity: commercial successes exist in some contexts, while in others the technology is a long way from being mature.

Difficult to agree on a common ontology

Shared services


Similar to data access via shared ontology, but different in the focus of what is being shared. The ontology defines interfaces in multiple target languages.

Example: UML is used to create an ontology for product data management; this ontology is then used to generate interface code for the client and server.

Maturity: relatively mature

Ontology based search


Use an ontology for searching an information repository for desired resources.

Example: Yahoo

Maturity: Many commercial internet portals are beginning to explore the use of concepts for ontology-based search.

Endeca, Kaidara
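
A minimal Python sketch of one way an ontology can support search: a query for a concept is expanded to all of its narrower concepts before documents are matched. The hierarchy reuses the vehicle classes from the RDF-S example with Van and PassengerVehicle as children of MotorVehicle; the document index `DOCS` and the functions `expand` and `search` are hypothetical:

```python
# Narrower concepts of each concept (a fragment of the vehicle hierarchy).
NARROWER = {
    "MotorVehicle": ["PassengerVehicle", "Van"],
    "Van": ["MiniVan"],
}

# Hypothetical repository: each document is tagged with concepts.
DOCS = {
    "doc1": {"MiniVan"},
    "doc2": {"PassengerVehicle"},
    "doc3": {"Person"},
}


def expand(concept):
    """The concept itself followed by all of its narrower concepts."""
    result = [concept]
    for child in NARROWER.get(concept, []):
        result.extend(expand(child))
    return result


def search(concept):
    """Documents tagged with the concept or any narrower concept."""
    wanted = set(expand(concept))
    return sorted(doc for doc, tags in DOCS.items() if tags & wanted)


print(search("MotorVehicle"))
```

A plain keyword match on "MotorVehicle" would return nothing here, since no document carries that tag directly; the hierarchy is what connects the query to the documents.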

Conclusion


The paper presents a framework for understanding ontology applications.


We studied


The framework


Various ontology application scenarios (use cases).


Ontology as specification


Common access to information


Human communication


Data access via shared ontology


Data access via mapped ontology


Shared services


Ontology-based search


Summary


Ontology ABC


Motivation


Semantic web


RDF and RDFS


Brief introduction to state-of-the-art ontology languages.

In-depth introduction to one such language: DAML+OIL

Impact of ontologies on information systems.


Classification of ontology applications.




IN350 Document Management and Information Steering