Introduction to Ontology



Barry Smith

August 11, 2012





The problem of (big) data

Some questions


How to find data?

How to understand data when you find it?

How to use data when you find it?

How to integrate with other data?

How to label the data you are collecting?

How to build a set of labels for a new domain that will integrate well with labels used in neighboring domains?

Big problem: nearly all of this data is siloed





Sources


Examples of databases containing person data and data pertaining to skills

PersonID   SkillID
111        222

SkillID   Name   Description
222       Java   Programming

ID    SkillDescr
333   SQL

EmplID   SkillName
444      Java


The problem: many, many silos


DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services

these systems are poorly integrated

deliver redundant capabilities,

make data hard to access, foster error and waste

prevent secondary uses of data

https://ditpr.dod.mil/

Based on FY11 Defense Information Technology Repository (DITPR) data



One road to a solution: Exploit the network effects of the Web


You build a site.


Others discover the site and they link to it


The more they link to it, the more important and
well known the page becomes (this is what Google
exploits)


Your page becomes important, and others begin to
rely on it


Many people link to the data, use it


New ‘secondary uses’ of the data are discovered




With thanks to Ivan Herman



Unfortunately the Web is ruled by anarchy.
However much we try to link web content
together à la Google, we will still be left with
many, many silos.

Photo credit “nepatterson”, Flickr



To avoid silos, data must be available on the Web in a standard way.

Use “ontologies” to capture common meanings with logical definitions that are understandable to both humans and computers,

using a common language such as OWL (Web Ontology Language)

The idea of the Semantic Web



Annotate data using ontologies

Source Term       Ontology Label
Db1.Name          SE.Skill
Db2.SkillDescr    SE.ComputerSkill
Db3.SkillName     SE.ProgrammingSkill
Db1.PersonID      SE.PersonID
Db2.ID            SE.PersonID
Db3.EmplID        SE.PersonID

SE.ComputerSkill      SE.Skill
SE.ProgrammingSkill   SE.ComputerSkill


Inconsistent and idiosyncratic terms used in
source data are associated with single
preferred labels from ontologies
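
A minimal sketch of this kind of annotation, in Python. The mapping is copied from the table above; the toy rows follow the example databases on the Sources slide (with the two Db1 tables already joined), and reading the two label pairs under the table as subtype links is an assumption. All function names are hypothetical.

```python
# Mapping from source terms to ontology labels, copied from the table above.
ANNOTATIONS = {
    "Db1.Name": "SE.Skill",
    "Db2.SkillDescr": "SE.ComputerSkill",
    "Db3.SkillName": "SE.ProgrammingSkill",
    "Db1.PersonID": "SE.PersonID",
    "Db2.ID": "SE.PersonID",
    "Db3.EmplID": "SE.PersonID",
}

# Assumption: the label pairs listed under the table are subtype links.
PARENT = {
    "SE.ComputerSkill": "SE.Skill",
    "SE.ProgrammingSkill": "SE.ComputerSkill",
}

# Toy rows modeled on the example databases (illustrative only).
ROWS = [
    {"Db1.PersonID": 111, "Db1.Name": "Java"},
    {"Db2.ID": 333, "Db2.SkillDescr": "SQL"},
    {"Db3.EmplID": 444, "Db3.SkillName": "Java"},
]


def is_a(label, ancestor):
    """True if label equals ancestor or is a (transitive) subtype of it."""
    while label is not None:
        if label == ancestor:
            return True
        label = PARENT.get(label)
    return False


def annotate(row):
    """Re-key one source row by ontology label instead of source term."""
    return {ANNOTATIONS[term]: value for term, value in row.items()}


def people_with(skill_label, rows):
    """Collect (person id, skill value) pairs whose label falls under skill_label."""
    found = []
    for row in rows:
        annotated = annotate(row)
        for label, value in annotated.items():
            if label != "SE.PersonID" and is_a(label, skill_label):
                found.append((annotated["SE.PersonID"], value))
    return found
```

Querying for SE.Skill returns one person from each of the three databases, even though each database names its columns differently; querying for SE.ProgrammingSkill matches only the Db3 record.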



Where we stand today


HTML demonstrated the power of the Web to
allow sharing of information


increasing availability of semantically enhanced
data


increasing power of semantic software to allow
automatic reasoning over online information


increasing use of OWL in attempts to break down silos, and create useful integration of online data and information



Linked Open Data as of September 2010

Ontology success stories, and some
reasons for failure





unfortunately this data is not really linked


The result: the more Semantic Technology is successful, the more it fails to achieve its goals

the very success of the approach leads to the creation of ever new controlled vocabularies, semantic silos

because multiple ontologies are being created in ad hoc ways

The Semantic Web framework as currently conceived yields minimal standardization

Creates semantic silos



Basic Formal Ontology (BFO)

top-level architecture used in over 120 ontology projects worldwide

Next tutorial in this series: August 18-19, 2012

http://ncorwiki.buffalo.edu/index.php/Basic_Formal_Ontology_2.0


People will tell you, all you need is …



XML gives you: processable tagging + syntactic interoperability

RDF gives you: net-centricity (URIs for unique and consistent naming), linked data

OWL (Web Ontology Language) gives you: RDF + semantic interoperability, richer logic

Levels of coordination

but these are just tools:

they do not rule out stovepipes

they do not prevent redundant efforts

they do not imply high quality ontologies of the sort that will support reasoning

Even if we all speak Irish, this does not mean that we all understand each other





Warning 1.


OWL implementation is not enough


the issues we face are not only logical, but
also sociological


they are the same issues already endemic in
the database world


database architecture is inflexible


database systems, once distributed, degrade very
quickly; they create stovepipes, forking, silos …


How to ensure coordinated ontology
development over time?

Suggested principles for an ontologist’s code of ethics

1. I hereby swear that I will reuse existing ontology content wherever possible

2. I hereby swear that whenever I reuse terms from an existing ontology, I will keep their original source IDs

3. I hereby swear that before releasing an ontology I will aggressively test it in multiple independent real-world applications

4. I hereby swear that before committing a new term and definition to an ontology I will always think first

Some governance principles


Information sharing: to avoid ontology redundancy and inconsistency, there must be sharing of information at every stage

Collaborative development: where ontology development needs overlap, the communities involved must either develop shared resources or agree to a division of labor

Leverage of existing resources: ontology development should wherever possible involve reuse of existing ontologies.

Guiding role of subject-matter experts, who should be involved in the construction and maintenance of all domain ontology content

Warning 2.

Ontology is a multi-disciplinary enterprise, in which the same terms are used in conflicting ways by different communities of ontologists



universal, type, kind, class


instance


concept, model


representation


datum


The ontology spectrum (data focus)

glossary: A simple list of terms and their definitions.

data dictionary: Terms, definitions, naming conventions and representations of the data elements in a computer system.

data model (e.g. JC3IEDM): Terms, definitions, naming conventions, representations and the beginning of specification of the relationships between data elements.

taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element".

ontology: A complete, machine-readable specification of a conceptualization = conceptual data model


The ontology spectrum (reality focus)

glossary: A simple list of terms and their definitions.

controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency.

taxonomy: A controlled vocabulary in which the terms form a hierarchical representation of the types and subtypes of entities in a given domain.

The hierarchy is organized by the is_a (subtype) relation

ontology: A controlled vocabulary organized by is_a and by further formally defined relations, for example part_of.


Foundational Model of Anatomy (FMA)

[Diagram of FMA terms. Node labels: Pleural Cavity, Interlobar recess, Mesothelium of Pleura, Pleura (Wall of Sac), Visceral Pleura, Pleural Sac, Parietal Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Anatomical Structure, Organ, Serous Sac, Mediastinal Pleura, Tissue, Organ Part, Organ Subdivision, Organ Component, Organ Cavity Subdivision, Serous Sac Cavity Subdivision]


In graph-theoretical terms:

Ontology Components:

alphanumeric IDs form nodes of the graph

each node is associated with some single term (preferred label)

relationships between nodes, such as is_a, form the edges of the graph


definitions and synonyms are associated with each
node
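
The components listed above can be sketched as a small data structure. This is only an illustration: the IDs, the placeholder definitions and the synonym are hypothetical, the terms are borrowed from the lung examples used elsewhere in these slides, and the helper names are not taken from any ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                 # alphanumeric ID: the node's identity in the graph
    label: str                   # the single preferred label
    definition: str = ""         # definition attached to the node
    synonyms: list = field(default_factory=list)


# Hypothetical IDs and placeholder definitions, for illustration only.
NODES = {
    n.node_id: n
    for n in [
        Node("EX:0001", "anatomical structure", "placeholder definition"),
        Node("EX:0002", "lung", "placeholder definition"),
        Node("EX:0003", "lobe of lung", "placeholder definition", ["pulmonary lobe"]),
    ]
}

# Edges are (source ID, relation, target ID) triples.
EDGES = [
    ("EX:0002", "is_a", "EX:0001"),
    ("EX:0003", "part_of", "EX:0002"),
]


def neighbors(node_id, relation):
    """Preferred labels of the nodes reached from node_id via relation."""
    return [NODES[t].label for s, r, t in EDGES if s == node_id and r == relation]


def find_by_name(name):
    """Look up node IDs by preferred label or by synonym."""
    return [n.node_id for n in NODES.values() if name == n.label or name in n.synonyms]
```

Keying everything on the ID rather than on the label means a preferred label can be corrected, or a synonym added, without touching any edge.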


Entity =def.

anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software


A   515287   DC3300 Dust Collector Fan
B   521683   Gilmer Belt
C   521682   Motor Drive Belt

instances

universals


Catalog vs. inventory

Ontology vs. list of items in your warehouse


Warning 3.

Do not confuse things with words and ideas



Level 1: the entities in reality, both
instances and universals


Level 2: cognitive representations of this
reality on the part of scientists ...


Level 3: publicly accessible concretizations
of these cognitive representations in textual
and graphical artifacts


Ontology development


starts with: Level 2 = the cognitive representations of practitioners or researchers in the relevant domain

results in: Level 3 representational artifacts (comparable to maps, science texts, dictionaries)



Domain =def.


a portion of reality that forms the subject-matter of a single science or technology or mode of study;

proteomics

HIV

demographics

...





Representation =def.


an image, idea, map, picture, name or
description ... of some entity or entities



two kinds of representation:



analogue (photographs)



digital/composite/syntactically structured


Class =def.


a maximal collection of particulars referred to by a
general term



the class A =def. the collection of all particular A’s

where ‘A’ is a general term (e.g. ‘brother of Elvis fan’, ‘cell’)



Classes are on the same level as the instances
which they contain




(Scientific) Ontology =def.


a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent


1. universals in reality


2. those relations between these universals which
obtain universally (= for all instances)



lung is_a anatomical structure

lobe of lung part_of lung
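
One way to read "obtain universally (= for all instances)" is sketched below: a part_of assertion between two universals is taken to hold when every instance of the first is part of some instance of the second. That reading, the instance names and the function names are all assumptions made for illustration.

```python
# Hypothetical instance-level data: each particular is tagged with its universal.
INSTANCE_OF = {
    "lung_1": "lung",
    "lung_2": "lung",
    "lobe_1": "lobe of lung",
    "lobe_2": "lobe of lung",
    "lobe_3": "lobe of lung",
}

# Instance-level parthood: (part, whole) pairs between particulars.
PART_OF = {("lobe_1", "lung_1"), ("lobe_2", "lung_1"), ("lobe_3", "lung_2")}


def instances(universal):
    """All particulars tagged as instances of the given universal."""
    return [i for i, u in INSTANCE_OF.items() if u == universal]


def part_of_holds_universally(part_universal, whole_universal):
    """True if every instance of part_universal is part of some instance of whole_universal."""
    wholes = set(instances(whole_universal))
    return all(
        any((p, w) in PART_OF for w in wholes)
        for p in instances(part_universal)
    )
```

On this toy data the check succeeds for lobe of lung part_of lung and fails in the reverse direction, since no lung is recorded as part of a lobe.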


Ontology (science)


the science of the kinds and structures
of objects, properties, events,
processes and relations in every
domain of reality
