Eslide Sequence #23


Oct 22, 2013


Artificial Intelligence and Lisp

Knowledge Acquisition and Ontologies

LiU Course TDDC65

Autumn Semester, 2010

Topic of this Eslide Sequence:

What are the methods for organizing large volumes of facts and
knowledge of the kind needed in many practical artificial
intelligence systems?

Quantities involved (examples)


500.000 articles,
40.000.000 words in 30 volumes

Nationalencyklopedin: 172.000 articles

Yongle encyclopaedia:
~2.000.000 articles

language Wikipedia: 3.052.283 articles
05) with > words

Dbpedia: ~ 420.000 nodes

1. Semantic Web

Term coined by Tim Berners-Lee

Vision: WWW as a universal and active medium for
information exchange using software agents

Commercial interpretation: a network of interacting
service providers

Advancing technology interpretation: the meaning of
information and services on the web is defined,
making it possible for the web to understand and
satisfy the requests of people and machines to use the
web content (as opposed to the hypertext web)

Obtaining the semantic information for the semantic web

web: large, organized project for building
a universal knowledgebase (Cyc project)

Approach in the first stage of semantic web: semantic
annotation in conventional web pages

Current approaches: 1. build knowledgebases by
tapping and processing large knowledge sources that
are available on the conventional web that have been
designed structurally, e.g. dbpedia, wordnet

and 2. download and reverse-engineer large
collections of specialized information on the web

Example from wordnet (transformed to Leonardo notation):




[: type synset]

[: has-lexes {ferocity.n0 fierceness.n0 furiousness.n0 fury.n0 vehemence.n0
   violence.n0 wildness.n2}]

[: explain "the property of being wild or turbulent; 'the storm's violence'"]

[: synset-offset "04978805"]

[: lex-filenum "07"]

[: wordnet-links {[: subclass-of {intensity.n0}]
   [: has-derivations {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
      ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
      fierce.s0 ferocious.s0}]
   [: has-subclasses {savageness.n0}]}]

[: wordnet-origlinks {[has-hypernyms {intensity.n0}]

from {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
   ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
   fierce.s0 ferocious.s0}]

hyponyms {savageness.n0}]}]

Example from dbpedia (transformed to Leonardo notation):




[: type scientist]

[: fullname Kepler.Johannes]

[: source-entities {[: wiki w.Johannes_Kepler]
   [: wordnet Kepler.n0]}]

[: given-names <"Johannes">]

[: family-name "Kepler"]

[: date-birth [GregCal 1571]]

[: date-death [GregCal 1630]]

[: explain-seq <"German astronomer who first stated laws of
   planetary motion">]

[: in-classes {astronomer.n0}]

[: in-disciplines {Astronomy Astrology Mathematics

[: studied-at {w.University_of_Tübingen}]

[: worked-at {w.University_of_Linz}]

CIA (U.S. Central Intelligence Agency) webpage:

The World Factbook

provides information on the
history, people, government, economy, geography,
communications, transportation, military, and
transnational issues for 266 world entities. Our
Reference tab includes: maps of the major world
regions, as well as Flags of the World, a Physical Map
of the World, a Political Map of the World, and a
Standard Time Zones of the World map.

CIA Factbook, example


Chiefs of State and Cabinet Members of Foreign Governments

Date of Information: 7/23/2009


Prime Min. Fredrik REINFELDT

Dep. Prime Min. Maud OLOFSSON

Min. of Agriculture, Food, & Fisheries Eskil ERLANDSSON

Min. of Culture Lena Adelsohn LILJEROTH

Min. of Defense Sten TOLGFORS

Min. for Education Jan BJORKLUND

Min. for Employment Sven Otto LITTORIN

Min. of Enterprise & Energy Maud OLOFSSON

Min. of Environment Anders CARLGREN

Min. of European Affairs Cecilia MALMSTROM

Min. of Finance Anders BORG

Min. of Foreign Affairs Carl BILDT

Min. of Foreign Trade Ewa BJORLING

Min. of Health & Elderly Care Maria LARSSON

Min. for Higher Education & Research Tobias KRANTZ

Min. of Infrastructure Asa TORSTENSSON

Example of (2), European University Association

(around 1000 items in their list of members):

AGH University of Science and Technology


Akademia Gorniczo-Hutnicza im. Stanislawa Staszica w Krakowie

Krakow l Poland l

Individual full member

Agricultural University of Athens


Athinai l Greece l

Individual full member

Akdeniz University

(Akdeniz Üniversitesi)

Akdeniz Üniversitesi

Antalya l Turkey l

Individual full member

Alexander Dubcek University, Trencin

Trencianska univerzita Alexandra Dubceka v Trencíne

Trencin l Slovakia l

Individual Associate Members
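Reverse engineering a listing like the one above might start along the following lines. This is only a minimal sketch under stated assumptions, not the method used in the course: it assumes that every record ends with a membership line starting with "Individual", that the line before it holds city and country, and that the separator (rendered here as a lone "l") stands between spaces. The function name `parse_members` and the field names are hypothetical.

```python
import re

RAW = """\
Agricultural University of Athens
Athinai l Greece l
Individual full member
Akdeniz University
(Akdeniz Üniversitesi)
Akdeniz Üniversitesi
Antalya l Turkey l
Individual full member
"""

def parse_members(text):
    """Group non-empty lines into records; a line starting with 'Individual' ends a record."""
    records, block = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        block.append(line)
        if line.startswith("Individual"):
            if len(block) >= 3:
                # Assumed layout: name, optional other names, "City l Country l", membership.
                parts = [p.strip()
                         for p in re.split(r"\s+l(?:\s+|$)", block[-2])
                         if p.strip()]
                records.append({
                    "name": block[0],
                    "other_names": block[1:-2],
                    "city": parts[0] if parts else None,
                    "country": parts[1] if len(parts) > 1 else None,
                    "membership": block[-1],
                })
            block = []
    return records

members = parse_members(RAW)
for m in members:
    print(m["name"], "|", m["city"], "|", m["country"])
```

Real listings will contain records that break these assumptions, so the output of such a parser normally needs manual checking before it is loaded into a knowledgebase.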

Copyright issues for knowledge acquisition
using sources on the www

Which information is covered by copyright (including

Can there be other kinds of proprietary restrictions?
(Explicit or implicit contracts, EU database directive)

Do these restrictions only apply to redissemination of
information content, or also to download for use in your
own project?

What are the rules if the downloaded information is
integrated with other information and then redisseminated?

What are the rules if the downloaded information is merely
used as an instrument for the processing of other

Semantic web today

A set of design principles

A number of working groups, in particular within the
WWW Consortium

A number of proposed enabling technologies:

Resource Description Framework (RDF)

Data Interchange Formats: RDF/XML, N

Notations, e.g. Web Ontology Language (OWL)

Software systems supporting these, e.g. Protégé

Published knowledgebases using the above

Knowledge Representation in semantic web work (so far)

Rely on the notational look-and-feel of XML

Strong emphasis on a network representation
consisting of nodes and arcs

Use of a subsumption relation in such networks,
relating a more general and a more specialized

Some notations and systems also use logic
formulas for characterizing other kinds of
restrictions on admissible network structures and
other kinds of information about the domain.

Web Ontology Language (OWL)

The OWL Web Ontology Language is designed for use
by applications that need to process the content of
information instead of just presenting information to
humans. OWL facilitates greater machine interpretability
of Web content than that supported by XML, RDF, and
RDF Schema (RDF-S) by providing additional vocabulary
along with a formal semantics. OWL has three
increasingly-expressive sublanguages: OWL Lite, OWL DL,
and OWL Full.


Namespace declaration


xmlns ="

xmlns:vin ="

xml:base ="


xmlns:owl =""

xmlns:rdf ="


xmlns:xsd ="">

The first two declarations identify the namespace associated with this
ontology. The first makes it the default namespace, stating that unprefixed
qualified names refer to the current ontology. The second identifies the
namespace of the current ontology with the prefix vin:. The third identifies
the base URI for this document. The fourth identifies the
namespace of the supporting food ontology with the prefix food:. The fifth
namespace declaration says that in this document, elements prefixed with
owl: should be understood as referring to things drawn from the namespace
http://www.w3.org/2002/07/owl#.

This is a conventional OWL declaration, used to introduce
the OWL vocabulary.
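The effect of such declarations can be illustrated by expanding qualified names into full URIs. The following is a minimal sketch: the owl, rdf, rdfs and xsd URIs are the standard W3C ones, whereas the `vin` URI is a placeholder assumption, since the ontology's own URI is not shown on the slide. The function name `expand` is hypothetical.

```python
# Standard W3C namespace URIs; the "vin" URI is a placeholder assumption.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "vin": "http://example.org/wine#",
}
# The default namespace: unprefixed names refer to the current ontology.
DEFAULT_NS = PREFIXES["vin"]

def expand(qname):
    """Expand 'prefix:local' or an unprefixed name into a full URI."""
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return PREFIXES[prefix] + local
    return DEFAULT_NS + qname

print(expand("owl:Class"))
print(expand("Winery"))
```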

Ontology headers

<owl:Ontology rdf:about="">

<rdfs:comment>An example OWL ontology</rdfs:comment>

<owl:priorVersion rdf:resource=


<owl:imports rdf:resource=


<rdfs:label>Wine Ontology</rdfs:label>


Classes and Things

<owl:Class rdf:ID="Winery"/>

<owl:Class rdf:ID="Region"/>

<owl:Class rdf:ID="ConsumableThing"/>

<owl:Thing rdf:about="#CentralCoastRegion">
  <rdf:type rdf:resource="#Region"/>
</owl:Thing>
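Declarations of this kind can be read with an ordinary XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes that the fragment is wrapped in an `rdf:RDF` element carrying the standard owl and rdf namespace declarations, and it is not a substitute for a real OWL toolkit.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The class and individual declarations from the slide, wrapped in rdf:RDF.
DOC = f"""<rdf:RDF xmlns:owl="{OWL}" xmlns:rdf="{RDF}">
  <owl:Class rdf:ID="Winery"/>
  <owl:Class rdf:ID="Region"/>
  <owl:Class rdf:ID="ConsumableThing"/>
  <owl:Thing rdf:about="#CentralCoastRegion">
    <rdf:type rdf:resource="#Region"/>
  </owl:Thing>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# ElementTree addresses namespaced names as "{uri}local".
classes = [c.get(f"{{{RDF}}}ID") for c in root.findall(f"{{{OWL}}}Class")]

individuals = {}
for thing in root.findall(f"{{{OWL}}}Thing"):
    name = thing.get(f"{{{RDF}}}about").lstrip("#")
    types = [t.get(f"{{{RDF}}}resource").lstrip("#")
             for t in thing.findall(f"{{{RDF}}}type")]
    individuals[name] = types

print(classes)
print(individuals)
```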


Defining and using properties

<owl:ObjectProperty rdf:ID="madeFromGrape">
  <rdfs:domain rdf:resource="#Wine"/>
  <rdfs:range rdf:resource="#WineGrape"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="course">
  <rdfs:domain rdf:resource="#Meal" />
  <rdfs:range rdf:resource="#MealCourse" />
</owl:ObjectProperty>

<owl:Thing rdf:ID="LindemansBin65Chardonnay">
  <madeFromGrape rdf:resource="#ChardonnayGrape" />
</owl:Thing>




[: type Thing]

[: madeFromGrape ChardonnayGrape]
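In RDF Schema, domain and range act as inference rules rather than as input checks: a link with property P lets a reasoner conclude that the subject belongs to the domain of P and the object to its range. The sketch below illustrates this with the declarations above, using plain Python data structures; the function name `infer_types` is hypothetical and no OWL library is involved.

```python
# Property declarations from the slide: property -> (domain, range).
PROPERTIES = {
    "madeFromGrape": ("Wine", "WineGrape"),
    "course": ("Meal", "MealCourse"),
}

# Asserted links as (subject, property, object).
LINKS = [("LindemansBin65Chardonnay", "madeFromGrape", "ChardonnayGrape")]

def infer_types(links, properties):
    """Derive class memberships from the domain and range of each used property."""
    types = {}
    for subj, prop, obj in links:
        domain, range_ = properties[prop]
        types.setdefault(subj, set()).add(domain)
        types.setdefault(obj, set()).add(range_)
    return types

inferred = infer_types(LINKS, PROPERTIES)
for entity, classes in inferred.items():
    print(entity, sorted(classes))
```

Although LindemansBin65Chardonnay was only declared as a Thing, the single madeFromGrape link is enough to classify it as a Wine.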

Class subsumption; restrictions on properties

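<owl:Class rdf:ID="Wine">
  <rdfs:subClassOf rdf:resource="&food;PotableLiquid"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#madeFromGrape"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>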







The owl:Restriction subexpression represents an “anonymous” class. It imposes
the condition that each instance of the type Wine must have at least one
madeFromGrape link.
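A cardinality restriction of this kind can be illustrated with a simple check over a set of individuals and links. The sketch below makes a closed-world assumption (a missing link counts as a violation), which differs from the open-world reading used by OWL reasoners. The individual `UnlabelledWine` and the function name `violations` are hypothetical.

```python
from collections import Counter

def violations(individuals, links, cls, prop, min_card):
    """Return individuals of class cls with fewer than min_card links for prop."""
    counts = Counter(subj for subj, p, _ in links if p == prop)
    return [name for name, classes in individuals.items()
            if cls in classes and counts[name] < min_card]

individuals = {
    "LindemansBin65Chardonnay": {"Wine"},
    "UnlabelledWine": {"Wine"},          # hypothetical: no grape recorded
    "ChardonnayGrape": {"WineGrape"},
}
links = [("LindemansBin65Chardonnay", "madeFromGrape", "ChardonnayGrape")]

missing = violations(individuals, links, "Wine", "madeFromGrape", 1)
print(missing)
```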


Essential points about OWL

Represents binary relations between entities

Well-developed machinery for managing namespaces,
versions, and the like, which is considered
necessary for large knowledgebases with distributed
contents and distributed development activity

Relies on XML syntactic tradition

Knowledge modules are organized as documents,
somewhat analogous to computer programs: entities
are “declared” before they are “used”

Not easily readable; graphic interfaces are required for
practical work with the notation

Essential point about ontology representation languages and systems

Major issues: subsumption hierarchies, restrictions on
admissible structures, information that makes logical
inference possible

Other issues that are needed in practice:


Administration of modules: comments, version
information, author and IPR information, etc.

A supertype system, e.g.


in OWL

A conventional type system

Issue: how to relate subsumption, supertypes, and types?

2. Ontologies

Practical definition: that which can be
expressed using an ontology language or
ontology representation system

Some contenders:

SUMO (Suggested Upper Merged Ontology)

CYC, OpenCyc, ResearchCyc

SUMO (Suggested Upper Merged Ontology)

The Suggested Upper Merged Ontology (SUMO) and its
domain ontologies form the largest formal public ontology in
existence today. They are being used for research and
applications in search, linguistics and reasoning.

SUMO is the only formal ontology that has been mapped to all
of the WordNet lexicon.

SUMO is written in the SUO-KIF language.

SUMO is free and owned by the IEEE. The ontologies that
extend SUMO are available under the GNU General Public
License.
Adam Pease is the Technical Editor of SUMO.


;; I. Geography Terms for the CIA World Fact Book

;; A. Location

;; B. Geographic coordinates

;; C. Map references

;; D. Area

;; E. Area


;; F. Land boundaries

;; G. Coastline

;; H. Maritime claims

;; I. Climate

;; J. Terrain

;; K. Elevation extremes

;; L. Natural resources

;; M. Land use

;; N. Irrigated land

;; O. Natural hazards

;; P. Environment - current issues

;; Q. Environment - international agreements

;; R. Geography


;; II. General Geography Terms and Background

;; A. Planet Geography & Astronomical Bodies

;; B. Directions and Distances

;; C. Land Forms

;; D. Water Areas

;; 1. Oceans & Seas

;; 2. Tides & Currents

;; 3. Water Subregions

;; 4. Fresh Water Areas

;; E. Coastal and Shoreline Areas

;; F. Air and Atmosphere

;; G. Weather & Climate

;; H. Vegetation and Biomes

;; I. Natural Disasters

;; J. Environmental Areas of Concern

(subclass SubtropicalDesertClimateZone DesertClimateZone)

(documentation SubtropicalDesertClimateZone


"&%SubtropicalDesertClimateZone is a subclass of

&%DesertClimateZone that is characterized by an

average temperature greater than 18 degrees Celsius,

as well as very low rainfall. This is Koeppen system



(=>
  (and (instance ?AREA DesertClimateZone)
       (subclass ?MO Month)
       (averageTemperatureForPeriod ?AREA ?MO ?TEMP)
       (greaterThan ?TEMP (MeasureFn 18 CelsiusDegree)))
  (instance ?AREA SubtropicalDesertClimateZone))
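The rule above can be read operationally: whenever an area is a DesertClimateZone and some month has an average temperature above 18 degrees Celsius, the area is also classified as a SubtropicalDesertClimateZone. The following sketch applies that reading to a few facts. The area names, months and temperatures are invented for illustration, and the function name `apply_rule` is hypothetical; a SUMO-based system would use a theorem prover instead.

```python
# Hypothetical facts; area names and temperatures are invented for illustration.
instances = {"AreaA": {"DesertClimateZone"}, "AreaB": {"DesertClimateZone"}}
months = {"January", "July"}             # stand-ins for subclasses of Month
avg_temp_c = {
    ("AreaA", "July"): 33.0,
    ("AreaA", "January"): 12.0,
    ("AreaB", "July"): 15.0,
}

def apply_rule(instances, months, avg_temp_c, threshold=18.0):
    """Add SubtropicalDesertClimateZone wherever the rule's antecedent holds."""
    derived = {area: set(classes) for area, classes in instances.items()}
    for (area, month), temp in avg_temp_c.items():
        if ("DesertClimateZone" in instances.get(area, ())
                and month in months
                and temp > threshold):
            derived[area].add("SubtropicalDesertClimateZone")
    return derived

result = apply_rule(instances, months, avg_temp_c)
for area in sorted(result):
    print(area, sorted(result[area]))
```

As written, the rule fires as soon as any single month exceeds the threshold, which is why AreaA qualifies despite its cool January.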

(subclass LandlockedWater BodyOfWater)

(documentation LandlockedWater EnglishLanguage "&%LandlockedWater includes

water areas that are surrounded by land, including salt lakes, fresh water lakes,

ponds, reservoirs, and (more or less) wetlands.")

; need a way to say that the body of water is surrounded by land (e.g., perimeter)

(subclass SaltLake SaltWaterArea)

(subclass SaltLake LandlockedWater)

(documentation SaltLake EnglishLanguage

"&%SaltLake is the class of landlocked bodies of salt water, including those

referred to as 'Seas', e.g., the &%CaspianSea. But note that the

&%MediterraneanSea is a &%Sea.")

(instance CaspianSea SaltLake)

(names "Caspian Sea" CaspianSea)

(instance AralSea SaltLake)

(names "Aral Sea" AralSea)

(instance GreatSaltLake SaltLake)

(names "Great Salt Lake" GreatSaltLake)

(geographicSubregion GreatSaltLake Utah)

(instance DeadSea SaltLake)

(names "Dead Sea" DeadSea)


(instance GulfOfOman Gulf)

(instance GulfOfOman SaltWaterArea)

(names "Gulf of Oman" GulfOfOman)

(connected StraitOfHormuz GulfOfOman)

(connected GulfOfOman ArabianSea)

(meetsSpatially Iran GulfOfOman)

(meetsSpatially Oman GulfOfOman)

(instance GulfOfAden Gulf)

(instance GulfOfMexico Gulf)

(instance PersianGulf Gulf)

These are all the instances of Gulf in the file
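Because the notation consists of parenthesized expressions, facts of this kind are easy to load and query mechanically. The sketch below is a minimal reader for such expressions, with Gulf as the example class; it handles parentheses, symbols and double-quoted strings only (no comments or escapes), and the function name `parse` is hypothetical.

```python
import re

KIF = """
(instance GulfOfOman Gulf)
(instance GulfOfOman SaltWaterArea)
(names "Gulf of Oman" GulfOfOman)
(connected StraitOfHormuz GulfOfOman)
(instance GulfOfAden Gulf)
(instance GulfOfMexico Gulf)
(instance PersianGulf Gulf)
"""

# A token is a quoted string, a parenthesis, or a run of other characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def parse(text):
    """Parse flat or nested s-expressions into nested Python lists."""
    stack, top = [], []
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append(top)
            top = []
        elif tok == ")":
            done, top = top, stack.pop()
            top.append(done)
        else:
            top.append(tok)
    return top

facts = parse(KIF)
gulfs = [f[1] for f in facts
         if len(f) == 3 and f[0] == "instance" and f[2] == "Gulf"]
print(gulfs)
```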

3. Cyc and OpenCyc

Started in 1984 as a massive project to build up a

Main knowledgebase is proprietary; subsets are open in
general, or open for research

In later years the project has extended in the directions of
language dialog and machine learning

OpenCyc (2008): 47.000 concepts, 306.000 links

ResearchCyc (2006): logic base, additional software tools

Read the Wikipedia article about the Cyc project and
check out the Cyc webpage!

Cyc, examples of notation

(#$isa #$BarackObama #$UnitedStatesPresident)

(#$genls #$Tree-ThePlant #$Plant)

(#$capitalCity #$France #$Paris)



(#$implies
  (#$and
    (#$isa ?OBJ ?SUBSET)
    (#$genls ?SUBSET ?SUPERSET))
  (#$isa ?OBJ ?SUPERSET))
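The rule combining isa and genls can be applied repeatedly until no new facts appear. The following is a minimal sketch of that idea in Python, not a description of how the Cyc inference engine works. The individual `SomeOak` and the genls pair involving `Person` are hypothetical examples, and the function name `close_isa` is an assumption.

```python
def close_isa(isa, genls):
    """Close isa facts under: isa(x, sub) and genls(sub, sup) => isa(x, sup)."""
    derived = set(isa)
    changed = True
    while changed:
        changed = False
        for obj, sub in list(derived):
            for s, sup in genls:
                if s == sub and (obj, sup) not in derived:
                    derived.add((obj, sup))
                    changed = True
    return derived

isa = {
    ("BarackObama", "UnitedStatesPresident"),
    ("SomeOak", "Tree-ThePlant"),            # hypothetical individual
}
genls = {
    ("Tree-ThePlant", "Plant"),
    ("UnitedStatesPresident", "Person"),     # hypothetical generalization
}

closed = close_isa(isa, genls)
for fact in sorted(closed):
    print(fact)
```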

Cyc knowledgebase

Knowledge elements are expressed as
propositions (predicate and argument)

Cyc provides a collection of microtheories, which
are like entityfiles in Leonardo, but consisting of
propositions and (sometimes) more complex
logical expressions

4. Protégé


Protégé is a free, open source ontology editor and a
knowledge acquisition system. Like Eclipse, Protégé is a
framework for which various other projects suggest plugins.
This application is written in Java and heavily uses Swing to
create the rather complex user interface. Protégé
currently has 126.870 registered users (17.135, 3.293).

It is developed at Stanford University in cooperation with the
University of Manchester and others.

Protégé: one platform, two ontology frameworks

Protégé frames: uses Open Knowledge Base Connectivity
Protocol (OKBC). Organized around classes, instances of
classes, and slots that are specific to classes. (Classes ~
types in Leonardo, slots ~ attributes).

Protégé OWL: uses Web Ontology Language (OWL).

Protégé has one editor for each of the frameworks, both on a
common platform.

OKBC is an older approach; OWL is a more recent design
and is supported by W3C

Browse the Protégé tutorial for one or the other of these
two editors (see Protégé website; will also be linked from
the course website)
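The frame view (classes, instances of classes, and slots that are specific to classes) can be illustrated with a few lines of Python. This is only a sketch of the general idea and does not reproduce the OKBC protocol or the Protégé API; the class names `FrameClass` and `Instance`, the method `fill`, and the chosen slot names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FrameClass:
    name: str
    slots: frozenset        # slot names that instances of this class may fill

@dataclass
class Instance:
    name: str
    cls: FrameClass
    values: dict = field(default_factory=dict)

    def fill(self, slot, value):
        """Store a slot value, rejecting slots the class does not define."""
        if slot not in self.cls.slots:
            raise KeyError(f"{slot!r} is not a slot of class {self.cls.name}")
        self.values[slot] = value

scientist = FrameClass("scientist", frozenset({"fullname", "studied-at", "worked-at"}))
kepler = Instance("Johannes_Kepler", scientist)
kepler.fill("worked-at", "University_of_Linz")

try:
    kepler.fill("madeFromGrape", "ChardonnayGrape")   # not a slot of scientist
    rejected = False
except KeyError:
    rejected = True

print(kepler.values, rejected)
```

In the OWL view, by contrast, properties are declared independently of classes and are tied to them only through domain, range and restrictions.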

Approaches to ontologies and knowledgebases

Delivered as formal documents. The user
chooses how to integrate this information in his
or her system (SUMO)

Delivered as a monolithic knowledgebase that
can be queried over the Internet or downloaded
as a whole (Dbpedia, Freebase)

Delivered as a software system together with a
more or less modular knowledgebase that can
be accessed and modified using the system
(Protégé, Cyc, Wordnet)