Eslide Sequence #23

Oct 22, 2013 (3 years and 5 months ago)

Artificial Intelligence and Lisp



Knowledge Acquisition and Ontologies



LiU Course TDDC65

Autumn Semester, 2010


http://www.ida.liu.se/ext/TDDC65/




Topic of this Eslide Sequence:

What are the methods for organizing large volumes of facts and
knowledge of the kind that is needed in many practical
artificial intelligence systems?

Quantities involved (examples)



Encyclopaedia Britannica: 500.000 articles, 40.000.000 words in 30 volumes

Nationalencyklopedin: 172.000 articles

Yongle encyclopaedia: ~2.000.000 articles

English-language Wikipedia: 3.052.283 articles (2009-10-05) with > 1.000.000.000 words

Dbpedia: ~420.000 nodes

1. Semantic Web


Term coined by Tim Berners-Lee


Vision: WWW as a universal and active medium for
information exchange using software agents


Commercial interpretation: a network of interacting
service providers


Advancing technology interpretation: the meaning of
information and services on the web is defined,
making it possible for the web to understand and
satisfy the requests of people and machines to use the
web content (as opposed to the hypertext web)

Obtaining the semantic information for the semantic web


Pre-semantic-web: large, organized project for building
a universal knowledgebase (Cyc project)


Approach in the first stage of semantic web: semantic
annotation in conventional web pages


Current approaches: (1) build knowledgebases by
tapping and processing large, structurally designed
knowledge sources that are available on the conventional
web, e.g. dbpedia, wordnet; and (2) download and
reverse-engineer large collections of specialized
information on the web

Example from wordnet (transformed to Leonardo notation):

---------------------------------------------------------
-- ferocity.n0

[: type synset]
[: has-lexes {ferocity.n0 fierceness.n0 furiousness.n0 fury.n0 vehemence.n0
      violence.n0 wildness.n2}]
[: explain "the property of being wild or turbulent; 'the storm's violence'"]
[: synset-offset "04978805"]
[: lex-filenum "07"]
[: wordnet-links {[: subclass-of {intensity.n0}]
      [: has-derivations {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
            ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
            fierce.s0 ferocious.s0}]
      [: has-subclasses {savageness.n0}]}]
[: wordnet-origlinks {[has-hypernyms {intensity.n0}]
      [derivation-from {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
            ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
            fierce.s0 ferocious.s0}]
      [has-hyponyms {savageness.n0}]}]


Example from dbpedia (transformed to Leonardo notation):






---------------------------------------------------------
-- Kepler.Johannes

[: type scientist]
[: fullname Kepler.Johannes]
[: source-entities {[: wiki w.Johannes_Kepler]
      [: wordnet Kepler.n0]}]
[: given-names <"Johannes">]
[: family-name "Kepler"]
[: date-of-birth [GregCal 1571]]
[: date-of-death [GregCal 1630]]
[: explain-seq <"German astronomer who first stated laws of
      planetary motion">]
[: in-classes {astronomer.n0}]
[: in-disciplines {Astronomy Astrology Mathematics
      Natural_Philosophy}]
[: studied-at {w.University_of_Tübingen}]
[: worked-at {w.University_of_Linz}]




CIA (U.S. Central Intelligence Agency) webpage:


The World Factbook provides information on the
history, people, government, economy, geography,
communications, transportation, military, and
transnational issues for 266 world entities. Our
Reference tab includes: maps of the major world
regions, as well as Flags of the World, a Physical Map
of the World, a Political Map of the World, and a
Standard Time Zones of the World map.


CIA Factbook, example



Sweden

Chiefs of State and Cabinet Members of Foreign Governments

Date of Information: 7/23/2009


King CARL XVI GUSTAF

Prime Min. Fredrik REINFELDT

Dep. Prime Min. Maud OLOFSSON

Min. of Agriculture, Food, & Fisheries Eskil ERLANDSSON

Min. of Culture Lena Adelsohn LILJEROTH

Min. of Defense Sten TOLGFORS

Min. for Education Jan BJORKLUND

Min. for Employment Sven Otto LITTORIN

Min. of Enterprise & Energy Maud OLOFSSON

Min. of Environment Anders CARLGREN

Min. of European Affairs Cecilia MALMSTROM

Min. of Finance Anders BORG

Min. of Foreign Affairs Carl BILDT

Min. of Foreign Trade Ewa BJORLING

Min. of Health & Elderly Care Maria LARSSON

Min. for Higher Education & Research Tobias KRANTZ

Min. of Infrastructure Asa TORSTENSSON


Example of (2), European University Association
(around 1000 items in their list of members):

AGH University of Science and Technology (AGH)
Akademia Górniczo-Hutnicza im. Stanislawa Staszica w Krakowie
Krakow | Poland | http://www.agh.edu.pl
Individual full member

Agricultural University of Athens (AUA)
Athinai | Greece | http://www.aua.gr/
Individual full member

Akdeniz University (Akdeniz Üniversitesi)
Akdeniz Üniversitesi
Antalya | Turkey | http://www.akdeniz.edu.tr/
Individual full member

Alexander Dubcek University, Trencin
Trencianska univerzita Alexandra Dubceka v Trencíne
Trencin | Slovakia | http://www.tnuni.sk/
Individual Associate Members

Copyright issues for knowledge acquisition using sources on the WWW


Which information is covered by copyright (including
copyleft)?


Can there be other kinds of proprietary restrictions?
(Explicit or implicit contracts, EU database directive)


Do these restrictions only apply to redissemination of
information content, or also to download for use in your
own project?


What are the rules if the downloaded information is
integrated with other information and then redisseminated?


What are the rules if the downloaded information is merely
used as an instrument for the processing of other
information?

Semantic web today


A set of design principles


A number of working groups, in particular within the
WWW Consortium


A number of proposed enabling technologies:


Resource Description Framework (RDF)


Data Interchange Formats: RDF/XML, N-Triples


Notations, e.g. Web Ontology Language (OWL)


Software systems supporting these, e.g. Protégé


Published knowledgebases using the above
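
To make the RDF and N-Triples items in this list more concrete, here is a minimal sketch in Python. RDF describes data as subject-predicate-object triples, and N-Triples writes one such triple per line. The URIs under http://example.org/ and the helper names (literal, uri, to_ntriples) are assumptions made for illustration; they are not taken from any published knowledgebase.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples,
# serialized in N-Triples style (one triple per line, terminated by " .").
# All http://example.org/ URIs are hypothetical, chosen for illustration.

EX = "http://example.org/"  # assumed namespace, not a real vocabulary


def literal(text):
    """Format a plain string literal, escaping backslash, quote and newline."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def uri(name):
    """Wrap a full URI in angle brackets."""
    return "<" + name + ">"


def to_ntriples(triples):
    """Serialize already-formatted terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(EX + "Johannes_Kepler"), uri(EX + "familyName"), literal("Kepler")),
    (uri(EX + "Johannes_Kepler"), uri(EX + "studiedAt"),
     uri(EX + "University_of_Tuebingen")),
]

print(to_ntriples(triples))
```

The same triples could equally be written in RDF/XML; the line-oriented form is simply easier to generate and to inspect by eye.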

Knowledge Representation in semantic web work (so far)


Rely on notational look-and-feel of XML


Strong emphasis on a network representation
consisting of nodes and arcs


Use of a subsumption relation in such networks,
relating a more general and a more specialized
concept


Some notations and systems also use logic
formulas for characterizing other kinds of
restrictions on admissible network structures and
other kinds of information about the domain.
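
The network view with a subsumption relation can be sketched in a few lines of Python. The arcs reuse entity names from the wordnet and dbpedia examples earlier in this sequence; the set-of-tuples representation and the function names are assumptions made for illustration, not the data model of any particular semantic web system.

```python
# Minimal sketch: a knowledge network as labelled arcs between nodes,
# with a subsumption query that follows "subclass-of" arcs transitively.
# The representation and helper names are assumptions for illustration.

arcs = {
    ("savageness.n0", "subclass-of", "ferocity.n0"),
    ("ferocity.n0", "subclass-of", "intensity.n0"),
    ("Kepler.Johannes", "in-classes", "astronomer.n0"),
}


def parents(node):
    """Directly more general concepts of a node."""
    return {o for s, label, o in arcs if s == node and label == "subclass-of"}


def subsumes(general, specific):
    """True if `general` is reachable from `specific` via subclass-of arcs.

    A concept is treated as subsuming itself.
    """
    seen, frontier = set(), {specific}
    while frontier:
        node = frontier.pop()
        if node == general:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier |= parents(node)
    return False


print(subsumes("intensity.n0", "savageness.n0"))
```

Only arcs labelled subclass-of take part in the query, so the membership arc from Kepler.Johannes to astronomer.n0 is stored in the same network but does not count as subsumption.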

Web Ontology Language (OWL)

The OWL Web Ontology Language is designed for use
by applications that need to process the content of
information instead of just presenting information to
humans. OWL facilitates greater machine interpretability
of Web content than that supported by XML, RDF, and
RDF Schema (RDF-S) by providing additional vocabulary
along with a formal semantics. OWL has three
increasingly-expressive sublanguages: OWL Lite, OWL
DL, and OWL Full.

(From http://www.w3.org/TR/owl-features/)

Namespace declaration

<rdf:RDF
    xmlns     ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
    xmlns:vin ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
    xml:base  ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
    xmlns:food="http://www.w3.org/TR/2004/REC-owl-guide-20040210/food#"
    xmlns:owl ="http://www.w3.org/2002/07/owl#"
    xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd ="http://www.w3.org/2001/XMLSchema#">



The first two declarations identify the namespace associated with this
ontology. The first makes it the default namespace, stating that unprefixed
qualified names refer to the current ontology. The second identifies the
namespace of the current ontology with the prefix vin:. The third identifies
the base URI for this document (see below). The fourth identifies the
namespace of the supporting food ontology with the prefix food:. The fifth
namespace declaration says that in this document, elements prefixed with
owl: should be understood as referring to things drawn from the namespace
called http://www.w3.org/2002/07/owl#. This is a conventional OWL
declaration, used to introduce the OWL vocabulary.
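
The effect of these declarations can be illustrated with a small Python sketch that expands prefixed names into full URIs. The table copies the namespace URIs from the declaration above; the function name expand is hypothetical, and the sketch ignores finer points of XML namespace handling (for instance, the default namespace applies to element names only).

```python
# Minimal sketch: expanding prefixed names with the namespace table
# from the declaration. expand() is a hypothetical helper.

WINE = "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"

namespaces = {
    "": WINE,        # default namespace (unprefixed names)
    "vin": WINE,     # same ontology, explicit prefix
    "food": "http://www.w3.org/TR/2004/REC-owl-guide-20040210/food#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname):
    """Turn 'prefix:local' (or an unprefixed 'local') into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep:                 # no colon: use the default namespace
        prefix, local = "", qname
    return namespaces[prefix] + local


print(expand("owl:Class"))
print(expand("Winery") == expand("vin:Winery"))
```

Because the default namespace and vin: point at the same URI, an unprefixed name and its vin:-prefixed form denote the same entity.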

Ontology headers

<owl:Ontology rdf:about="">
  <rdfs:comment>An example OWL ontology</rdfs:comment>
  <owl:priorVersion rdf:resource=
      "http://www.w3.org/TR/2003/PR-owl-guide-20031215/wine"/>
  <owl:imports rdf:resource=
      "http://www.w3.org/TR/2004/REC-owl-guide-20040210/food"/>
  <rdfs:label>Wine Ontology</rdfs:label>
  ...


Classes and Things


<owl:Class rdf:ID="Winery"/>

<owl:Class rdf:ID="Region"/>

<owl:Class rdf:ID="ConsumableThing"/>



<owl:Thing rdf:about="#CentralCoastRegion">


<rdf:type rdf:resource="#Region"/>

</owl:Thing>


Defining and using properties

<owl:ObjectProperty rdf:ID="madeFromGrape">


<rdfs:domain rdf:resource="#Wine"/>


<rdfs:range rdf:resource="#WineGrape"/>

</owl:ObjectProperty>


<owl:ObjectProperty rdf:ID="course">


<rdfs:domain rdf:resource="#Meal" />


<rdfs:range rdf:resource="#MealCourse" />

</owl:ObjectProperty>


<owl:Thing rdf:ID="LindemansBin65Chardonnay">


<madeFromGrape rdf:resource="#ChardonnayGrape" />


</owl:Thing>


-- LindemansBin65Chardonnay

[: type Thing]
[: madeFromGrape ChardonnayGrape]
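
The correspondence between the OWL individual and its Leonardo-style rendering can be produced mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the wrapper document, the helper names local_name and to_record, and the exact text layout of the record are assumptions made for illustration, not part of OWL or of Leonardo.

```python
# Minimal sketch: reading one OWL individual and printing it as a
# Leonardo-style record. Helper names and record layout are assumptions.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
WINE = "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"

# A small self-contained document wrapping the individual from the slide.
DOC = f"""
<rdf:RDF xmlns="{WINE}" xmlns:owl="{OWL}" xmlns:rdf="{RDF}">
  <owl:Thing rdf:ID="LindemansBin65Chardonnay">
    <madeFromGrape rdf:resource="#ChardonnayGrape" />
  </owl:Thing>
</rdf:RDF>
"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts before a tag."""
    return tag.rsplit("}", 1)[-1]


def to_record(individual):
    """Render one individual as a Leonardo-style text record."""
    lines = ["-- " + individual.get(f"{{{RDF}}}ID"),
             "[: type " + local_name(individual.tag) + "]"]
    for prop in individual:
        target = prop.get(f"{{{RDF}}}resource").lstrip("#")
        lines.append(f"[: {local_name(prop.tag)} {target}]")
    return "\n".join(lines)


root = ET.fromstring(DOC.strip())
record = to_record(root.find(f"{{{OWL}}}Thing"))
print(record)
```

The sketch handles only properties given as rdf:resource references, which is all this individual uses.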

Class subsumption; restrictions on properties

<owl:Class rdf:ID="Wine">
  <rdfs:subClassOf rdf:resource="&food;PotableLiquid"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#madeFromGrape"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  ...
</owl:Class>



The restriction subexpression represents an “anonymous” class. It imposes
the condition that each instance of the type Wine must have at least one link
labelled madeFromGrape.
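
As a rough illustration of what the cardinality condition says, the following Python sketch checks it against a small table of links. The dictionary layout, the function name, and the instance UnlabelledBottle are assumptions made for illustration. Note also that a real OWL reasoner works under the open-world assumption and would not treat a missing link as a violation; the closed-world check below is only meant to show the shape of the condition.

```python
# Minimal sketch: checking a minCardinality restriction against instance data.
# The data layout and names are assumptions for illustration only.

links = {
    "LindemansBin65Chardonnay": {"madeFromGrape": {"ChardonnayGrape"}},
    "UnlabelledBottle": {},          # hypothetical instance with no grape link
}


def satisfies_min_cardinality(instance, prop, minimum):
    """True if `instance` has at least `minimum` distinct values for `prop`."""
    return len(links.get(instance, {}).get(prop, set())) >= minimum


for name in links:
    print(name, satisfies_min_cardinality(name, "madeFromGrape", 1))
```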

Essential points about OWL


Represents binary relations between entities


Well-developed machinery for managing namespaces,
versions, and the like, which is considered
necessary for large knowledgebases with distributed
contents and distributed development activity


Relies on XML syntactic tradition


Knowledge modules are organized as documents,
somewhat analogous to computer programs: entities
are “declared” before they are “used”


Not easily readable; graphic interfaces are required for
practical work with the notation

Essential points about ontology representation languages and systems


Major issues: subsumption hierarchies, restrictions on
admissible structures, information that makes logical
inference possible


Other issues that are needed in practice:


Namespaces


Administration of modules: comments, version
information, author and IPR information, etc.


A supertype system, e.g. class vs thing in OWL


A conventional type system


Issue: how to relate subsumption, supertypes, and types?

2. Ontologies


Practical definition: that which can be
expressed using an ontology language or
ontology representation system


Some contenders:



SUMO (Suggested Upper Merged Ontology)



CYC, OpenCyc, ResearchCyc





SUMO (Suggested Upper Merged Ontology)

The Suggested Upper Merged Ontology (SUMO) and its
domain ontologies form the largest formal public ontology in
existence today. They are being used for research and
applications in search, linguistics and reasoning.


SUMO is the only formal ontology that has been mapped to all
of the WordNet lexicon.


SUMO is written in the SUO-KIF language.


SUMO is free and owned by the IEEE. The ontologies that
extend SUMO are available under GNU General Public
License.


Adam Pease is the Technical Editor of SUMO.



(From http://www.ontologyportal.org/ )

;; I. Geography Terms for the CIA World Fact Book
;;   A. Location
;;   B. Geographic coordinates
;;   C. Map references
;;   D. Area
;;   E. Area - comparative
;;   F. Land boundaries
;;   G. Coastline
;;   H. Maritime claims
;;   I. Climate
;;   J. Terrain
;;   K. Elevation extremes
;;   L. Natural resources
;;   M. Land use
;;   N. Irrigated land
;;   O. Natural hazards
;;   P. Environment - current issues
;;   Q. Environment - international agreements
;;   R. Geography - note

;; II. General Geography Terms and Background
;;   A. Planet Geography & Astronomical Bodies
;;   B. Directions and Distances
;;   C. Land Forms
;;   D. Water Areas
;;     1. Oceans & Seas
;;     2. Tides & Currents
;;     3. Water Subregions
;;     4. Fresh Water Areas
;;   E. Coastal and Shoreline Areas
;;   F. Air and Atmosphere
;;   G. Weather & Climate
;;   H. Vegetation and Biomes
;;   I. Natural Disasters
;;   J. Environmental Areas of Concern


(subclass SubtropicalDesertClimateZone DesertClimateZone)

(documentation SubtropicalDesertClimateZone EnglishLanguage
  "&%SubtropicalDesertClimateZone is a subclass of
  &%DesertClimateZone that is characterized by an
  average temperature greater than 18 degrees Celsius,
  as well as very low rainfall. This is Koeppen system
  'BWh'.")

(=>
  (and
    (instance ?AREA DesertClimateZone)
    (subclass ?MO Month)
    (averageTemperatureForPeriod ?AREA ?MO ?TEMP)
    (greaterThan ?TEMP (MeasureFn 18 CelsiusDegree)))
  (instance ?AREA SubtropicalDesertClimateZone))
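
To see what the implication does, it can be applied by hand to a small data set. The Python sketch below follows the rule as written: an area that is a desert climate zone is classified as subtropical as soon as some month has an average temperature above 18 degrees Celsius. The area names and temperatures are invented for illustration, and the function name is hypothetical.

```python
# Minimal sketch: applying the SUMO implication to hand-made data.
# Area names and temperatures are invented for illustration only.

desert_zones = {"AreaA", "AreaB"}      # instances of DesertClimateZone

# averageTemperatureForPeriod: (area, month) -> degrees Celsius
avg_temp = {
    ("AreaA", "July"): 33.0,
    ("AreaA", "January"): 12.0,
    ("AreaB", "July"): 16.5,
    ("AreaB", "January"): -5.0,
}


def subtropical_desert_zones(threshold=18.0):
    """Areas for which some month satisfies the rule's antecedent."""
    return {
        area
        for (area, _month), temp in avg_temp.items()
        if area in desert_zones and temp > threshold
    }


print(subtropical_desert_zones())
```

A reading closer to the documentation string (an average over the whole year above 18 degrees) would need a different antecedent; the sketch deliberately mirrors the formula rather than the prose.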




(subclass LandlockedWater BodyOfWater)


(documentation LandlockedWater EnglishLanguage "&%LandlockedWater includes

water areas that are surrounded by land, including salt lakes, fresh water lakes,

ponds, reservoirs, and (more or less) wetlands.")


; need a way to say that the body of water is surrounded by land (e.g., perimeter)


(subclass SaltLake SaltWaterArea)

(subclass SaltLake LandlockedWater)

(documentation SaltLake EnglishLanguage


"&%SaltLake is the class of landlocked bodies of salt water, including those


referred to as 'Seas', e.g., the &%CaspianSea. But note that the


&%MediterraneanSea is a &%Sea.")


(instance CaspianSea SaltLake)

(names "Caspian Sea" CaspianSea)

(instance AralSea SaltLake)

(names "Aral Sea" AralSea)

(instance GreatSaltLake SaltLake)

(names "Great Salt Lake" GreatSaltLake)

(geographicSubregion GreatSaltLake Utah)

(instance DeadSea SaltLake)

(names "Dead Sea" DeadSea)


etc.

(instance GulfOfOman Gulf)

(instance GulfOfOman SaltWaterArea)

(names "Gulf of Oman" GulfOfOman)

(connected StraitOfHormuz GulfOfOman)

(connected GulfOfOman ArabianSea)

(meetsSpatially Iran GulfOfOman)

(meetsSpatially Oman GulfOfOman)




(instance GulfOfAden Gulf)


(instance GulfOfMexico Gulf)


(instance PersianGulf Gulf)



These are all the instances of Gulf in the file
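
Facts of this flat form are easy to load and query. The following Python sketch reads the Gulf-related statements listed above and collects the instances of a given class. It is an assumption-laden toy: it handles only one-level facts such as (instance GulfOfOman Gulf), not nested expressions or quoted strings containing spaces, and the function names are hypothetical.

```python
# Minimal sketch: reading flat KIF-style facts and querying them.
# Only one-level facts are handled; nested expressions and quoted
# strings with spaces are out of scope for this illustration.

FACTS = """
(instance GulfOfOman Gulf)
(instance GulfOfOman SaltWaterArea)
(connected StraitOfHormuz GulfOfOman)
(connected GulfOfOman ArabianSea)
(instance GulfOfAden Gulf)
(instance GulfOfMexico Gulf)
(instance PersianGulf Gulf)
"""


def parse_facts(text):
    """Turn each '(pred arg1 arg2 ...)' line into a tuple of symbols."""
    facts = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            facts.append(tuple(line[1:-1].split()))
    return facts


def instances_of(cls, facts):
    """All x such that (instance x cls) is among the facts, sorted."""
    return sorted(f[1] for f in facts
                  if len(f) == 3 and f[0] == "instance" and f[2] == cls)


facts = parse_facts(FACTS)
print(instances_of("Gulf", facts))
```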

3. Cyc and OpenCyc


Started in 1984 as a massive project to build up a
knowledgebase


Main knowledgebase is proprietary; subsets are open in
general, or open for research


In later years the project has extended in the directions of
natural-language dialog and machine learning


OpenCyc (2008): 47.000 concepts, 306.000 links


ResearchCyc (2006): logic base, additional software tools


Read the Wikipedia article about the Cyc project and
check out the Cyc webpage!

Cyc, examples of notation


(#$isa #$BarackObama #$UnitedStatesPresident)


(#$genls #$Tree-ThePlant #$Plant)


(#$capitalCity #$France #$Paris)


(#$implies
  (#$and
    (#$isa ?OBJ ?SUBSET)
    (#$genls ?SUBSET ?SUPERSET))
  (#$isa ?OBJ ?SUPERSET))
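
The implication above says that membership is inherited along genls links. A minimal Python sketch of applying it repeatedly until no new facts follow is given below. This is a naive forward-chaining loop written for illustration, not a description of how the Cyc inference engine works. The "#$" prefixes are dropped for readability, and the constants MyOakTree and Organism (with its genls link from Plant) are hypothetical additions to the examples on the slide.

```python
# Minimal sketch: applying the isa/genls rule until nothing new follows.
# MyOakTree and the Plant -> Organism link are hypothetical additions;
# "#$" prefixes are dropped for readability.

isa = {("BarackObama", "UnitedStatesPresident"), ("MyOakTree", "Tree-ThePlant")}
genls = {("Tree-ThePlant", "Plant"), ("Plant", "Organism")}


def close_isa(isa_facts, genls_facts):
    """Add (obj, superset) whenever (obj, subset) and (subset, superset) hold."""
    derived = set(isa_facts)
    while True:
        new = {(obj, sup)
               for obj, sub in derived
               for sub2, sup in genls_facts
               if sub == sub2} - derived
        if not new:
            return derived
        derived |= new


closed = close_isa(isa, genls)
print(sorted(closed))
```

Each pass applies the rule once to everything derived so far, so a chain of genls links of any length is eventually covered.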


Cyc knowledgebase


Knowledge elements are expressed as
propositions (predicate and argument)


Cyc provides a collection of microtheories, which
are like entityfiles in Leonardo, but consisting of
propositions and (sometimes) more complex
logical expressions

4. Protégé

Protégé is a free, open source ontology editor and a
knowledge acquisition system. Like Eclipse, Protégé is a
framework for which various other projects suggest plugins.
This application is written in Java and heavily uses Swing to
create the rather complex user interface. Protégé currently has
126.870 registered users (17.135, 3.293).


It is developed at Stanford University in cooperation with the
University of Manchester and others.

Protégé: one platform, two ontology frameworks



Protégé frames: uses Open Knowledge Base Connectivity
Protocol (OKBC). Organized around classes, instances of
classes, and slots that are specific to classes. (Classes ~
types in Leonardo, slots ~ attributes).


Protégé OWL: uses Web Ontology Language (OWL).


Protégé has one editor for each of the frameworks, both within a
common platform.


OKBC is an older approach; OWL is a more recent design
and is supported by W3C


Browse the Protégé tutorial for one or the other of these
two editors (see Protégé website; will also be linked from
the course website)

Approaches to ontologies and knowledgebases


Delivered as formal documents. The user
chooses how to integrate this information in his
or her system (SUMO)


Delivered as a monolithic knowledgebase that
can be queried over the Internet or downloaded
as a whole (Dbpedia, Freebase)


Delivered as a software system together with a
more or less modular knowledgebase that can
be accessed and modified using the system
(Protégé, Cyc, Wordnet)