Semantics - NOSC

Dictionaries, Vocabularies, Namespaces, Thesauri, Ontologies,
and all that


Rob Raskin

NASA/Jet Propulsion Laboratory

Raskin@jpl.nasa.gov


June 21, 2011

Why care about data
semantics?


Current data may need to be archived for
decades or centuries


Global change analysis requires consistent
comparisons across decades or centuries


Synonyms


multiple words, same meaning


Homonyms


same word, multiple meanings


Measurement ambiguities


Sea “surface” temperature - at what “height”?

Let’s eat, Grandma.

Let’s eat Grandma.

Time flies like an arrow.

Fruit flies like a pie.

Semantic Understanding is Difficult!

LA Times headline

“Mission accomplished. Major combat operations in Iraq have ended”
- Pres. Bush, 2003

Variable t: temperature

Variable t: time

Data quality= 5

Data quality= 3

Surface wind: measured 3 m above surface

Surface wind: measured at surface

Semantic Spectrum

[Figure: spectrum of increasing semantics. Axis labels: Vocabulary to Ontology; Human-Readable to Machine-Readable]

Catalog: list of controlled words

Informal Hierarchy: terms classified by categories (e.g. GCMD)

Formal Hierarchy: terms inherit properties/meaning of parent

Formal Hierarchy w/ Relations: relations between children defined

Scope of Representation


Parameter names


Scientific units


Spatial/temporal extent/resolution


Data quality


Data provenance


Data type


Data services

CF

What is an Ontology?



An approach to store knowledge


Machine-readable and human-readable


Provides definition of words or phrases


expressed relative to other terms


Offers shared understanding of concepts and
knowledge reuse


Provides semantics for machine-to-machine (or human-to-human) communications

Practically, an ontology is a…


Framework for classifying knowledge


Ensures there is a “place” to store
components of knowledge

Ontology Languages: RDF and OWL



W3C has adopted languages that specialize XML


Resource Description Framework (RDF)


Web Ontology Language (OWL)


Languages predefine specific tags


RDF: Class, subclass, property, subproperty


Class-property similar to Entity-Relation of DBMS theory



RDF Class and Subclass


Class


The basic element or “thing” or “noun”


Subclass


Inherits all attributes of parent class


Typically, adds Properties to distinguish subclass from its parent


Can have multiple parent classes

Cat  is a  Animal

Cat  has Legs  4
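The class/subclass idea above can be sketched in a few lines of plain Python. This is only an illustration, not RDF itself: the `OntClass` helper, the `Pet` class, and the `isAlive` and `hasOwner` properties are hypothetical names added for the example; only Cat, Animal, and hasLegs = 4 come from the slide.

```python
class OntClass:
    """A toy ontology class: a name, parent classes, and its own properties."""

    def __init__(self, name, parents=(), properties=None):
        self.name = name
        self.parents = list(parents)            # a class can have multiple parents
        self.own_properties = dict(properties or {})

    def is_a(self, other):
        """True if `other` is this class or any of its ancestors."""
        return self is other or any(p.is_a(other) for p in self.parents)

    def all_properties(self):
        """Inherited properties first, then this class's own additions."""
        merged = {}
        for parent in self.parents:
            merged.update(parent.all_properties())
        merged.update(self.own_properties)
        return merged


# "Pet", "isAlive" and "hasOwner" are assumptions made up for this sketch.
animal = OntClass("Animal", properties={"isAlive": True})
pet = OntClass("Pet", properties={"hasOwner": True})
cat = OntClass("Cat", parents=[animal, pet], properties={"hasLegs": 4})

print(cat.is_a(animal))
print(cat.all_properties())
```

Cat inherits everything from both parents and adds hasLegs to distinguish itself from them.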

RDF Property & Subproperty



Property


A “verb”


Examples:


measures, hasLocation, hasArea, northOf


Properties can have attributes:


domain, range, transitive, …


Subproperty


Inherits parent attributes
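A rough Python sketch of properties with attributes follows. It assumes a hypothetical `Property` record and a hypothetical `hasStationLocation` subproperty; the city names are illustrative data only. It shows a subproperty falling back to its parent's domain and range, and what a transitive attribute lets a reasoner infer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    name: str
    domain: Optional[str] = None        # class of allowed subjects
    range_: Optional[str] = None        # class of allowed objects
    transitive: bool = False
    parent: Optional["Property"] = None

    def inherited(self, attr):
        """Return domain or range_, falling back to the parent property when unset."""
        value = getattr(self, attr)
        if value is None and self.parent is not None:
            return self.parent.inherited(attr)
        return value


def transitive_closure(facts, prop):
    """Add every (a, prop, c) implied by (a, prop, b) and (b, prop, c)."""
    closed = set(facts)
    if not prop.transitive:
        return closed
    changed = True
    while changed:
        changed = False
        links = [(s, o) for (s, p, o) in closed if p == prop.name]
        for a, b in links:
            for c, d in links:
                if b == c and (a, prop.name, d) not in closed:
                    closed.add((a, prop.name, d))
                    changed = True
    return closed


has_location = Property("hasLocation", domain="Feature", range_="Place")
has_station_location = Property("hasStationLocation", parent=has_location)  # hypothetical
north_of = Property("northOf", domain="Place", range_="Place", transitive=True)

facts = {("Seattle", "northOf", "Portland"), ("Portland", "northOf", "Sacramento")}
closed = transitive_closure(facts, north_of)
print(sorted(closed))
print(has_station_location.inherited("domain"))
```

Only domain and range are inherited in this sketch; whether other attributes carry over to a subproperty depends on the ontology language, so that is left out.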

OWL Language



Extends RDF to predefine further tags


cardinality


transitive relations


inverse relations


same as, different from


union, intersection


domain, range


Import (from one ontology to another, to enable sharing and
reuse of the work of others)




OWL Ontology Example


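<!-- Fragment: assumes the usual rdf:, rdfs: and owl: namespace declarations on the enclosing rdf:RDF element -->
<owl:Class rdf:ID="WaterPollution">
  <rdfs:subClassOf rdf:resource="#Pollution"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasSubstance"/>
      <owl:allValuesFrom rdf:resource="#Water"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>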

Statements about Statements


OWL allows us to make statements about statements


Degree of belief


Timestamps


Provenance / Lineage


Probability / Uncertainty


Security issues


Author / Source / Community


Community dialect




(Observed Feature  is a  Corn Crop)  has Probability  0.75

(Observed Feature  is a  Corn Crop)  has Source  Landsat
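One way to picture a statement about a statement is a triple whose subject is itself a triple. The sketch below reads the diagram as "Observed Feature is a Corn Crop" annotated with a probability and a source; that reading, the `Statement` type, and the `about` helper are assumptions for illustration, not OWL syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str


# The base claim, and two statements made about that claim.
claim = Statement("Observed Feature", "is a", "Corn Crop")
annotations = [
    (claim, "has Probability", 0.75),
    (claim, "has Source", "Landsat"),
]


def about(statement, annotations):
    """Collect everything that has been said about one statement."""
    return {pred: value for (stmt, pred, value) in annotations if stmt == statement}


print(about(claim, annotations))
```

Timestamps, provenance, or degree of belief would be attached the same way, as further annotations on the claim.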



Ontologies provide a common
namespace


Documents, web pages, data, people, and
other resources can be mapped/
categorized to this namespace


Anybody can create or extend the
namespace


Why are Ontologies Useful? (1)


Dictionary


Concepts in the namespace not just “listed” (a
taxonomy), but “defined” (in terms of others)


Concepts defined via specializations of broader concepts, with properties that distinguish each child from the broader parent concept


Reductionist approach of science


Arbitrary levels of specialization are possible


As with Library of Congress and Dewey Decimal
numbering systems

Why are Ontologies Useful? (2)





Disambiguation


Reduces semantic mismatch


Synonym support (multiple terms with
same meaning)


label available to indicate preferred term for
each community


Homonym support (multiple meanings of
same term)


separate namespaces (President:Bush vs Plant:Bush)
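A small Python sketch of how namespaces and labels might handle homonyms and synonyms. The `CONCEPTS` table, the "Shrub" synonym, and the community names are hypothetical; only President:Bush and Plant:Bush come from the slide.

```python
# Qualified name -> labels and per-community preferred labels (illustrative data).
CONCEPTS = {
    "President:Bush": {"labels": {"Bush"}, "preferred": {}},
    "Plant:Bush": {
        "labels": {"Bush", "Shrub"},                       # synonyms
        "preferred": {"botany": "Shrub", "gardening": "Bush"},
    },
}


def lookup(label, namespace=None):
    """Return qualified names carrying `label`, optionally limited to one namespace."""
    return sorted(
        name
        for name, info in CONCEPTS.items()
        if label in info["labels"]
        and (namespace is None or name.startswith(namespace + ":"))
    )


def preferred_label(name, community):
    """Preferred term for a community, defaulting to the local part of the name."""
    return CONCEPTS[name]["preferred"].get(community, name.split(":", 1)[1])


print(lookup("Bush"))              # ambiguous: two concepts share the word
print(lookup("Bush", "Plant"))     # the namespace resolves the homonym
print(preferred_label("Plant:Bush", "botany"))
```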


Why are Ontologies Useful? (3)



Why are Ontologies Useful? (4)


Machine readable


Ontologies are generally stored in a format
(XML) that is readable by both humans and
computers


Computer accessibility enables automated
reasoning


Knowledge retention


Corporations use knowledge management to
ensure institutional memory over time, as
personnel come and go


Climate disciplines can do the same!


Facts/data can be represented and related in a
consistent manner


Common sense knowledge is captured


Instrument characteristics


Why are Ontologies Useful? (5)

Ontology Representation (1):

Knowledge Base of Triples

Noun-Verb-Noun representation

Parent-child relations:

Flood  is a  Weather Phenomenon

GeoTIFF  is a  File Format

Soil Type  is a  Physical Property

Pacific Ocean  is a  Ocean

Or create your own relations:

Ocean  has substance  Water

Sensor  measures  Temperature
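The triples above can be held in an ordinary list and queried by pattern. This is a minimal sketch, not a real triple store: the `match` and `inherited_values` helpers are hypothetical, and letting a property flow down "is a" links is an assumption made here for illustration.

```python
TRIPLES = [
    ("Flood", "is a", "Weather Phenomenon"),
    ("GeoTIFF", "is a", "File Format"),
    ("Soil Type", "is a", "Physical Property"),
    ("Pacific Ocean", "is a", "Ocean"),
    ("Ocean", "has substance", "Water"),
    ("Sensor", "measures", "Temperature"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def inherited_values(triples, subject, predicate):
    """Values of `predicate` for `subject`, following 'is a' links upward."""
    values, seen, frontier = [], set(), [subject]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        values += [o for (_, _, o) in match(triples, node, predicate)]
        frontier += [o for (_, _, o) in match(triples, node, "is a")]
    return values


print(match(TRIPLES, predicate="measures"))
print(inherited_values(TRIPLES, "Pacific Ocean", "has substance"))
```

The second query finds Water for Pacific Ocean even though no triple states it directly, which is the kind of inference a knowledge base of triples makes possible.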

Ontology Representation (2): Visual


Ontology Representation (3):
XML, RDF, and OWL



W3C has adopted XML-based standard ontology languages


Resource Description Framework (RDF)


Web Ontology Language (OWL)


Languages predefine specific tags


RDF: Class, subclass, property, subproperty, …


OWL: Extends RDF to predefine further tags such as cardinality


Three flavors of OWL (Lite, DL, and Full)


Use of standard languages makes it easy to extend
(specialize) work of others

Global Warming Query in the
Semantic Web

Find data which demonstrates global warming at high latitudes
during summertime and plot warming rate.



Extract information from the use case - encode knowledge


Translate this into a complete query for data - inference and integration of data from instruments, indices and models


“Global warming”= Trend of increasing temperature over large
spatial scales

“High latitude”= |Latitude| > 60 degrees

“Summertime”= June-Aug (NH) and Jan-Mar (SH)

“Find data”= Locate datasets using catalogs, then access and read them

“Plot warming rate”= Display temperature vs time
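The encoded definitions above can be turned into a small filter-and-fit routine. This is a sketch under stated assumptions: the record layout, the function names, and the synthetic data (built with a known trend of 0.1 degrees per year) are all invented for illustration, and the month ranges are taken from the slide as written. Plotting is replaced by a least-squares slope so the example needs no extra libraries.

```python
def is_high_latitude(lat):
    """'High latitude' = |latitude| > 60 degrees."""
    return abs(lat) > 60


def is_summertime(lat, month):
    """'Summertime' = June-Aug (NH) and Jan-Mar (SH), as defined on the slide."""
    return month in ((6, 7, 8) if lat >= 0 else (1, 2, 3))


def decimal_year(year, month):
    return year + (month - 0.5) / 12


def select(records):
    """Keep only high-latitude summertime records."""
    return [
        r for r in records
        if is_high_latitude(r["lat"]) and is_summertime(r["lat"], r["month"])
    ]


def warming_rate(records):
    """Least-squares slope of temperature vs. time (degrees per year)."""
    xs = [decimal_year(r["year"], r["month"]) for r in records]
    ys = [r["temp"] for r in records]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def synthetic(lat, year, month, trend=0.1):
    """Made-up record whose temperature rises by `trend` degrees per year."""
    t = decimal_year(year, month)
    return {"lat": lat, "year": year, "month": month, "temp": 5.0 + trend * (t - 2000)}


records = []
for year in range(2000, 2010):
    records.append(synthetic(70.0, year, 7))    # NH high latitude, summer: kept
    records.append(synthetic(-75.0, year, 2))   # SH high latitude, summer: kept
    records.append(synthetic(70.0, year, 1))    # NH winter: dropped
    records.append(synthetic(10.0, year, 7))    # low latitude: dropped

selected = select(records)
rate = warming_rate(selected)
print(len(selected), round(rate, 3))
```

In a real workflow the "find data" step would query catalogs and read datasets; here it is stood in for by the in-memory `records` list.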

Semantic Web for Earth and
Environmental Terminology
(SWEET)


Concept space written in OWL


Initial focus to assist search for data resources


Funded by NASA


Later focus to serve as community standard (upper-level Earth system science ontology)


Enables scalable classification of Earth system science and associated data concepts


Specialists can further refine SWEET concepts


SWEET 2.2 has 6600 concepts in 200 modular ontologies


http://sweet.jpl.nasa.gov


SWEET Top-Level View

CF vs SWEET Representation

CF (traditional single-attribute parameter name):

tendency_of_mole_concentration_of_dissolved_inorganic_phosphorus_in_sea_water_due_to_biological_processes


SWEET (multi-attribute parameter name):


Quantity= mole_concentration


Transformation= tendency


State= dissolved, inorganic


Substance= phosphorus


Medium= sea_water


Process= biological_processes
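To make the contrast concrete, here is a hedged Python sketch of the multi-attribute form. The `Parameter` record and its `to_cf_style_name` method are hypothetical, and the joining template is written to reproduce this one name only; it is not a general rule for building CF names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Parameter:
    quantity: str
    transformation: str
    state: Tuple[str, ...]
    substance: str
    medium: str
    process: str

    def to_cf_style_name(self):
        """Flatten the attributes into one long name (template fits this example only)."""
        return (
            f"{self.transformation}_of_{self.quantity}_of_"
            f"{'_'.join(self.state)}_{self.substance}_in_{self.medium}"
            f"_due_to_{self.process}"
        )


param = Parameter(
    quantity="mole_concentration",
    transformation="tendency",
    state=("dissolved", "inorganic"),
    substance="phosphorus",
    medium="sea_water",
    process="biological_processes",
)

print(param.to_cf_style_name())
# With separate attributes, a search can target one facet directly:
print(param.medium == "sea_water" and "dissolved" in param.state)
```

Keeping the attributes separate lets a search match on, say, the medium or the substance without parsing a long underscore-delimited string.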

SWEET Data Ontology


Dataset characteristics


Format, data model, dimensions, …


Provenance


Source, processing history, …


Parameters


Scale factors, offsets, …


Data services


Subsetting, reprojection, …


Quality measures


Special values



Missing, land, sea, ice, ...

Best Practices


Keep ontologies small, modular


Use higher level ontologies where possible


Identify hierarchy of concept spaces


Try to keep dependencies unidirectional


Gain community buy-in


Involve respected leaders

Ontology Development Tools:
CMAP


Free, downloadable tool for knowledge
representation and ontology
development


Visual language with import/export to OWL


Supports subset of OWL language


http://cmap.ihmc.us/coe

Resources


ESIP Semantic Web Cluster


Monthly telecons


Tutorials


Ontology development


Datatypes


data services


SWEET


http://sweet.jpl.nasa.gov



Rob Raskin raskin@jpl.nasa.gov