benno_GO-ESSP200809 - NOAA


22 Oct 2013 (3 years and 7 months ago)


M. Benno Blumenthal

International Research Institute for Climate and Society

http://iridl.ldeo.columbia.edu/ontologies/

Connecting netcdf/CF to a semantic framework

RDF/OWL and earth science metadata

The standards underlying the Semantic Web -- Resource Description Framework (RDF) and Web Ontology Language (OWL), among others -- show great promise in addressing some of the basic problems in earth science metadata. In particular, they provide a single framework that allows us to describe datasets according to multiple standards, creating a more complete description than any single standard can support, and avoiding the difficult problem of creating a super-standard that can describe everything about everything.


RDF is not a killer app

Resource Description Framework (RDF) is a framework for writing down relationships in a reusable way: a semantic framework.

A lot of CF is not written down in a usable way, e.g. relationships between different values of standard_names, or how data representations in CF correspond to concepts or other standards.

It is past time to fix that.

Some would prefer Java, some prefer English.

Different Representations of The CF Standard

[Diagram labels: CF; English; Java; RDF/OWL]

English - gloriously vague and flexible

Java - complete implementation makes a great black-box, an API becomes yet-another-standard

RDF/OWL - can facilitate creating/enhancing both the English and Java versions (as well as other programming languages)

CF metadata in a semantic framework

A literal level, which explains which attributes are available to be attached to datasets/variables.

A more semantic level, which gives explicit expression to concepts like Coordinate and Non-Coordinate variables, and how a Non-Coordinate Variable can be geo-located.


Semantic interoperability

Writing down the CF standard on a semantic level then allows interoperability with other standards, e.g. other ways of marking geolocated Non-Coordinate variables.

Additional issues

How to less ambiguously tag metadata in netcdf files, so that software can more easily determine which attribute belongs to which metadata standard

How to better register netcdf metadata standards in general

How to better register CF concepts

so that interoperability (or indeed operability) can occur.

Why RDF?

Make implicit semantics explicit

Web-based system for interoperating semantics

Decontextualizes the information, facilitating reuse

RDF/OWL is an emerging technology, so tools are being built that help solve the semantic problems in handling data

Standard Metadata

[Diagram labels: Users; Datasets; Tools; Standard Metadata Schema/Data Services]

Many Data Communities

[Diagram: five communities, each labeled with its own Tools, Users, Datasets, and Standard Metadata Schema]

Super Schema

[Diagram: five communities (each with Tools, Users, Datasets, and a Standard Metadata Schema) and a single box labeled "Standard metadata schema"]

One take on semantic interoperability

`When I use a word,' Humpty Dumpty said in rather a scornful tone, `it means just what I choose it to mean -- neither more nor less.'

`The question is,' said Alice, `whether you CAN make words mean so many different things.'

`The question is,' said Humpty Dumpty, `which is to be master -- that's all.'

Through the Looking Glass (And What Alice Found There), Lewis Carroll. Published: 1871. Type(s): Novels, Young Readers, Fantasy. Source: Wikisource


Super Schema: direct

[Diagram: five communities (each with Tools, Users, Datasets, and a Standard Metadata Schema) and a box labeled "Standard metadata schema/data service"]

Flaws

A lot of work

Super Schema/Service is the Lowest-Common-Denominator, so you end up saying less-and-less about more-and-more.

Science keeps evolving, so that standards either fall behind or constantly change

RDF Standard Data Model Exchange

[Diagram: five communities (each with Tools, Users, Datasets, and a Standard Metadata Schema), boxes labeled "Standard metadata schema", and several RDF labels]

RDF Data Model Exchange

[Diagram: five communities (each with Tools, Users, Datasets, and a Standard Metadata Schema), each accompanied by RDF labels]

Why is this better?

Maps the original dataset metadata into a standard format that can be transported and manipulated

Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but

When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping

Can use enhanced mappings between models that have common concepts beyond the least-common-denominator

EASIER - tools to enhance the mapping process, mappings build on other mappings

RDF Architecture

[Diagram labels: RDF (repeated for many sources); Virtual (derived) RDF; queries]

Triplets of

Subject

Property (or Predicate)

Object

URIs identify things, i.e. most of the above

Namespaces are used as a convenient shorthand for the URIs

URIs do not need to resolve

RDF: framework for writing connections
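As a rough illustration of the triple model and of namespaces as URI shorthand, here is a minimal sketch in plain Python (no RDF library). The `dc` namespace is the standard Dublin Core elements URI; the `iridl` namespace URI and the `example.org` dataset URIs are assumptions made up for this example, not taken from the slides.

```python
# Namespace prefixes are shorthand for full URIs.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    # Assumed URI for the iridl prefix (hypothetical, for illustration only).
    "iridl": "http://iridl.ldeo.columbia.edu/ontologies/iridl.owl#",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# Hypothetical URIs for the dataset and one of its components.
WOA = "http://example.org/data/WOA"
GRID = "http://example.org/data/WOA/Grid-1x1"

# Each statement is a (subject, property, object) triple.
triples = [
    (WOA, expand("dc:title"), "NOAA NODC WOA01"),
    (WOA, expand("iridl:isContainerOf"), GRID),
]


def objects(subject: str, prop: str) -> list:
    """Return every object asserted for the given subject and property."""
    return [o for s, p, o in triples if s == subject and p == prop]
```

Nothing here requires the URIs to resolve: they are only used as identifiers that can be compared for equality.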

Datatype Properties

{WOA} dc:title “NOAA NODC WOA01”

{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m,5500 m]; Time: [Jan,Dec]; monthly”

Object Properties

{WOA} iridl:isContainerOf {Grid-1x1},

{Grid-1x1} iridl:isContainerOf {Monthly}

[Figure: WOA01 diagram]

Standard Properties

{WOA} dcterm:hasPart {Grid-1x1},

{Grid-1x1} dcterm:hasPart {MONTHLY}

Alternatively

{WOA} iridl:isContainerOf {Grid-1x1},

{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}

{SST} rdf:type {cfatt:non_coordinate_variable},

{SST} cfobj:standard_name {cf:sea_surface_temperature},

{SST} netcdf:hasDimension {longitude}

Data Structures in RDF

Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. the vague meaning of nesting is made explicit

Properties can also be related, since they are objects too

Virtual Triples

Use Conventions to connect concepts to established sets of concepts

Generate additional “virtual” triples from the original set and semantics

RDFS - some property/class semantics

OWL - additional property/class semantics: more sophisticated (ontological) relationships

SWRL - rules for constructing virtual triples
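One way to picture virtual triples is the RDFS rule for rdfs:subPropertyOf: if a property is declared a sub-property of another, every statement made with the first also holds for the second. The sketch below applies only that one rule to the WOA statements from the earlier slides; it uses plain tuples with prefixed names as strings and is not a full RDFS reasoner.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Asserted triples, written with prefixed names for brevity.
triples = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("Grid-1x1", "iridl:isContainerOf", "Monthly"),
    ("iridl:isContainerOf", SUBPROPERTY_OF, "dcterm:hasPart"),
}


def infer_subproperty(asserted):
    """Return the asserted triples plus those implied by rdfs:subPropertyOf."""
    closed = set(asserted)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        parents = {(s, o) for s, p, o in closed if p == SUBPROPERTY_OF}
        for s, p, o in list(closed):
            for child, parent in parents:
                if p == child and (s, parent, o) not in closed:
                    closed.add((s, parent, o))
                    changed = True
    return closed


# The "virtual" triples are the ones that were derived, not asserted.
virtual = infer_subproperty(triples) - triples
```

With these inputs, the derived set contains the two dcterm:hasPart statements, so a tool that only understands dcterm:hasPart can still navigate data described with iridl:isContainerOf.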

Define terms


Attribute Ontology


Object Ontology


Term Ontology


These are different ways RDF can be used

Attribute Ontology

Subjects are the only type-object

Predicates are “attributes”

Objects are datatype

Isomorphic to simple data tables

Isomorphic to netcdf attributes of datasets

Some faceted browsers: predicate = facet, e.g. Longwell from MIT

cf-att

CF transcribed

cf-att with some attributes

RDF helps decontextualize

{sst variable} cfatt:standard_name “sea_surface_temperature”

Where cfatt = the cfatt URI prefix, temporarily http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#

Put data in netcdf file

Set conventions attribute to “CF-1.0”

Set standard_name of variable “sst” to “sea_surface_temperature”

Current system requires data in a netcdf file for CF to be understood
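To make the decontextualizing step concrete, the following sketch turns netcdf-style attributes into standalone triples that no longer depend on the file they came from. The dictionary stands in for metadata read from a netcdf file; the file URI, the `units` value, and the choice to map every attribute onto the cfatt prefix are assumptions for illustration, not the actual cf-att ontology.

```python
# cfatt prefix as given on the slide.
CFATT = "http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#"

# Stand-in for metadata read from a netcdf file (hypothetical contents).
dataset = {
    "uri": "http://example.org/data/sst.nc",
    "attributes": {"Conventions": "CF-1.0"},
    "variables": {
        "sst": {
            "standard_name": "sea_surface_temperature",
            "units": "degree_Celsius",
        }
    },
}


def to_triples(ds):
    """Turn netcdf-style attributes into (subject, property, literal) triples."""
    # Global attributes describe the dataset itself.
    out = [(ds["uri"], CFATT + name, value)
           for name, value in ds["attributes"].items()]
    # Variable attributes describe a variable, identified as <dataset>#<name>.
    for var, attrs in ds["variables"].items():
        subject = f'{ds["uri"]}#{var}'
        out += [(subject, CFATT + name, value) for name, value in attrs.items()]
    return out
```

Once written this way, the statement about `sst` can be stored, exchanged, and queried without opening the netcdf file.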

Object Ontology

Objects are object-type

Isomorphic to “belongs to”

Isomorphic to multiple data tables connected by keys

Express the concept behind netcdf attributes which name variables

Concepts as objects can be cross-walked

Concepts as objects can be interrelated

Example: controlled vocabulary

{variable} cfatt:standard_name {“string”}

Where string has to belong to a list of possibilities.

{variable} cfobj:standard_name {stdnam}

Where stdnam is an individual of the class cfobj:StandardName



Example: controlled vocabulary

Bidirectional crosswalk between the two is somewhat trivial, which means all my objects will have both cfatt:standard_name and cfobj:standard_name


Example: controlled vocabulary

If I am writing software to read/write netcdf files, I use the cfatt ontology and in particular cfatt:standard_name

If I am making connections/cross-walks to other variable naming standards, I use cfobj:standard_name
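The crosswalk between the string form and the object form can be sketched as a pair of rewriting rules. This is only an illustration with prefixed names as strings: it assumes that the object for a standard name is written as `cf:` followed by the string, as in the {cf:sea_surface_temperature} example earlier, and it does not check the string against the controlled list.

```python
CF = "cf:"  # prefix used for standard-name individuals in the slides


def add_crosswalk(triples):
    """Give every subject both the cfatt (string) and cfobj (object) forms."""
    out = set(triples)
    for s, p, o in triples:
        if p == "cfatt:standard_name":
            # String literal -> individual of cfobj:StandardName
            out.add((s, "cfobj:standard_name", CF + o))
        elif p == "cfobj:standard_name" and o.startswith(CF):
            # Individual -> string literal
            out.add((s, "cfatt:standard_name", o[len(CF):]))
    return out
```

Software that reads and writes netcdf files can keep using the string form, while cross-walks to other naming standards attach to the object form.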


Some cf-obj classes

[Figures: cf-obj class diagrams]

Term Ontology

Concepts as individuals

Simple Knowledge Organization System (SKOS) is a prime example

standard_name as an object would be such a concept





Nuanced tagging

Concepts as objects can be interrelated: specific terms imply broader terms

An object ends up being tagged with terms ranging from general to specific.

Search can then be nuanced

Tagging can proceed in the absence of perfect information

Partial information can be written down
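A small sketch of how "specific terms imply broader terms" can drive search: each item is tagged only with its most specific terms, and the broader terms are derived by following the implication links. The term hierarchy and item names below are invented for illustration.

```python
# Hypothetical hierarchy: each term directly implies some broader terms.
DIRECTLY_IMPLIES = {
    "sea_surface_temperature": ["temperature", "ocean"],
    "temperature": ["physical_quantity"],
}


def all_terms(term):
    """Return the term together with every broader term it implies."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(DIRECTLY_IMPLIES.get(t, []))
    return seen


def search(items, wanted):
    """items maps a name to its specific tags; match on implied terms too."""
    return sorted(
        name for name, tags in items.items()
        if any(wanted in all_terms(t) for t in tags)
    )
```

An item tagged only with a general term (because nothing more specific is known) is still found by a general search, which is how tagging can proceed with partial information.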

CF standard names

.. I would add that standard names alone (in the cases where a standard name is sufficient) have the same kind of role as common concepts. The definitions of standard names allow some vagueness, though some are more precise than others, because their role is to indicate which things should validly be regarded as the same thing by visualisation and processing software

Jonathan Gregory 04/22/08 23:22:01 http://cf-pcmdi.llnl.gov/trac/ticket/24


CF standard regions

I don't think the regions can be exactly standardised, because part of the reason for having names is in order to be somewhat "vague". Just as a common standard name is given to quantities from different data sources when those quantities are regarded as comparable, the same standard region name would be given to data which represent the same region in a way which is regarded as comparable. For instance, different GCMs do not have exactly the same shape for the Atlantic Ocean, but Atlantic meridional overturning streamfunctions are calculated from each model, and these are regarded as comparable.

Jonathan Gregory, cf-metadata@cgd.ucar.edu, Wed, Aug 27, 2008 at 5:11 AM


I.E.

In other words, the broader the standard, the more vague.

On the other hand, we can say something. In fact, we can say quite a lot about how these terms interrelate, and how they relate to less broad, less vague systems.

What we can do easily

Establish URIs for the concepts in CF (standard_names, standard_regions, the attributes themselves) so that statements can be written about them in XML and RDF.

Establish a machine-readable version of http://www.unidata.ucar.edu/software/netcdf/conventions.html so that we can write code to extract (decontextualize) metadata from netcdf files.

Start writing down the relationships between the concepts.

Agree on a cf-att ontology, and work on cf-obj so that we can connect with other conventions.

Set a convention for explicitly labeling netcdf attributes with their convention, i.e. namespace labels, so that the process of figuring out which convention covers which attribute is purely grammatical.

Set a convention for referring to a URI-identified concept in a netcdf file.
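If attribute names carried a namespace label, sorting them by convention would need no knowledge of any convention's contents. The sketch below assumes a purely hypothetical `prefix:name` syntax and a default `nc` prefix for unlabeled attributes; no such labeling convention is defined by netcdf or CF, so this only shows what "purely grammatical" could mean in practice.

```python
def split_attribute(name, default="nc"):
    """Split 'prefix:local' into (prefix, local); unlabeled names get the default."""
    prefix, sep, local = name.partition(":")
    return (prefix, local) if sep else (default, name)


def group_by_convention(attrs):
    """Group attribute values by the convention prefix found in their names."""
    grouped = {}
    for name, value in attrs.items():
        prefix, local = split_attribute(name)
        grouped.setdefault(prefix, {})[local] = value
    return grouped
```

For example, `group_by_convention({"cf:standard_name": "sea_surface_temperature", "long_name": "SST"})` separates the CF attribute from the unlabeled one using only the attribute names.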

Search Interface


Items (datasets/maps)



Terms


Facets


Taxa

Search Interface Semantic API

{item} dc:title dc:description rss:link iridl:icon


dcterm:isPartOf {item2}


dcterm:isReplacedBy {item2}


{item} trm:isDescribedBy {term}


{term} a {facet} of {taxa} of {trm:Term},

{facet} a {trm:Facet}, {taxa} a {trm:Taxa},

{term} trm:directlyImplies {term2}
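The statements above are enough to drive a simple faceted search: select the items described by every chosen term, then count the remaining terms per facet. The sketch below uses plain tuples; the item names, term names, and the term-to-facet table are invented for illustration and are not the actual search interface implementation.

```python
from collections import Counter

IS_DESCRIBED_BY = "trm:isDescribedBy"

# Hypothetical item/term statements.
triples = [
    ("item1", IS_DESCRIBED_BY, "term:sst"),
    ("item2", IS_DESCRIBED_BY, "term:sst"),
    ("item2", IS_DESCRIBED_BY, "term:monthly"),
]

# Hypothetical assignment of terms to facets.
FACET_OF = {"term:sst": "facet:variable", "term:monthly": "facet:resolution"}


def items_with(terms):
    """Return the items described by every selected term."""
    by_item = {}
    for s, p, o in triples:
        if p == IS_DESCRIBED_BY:
            by_item.setdefault(s, set()).add(o)
    return sorted(i for i, ts in by_item.items() if set(terms) <= ts)


def facet_counts(items):
    """Count, per (facet, term), how many of the given items carry that term."""
    counts = Counter()
    for s, p, o in triples:
        if p == IS_DESCRIBED_BY and s in items:
            counts[(FACET_OF[o], o)] += 1
    return counts
```

Narrowing the selection and recomputing the counts gives the familiar faceted-browsing loop.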

Faceted Search w/Queries

http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...

RDF Architecture

[Diagram labels: RDF (repeated for many sources); Virtual (derived) RDF; queries]

IRI RDF Architecture

[Diagram labels: Data Servers; Ontologies; MMI; JPL; Standards Organizations; Start Point; RDF Crawler; RDFS Semantics; Owl Semantics; SWRL Rules; SeRQL CONSTRUCT; Search Queries; Location Canonicalizer; Time Canonicalizer; Sesame; Search Interface; bibliography]

Cast of Characters

NC - netcdf data file format

CF - Climate and Forecast metadata convention for netcdf

SWEET - Semantic Web for Earth and Environmental Terminology (OWL Ontology)

IRIDL - IRI Data Library

[Diagram labels: CF attributes; SWEET Ontologies (OWL); Search Terms; CF Standard Names (RDF object); IRIDL Terms; NC basic attributes; IRIDL attributes/objects; SWEET as Terms; CF Standard Names As Terms; Gazetteer Terms; CF data objects; Location]

Thoughts

Pure RDF framework seems currently viable for a moderate collection of data

Potential for making a lot of implicit data conventions explicit

Explicit conventions can improve interoperability

Simple RDF concepts can greatly impact searches

Some Thoughts

Reproducibility implies complete metadata

Non-standard complete metadata just needs to be mapped to more standard schemes

A multiple-scheme system like RDF retains reproducibility even with partial mapping to standards

Should be able to measure the misfit - find the space of the “unexplained” - guidance for developing standards.

Stovepipe Conventions

Fixed Schema

Agreed upon metadata domain

Agreed upon data domain

Designed to be a partial solution

General server software needs to decide whether data legitimately fits the standard

User contemplates bash-to-fit