The Semantic eScience Framework (SESF): toward a configurable data application environment?

Uploaded 4 Dec 2013

Peter Fox (PI)

Deborah McGuinness (co-PI)

Jim Hendler (co-PI)

Tetherless World Constellation

http://tw.rpi.edu



Themes


Future Web


Web Science


Policy


Social

Xinformatics


Data Science


Semantic eScience


Data Frameworks

Semantic Foundations


Knowledge Provenance


Ontology Engineering Environments


Inference, Trust

Hendler

Fox

McGuinness

+ ~30 Post-doc, Staff, Grad, UGrad

Outline


Background


Why a framework and not a system?


How this came about


Lineage of this effort


Use cases, design and development method(s)


Some detail (perhaps more than you want)…*


Moving from core semantics to framework semantics
and configuration


New use cases


Open source: ontologies and software?


A role for participation


Background

Scientists should be able to access a global, distributed
knowledge base of scientific data that:


appears to be integrated


appears to be locally available

But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…


Frameworks vs. Systems


Prior to 2005, we built systems


Rough definitions


Systems have very well-defined entry and exit
points. A user tends to know when they are using
one. Options for extensions are limited and
usually require engineering


Frameworks have many entry and use points. A
user often does not know when they are using
one. Extension points are part of the design


You don’t have to agree; this was our view




Origins


In 2004 we started a virtual observatory project
based on semantic technologies


We needed implementations, and we achieved that


Use case driven: in solar and solar-terrestrial
physics with an emphasis on instrument-based
measurements and real data pipelines


We also needed semantics for


Data integration (volcano-climate)


Science provenance (solar physics, air quality)


Data Mining (atmospheric physics)


Late this year


We were funded to take our developments into a
configurable semantic data framework (ARRA)


Thus we are pushing on ontology languages and
tools in new ways:


OWL 2


RL in particular


Annotations


Property chaining


Query (SPARQL)


Rules (RIF and SWRL)


Non-specialist use cases


Lineage (acronym alert)


Virtual Observatories


VSTO, OOI, BCO-DMO


Provenance


SPCDIS, MDSA and others


Data Integration - SESDI and others


Data Mining - SAM


Faceted Search - Many


Leads to a need for


Semantic toolkit and applications that serve any
discipline, purpose but are …


Configurable … (semantics played a big role in the
80’s)


Semantic Web Methodology and
Technology Development Process

[Figure: iterative development cycle. Stages: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation]



Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.



Extract information from the use-case - encode knowledge


Translate this into a complete query for data - inference
and integration of data from instruments, indices and
models


Provide semantically-enabled, smart data query
services via a SOAP web service for the Virtual
Ionosphere-Thermosphere-Mesosphere Observatory that
retrieve data, filtered by constraints on Instrument,
Date-Time, and Parameter in any order and with
constraints included in any combination.
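As a rough illustration of "constraints in any order and in any combination", here is a minimal Python sketch. It is not the VSTO service interface: the `Record` fields, the `find_records` function and the catalog entries are all hypothetical, invented only to show optional, independent constraints on instrument, date-time and parameter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One catalog entry; the field names are illustrative assumptions."""
    instrument: str
    parameter: str
    day: date


def find_records(
    records: Iterable[Record],
    instrument: Optional[str] = None,
    parameter: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Record]:
    """Keep records that satisfy every supplied constraint; None means unconstrained."""
    matches = []
    for rec in records:
        if instrument is not None and rec.instrument != instrument:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        if start is not None and rec.day < start:
            continue
        if end is not None and rec.day > end:
            continue
        matches.append(rec)
    return matches


# Hypothetical catalog entries, invented for this sketch.
CATALOG = [
    Record("interferometer_a", "neutral_temperature", date(2005, 1, 10)),
    Record("interferometer_a", "neutral_wind", date(2005, 3, 2)),
    Record("radar_b", "neutral_temperature", date(2006, 7, 21)),
]

# Constraints can be given alone, together, and in any keyword order.
by_parameter = find_records(CATALOG, parameter="neutral_temperature")
by_both = find_records(CATALOG, start=date(2006, 1, 1), parameter="neutral_temperature")
```

Because every constraint defaults to "unconstrained", a caller can start from any of the three entry points and narrow the result later.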


Fox - APAC 2007, Driving e-research: Grids and Semantics


[Figure: virtual observatory architecture. Components: DB 1, DB 2, DB 3, … DB n; VO API; Web Serv.; VO Portal; semantic mediation layer (VSTO, low level); semantic mediation layer (mid-upper level); education, clearinghouses, other services, disciplines, etc. Annotations: metadata, schema, data; query, access and use of data; semantic query, hypothesis and inference; semantic interoperability; "added value" marked at four steps]

Mediation Layer


Ontology - capturing concepts of Parameters,
Instruments, Date/Time, Data Product (and
associated classes, properties) and Service Classes


Maps queries to underlying data


Generates access requests for metadata, data


Allows queries, reasoning, analysis, new hypothesis
generation, testing, explanation, etc.
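The "maps queries to underlying data" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names in `SUBCLASS_OF` and the source identifiers in `DATA_SOURCES` are hypothetical and are not taken from the VSTO ontology. The point is that a query phrased against a general class is expanded through the hierarchy into access requests for concrete sources.

```python
from typing import Dict, List, Optional

# Hypothetical instrument hierarchy: child class -> parent class.
SUBCLASS_OF: Dict[str, str] = {
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
    "Interferometer": "OpticalInstrument",
    "Photometer": "OpticalInstrument",
}

# Hypothetical mapping from instrument classes to underlying data sources.
DATA_SOURCES: Dict[str, List[str]] = {
    "Interferometer": ["db1/interferometer"],
    "Photometer": ["db2/photometer"],
    "Radar": ["db3/radar"],
}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def access_requests(instrument_class: str) -> List[str]:
    """Expand a query on a class into the sources registered under it."""
    return sorted(
        source
        for cls, sources in DATA_SOURCES.items()
        if is_a(cls, instrument_class)
        for source in sources
    )
```

For example, `access_requests("OpticalInstrument")` reaches both optical sources without the user naming either instrument.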




Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

Semantic filtering by domain or instrument hierarchy


Inferred plot type and return required axes data



VSTO - semantics and ontologies in an operational environment:

www.vsto.org


Web Service


Data integration Use Case


Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause


SESDI:
A Better Way to Access Data

The Problem

Scientists only use data from a single instrument because it is difficult to access,
process, and understand data from multiple instruments.

Typical data queries might be:

“Give me the temperature, pressure, and water vapor from the AIRS instrument from
Jan 2005 to Jan 2008”

“Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”

A Solution

Using a simple process, SESDI allows data from various sources to be registered in
an ontology so that it can be easily accessed and understood. Scientists can use
only the ontology components that relate to their data. An SESDI query might
look like:

“Show all areas in California where sulfur dioxide (SO2) levels were above normal
between Jan 2000 and Jan 2007”

This query will pull data from all available sources registered in the ontology and
allow seamless data fusion. Because the query is measurement related, scientists
do not need to understand the details of the instruments and data types.
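A minimal Python sketch of this idea follows. It is not SESDI code: the `Registry` class, the source names, the region names and all values are hypothetical, made up for illustration. It shows only the shape of the approach, where sources are registered under the quantity they measure and a measurement-oriented query fans out to every registered source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Registry:
    """Toy stand-in for registering data sources against measured quantities."""

    def __init__(self) -> None:
        self._by_quantity: Dict[str, List[Tuple[str, Dict[str, float]]]] = defaultdict(list)

    def register(self, source: str, quantity: str, values: Dict[str, float]) -> None:
        """Record that `source` provides `quantity`, here as region -> value."""
        self._by_quantity[quantity].append((source, values))

    def above(self, quantity: str, threshold: float) -> List[Tuple[str, str, float]]:
        """Return (region, source, value) from all sources where value > threshold."""
        hits = []
        for source, values in self._by_quantity.get(quantity, []):
            for region, value in values.items():
                if value > threshold:
                    hits.append((region, source, value))
        return sorted(hits)


# Hypothetical sources and values; no real instrument data is implied.
registry = Registry()
registry.register("source_a", "SO2", {"region_1": 4.0, "region_2": 9.5})
registry.register("source_b", "SO2", {"region_2": 8.0, "region_3": 12.0})
registry.register("source_b", "temperature", {"region_1": 280.0})

elevated = registry.above("SO2", threshold=7.5)
```

The caller names only the quantity and the condition; which instruments contributed is visible in the result but was never part of the question.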


Detection and attribution relations…

Ontology packaging


SESDI Ontology fragment


Few things to show


Semantic framework indicating how volcano
and atmospheric parameters and databases
can immediately be plugged in to the semantic
data framework to enable data integration.

SWEET 1.0 as Upper-Level Earth System Science Ontology: High-Level Classes

Substance

Physical

Law

Living

Substance

Physical

Process

Planetary

Realm

Physical

Phenomena

Data

Physical

Property

subClassOf

Math

Math

Calculus

Function

Geometry

Statistics

Space

SpaceCoordinates

SpaceDirection

SpaceDistribution

SpaceObject

Time

TimeGeologic

Vector

Physics

Dynamics

ElecMagnetism

Field

FluidStatics

Fluid Dynamics

Gravity

Particle

Radiation

Solid

SpectralRange

Thermo

Waves

Chemistry

Compound

Element

Organic

Process

State


Science

System

Units

Astronomy

Heliosphere

Planet

Star

Biology

Animal

Biome

Ecology

Plant

Process

Geology

Basin

Continental

Craton

Oceanic

Petrology

Resources

Tectonics

Volcanism

Hydrosphere

Chemistry

Cryosphere

Dynamics

Groundwater

Ocean

SurfaceWater

Human

Agriculture

Aviation

Commerce

Infrastructure



Data

DataFile

DataService

Instrument

Atmosphere

Boundary

Chemistry

Cloud

Dynamics

Electric

Front

Precipitation

Pressure

Temperature

Water

Wind

Geography

Border

Coast

Geomorphology

Landform

Soil

SWEET 2.0 Modular Ontologies

Abstract to Applied


Physical quantity versus measured
as quantity

Value and units?

Reference frame?

Reference units?

Value and units?


Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity

Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products

Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema-based integration

Data Discovery

Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities

Ontology-based Data Integration using scientific workflows

Data Integration

A.K. Sinha, Virginia Tech, 2006

Data registration


Registering Volcanic Data (1)


Registering Volcanic Data (2)



No explicit lat/long data



Volcano identified by name



Volcano ontology framework will link name to location


Registering Atmospheric Data (2)


For SESDI what does this mean?

Science

Data Ingest


Typical science data processing pipelines


Distributed


Some metadata in silos


Much metadata lost


Many human-in-loop decisions, events


No metadata infrastructure for any user

CHIP Data Ingest


Provenance and Domain concepts
in the use cases


What were the cloud cover and seeing conditions during the observation period of this image?


What calibrations have been applied to this image?


Why does this image look bad?

[Callouts tagging phrases in the questions: data processing; solar science; provenance and data processing; provenance and solar science; provenance, data processing (QA), and solar science]
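A question such as "What calibrations have been applied to this image?" amounts to walking a derivation chain and keeping the steps of one kind. The Python sketch below illustrates that with a plain dictionary; it does not use PML or any project ontology, and the product names, step names and step kinds are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical derivation records: product -> (input product, processing step).
DERIVED_FROM: Dict[str, Tuple[str, str]] = {
    "image_final": ("image_flat", "flat_field_calibration"),
    "image_flat": ("image_dark", "dark_subtraction"),
    "image_dark": ("image_raw", "ingest"),
}

# Hypothetical classification of steps (a domain ontology would supply this).
STEP_KIND: Dict[str, str] = {
    "flat_field_calibration": "calibration",
    "dark_subtraction": "calibration",
    "ingest": "ingest",
}


def processing_history(product: str) -> List[str]:
    """Return the steps that led to `product`, earliest first."""
    steps = []
    while product in DERIVED_FROM:
        product, step = DERIVED_FROM[product]
        steps.append(step)
    return list(reversed(steps))


def calibrations_applied(product: str) -> List[str]:
    """Answer the calibration question by filtering the history by step kind."""
    return [s for s in processing_history(product) if STEP_KIND.get(s) == "calibration"]
```

The walk needs provenance (what was derived from what), while the filter needs domain knowledge (which steps count as calibration), which is why the questions carry more than one tag.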


Multi-Model Individuals


Using PML and domain and data processing ontologies and OWL as the encoding

PML NodeSet using Multi-model individuals


Data Mining in the ‘new’ Distributed
Data/Services Paradigm

Too many choices!!


And that’s only part of the toolkit


ADaM-IVICS toolkit has over 100 algorithms

Ontology Use

Semi-automated Workflow Composition: filtering services based on data format

Semi-automated Workflow Composition: filtering service options based on both data format and task selected

Semi-automated Workflow Composition: Final Workflow
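The two filtering stages above can be sketched as follows. This is a hedged illustration only: the `Service` fields and the four service descriptions are hypothetical and do not describe real ADaM algorithms. The sketch shows how describing each service by the format it accepts and the task it performs lets a composer shrink a long list of options.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Service:
    """Hypothetical service description; a real one would come from an ontology."""
    name: str
    accepts: str   # input data format
    produces: str  # output data format
    task: str      # kind of work performed


# Invented services for illustration.
SERVICES = [
    Service("subset_grid", "netcdf", "netcdf", "subsetting"),
    Service("convert_to_table", "netcdf", "csv", "conversion"),
    Service("cluster_points", "csv", "csv", "clustering"),
    Service("classify_pixels", "hdf", "hdf", "classification"),
]


def candidates(services: Iterable[Service], data_format: str,
               task: Optional[str] = None) -> List[str]:
    """Stage 1 filters by format; stage 2 (when task is given) also by task."""
    return [
        s.name
        for s in services
        if s.accepts == data_format and (task is None or s.task == task)
    ]
```

Calling `candidates(SERVICES, "netcdf")` gives the format-only shortlist, and adding `task="conversion"` narrows it to the single matching step.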


OPeNDAP Hyrax Architecture

OLFS

BES


OPeNDAP Lightweight Front end Server (OLFS)


Receives requests and asks the BES to fill them


Uses Java Servlets


Does not directly ‘touch’ data


Multi-protocol

Data


Back End Server (BES)


Reads data files, databases, etc.; returns info


May return DAP2 objects or other data


Does not require web server
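The division of labour between the two servers can be pictured with a small Python sketch. It is emphatically not the Hyrax API: the `BackEnd` and `FrontEnd` classes, the request dictionary and the dataset identifier are hypothetical. It only mirrors the two properties listed above, that the front end handles requests without touching data and the back end is the sole reader.

```python
from typing import Any, Dict


class BackEnd:
    """Stand-in for the back end: the only component that reads data."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store

    def fetch(self, dataset_id: str) -> Any:
        """Return the stored payload; raises KeyError for unknown datasets."""
        return self._store[dataset_id]


class FrontEnd:
    """Stand-in for the front end: receives requests and delegates them."""

    def __init__(self, backend: BackEnd) -> None:
        self._backend = backend

    def handle(self, request: Dict[str, str]) -> Dict[str, Any]:
        """Ask the back end to fill the request and wrap the outcome."""
        try:
            payload = self._backend.fetch(request["dataset"])
        except KeyError:
            return {"status": 404, "body": None}
        return {"status": 200, "body": payload}


# Hypothetical dataset identifier and contents.
front = FrontEnd(BackEnd({"example/dataset": [1.0, 2.0, 3.0]}))
ok = front.handle({"dataset": "example/dataset"})
missing = front.handle({"dataset": "no/such/dataset"})
```

Keeping all data access behind one narrow method is what would let a back end add new storage formats without any change to the request-handling side.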

Client


[Figure: OPeNDAP Lightweight Front end Server. Labels: request from client; request formulation**; BES; response to client. Protocols and outputs shown: GridFTP; DAP2; HTTP; SOAP-DAP (HTTP); DAP2 (GridFTP, HTTP); RDF, OWL, JSON (HTTP); THREDDS; ASCII output; HTML form; info output; PML output]


Hyrax/Back-end Server

[Figure: BES Framework. Labels: network protocol and process start/stop activities; PPT*; initialization/termination; BES commands/XML documents; commands**; DAP2 Access; Provenance; data store interfaces; NetCDF3; HDF4; RDF/SPARQL; Data (four stores); Catalogs]

*PPT is built in (other protocols)

**Some commands are built in

Enough already…?

Core and Framework Semantics


With the substantial adoption of semantics in
science data applications


There is a need for a higher level of application/
tool infrastructure


And for a higher level of integrated functionality in
the framework


Others are experiencing the same lessons with
ontology and application development and realize
it is time to stop re-inventing


What are Core and Framework
Semantics?


Core


Data Product/ processing, Provenance, Portal, Mining,
Security, Faceted search, Data and Service Registration,
Integration, …


Framework


Allow for semantic extensions including those from outside
the framework, i.e. a framework ontology


Keep pace with advances in semantic web technologies


Role of configuration


To support use case process flow


To support domain (use case) semantics


declarative and procedural


Modularization


Is a key enabling function requirement


We’ve had a lot of experience with it, good
and bad


Are now exploring and exploiting OWL 2 RL
property chains and SWRL built-in rules


Also developing ‘interface’ ontologies to
accommodate use case process flow(s)
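For readers unfamiliar with property chains, the Python sketch below shows what one such axiom entails. In OWL 2 functional syntax a chain is written `SubObjectPropertyOf(ObjectPropertyChain(:p :q) :r)`, meaning that `x p y` and `y q z` together imply `x r z`. The property and individual names used here are hypothetical, and the function is a one-pass illustration rather than a reasoner.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def apply_chain(triples: Set[Triple], first: str, second: str, implied: str) -> Set[Triple]:
    """Add (x, implied, z) wherever (x, first, y) and (y, second, z) both hold.

    A single pass is enough as long as `implied` differs from `first` and
    `second`; a real reasoner would iterate to a fixed point.
    """
    targets: Dict[str, Set[str]] = {}
    for subject, prop, obj in triples:
        if prop == second:
            targets.setdefault(subject, set()).add(obj)

    inferred = set(triples)
    for subject, prop, obj in triples:
        if prop == first:
            for target in targets.get(obj, ()):
                inferred.add((subject, implied, target))
    return inferred


# Hypothetical facts: a dataset, the instrument that measured it, and its site.
facts = {
    ("dataset_1", "measuredBy", "instrument_x"),
    ("instrument_x", "locatedAt", "observatory_y"),
}
closed = apply_chain(facts, "measuredBy", "locatedAt", "originatesAt")
```

Here the chain lets a query ask where a dataset originates even though no source ever stated that fact directly.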


Collaboration: portals


To make the framework really useful, higher-level
components need to appear in sensible
places, e.g.


Faceted search as a WSRP in a Drupal 6 module


Exploitation of content types in Drupal


Also a portal ontology…


Integrated security and collaboration environment


Also good for Web 2.0 and interoperability


So, what are the use cases?


Re-implement VSTO (and our other projects),
i.e. migrate them to the new framework


Ocean observing system test-bed


Biological and chemical oceanography


Education


Ecosystem assessment


Policy


carbon budgets


Real use cases: Marine habitat-change

Scallop, number, density

Scallop, size, shape, color, place

Scallop, shell fragment

Rock

What is this?

Flora or fauna?

Dirt/mud; one person’s noise is another person’s signal

Several disciplines: biology, geology, chemistry, oceanography


Several applications: science, fishing, habitat change, climate and environmental change, data integration


Complex inter-relations, questions


Use case: What is the temperature and salinity of the water and are these marine specimens usual or part of an ecosystem change?

Src: WHOI and the HabCam group


Data has Lots of Audiences

From “Why EPO?”, a NASA internal

report on science education, 2005

More Strategic

Less Strategic


What is a Non-Specialist Use Case?

A teacher accesses the internet, goes to an educational virtual observatory and enters a search for “Aurora”.

Someone should be able to query a virtual observatory without having specialist knowledge


What should the User Receive?

Teacher receives four groupings of search results:

1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/

2) Research, data and tools: via virtual observatories, knows to search for brightness, or green/red line emission

3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights

4) Did you mean?: Aurora Borealis or Aurora Australis, etc.


Fox WHOI: Semantic Data
Frameworks March 20, 2008


Semantic Information Integration: Concept
map for educational use of science data in
a lesson plan



Open Source - implications


Communities of practice and governance for
knowledge


eek - isn’t this ‘owned’ by scientists?


Maintaining a robust, evolving software
framework (software and ontologies) is a
challenge


Must take advantage of related open source
developments


Publication, citation and attribution mechanisms
must be explored as incentives for participation


Participation


Communities are essential for use case
development; specialist and non-specialist


We welcome participation at all levels in the
effort; technology, ontologies, tools,
applications, …


A series of community meetings is
being planned over the next 6-18 months


Summary


We are now undertaking an effort to develop a
configurable semantic data framework


Use case driven


Ontology driven at many levels


Application oriented


All along the way, we will continue to evaluate
our semantic developments and implementations
to gauge their benefits or deficiencies


We continue to need a broad range of
participants and communities to enable success


Further Information


http://tw.rpi.edu/portal/SESF


And SPCDIS, MDSA, SAM, VSTO, DQSS,
Aerostat, BCO-DMO, …


Contacts:


pfox@cs.rpi.edu


dlm@cs.rpi.edu


hendler@cs.rpi.edu



Back shed



Virtual Observatories

Make data and tools quickly and easily accessible to a
wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer using the user’s preferred language: i.e. appear to be local and integrated

Provide controlled vocabularies that are used for interoperation in appropriate domains along with database interfaces for access and storage. Must also provide “smart” tools for evolution and maintenance.


Semantic Web Benefits


Unified/abstracted query workflow: Parameters, Instruments, Date-Time


Decreased input requirements for query: in one case reducing the number of selections from eight to three


Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics


Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)


Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services


understanding of coordinate systems, relationships, data synthesis, transformations


returns independent variables and related parameters


A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

Other projects


ontologies for faceted search
