International Water Information


Oct 22, 2013


SAN DIEGO SUPERCOMPUTER CENTER



International Water Information Systems: Evolving the CUAHSI HIS to a Standards-based Infrastructure

David Valentine
Ilya Zaslavsky
David Arctur

Overview

Background:
  Acronyms
  Translator
Water Information Services Concept Study
Viewpoints
  Enterprise Viewpoint
  Information Viewpoint
    WaterML
  Computation Viewpoint
    Mapping to OGC
  Engineering Viewpoint
    Catalog
    Discovery
Services Monitoring and Reporting

Delivering Water Information is more than time series

Services/Data Sources: 10s to 1000s
Sites: 1 to 1 million
Phenomena: 1 to 10,000
Series: 10s to millions
  Site, Phenomena, Data Type, statistics, Time Range, Count
  quality control level (1+), source (1+), method (1+)
Domain Lists
  Organizations, Methods, QC, Units
Results/Data Values
  date, value, censored, quality control level, source, method, qualifier (1+), sample code
Analytical Chemistry Details
  WQX

HIS is not one service.
HIS based on OGC standards will be a set of services.
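
The levels listed on this slide can be pictured as a small data structure. The following is only an illustrative sketch in Python: the class and field names are assumptions taken from the slide's wording, not the actual ODM or WaterML schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DataValue:
    """One result, with the fields listed under Results/Data Values."""
    date: datetime
    value: float
    censored: bool = False
    quality_control_level: Optional[str] = None
    source: Optional[str] = None
    method: Optional[str] = None
    qualifiers: List[str] = field(default_factory=list)  # qualifier (1+)
    sample_code: Optional[str] = None


@dataclass
class Series:
    """Catalog entry for one site/phenomenon combination (hypothetical shape)."""
    site: str
    phenomenon: str
    data_type: str
    values: List[DataValue] = field(default_factory=list)

    def summary(self):
        """Return (time range, count), the series-level metadata on the slide."""
        if not self.values:
            return None, 0
        dates = [v.date for v in self.values]
        return (min(dates), max(dates)), len(self.values)
```

A series catalog would hold only the `summary()` output, while the values themselves stay with the data service.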

Open Geospatial Consortium Acronyms

WMS
  Web Map Service. Returns an image.
WFS
  Web Feature Service. Returns Geography Markup Language (GML).
  Demonstrated by UT to provide series and sites
CSW
  Catalogue Service for the Web
  Geodata, Geo services
  For example: ESRI Geoportal
SOS
  Sensor Observation Service
  Create, manage sensors.
  Discover and retrieve results
WPS
  Web Processing Service
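
All five service types answer a GetCapabilities request, which makes it a convenient first call against any of them. The sketch below builds such a request URL in Python. The base URL is a placeholder and the helper name is hypothetical; only the `service` and `request` key-value parameters come from the OGC conventions.

```python
from urllib.parse import urlencode

# Acronyms from the slide and the service names they stand for.
OGC_SERVICES = {
    "WMS": "Web Map Service",
    "WFS": "Web Feature Service",
    "CSW": "Catalogue Service for the Web",
    "SOS": "Sensor Observation Service",
    "WPS": "Web Processing Service",
}


def capabilities_url(base_url, service):
    """Build a key-value GetCapabilities URL for one OGC service type."""
    if service not in OGC_SERVICES:
        raise ValueError(f"unknown OGC service: {service}")
    query = urlencode({"service": service, "request": "GetCapabilities"})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


# Placeholder endpoint, for illustration only.
print(capabilities_url("https://example.org/ows", "SOS"))
```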


CUAHSI to Open GIS translator

CUAHSI                   OGC Observations and Measures
Site                     Feature
Variable                 Observed Phenomena
Series                   ~Observation
DataValue                Result
Method                   Procedure
TimeSeries Values list   TimeSeriesCoverage, WaterML:Timeseries
Controlled Vocabulary    Coded Domain
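
The one-to-one rows of the translator can be written as a lookup table. This is a minimal sketch assuming records are plain dictionaries keyed by CUAHSI terms; the `translate` helper is hypothetical, and the Series entry is only approximate, as the "~" on the slide indicates.

```python
# CUAHSI term -> OGC Observations and Measures term, as listed on the slide.
CUAHSI_TO_OM = {
    "Site": "Feature",
    "Variable": "Observed Phenomena",
    "Series": "Observation",  # approximate match ("~Observation")
    "DataValue": "Result",
    "Method": "Procedure",
    "Controlled Vocabulary": "Coded Domain",
}


def translate(record):
    """Rename CUAHSI keys to O&M terms; unknown keys pass through unchanged."""
    return {CUAHSI_TO_OM.get(key, key): value for key, value in record.items()}


print(translate({"Site": "Example station", "Variable": "discharge"}))
```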

Water Information Services Concept Development Study Report

Open Geospatial Consortium Engineering Document

Outlining the CUAHSI experience and the migration to OGC Standards






Geospatial Service Architecture Viewpoints

Drivers for the Concept Development Study

New requirements stemming from operational experience with the existing system, as expressed by government and other users:

  Transition to OGC model for better interoperability, including international: what is the suggested path, what are the new service interfaces, and what may be missing from this reference model?

  Federation of catalogs, since many data providers stand up catalogs, also better scalability: what is the suggested combination of catalog technologies and interfaces?

  Recognition that we don’t need to search over all services: what are the better search patterns (e.g. 3-step data access: identify services, then extract time series metadata, and then request data content for the time series)?

  Recognition that we can (and need to) rely on common implementations of mature, modular standard specifications: what is an appropriate operational governance model for distribution of roles and responsibilities within such a modular system?


ENTERPRISE VIEWPOINT

Purpose, scope, policies: What for? Why? Who? When?

© 2011 Open Geospatial Consortium, Inc.

The Enterprise Agenda

Addressing key bottlenecks of hydrologic data sharing and integration for the next decade:
  As massive volumes of hydrologic data become available, it is important to develop sophisticated data integration strategies and architectures enabling re-purposing and re-using observations
  Distributed hydrologic data should be easily discoverable
  Data are structurally and semantically heterogeneous, follow different spatial and temporal sampling patterns, and undergo different types of processing before they become available
  Addressing problems that could not be addressed before (e.g. regional and continental scale modeling, global climate change effects, large-scale disaster response) requires clarification in the roles and responsibilities of all system stakeholders, including government and academic monitoring and research activities
Making hydrologic data publishing easy with standards-compliant mainstream software
Integrating government and academic sources of hydrologic observations
Integrating data across research domains and languages

Answering key Enterprise-level questions - 1

Suggested migration path:
  Mapping CUAHSI HIS architecture to OGC reference model
    Mapping of information models (for features, time series, catalogs) to OGC encoding specifications described in the Information Viewpoint
    Mapping of data access, metadata, catalog and processing services to OGC service interface specifications described in the Computational Viewpoint
    Mapping of specific technologies used in CUAHSI HIS to technologies implementing OGC-RM is described in the Engineering Viewpoint
Elements missing from the OGC reference model:
  Managing semantic descriptions and controlled vocabularies
    How is the uniform semantic framework established and curated
    Where is semantic compliance established and validated
    Which semantic compliance responsibilities are assumed by each role in the system
  Modeling of time series catalogs, and their interaction with service registries
    It appears that for catalogs of services and datasets OGC offers mature CSW-ISO specifications, while suggesting custom CSW/ebRIM development (see Information Viewpoint)
  Access control at the level of time series

Answering key Enterprise-level questions - 2

Catalogue federation:
  Multitude of CSW profiles and implementations make it difficult to federate catalogs;
  Federation use cases need to be defined, demonstrating advantages of catalog federation for hydrologic observation service catalogs and for catalogs that index resources other than hydrologic data services
Search patterns:
  How OGC query interfaces express distributed discovery of hydrologic data, using semantic, spatial and temporal filters for services and time series
  What are pros and cons of different architecture patterns enabling data discovery and retrieval
Operational governance model:
  What are the key roles in the operational system, and how do they change with the transition to the standards-based model:
    With respect to governing: encodings and service signatures; community vocabularies and ontologies; mappings between semantic concepts and domain of values; catalog curation; validation and system monitoring
  Defining best practices for publishing hydrologic data and catalogs, enabling applications/implementations clearinghouse

INFORMATION VIEWPOINT

Information sources and models: What is it about?

Information Models and Encodings

For hydrologic observations:
  WaterML 1.x/ODM (in the original CUAHSI HIS), transitioning to WaterML 2.0 (upcoming OGC standard)
  WQX (EPA): eventually may serve as a model for an O&M profile for analytical chemistry/water quality sampling
  Other extensions: a lightweight profile of WaterML2; encoding of rating curves; forecasting (and uncertainty)
For hydrologic catalogs:
  Time series catalog encoding as defined by CUAHSI HIS (GML simple features)
  ebRIM v. 3.0 (XML)
  ISO 19119 (services), ISO 19115-2
Other relevant:
  Instruments: SensorML; access rules: GeoXACML; error/accuracy: UncertML

Water Markup Language 1.0

An XML schema used by CUAHSI web services to communicate time series information in a standard format.
Uses Observational Data Model semantics
Result of the CUAHSI Hydrologic Information System project. It is a standardized way to convey water information over web services.

Water Markup Language 2.0

An international effort to communicate the semantics of water information under the Hydrology Working Group of the Open Geospatial Consortium
Create an open, reusable and useful standard for the exchange of hydrological data sets
A UML model attempting to capture the semantics of hydrologic information
An XML schema, which uses GML to represent the information.

WaterML 2.0

Information is linked
  Request WaterML 2
  Request Feature (WFS)
  Request Procedure

COMPUTATION VIEWPOINT

Types of services and protocols: How does each bit work?

Services and Protocols

For hydrologic observations:
  WaterOneFlow (in the initial CUAHSI HIS), transitioning to Sensor Observation Service (SOS1/SOS2, with Data Availability Extension)
For hydrologic features:
  Web Feature Service (WFS)
For hydrologic time series and individual observations at points:
  Web Feature Service (WFS)
  CSW-ebRIM, CSW-OWL
For hydrologic service registry:
  CSW-ebRIM, CSW-OWL, CSW-ISO
For querying:
  Filter Encoding Standard (FES)
For access control:
  Shibboleth, Distributed Access Control System (DACS)
Other to consider: Sensor Instance Registry (SIR), Sensor Observable Registry (SOR)
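
To make the move from WaterOneFlow to SOS more concrete, here is a hedged sketch of a data request in the SOS 2.0 key-value style. The endpoint, offering and observed property identifiers are placeholders. The parameter names follow the SOS 2.0 KVP binding as best understood here and should be checked against the specification and the target server before use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_observation_url(base_url, offering, observed_property, start, end):
    """Build an SOS 2.0 style GetObservation URL for one time window."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Time window on the phenomenon time, written as start/end.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{base_url}?{urlencode(params)}"


# All identifiers below are hypothetical.
url = get_observation_url(
    "https://example.org/sos",
    "offering-1",
    "discharge",
    "2010-01-01T00:00:00Z",
    "2010-02-01T00:00:00Z",
)
print(parse_qs(urlsplit(url).query)["request"])
```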

Mapping WaterOneFlow to OGC

Columns: WaterOneFlow (all WaterML 1.x) | WFS | WFS (DataCart) | CSW | SOS

Location/Site
  WaterOneFlow: GetSites (0..n)
  WFS: GetFeature (0..n) (returns GML/WaterObservationPoint)
  WFS (DataCart): GetFeature (returns Simple GML)
  CSW: GetRecords (returns Service Metadata Records)
  SOS: (HDWG Best Practice: use WFS); GetFeatureOfInterest (optional); DescribeSensor (m)
Variable
  WaterOneFlow: GetVariables
  GetCapabilities (as keywords)
  GetCapabilities (as keywords)
Series
  WaterOneFlow: GetSiteInfo
  GetFeature (returns Simple GML)
  GetRecords (returns Custom Record)
  GetCapabilities/offering; GetDataAvailability (optional extension)
DataValues/Observations/Results
  WaterOneFlow: GetValues
  (records contain Pointer)
  (records contain Pointer)
  GetObservation (returns WaterML 2) (records contain pointer to Feature)
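
The operation mapping above can be kept as a lookup table when porting a WaterOneFlow client. The sketch below is a simplified reading of the table, not an authoritative one. Operations are paired with services by their standard names (GetFeature is a WFS operation, GetRecords a CSW operation, and so on). `None` marks the GetCapabilities entries, since the slide does not say which services carry the variable keywords. The helper name is hypothetical.

```python
# WaterOneFlow operation -> list of (OGC service, operation) pairs.
WOF_TO_OGC = {
    "GetSites": [
        ("WFS", "GetFeature"),
        ("CSW", "GetRecords"),
        ("SOS", "GetFeatureOfInterest"),
    ],
    "GetVariables": [(None, "GetCapabilities")],  # variables appear as keywords
    "GetSiteInfo": [
        ("WFS", "GetFeature"),
        ("CSW", "GetRecords"),
        ("SOS", "GetDataAvailability"),
    ],
    "GetValues": [("SOS", "GetObservation")],
}


def ogc_equivalents(wof_operation, service=None):
    """Return the OGC counterparts of a WaterOneFlow operation.

    If `service` is given, keep only the pairs for that service.
    """
    pairs = WOF_TO_OGC.get(wof_operation)
    if pairs is None:
        raise KeyError(f"unknown WaterOneFlow operation: {wof_operation}")
    if service is None:
        return list(pairs)
    return [pair for pair in pairs if pair[0] == service]


print(ogc_equivalents("GetSiteInfo", service="SOS"))
```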

ENGINEERING VIEWPOINT

Solution types, distribution infrastructure: How do the components work together?

Possible Architecture

A large observation data repository case: a CSW for the data repository is registered in the federated catalog system

[Diagram labels: Clients; Main Catalog; Distributed Catalog; Persisted Datasource; CSW-GetRecords; CSW-Harvest; GetCapabilities; WFS Filters, SOS Filters, WCS Filters; WFS, WCS, SOS, WMS, FTP/HTTP; Metadata, Coverage, Map, Access, Data]


CATALOGING AND DISCOVERY

Catalog technologies (1)

Step 1: Study of distributed search vs. harvesting metadata into a central catalog

Goal: make joining the system easy by relying on COTS catalogs

Solution in CUAHSI HIS: use a combination of distributed search and harvesting, to optimize performance while maintaining autonomous catalog nodes. For example, large agency repositories would expose catalog services and vocabulary services to enable catalog federation and distributed search; for smaller academic services, harvesting (along with versioning, provenance management and centralized curation and monitoring) is a better strategy.
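
The combined strategy can be sketched as a catalog that stores harvested records locally and forwards queries to federated nodes at search time. This is a toy in-memory illustration in Python, not the CUAHSI implementation. The class, method and field names are hypothetical, and the remote catalogs are stood in for by plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HybridCatalog:
    """Central catalog mixing harvested records with distributed search."""
    harvested: List[dict] = field(default_factory=list)
    remote_searches: Dict[str, Callable[[str], List[dict]]] = field(default_factory=dict)

    def harvest(self, provider, records):
        """Copy a small provider's metadata into the central store."""
        for record in records:
            self.harvested.append({**record, "provider": provider, "via": "harvest"})

    def federate(self, provider, search_fn):
        """Register a large provider's own catalog search for live queries."""
        self.remote_searches[provider] = search_fn

    def search(self, keyword):
        """Search harvested records locally, then fan out to federated nodes."""
        needle = keyword.lower()
        hits = [r for r in self.harvested if needle in r["title"].lower()]
        for provider, search_fn in self.remote_searches.items():
            for record in search_fn(keyword):
                hits.append({**record, "provider": provider, "via": "distributed"})
        return hits


catalog = HybridCatalog()
catalog.harvest("small-lab", [{"title": "Streamflow at creek gauge"}])
catalog.federate("big-agency", lambda kw: [{"title": f"Agency {kw} archive"}])
print([hit["via"] for hit in catalog.search("streamflow")])
```

Harvested records answer from the central store, while federated nodes keep their autonomy and are queried only when a search runs.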

Catalog technologies (2)

[Diagram labels: Gi-CAT; HydroPortal SDSC; HIS Central WFS; HydroPortal UTA CSW service; HydroPortal UTA ESRI WFS; HydroPortal CUAHSI; Multiple HydroPortal CSW servers; Harvested THREDDS (Gi-CAT); Motherlode NAM Models]

Solution in CUAHSI HIS: hierarchical organization of federated catalogs to support indexing both time series and grid data services, taking advantage of different capabilities of CSW implementations. CUAHSI Central Office maintains both Gi-CAT and ESRI HydroPortal CSW nodes, to federate catalogs from multiple organizations. The SDSC HydroPortal federates time series service catalogs and indexes academic WFS time series metadata services.

Discovery Patterns

Step 1: define different discovery use cases within the scope of time series analysis, and assemble supporting technologies (catalogs, filter encoding, query services and user interfaces)
Step 2: define new discovery patterns given the hierarchy of catalogs
Step 3: implement new discovery patterns in software

Solution in CUAHSI HIS: filtering by location (Where), time (When), attribute (What), provider (Who)

3-step data discovery and access pattern:
  Initial semantics-based and location-based discovery over integrated catalog at the CUAHSI Central Office, which aggregates service registries and semantic search from registered CSW catalogs (faceted search on the UI; ontology-based search)
  More detailed time series discovery over the appropriate WFS services only (either directly at sources, or as harvested and curated at the central time series catalog at SDSC)
  Data access and retrieval once time series are found

[Figure labels: Where; What; When; Who (services)]

Goal: make data discovery efficient over distributed catalogs
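
The 3-step pattern can be walked through with a small in-memory example. In the sketch below the dictionaries stand in for the central service catalog, the per-service time series catalogs and the data services. All identifiers, coordinates and values are made up for illustration.

```python
# Step 1 data: service registry with keywords and a bounding box
# (min lon, min lat, max lon, max lat).
SERVICES = [
    {"id": "svc-a", "keywords": {"streamflow"}, "bbox": (-120.0, 30.0, -110.0, 40.0)},
    {"id": "svc-b", "keywords": {"nitrate"}, "bbox": (-100.0, 25.0, -90.0, 35.0)},
]
# Step 2 data: time series metadata per service (ISO dates compare as strings).
SERIES = {
    "svc-a": [
        {"site": "A1", "variable": "streamflow", "begin": "2009-01-01", "end": "2010-12-31"},
        {"site": "A2", "variable": "streamflow", "begin": "2001-01-01", "end": "2002-12-31"},
    ],
}
# Step 3 data: the actual values, keyed by (service, site, variable).
VALUES = {
    ("svc-a", "A1", "streamflow"): [("2010-06-01", 12.5), ("2010-06-02", 11.9)],
}


def find_services(keyword, lon, lat):
    """Step 1: semantic (What) and location (Where) filter over services."""
    return [
        s["id"] for s in SERVICES
        if keyword in s["keywords"]
        and s["bbox"][0] <= lon <= s["bbox"][2]
        and s["bbox"][1] <= lat <= s["bbox"][3]
    ]


def find_series(service_id, variable, when):
    """Step 2: time series metadata filter (When) on the chosen service only."""
    return [
        s for s in SERIES.get(service_id, [])
        if s["variable"] == variable and s["begin"] <= when <= s["end"]
    ]


def get_values(service_id, series):
    """Step 3: retrieve data for a series that was found."""
    return VALUES.get((service_id, series["site"], series["variable"]), [])


for service_id in find_services("streamflow", -115.0, 35.0):
    for series in find_series(service_id, "streamflow", "2010-06-01"):
        print(service_id, series["site"], get_values(service_id, series))
```

Only the services that pass step 1 are asked for series metadata, and only the series that pass step 2 trigger a data request.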

Services Monitoring and Reporting

A distributed system with distributed responsibility requires monitoring of the services

Monitors
  Monitoring Service
    R-U-On.com
  Machine Monitors
  Web Site Monitors
  Custom Monitors
    HIS Central Services
    WaterOneFlow Services
  Reliability will be calculated
Reporting
  Logged requests are analyzed via a Microsoft SQL Server Reporting Server
  Beginning to test Google Analytics
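
The slides do not say how reliability will be calculated. One simple possibility, shown here purely as an assumption, is the fraction of monitor checks that succeeded for each service. The function name and the shape of the check log are hypothetical.

```python
from collections import defaultdict


def reliability_by_service(checks):
    """Return {service: successful checks / total checks}.

    `checks` is an iterable of (service name, succeeded) pairs, e.g. one
    entry per monitor probe. This definition is an illustrative assumption.
    """
    totals = defaultdict(lambda: [0, 0])  # service -> [successes, attempts]
    for service, succeeded in checks:
        totals[service][1] += 1
        if succeeded:
            totals[service][0] += 1
    return {service: ok / attempts for service, (ok, attempts) in totals.items()}


log = [
    ("HIS Central", True),
    ("HIS Central", True),
    ("WaterOneFlow", True),
    ("WaterOneFlow", False),
]
print(reliability_by_service(log))
```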

R-U-On

Reporting Service

Summary

Water Information Services Concept Development Study Report provides an outline for moving the HIS infrastructure to OGC Standards
HIS based on OGC standards will be a set of services
  Discovery via CSW and WFS
  Data Retrieval via SOS
  Mapping via WMS/WFS
WaterML 2 is a profile of OGC Observations and Measures
Services Monitoring and Reliability will be a component of future systems

Information Scope

Observation style | Description

In-situ, fixed observation style
  Generally temporally dense, spatially sparse, small number of observed phenomena. Examples: river level or stage, river discharge, storage level, rainfall etc.

Ex-situ, complex processing observations
  Temporally sparse, spatially sparse, many observed phenomena. Examples: nutrients (nitrate, phosphorus etc.), pesticides (atrazine, glyphosate etc.), biologicals, pH, turbidity etc.

Complex data products
  These consist of processed or synthesized observational data, mainly created to estimate phenomena that cannot be measured directly or to predict future values. Examples: outputs from models or algorithms, water storage estimates, calculation of complex physics-chemistry, biological indices (French: IBGN, German/Austrian: Saprobic Index, ...).

Harmonizing Hydrologic Data Semantics

Harmonize
  Time series structures (results);
  General metadata for the procedure used in measurement;
  Minimal metadata for spatial features (descriptions of stations) and guidance on linking to external descriptions;
  Techniques for linking to definitions of observed phenomena
First step to defining a common information model
Paper to be released in April

Tradeoffs

Soft-typing vs. hard-typing
Data Exchange vs. Archival
Standalone vs. Reference to information

© 2009 Open Geospatial Consortium, Inc.

Features

Defining a large set of hydrologic features is out of scope
Define a set of in-situ monitoring points:
  Monitoring station descriptions
Requirements for metadata:
  Existing standards
  GRDC profile
  Domain users

Time Series Encoding

A conceptual time series model, with:
  A GML-style coverage encoding
  A SWE Common encoding
  Others (as determined by Interoperability Experiments)

Procedure Descriptions

Define a basic process structure with metadata
Provide linkage mechanisms for full descriptions

Example river flow observation in UML

A Doppler flow meter was used to make a flow measurement of the Macquarie River at the Macquarie at Trefusis station. The result of this observation was a time series. The Macquarie River is in the South Esk Basin.

GML note: Nearly everything is a Feature Type

WaterML 2

Interim Summary

We have a group working on an international standard for communicating water information
  This is going to be a long process
HIS provided an entire stack:
  We need to talk about data discovery and delivery, aka services
Testing WaterML and Services
  Interoperability Experiments


Original Water Web Service

Get Sites
Get Site Info (Series)
Get Variables
Get Values

Interoperability Experiments

Groundwater (Dec 2009 to Dec 2010)
Surface Water (June 2010 to ?)
Water Quality (TBD)
Modeling (TBD)

Groundwater Interoperability Experiment

Plotting of water well levels across the US/Canada border
Uses:
  GroundwaterML as the description for the well “features”
  Sensor Web Enablement (SWE) Common for results
Lessons so far
  Sensor Observation Service
    1 million well features breaks it
    works with some hacks
  Features
    What do you link the data with?
      A well,
      a well depth,
      screen at a depth in a well
Demo link

Surface Water Interoperability Experiment

Test the transmission of surface water data
Three use cases
  Cross Border
    Visualize information across border
  Forecasting
    Find streamflow forecast data and download
  Global Runoff (WMO Global Runoff Data Center)
    Provide automated runoff volume data for large rivers

Participation

Do you have a use case that you think should be covered by a Water Information Standard?

In order to become a participant in this IE, an organization must be willing to make a resource commitment and a substantial contribution in one or more of the following areas:
  an OGC web service component (SOS, WFS, WMS) for surface water data;
  a web client that makes use of service components, OR
  testing of the Services/Clients, OR
  compilation of documentation into one or more of the Interoperability Experiment deliverables (note that all participants must also provide sub-reports for inclusion in the final reports)


Summary

Transitioning to a vetted open standard will be a long process
WaterML 2 and how water information is delivered will be tested through a series of interoperability experiments
A hydrologic information system will be a set of services, not all of which can be defined using presently defined open GIS standards
  We will need to roll some of our own
We can use these standards to deliver data now

Hydrology Domain Working Group

Google: Hydrology Domain Working Group
Website:
  http://external.opengeospatial.org/twiki_public/bin/view/HydrologyDWG/WebHome
  Interoperability Experiments
Hydro.dwg mailing list
  Hydro.dwg@lists.opengeospatial.org
  https://lists.opengeospatial.org/mailman/listinfo/hydro.dwg
WaterML 2.0 Development
  Presently limited to participants and observers
  Contact us to become an observer
  April 2010: Finalized Harmonization Report
  June 2010: Overview, Use Cases and Examples