Unidata’s Common Data Model and the THREDDS Data Server

John Caron
Unidata/UCAR, Boulder CO
Jan 6, 2006
ESIP Winter 2006

Outline


Definitions


Creating a Common Data (Access) Model from NetCDF, HDF5, OPeNDAP


CDM Coordinate Systems, Data Types


CDM implementation


NetCDF Markup Language (NcML)


The THREDDS Data Server


NetCDF-3

Machine- and OS-independent file format for “self-describing” scientific data

C library (Fortran, C++, Perl, IDL, MATLAB, Python, Ruby), Java library

Efficient subsetting of multidimensional arrays.

> 20,000 downloads last year

HDF5

Machine- and OS-independent file format for “self-describing” scientific data

C library (Fortran, Java, PyTables)

Evolution from HDF4, but different.

HDF-EOS, HDF5-EOS, standard formats for EOSDIS, ASCI, NPOESS

Parallel I/O, chunked storage, compression filters, many data types.

Developed at NCSA, now independent
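Chunked storage is what makes HDF5 subsetting efficient for large arrays: the array is cut into fixed-size rectangular chunks, each stored (and, with filters, compressed) as a unit, so a read only has to touch the chunks that overlap the request. The Java sketch below shows that index arithmetic for a 2-D array. ChunkMath and its methods are hypothetical names for illustration, not part of the HDF5 library or its Java wrapper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of chunk index arithmetic; not the HDF5 C or Java API. */
final class ChunkMath {
    private ChunkMath() {}

    /** Chunk coordinate containing element index i along one dimension. */
    static int chunkOf(int i, int chunkSize) {
        return i / chunkSize;
    }

    /**
     * Lists the (row, col) chunk coordinates that a 2-D hyperslab touches.
     * Bounds are inclusive: rows r0..r1, columns c0..c1.
     */
    static List<int[]> chunksTouched(int r0, int r1, int c0, int c1, int chunkRows, int chunkCols) {
        List<int[]> result = new ArrayList<>();
        for (int cr = chunkOf(r0, chunkRows); cr <= chunkOf(r1, chunkRows); cr++) {
            for (int cc = chunkOf(c0, chunkCols); cc <= chunkOf(c1, chunkCols); cc++) {
                result.add(new int[] {cr, cc});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 1000 x 1000 array stored in 100 x 100 chunks (100 chunks in total).
        for (int[] c : chunksTouched(150, 249, 0, 99, 100, 100)) {
            System.out.println("chunk (" + c[0] + ", " + c[1] + ")");
        }
    }
}
```

Reading rows 150 to 249 and columns 0 to 99 touches only chunks (1, 0) and (2, 0), 2 of the 100 chunks, which is the payoff when a client wants a small slab of a large array.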

NetCDF-4

Project funded by NASA to create a new version of netCDF using the HDF5 file format.

“Extend and merge” netCDF and HDF5:

Widespread use and simplicity of netCDF

Generality and performance of HDF5

NetCDF-Java 2.2 (nj22)

100% Java library

Prototype implementation of CDM

File formats:

General: NetCDF, HDF5, OPeNDAP

Grids: GRIB1, GRIB2

Radar: NEXRAD, NIDS, DORADE

Satellite: DMSP, GINI

Access to THREDDS catalogs

OPeNDAP


Client-server protocol for scientific data access


C++ client and server, Java client and
server libraries.


Current version 2.0; NASA ESE standard


Working on new 4.0 protocol spec
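OPeNDAP clients subset on the server by appending a constraint expression to the dataset URL. In DAP 2.0 an array subset (hyperslab) is written as variable[start:stride:stop] for each dimension, with an inclusive stop index. The sketch below builds such an expression; the DapConstraint class and the dataset URL in main are hypothetical placeholders.

```java
/** Hypothetical helper that builds a DAP 2.0 hyperslab constraint expression. */
final class DapConstraint {
    private DapConstraint() {}

    /**
     * Each range is {start, stride, stop}. DAP 2.0 hyperslabs are written
     * [start:stride:stop], one bracket per dimension, with an inclusive stop.
     */
    static String hyperslab(String variable, int[]... ranges) {
        StringBuilder sb = new StringBuilder(variable);
        for (int[] r : ranges) {
            if (r.length != 3 || r[1] < 1 || r[2] < r[0]) {
                throw new IllegalArgumentException("expected {start, stride, stop}");
            }
            sb.append('[').append(r[0]).append(':').append(r[1]).append(':').append(r[2]).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder dataset URL; the ".ascii" suffix asks the server for a text response.
        String base = "http://hostname.edu/dods/model.nc";
        String ce = hyperslab("Temperature", new int[] {0, 1, 3}, new int[] {10, 2, 20});
        System.out.println(base + ".ascii?" + ce);
    }
}
```

Only the requested elements cross the network, which is how the “efficient subsetting” of local netCDF files carries over to remote access.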

THREDDS


Originally funded by NSDL


“discovery and use of scientific data”


Middleware between data providers and users


Dataset Inventory Catalogs (XML)


Now part of Unidata core funding


Data Serving (pull)




What’s a Data Model?

It’s about scientific data: storing, accessing

It’s an abstraction

Equivalent to an abstract object model in OOP

An Abstract Data Model describes data objects and what methods you can use on them

What’s a Data Model?

An API is the interface to the Data Model for a specific programming language

A file format is a way to persist the objects in the Data Model.

A data access protocol plays the role of a file format.

The Abstract Data Model abstracts away the details of any particular API and of the persistence format.
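To make these distinctions concrete, here is a minimal Java sketch using hypothetical names (Variable, ArrayVariable2D, CdlLikeWriter), not the NetCDF-Java API: the interface plays the abstract data model, the class is one in-memory representation behind it, and the writer stands in for a persistence format that knows only the abstract model.

```java
import java.util.List;

/** Abstract data model (hypothetical): what a variable is and what you can do with it. */
interface Variable {
    String name();
    List<String> dimensions();
    double read(int... index);   // one access method, independent of how data is stored
}

/** One concrete representation: a 2-D array held in memory. */
final class ArrayVariable2D implements Variable {
    private final String name;
    private final List<String> dims;
    private final double[][] data;

    ArrayVariable2D(String name, List<String> dims, double[][] data) {
        this.name = name;
        this.dims = List.copyOf(dims);
        this.data = data;
    }

    public String name() { return name; }
    public List<String> dimensions() { return dims; }
    public double read(int... index) { return data[index[0]][index[1]]; }
}

/** A toy "persistence format": writes a CDL-like declaration from the abstract model only. */
final class CdlLikeWriter {
    private CdlLikeWriter() {}

    static String declaration(Variable v) {
        return "double " + v.name() + "(" + String.join(", ", v.dimensions()) + ") ;";
    }
}
```

Because CdlLikeWriter sees only the Variable interface, a second representation (say, one backed by a remote protocol) could be written out unchanged, which is the sense in which the abstract model removes API and persistence details.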

Creating a Common Data Access Model from NetCDF, HDF5, OPeNDAP

[Figure: the NetCDF-3 data model, the OPeNDAP data model (DAP-2) and the HDF5 data model combined into the Common Data (Access) Model]

Coordinate Systems and Scientific Data Types

Common Data Model Layers

[Figure: three layers, with Data Access at the bottom, Coordinate Systems above it, and Scientific Datatypes (Grid, Point, Radial, Trajectory, Swath, Station) on top]

Coordinate Systems needed

NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems

so georeferencing is not part of the API

Conventions are needed to specify them (e.g., CF-1, COARDS)

Contrast with GRIB, HDF-EOS, and other specialized formats

Must be done in a general way

Same underlying mathematics as VisAD, ASCII
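Because netCDF, OPeNDAP and HDF5 carry no built-in coordinate system, a library has to recognize coordinate axes from convention attributes. As an illustration only, here is a simplified Java sketch that classifies a coordinate variable from its units and positive attributes, using a small subset of the CF rules (latitude and longitude unit spellings, "since"-style time units, and the positive attribute for vertical axes). AxisGuesser is a hypothetical name; this is not how NetCDF-Java's coordinate system builders are written.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified CF-style axis identification; the real CF rules cover more cases. */
final class AxisGuesser {
    enum AxisType { LAT, LON, TIME, VERTICAL, UNKNOWN }

    // CF-accepted spellings of latitude and longitude units.
    private static final Set<String> LAT_UNITS =
        Set.of("degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN");
    private static final Set<String> LON_UNITS =
        Set.of("degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE");

    private AxisGuesser() {}

    /** attrs holds a coordinate variable's attributes, e.g. "units" -> "degrees_north". */
    static AxisType guess(Map<String, String> attrs) {
        String units = attrs.getOrDefault("units", "").trim();
        String positive = attrs.getOrDefault("positive", "").trim();
        if (LAT_UNITS.contains(units)) return AxisType.LAT;
        if (LON_UNITS.contains(units)) return AxisType.LON;
        if (units.contains(" since ")) return AxisType.TIME;   // e.g. "hours since 2006-01-06"
        if (positive.equalsIgnoreCase("up") || positive.equalsIgnoreCase("down")) {
            return AxisType.VERTICAL;                           // CF marks vertical axes this way
        }
        return AxisType.UNKNOWN;   // the axis attribute, standard_name, etc. are not handled here
    }
}
```

Doing this once, in the library, is what lets every client get georeferencing without knowing which convention a file follows.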

Scientific Datatypes

Based on datasets Unidata is familiar with

APIs are evolving

How are data points connected?

Intended to scale to large, multifile collections

Intended to support “specialized queries”

Space, Time

Corresponding “standard” NetCDF file conventions


Point Observation Data

PointObsDataset methods:

    // Collection of StructureData
    Collection getData(LatLonRect boundingBox, Date start, Date end);
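The getData(boundingBox, start, end) method is the “specialized query” in space and time. A minimal in-memory sketch of its semantics follows, assuming hypothetical LatLonRect and Obs records (stand-ins for the library's own LatLonRect and StructureData, with java.time.Instant in place of Date), inclusive time bounds, and a bounding box that does not cross the dateline.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a lat/lon bounding box; assumes it does not cross the dateline. */
record LatLonRect(double latMin, double latMax, double lonMin, double lonMax) {
    boolean contains(double lat, double lon) {
        return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
    }
}

/** Hypothetical stand-in for one point observation (StructureData in the slide). */
record Obs(double lat, double lon, Instant time, double value) {}

final class PointQuery {
    private PointQuery() {}

    /** Linear scan applying the same space/time filter as getData(boundingBox, start, end). */
    static List<Obs> getData(List<Obs> all, LatLonRect box, Instant start, Instant end) {
        return all.stream()
            .filter(o -> box.contains(o.lat(), o.lon()))
            .filter(o -> !o.time().isBefore(start) && !o.time().isAfter(end))   // inclusive bounds
            .collect(Collectors.toList());
    }
}
```

A linear scan only defines the result; meeting the stated goal of scaling to large multifile collections would need an index on time and location behind the same method.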


Trajectory Data

TrajectoryObs methods:

    int getNumPoints();
    StructureData getData(int point);

Station Data

StationObs methods:

    // return List of Station
    List getStations();

    // return List of StructureData
    List getData(Station s, Date start, Date end);

Radial Data

Radial methods:

    interface Radial {
        int getNumGates();
        float getData(int gate);

        float getStartingGate();
        float getGateSize();
        float getElevation();
        float getAzimuth();
        double getTime();
    }
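The Radial accessors are enough to place each gate in space. Here is a sketch under explicit assumptions: startingGate is the range to the first gate in the same units as gateSize, azimuth is measured in degrees clockwise from north, elevation in degrees above the horizon, and the earth is treated as flat with straight-line beam propagation (operational radar software typically uses a 4/3 effective earth radius instead). RadialGeometry is a hypothetical helper, not part of the library.

```java
/** Hypothetical helper for locating radar gates; flat-earth approximation only. */
final class RadialGeometry {
    private RadialGeometry() {}

    /** Range to gate i, assuming startingGate is the range to gate 0 (same units as gateSize). */
    static double gateRange(float startingGate, float gateSize, int gate) {
        return startingGate + (double) gate * gateSize;
    }

    /**
     * Returns {x east, y north, height} relative to the radar. Ignores earth curvature
     * and beam refraction. Azimuth: degrees clockwise from north; elevation: degrees up.
     */
    static double[] toCartesian(double range, double azimuthDeg, double elevationDeg) {
        double az = Math.toRadians(azimuthDeg);
        double el = Math.toRadians(elevationDeg);
        double ground = range * Math.cos(el);   // horizontal distance along the ground
        return new double[] {ground * Math.sin(az), ground * Math.cos(az), range * Math.sin(el)};
    }
}
```

For example, with a hypothetical startingGate of 2125 and gateSize of 250, gate 4 lies at range 3125 in the same units; toCartesian then turns that range, plus getAzimuth() and getElevation(), into a position.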


Gridded Data

Grid methods:

    interface GridCoordSys {
        CoordinateAxis getTaxis();
        CoordinateAxis getXaxis();
        CoordinateAxis getYaxis();
        CoordinateAxis getZaxis();
        Projection getProjection();
    }

    Array getDataCube(Range time, Range z, Range y, Range x);
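getDataCube(time, z, y, x) returns a subset of a 4-D field. For an array stored in row-major (t, z, y, x) order, as netCDF stores variables, element (t, z, y, x) sits at flat offset ((t·nz + z)·ny + y)·nx + x. The sketch below uses that offset to copy out a strided subset. Range here is a hypothetical record with an inclusive last index and a stride, loosely modeled on the idea behind the library's Range parameter; DataCube is also hypothetical, and the ranges are assumed to lie within the array's shape.

```java
/** Hypothetical inclusive index range with a stride; a stand-in, not the library's Range class. */
record Range(int first, int last, int stride) {
    Range {
        if (first < 0 || last < first || stride < 1) throw new IllegalArgumentException("bad range");
    }
    int length() { return (last - first) / stride + 1; }
}

final class DataCube {
    private DataCube() {}

    /**
     * Copies a (time, z, y, x) subset out of a flat row-major array with shape {nt, nz, ny, nx}.
     * The result is also row-major, shaped {time.length(), z.length(), y.length(), x.length()}.
     */
    static float[] getDataCube(float[] data, int[] shape, Range time, Range z, Range y, Range x) {
        int nz = shape[1], ny = shape[2], nx = shape[3];
        float[] out = new float[time.length() * z.length() * y.length() * x.length()];
        int k = 0;
        for (int t = time.first(); t <= time.last(); t += time.stride())
            for (int iz = z.first(); iz <= z.last(); iz += z.stride())
                for (int iy = y.first(); iy <= y.last(); iy += y.stride())
                    for (int ix = x.first(); ix <= x.last(); ix += x.stride())
                        out[k++] = data[((t * nz + iz) * ny + iy) * nx + ix];   // row-major offset
        return out;
    }
}
```

The same offset formula is what lets a file-backed implementation seek directly to the requested elements instead of reading the whole variable.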

Image/Swath

Standardizing NetCDF Formats


Grid: CF-1 Convention


Needs improvements for regional models (WRF) and GIS information


Radar: “Radar Exchange Format”


With radar community (led by NCAR ATD)


Point Observations


Unidata Observation Dataset Conventions

CDM implementations: NetCDF-4 and NetCDF-Java 2.2



NetCDF-4 C Library

[Figure: NetCDF-4 C library architecture, showing the netCDF-3 interface, the netCDF-4 library, and the HDF5 library it is built on]

NetCDF-4 Status

4.0 Beta implements the CDM access layer

Complete, but waiting for HDF5 release 1.8 to finalize the file format

4.1: adding Coordinate Systems

4.?: merge OPeNDAP access (pending funding)

NetCDF-Java 2.2 (nj22)

Prototype implementation of CDM

File formats:

General: NetCDF, HDF5, OPeNDAP

Grids: GRIB1, GRIB2

Radar: NEXRAD, NIDS, DORADE

Satellite: DMSP, GINI

Access to THREDDS catalogs

Implements NcML



NetCDF-Java version 2.2 architecture

[Figure: layers from top to bottom: Application; Scientific Datatypes (Datatype Adapter); NetcdfDataset (CoordSystem Builder); NetcdfFile; I/O service providers for NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad and DMSP. OPeNDAP, THREDDS (Catalog.xml) and ADDE appear as additional data sources.]

NetCDF-Java 2.2 Status

Data Access layer: Beta quality

Also waiting for the HDF5 release to finish NetCDF-4 and commit to the API

Coordinate Systems: early Beta

Finishing docs, runtime pluggability

Data Types: Alpha, still experimenting with APIs

NetCDF Markup Language (NcML)


XML representation of netCDF metadata (like ncdump -h)

Create new netCDF files (like ncgen)

Modify existing datasets

Add/delete/rename

Create logical sections of existing variables.

Create unions and aggregations of multiple existing datasets.


NcML example

    <?xml version="1.0" encoding="UTF-8"?>
    <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2"
            location="/data/nids/N0R_20041119_2147">
      <attribute name="DataType" value="Radar" />
      <remove type="attribute" name="password" />
      <variable name="Reflectivity" orgName="R34768">
        <attribute name="units" value="dBZ" />
      </variable>
    </netcdf>
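To show what the wrapper in the NcML example asks for, here is a sketch that reads that NcML with the JDK's standard DOM parser (javax.xml.parsers) and lists the edits it describes. NcmlSummary is a hypothetical class: it only describes the modifications and does not apply them, which is the job of the NetCDF-Java NcML implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader that lists the edits an NcML wrapper requests; it does not apply them. */
final class NcmlSummary {
    private NcmlSummary() {}

    static List<String> describe(String ncml) {
        Element root;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);   // so getLocalName() works with the ncml-2.2 namespace
            root = f.newDocumentBuilder().parse(new InputSource(new StringReader(ncml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed NcML", ex);
        }
        List<String> edits = new ArrayList<>();
        edits.add("wrap " + root.getAttribute("location"));
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;   // skip whitespace text nodes
            Element e = (Element) n;
            switch (e.getLocalName()) {
                case "attribute" -> edits.add("add global attribute " + e.getAttribute("name") + "=" + e.getAttribute("value"));
                case "remove" -> edits.add("remove " + e.getAttribute("type") + " " + e.getAttribute("name"));
                case "variable" -> {
                    String name = e.getAttribute("name");
                    if (!e.getAttribute("orgName").isEmpty()) {
                        edits.add("rename variable " + e.getAttribute("orgName") + " -> " + name);
                    }
                    NodeList attrs = e.getElementsByTagNameNS("*", "attribute");
                    for (int j = 0; j < attrs.getLength(); j++) {
                        Element a = (Element) attrs.item(j);
                        edits.add("set " + name + ":" + a.getAttribute("name") + " = " + a.getAttribute("value"));
                    }
                }
                default -> edits.add("other: " + e.getLocalName());
            }
        }
        return edits;
    }
}
```

For the example above this yields five edits: wrap the NIDS file, add DataType=Radar, remove the password attribute, rename R34768 to Reflectivity, and set its units to dBZ. The underlying file is never rewritten.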

NcML Aggregation


Union


Join Existing


Join New


Forecast Model Run

NcML Aggregation Example

    <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinNew">
        <variableAgg name="Temperature"/>
        <variableAgg name="Pressure"/>
        <scan location="C:/data/goes/" suffix=".gini"/>
      </aggregation>
    </netcdf>
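In a joinNew aggregation like the example, each scanned file contributes one slice along a new time dimension, so global index i simply maps to file i. In a joinExisting aggregation, files are concatenated along a dimension they already have, so a reader must map a global index back to a file and a local index. A minimal sketch with a hypothetical class name, assuming the per-file lengths are known and each file has at least one element:

```java
import java.util.Arrays;

/** Hypothetical sketch: map a global index along a joinExisting dimension to (file, local index). */
final class JoinExistingIndex {
    private final int[] start;   // start[f] = global index of file f's first element; last entry = total

    /** lengths[f] = length of the joined dimension (e.g. time) in file f, in aggregation order. */
    JoinExistingIndex(int[] lengths) {
        start = new int[lengths.length + 1];
        for (int f = 0; f < lengths.length; f++) {
            if (lengths[f] < 1) throw new IllegalArgumentException("file " + f + " has no elements");
            start[f + 1] = start[f] + lengths[f];
        }
    }

    int totalLength() { return start[start.length - 1]; }

    /** Returns {fileIndex, localIndex} for a global index in [0, totalLength()). */
    int[] locate(int global) {
        if (global < 0 || global >= totalLength()) {
            throw new IndexOutOfBoundsException("index " + global);
        }
        int pos = Arrays.binarySearch(start, global);
        // Exact hit: first element of file pos. Otherwise binarySearch returns -(insertionPoint) - 1,
        // and the owning file is the one just before the insertion point.
        int file = pos >= 0 ? pos : -pos - 2;
        return new int[] {file, global - start[file]};
    }
}
```

With files of length 3, 2 and 4, global index 4 is element 1 of file 1, and global index 5 is element 0 of file 2; the binary search keeps lookups cheap even for long multifile collections.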

THREDDS Data Server

Integrates data access with THREDDS catalogs and services

Tomcat/Servlet, 100% Java, single war file

Data input through the NetCDF-Java 2.2 library

Data output:

OPeNDAP

HTTP Server

OGC Web Coverage Service (WCS) for gridded data

THREDDS Data Server

[Figure: on hostname.edu, a THREDDS Server running in an HTTP Tomcat Server reads Datasets and IDD Data through the NetCDF-Java library, publishes Catalog.xml, and serves Applications through OPeNDAP, HTTPServer and WCS]
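TDS as WCS Gateway

[Figure: a THREDDS Server on hostname.edu, running in an HTTP Tomcat Server, reads data from an OPeNDAP Server on anotherHost.org through the NetCDF-Java library, publishes Catalog.xml, and serves Applications through OPeNDAP, HTTPServer and WCS]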
TDS and NcML

[Figure: a THREDDS Server on hostname.edu, running in an HTTP Tomcat Server, reads Datasets wrapped by NcML through NetCDF-Java, publishes Catalog.xml, and serves Applications through OPeNDAP and WCS]
TDS and NcML


The server serves the dataset “wrapped” by the NcML


Client sees OPeNDAP or WCS, not NcML


Can “fix” metadata problems


Can augment metadata


Use NcML aggregation on the TDS


replaces the old “Aggregation Server”


TDS and Digital Libraries

[Figure: a THREDDS Server on otherhost.gov (HTTP Tomcat Server, NetCDF-Java library) serves Datasets to Applications through OPeNDAP, HTTPServer and WCS and publishes Catalog.xml, alongside an OPeNDAP Server on hostname.edu; an OAI Harvester collects DL Records]

TDS and Digital Libraries


Framework to add metadata


By hand (collection level)


Automatic extraction from datasets


Send records to existing DLs


No search


Both collection and inventory level

Future Plans


NetCDF-Java

Get APIs stable, docs, runtime pluggability

NetCDF-4 (!)

HDF4, HDF-EOS, BUFR (need funding)

NetCDF-4 C Library


DataTypes too immature to port


NcML?


Java on the server

TDS Future Plans


Aggregation


Driven by IDD data (motherlode)


Pluggable Authorization


access control by dataset


Performance


Services


Coordinate System Verifier (e.g., CF-1)

Data access

Subset and return a netCDF file


Conclusion

[Figure: File Format #1, File Format #2, … File Format #N all connect through the CDM to Visualization & Analysis, a NetCDF file, an OPeNDAP Server and a WCS Service]

N + M instead of N * M things on your TODO List!
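To put numbers on the conclusion: with N input formats and M outputs, direct converters need one piece of code per (format, output) pair, while going through the CDM needs one adapter per format plus one per output. As an illustration, take the ten formats listed for NetCDF-Java 2.2 (NetCDF, HDF5, OPeNDAP, GRIB1, GRIB2, NEXRAD, NIDS, DORADE, DMSP, GINI) as N and the three outputs in the final figure (NetCDF file, OPeNDAP server, WCS service) as M:

```latex
N \times M = 10 \times 3 = 30 \qquad \text{versus} \qquad N + M = 10 + 3 = 13
```

The gap widens as either list grows: each new format or output adds one item to the CDM route, but a whole row or column of converters to the direct route.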