EMODNET Chemistry 2

Nov 15, 2013


Semantic Suggestions

Roy Lowry and Adam Leadbetter

British Oceanographic Data Centre

Semantic Issues

Parameter semantic issues encountered during the pilot

Naming of the aggregated products

Inability to aggregate across multiple P01 codes

Difficulty mapping local parameter vocabularies to P01

P01 scalability issues

Inability to discover a specified contaminant

Aggregation Naming


During the pilot a lot of (circular) e-mail traffic concerned the labelling of aggregated parameters


Naming needs to be governed

Governance decisions need to be implemented as a controlled vocabulary

P01 Aggregation Issues


Aggregation tools create an aggregated parameter for every P01 code in the source dataset

Different P01 codes are used for parameters that are not significantly different (or even not different at all)

Fixes for this (retagging source data or merging channels in the aggregation tool) are both labour intensive and error prone

P01 Aggregation Issues


Define each aggregation as a set of P01 codes

Store and serve the resultant mapping in the NERC Vocabulary Server

Update aggregation tools to access the mapping and use it to dynamically merge channels with different P01 codes
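
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the aggregation names and codes (aggregate_1, CODE_A and so on) are placeholders rather than real P01 entries, and the mapping is hard-coded here where the proposal would have it fetched from the NERC Vocabulary Server.

```python
from collections import defaultdict

# Hypothetical mapping: aggregated parameter name -> set of P01 codes.
# Names and codes are placeholders, not real vocabulary entries; in the
# proposal this mapping would be served by the NERC Vocabulary Server.
AGGREGATIONS = {
    "aggregate_1": {"CODE_A", "CODE_B"},
    "aggregate_2": {"CODE_C"},
}


def build_lookup(aggregations):
    """Invert the mapping to P01 code -> aggregated parameter name."""
    lookup = {}
    for name, codes in aggregations.items():
        for code in codes:
            if code in lookup:
                raise ValueError(f"{code} is in more than one aggregation")
            lookup[code] = name
    return lookup


def merge_channels(channels, aggregations):
    """Merge data channels whose P01 codes belong to the same aggregation.

    channels: dict of P01 code -> list of values.
    Codes that belong to no aggregation are kept under their own code.
    """
    lookup = build_lookup(aggregations)
    merged = defaultdict(list)
    for code, values in channels.items():
        merged[lookup.get(code, code)].extend(values)
    return dict(merged)


channels = {"CODE_A": [1.0, 2.0], "CODE_B": [3.0], "CODE_D": [9.0]}
merged = merge_channels(channels, AGGREGATIONS)
print(merged)
```

Keeping the mapping outside the tool means a governance decision changes one vocabulary entry and needs no retagging of source data.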

P01 Mapping Difficulties


There are a lot (>28,000) of codes in P01

Finding the code needed for a given local parameter vocabulary term seems to cause a lot of difficulty

Text generated from a semantic model isn’t always intuitive (e.g. [dissolved plus reactive particulate phase] = ‘unfiltered’)

P01 Mapping Difficulties


Mapping based on the semantic model (matrix, …, gender, organ) rather than the preferred label text

Improvements to the search algorithm in the client (e.g. addition of an ‘excluding’ clause)

Exposure of P01 subsets through NVS2 concept schemes (thesauri)

Training in how to map
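
A minimal Python sketch of matching on semantic model elements instead of label text, including an ‘excluding’ clause, might look like this. The concept codes, the "substance" element and all element values are hypothetical and chosen only for illustration; this is not the actual search client.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    code: str                       # placeholder code, not a real P01 entry
    elements: dict = field(default_factory=dict)  # semantic model elements


def search(concepts, include, excluding=None):
    """Return codes whose elements match every 'include' pair
    and none of the 'excluding' pairs."""
    excluding = excluding or {}
    hits = []
    for concept in concepts:
        matches = all(concept.elements.get(k) == v for k, v in include.items())
        excluded = any(concept.elements.get(k) == v for k, v in excluding.items())
        if matches and not excluded:
            hits.append(concept.code)
    return hits


# Hypothetical concepts described by semantic elements, not label text.
concepts = [
    Concept("C1", {"substance": "cadmium", "matrix": "biota", "organ": "flesh"}),
    Concept("C2", {"substance": "cadmium", "matrix": "biota", "organ": "liver"}),
    Concept("C3", {"substance": "cadmium", "matrix": "sediment"}),
]

hits = search(
    concepts,
    include={"substance": "cadmium", "matrix": "biota"},
    excluding={"organ": "liver"},
)
print(hits)
```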

P01 Scalability Issues


Many contaminants in many different biological entities = a number of P01 codes that is predicted to be unmanageable

Solution (not favoured)

Redesign formats to use a discrete semantic model, not a P01 code

Different formats for different data types

Moves complexity from semantic domain into the data

P01 Scalability Issues

Solution (preferred)

Retain P01 as a register of semantic element

Automate concept registration (part of a semantic-based mapping tool perhaps)

Use NVS V2 concept schemes to expose P01 subsets to make navigation easier
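
Automated concept registration could be sketched in Python as below: a register keyed on the combination of semantic elements returns the existing code or mints a new one. The ConceptRegister class, the element names and the "TMP" code format are all assumptions made for illustration, not part of P01 or NVS.

```python
class ConceptRegister:
    """Hypothetical register: one code per unique combination of elements."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(elements):
        # Sort so that element order does not create duplicate concepts.
        return tuple(sorted(elements.items()))

    def register(self, elements):
        """Return the existing code for this combination, or mint a new one."""
        key = self._key(elements)
        if key not in self._by_key:
            # Placeholder code format, not the real P01 convention.
            self._by_key[key] = f"TMP{len(self._by_key) + 1:05d}"
        return self._by_key[key]


register = ConceptRegister()
first = register.register({"substance": "cadmium", "organ": "flesh"})
again = register.register({"organ": "flesh", "substance": "cadmium"})
other = register.register({"substance": "cadmium", "organ": "liver"})
print(first, again, other)
```

Because the key is the element combination, a mapping tool can request a code without anyone composing or searching label text by hand.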

Contaminant Discovery Issues


Parameter discovery (CDI interface) is based on P02

P02 groups contaminants with variable granularity

Good for PCBs

Not so good for ‘other organic contaminants’

A search for datasets with cadmium in … flesh isn’t possible

The nearest is metals in biota, which will give many unwanted hits

Contaminant Discovery Issues

Possible Solution

Mine the P01 codes in the SeaDataNet file stock into the CDI

Use these for drill-down parameter discovery in the CDI search engine
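
The idea can be sketched in Python as an inverted index from P01 code to dataset, searched first by P02 group and then narrowed to a single code. Dataset identifiers, group names and codes below are placeholders, and the group-to-code mapping is assumed to be available from the vocabulary server; this is not the CDI implementation.

```python
from collections import defaultdict


def build_index(file_stock):
    """Mine codes into an index: P01 code -> set of dataset ids.

    file_stock: dict of dataset id -> iterable of P01 codes found in its file.
    """
    index = defaultdict(set)
    for dataset_id, codes in file_stock.items():
        for code in codes:
            index[code].add(dataset_id)
    return index


def drill_down(index, group_to_codes, group, code=None):
    """Datasets for a P02 group, optionally narrowed to one P01 code."""
    codes = group_to_codes.get(group, set())
    if code is not None:
        if code not in codes:
            return set()
        codes = {code}
    hits = set()
    for c in codes:
        hits |= index.get(c, set())
    return hits


# Placeholder data: none of these identifiers are real.
file_stock = {"ds1": ["CODE_A"], "ds2": ["CODE_B"], "ds3": ["CODE_A", "CODE_C"]}
group_to_codes = {"group_1": {"CODE_A", "CODE_B"}}

index = build_index(file_stock)
broad = drill_down(index, group_to_codes, "group_1")
narrow = drill_down(index, group_to_codes, "group_1", code="CODE_A")
print(sorted(broad), sorted(narrow))
```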

Taking This Forward

Some of the solutions presented are ODIP pilot candidates

Specifications of these are currently vague

Not absolutely clear who should be doing what and when

Meeting (Liverpool or London if easier) to develop the specifications and an implementation roadmap