EMODNET Chemistry 2


Nov 15, 2013


Semantic Suggestions

Roy Lowry and Adam Leadbetter

British Oceanographic Data Centre

Semantic Issues


Parameter semantic issues encountered
during the pilot


Naming of the aggregated products


Inability to aggregate across multiple P01 codes


Difficulty mapping local parameter vocabularies to
P01


P01 scalability issues


Inability to discover a specified contaminant

Aggregation Naming


Problem


During the pilot, a lot of (circular) e-mail traffic
concerned the labelling of aggregated parameters


Solution


Naming needs to be governed


Governance decisions need to be implemented as
a controlled vocabulary

P01 Aggregation Issues


Problem


Aggregation tools create an aggregated parameter
for every P01 code in the source dataset


Different P01 codes used for parameters that are
not significantly different (or even not different at
all)


Fixes for this (retagging source data or merging
channels in the aggregation tool) are both labour
intensive and error prone

P01 Aggregation Issues


Solution


Define each aggregation as a set of P01 codes


Store and serve resultant mapping in the NERC
Vocabulary Server


Update aggregation tools to access mapping and
use it to dynamically merge channels with
different P01 codes
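
A minimal sketch of the merging step described above, assuming the mapping has already been fetched from the vocabulary server into a plain dictionary. The aggregation names and the codes (`P01_CODE_A` and so on) are hypothetical placeholders, not real P01 entries, and `merge_channels` is an illustrative helper rather than part of any existing aggregation tool.

```python
from collections import defaultdict

# Hypothetical aggregation vocabulary: aggregated parameter name -> set of P01 codes.
# The codes are placeholders for illustration only.
AGGREGATIONS = {
    "Water body nitrate": {"P01_CODE_A", "P01_CODE_B"},
    "Water body phosphate": {"P01_CODE_C"},
}


def build_lookup(aggregations):
    """Invert the mapping so each P01 code points at its aggregated name."""
    lookup = {}
    for name, codes in aggregations.items():
        for code in codes:
            if code in lookup:
                raise ValueError(f"{code} appears in more than one aggregation")
            lookup[code] = name
    return lookup


def merge_channels(channels, aggregations):
    """Merge (p01_code, values) channels that belong to the same aggregation.

    Channels whose code is not in any aggregation stay under their own code.
    """
    lookup = build_lookup(aggregations)
    merged = defaultdict(list)
    for code, values in channels:
        merged[lookup.get(code, code)].extend(values)
    return dict(merged)


channels = [
    ("P01_CODE_A", [1.0, 2.0]),
    ("P01_CODE_B", [3.0]),
    ("P01_CODE_Z", [9.0]),  # not covered by any aggregation
]
result = merge_channels(channels, AGGREGATIONS)
```

Rejecting a code that appears in two aggregations is a design choice made for this sketch; the governance process may decide otherwise.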

P01 Mapping Difficulties


Problem


There are a lot of codes (>28000) in P01



Finding the code needed for a given local
parameter vocabulary term seems to cause a lot
of difficulty


Text generated from a semantic model isn’t always
intuitive (e.g. [dissolved plus reactive particulate
phase] = ‘unfiltered’)

P01 Mapping Difficulties


Solutions


Mapping based on the semantic model (matrix,
substance, taxon, gender, organ) rather than the
preferred label text


Improvements to the search algorithm in the
client (e.g. addition of an ‘excluding’ clause)


Exposure of P01 subsets through NVS2 concept
schemes (thesauri)


Training in how to map
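
One way to picture mapping on the semantic model rather than on label text, including an 'excluding' clause, is the sketch below. The `Concept` fields follow the elements named on this slide; the catalogue entries, the codes, and the `find_concepts` helper are assumptions made for illustration and do not reflect the real P01 content or any existing client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """One parameter concept described by discrete semantic elements."""
    code: str
    substance: str
    matrix: str
    taxon: Optional[str] = None
    gender: Optional[str] = None
    organ: Optional[str] = None


# Hypothetical catalogue; codes are placeholders, not real P01 entries.
CATALOGUE = [
    Concept("P01_CODE_1", "cadmium", "biota", taxon="Mytilus edulis", organ="flesh"),
    Concept("P01_CODE_2", "cadmium", "biota", taxon="Mytilus edulis", organ="shell"),
    Concept("P01_CODE_3", "cadmium", "water body"),
]


def find_concepts(catalogue, excluding=None, **wanted):
    """Return codes whose elements equal every `wanted` value
    and match none of the `excluding` values."""
    excluding = excluding or {}
    hits = []
    for concept in catalogue:
        if any(getattr(concept, key) != value for key, value in wanted.items()):
            continue
        if any(getattr(concept, key) == value for key, value in excluding.items()):
            continue
        hits.append(concept.code)
    return hits


matches = find_concepts(
    CATALOGUE,
    substance="cadmium",
    taxon="Mytilus edulis",
    excluding={"organ": "shell"},
)
```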

P01 Scalability Issues


Problem


Many contaminants in many different biological
entities = a number of P01 codes that is predicted
to be unmanageable


Solution (not favoured)


Redesign formats to use discrete semantic model
not P01 code


Different formats for different data types


Moves complexity from semantic domain into the data
files


P01 Scalability Issues


Solution (preferred)


Retain P01 as a register of semantic element
combinations


Automate concept registration (perhaps as part of a
semantic model-based mapping tool)


Use NVS2 concept schemes to expose P01
subsets to make navigation easier
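
The idea of P01 as a register of semantic element combinations with automated registration could look roughly like the sketch below. The `ConceptRegister` class and the `HYPO0001`-style codes are hypothetical; real P01 codes are assigned under their own rules, which this sketch does not attempt to reproduce.

```python
class ConceptRegister:
    """Toy register: one code per distinct combination of semantic elements."""

    def __init__(self):
        self._codes = {}

    def register(self, **elements):
        """Return the existing code for this combination, or mint a new one."""
        # Sort the items so the order of keyword arguments does not matter.
        key = tuple(sorted(elements.items()))
        if key not in self._codes:
            # Hypothetical code format, chosen only for this illustration.
            self._codes[key] = f"HYPO{len(self._codes) + 1:04d}"
        return self._codes[key]

    def __len__(self):
        return len(self._codes)


register = ConceptRegister()
first = register.register(substance="cadmium", matrix="biota", taxon="Mytilus edulis")
again = register.register(taxon="Mytilus edulis", matrix="biota", substance="cadmium")
other = register.register(substance="lead", matrix="biota", taxon="Mytilus edulis")
```

Registering the same combination twice returns the same code, so a mapping tool could call `register` freely without creating duplicates.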

Contaminant Discovery Issues


Problem


Parameter discovery (CDI interface) is based on
P02


P02 groups contaminants with variable granularity


Good for PCBs


Not so good for ‘other organic contaminants’


A search for datasets with cadmium in Mytilus edulis
flesh isn’t possible


The nearest is metals in biota, which will give
many unwanted hits

Contaminant Discovery Issues


Possible Solution


Mine the P01 codes in the SeaDataNet file stock
into the CDI metadatabase


Use these for drill-down parameter discovery in
the CDI search engine
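
A rough sketch of the drill-down idea, assuming the P01 codes have already been mined from the data files into a simple dataset-to-codes table. All identifiers here (`cdi-001`, `P01_CD_MUSSEL`, `P02_METALS_BIOTA`, and the two helper functions) are hypothetical and stand in for whatever the CDI metadatabase would actually hold.

```python
from collections import defaultdict

# Hypothetical result of mining the file stock: dataset id -> P01 codes found in it.
FILE_STOCK = {
    "cdi-001": {"P01_CD_MUSSEL"},
    "cdi-002": {"P01_PB_FISH"},
    "cdi-003": {"P01_CD_MUSSEL", "P01_PB_FISH"},
}

# Hypothetical P01 -> P02 mapping (both fine-grained codes fall under one broad group).
P01_TO_P02 = {
    "P01_CD_MUSSEL": "P02_METALS_BIOTA",
    "P01_PB_FISH": "P02_METALS_BIOTA",
}


def build_index(file_stock):
    """Build an inverted index: P01 code -> set of dataset ids."""
    index = defaultdict(set)
    for dataset_id, codes in file_stock.items():
        for code in codes:
            index[code].add(dataset_id)
    return index


def drill_down(index, p01_to_p02, p02_code):
    """List the P01 codes under a P02 group, each with its matching datasets."""
    return {
        p01: sorted(datasets)
        for p01, datasets in index.items()
        if p01_to_p02.get(p01) == p02_code
    }


index = build_index(FILE_STOCK)
options = drill_down(index, P01_TO_P02, "P02_METALS_BIOTA")
```

A user who starts from the broad P02 group would then see each P01 code beneath it and could pick only the datasets for the contaminant of interest.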

Taking This Forward


Some of the solutions presented are ODIP
pilot candidates


Specifications of these are currently vague


Not absolutely clear who should be doing
what and when


Meeting (Liverpool or London if easier) to
develop the specifications and an
implementation roadmap