The GBIF KOS Work Program: Prioritized Requirements and Proposed Solutions


Dag Endresen (dendresen@gbif.org)

Knowledge Systems Engineer

GBIF


New Orleans (Louisiana, USA)

20 October 2011

Biodiversity Information Standards, TDWG

Annual Meeting 2011, New Orleans


Outline


Element vocabularies and value
vocabularies


Vocabulary management tools


Vocabularies exchange format (SKOS)


Vocabulary registry (portal)


New data types

Standards

Biodiversity Information Standards (TDWG),
Dublin Core Metadata Initiative (DCMI),
Genomic Standards Consortium (GSC), etc.
provide domain standards. We want to reuse,
map and relate terms across these standards.


Why: Gain understanding across domains

Element vocabulary (glossary)

Darwin Core (DwC), Dublin Core (DCMI), Ecological
Metadata Language (EML), Gene Ontology (GO),
TDWG Ontology, etc... provide definitions for conceptual
terms. We want to reuse, map and relate terms from
basic vocabularies with concept definitions.


Why: Reuse terms and share common definitions
and understanding of biodiversity concepts.

Vocabulary management tools


GBIF Vocabularies


Custom Scratchpad Tool (Drupal)



Semantic Wiki (SpeciesID, Key to Nature)


Protégé (collaborative Protégé)


SKOSEd plugin, Web-Protégé


TopQuadrant EVN (commercial)


PoolParty (commercial)


ThManager (open source)


ISOcat (Clarin, linguistics)


iQvoc (open source)


TemaTres (open source, Spanish)

GBIF Vocabularies

http://vocabularies.gbif.org


Collaborative development of community terminology,
including biodiversity concept definitions and
controlled value lists.

Controlled Vocabularies

The “Vocabularies” are Value Vocabularies
(authority files) of accepted values for terms
where controlled values are already available
or appropriate to develop.

dwc:basisOfRecord

PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, NomenclaturalChecklist, Occurrence, Taxon, Location

dc:Type

Collection, Dataset, Event, Image, InteractiveResource, Service, Software, Sound, Text, PhysicalObject, StillImage, MovingImage

gbif:nomenclatural_code

ICBN, ICZN, ICVCN, ICNB, ICNCP, BioCode

Why: standardize how biodiversity data is
provided when controlled values are appropriate
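
A minimal sketch of how a data publisher might check a supplied value against a value vocabulary. The controlled values are the dwc:basisOfRecord values listed above; the function name and the matching rule (ignoring case, spaces and underscores) are assumptions made for illustration, not part of any GBIF tool.

```python
# Controlled values for dwc:basisOfRecord, copied from the list above.
BASIS_OF_RECORD = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "HumanObservation", "MachineObservation", "NomenclaturalChecklist",
    "Occurrence", "Taxon", "Location",
}


def normalize_basis_of_record(raw):
    """Return the controlled value matching `raw`, or None if none matches.

    Matching ignores case, spaces and underscores (an assumption of this
    sketch, not a documented GBIF rule).
    """
    key = raw.replace(" ", "").replace("_", "").lower()
    for value in BASIS_OF_RECORD:
        if value.lower() == key:
            return value
    return None


print(normalize_basis_of_record("preserved specimen"))
print(normalize_basis_of_record("herbarium sheet"))
```

Values that do not match, such as "herbarium sheet", return None and would need manual mapping to one of the accepted values.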

Controlled Vocabularies

“Extensions” are Element Vocabularies defining
new terms organized as extensions to Core Types
(dwc:Taxon and dwc:Occurrence).



Audubon Core (multimedia/images)


DwC-Germplasm (plant genetic resources)


EOL Data Object (species profiles)


GISIN Species Status (invasive species)


…etc

Why: Provide a mechanism for thematic
communities to define their own specific terms.

GBIF Vocabularies

Core types



could be more than dwc:Taxon and dwc:Occurrence


habitat, spatial areas, lines, grid, places, images/multimedia, literature, people, institutes, collections, collection specimens, etc…?



Extensions


= element/attribute vocabulary, definition of terms


Separate the definition of terminology from application models


Is “extensions” the appropriate label?



Vocabularies


= value vocabulary, authority files


external examples: countries, languages, …


biodiversity domain: taxonRank, basisOfRecord, …

http://vocabularies.gbif.org


GBIF Vocabularies

GBIF Vocabularies is hosted by the Scratchpads server in London



Install the GBIF Vocabulary Service in Copenhagen?



Further developments are needed.



Package the Vocabulary Service as an open-source tool?



Develop as Drupal modules, migrate to Drupal 7?


Element vocabularies are not always an “extension” of Darwin Core…?



Add management interface with definitions for new core types?



Rename “Extensions” to “Element Vocabularies” or “Attribute Vocabularies”?



Rename “Vocabularies” to “Value Vocabularies” or “Authority files”…?


Export and import of vocabularies to and from other management systems (SKOS, RDF, OWL as vocabulary exchange format?)



SKOS import and export features to be developed?


Improved human-readable interface



Export to HTML/PDF format for human-readable documentation of a vocabulary?

Vocabulary Registry/Portal


GBIF Vocabulary Registry


Is the present registry sufficient?



GBIF Vocabularies


Develop the Scratchpads solution further
as a vocabulary registry?



NCBO BioPortal alternative


Start using the NCBO BioPortal software

Why: Support the discovery of biodiversity
terminology and standard vocabularies.

GBIF Vocabulary Registry

The official versions of the “vocabularies” and
“extensions” for deployment are available from the
GBIF Registry (http://rs.gbif.org). They are used from
here by the GBIF infrastructure such as the IPT and HIT.


Separate service for discovery



Different service from the GBIF Vocabulary site (management vs. discovery).

GBIF Vocabulary Registry


Promote SKOS as the preferred vocabulary
(exchange) format?


Gradually replace XML Schema for defining standards?


Why: Promote ease of vocabulary exchange,
import and export.

http://rs.gbif.org


Simple Knowledge Organization System (SKOS)
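
To illustrate what a SKOS exchange document for a value vocabulary could look like, the sketch below writes the gbif:nomenclatural_code values from this presentation as a SKOS concept scheme in Turtle. The base URI is a placeholder, the function name is hypothetical, and the layout is only one of several valid ways to express the same vocabulary; it is not the format used by the GBIF Registry.

```python
# Hypothetical base URI for the concepts; not a published GBIF identifier.
BASE = "http://example.org/vocabulary/nomenclatural_code/"

# Values taken from the gbif:nomenclatural_code list in this presentation.
CODES = ["ICBN", "ICZN", "ICVCN", "ICNB", "ICNCP", "BioCode"]


def to_skos_turtle(base, labels):
    """Serialize a flat value vocabulary as a SKOS concept scheme in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{base}> a skos:ConceptScheme .",
    ]
    for label in labels:
        lines += [
            "",
            f"<{base}{label}> a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            f"    skos:inScheme <{base}> .",
        ]
    return "\n".join(lines) + "\n"


turtle = to_skos_turtle(BASE, CODES)
print(turtle)
```

A document of this shape can be loaded by any SKOS-aware vocabulary management tool, which is the practical argument for SKOS as an exchange format.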

GBIF Vocabulary Registry


Add human interface to explore SKOS
documents at the GBIF Registry?


OWLDoc (CO-ODE, static HTML)


OWL Ontology Browser (CO-ODE, dynamic)


Using the BioPortal Registry

GBIF KOS Task Group:


“GBIF should deploy an instance of the
BioPortal platform for biodiversity ontologies
as a complement to the GBIF Vocabularies
Server.”

Using the BioPortal Registry


Include biodiversity vocabularies in the
NCBO BioPortal…?


Will support the mapping of terms to the
major Genomics Vocabularies.



Establish a “GBIF BioPortal” using the
same BioPortal software?


Will focus on Biodiversity Community identity
and relevance.

Workflow

Draft vocabulary

Review version

Published version

Approve?

… and other SKOS-compliant
vocabulary management tools.

-> Uptake by the GBIF infrastructure,
including the IPT and the data portal.
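
The workflow steps above (draft, review, approve, publish) can be sketched as a small state machine. The class and function names are hypothetical, and the rule that a rejected review returns the vocabulary to draft is an assumption; the slide only shows the three versions and the approval decision.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft vocabulary"
    REVIEW = "Review version"
    PUBLISHED = "Published version"


def next_status(status, approved=False):
    """Advance a vocabulary one step through the workflow.

    Assumption of this sketch: a review that is not approved sends the
    vocabulary back to draft.
    """
    if status is Status.DRAFT:
        return Status.REVIEW
    if status is Status.REVIEW:
        return Status.PUBLISHED if approved else Status.DRAFT
    return status  # a published version stays published


status = next_status(Status.DRAFT)
print(status.value)
status = next_status(status, approved=True)
print(status.value)
```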

GBIF Strategic Plan 2012-2016:

“In anticipation of the integration and serving of future data types, GBIF will work closely with partners to enable data integration and interoperability across phenotypic, genomic, taxonomic, geospatial and ecosystem domains.”

GBIF Strategic Plan 2007-2011:

Further activities as part of the Plan will include improving the Data Portal system and expanding the depth and range of data types

“specimen, observation, descriptive, literature, name/concept, image, character, OGC, etc”

New Core Types?


dwc:Taxon


dwc:Occurrence




Audubon Core (images/multimedia)


Invasive Species (invasive in region/country)


New Spatial Objects (from point locations to include polygon, poly-line and grid objects)


etc…


Is the general principle on Extension of Core Types also suitable for new data types?

Data types

Example records: Hazelnut (Corylus avellana L.); Laurel (Laurus azorica (Seub.) Franco); Almonds (Prunus dulcis (Mill.) D.A.Webb) in Manouba, Tunisia; Bread wheat (Triticum aestivum L.)

Classes shown: dwc:MeasurementOrFact, dwc:Taxon, dwc:Occurrence, dwc:Identification, gbif:Reference, etc…

Terms (grouped as in the diagram):

dwc:identificationID, dwc:dateIdentified, dwc:identifiedBy, dwc:taxonID, dwc:scientificNameID, dwc:scientificName

dwc:measurementID, dwc:measurementValue, dwc:measurementUnit, dwc:measurementDeterminedBy

dwc:taxonID, dwc:scientificNameID, dwc:scientificName, dwc:taxonConceptID, dwc:kingdom, dwc:family, dwc:genus, dwc:specificEpithet

dwc:occurrenceID, dwc:basisOfRecord, dwc:eventID, dwc:eventDate, dwc:locationID, dwc:decimalLongitude, dwc:decimalLatitude, dwc:taxonID, dwc:scientificNameID, dwc:scientificName

dc:identifier, dc:bibliographicCitation, dc:title, dc:creator, dc:date, dc:source, dc:language, dwc:taxonRemarks

dwc:vernacularName, dc:language, dc:temporal, dwc:locality

Namespaces:

dc = http://purl.org/dc/terms/

dwc = http://rs.tdwg.org/dwc/terms/

gbif = http://rs.gbif.org/terms/1.0/
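
The namespace prefixes listed above let a short name such as dwc:scientificName stand for a full term URI. A minimal sketch of that expansion follows; the prefix table is copied from the slide, while the function name and error handling are illustrative assumptions.

```python
# Prefix-to-namespace table as given on the slide.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "gbif": "http://rs.gbif.org/terms/1.0/",
}


def expand(qname):
    """Expand a prefixed term name (e.g. 'dwc:taxonID') to a full URI."""
    prefix, _, local = qname.partition(":")
    if not local or prefix not in PREFIXES:
        raise ValueError(f"cannot expand {qname!r}")
    return PREFIXES[prefix] + local


print(expand("dwc:scientificName"))
print(expand("dc:bibliographicCitation"))
```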



Star schema

Example records: Hazelnut (Corylus avellana L.); Laurel (Laurus azorica (Seub.) Franco); Bread wheat (Triticum aestivum L.)

Classes shown: dwc:MeasurementOrFact, dwc:Taxon, dwc:Identification, gbif:Reference, audubon:Image

Namespaces:

dc = http://purl.org/dc/terms/

dwc = http://rs.tdwg.org/dwc/terms/

gbif = http://rs.gbif.org/terms/1.0/

audubon = http://rs.tdwg.org/ac/terms/
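
The star arrangement above can be sketched as one core record type with extension rows that point back to it through a shared identifier. The sketch assumes dwc:Taxon as the core and dwc:taxonID as the linking key; the identifiers ("t1", "m1", ...) and the pairing of extension rows with species are placeholders for illustration, not data from the slide.

```python
from collections import defaultdict

# Core rows (assumed core type: dwc:Taxon). Identifiers are placeholders.
core = [
    {"taxonID": "t1", "scientificName": "Corylus avellana L."},
    {"taxonID": "t2", "scientificName": "Triticum aestivum L."},
]

# Extension rows as (row type, row) pairs; each row points at a core row.
extension_rows = [
    ("dwc:MeasurementOrFact", {"taxonID": "t1", "measurementID": "m1"}),
    ("gbif:Reference", {"taxonID": "t1", "dc:identifier": "r1"}),
    ("dwc:Identification", {"taxonID": "t2", "identificationID": "i1"}),
]


def build_star(core, extension_rows, key="taxonID"):
    """Group extension rows around the core row they reference."""
    star = {
        row[key]: {"core": row, "extensions": defaultdict(list)}
        for row in core
    }
    for row_type, row in extension_rows:
        if row[key] not in star:
            raise ValueError(f"extension row references unknown {key}: {row[key]}")
        star[row[key]]["extensions"][row_type].append(row)
    return star


star = build_star(core, extension_rows)
for taxon_id, entry in star.items():
    print(taxon_id, entry["core"]["scientificName"], sorted(entry["extensions"]))
```

Each extension type can contribute zero, one or many rows per core record, which is what lets thematic communities add their own terms without changing the core.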


Star schema (??)

Example records: Hazelnut (Corylus avellana L.); Laurel (Laurus azorica (Seub.) Franco); Almonds (Prunus dulcis (Mill.) D.A.Webb) in Manouba, Tunisia

Classes shown: dwc:MeasurementOrFact, dwc:Taxon, gbif:Reference, audubon:Image, dwc:Occurrence, Metadata, CollectionObject, Place/location, etc…

Namespaces:

dc = http://purl.org/dc/terms/

dwc = http://rs.tdwg.org/dwc/terms/

gbif = http://rs.gbif.org/terms/1.0/

audubon = http://rs.tdwg.org/ac/terms/
