CONTROLLED VOCABULARY WORKSHOP

Dec 4, 2013 (3 years and 6 months ago)

CONTROLLED VOCABULARY WORKSHOP
MARCH 26-27, 2011

OBJECTIVES


Finalize VOCAB “Terms of Reference”


Define use cases for the keyword database and its development


Develop procedures for capturing and managing keyword taxonomies


DONE: Identify suitable existing database structures or software for managing the controlled vocabulary and adopt or modify them to meet the use cases



AGENDA

Saturday March 26

7-8 AM          Breakfast at the SEV
8-8:30 AM       Welcome, Review of Agenda, progress report
8:30-9 AM       Set up Use Case Working Groups
9-11 AM         Use Case Working Groups
11-Noon         Report back from working groups, VTC with other members for input
Noon-1:30 PM    Lunch
1:30-2:30 PM    Review of Controlled Vocabulary Terms of Reference
                List Management
2:30-3:30 PM    Work on using TemaTres software, including web services
                Work on taxonomies
3:30-5:00 PM    Work on some draft taxonomies
6 PM            Depart for Dinner in Socorro

AGENDA

Sunday March 27

7-8 AM          Breakfast
8-9 AM          Planning for workshop at SC with domain researchers
9-11 AM         Work on Use Cases with focus on implementation steps
11-Noon         Report on use cases, VTC with other members for input
Noon-1:30 PM    Lunch
1:30-3 PM       Write-up of Use Case scenarios, work on draft taxonomies
3-3:30 PM       Wrap-up
4 PM            Depart for ABQ hotels and airport

STATUS


TemaTres web-based thesaurus tool installed


Taxonomies implemented


Habitats/Ecosystems


Substances


Processes


Organisms


Terms classified (things, materials, activities/processes, properties, etc.)


A few terms recommended for removal


STATUS


416 terms are part of the polytaxonomy


Includes some new higher-level terms


264 terms remain to be linked


Synonyms are listed, but not yet added


A production server has been established for the controlled vocabulary


Ability to create instances for individual sites


Eda has worked a lot on import/export issues


USE CASE WORKING GROUPS


Straw Man List


Vocabulary use for searching and browsing


Eda, Don, Corrina


Putting the vocabulary into LTER documents


Kristin, Margaret, John


List Management: decision processes


Focus first on WHO DOES WHAT (not how)


May be a diagram/flow chart showing actors, actions and results


Once the first step is accomplished then consider:


How it might be accomplished technically


What resources would be required


Who should be responsible for the implementation


WORKING GROUP NOTES

PUTTING WORDS IN DOCUMENTS


JP’s use case


Draft EML document


Use Duane’s HIVE tool to suggest probable words


Check off the ones you want


Returns:


EML snippet to screen, for cut and paste into doc.


Or revised EML document with keywords added


Or XML document with keywords (in keywordSet node, including thesaurus) to be used with web service client (allowing additions to relational databases etc.)
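
The "EML snippet" output of this use case can be sketched in a few lines. This is only an illustration, not the HIVE tool or any existing web service: the function names, the sample keywords and the thesaurus label are assumptions. The element names (keywordSet, keyword, keywordThesaurus) are those of the EML schema.

```python
import xml.etree.ElementTree as ET


def build_keyword_set(keywords, thesaurus):
    """Build an EML keywordSet element from the checked-off keywords."""
    keyword_set = ET.Element("keywordSet")
    for word in keywords:
        ET.SubElement(keyword_set, "keyword").text = word
    # keywordThesaurus names the vocabulary the keywords were drawn from.
    ET.SubElement(keyword_set, "keywordThesaurus").text = thesaurus
    return keyword_set


def to_snippet(keyword_set):
    """Serialize the element to a string suitable for cut and paste."""
    return ET.tostring(keyword_set, encoding="unicode")


# Hypothetical selection; real terms would come from the vocabulary.
checked = ["primary production", "soil moisture"]
snippet = to_snippet(build_keyword_set(checked, "LTER Controlled Vocabulary"))
print(snippet)
```

The same element could be appended to a parsed EML document instead of printed, which would cover the "revised EML document" variant.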


KRISTIN’S USE CASE


Populate Drupal web site with polytaxonomy


Within Drupal Metadata Editor - browse drop-down list of levels, or search to find terms


Select term you want and it is automatically added to backend database that is used by the module that creates EML

MARGARET’S USE CASE


Browse or search keywords and check off desired terms


As things are checked off, generates internal list that is archived at a particular URL


Web service provides XML snippet that can go into EML

USING FOR NON-DATASETS


E.g., publications, projects etc.


May not have EML representations


Browse or search to locate potential terms


Return:


Simple list for inclusion (cut and paste) into publications etc.


EML snippet as part of an XML document for use with a web service client to interface with desired systems


Note: this could also use HIVE search tool instead of raw browse

BEST PRACTICES


Need a best practices guide that addresses use of the controlled vocabulary


Goal: assure that LTER data is discoverable


Examples:


Use the most specific terms you can


Specify how many or what categories of terms should be included where applicable - examples


Specifying a desirable number of terms


E.g., at least one term from at least X of the LTER taxonomies


Should have at least one core area


RATING DOCUMENTS


Run document through congruency checker


It says how many keywords and taxonomies are represented in an EML document


Allows checking for conformance with best practices
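
A congruency check of this kind could look roughly like the sketch below. It is an assumption about how such a checker might work, not a description of an existing tool: the term-to-taxonomy lookup, the sample terms, and the threshold of two taxonomies are all hypothetical stand-ins for the real vocabulary and the "X" in the best practices.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from preferred term to its taxonomy.
TERM_TAXONOMY = {
    "forests": "Habitats/Ecosystems",
    "nitrogen": "Substances",
    "decomposition": "Processes",
    "birds": "Organisms",
}


def rate_document(eml_text, min_taxonomies=2):
    """Count keywords and the taxonomies they represent in an EML document."""
    root = ET.fromstring(eml_text)
    keywords = [k.text.strip().lower() for k in root.iter("keyword") if k.text]
    controlled = [k for k in keywords if k in TERM_TAXONOMY]
    taxonomies = {TERM_TAXONOMY[k] for k in controlled}
    return {
        "keywords": len(keywords),
        "controlled_keywords": len(controlled),
        "taxonomies": sorted(taxonomies),
        # Assumed best-practice rule: terms from at least N taxonomies.
        "conforms": len(taxonomies) >= min_taxonomies,
    }


SAMPLE = """<dataset>
  <keywordSet>
    <keyword>Nitrogen</keyword>
    <keyword>decomposition</keyword>
    <keyword>my favorite plot</keyword>
    <keywordThesaurus>LTER Controlled Vocabulary</keywordThesaurus>
  </keywordSet>
</dataset>"""

report = rate_document(SAMPLE)
print(report)
```

Keywords that are not in the lookup are still counted but do not contribute to any taxonomy, so the report separates total keywords from controlled ones.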

WORKING GROUP - MANAGING VOCABULARY


Principles


Want to hit “sweet spot” for number of keywords


Enough to make reasonable search and browsing possible


Not so specific that only data from a particular site or dataset would be returned from a search


Could be words used widely at a single site


Want to avoid words that are too esoteric


The list should be modified periodically to capture additional words as they become widely used in the network


Each site should be able to propose new preferred terms, in suitable forms that are widely used in datasets from the site. A proposal should include justification, including information on related terms used at other LTER sites and where the term might be placed into the taxonomies


Sites can also propose non-preferred terms linked to existing preferred terms


Sites should be able to maintain independent, site-specific controlled vocabularies


CRITERIA FOR ACCEPTING OR REJECTING PROPOSED PREFERRED TERMS


The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)


Proposed terms should not be redundant with existing term(s) already in the vocabulary


Terms and their proposed places in taxonomies should conform in form with NISO Z39.19-2005 and successor documents (e.g., sections 6.5.1, 8.3)

CRITERIA FOR ACCEPTANCE OF PROPOSED NON-PREFERRED TERMS


The proposed terms should be suitable for inclusion (e.g., not locations or specific taxonomic identifiers)


The proposed terms must be sufficiently close synonyms to the preferred term to which they will be linked

CRITERIA FOR REMOVING OR ALTERING PREFERRED TERMS


Terms will never be altered, but they can be demoted to non-preferred status


Terms can only be removed if they are not currently in use by datasets


Removals or alterations of terms are expected to be rare
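
The two rules above (demote rather than alter, remove only when unused) can be expressed as a small sketch. The class, its fields and the sample terms are hypothetical and do not describe how TemaTres or the vocabulary database stores terms. It also assumes that a demoted term is linked to an existing preferred term, as non-preferred terms are elsewhere in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    preferred: set = field(default_factory=set)
    non_preferred: dict = field(default_factory=dict)  # term -> preferred term
    in_use: set = field(default_factory=set)  # terms used by datasets

    def demote(self, term, replacement):
        """Demote a preferred term and link it to another preferred term."""
        if term == replacement:
            raise ValueError("a term cannot replace itself")
        if term not in self.preferred or replacement not in self.preferred:
            raise ValueError("both terms must currently be preferred")
        self.preferred.remove(term)
        self.non_preferred[term] = replacement

    def remove(self, term):
        """Remove a term only if no dataset currently uses it."""
        if term in self.in_use:
            raise ValueError(f"'{term}' is in use by datasets")
        self.preferred.discard(term)
        self.non_preferred.pop(term, None)


# Hypothetical terms for illustration only.
vocab = Vocabulary(
    preferred={"stream flow", "streamflow", "discharge"},
    in_use={"stream flow"},
)
vocab.demote("stream flow", "streamflow")
try:
    vocab.remove("stream flow")
    blocked = False
except ValueError:
    blocked = True
vocab.remove("discharge")
print(sorted(vocab.preferred), vocab.non_preferred, blocked)
```

Because a demoted term stays in the vocabulary as a non-preferred entry, datasets that already use it keep resolving to a preferred term.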

CHANGING LOCATION OF TERMS IN TAXONOMIES OR THESAURI


These have large subjective elements. Other resources should be frequently consulted when making changes


Sites or individuals can propose and justify changes that will be evaluated relative to NISO Z39.19


PROCESS


VOCAB committee may do research to identify terms that should be added based on use in site-specific vocabularies, use in datasets and other sources of information.


VOCAB committee receives and evaluates proposed changes


Based on the criteria, makes changes to the development version of the controlled vocabulary database


VOCAB may make immediate changes in the current official version to correct gross errors


New versions will be issued by VOCAB from time to time, and a request for endorsement will be forwarded to IMEXEC

SCIENCE COUNCIL WORKSHOP


Objective


Engage SC members


sell on idea


develop some advocates


Process followed: Objectives - Rules for taxonomies



Get guidance on specific issues


The Controlled vocabulary


Need for related terms?


Are there things missing?


core areas?


Are there things that should be removed?


Are there things that are out of place?


Specific areas of concern


Use Cases


Feedback on proposed uses


Priorities for getting implemented



Tasks before workshop


Add definitions for all words to the taxonomy


Prioritize ones that are difficult


Get way to display entire vocab.


Improve diagram for content


Send SC members link to TemaTres


Have them do test searches


AGENDA


Introduction


1 hour


Around the room introductions


why we need controlled vocabulary


steps taken so far


Background


procedures for creating controlled vocabularies


Meeting objectives


How to use Controlled Vocabulary


1 hour


Question for SC members


What are your experiences with finding LTER data?


What would most help you find data in the future?


Discussion of data discovery use cases


SC AGENDA


Tour of Controlled Vocabulary


1 hour


General Introduction


Breakout groups (pair SC member with IM) to look at areas of specific interest


Feedback to entire group on things in the controlled vocabulary that need improvement


1 hour


Discussion of specific issues


Core areas as top level hierarchy


now integrated elsewhere


Management of the vocabulary


role of researchers


Discussion of next steps


How do we engage larger LTER community?


How much, and what sort of, engagement is needed?