Metadata for Digital Archives

Philip Jones
STG, Inc.
National Climatic Data Center, Archive Branch

NCDC Metadata Workshop
Asheville, NC
27 October 2011

Presentation Outline

- Metadata Foundation
- OAIS-RM Framework
- Archive Procedures
- Metadata Approach
- Metadata for Collections
- Standard Metadata
- Creation and Management
- Status
- Workshop Questions

A reference model:

- is based on a small number of unifying concepts
- is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
- may be used as a basis for education and explaining standards to a non-specialist
Open Archival Information System Reference Model (OAIS-RM)

- Focuses on long-term preservation of independently understandable information for the Designated Community (meaning the community should be able to understand the information without needing the assistance of the experts who produced the information)
- Addresses a full range of archival preservation functions (including ingest, archival storage, data management and access)
- Applicable to any archive
- NOAA National Data Centers follow the archiving recommendations set forth in the OAIS-RM
- Defines a minimal set of responsibilities that an organization must discharge in order to operate an OAIS archive


OAIS Mandatory Responsibilities

1. Negotiate for and accept appropriate information from information Producers.
2. Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.
3. Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided.
4. Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
5. Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original.
6. Make the preserved information available to the Designated Community.


Central focus: long-term preservation of independently understandable information for the Designated Community.

Producer’s knowledge must be preserved with the data (as metadata).

[Figure: OAIS Functional Entities and Information Flows, with "YOU ARE HERE" markers]

OAIS Information Packages (SIP, AIP & DIP)

- An OAIS Information Package is a “conceptual package” that contains the Data object (or preservation target) and its RI and PDI
- An Information Package is summarized and searched by its Descriptive Information
- Technical “syntax and semantic” metadata is referred to as Representation Information (RI)
- “Bibliographic” metadata is referred to as Preservation Description Information (PDI) (it includes Provenance, Reference, Fixity (integrity), and Context)
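
The package structure described above can be pictured as a small data model. The following Python sketch is illustrative only: the class and field names (`InformationPackage`, `representation_info`, `descriptive_info`, and so on) are assumptions chosen to mirror the OAIS terms on this slide, and the example values are made up. It is not part of the OAIS-RM or of any NCDC system.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationDescriptionInfo:
    """PDI: the "bibliographic" metadata (hypothetical field names)."""
    provenance: str
    reference: str
    fixity: str    # integrity information, e.g. a checksum
    context: str


@dataclass
class InformationPackage:
    """Conceptual package: the data object plus its RI and PDI."""
    data_object: bytes
    representation_info: dict   # RI: syntax and semantics of the data
    pdi: PreservationDescriptionInfo
    descriptive_info: dict = field(default_factory=dict)  # used for search

    def matches(self, term: str) -> bool:
        """Search the package through its Descriptive Information only."""
        term = term.lower()
        return any(term in str(value).lower()
                   for value in self.descriptive_info.values())


# Made-up example values, for illustration only.
package = InformationPackage(
    data_object=b"\x00\x01",
    representation_info={"format": "binary", "byte_order": "big-endian"},
    pdi=PreservationDescriptionInfo(
        provenance="example producer",
        reference="example-id-001",
        fixity="example-checksum",
        context="example collection",
    ),
    descriptive_info={"title": "Example Temperature Observations"},
)
print(package.matches("temperature"))
```

Keeping the search method limited to `descriptive_info` reflects the slide's point that a package is summarized and searched by its Descriptive Information, while RI and PDI travel with the data object.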


Types of information, by Granularity

Types of information: Reference Identification, Representation Information, Context/Quality, Provenance

Collection/Series: Dataset Title, Data Center ID, Journal Reference, DOI, Format Specification, Auxiliary data, Navigation, Calibration, Science Paper, Validation Study, Related Datasets, Mission, Originator/PI, Contributing Collections, Platforms, Instruments, ATBD

File Objects: File name, UUID (checksum for integrity), File Format, File Structure, Companion File, File content statistics, Parameter thresholds, Percentages, Input files, Processing time, Modified date, Storage and handling

Parameters: Variable name, Standard name, Data Type, Units, Scale, Offset, Time reference, Quality Flag, Quality Indicator, Input source, Update Flag

- Additional associations could be made for Missions, Platforms and Instruments
- Self-describing file formats like netCDF are effective at documenting file and parameter-level metadata, and are preferred for long-term archives
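
The file-level identifier and integrity items listed above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the record keys, and the choice of SHA-256 as the checksum algorithm are not taken from the slides.

```python
import hashlib
import uuid


def file_fixity_record(name: str, content: bytes) -> dict:
    """Build a file-level record with an identifier and a checksum."""
    return {
        "file_name": name,
        "uuid": str(uuid.uuid4()),  # random identifier for the file object
        # SHA-256 is an assumed algorithm choice, not one named in the slides.
        "checksum_sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def verify_fixity(record: dict, content: bytes) -> bool:
    """Return True when the content still matches the stored checksum."""
    return hashlib.sha256(content).hexdigest() == record["checksum_sha256"]


record = file_fixity_record("example.nc", b"example file bytes")
print(verify_fixity(record, b"example file bytes"))
print(verify_fixity(record, b"corrupted bytes"))
```

Storing the checksum at ingest and recomputing it later is one common way to detect silent corruption of an archived file.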

[Figure: Phase of Objectives]

Phases: Preliminary Phase, Formal Definition Phase, SIP Transfer Phase, SIP Validation Phase, AIP Creation

Products: Preliminary Agreement, Dictionary, Formal Model, Submission Agreement, Anomalies, Validation Agreement

- Archive Appraisal and Approval (Metadata scoped)
- Submission Agreement (Metadata defined)

NOAA Procedure for Scientific Records Appraisal and Archive Approval (NOAA "What to Archive")

- A NOAA-wide procedure to identify, appraise, and decide what records are preserved in a NOAA Facility (for acquisition and disposition)
- A Data Center Appraisal Team representing all aspects (science, development, IT, engineering, archiving, access and user needs) makes the recommendation
- The Procedure covers the steps leading to the Archiving Decision; implementing the support comes after the formal approval and is outside of the Procedure
- IMPORTANT: Data Center involvement during the product planning and design phase allows for metadata guidance, coordination of schedules, etc.

Cost/Schedule/Value

Wanted Services



Submission Agreements

- Data Center agreement negotiated with the Provider to ensure that the OAIS needs are covered before the data are transferred
- Worked after the formal Archive Decision, throughout the product development phase and up to the implementation of the submission
- It clarifies the Producer-Archive technical relationship and, ultimately, the agreement improves the integrity of the archived information
- Includes information on:
  - Provider and Data Center Contacts
  - Dataset descriptions
  - Dataset Access/Use Restrictions
  - Submission Schedule
  - Transfer Protocols
  - Verification Process
  - Error Handling
  - Archive Identifiers
  - Data Center Access
  - Archive Disposition
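
The list of Submission Agreement topics lends itself to a simple completeness check before an agreement is signed. The Python sketch below assumes a draft agreement is held as a dictionary keyed by section; the key names and the `missing_sections` helper are hypothetical and only mirror the topics listed on the slide.

```python
# Section keys mirror the topics listed on the slide (hypothetical names).
REQUIRED_SECTIONS = [
    "contacts",
    "dataset_descriptions",
    "access_use_restrictions",
    "submission_schedule",
    "transfer_protocols",
    "verification_process",
    "error_handling",
    "archive_identifiers",
    "data_center_access",
    "archive_disposition",
]


def missing_sections(agreement: dict) -> list:
    """Return the required sections that are absent or left empty."""
    return [key for key in REQUIRED_SECTIONS if not agreement.get(key)]


# Made-up draft with only three sections filled in.
draft = {
    "contacts": ["provider contact", "data center contact"],
    "dataset_descriptions": "Example product description",
    "submission_schedule": "daily",
    "transfer_protocols": "",  # present but empty, so still reported
}
print(missing_sections(draft))
```

A check like this could run whenever the draft changes, so that gaps are visible during the product development phase rather than at first transfer.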

Archive Metadata Approach

Guidance for Archive Files:

- Self-describing file formats (Producer-defined formats do not meet most user community needs, and require additional documentation and costs for information preservation)
- Adherence to Conventions (e.g., CF) with an appropriate amount of metadata content for the data
- Descriptive File Naming Convention (negotiated through the Submission Agreement)

Metadata for Collections:

- Use community-accepted FGDC CSDGM ERSM or ISO 19115-2 Standard Metadata implemented in XML
- Additional kinds of documentation (necessary to complete the OAIS Information Package!) are referenced from the standard FGDC/ISO metadata. These include Algorithm Descriptions, Studies/Reports, Format Specifications, Auxiliary Data, Special Metadata Supplements, etc.
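
As an illustration of checking that archive files carry an appropriate amount of convention-based metadata, the Python sketch below inspects per-variable attribute dictionaries. The attribute names `units`, `standard_name`, and `long_name` come from the CF conventions, but treating them as a required minimum is an assumption of this sketch, as are the function name and the sample variables. Reading the attributes out of an actual netCDF file is left out.

```python
def check_variable_attributes(variables: dict) -> dict:
    """Map each variable name to a list of missing-attribute problems."""
    problems = {}
    for name, attrs in variables.items():
        issues = []
        if "units" not in attrs:
            issues.append("missing units")
        if "standard_name" not in attrs and "long_name" not in attrs:
            issues.append("missing standard_name or long_name")
        if issues:
            problems[name] = issues
    return problems


# Made-up variables: one documented, one with no attributes at all.
variables = {
    "sst": {"units": "kelvin", "standard_name": "sea_surface_temperature"},
    "flag": {},
}
print(check_variable_attributes(variables))
```

Variables that pass are omitted from the result, so an empty dictionary means every variable met the assumed minimum.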

Standard Metadata documents diverse data collections using common attributes in a standard format

ID: gov.noaa.ncdc:C00366
Title: CRN Raw Observations
Originator: CRN
Spatial Extent: -172.0, -66.0, 72.0, 18.0
Temporal Extent: 2001-10-01 to Present
Theme Keywords: AIR TEMPERATURE, …
Platforms: CRN Stations
Instrument: RAIN GAUGES, …
Processing Level: Level 0
File Format: Binary
Links: Documentation

ID: gov.noaa.class:AVHRR
Title: AVHRR Level 1B
Originator: OSDPD
Spatial Extent: -180.0, 180.0, 90.0, -90.0
Temporal Extent: 1978-11-05 to Present
Theme Keywords: VISIBLE RADIANCE, …
Platforms: NOAA-15, NOAA-16, …
Instrument: AVHRR
Processing Level: NOAA Level 1B
File Format: Binary
Links: Data, Documentation

ID: gov.noaa.ncdc:C00846
Title: Daily OISST v3
Originator: NCDC
Spatial Extent: -180.0, 180.0, 90.0, -90.0
Temporal Extent: 2001-11-01 to Present
Theme Keywords: SEA SURFACE TEMP, …
Platforms: NOAA-15, NOAA-16, …
Instrument: AVHRR
Processing Level: NOAA Level 3
File Format: netCDF
Links: Data, Documentation, Code

ID: gov.noaa.ncdc:C00785
Title: Keyed East India Co. Logs
Originator: NCDC
Spatial Extent: -180.0, 180.0, 70.0, -50.0
Temporal Extent: 1789 to 1834
Theme Keywords: WINDS, SEA STATE, …
Platforms: EIC Ship Voyages
Instrument: WIND VANES, …
Processing Level: NOAA Level 1
File Format: ASCII
Links: Data, Documentation
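
Because every collection is described with the same attributes, simple consistency checks can be applied uniformly. The Python sketch below models a record with a few of the attributes shown and checks its spatial extent. The class name, field names, and the reading of the four extent values as west, east, north, south are assumptions inferred from the values on the slide.

```python
from dataclasses import dataclass


@dataclass
class CollectionRecord:
    """A few common attributes of a collection (hypothetical field names)."""
    identifier: str
    title: str
    west: float   # extent order assumed to be west, east, north, south
    east: float
    north: float
    south: float

    def bbox_problems(self) -> list:
        """Return a list of problems found in the spatial extent."""
        problems = []
        for label, lon in (("west", self.west), ("east", self.east)):
            if not -180.0 <= lon <= 180.0:
                problems.append(f"{label} longitude out of range: {lon}")
        for label, lat in (("north", self.north), ("south", self.south)):
            if not -90.0 <= lat <= 90.0:
                problems.append(f"{label} latitude out of range: {lat}")
        if self.south > self.north:
            problems.append("south latitude is greater than north latitude")
        return problems


# Values taken from the records shown on the slide.
crn = CollectionRecord("gov.noaa.ncdc:C00366", "CRN Raw Observations",
                       -172.0, -66.0, 72.0, 18.0)
oisst = CollectionRecord("gov.noaa.ncdc:C00846", "Daily OISST v3",
                         -180.0, 180.0, 90.0, -90.0)
print(crn.bbox_problems(), oisst.bbox_problems())
```

The check deliberately does not require west to be less than east, since an extent that crosses the antimeridian can legitimately have west greater than east.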



Best Practices

- Early in the process, capture all information items from the Provider needed for a complete understanding of the data by current and future users. (What we are missing in today’s archives gives clues for what we will need in tomorrow’s archives.)
- Provide links to resources with more information. (Resources are described and identified but the information is not duplicated in the Standard Metadata.)
- Use standard vocabularies (GCMD Keyword Lists) and code lists (ISO).
- Maintain a Standard Metadata record per product type and major version.
- Modify Standard Metadata descriptions, identifiers and resource links as needed (feedback encouraged).
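
The standard-vocabulary practice can be illustrated with a small keyword check. In the Python sketch below, `CONTROLLED_KEYWORDS` is a tiny stand-in set built from keywords that appear elsewhere in these slides; a real check would load the full GCMD Keyword Lists, which this sketch does not attempt. The function name is hypothetical.

```python
# Stand-in vocabulary; a real check would load the GCMD Keyword Lists.
CONTROLLED_KEYWORDS = {
    "AIR TEMPERATURE",
    "VISIBLE RADIANCE",
    "WINDS",
    "SEA STATE",
}


def split_keywords(keywords: list) -> tuple:
    """Normalize keywords and split them into (accepted, rejected) lists."""
    normalized = [keyword.strip().upper() for keyword in keywords]
    accepted = [k for k in normalized if k in CONTROLLED_KEYWORDS]
    rejected = [k for k in normalized if k not in CONTROLLED_KEYWORDS]
    return accepted, rejected


accepted, rejected = split_keywords(["air temperature", " Winds ", "wind speed-ish"])
print(accepted, rejected)
```

Rejected keywords are returned rather than dropped so that a metadata author can map them to an accepted term.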

Standard Metadata Creation Requires Group Effort

[Figure: contributions to Standard Metadata by group]

Groups: Provider / Steward, Access, Archive, Data

Contributions: Descriptions, References, Contacts, Identifiers, Archive Links, Compliance, Maintenance, Identifiers, Access Links, Requirements, Testing/Usage

Metadata creation by fewer than the needed groups results in an incomplete metadata picture

Standard Metadata Management

[Figure: OAIS Functional Entities and Information Flows]

Standard Metadata should have the same general information flow as the data it documents, and be maintained with the data in the archive

Standard Metadata Summary

- 540 metadata records (80 for CLASS)
- All are in FGDC
- All can be converted to ISO using an XSL transform developed by the NOAA metadata community
- ~90 are baselined in ISO (80 are CLASS records)
- ~10% of the 540 are legacy metadata for publications and charts not representative of archived data collections
- 700+ dataset archives (DSI #s), so many data collections need Standard Metadata
- Retrospective metadata creation is not easy!


Standard Metadata Repository

- NOAA Metadata Manager Repository (NMMR) is being sunsetted:
  - Used by NCDC to manage collection Standard Metadata
  - Provided a repository, validation, and publishing to a Web Accessible Folder (WAF)
- NMMR Replacement Status:
  - Copies obtained on an NCDC server
  - Standard Metadata Record registration still managed in Archive Branch
  - Validation using an XSD with local XML editors or other tools
  - Subversion Repository for maintaining official copy in work
  - Complete publishing: 1) transitioning from GOS to geo.data.gov; 2) identifying requirements and testing for geoportal
- WAF locations with published XML files:
  - CLASS ISO: http://www1.ncdc.noaa.gov/pub/data/metadata/published/class/iso/xml/
  - CLASS FGDC: http://www1.ncdc.noaa.gov/pub/data/metadata/published/class/fgdc/xml/
  - Archive FGDC: http://www1.ncdc.noaa.gov/pub/data/metadata/published/archive/fgdc/xml/
- Additional Related Steps:
  - Reconcile differences between Archive Data Holdings and Standard Metadata
  - Establish a workflow and review process
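
A lightweight pre-check can catch obviously incomplete records before full XSD validation. The Python sketch below uses only the standard library to confirm that an FGDC-style XML record is well-formed and contains a few elements. It is not XSD validation. The element paths reflect FGDC CSDGM tag names as best understood here and should be verified against the schema in use; the `REQUIRED_PATHS` selection and the `missing_elements` helper are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Paths relative to the <metadata> root; an assumed minimal selection.
REQUIRED_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/citation/citeinfo/origin",
    "idinfo/spdom/bounding/westbc",
    "idinfo/spdom/bounding/eastbc",
    "idinfo/spdom/bounding/northbc",
    "idinfo/spdom/bounding/southbc",
]


def missing_elements(xml_text: str) -> list:
    """Return required paths that are absent or empty.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        if node is None or not (node.text or "").strip():
            missing.append(path)
    return missing


# Sample built from values shown on the slides, for illustration only.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>NCDC</origin>
      <title>Keyed East India Co. Logs</title>
    </citeinfo></citation>
    <spdom><bounding>
      <westbc>-180.0</westbc><eastbc>180.0</eastbc>
      <northbc>70.0</northbc><southbc>-50.0</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>"""
print(missing_elements(SAMPLE))
```

A check like this could run as a repository commit step, with schema validation reserved for the publishing stage.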

Metadata Workshop Questions

Q1. Are you a metadata user, producer or both?

A1. Assist in creation of, and responsible for long-term management of, Standard Metadata (identify other types for archiving)

Q2. What is your biggest challenge as a metadata user and/or producer?

A2. Top three:

- Obtaining adequate documentation and accurate and current content for Standard Metadata
- Recognizing when things change in a data/product which require updates to metadata and documentation
- Best practice questions for Standard Metadata creation (depends on users)

Q3. If you could have anything to assist you in your metadata needs, what would you want in 1 year and in 5 years?

A3. In 1 year:

- Synchronize metadata creation with data production schedules
- A way to coordinate the maintenance of metadata and documentation across groups including scientific stewardship teams (by discipline)

In 5 years:

- Modifications to Ingest to flag changes in a data stream and to extract file-level metadata attributes at Ingest to populate databases for file queries

Questions?



Back Up Slides

References

- NOAA Administrative Order (NAO) 212-15, November 4, 2010, entitled “Management of Environmental and Geospatial Data and Information.” Available at http://www.corporateservices.noaa.gov/~ames/NAOs/Chap_212/naos_212_15.html
- NOAA Procedure for Scientific Records Appraisal and Archive Approval: Guide for Data Managers, U.S. DOC / NOAA, September 2008. Available at https://www.ngdc.noaa.gov/wiki/images/0/0b/NOAA_Procedure_document_final.pdf
- Producer-Archive Interface Methodology Abstract Standard (PAIMAS), CCSDS 651.0-B-1, May 2004, (ISO 20652:2006). Available at http://public.ccsds.org/publications/archive/651x0m1.pdf
- Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, January 2002, (ISO 14721:2003). Available at http://public.ccsds.org/publications/archive/650x0b1.pdf
- ISO Metadata Resources at the GEO-IDE Wiki: https://geo-ide.noaa.gov/wiki/index.php?title=Main_Page