Metadata Activities in Biology




LTER Network Office, Department of Biology, University of New Mexico, Albuquerque, New Mexico, USA

National Biological Information Infrastructure, USGS Western Fisheries Research Center, Seattle, Washington, USA

National Biological Information Infrastructure, USGS Center for Biological Informatics, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

The National Biological Information Infrastructure program has advanced the biological sciences' ability to share, integrate, and synthesize data by making the metadata program a core of its activities.

Through strategic partnerships, a series of crosswalks for the main biological metadata specifications has enabled data providers and international clearinghouses to aggregate and disseminate tens of thousands of metadata sets describing petabytes of data records.

New efforts at the National Biological Information Infrastructure are focusing on better metadata creation and curation tools, semantic mediation for data discovery, and other related initiatives.

KEYWORDS: metadata, metadata creation, metadata curation, metadata quality control


The National Biological Information Infrastructure (NBII) facilitates access to and use of biological information. The NBII relies heavily on quality metadata to serve biological information. NBII follows the metadata directives outlined in the nineties (United States Executive Order 12906). Ever since, the NBII has been constantly adapting to the profound changes in technologies related to communication and digital data processing and storage. The NBII achieved its goal of unifying diverse, high-quality biological databases, information products, and analytical tools by forging partnerships with government agencies, academic institutions, non-government organizations, and private industry. In this paper, we review some of the metadata-related achievements and experiences of the NBII, with a particular focus on the case example of the partnership with NSF's Long Term Ecological Research Program (LTER).


The United States Geological Survey (USGS), home to the NBII program, has been a leader in establishing global infrastructures, standards, and collaborations in support of biological data management and delivery. The NBII, begun in 1993 (Sepic and Kase, 2002), is a broad, collaborative program to provide increased access to data and information on biological resources. Coordinated by the USGS, the NBII links diverse, high-quality biological databases, information products, and analytical tools maintained by NBII partners and other contributors in government agencies, academic institutions, non-government organizations, and private industry. NBII partners and collaborators also work on new standards, tools, and technologies that make it easier to find, integrate, and apply biological resources information. Resource managers, scientists, educators, and the general public use the NBII to answer a wide range of questions related to the management, use, or conservation of this nation's biological resources.

For the past several years, the NBII program has centered its work on forging partnerships with biology and ecology community stakeholders, with special emphasis on facilitating interoperability among all biological data-producing organizations.

Scientific data are the key component to advance science, but data without context lack value (Michener, 2005). Here, we call the context of scientific data metadata (see Hodge (2001) for a nice introduction to metadata). There was no nationally coordinated effort to capture biological metadata until the NBII program started.

Before the NBII, some collectives, groups, and organizations captured mostly unstructured biological metadata. These early, uncoordinated metadata efforts resulted in the preservation of the value of the data, but little interoperability and data sharing were achieved. Interoperable data means, in this context, that the data and metadata are available and usable by applications and services that are developed by other agencies and the public in general.

Interoperable metadata details should therefore be expressed in a common specification used by many groups, entities, agencies, and even countries. The last decade's technological advances provided us the affordable infrastructure for documenting and preserving data beyond the realm of scientific publications.

However, metadata awareness, and the culture shift that it entails, needs more than infrastructure. NBII's efforts are focused both on deploying the infrastructure needed to nurture the metadata lifecycle and on providing quality control and other metadata services for organizations. NBII's metadata program has a strong educational component on the value of metadata and the importance of scientific data management.

The Long Term Ecological Research network has been conducting ecological and biological research for over twenty-five years. Over two thousand scientists have conducted ecological and biological research (Hobbie, 2003) and monitored changes in our ecosystems. Arctic, marine, alpine, desert, prairie, and other habitats host a variety of LTER long-term observations and scientific projects. The LTER network is centralized by a coordinating office, and it covers 26 different locations in the western hemisphere, with special representation in the United States mainland.

The long-term aspect of the LTER studies called for a plan to preserve data and metadata for the long term. LTER has had data preservation plans and mandates for a long time, and recently, the LTER adopted a metadata specification as the network metadata standard. The LTER information managers committee wrote a metadata best practices document to coordinate the metadata process.

NBII and LTER partnered in 2004 to foster program interoperability and to leverage and share resources and knowledge. A liaison position was created by the NBII Program to make the interaction fluid enough to accomplish the goals set in the agencies' cooperative agreement. Specific and immediate goals included sharing the wealth of metadata holdings of both programs, harmonizing data standards, leveraging joint training activities, and supporting biological/ecological community data management initiatives.

Status at a glance

The NBII hosts a wealth of information on standardized metadata records in the NBII Clearinghouse, a system for which Oak Ridge National Laboratory (ORNL) provides both technical support and hosting services. Over seventy metadata providers account for over seventy-two thousand metadata records placed in the NBII metadata clearinghouse. All records are publicly accessible through a user-friendly interface and various web services. In the NBII metadata Clearinghouse, term-based searches can be combined with advanced searches or catalog-style browsing. Filtering on results is available to narrow the searches. Metadata records are pre-tagged using the NBII Biocomplexity thesaurus, and key metadata sections are indexed to enhance the search results.

Metadata records are classified into several categories according to the potential functionality they can provide (San Gil, 2008). Revisions of the current metadata and proper curation processes are underway. Also, a whole suite of tools to ease the processes of metadata editing and entry is being deployed. Semantic mediation is also being worked on at several levels through the use of the NBII thesaurus and its services, and also through the re-tagging of content with thesaurus terms to enhance the metadata discovery process. Trainings for metadata curators, principal investigators, and information managers are also an integral part of the decade-long NBII metadata program.

Paper outline

After this brief introduction, we offer details on the metadata program today; we present the current NBII data providers and an introduction to metadata quality levels. In the next section, we provide an overview of the metadata tools being deployed and the steps we are taking to make the metadata creation process simpler: administrative dashboards for the metadata providers, linkages to related resources, and new editing tools. We discuss the convergence of metadata standards. We finalize with a discussion of current challenges and future tasks, along with plans to serve the science with the best tools to inform the public.

In this section we present an overview of the metadata program in the biological sciences, from the point of view of the activities of the National Biological Information Infrastructure. We present the goals of the program, the achievements, some technological details such as the standards used, some details about the biological metadata clearinghouse, the metadata training programs, and the projects on integrative biology that have spun off from these efforts.

Metadata diversity in Biology

There are four major metadata specifications that are widely used by the ecological community. These four specifications are implemented using the extensible markup language (XML). XML is an optimal vehicle for digital data interoperability among data providers. The Federal Geographic Data Committee (FGDC) makes several specifications: the Content Standard for Digital Geospatial Metadata and the FGDC profiles. The profiles are customizations; for biology we use the Biological Data Profile, or BDP. These are primarily used by many federal and state government agencies in the United States.

ESRI produces FGDC-affine specifications. This commercial specification (used extensively by Geographic Information Systems) is a superset of the FGDC. All these profiles and supersets are quite interoperable, presenting few technical impediments to making most of the metadata content easily integrable across the diverse platforms using ESRI, FGDC, and related specifications.

The Ecological Metadata Language (EML) is another comprehensive specification. EML is used throughout the ecology research community, for example, by the Ecological Society of America, the LTER Network, and the Organization of Biological Field Stations, among other organizations. Note that we have used the word "specification" instead of "standard" to refer to these XML-based sets of contextual rules. Technically, a standard needs to be vetted by an internationally recognized organization devoted to standards matters.

Centralized Holdings for Biological Metadata

The NBII program serves over seventy-two thousand biological metadata records on the NBII clearinghouse. More than seventy data providers (mostly organizations) regularly provide metadata through a weekly harvesting program. Table 1 shows a summary of metadata holdings divided by provider. In figure 1, we show the relative composition of the NBII metadata clearinghouse holdings as a function of number of records per provider.

[Table 1] [Figure 1]

There are four main XML-based standards accepted for harvesting metadata: the FGDC, EML, and Dublin Core, with the addition of FGDC's Biological Data Profile and the ESRI-based superset. Also, metadata records that are compliant with the Darwin Core, ISO 19115, and ISO 19137 standards are accepted at the NBII clearinghouse.

This disparity of common metadata specifications and standards to structure and share metadata poses a barrier in terms of interoperability. The NBII, through a partnership with the LTER, released a number of crosswalks (specification translations) to operate seamlessly with any of these major metadata standards.

Some metadata crosswalks are not bidirectional. Some specifications, such as the Dublin Core, are not as comprehensive as those mentioned above when it comes to describing a scientific data resource. The Dublin Core Metadata Standard typically collects such elements as Title, Author, Publisher, Contributor, and other elements. A crosswalk between the FGDC and the Dublin Core may yield a complete Dublin Core record when translating an FGDC metadata record. On the other hand, an FGDC translation of a Dublin Core record may yield an incomplete metadata record. There are other problems with crosswalks that require human intervention, for example resolving differences in granularity (San Gil et al., 2010).
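The lossy nature of such crosswalks can be illustrated with a minimal sketch. The element names and paths below are hypothetical simplifications; real crosswalks map hundreds of FGDC elements:

```python
# Minimal, hypothetical FGDC -> Dublin Core crosswalk sketch.
# It shows why the reverse (Dublin Core -> FGDC) translation is
# necessarily incomplete: unmapped elements are silently dropped.
import xml.etree.ElementTree as ET

# A tiny FGDC-style fragment (paths abbreviated for illustration).
FGDC_RECORD = """
<metadata>
  <idinfo>
    <citation><title>Prairie biomass survey</title>
      <origin>J. Doe</origin></citation>
    <descript><abstract>Annual biomass measurements.</abstract></descript>
    <timeperd><rngdates><begdate>1998</begdate>
      <enddate>2008</enddate></rngdates></timeperd>
  </idinfo>
</metadata>
"""

# FGDC element path -> Dublin Core element name.
CROSSWALK = {
    ".//title": "dc:title",
    ".//origin": "dc:creator",
    ".//abstract": "dc:description",
}

def fgdc_to_dublin_core(xml_text):
    """Translate the mapped FGDC elements into a flat Dublin Core dict."""
    root = ET.fromstring(xml_text)
    dc = {}
    for xpath, dc_name in CROSSWALK.items():
        node = root.find(xpath)
        if node is not None and node.text:
            dc[dc_name] = node.text.strip()
    return dc

record = fgdc_to_dublin_core(FGDC_RECORD)
print(record)
# Note: the FGDC time-period element has no slot in this mapping,
# so a round trip back to FGDC would lose it.
```

This one-way information loss is exactly why a Dublin Core record translated back into FGDC comes out incomplete.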

The NBII hosts the largest clearinghouse for biological metadata, but it is not the only one. The Knowledge Network for Biocomplexity also offers a network of biological clearinghouses with structured metadata. The search interfaces of these clearinghouses allow faceted search, term-based searches with geo-temporal qualifiers, and filtering of results. In addition, all the information resources hosted at the NBII clearinghouse are available to the following specialized services: Geospatial One Stop, Dryad, GEOSS, the raptor information system, and the NBII federated search. The clearinghouse is open to the Google indexing service and other web crawlers.

Trainings and outreach

A metadata program becomes stronger when combined with a training program. The NBII offers metadata trainings through a dedicated Metadata Program Director, who imparts on average six to eight trainings per year. There are two types of trainings. The first is a user training, where the user becomes familiar with the concept of metadata, the many sections of biological metadata, and the tools available for the metadata entry and editing process. The NBII also offers another series of trainings focused on creating new metadata trainers: "train the trainer".

The NBII has numerous outreach opportunities, as the program mission is executed by forging partnerships. NBII participates in many new biological initiatives, and networks with most of the stakeholders in the biological/ecological communities. A typical outreach effort would consist of reaching out to a biological scientific organization or group to offer NBII services and look for synergistic opportunities. Examples of such NBII services are long-term storage for metadata, a centralized repository that provides the NBII partners added visibility, metadata trainings, and data management capabilities.

Metadata-driven data integration and data analysis

The NBII biological metadata program makes biological data discoverable for scientists and the general public. However, the NBII metadata program goes beyond data discovery. The program is making efforts to enhance the quality of the metadata holdings to provide further functionality to the data. Comprehensive specifications such as EML enable machine mediation for the interpretation and manipulation of the associated data. Some early prototypes (San Gil et al., 2010) show the functional value of content metadata. These models inspired information management architectures where the data flow is based on the underlying metadata.
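As a sketch of what such machine mediation looks like, structural metadata can drive the parsing and typing of a raw data file without per-dataset code. The attribute descriptions below are a hypothetical, much-simplified stand-in for an EML attribute list:

```python
# Sketch: using structural metadata (EML-like attribute descriptions)
# to parse and type a raw CSV generically, instead of writing
# hand-crafted parsing code for each dataset.
import csv
import io

# Hypothetical, simplified rendition of an attribute list.
ATTRIBUTES = [
    {"name": "site", "type": "str"},
    {"name": "year", "type": "int"},
    {"name": "biomass_g_m2", "type": "float"},
]

RAW_DATA = "site,year,biomass_g_m2\nSEV,2001,132.5\nKNZ,2001,410.0\n"

CASTERS = {"str": str, "int": int, "float": float}

def load_with_metadata(text, attributes):
    """Parse a CSV, casting each column per its metadata description."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({a["name"]: CASTERS[a["type"]](row[a["name"]])
                     for a in attributes})
    return rows

data = load_with_metadata(RAW_DATA, ATTRIBUTES)
print(data[0])  # a typed record, driven entirely by the metadata
```

The same loader works for any dataset whose metadata supplies an attribute list, which is the essence of a metadata-driven data flow.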

One of the frameworks proposed for LTER is the Provenance Aware Synthesis Tracking Architecture, or PASTA (Servilla et al.). One early instance that implemented part of the PASTA framework is the EcoTrends project (Servilla et al., 2008). EcoTrends allows the public to explore the time-series data produced by the LTER sites since the corresponding projects were conceived. EcoTrends provides access to more than twenty thousand time series through a web portal. EcoTrends has only implemented a small part of the metadata-driven structure: the web portal application and a few metadata components. A full PASTA implementation for EcoTrends would have required a thorough revision of the metadata and data holdings. However, it is foreseen that most of the data applications at LTER will be based on the PASTA framework. The success of these prototypes relies mostly on the foundations: the quality and fidelity of the metadata. We discuss the challenges ahead, and what we are doing to meet them, in the last section of this paper.

The Dryad repository (Greenberg et al., 2009) is a new initiative that has the long-term goal of being semantically mediated while meeting the needs of today's community demands. Dryad takes a de novo approach, avoiding the obstacles observed in legacy systems.

DataONE is another large-scale project that relies on functional metadata. DataONE is a virtual data center for biology, ecology, and the environmental sciences. DataONE uses metadata, and the data libraries that parse metadata, to harvest data into a long-term repository network. Quality, rich-content metadata will provide the functionality that will contribute to the success of the workflows that ingest data into this distributed network of repositories for the long term. DataONE is jointly supported by LTER and the USGS NBII.


Metadata Entry and Metadata quality control

Today, the metadata creation or metadata entry process requires a substantial amount of human resources. Many organizations, including the NBII and LTER, devote some resources to assisting with the metadata record creation process. It is not an easy task to evaluate the cost of metadata. Lytras (Lytras and Sicilia, 2007) provides a framework to evaluate the metadata lifecycle; however, accurate total costs remain elusive in large-scale programs. The NBII also manually curates and revises metadata records for some of its partners, but mostly, the task of preserving and maintaining metadata is left to the providers. Good metadata editing tools are essential to the metadata maintenance process.

Metadata Creation Tools

Some in the biological community have used native XML editing tools such as XMLSpy and oXygen. There are also customized metadata entry tools available to the community. These tools are an advantage over the raw XML editing tools, as they are oriented to the actual rules and conventions adopted by the back-end standard.

For the FGDC family of standards, we have several metadata editors. ArcCatalog is tightly coupled with all the GIS tools served by ESRI; however, the editor can be used to produce metadata records outside ESRI software applications. Other resourceful editors are TKME and XTKME, which are easily coupled with the metadata analysis tools (validator, quality control) called MP (for Metadata Parser). Another popular editor was created by Rugg (2005), and the National Park Service has released the NPS metadata editing tools. There are more editing tools oriented to the FGDC standards; among them is MetaDoor, developed at the Baruch Institute, which is web-based, with a good number of functionalities that facilitate the metadata entry process.


Morpho is a wizard-like tool for the creation of EML-compliant metadata records. Morpho also has a tree-based (hierarchical) editor that resembles the XMLSpy functionality to a fair extent. Morpho was developed under NSF's SEEK program, and it is open source. The Morpho tool is tied to Metacat, a hybrid metadata server. Before a record is inserted in the Metacat repositories, there is a validation process that goes a bit beyond the mere XML schema compliance tests. Some human intervention is needed to assess the quality and accuracy of the content entered.
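A sketch of what "beyond schema compliance" checking can look like: a record can be schema-valid yet useless (placeholder title, empty abstract, reversed date range). The rules and field names below are invented for illustration and are not Metacat's actual checks:

```python
# Sketch of content-level validation that XML schema checks alone miss.
# The rules and the flat record layout are illustrative only.
import re

def content_checks(record):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    title = record.get("title", "")
    if len(title.split()) < 3:
        problems.append("title too short to be descriptive")
    if re.fullmatch(r"(test|untitled|n/?a)?", title.strip().lower()):
        problems.append("title looks like a placeholder")
    if not record.get("abstract", "").strip():
        problems.append("abstract is empty")
    begin, end = record.get("begin_year"), record.get("end_year")
    if begin and end and begin > end:
        problems.append("temporal coverage ends before it begins")
    return problems

# A record that would pass schema validation but fail content checks.
bad = {"title": "test", "abstract": "", "begin_year": 2008, "end_year": 1998}
print(content_checks(bad))
```

Checks of this kind can be automated, but judging whether an abstract actually describes the data still takes the human review mentioned above.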

The ISO standards are proprietary; this places another barrier on the already difficult task of establishing an interoperable metadata program and a shift in culture in the ecological and biological community. Yet the ISO 19115 North American Profile will be required by the Federal Government as the next iteration of a metadata standard, succeeding the current FGDC standard.

We are always looking for better tools to ease the metadata lifecycle experience. Renewed joint efforts at NBII, ORNL, and LTER produced the seminal work for a more integrative tool set (Aguilar et al., 2009).

There is a difference between content-management-system-based editors and all the above-mentioned editors. A content management system provides the organization a tool set that relates all the information managed, whether photos, videos, data, projects, personnel directories, or publication lists. The comprehensive metadata fields covered by EML and FGDC intersect much of the typical information content (see Figure 2) managed at biological field stations, LTER sites, NBII nodes, and the like.

This content information overlap raises the question: where does metadata begin, and where does it end? This question is better addressed when we look at the expected functionality of the record. For example, a metadata record can be defined as the set of descriptive fields that encompasses the bibliographical reference of a journal citation: minimal personal information (such as first name and last name) plus a title and locator for the journal (volume, number, and pages). However, we have seen here that, from the data integration and synthesis point of view, a metadata record needs to be extensive. In the context of an information system, the metadata record concept may be replaced by all the information that is related to a dataset, extending the functionality (data mashups) and improving it (expanded and enhanced ability to discover the data).

In this context, we developed a basic content-management-based system to handle biological metadata. The content management system approach (Van de Weerd et al., 2006) enables us to connect all the contextual information that relates directly and indirectly to each dataset hosted by the system. We chose Drupal because 1) it is a widely used Content Management System, 2) Drupal has all the functionality we need to serve the information of any biological field station, 3) Drupal is free, and 4) it is very easy to configure, maintain, and use.

This set of user and server requirements is often found when surveying LTER sites (San Gil et al., 2008). Furthermore, by decoupling the metadata from the particulars of a standard, we are able to serve the metadata in any of the flavors mentioned above (FGDC, EML, Dublin Core, etc.). This standard convergence is desirable for interoperability, as stated repeatedly in this paper.

Our system provides built-in semantic mediation (tagging) that enables us to uncover relations with data and metadata that are not directly connected to a particular person, publication, or geo-temporal constraint. Details on this tool will be published soon, although some demos are already available for testing and use.

Controlled vocabularies and taxonomies are another advantage of this system. The system does not implement a full-fledged ontology; however, it is ideal to accommodate all the terminologies created over the life of LTER sites, and it gives us an opportunity to advance a network-wide controlled vocabulary. Rather than trying to impose a top-down system on an inertial community, we work towards a unified vocabulary that emerges from the local terminologies and is gradually merged through tools such as the NBII thesaurus.
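The bottom-up merging idea can be sketched as follows. The normalization rules and synonym table are toys; the NBII thesaurus services are far more sophisticated:

```python
# Sketch: merging site-level keyword lists into a unified, bottom-up
# vocabulary. The synonym table and terms are illustrative only.
SYNONYMS = {
    "npp": "net primary production",
    "net primary productivity": "net primary production",
    "precip": "precipitation",
}

def normalize(term):
    """Lower-case, collapse whitespace, and map known synonyms."""
    t = " ".join(term.lower().split())
    return SYNONYMS.get(t, t)

def merge_vocabularies(site_term_lists):
    """Map each preferred term to the set of sites that use it."""
    unified = {}
    for site, terms in site_term_lists.items():
        for term in terms:
            unified.setdefault(normalize(term), set()).add(site)
    return unified

sites = {
    "SEV": ["NPP", "precipitation"],
    "KNZ": ["net primary productivity", "fire"],
}
vocab = merge_vocabularies(sites)
print(sorted(vocab))  # ['fire', 'net primary production', 'precipitation']
```

Terms used by many sites are natural candidates for promotion into a network-wide vocabulary, which is the emergent effect described above.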


Current challenges

Establishing a comprehensive metadata program is expensive (Lytras and Sicilia, 2007), and large-scale metadata solutions that come from idealized scenarios have fallen short of their initial goals in ecological and biological programs. We have sought pragmatic approaches to standardize legacy holdings (San Gil et al., 2008), a strategy that has given us success where earlier approaches to metadata tools yielded just a vanished promise (Berkley, 2003). For years metadata clearinghouses have been considered "black holes" (Maier et al., 2001). Data providers, PIs, and data managers have contributed metadata to various clearinghouses, with little or no rationale as to why, with what benefits, or with what recognition.

However, an analysis of the standardization processes showed what remains to be done in order to take advantage of metadata-driven applications (San Gil and Baker, 2007). The automated metadata standardization processes inherit all the flaws from the legacy metadata sources. No ontology or optimal metadata schema can fix bad-quality metadata; the metadata entry process and the metadata conversion process need to be revised. Specialized ontologies may fail to classify incomplete records. Specifications with inadequate granularity hamper the integration process. Ambiguous measurement classifications (Velleman and Wilkinson, 1993) adopted by EML hinder the proper use of the standard. FGDC's poor measurement classification ("unrepresentable domain") is a popular measurement classification option that leads to similar results.

Is there a need to define another XML-based metadata transport specification? Consider rather the following approaches to reduce the measurement classification entropy.

1) Adopt some Best or Common Practices (San Gil et al., 2010), a practical guide for metadata users with clear examples that clarify ambiguous classifications.

2) Evolve existing standards. Improve the current standard by leveraging the successful, coherent practical uses of such problematic specification sections.

In practice, documenting and revising Best Practices is often easier and faster than revising established standards. In our experience, evolving a standard may require anywhere from two years (EML, minor revision) to many more years (new ISO North American Profile).

Ontologies, thesauri, controlled vocabularies, or folksonomies? We advanced in the tools section our bottom-up approach to semantic web implementation. There have been some substantial efforts in LTER-Europe to create an ontology-based system for metadata cataloguing and management (van der Werf, 2009). Efforts at NCEAS, and some other explorations at LTER, have yet to reach maturity. A usable mainstream tool with a generally applicable purpose in ecology may still be some time away, but we can easily improve discovery functionality with simple controlled vocabularies. The LTER group has had an active controlled vocabulary group since 2005. The group was not successful at finding the vetting mechanism (an authority) to elevate a selected set of terms to a network-level vocabulary. Porter's efforts (2009) led to a candidate set that has undergone a series of selection criteria. Perhaps the absence of a specific application that would test the validity of a vocabulary (or an application thereof) is another impediment to the advance of semantic tools at LTER and in ecology.

The completeness factor: Metadata records describing a data resource as a bibliographical reference can be used for basic local cataloging. Richer records, containing keywords and geo-temporal information, have better discovery potential; such records enable data discovery through geo-referenced and fielded searches. Dated annotations (when the study was conducted, when the study was published, and when the metadata was created and last edited) help narrow searches in large catalogs. From the point of view of data integration, any incomplete or inaccurate record prevents automatic aggregation.
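A toy completeness score along these lines can flag records that would block automated aggregation. The field list and weights are invented for illustration, not drawn from any NBII quality level definition:

```python
# Sketch: scoring metadata completeness to flag records that would
# block automated aggregation. Fields and weights are illustrative.
WEIGHTS = {
    "title": 1.0,
    "abstract": 1.0,
    "keywords": 0.5,
    "begin_date": 0.75,
    "end_date": 0.75,
    "bounding_box": 0.75,
}

def completeness(record):
    """Fraction of weighted fields that are present and non-empty."""
    total = sum(WEIGHTS.values())
    got = sum(w for field, w in WEIGHTS.items() if record.get(field))
    return got / total

# A citation-only record: fine for local cataloging, poor for integration.
citation_only = {"title": "Desert ant census", "abstract": "Ant counts."}
print(round(completeness(citation_only), 2))
```

Records scoring below some threshold could be routed to curation rather than into the automated aggregation pipeline.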

Another important challenge is data access mediation. In genetics, a researcher needs to upload any new sequences to a centralized repository (such as GenBank) before publication of the sequence-related scientific results in peer-reviewed journals. This data-sharing policy has been key to the rapid advancement of research in genetics. Open data access has been shown to increase the number of citations for those publications associated with the open data (Piwowar et al., 2007). However, in ecology, the practice of unfettered data access is still not mainstream. Some organizations still place login barriers to accessing data, bruising the path to open, semi-automated data synthesis.

Scientific units: Disparity of units affects all sciences (e.g., the NASA Mars orbiter mishap). At LTER, the surprising diversity of units makes the data integration process laborious and expensive. The LTER has a unit task force working on reducing the unit spread, creating a unit best practice document, and building a central unit repository.
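A minimal sketch of the kind of unit normalization such a repository enables follows. The tiny conversion table is invented for illustration; a real unit dictionary covers hundreds of units:

```python
# Sketch: normalizing heterogeneous site units to a canonical unit
# before aggregation. The conversion table is illustrative only.
TO_CANONICAL = {
    # unit name -> (canonical unit, multiplicative factor)
    "gramsPerSquareMeter": ("kilogramsPerSquareMeter", 0.001),
    "kilogramsPerHectare": ("kilogramsPerSquareMeter", 1e-4),
    "kilogramsPerSquareMeter": ("kilogramsPerSquareMeter", 1.0),
}

def to_canonical(value, unit):
    """Convert a measurement to its canonical unit; KeyError if unknown."""
    canonical, factor = TO_CANONICAL[unit]
    return value * factor, canonical

v, u = to_canonical(250.0, "gramsPerSquareMeter")
print(v, u)  # 0.25 kilogramsPerSquareMeter
```

With every site's units registered in one repository, a synthesis workflow can convert mechanically instead of hand-checking each dataset.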

Perhaps another grand challenge is the dispersion of information. We are witnessing the birth of new metadata repositories, data repositories, project databases, social networks, and the like. Potentially, all these information resources can be interconnected down to the atomic level (the record level) using SOAP web services or, perhaps better, RESTful services (Pautasso et al., 2007). In practice, many of these instances wind up disconnected, with similar content, lacking minimal integration with even local, available information.
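The record-level interconnection idea can be sketched with a minimal REST-style model. The URI paths, record fields, and cross-links below are hypothetical; real clearinghouse service interfaces differ:

```python
# Sketch of record-level, REST-style interconnection: each metadata
# record is addressable by a URI-like path and cross-links to related
# records by reference, the way a federated index could follow them.
RECORDS = {
    "/repositories/nbii/records/42": {
        "title": "Prairie biomass survey",
        "related": ["/repositories/lter/records/7"],
    },
    "/repositories/lter/records/7": {
        "title": "Konza precipitation, 1983-2008",
        "related": [],
    },
}

def get(path):
    """Stand-in for an HTTP GET against a clearinghouse REST service."""
    return RECORDS[path]

def resolve_related(path):
    """Follow record-level links across repositories."""
    return [get(p)["title"] for p in get(path)["related"]]

print(resolve_related("/repositories/nbii/records/42"))
# ['Konza precipitation, 1983-2008']
```

The point of the resource-oriented style is exactly this addressability: any repository that exposes its records at stable paths can be linked into such a graph.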

Versioning: As time passes, the information changes, both in content and in representation and format. Van de Sompel (2009) proposes an enhanced protocol to facilitate temporal access on the web; however, it would be up to the resource providers to include snapshots and versions of previous metadata.

Genomics and ecology: Novel, affordable sequencing technologies (Amaral-Zettler et al., 2008) are tackling projects in situ rather than in the lab, creating a need for added metadata that transcends the classic GenBank minimum metadata requirements. The Genomics Standards Consortium (GSC) is coordinating the flurry of newly needed metadata activities in genomics and ecology. Often, what is considered data in ecology is classified as metadata in genomics, which is reflected in the metadata specifications produced (Field et al.). San Gil (2008) describes some challenges ahead in making the genomics and ecological standards interoperable.


Future plans

Immediate plans call for actions to improve the NBII Clearinghouse functionality and overall community metadata management: adding a data provider administrative dashboard, enhancing the metadata records (quality and content volume), supporting the metadata tools that facilitate editing, curation, and data discovery, and continuing the aggregation of new metadata records into the central repository, as well as improving the user experience. We are also harmonizing the existing ecological and biological standards with the wealth of standards in the genomics field. We will test the functionality of the genomics GCDML and EML convergence for a multi-location aquatic microbial census (Amaral-Zettler et al., 2009). Plans to integrate the metadata for the scientific data with other contextual information, such as the NBII Thesaurus and the NBII LIFE (Library of Images from the Environment) photo gallery and other content, are being considered. Collaboration with such activities as the Integrated Taxonomic Information System (ITIS) will establish relationships between metadata and species information. New metadata tools will be deployed for the community to use and also made available for organizations that would like to benefit from them.



Cultural changes are also needed to provide rewards and recognition for those researchers and investigators who are incorporating proper data management practices and policies within their research activities. The USGS NBII Program is leading this effort, along with other organizations, in establishing metadata activities as core components of the research process.


Acknowledgments

Inigo San Gil acknowledges the support of the National Biological Information Infrastructure and the Long Term Ecological Research Cooperative Agreement. Cross-disciplinary (genomics & ecology) metadata initiatives are sponsored through the support of a National Science Foundation Research Coordination Network Program.


1 See National Biological Information Infrastructure main portal at

2 Executive Order pertinent to US metadata mandates, resource at

3 See the Long Term Ecological Research Network website at

4 See the LTER Network Office services at

5 LTER sites (or nodes) at

6 Harmon, M. Motion to adopt EML at the Coordinating Committee. (2003)

7 EML Best Practices document can be found at:

The Oak Ridge National Laboratory Distributed Archive Center, at

8 FGDC: The Federal Geographic Data Committee, resource at

9 BDP: The Biological Data Profile, an expanded XML document type definition of the FGDC. Find it at:

10 ESRI.

A dominant software application company that occupies the Geographic
Information Systems sector.

11 EML.

The guidelines for EML use are at http://knb.e

12 OBFS.

The Organization for Biological Field Stations portal is at

13 Dublin Core.

Explore this metadata resource at

14 Darwin Core. One of the hosts of this bio-oriented metadata schema can be found at

15 ISO 19115. An international metadata standard.

16 ISO 19137. Another ISO standard to cover geographic metadata.

17 See for example the ESRI and BDP to EML crosswalk page at

18 DataONE portal at

19 The XMLSpy XML editor (2010). Obtained at

20 The oXygen XML editor (2010). Obtained at

21 Peter Schweitzer (USGS) developed the TKME and XTKME editors. Free download at

22 Examples of systems implementing a Drupal-based editor at, and

23 NBII Thesaurus tool and web client accessible at http:/

24 Genbank.

25 DDBJ, the DNA data bank of Japan.

26 The Data Access Server. Description and resource at

27 The NASA mishap. Resource at

28 See proposal for the Unit Task Force at

29 The LTER project DB.

30 SOAP.

31 Genomic Standards Consortium


Aguilar, R., Pan, J., Gries, C., San Gil, I. and Palanisamy, G. (2009). A flexible online metadata editing and management system. Ecological Informatics, 5(1), 26.

Amaral-Zettler, L., Peplies, J., Ramette, A., Fuchs, B., Ludwig, W. and Glöckner, F.O. (2008). Proceedings of the international workshop on Ribosomal RNA technology, April 7-9, 2008, Bremen, Germany. Syst and Appl Microbiol. 31, 258.


Amaral-Zettler, L.A., McCliment, E.A., Ducklow, H.W. and Huse, S.M. (2009). A Method for Studying Protistan Diversity Using Massively Parallel Sequencing of V9 Hypervariable Regions of Small-Subunit Ribosomal RNA Genes. Public Library of Science ONE. 4(7). Resource at

Berkley, C. (2003). Monarch: metadata-driven analytical processing. 2003, 8.

Cardoso, J. (2009). Metadata and Semantics. Eds: Sicilia, M.A. and Lytras, M.D.

Field, D., Garrity, G., Gray, T., Morrison, N., Selengut, J., Sterk, P., Tatusova, T., Thomson, N., Allen, M.J., Angiuoli, S.V., Ashburner, M., Axelrod, N., Baldauf, S., Ballard, S., Boore, J., Cochrane, G., Cole, J., Dawyndt, P., De Vos, P., dePamphilis, C., Edwards, R., Faruque, N., Feldman, R., Gilbert, J., Gilna, P., Oliver Glöckner, F., Goldstein, P., Guralnick, R., Haft, D., Hancock, D., Hermjakob, H., Hertz-Fowler, C., Hugenholtz, P., Joint, I., Kagan, L., Kane, M., Kennedy, J., Kowalchuk, G., Kottmann, R., Kolker, E., Kravitz, S., Kyrpides, N., Leebens-Mack, J., Lewis, S.E., Li, K., Lister, A.E., Lord, P., Maltsev, N., Markowitz, V., Martiny, J., Methe, B., Mizrachi, I., Moxon, R., Nelson, R., Parkhill, J., Proctor, L., White, O., Sansone, S., Spiers, A., Stevens, R., Swift, P., Taylor, C., Tateno, Y., Tett, A., Turner, S., Ussery, D., Vaughan, B., Ward, N., Whetzel, T., San Gil, I., Wilson, G. and Wipat, A. (2008). Towards a richer description of our complete collection of genomes and metagenomes: the "Minimum Information about a Genome Sequence" (MIGS) specification. Nature Biotechnology 26.

Greenberg, J., White, C.H., Carrier, S. and Scherle, R. (2009). A Metadata Best Practice for a Scientific Data Repository. Journal of Library Metadata. 9(3), 194.

Harmon, M. Motion to adopt EML at the Coordinating Committee. (2003)

Hobbie, J.E. (2003). Scientific Accomplishments of the Long Term Ecological Research Program: An Introduction. BioScience, 53(1), 17.


Hodge, G.M. (2001). Metadata made simpler. NISO Press, Bethesda, MD.

Lytras, M.D. and Sicilia, M.A. (2007). Where is the value in metadata? International Journal of Metadata, Semantics and Ontologies. 2(4), 235.

Maier, D., Landis, E., Cushing, J., Frondorf, A., Silberschatz, A., Frame, M. and Schnase, J.L. Research directions in biodiversity and ecosystem informatics. Report of an NSF workshop on Biodiversity and Ecosystem Informatics (NASA Goddard Space Flight Center, June 22-23, 2000, Greenbelt, Maryland), 30 pp.

Michener, W.K. (2005). Meta-information concepts for ecological data management. Ecological Informatics 1(1), 3.

Pautasso, C., Zimmermann, O. and Leymann, F. (2008). RESTful Web Services vs. Big Web Services: Making the Right Architectural Decision. Conference paper, 17th International World Wide Web Conference (WWW2008), Beijing, China. Resource available at

Piwowar, H.A., Day, R.S. and Fridsma, D.B. (2007). Sharing detailed research data is
associated with increased citation rate. PLoS One. 2(3). Resource at

Porter, J. (2009). Developing a Controlled Vocabulary for LTER Data. Databits. Fall 2009 issue. Resource at

Rugge, D.J. (2005). Creating FGDC and NBII metadata using Metavist 2005. Technical report.
Sepic, R. and Kase, K. (2002). The national biological information infrastructure as an E-government tool. Government Information Quarterly 19(4), 407.

San Gil, I. and Baker, K. (2007). The Ecological Metadata Language Milestones, Community Work Force, and Change. Databits. Fall 2007. 4.

San Gil, I., Sheldon, W., Schmidt, T., Servilla, M., Aguilar, R., Gries, C., Gray, T., Field, D., Cole, J., Pan, J.Y., Palanisamy, G., Henshaw, D., O'Brien, M., Kinkel, L., McMahon, K., Kottmann, R., Amaral-Zettler, L., Hobbie, J., Goldstein, P., Guralnick, R.P., Brunt, J.W. and Michener, W.K. (2008). Defining linkages between the GSC and the NSF's LTER program: How the Ecological Metadata Language relates to GCDML and other outcomes. Omics: Journal of Integrative Biology, 12(2), 151.

San Gil, I., Baker, K., Campbell, J., Denny, E.G., Vanderbilt, K., Riordan, B., Koskela, R., Downing, J., Grabner, S., Melendez, E., Walsh, J.M., Kortz, M., Conner, J., Yarmey, L., Kaplan, N., Boose, E.R., Powell, L., Gries, C., Schroeder, R., Ackerman, T., Ramsey, K., Benson, B., Chipman, J., Laundre, J., Garritt, H., Henshaw, D., Collins, B., Gardner, C., Bohm, S., O'Brien, M., Gao, J., Sheldon, W., Lyon, S., Bahauddin, D., Servilla, M., Costa, D. and Brunt, J. (2009). The Long Term Ecological Network metadata standardisation implementation process: A progress report. International Journal of Metadata, Semantics and Ontologies. 4(3), 141.

San Gil, I., Vanderbilt, K. and Harrington, S. (2010). Examples of ecological data synthesis driven by rich metadata, and practical guidelines to use the Ecological Metadata Language specification to this end. Submitted to International Journal of Metadata, Semantics and Ontologies.
Schnase, J.L., Cushing, J., Frame, M., Frondorf, A., Landis, E., Maier, D. and Silberschatz, A. Information technology challenges of biodiversity and ecosystems informatics. Inf. Syst. 28, 4 (Jun. 2003), 339-345. DOI=

Servilla, M.S., Brunt, J.W., San Gil, I. and Costa, D. (2006). Pasta: A Network Architecture Design for Generating Synthetic Data Products in the LTER Network. Databits. Fall 2006. Long Term Ecological Research Network.

Servilla, M.W., Costa, C., Laney, C., San Gil, I. and Brunt, J.W. (2008). The EcoTrends Web Portal: An Architecture for Data Discovery and Exploration. Environmental Information Management Conference, Albuquerque, NM, USA.

Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakirva, L.L., Ainsworth, S. and Shankar, H. (2009). Memento: Time Travel for the Web. arXiv preprint.

Van der Werf, B., Adamescu, M., Ayromlou, M., Bertrand, N., Borovec, J., Boussard, H., Cazacu, C., van Daele, T., Datcu, S., Frenzel, M. (200 ). A Long-Term Biodiversity, Ecosystem and Awareness Research Network. See resource of a report at

Van de Weerd, I., Brinkkemper, S., Souer, J. and Versendaal, J. (2006). A situational implementation method for web-based content management system applications: method engineering and validation in practice. Software Process: Improvement and Practice, 11(5), 521.

Velleman, P.F. and Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. American Statistician. 65.