Ocean Biodiversity Informatics - NIO Bioinformatics Centre

11 Oct 2005



“Ocean Biodiversity Informatics” enabling a new era in marine biology
research and management


Mark J. Costello (1) and Edward Vanden Berghe (2)

(1) Leigh Marine Laboratory, University of Auckland, P.O. Box 349, Warkworth, New
Zealand (m.costello@auckland.ac.nz)

(2) Flemish Marine Data and Information Centre, Flanders Marine Institute,
Wandelaarkaai 7, B8400 Oostende, Belgium (wardvdb@vliz.be)


INTRODUCTION


For several hundred years marine biology was based on natural history, and during
the 20th century it addressed ecology and evolution. In recent decades, genetic and
molecular sciences have brought new insights to marine biology. In parallel,
physical oceanography has become a global science using satellites and other remote
sensing technology to complement traditional sampling, and plans for real-time
sharing of data are underway as part of the Global Ocean Observing System (GOOS).
This growth in physical data led the Intergovernmental Oceanographic Commission’s
(IOC) International Oceanographic Data and Information Exchange (IODE) programme
to establish a network of national ocean data centres (NODC) around the world.
However, with the exception of genetic data, marine biology data remained scattered
and often unpublished (Grassle 2000, Myers 2000, Zeller et al. 2005). This may have
reflected the lack of opportunities for publication of raw data. However, the
internet has reduced the costs of data publication, and marine biology has entered
the information age with other sciences (International Council for Science 2004).
In this paper we define the scope, challenges and future prospects for the new
field of Ocean Biodiversity Informatics.


Need for data access

Never before has the need for rapid access to data at regional and global scales been
so important. Recent analyses of ocean-scale data have shown major shifts in plankton
distribution due to climatic factors (e.g. Stevens et al. this volume), global
over-fishing (Pauly et al. 2003, Pauly and Watson 2003), many-fold reductions in
abundance of large fish (e.g. Myers and Worm 2003), profound changes in ecosystem
structure due to indirect effects of fisheries that may be irreversible (Jackson et al.
2001, Frank et al. 2005), and as yet unexplained 62 million year (my) cycles of
marine genera richness in the 542 my fossil record (Kirchner and Weil 2002, Rohde
and Muller 2005). Without informatics-aided analyses, and supporting large-scale
databases, the global nature of these phenomena would not have been recognised.


Species are being introduced by human activities around the world, with
socio-economic impacts on local fisheries and aquaculture. These species may not be
recognised as introductions because only a fraction of marine species have so far been
described. The ability to identify species from anywhere in the world is particularly
important for detection of introductions that may prove economically harmful. Online
species identification guides provide immediate access to anyone with internet
access (e.g. www.crustacea.net). In addition, electronic keys allow users to select
whichever characteristics of the animal or plant they can recognise with confidence,
rather than being forced to choose between one or two characters at each step in a
dichotomous key, where one error or oversight can lead to lost time and
misidentification. The management of these invasive species requires rapid access to
identification and ecological information from other parts of the world. Online tools
such as the Kansas Geological Survey Mapper (e.g. Guinotte et al. this volume) and
Desktop GARP (e.g. Wiley et al. 2003) can be used to predict potential
environmental suitability for candidate invasive species. Other modelling approaches
may be less automated, such as that used by Kaschner et al. (this volume) to predict
habitats for marine mammals. It seems likely that ocean biodiversity informatics will
provide a suite of modelling options appropriate for different types of data and
purposes.
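
The difference between an electronic (multi-access) key and a dichotomous key can be
illustrated with a minimal sketch. The taxon names, characters and states below are
hypothetical placeholders, not taken from any real identification guide; the point is
only that the user may score any subset of characters, in any order.

```python
# Hypothetical character matrix: taxon -> {character: state}.
TAXA = {
    "Taxon A": {"eyes": "stalked", "claws": "present", "tail_fan": "present"},
    "Taxon B": {"eyes": "stalked", "claws": "absent", "tail_fan": "present"},
    "Taxon C": {"eyes": "sessile", "claws": "absent", "tail_fan": "absent"},
}


def candidates(observed, taxa=TAXA):
    """Return the taxa consistent with every character the user chose to score."""
    return sorted(
        name
        for name, states in taxa.items()
        if all(states.get(character) == state for character, state in observed.items())
    )


# The user scores only what they can see with confidence, in any order.
print(candidates({"tail_fan": "present"}))
print(candidates({"claws": "absent", "eyes": "stalked"}))
```

Because unscored characters are simply ignored, a character that cannot be observed
narrows nothing and misleads nothing, whereas a dichotomous key would force a choice
at that step.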


Local patterns of biodiversity have their origins in, and may still be maintained by,
ecosystem processes at regional and global scales. Thus selecting areas for fishery
stock management and conservation requires knowledge of biodiversity patterns at all
spatial scales. At present, most conservation focuses on national-scale patterns
because of regulatory obligations and limited availability of data at larger geographic
scales. Ideally, conservation should operate at ecologically and evolutionarily
relevant scales.


Data, information and knowledge

Data and associated metadata (background information about the data) are the
foundation of science: the what, where, when, who, and how. The interpretation of
these facts leads to information and theories that create knowledge. At present,
marine biology delivers many papers that provide statistics, graphs and models
derived from often unpublished data. While the importance of most of these
syntheses, models and theories will fade in time, the value of the data increases
with time as they become harder to replace. The digitization of historical data from
paper files can cost ≤ 0.5% of the cost of the original field surveys (Zeller et al.
2005), and reveal new insights into human interactions with natural resources (e.g.
Lotze and Milewski 2004).


Most data collection is paid for directly or indirectly through public funds to
ultimately benefit society through research, development and resource management.
The failure to publish raw data undermines science, including the management of
natural resources, by impeding independent analysis, reuse and combination of
different datasets. The calls by international scientific organisations such as IOC and
ICSU (International Council for Science 2004) to make data publicly available are
being ignored by many scientists, and are thus being repeated at international
conferences (Box 1). For example, NODC contain less than half of the oceanographic
data collected in their countries (Krohnke et al. 2005), and few of the marine papers in
top journals publish their data. Scientists, funding agencies, institutions and
publishers must require the publication of data in user-accessible form. Archives that
are not compromised by hardware and software changes, and that facilitate data
re-use, are also required.


Scope of ocean biodiversity informatics

Biodiversity informatics comprises the computer technologies that enable the
management and analysis of biodiversity data and information (Bisby 2000), and has
many benefits and positive outcomes (Box 2). The Convention on Biological Diversity
defines biodiversity as the variety of life within species (e.g. populations), between
species (e.g. communities) and of ecosystems (i.e. ecological and environmental
interactions) (Costello 2001). Related fields include bioinformatics, phyloinformatics,
species informatics, ecoinformatics and geoinformatics. Bioinformatics is generally
restricted to molecular and genetic data that do not involve species names as the core
element, and encompasses phyloinformatics, which concerns the phylogenetic
relationships between taxa (e.g. Tree of Life initiative). Species, eco- and
geo-informatics concern species level, ecological, ecosystem and geographic aspects.
They deal with concepts described as words, such as species, habitats and places,
rather than numerical or biochemical data. It is to these challenges that biodiversity
informatics provides the most novel contributions and solutions.


Ocean biodiversity informatics (OBI) is an interdisciplinary activity based on data
associated with marine species and their environment. It includes traditional database
design and function, as well as data exchange standards, schema and protocols, and
exploration, visualization, analysis and publication software. While the primary goals
are free and open access to data over the internet, some project-specific or sensitive
data (e.g. location of threatened species) may be withheld. The use of open-source
software is preferred (e.g. XML, MapServer) because this can be modified for special
purposes and freely shared, but standard proprietary software is also used (e.g. Oracle,
Microsoft Access, ARCIMS).


STANDARDS

With the advent of online data exchange, standard data exchange protocols,
middleware (or wrappers) that cross-map one database to another, and common
vocabularies of terminology have become more in demand than when databases were
isolated and centralized. Standard categories and definitions are also required for the
metadata that describes datasets (“discovery metadata”) and data records. Whereas
links between web pages are by hypertext mark-up language (HTML), the extensible
mark-up language (XML) provides a more formal structure for data exchange
protocols.


Data exchange

A standard list of data fields (48 data elements) for exchanging data on species
distribution records, called Darwin Core, has been established. This has been expanded
in a backward-compatible manner by OBIS and the Mammal Networked Information
System (MANIS) for marine and mammal specializations respectively. The most
widely used biological data exchange protocol is DiGIR (Distributed Generic
Information Retrieval). The Access to Biological Collections Data (ABCD) schema is
more complex and comprehensive (about 300 data elements) than Darwin Core, and
is used with the BioCASE data exchange protocol. A protocol building on and
combining DiGIR and BioCASE is under development, called TAPIR, the TDWG
Access Protocol for Information Retrieval (http://ww3.bgbm.org/protocolwiki).
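
As a rough illustration of what a flat, field-based exchange record looks like, the
sketch below serialises one occurrence record to XML and reads it back. The element
names, the choice of required fields and the record values are assumptions made for
illustration only; they are not the official Darwin Core element list, and the XML
produced is not a DiGIR or BioCASE message.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of fields; an assumption, not the official element list.
REQUIRED = ("ScientificName", "Latitude", "Longitude")


def record_to_xml(record):
    """Serialise a flat occurrence record to a small XML fragment."""
    missing = [field for field in REQUIRED if field not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    root = ET.Element("record")
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Parse the fragment back into a dictionary of strings."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


# Made-up example values.
example = {
    "ScientificName": "Lophelia pertusa",
    "Latitude": 64.1,
    "Longitude": 8.0,
    "YearCollected": 2003,
}
print(record_to_xml(example))
```

A provider and a portal that agree on the same field list can exchange records
without knowing anything about each other's internal database structure, which is
the role the wrappers described above play.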


Metadata

Metadata needs to be standardised to facilitate reporting and standard definitions of
terminology. Such controlled vocabularies exist, as provided e.g. by the Global
Change Master Directory (GCMD), ISO 19115 and the Federal Geographic Data
Committee (FGDC), but need to be expanded to cover marine biology and ecology.
The searching of metadata is improved by knowing the relationships between words,
such as whether a word naming a concept is equal to, a subset (or child) of, or related
to another word in some other way. This field of informatics, “ontology”, is well
established in information science and used by librarians, but little known by marine
biologists and ecologists. Ontologies include dictionaries, controlled vocabularies,
thesauri, and classifications. Classifications can indicate taxonomic phylogenies, and
relationships between habitats and place names, and may or may not be hierarchical.
They aid capture of information from the literature as well as datasets, and are the
mechanism for creating a “semantic web” (www.semanticweb.org). However, their
construction requires collaboration between ontology and marine biodiversity
“domain” experts, and is being facilitated by the Marine Metadata Initiative (MMI).
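
The way term relationships improve a metadata search can be sketched as query
expansion over a tiny thesaurus. The vocabulary below is a hypothetical example, not
drawn from GCMD, ISO 19115 or any other published vocabulary; it only distinguishes
"equal to" and "child of" relationships.

```python
# Hypothetical thesaurus: "equal to" and "child of" relationships only.
EQUIVALENT = {"sea floor": "seabed"}
NARROWER = {
    "seabed": ["reef", "sediment"],
    "reef": ["coral reef"],
    "coral reef": ["cold-water coral reef"],
}


def expand(term):
    """Return the term's preferred form plus all of its narrower (child) terms."""
    start = EQUIVALENT.get(term, term)
    found, stack = set(), [start]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(NARROWER.get(current, []))
    return found


def search(term, datasets):
    """Return titles of datasets whose keywords overlap the expanded term set."""
    wanted = expand(term)
    return [title for title, keywords in datasets if wanted & set(keywords)]
```

With this structure, a search for "reef" also finds datasets that were only tagged
"cold-water coral reef", which a plain keyword match would miss.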


NOMENCLATURE

In contrast to established physical ocean and genetic data management, the common
element of all parts of biodiversity informatics is species names. The application of
some species names changes over time, such as when a species is discovered to
contain several species, or to have been described under different names. The
Linnaean system of species nomenclature is the best available, with well developed
rules, although codes and common names can sometimes have supplementary value
(Froese 1999). Similarly, place names change over time and the same names may be
used for different locations. Available gazetteers may find locations of some marine
place names, but they do not yet intelligently link these locations to databases to
integrate data. Ecological nomenclatures are also complex, with terminology for
habitats and what defines ecosystems varying significantly.


Informatics should reduce duplication errors by making species names and
descriptions more readily available online. Having an online register of all species
names as suggested by Costello et al. (2005) may soon become a reality (Polaszek et
al. 2005), enabling more rapid identification and avoiding the re-description of
species. The first step towards this, having a checklist of all described species, is
well underway through initiatives such as Species 2000 (www.species2000.org), the
Integrated Taxonomic Information System (www.itis.org) and the European Register
of Marine Species (http://www.marbef.org/data/erms.php).


A key challenge in biodiversity informatics is the management of species names.
Name management is well established in information systems, and the first
biodiversity data exchange protocol (Z39.50) was used in bibliographic searches by
libraries (Vieglas et al. 2000). It reorganised data from a database into a standard
(Darwin Core) format that was accessible through an interface called The Species
Analyst. It has largely been superseded by DiGIR and ABCD.


GBIF’s Electronic Catalogue of Names includes the Catalogue of Life (CoL), a
joint publication by Species 2000 and ITIS whose marine taxa are supported by OBIS.
CoL has listed about 1/3 of the estimated 1.75 million described species (Bisby et al.
2005). A parallel initiative, UBio, is capturing all used species names from the
literature and relating these to higher taxa. This will facilitate location of information
in libraries and online sources, and if linked with the currently valid names in CoL
will greatly aid access to biological information.
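
Linking every name used in the literature to a currently valid name amounts to a
lookup from synonyms to accepted names. The sketch below assumes a hypothetical
checklist with placeholder names; it does not reflect the data model of CoL or UBio.

```python
from collections import defaultdict

# Hypothetical checklist: placeholder names, not real taxa.
VALID = {"Aus cus", "Dus eus"}
SYNONYMS = {"Aus bus": "Aus cus", "Xus bus": "Aus cus"}


def resolve(name):
    """Return the currently valid name for a used name, or None if unknown."""
    cleaned = " ".join(name.split())
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    if cleaned in VALID:
        return cleaned
    return SYNONYMS.get(cleaned)


def group_by_valid_name(used_names):
    """Group names as they appear in sources under their valid name."""
    groups = defaultdict(list)
    for used in used_names:
        groups[resolve(used)].append(used)
    return dict(groups)


print(group_by_valid_name(["Aus bus", "xus BUS", "Dus eus", "Zus yus"]))
```

Names that cannot be resolved are grouped under `None` rather than discarded, so
that they can be passed to a taxonomic expert for a decision.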


DATA SYSTEM DESIGN


Centralised databases

The first informatics approaches to biodiversity data management were single
centralized databases, sometimes called “data silos”. These have the advantages of a
single data structure and nomenclature, and are the best approach where the data is
largely required within the host institution, and when a host is willing to undertake its
management. Examples include FishBase, AlgaeBase (Nic Donncha and Guiry
2002), Hexacorallia (Fautin 2000), CephBase (Wood et al. 2000), MedOBIS
(Arvanitidis et al. this volume), BioOcean (Fabri et al. this volume) and the Integrated
Taxonomic Information System (ITIS). However, when a database becomes larger
and requires many participants, a centralized system places a heavy technical,
scientific, and financial burden on a single organization (Merali and Giles 2005). A
centralised database may allow online access to the scientists who maintain the data
(i.e. a “data warehouse”), while the host institute focuses on technical aspects of data
management; this model is in use by the European Register of Marine Species
(Costello 2000, Costello 2004, Costello et al. this volume).


Networked databases

Some recent biodiversity informatics initiatives, such as Species 2000 (Bisby et al.
2005), the Ocean Biogeographic Information System (OBIS) (Zhang and Grassle
2003, Costello et al. 2005a), and the Global Biodiversity Information Facility (GBIF)
(Edwards et al. 2000), are federations of databases distributed in many organizations
around the world. Distributed data systems have financial, quality control, ownership,
and community building advantages over centralized structures. The funding costs
are distributed, data is maintained at source by those best qualified to update and
improve it, and data ownership issues are minimized as the custodian retains control
over what data is shared. Building a scientific community to support and develop the
data system is promoted because the data sources can remain directly involved in the
initiative. The central web site or “portal” that connects to all the datasets can thus
concentrate on portal function rather than raw data collection and management. The
costs of hardware, software and expertise are similarly distributed, and know-how can
be shared amongst the participants.


However, there are limitations to a purely distributed system: the speed of
response decreases with system growth; the availability of the potential data is
variable as some sources may be off-line; the portal is ignorant of the data content so
it cannot develop advanced data handling and search tools; and users get no feedback
as to why ‘zero’ returns occur (there may be no data, or temporarily no data). The
solution is to “crawl” the data sources and “cache” the data at intervals. Thus the data
can be classified and indexed, for example geographically and taxonomically. For
example, the OBIS Index initiated by Tony Rees (CSIRO) is a subset of all data
available from the cache that can be classified, and allows calculation of statistics on
available data. By resolving records in the cache to one record per geographic
grid-square it reduces data volume and allows more rapid online search and mapping.
It allows “near matches” to account for misspellings, and users can search down the
taxonomic hierarchy. Because users are more aware of the data content, they can
customise their
search.
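
Two of the indexing ideas above, keeping one record per species per grid-square and
tolerating misspelled names, can be sketched as follows. This is an illustration
under stated assumptions, not the OBIS Index implementation: the 1-degree cell size,
the similarity cutoff and the coordinates are all made up.

```python
import math
from difflib import get_close_matches


def grid_cell(lat, lon, size=1.0):
    """Return the (row, column) index of the grid-square containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def one_per_cell(records, size=1.0):
    """Keep the first (name, lat, lon) record per species per grid-square."""
    kept = {}
    for name, lat, lon in records:
        kept.setdefault((name, grid_cell(lat, lon, size)), (name, lat, lon))
    return list(kept.values())


def near_matches(query, names, cutoff=0.8):
    """Return up to three indexed names that closely resemble the query."""
    return get_close_matches(query, names, n=3, cutoff=cutoff)


# Made-up coordinates for illustration.
cached = [
    ("Gadus morhua", 60.2, 5.1),
    ("Gadus morhua", 60.7, 5.9),
    ("Gadus morhua", 61.1, 5.2),
]
print(len(one_per_cell(cached)))
print(near_matches("Gadus morhau", ["Gadus morhua", "Lophelia pertusa"]))
```

The cell size trades map resolution against data volume, so a portal might keep
several such indexes at different resolutions.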


Users

Initially, most users of ocean biodiversity informatics are probably scientists. This is
essential because their use of the data is a key aspect of quality control, and their
involvement will improve the functionality of the systems. It is also critical that the
systems have the confidence of the scientific community, as without that, further
investment of experts’ time and government funding will decline. University and
high-school students and their teachers will make up greater numbers of users but it
takes time to develop awareness within this community. Most users of FishBase, the
best developed publicly available marine database, now fall into this category (Froese
pers. comm.). To attract these users, systems must have authoritative and credible
content. Exciting tools may elicit a ‘wow’ factor and attract first-time users, but
content will result in repeat and longer-term usage.


QUALITY ASSURANCE

Quality assurance is especially challenging when the use of the data cannot be
predefined. The value of data is dependent on the purpose to which they are put.
Knowing a species occurs in the Pacific Ocean is useful at a global scale, but
somebody in New Zealand would want to know where in that ocean it occurs so they
can judge whether their discovery is a range extension.


The completeness of a product is a function of its stated content and the needs of the
user. Unfortunately naïve users may not appreciate that so little of the marine
environment has been explored, that many species remain to be discovered, and that
of what has been observed only a fraction has been described and published in any
format. Setting goals too high for a product may delay its completion and publication,
but setting interim goals that allow step-wise publication provides a service for users
and demonstrates progress. For example, a simple checklist of species is of more
value when seen as the first step in a process where it provides the backbone for
linking to synonyms, distribution data, identification information, and published
literature.


The early steps in quality control begin at the point of data collection. This is
followed by procedures to minimise additional errors that may arise in documentation,
digitisation, archiving, and publishing (either on paper or electronically). Because the
opportunity for errors increases with the number of steps in handling the data, it is
critical for raw data to be available in their basic form as well as any synthesised
forms. Present ocean biodiversity information systems may serve data from
authoritative sources, but less credible sources, such as amateur websites and
students’ web pages, also exist. Quality control includes adequate metadata,
standardised format of data (e.g. consistent placement of rows and columns in a
table), and standard pre-defined terminology. Procedures include checks for missing
values, scanning for impossible and anomalous values, mapping and graphing for
outliers, and calculations to check records match expected numbers. Checking for
outliers and irregularities needs expert intervention to avoid removing remarkable but
true discoveries. The best quality control comes from the use of the data and this
should be facilitated by the publication process. User feedback must be encouraged,
and this form of peer-review could become a pre-requisite of data publication as it is
for publication of papers.
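
The automated part of these procedures can be sketched as simple record checks. The
field names, and the use of the median absolute deviation with a threshold of 5 to
flag outliers, are assumptions chosen for illustration. In keeping with the caution
above, suspect values are only flagged for expert review and never deleted.

```python
from statistics import median

# Hypothetical field names for an occurrence record.
FIELDS = ("name", "lat", "lon", "depth_m")


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing {f}" for f in FIELDS if record.get(f) in (None, "")]
    lat, lon, depth = record.get("lat"), record.get("lon"), record.get("depth_m")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("impossible latitude")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("impossible longitude")
    if isinstance(depth, (int, float)) and depth < 0:
        problems.append("negative depth")
    return problems


def flag_outliers(values, threshold=5.0):
    """Flag values far from the median, measured in median absolute deviations."""
    centre = median(values)
    spread = median(abs(v - centre) for v in values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - centre) / spread > threshold]


print(check_record({"name": "Taxon A", "lat": 95.0, "lon": 10.0, "depth_m": None}))
print(flag_outliers([10, 12, 11, 13, 12, 400]))
```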


Conventional statistical analyses require presence and absence data. However, being
certain of a species’ absence is challenging in ecology because many observations are
limited in space and time, and all sampling methods are biased. For example, without
the use of underwater video the abundance of deep-sea coral reefs on the continental
shelf of Europe would have remained unknown, although some reefs are 40 × 8 km in
area (Costello et al. 2005b). Thus ecological studies often limit analyses to
presence-only data. Museum collection data is also biased towards specimens of rare
species and excludes absence information. However, protocols to convert
presence-only data to presence-absence may be possible based on known sampling
and survey methods. Such tools would increase the utility of online data but require
high compliance with metadata standards that have yet to be established.
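
One way such a conversion protocol might work is sketched below. It assumes that
metadata state, for each sampling method, which taxa the method can reliably detect.
That lookup table, the method names and the placeholder species are hypothetical, and
an inferred absence is only as reliable as that metadata.

```python
# Hypothetical metadata: which taxa each sampling method can reliably detect.
DETECTABLE = {
    "trawl": {"sp. A", "sp. B"},
    "video": {"sp. A", "sp. C"},
}


def presence_absence(events, species):
    """Classify each sampling event as present, absent or unknown for a species.

    events: iterable of (event_id, method, recorded_species) tuples.
    """
    status = {}
    for event_id, method, recorded in events:
        if species in recorded:
            status[event_id] = "present"
        elif species in DETECTABLE.get(method, set()):
            # The method could have detected the species but did not record it.
            status[event_id] = "absent"
        else:
            # The method cannot detect the species, so nothing can be inferred.
            status[event_id] = "unknown"
    return status


surveys = [
    ("E1", "trawl", {"sp. A"}),
    ("E2", "trawl", {"sp. B"}),
    ("E3", "video", {"sp. A"}),
    ("E4", "grab", set()),
]
print(presence_absence(surveys, "sp. B"))
```

Keeping "unknown" separate from "absent" avoids treating an unsuitable sampling
method as evidence that a species is not there.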


Data quality indices could be developed based on evidence that steps in a standard
Quality Assurance process were conducted. Data suitability is a different issue and is
dependent not on the data, but on the purpose they are required for. An objective
method for scoring data reliability has been used for FishBase (Froese et al. 1999).


CHALLENGES

The challenges facing ocean biodiversity informatics are not just technological.
Arguably, the greatest obstacle is the lack of a data publication culture in marine
biology. Government agencies may make data available as a public service, but
unless required by funding agencies, there is no incentive for scientists to publish their
data. Science journals prefer synthesis and statistics of data, but an increasing number
allow data to be published as online appendices. These appendices could be
published in a standard format for data exchange and hence facilitate interoperability.
Such standards exist and are in use by OBIS, GBIF and others. It is normal practice
in taxonomy to lodge type specimens in museums prior to publication, and in genetics
to enter data into GenBank. It should be a similar requirement of journals that
ecological data is made publicly available prior to publication (International
Council for Science 2004).


Interoperability improvements being addressed include (a) more automated ways of
merging datasets and cross-checking of nomenclatures (e.g. Froese 1997), (b)
methods of assigning a “Globally Unique Identifier” (GUID) to every data record,
which will allow detection of duplicate records, (c) expanded schema to allow more
data and metadata to be exchanged, and (d) new versions of data exchange protocols
and middleware that are more comprehensive and easier to implement. With common
data sharing tools and increasing amounts of data in the public domain, the same data
can be retrieved from several sources. This may be avoided in part by selective
caching and transmitting of data, such as where OBIS does not serve datasets to GBIF
that also enter GBIF from other sources. Automated ways of recognising and
excluding such duplication at the data record level are thus necessary. Metadata
standards are being developed for marine habitats, including classifications and
dictionaries. They also need to be developed to describe sampling methods so users
can appreciate bias in datasets. Fisheries scientists have special catch-related data
that requires standards to facilitate interoperability (Branton and Ricard, this volume).
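
Duplicate detection at the record level can be sketched as follows. As an assumption
for illustration, the identifier is built from an institution code, collection code
and catalogue number; this stands in for whatever GUID scheme is eventually adopted,
and the portal names and record values are made up.

```python
# Fields assumed (for illustration) to identify a record uniquely when combined.
KEY_FIELDS = ("InstitutionCode", "CollectionCode", "CatalogNumber")


def record_key(record):
    """Build a normalised identifier string from the key fields."""
    return ":".join(str(record[field]).strip().lower() for field in KEY_FIELDS)


def find_duplicates(sources):
    """Return (first_seen, duplicates) for records served by several portals.

    sources: mapping of portal name -> list of record dictionaries.
    """
    first_seen, duplicates = {}, []
    for portal, records in sources.items():
        for record in records:
            key = record_key(record)
            if key in first_seen:
                duplicates.append((key, first_seen[key], portal))
            else:
                first_seen[key] = portal
    return first_seen, duplicates


served = {
    "Portal 1": [{"InstitutionCode": "INST", "CollectionCode": "Benthos", "CatalogNumber": "42"}],
    "Portal 2": [{"InstitutionCode": "inst ", "CollectionCode": "benthos", "CatalogNumber": 42}],
}
print(find_duplicates(served)[1])
```

Normalising case and whitespace before comparison matters because the same record
is often re-keyed slightly differently as it passes through intermediate systems.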


Desktop Geographical Information Systems (GIS) are now standard in marine and
environmental sciences (including management), and GIS designed for operating
online are being developed (Guralnick and Neufield 2005, Halpin et al. this volume).
Mapping of data points, lines such as for satellite-tracked animals, and polygons
(areas) is available online, and ways of converting between these and comparing
results to ocean data are improving. Online semi-automated “gazetteer” tools to
convert place names to points and polygons are being developed (e.g.
www.biogeomancer.org) and will improve.


This change in culture is underway. An IOC-sponsored workshop that brought
physical oceanographers, biologists and data managers together in 1996 was followed
by a symposium on ocean data management in 2002 (Vanden Berghe et al. 2003). An
international conference on “Ocean Biodiversity Informatics” held from 29 November
to 1 December 2004 had over 170 delegates from 37 countries and 70 presentations
(from over 100 offers of papers) (www.vliz.be/obi). It was sponsored by the
Intergovernmental Oceanographic Commission of UNESCO, IOC’s International
Ocean Data and Information Exchange, International Council for Exploration of the
Sea, Census of Marine Life’s Ocean Biogeographic Information System, International
Association of Biological Oceanography, Taxonomic Data Working Group, Flanders
Marine Institute, MarBEF (European marine biodiversity and ecosystem function
research network), the European Commission and the German Government.
Participants came from government agencies, universities, NGOs, museums, and
commercial companies.


ADAPTING TO CHANGE AND FUTURE PROSPECTS


Computer technology is changing at such rapid rates that it is difficult to predict what
opportunities will be available in future years, although monitoring the commercial
sector is a good indication. Ocean biodiversity informatics requires an entrepreneurial
approach that seizes opportunities for technology transfer, and sees change as an
exciting opportunity rather than an impediment to development. Having a variety of
choice in hardware and software platforms may seem confusing, but must be
recognized as the normal market-driven approach in innovation. Resources are
always limited and investments must weigh the uncertainties of more novel and
progressive approaches against the certain needs of their market. Dealing with the
uncertainties of future funding, what technologies and data will be available, and who
will use the data for what purposes, has parallels in any innovative business.
Materials (types of data), technological tools, products (e.g. maps, models, derived
data), and customers are all likely to change. Thus ocean biodiversity informatics
initiatives must be adaptable to change, and regularly review the way they operate.


In parallel with advancing technology, the expectations of users change, and so will
science culture. One cultural challenge is to overcome the concerns and excuses for
not making data available (Box 3). Froese et al. (2004) reviewed these for fisheries
data and noted that they can be overcome through delayed publication, data
aggregation, data use agreements, disclaimers, read-only access (the norm), data
owner support and involvement, and crediting the source. The advantages of data
publication accrue not only to other scientists, and in the long-term to society (Box 2),
but also to the data providers, who get more visibility, recognition, invitations,
citations and collaborations (Froese et al. 2004). Indeed, publishing data may be
better for “marketing” a scientist or organisation than publishing papers because it
demonstrates an advanced level of data management.


While looking forward with imagination, there are lessons from history. One of the
greatest advances in human communication was the invention of the printing press. It
allowed mass production of information, much of it with no peer-review or quality
control. The size of libraries increased, and in time edited science journals and
peer-review prior to publication became established. Today, many universities use
rankings of the citation rates of journals and papers to judge individual scientists’
productivity and performance, and governments use this information when
distributing research funding. We suggest the internet is a similar revolution in
information availability. A citation index for data accessed from online databases
may have similar consequences for encouraging online publication by indicating data
use (Box 3).


At present there is relatively little external peer-review prior to publication of material
on scholarly websites, but they are recognised as credible because of the organisations
and people who produced them. Some online information systems, such as ERMS
and OBIS, have established Editorial Boards with a similar function in quality
assurance as boards of science journals. In contrast to the scientists who volunteer
time for editing and peer-reviewing papers for journals, their efforts directly benefit
the scientific community that retains ownership of the data. This avoids concerns
over commercial publishers or institutions profiting from their contributions. This has
been taken a step further by ERMS and Fauna Europaea (a register of all 130,000 land
animal species in Europe). They are owned by the Society for the Management of
European Biodiversity Data (www.smebd.org) but all scientists who contributed to
these initiatives are honorary life-members, and elect a Council to manage the
databases (Costello 2000).


Until recently, archiving was a concern for electronic media. Tapes, diskettes,
compact disks and other media could be given an ISBN (International Standard
Book Number) and lodged in a copyright library for archiving but the media would
eventually deteriorate, and the hardware (and perhaps software) to read them may be
unavailable. Web pages are notoriously transient. However, the Internet Archive
now routinely copies web pages and archives them, for which storage capacity is no
longer a problem. However, it does not archive data that is only accessible through
search screens. Commercial search engines also cache web pages but delete these as
they are replaced. Procedures for database backup and mirror sites are now well
established so data will not be lost if hosted in such systems.


At present, internet access remains elusive to many people in developing countries
due to poor infrastructure. However, it seems probable that reduced costs of hardware
and services, and increased efficiency of satellite and wireless transmission systems,
will overcome this obstacle. Indeed, this will open the “knowledge economy” to all
countries and may create a new wave of user demand and innovation, at present
dominated by developed countries.


OBI is an initiative of the 21st century and will make conventional marine biodiversity
research more dynamic and comprehensive, with a range of constantly evolving
online tools (Box 3). The consequences are positive and complementary for
traditional subjects such as taxonomy (Pennisi 2000, Costello et al. this volume),
biogeography, ecology, and resource management (Box 3).


REF
ERENCES

Arvanitidis C, Valavanis VD, Eleftheriou A, Costello MJ, Faulwetter S, Gotsis P,
Kitsos M S, Kirmtzoglou I, Zenetos A, Petrov A, Galil B, Papageorgiou N. (in
press). MedOBIS: Biogeographic Information System for the Eastern
Mediterranean and Blac
k Sea.
Marine Ecology Progress Series


Bisby F.A. 2000. The quiet revolution: biodiversity informatics and the internet.
Science 289, 2309
-
2312.

Bisby FA, Ruggiero MA, Wilson KL, Cachuela
-
Palacio M, Kimani SW, Roskov YR,
Soulier
-
Perkins A and Hertum J van

(eds) (2005).
Species 2000 & ITIS Catalogue
of Life: 2005 Annual Checklist
. CD
-
ROM; Species 2000: Reading, U.K.

Bohlen S. 2005. Embracing the data challenge. Sea Technology 46 (5), 77.

Branton R M, Ricard D. Towards using OBIS to provide reliable estimates of
population indices for marine species from research trawl surveys. Marine Ecology
Progress Series.

Costello, M.J. 2000. Developing species information systems: the European Register
of Marine Species. Oceanography 13 (3), 48-55.

Costello, M. J. 2001. To know, research, manage, and conserve marine biodiversity.
Océanis 24 (4), 25-49.

Costello, M.J. 2004. A new infrastructure for marine biology in Europe: marine
biodiversity informatics. MARBEF Newsletter No. 1, 22-24.

Costello M.J., Grassle J.F., Zhang Y., Stocks K. and Vanden Berghe, E. 2005a.
Where is what, and what is where? Online mapping of marine species. MARBEF
Newsletter No. 2, 20-22.

Costello M.J., McCrea M., Freiwald A., Lundalv T., Jonsson L., Bett B.J., Weering
T.v., de Haas H., Roberts J.M. and Allen D. 2005b. Functional role of deep-sea
cold-water Lophelia coral reefs as fish habitat in the north-eastern Atlantic. In:
Freiwald A. and Roberts J.M. (eds), Cold-water corals and ecosystems. Springer
Verlag, Berlin Heidelberg, 771-805.

Costello, M.J., Emblow C.S., Bouchet P. and Legakis A. (in press). Gaps in
knowledge of marine biodiversity and taxonomic resources in Europe. Marine
Ecology Progress Series.

Edwards J.L., Lane M A., Nielsen E. S. 2000. Interoperability of biodiversity
databases: biodiversity information on every desktop. Science 289, 2312-2314.

Fabri M-C, Galeron J, Larour M, Maudire G. BioOcean database for deep-sea benthic
ecological data and online Ocean Biogeographic Information System combined for
biogeographical analyses. Marine Ecology Progress Series.

Fautin D G. 2000. Electronic atlas of sea anemones: an OBIS project. Oceanography
13, 66-69.

Frank KT, Petrie B, Choi JS, Leggett WC 2005. Trophic cascades in a formerly
cod-dominated ecosystem. Science 308, 1621-1623.

Froese, R. 1997. An algorithm for identifying misspellings and synonyms in lists of
scientific names of fishes. Cybium 1(3):265-280.

Froese, R. 1999. The good, the bad, and the ugly: a critical look at species and their
institutions from a user's perspective. Reviews in Fish Biology and Fisheries
9:375-378.

Froese, R., N. Bailly, G.U. Coronado, P. Pruvost, R. Reyes, J.-C. Hureau. 1999. A
new procedure to evaluate fish collection databases, p. 697-705. In B. Séret and
J.-Y. Sire (eds.) Proceedings of the 5th Indo-Pacific Fisheries Conference, Noumea,
New Caledonia, 3-8 November 1997. Soc. Fr. Ichthyol., Paris, France.


Grassle J.F. 2000. The Ocean Biogeographic Information System (OBIS): an online,
worldwide atlas for accessing, modeling and mapping marine biological data in a
multidimensional geographic context. Oceanography 13, 5-7.

Grassle J.F., Stocks K I 1999. A global Ocean Biogeographic Information System
(OBIS) for the Census of Marine Life. Oceanography 12, 12-14.

Guinotte J. M., Bartley J. D., Iqbal A., Fautin D. G., Buddemeier R. W. Modeling
and understanding habitat distribution from organism occurrences and correlated
environmental data. Marine Ecology Progress Series.

Guralnick, R, Neufeld D. 2005. Challenges building online GIS services to support
global biodiversity mapping and analysis: lessons from the mountain and plains
database and informatics project. Biodiversity Informatics 2, 56-59.

Jackson J B C, Kirby M X, Berger W H, Bjorndal K A, Botsford LW, Bourque B J,
Bradbury R H, Cooke R, Erlandson J, Estes JA, Hughes TP, Kidwell S, Lange CB,
Lenihan HS, Pandolfi JM, Petersen CH, Steneck RS, Tegner MJ, Warner RR. 2001.
Historical overfishing and the recent collapse of coastal ecosystems. Science 293,
629-638.

International Council for Science 2004. ICSU report of the CSPR assessment panel
on scientific data and information. ICSU, Paris, 42 pp.

Kaschner K., Watson R., Trites A.W., Pauly D. Mapping world-wide distributions of
marine mammal species using a Relative Environmental Suitability (RES) model.
Marine Ecology Progress Series.

Kirchner J W, Weil A. 2005. Fossils make waves. Nature 434, 147-148.

Kohnke D., Costello M. J., Crease J., Folack J., Martinez Guingla R., Michida Y.
2005. Review of the International Oceanographic Data and Information Exchange
(IODE). Report submitted to the Intergovernmental Oceanographic Commission
(IOC) of UNESCO 23rd Session of the Assembly (Paris, 21-30 June 2005).
http://ioc3.unesco.org/iode/files.php?action=viewfile&fid=501&fcat_id=124

Lotze H K, Milewski I 2004. Two centuries of multiple human impacts and
successive changes in a North Atlantic food web. Ecological Applications 14,
1428-1447.

Merali Z., Giles J. 2005. Databases in peril. Nature 1010-1011.

Myers R A 2000. The synthesis of dynamic and historical data on marine populations
and communities; putting dynamics into the Ocean Biogeographic Information
System (OBIS). Oceanography 13, 56-59.

Myers R. A., Worm B. 2003. Rapid worldwide depletion of predatory fish
communities. Nature 423, 280-283.

Nic Donnacha E. and Guiry M. D. 2002. AlgaeBase: documenting seaweed
biodiversity in Ireland and the world. Biology and Environment: Proceedings of the
Royal Irish Academy 102B, 185-188.

Pauly D., Alder J., Bennett E., Christensen V., Tyedmers P., Watson R. 2003. The
future for fisheries. Science 302, 1359-1361.

Pauly D., Watson R. 2003. Counting the last fish. Scientific American, 43-47.

Pennisi, E. 2000. Taxonomy revival. Science 289, 2306-2308.

Polaszek A., Agosti D., Alonso-Zarazaga M., Beccaloni G., Bjorn P. de P., Bouchet
P., Brothers D.J., Evenhuis N., Godfray H.C.J., Johnson N.F., Krell F-T., Lipscomb
D., Lyal C.H.C., Mace G.M., Mawatari S., Miller S. E., Minelli A., Morris S., Ng
P.K.L., Patterson D.J., Pyle R.L., Robinson N., Rogo L., Taverne J., Thompson
F.C., Tol J van, Wheeler Q.D., Wilson E.O. 2005. A universal register for animal
names. Nature 437, 477.

Rohde R A, Muller R A 2005. Cycles in fossil diversity. Nature 434, 208-210.


Vanden Berghe, E. and Costello M.J. 2005. Ocean Biodiversity Informatics: Report
from international conference on “marine biodiversity data management”,
Hamburg, Germany, 29th November to 1st December 2004. MARBEF
Newsletter No. 2, 16-17.

Vieglais D., Wiley E.O., Robins C.R., Peterson A.T. 2000. Harnessing museum
resources for the Census of Marine Life: the FISHNET project. Oceanography 13,
10-13.

Wiley E O, McNyset K M, Peterson A T, Robins C R, Stewart A M 2003. Niche
modeling and geographic range predictions in the marine environment using a
machine-learning algorithm. Oceanography 16, 120-127.

Wood J.W., Day C.L., Lee P., O’Dor R.K. 2000. CephBase: testing ideas for
cephalopod and other species-level databases. Oceanography 13, 14-20.

Zeller, D., R. Froese and D. Pauly. 2005. On losing and recovering fisheries and
marine science data. Marine Policy 29:69-73.

Zhang Y, Grassle J F 2003. A portal for the Ocean Biogeographic Information
System. Oceanologica Acta 25, 193-197.










Box 1. Public statement by the 2004 conference on Ocean Biodiversity Informatics.

We note that increased availability and sharing of data

is good scientific practice and necessary for advancement of







Box 2. Some of the benefits of biodiversity informatics.

Data publication

Low cost publication of text, maps, images, movies, sounds
Easier access to data and metadata
Availability of data and metadata widened
Rapid publication
Linking to many data and information resources on the world wide web

Consequences

Permits data mining and exploration
Combination and sharing of data from multiple sources
Data are re-usable for perhaps unforeseen benefits
Repatriate data and knowledge to developing countries
Non-biodiversity researchers may analyze the data from new perspectives
Collaboration between different research groups is promoted and facilitated
Policy makers and



Box 3. Predictions for what Ocean Biodiversity Informatics may provide in the
future.

Science culture

1. Data sharing a normal part of the scientific process in marine biology
2. Data publication online becomes standard practice
3. Citation rankings of online publications
4. Recognition of the value of online publication in an individual’s research
   performance

Informatics

1. Online mapping of many species against selected environmental variables
2. Online visualization as graphs, maps, movies and 3-D models
3. More automated data capture and integration options
4. Citation index for use of online data
5. Improved online data publication tools, including distribution and
   identification information as text, images, sounds
6. Automated translations between scripts and languages
7. Automated and permanent archiving of scholarly websites

Data available

1. All valid marine species names online and part of the “Catalogue of Life”
2. Identification guides (descriptions and images) to all marine species online
   as part of a “key of life”
3. Distributions of all marine species online
4. Search and map by marine habitats at global scales
5. Distributions of invasive species with predictions of future spread

Consequences for efficiency in science

1. Improved quality control in identification and taxonomy
2. Increased rate of species being described
3. New discoveries and understandings of the role of biodiversity in ecosystems
   based on data
4. Rapid reanalysis of existing data in light of new data
5. Better management of fish stocks and natural resources through better
   understanding of ecosystem function and health
6. Real-time monitoring of environmental (e.g. satellite, in situ systems) and
   biological data (e.g. from video, sensors)