Dec 8, 2013

Accessing the original observation data captured during plant exploration missions for collecting crop diversity

Bioversity International, Via dei Tre Denari 472/a, Maccarese, Rome, Italy

Hannes Gaisberger, Massimo Buonaiuto, Federico Mattei, Andrea De Pirro,
Valentina Barbiero, Simone Mori, Imke Thormann, Tom Hazekamp, Elizabeth
Arnaud


Agenda


Part 1: Safeguarding the original paper documents by scanning and digitizing the data
Hannes Gaisberger

Part 2: Creation of a public repository of full scanned documents enabling access to the full text
Massimo Buonaiuto

Bioversity supported germplasm collecting missions

Since 1974, Bioversity International has supported more than 550 germplasm collecting missions, yielding 225,875 samples and covering 4,300 species from 137 countries

Samples were sent to several genebanks worldwide for safety duplication, conservation and potential distribution

Other CGIAR centers organized various collecting missions for their mandate crops

Original observation data is essential for:

Identifying duplicates between collections and gaps in diversity
(value for genebank curators and collecting actions)

Tracking the original sample and country of origin in pedigrees
(value for breeders and benefit sharing)

Collectors recorded key sample information (passport data) and other observation data in field books

Scanning of field notebooks and related documents

Original observation: a treasure for genebanks and breeders


Genus and Species
Collecting Number
Site Information: Admin boundaries, Latitude, Longitude and Elevation
Collecting Source and Sample Status

The collecting form contains the botanical classification along with localization details, environment, cultural practices, disease and pest presence and symptoms, and traditional uses
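
The fields listed above can be pictured as one record per collected sample. The following is a minimal sketch, not the project's actual schema: the class name `PassportRecord`, its attribute names, and the range checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PassportRecord:
    """One collected sample as noted in a field book (hypothetical layout)."""
    genus: str
    species: str
    collecting_number: str
    country: str
    latitude: Optional[float] = None    # decimal degrees, north positive
    longitude: Optional[float] = None   # decimal degrees, east positive
    elevation_m: Optional[float] = None
    collecting_source: Optional[str] = None
    sample_status: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable quality issues (empty list if none)."""
        issues = []
        if not self.collecting_number.strip():
            issues.append("missing collecting number")
        if self.latitude is not None and not -90 <= self.latitude <= 90:
            issues.append("latitude out of range")
        if self.longitude is not None and not -180 <= self.longitude <= 180:
            issues.append("longitude out of range")
        # A coordinate pair is only usable when both halves are present.
        if (self.latitude is None) != (self.longitude is None):
            issues.append("incomplete coordinates")
        return issues
```

A record with an empty `problems()` list would be ready for integration; anything else could be flagged for a second look at the scanned page.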


Identification and quality-checking in databases

Different publicly available genebank inventories are checked in order to track corresponding samples and complete missing passport data
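
One way to picture this tracking step is a lookup keyed on a normalized collecting number. This is a sketch under assumptions: the source does not say how the matching was done, and the dictionary keys (`collecting_number`, `accession_id`, `country`) and both helper names are hypothetical.

```python
import re
from typing import Dict, List


def normalize(collecting_number: str) -> str:
    """Uppercase and drop spaces and punctuation so 'ab-12' matches 'AB 12'."""
    return re.sub(r"[^A-Z0-9]", "", collecting_number.upper())


def link_samples(samples: List[dict], inventory: List[dict]) -> List[dict]:
    """Attach an accession id and fill empty passport fields from an inventory."""
    by_number: Dict[str, dict] = {
        normalize(row["collecting_number"]): row for row in inventory
    }
    linked = []
    for sample in samples:
        match = by_number.get(normalize(sample["collecting_number"]))
        merged = dict(sample)
        merged["accession_id"] = match["accession_id"] if match else None
        if match:
            # Only fill fields the field book left empty; never overwrite.
            for key, value in match.items():
                if merged.get(key) in (None, ""):
                    merged[key] = value
        linked.append(merged)
    return linked
```

Samples that come back with `accession_id` set to `None` are the ones with no corresponding accession found in the checked inventory.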

Integration of quality passport data

Data extracted from field books and databases is integrated in a sample-level database of collecting missions


Results in figures


To date, the quality of 101,171 passport data records from 375 collecting missions has been improved through data extracted from scanned documentation

56,454 of these collected samples are linked to genebank accessions in 51 institutes worldwide


Priority crops/use group    Number of collected samples
Forages                                          44,056
Rice                                             25,022
Maize                                            16,484
Beans                                            10,976
Wheat                                             7,507
Cowpea                                            7,473
Potato                                            7,146
Pearl millet                                      6,662
Barley                                            4,429
Groundnut                                         2,928
Finger millet                                     2,850
Chickpea                                          1,467
Banana                                            1,326
Pigeon pea                                          999
Others                                           86,550
Total                                           225,875
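
As a quick consistency check on the table, the per-group counts can be summed and compared with the stated total. The figures below are copied from the table; the variable names are illustrative only.

```python
samples_by_group = {
    "Forages": 44056,
    "Rice": 25022,
    "Maize": 16484,
    "Beans": 10976,
    "Wheat": 7507,
    "Cowpea": 7473,
    "Potato": 7146,
    "Pearl millet": 6662,
    "Barley": 4429,
    "Groundnut": 2928,
    "Finger millet": 2850,
    "Chickpea": 1467,
    "Banana": 1326,
    "Pigeon pea": 999,
    "Others": 86550,
}
STATED_TOTAL = 225875

# The rows should add up to the total reported in the last line of the table.
computed_total = sum(samples_by_group.values())
assert computed_total == STATED_TOTAL

# Share of each group, largest first, as a percentage of all collected samples.
shares = {
    group: round(100 * count / STATED_TOTAL, 1)
    for group, count in sorted(samples_by_group.items(), key=lambda kv: -kv[1])
}
for group, pct in shares.items():
    print(f"{group:<14}{pct:>5}%")
```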


A total of 43,637 scanned pages are saved as 1,063 PDF files and stored in an online repository alongside the 26,000 other files scanned by CGIAR centers and partners

Publishing the data and attached information

End of 2010: work must be finished for Bioversity-supported missions

Make the full text available on the online repository and publish the collecting mission database

Visualization: map sites where diversity was collected (after georeferencing with Biogeomancer)

Various projects to address gap analysis and diversity analysis, like Genesys; encourage partners to perform the same work and share the full text and data

Links to CWR information, museum herbaria information, literature

Public access to the scanned collecting missions documents

A repository that presently contains 27,000 collecting mission files from CGIAR Centers and partners:

Agricultural Research Centre (ARC) of Lao People’s Democratic Republic
AfricaRice
Agricultural Research for Development in Africa (IITA)
Bioversity International
International Rice Research Institute (IRRI)

Typology of the documents produced by Collectors

1) Mission Reports
2) Summary Forms
3) Sample lists
4) Collecting Forms
5) Accession Vouchers
6) Newsletters
7) Factsheets
8) Distribution lists
9) Field Books

Document Types Hierarchy

Analysis of Metadata (1/5)

Analysis of Metadata (2/5)

Analysis of Metadata (3/5)



Analysis of Metadata: Darwin Core for Germplasm (4/5)

Analysis of Metadata (5/5)

Darwin Core Germplasm metadata + Collecting Missions metadata = Metadata for Collecting Missions Documents

How users will access the Repository

Alfresco DMS

Typo3 CMS

Import of 27,000 PDF Files

Process of importing PDF files in 3 phases:

1. Conversion of institutional metadata into Darwin Core Germplasm metadata

2. Association of metadata with all PDF files, using heterogeneous sources (databases, Excel files, filenames, etc.)

3. Batch upload of all PDF files, each together with its associated metadata file in the DC-Germplasm standard.
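
The three phases can be sketched as follows. This is an illustration under assumptions, not the project's import tooling: the institutional column names in `FIELD_MAP`, the `.metadata.json` sidecar naming, and both helper functions are hypothetical. The target names on the right are standard Darwin Core terms; germplasm-specific extension terms are left out, and the upload itself is not shown.

```python
import json
from pathlib import Path
from typing import Dict, List

# Phase 1: map institutional column names (hypothetical) to Darwin Core terms.
FIELD_MAP = {
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
    "COLLNUMB": "recordNumber",
    "ORIGCTY": "country",
    "LATITUDE": "decimalLatitude",
    "LONGITUDE": "decimalLongitude",
}


def to_darwin_core(row: Dict[str, str]) -> Dict[str, str]:
    """Rename known columns; drop empty values and unmapped columns."""
    return {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP and v}


def write_sidecars(pdf_dir: Path, rows_by_file: Dict[str, Dict[str, str]]) -> List[Path]:
    """Phases 2 and 3 (preparation only): pair each PDF with a metadata file."""
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        row = rows_by_file.get(pdf.name)
        if row is None:
            continue  # no metadata source found for this file; leave for review
        sidecar = pdf.with_name(pdf.name + ".metadata.json")
        sidecar.write_text(json.dumps(to_darwin_core(row), indent=2), encoding="utf-8")
        written.append(sidecar)
    return written
```

In this sketch `rows_by_file` stands in for the heterogeneous sources of phase 2: whatever database export, spreadsheet, or filename parsing yields one metadata row per PDF name. Files without a row are skipped rather than uploaded with empty metadata.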

Public Search Mask (1/3)

Public Search Mask (2/3)

Public Search Mask (3/3)

How users will manage and publish documents

Simple workflow to publish into the Repository:

1. Upload the file to the private user Home Space

2. Edit metadata

3. Approve the document for the public repository with a click

... and the file will be public


Summary



Improved quality of passport data for about 100,000 collected samples from 137 countries

56,454 of these collected samples are linked to genebank accessions in 51 institutes worldwide

Collected 27,000 documents, classified into 9 document types, with metadata

Metadata extracted and parsed using the Darwin Core Germplasm standard

Open questions and challenges

- Interaction with Open Archives standards and the Protocol for Metadata Harvesting (OAI-PMH)

- Integration with Crop Terminizer, University of Manchester

- Web analytics for detailed monitoring of downloads (referrers, visits, etc.) and web marketing

- CMIS protocol used to interact with content management systems

- Metadata validation with crop scientists, collectors
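
On the first open question: an OAI-PMH harvester talks to a repository through plain HTTP requests carrying a `verb` parameter. The sketch below only builds such a request URL. The endpoint `http://example.org/oai` is a placeholder (the source does not say the repository exposes one), while `ListRecords`, `metadataPrefix`, `oai_dc`, and `from` are part of the OAI-PMH 2.0 protocol.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, for illustration only.
print(list_records_url("http://example.org/oai", from_date="2013-01-01"))
```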


http://www.central-repository.cgiar.org/


Guidelines for collecting samples

- Being revised and will be published in a new section of the Crop Genebank Knowledge Base

- Adding guidelines for illustrating with photos that support the tentative taxonomy, captured data and GPS







THANK YOU!