Global content summit

Arya Mir

1 Feb 2012


Cynthia Parr

Species Pages Group

Global Content Summit

17-19 Jan 2011


http://www.eol.org


All species known to science


Freely accessible: open access, open source


Available from a single portal in a common format


Quality


Constantly growing


Aimed at multiple audiences


EOL Global Partners

China

Australia

Dutch

South Africa

Costa Rica


Mexico

Pan-Arab

India


Colombia




Peru





GBIF

ViBRANT

BHL - Global BHL

Aims of global partners

Vision: Global access to knowledge about life on Earth

Mission: To increase awareness and understanding of living nature through an Encyclopedia of Life that gathers, generates and shares knowledge in an open, freely accessible and trusted digital resource


Work together towards this vision and mission, sharing expertise and knowledge as appropriate

Expand the global pool of knowledge about biodiversity and improve access to it



Aims of this workshop


Gather content experts from Global Partners


Become familiar with each other’s work


Learn how core EOL works and provide feedback on it


Form the Species Pages Working Group

Team at Smithsonian (SPG)

Representatives from global partners


Draft individual plans that complement each other towards a common goal


Remind ourselves WHY we want to do this

What is content?


Biological information

Names and hierarchies

Descriptive text

Literature

Multimedia

Maps

Links to more information

… what about comments, collection annotations?

Overview of agenda

Day 1: Introductions

Day 2: Sharing

Day 3: Planning


Acknowledgements


Funding from:

David M. Rubenstein gift

John D. and Catherine T. MacArthur Foundation

Alfred P. Sloan Foundation

Smithsonian Institution

Marine Biological Laboratory

Harvard University


and other funders and donors


All our content partners and global partners


Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL


All of you for coming


Claire Badgley

Cynthia Parr

Species Pages Group

Global Content Summit

17-19 Jan 2011


Overview of Content Partnering

Databases

Journals

LifeDesks & Scratchpads

Public contributions

EOL is a content curation community

Curate

Comment

Rate, Collect

eol.org

Aggregate

API

Third party apps

Quality control, prioritization

http://eol.org/content_partners

http://eol.org/info/content_partner_collections


Low hanging fruit

Photo credit: Stanislas PERRIN

Partner trajectory

[Figure: number of partners (axis 0 to 150) by quarter, Y1Q3 to Y4Q3]

Long Tail in databases contributing to EOL

[Figure: number of taxa for which content is contributed to EOL (linear axis 0 to 600,000; also viewed on log scale, 1 to 1,000,000) against partners in order of # taxa contributed to EOL (1 to 131)]

Content strategy

Highlights

Priorities

Richness score

Processes

Goals



http://eol.org/info/partners

Content Partner process overview

Partner creates an EOL member account

Adds a content partner

We communicate with them

They (or we) upload a resource file or set a URL where one can be found

They set a harvest frequency

EOL harvests at that frequency
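
The harvest step above can be sketched in a few lines of Python. This is only an illustration of the idea "harvest each resource at the frequency its partner chose"; the frequency labels, the `Resource` fields and the `is_due` helper are all assumptions made for the example, not EOL's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical frequency labels; the slides do not list the real options.
HARVEST_INTERVALS = {
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "quarterly": timedelta(days=91),
}


@dataclass
class Resource:
    partner: str
    url: str                        # where the resource file can be found
    frequency: Optional[str]        # None means "harvest once only"
    last_harvested: Optional[datetime] = None


def is_due(resource: Resource, now: datetime) -> bool:
    """Return True when the resource should be harvested again."""
    if resource.last_harvested is None:
        return True                 # never harvested yet
    if resource.frequency is None:
        return False                # one-off resource, already harvested
    interval = HARVEST_INTERVALS[resource.frequency]
    return now - resource.last_harvested >= interval
```

A scheduler could call `is_due` for every registered resource once a day and harvest only those that return `True`.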


Current methods of data transfer


EOL resource document (XML) (usually they do the work)

Spreadsheet upload (either can do the work)

Connector (we do the work)

Scrape web site or PDF

Use web services

Work from a copy of DB

Darwin Core Archive (classifications, soon)

See http://eol.org/info/cp_resource_checklist


How EOL gets content (n = 141 partners)

[Figure: number of partners (axis 0 to 70) by method: XML resource doc, Connector, LD/eLD/Scratchpad, Spreadsheet. Additional labels in the figure: CSV, web service, PDF, HTML, DB, LD/eLD/Scratchpad]

Example partner


Pensoft has a process to generate EOL-compliant XML for new species


Also sends images to Morphbank, specimens to GBIF


They registered the URL at EOL


Our script checks for changes once a day
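
One simple way to "check for changes" at a registered URL is to compare a hash of the downloaded file with the hash stored at the previous check. The sketch below assumes that approach; the slides do not say how the real script detects changes, and the function names here are made up for the example.

```python
import hashlib
from typing import Optional
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download the resource file registered by the partner."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fingerprint(content: bytes) -> str:
    """Stable digest of the file content."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, previous_fingerprint: Optional[str]) -> bool:
    """True on the first check, or when the content differs from last time."""
    if previous_fingerprint is None:
        return True
    return fingerprint(content) != previous_fingerprint
```

A daily job would call `fetch`, pass the bytes to `has_changed`, and re-harvest and store the new fingerprint only when the result is `True`.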



EOL Schema Sources

Content type              Standards used

Taxa                      Darwin Core Archive

Attribution & licensing   Dublin & Darwin Core

Text objects & links      Species Profile Model (and now +)

Multimedia                Dublin (+ Audubon Core)


EOL table of contents

TDWG Species Profile Model

Overview > Brief Summary
Overview > Comprehensive Description
Overview > Distribution
Physical Description > Morphology
Physical Description > Size
Physical Description > Diagnostic Description
Physical Description > Type Information
Physical Description > Look Alikes
Physical Description > Development
Ecology > Habitat
Ecology > Migration
Ecology > Dispersal
Ecology > Trophic Strategy
Ecology > Associations
Ecology > Diseases and Parasites
Ecology > Population Biology
Ecology > General Ecology
Life History and Behavior > Behavior
Life History and Behavior > Cyclicity
Life History and Behavior > Life Cycle
Life History and Behavior > Life Expectancy
Life History and Behavior > Reproduction
Life History and Behavior > Growth
Evolution and Systematics > Evolution
Evolution and Systematics > Fossil History
Evolution and Systematics > Systematics or Phylogenetics
Evolution and Systematics > Functional Adaptations
Physiology and Cell Biology > Physiology
Physiology and Cell Biology > Cell Biology
Molecular Biology and Genetics > Genetics
Molecular Biology and Genetics > Genome
Molecular Biology and Genetics > Molecular Biology
Conservation > Conservation Status
Conservation > Trends
Conservation > Threats
Conservation > Legislation
Conservation > Management
Relevance to Humans and Ecosystems > Benefits
Relevance to Humans and Ecosystems > Risks
Notes
Taxonomy
Education Resources
Citizen Science
Identification Resources
Nucleotide Sequences

EOL Table of Contents                                 TDWG Species Profile Model

Physical Description > Morphology                     #Morphology
Physical Description > Size                           #Size
Ecology > Habitat                                     #Habitat
Ecology > Associations                                #Associations
Life History & Behavior > Life Expectancy             #LifeExpectancy
Evolution and Systematics > Functional Adaptations    #Evolution
Conservation > Conservation Status                    #ConservationStatus
Molecular Biology and Genetics > Genetics             #Genetics
Molecular Biology and Genetics > Genome               #MolecularBiology
Molecular Biology and Genetics > Molecular Biology    #MolecularBiology
Nucleotide Sequences                                  #MolecularBiology
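
The mapping on this slide is many-to-one: several EOL table-of-contents entries share one Species Profile Model term. A lookup table makes that easy to see. The sketch below uses only the pairs shown on the slide and keeps the `#Fragment` form as given; the function names are hypothetical, and the full namespace URI for the fragments is deliberately left out because the slide does not state it.

```python
from typing import List, Optional, Tuple

# (chapter, subchapter) -> SPM fragment, exactly as listed on the slide.
# A chapter of None marks a top-level entry with no subchapter.
SPM_FRAGMENTS = {
    ("Physical Description", "Morphology"): "#Morphology",
    ("Physical Description", "Size"): "#Size",
    ("Ecology", "Habitat"): "#Habitat",
    ("Ecology", "Associations"): "#Associations",
    ("Life History & Behavior", "Life Expectancy"): "#LifeExpectancy",
    ("Evolution and Systematics", "Functional Adaptations"): "#Evolution",
    ("Conservation", "Conservation Status"): "#ConservationStatus",
    ("Molecular Biology and Genetics", "Genetics"): "#Genetics",
    ("Molecular Biology and Genetics", "Genome"): "#MolecularBiology",
    ("Molecular Biology and Genetics", "Molecular Biology"): "#MolecularBiology",
    (None, "Nucleotide Sequences"): "#MolecularBiology",
}


def spm_fragment(chapter: Optional[str], subchapter: str) -> Optional[str]:
    """Look up the SPM fragment for a table-of-contents entry, if listed."""
    return SPM_FRAGMENTS.get((chapter, subchapter))


def toc_entries_for(fragment: str) -> List[Tuple[Optional[str], str]]:
    """Reverse lookup: every table-of-contents entry that uses a fragment."""
    return [entry for entry, value in SPM_FRAGMENTS.items() if value == fragment]
```

For instance, `toc_entries_for("#MolecularBiology")` returns the three entries that share that term, which is why the reverse direction cannot be a plain dictionary lookup.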

Example biological content

EOL v2

Plinian Core

DwC description

SPM infoitem

using Darwin Core Archive flat files as transport mechanism


EOL v3?

Relations

Numeric values

Controlled vocabulary

Partners


Can delete or replace any of their objects

Control how often we harvest, and can force a harvest

Get an automatically updating collection

Can request that we use their classification for browsing

Can change the logo and description of their project

Receive comments and curator actions immediately

Receive monthly reminders that they can get traffic statistics

Get many links back to their original web resources



Partners cannot


Publish the very first time

Decide if they are pre-vetted

Roll back a harvest

Change the objects of any other partner

Change classifications from any other partner





Richness scores

http://eol.org/pages/704102

Taxon page richness algorithm

Richness = a (Breadth) + b (Depth) + c (Diversity)

Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: Sources (partners)

Weights: 60%, 30%, 10%

Score range: 0 to 100, Threshold 40
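
The weighted sum on this slide can be sketched as follows. Several things here are assumptions made for illustration: that the 60%, 30% and 10% weights apply to Breadth, Depth and Diversity in that order, that each component has already been normalized to the range 0 to 1, and that a page counts as rich when its score is at or above the threshold of 40. How the real algorithm normalizes each component is not given in the slides.

```python
# Assumed mapping of the slide's weights to components, in the order shown.
WEIGHTS = {"breadth": 0.60, "depth": 0.30, "diversity": 0.10}
RICH_THRESHOLD = 40  # from the slide; treating it as inclusive is an assumption


def richness(breadth: float, depth: float, diversity: float) -> float:
    """Weighted sum of three components (each in 0..1), scaled to 0..100."""
    components = {"breadth": breadth, "depth": depth, "diversity": diversity}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())


def is_rich(score: float) -> bool:
    """Compare a richness score with the threshold."""
    return score >= RICH_THRESHOLD
```

Under these assumptions a page with full breadth but no depth or diversity scores about 60, so breadth alone can carry a page over the threshold, while depth or diversity alone cannot.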

Summary of EOL page richness

Overall

950,000 have content

2 % are rich

~22 % have only links to literature


Hot List

30 % of 75K are rich

Average richness = ~30


Red Hot List

56 % of 3K are rich

Average richness = 43

How richness is used

Choose images for home page “March of Life”

Allows sorting in collections (Weird life example)

Helps provide best search and API results



Any other ideas? Could we be matchmakers for pages needing enrichment and users?

http://synthesis.eol.org/media/treemap

Strategies for improving richness

Crowd-sourcing

Collections

Communities

Mobile apps


Leveraging

Enabling platforms

Enabling journals

Data mining BHL etc.


The page richness index

Helps fill gaps with existing knowledge

Helps prioritize funding and training so that it has maximum impact on closing true gaps

Will be available via API


Computing and storing richness index on EOL is a step towards storing and serving computable data