
Crowd-sourcing the creation of “articles” within the Biodiversity Heritage Library

Bianca Crowley
crowleyb@si.edu

Trish Rose-Sandler
trish.rose-sandler@mobot.org

The BHL is…

A consortium of 13 natural history and botanical libraries and research institutions

An open access digital library for legacy biodiversity literature

An open data repository of taxonomic names and bibliographic information

An increasingly global effort



Problem: Books vs. Articles

Librarians manage books

Users need articles


Solution: “Article-ization”

Creating articles manually, with the help of our users: the BHL PDF Generator

Creating articles through automated means: BioStor
http://biostor.org/issn/0006-324X




Page, R. (2011). Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library. BMC Bioinformatics, 12:187. Retrieved from http://www.biomedcentral.com/1471-2105/12/187




Create-your-own PDF


CiteBank today: http://citebank.org


What is an “article” anyway?


The Good, the Bad, the Ugly

Questions for Data Analysis

What is the quality, or accuracy, of user-provided metadata?

What kinds of content are users creating?

How can we improve the PDF Generator interface?


Stats

Jan 2010–Apr 2011

Approx 60,000 PDFs created with the PDF Generator

40% of those (approx 24,000) were ingested into CiteBank (PDFs without user-contributed metadata were excluded)

5 reviewers analyzed 945 PDFs (approx 3.9% of the 24,000+ articles going into CiteBank)

**Thanks to reviewers Gilbert Borrego, Grace Costantino, and Sue Graves from the Smithsonian Institution


Methodological approach

Quantitative: a numerical rating system

Rated titles, authors, and beginning/ending pages

Each PDF’s “findability” within CiteBank search often determined how it was rated



Ratings System: Title

1 = has all characters in the title, letter for letter

2 = does not have all characters in the title letter for letter, but is still findable in CiteBank search

3 = does not have all characters in the title letter for letter and is NOT findable via CiteBank search


Ratings System: Author

1 = has all characters in the author(s)’ last name, letter for letter

2 = has at least one author’s last name spelled correctly

3 = has no authors, or none of the authors’ last names are spelled correctly


Ratings System: Article beginning & ending pages

1 = has all text pages for an article, from start to end

2 = a subset of pages from a larger article

3 = a set of pages where the intellectual content has been compromised
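
To make the rubric concrete, here is a minimal sketch (Python; the field names and sample values are illustrative, not BHL’s actual review data) of how per-field 1–3 ratings across reviewed PDFs could be recorded and averaged:

    from statistics import mean

    # Each reviewed PDF gets one 1-3 rating per field, per the rubric above
    # (1 = exact, 2 = imperfect but still findable/usable, 3 = not findable
    # or compromised). These sample reviews are made up for illustration.
    reviews = [
        {"title": 2, "author": 1, "pages": 1},
        {"title": 1, "author": 2, "pages": 2},
        {"title": 3, "author": 1, "pages": 1},
    ]

    # Average each field across all reviews, as in the Results slide.
    for field in ("title", "author", "pages"):
        print(field, mean(r[field] for r in reviews))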


Analysis steps


Results

Title average: 1.68
Author(s) average: 1.33
Beg/End pg average: 1.41
Title & Author average: 1.50
Overall average (combines the first 3 above): 1.47
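
The combined figures follow directly from the three category averages; a quick check (Python, values taken from the table above):

    from statistics import mean

    # Category averages from the Results slide
    title, author, pages = 1.68, 1.33, 1.41

    print(mean([title, author]))         # ~1.505, the reported "Title & Author" 1.50
    print(mean([title, author, pages]))  # ~1.473, the reported "Overall" 1.47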


What did we learn?

Ratings were better than we expected

Many users took the time to create decent metadata

“Good enough” is not great, but it is still “findable”



But of course… there’s always room for improvement

Other factors: BHL-Australia’s new portal
http://bhl.ala.org.au/


Changes we made to the UI so far

Asking users if they want to contribute their article to CiteBank

Making the article title a required field and validating that it is at least 2 characters (a minimal sketch follows this list)

Review button for users to review page selections and metadata (inspired by BHL-AUS)

Reduced text and more intuitive graphics (inspired by BHL-AUS)
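
A minimal sketch of the title-validation rule described above (Python; the function name and error message are illustrative, not BHL’s actual code):

    def validate_article_title(title: str) -> str:
        """Require a non-empty article title of at least 2 characters."""
        title = title.strip()
        if len(title) < 2:
            raise ValueError("Article title is required (2 or more characters).")
        return title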


Brief survey of proposed changes

Overwhelmingly positive response to the proposed changes

But of course… there’s always room for improvement


Success Factors

Monitor the creation of metadata to look at user behavior and patterns

Engage with your users

Incentivize your users


@BioDivLibrary

/pages/Biodiversity-Heritage-Library/63547246565

/photos/biodivlibrary/sets/

/group/biodiversity-heritage-library

Bianca Crowley
crowleyb@si.edu

Trish Rose-Sandler
trish.rose-sandler@mobot.org

http://biodiversitylibrary.org
