Linking literature to data in the life sciences - Open Access Scholarly ...


Oct 4, 2013

Data Publication

COASP 2012

Publications
26 million abstracts
2.2 million full-text articles



Citation networks
Database links
Text mining


Timeline (chart): 2006, 2011, 2012, 2016?

Europe PubMed Central

How many open access articles are in UKPMC?

PubMed (995K)
UKPMC (18%, 182K)
OA (9.6%, 96K)
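The percentages on this slide appear to be shares of the PubMed total, since 182K/995K is about 18% and 96K/995K is about 9.6%. That reading is an inference, not something the slide states. A small check, which also derives the OA share of UKPMC (a figure not on the slide):

```python
# Counts as shown on the slide, in thousands of articles.
PUBMED_K = 995
UKPMC_K = 182
OA_K = 96


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


ukpmc_pct = share(UKPMC_K, PUBMED_K)   # UKPMC as a share of PubMed
oa_pct = share(OA_K, PUBMED_K)         # OA as a share of PubMed
oa_of_ukpmc = share(OA_K, UKPMC_K)     # derived: OA as a share of UKPMC

print(f"UKPMC: {ukpmc_pct}% of PubMed")
print(f"OA: {oa_pct}% of PubMed, {oa_of_ukpmc}% of UKPMC")
```

Under this reading, roughly half of the UKPMC content counted here would be open access.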

Big Data: Deposition
Primary research articles
Big Data: Curated annotation

Managing the public data ecosystem

Unstructured data


Literature citation from data (data annotation)

Links from Literature to Databases


Proteins
Nucleotides
OMIM
Chemicals
Structure
Clinical reviews
Protein families
Protein-protein interactions
Gene expression experiments

800 K

370 K

110 K

Database crosslinks

Bibliography from P25106

Data citation from literature (provenance)
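The two directions on these slides, literature cited from a data record (data annotation) and data cited from an article (provenance), can be read as one set of article/accession pairs indexed two ways. A "Bibliography from P25106" view is then a lookup of one accession in the inverted index. A minimal sketch, assuming article-to-accession pairs have already been extracted; the article IDs are hypothetical and `invert` is an illustrative helper, not a Europe PMC API:

```python
from collections import defaultdict

# Hypothetical extraction output: article ID -> accession numbers it cites.
# Only P25106 and AY387398 come from the slides; the article IDs are made up.
article_to_accessions = {
    "PMC_A": {"P25106", "AY387398"},
    "PMC_B": {"P25106"},
    "PMC_C": {"AY387398"},
}


def invert(forward: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build accession -> citing articles (a per-record bibliography)."""
    backward: dict[str, set[str]] = defaultdict(set)
    for article, accessions in forward.items():
        for acc in accessions:
            backward[acc].add(article)
    return dict(backward)


accession_to_articles = invert(article_to_accessions)
print(sorted(accession_to_articles["P25106"]))  # ['PMC_A', 'PMC_B']
```

Keeping both indexes means neither direction needs a full scan at query time.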

Text Mining in UKPMC (2.2 million articles)

Semantic type  Unique terms   Articles  Annotations
Accession No.       233,017     66,356      387,787
Chemical             76,712  1,694,385   83,923,066
Disease             171,692  1,768,214   57,821,871
Gene/Protein        227,318  1,310,382   77,189,022
GO Terms             32,664  1,832,294   65,061,579
Organism            180,637  1,713,280   70,832,222
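Dividing the columns of the text-mining table makes the contrast between semantic types explicit. The sketch below assumes that "Articles" counts articles with at least one annotation of that type and uses the rounded 2.2 million from the slide title as the corpus size; both are assumptions. Under them, accession numbers are found in about 3% of articles, while the other types each appear in well over half.

```python
# Values copied from the "Text Mining in UKPMC" table:
# semantic type -> (unique terms, articles, annotations)
TABLE = {
    "Accession No.": (233_017, 66_356, 387_787),
    "Chemical": (76_712, 1_694_385, 83_923_066),
    "Disease": (171_692, 1_768_214, 57_821_871),
    "Gene/Protein": (227_318, 1_310_382, 77_189_022),
    "GO Terms": (32_664, 1_832_294, 65_061_579),
    "Organism": (180_637, 1_713_280, 70_832_222),
}
CORPUS = 2_200_000  # assumption: the rounded "2.2 million articles" from the title


def density(semantic_type: str) -> tuple[float, float]:
    """Return (annotations per annotated article, % of corpus annotated)."""
    _, articles, annotations = TABLE[semantic_type]
    return round(annotations / articles, 1), round(100 * articles / CORPUS, 1)


for name in TABLE:
    per_article, coverage = density(name)
    print(f"{name:14} {per_article:6.1f} per article  {coverage:5.1f}% of articles")
```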

Accession number stories: data citation in OA articles

Senay Kafkas, Jee-Hyub Kim

Annotation of accession numbers (OA): publisher-annotated vs. text-mined (chart)

~10,000 articles

>25,000 articles


Névéol A, Wilbur WJ, Lu Z (2012) Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database 2012:bas026 (PMC3371192)

Névéol A, Wilbur WJ, Lu Z (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27:3306-3312 (PMC3223368)
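Comparing publisher-annotated and text-mined accession numbers comes down to set arithmetic over article IDs. A minimal sketch; the article IDs are hypothetical and `compare_tagging` is an illustrative helper, not part of any UKPMC pipeline:

```python
def compare_tagging(publisher: set[str], text_mined: set[str]) -> dict[str, int]:
    """Count articles tagged by the publisher only, by text mining only, or by both."""
    return {
        "publisher_only": len(publisher - text_mined),
        "text_mined_only": len(text_mined - publisher),
        "both": len(publisher & text_mined),
    }


# Hypothetical article IDs, for illustration only.
publisher_tagged = {"PMC_1", "PMC_2", "PMC_3"}
text_mined = {"PMC_2", "PMC_3", "PMC_4", "PMC_5"}
print(compare_tagging(publisher_tagged, text_mined))
# {'publisher_only': 1, 'text_mined_only': 2, 'both': 2}
```

The "text_mined_only" bucket is where text mining adds links that publisher markup missed.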

Most publisher tags:
- BMC Genomics
- BMC Evolutionary Biology
- The Journal of Cell Biology
- Virology Journal
- BMC Microbiology
- The Journal of Experimental Medicine
- BMC Bioinformatics
- BMC Plant Biology
- The Journal of Biological Chemistry
- BMC Molecular Biology

Most articles:
- PLoS One
- Acta Crystallographica Section E
- British Journal of Cancer
- The Journal of Cell Biology
- Environmental Health Perspectives
- Nucleic Acids Research
- The Journal of Experimental Medicine
- Critical Care
- Emerging Infectious Diseases
- BMC Bioinformatics

Most text-mined tags:
- PLoS One
- Nucleic Acids Research
- BMC Genomics
- BMC Evolutionary Biology
- The Journal of Cell Biology
- PLoS Pathogens
- BMC Bioinformatics
- Virology Journal
- BMC Microbiology
- Emerging Infectious Diseases

BMC Genomics: 1,484 text-mined (TM) tags*, 4,337 articles
PLoS One: 4,226 TM tags*, 42,888 articles

Efficacy of accession number tagging (OA)
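Raw tag counts favour large journals, so normalising by article count gives a fairer comparison. The sketch assumes that both figures for each journal refer to the same set of OA articles over the same period; the slide does not say so explicitly. Under that assumption, BMC Genomics carries roughly three to four times as many text-mined tags per article as PLoS One.

```python
# Figures from the "Efficacy of accession number tagging (OA)" slide.
JOURNALS = {
    "BMC Genomics": {"tm_tags": 1_484, "articles": 4_337},
    "PLoS One": {"tm_tags": 4_226, "articles": 42_888},
}


def tags_per_100_articles(journal: str) -> float:
    """Text-mined accession tags per 100 articles, rounded to one decimal place."""
    counts = JOURNALS[journal]
    return round(100 * counts["tm_tags"] / counts["articles"], 1)


for name in JOURNALS:
    print(f"{name}: {tags_per_100_articles(name)} TM tags per 100 articles")
```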

Scientific:
- Linking articles that cite the same data

Citation:
- Data citation as a measure of impact (Thomson: Data Citation Index)
- Context of data citation: submission, reuse, analysis

Operational:
- Services for publishers to improve accession number tagging
- Editorial policies and adherence
- Extension of the NLM DTD
- Lessons learned for considering unstructured data

Why is this important? Implications
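"Linking articles that cite the same data" can be sketched as grouping articles by accession number and pairing every two articles in each group. This is one straightforward way to do it, not necessarily the method used in UKPMC; the input mapping and article IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_data_links(
    article_to_accessions: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Map each pair of articles to the accession numbers they both cite."""
    # Group articles by the accession numbers they cite.
    by_accession: dict[str, set[str]] = defaultdict(set)
    for article, accessions in article_to_accessions.items():
        for acc in accessions:
            by_accession[acc].add(article)
    # Every pair of articles sharing an accession gets a link.
    links: dict[tuple[str, str], set[str]] = defaultdict(set)
    for acc, articles in by_accession.items():
        for a, b in combinations(sorted(articles), 2):
            links[(a, b)].add(acc)
    return dict(links)


# Hypothetical articles and citations, for illustration only.
example = {
    "PMC_A": {"AY387398", "P25106"},
    "PMC_B": {"AY387398"},
    "PMC_C": {"P25106", "AY387398"},
}
for pair, shared in sorted(shared_data_links(example).items()):
    print(pair, sorted(shared))
```

The number of pairs grows quadratically with the articles citing one record, so very popular accessions may need capping in practice.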

That we can perform this analysis at all highlights a benefit of Open Access

AY387398: needle in a haystack
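Finding a string like AY387398 in full text is easy with a pattern; the hard part is that many non-accession strings share the same shape. The sketch below assumes only the classic INSDC nucleotide formats (one letter plus five digits, or two letters plus six digits, with an optional version suffix). It ignores other formats and other databases, and a real pipeline would add context checks and database look-ups to reject false positives like the one in the last example.

```python
import re

# Assumed pattern: classic INSDC nucleotide accessions only
# (1 letter + 5 digits, or 2 letters + 6 digits), optional ".version".
NUCLEOTIDE_ACC = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")


def find_accessions(text: str) -> list[str]:
    """Return candidate nucleotide accession numbers in order of appearance."""
    return NUCLEOTIDE_ACC.findall(text)


print(find_accessions("Sequences were deposited under accession AY387398."))
print(find_accessions("See AY387398.1 and U12345."))
# A false positive: a lab code with the same shape as an accession.
print(find_accessions("Strain XY123456 was grown overnight."))
```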

Unstructured data

Articles with supplemental data (UKPMC)
- 235,000 articles (50K+ in 2011)
- 718,511 files
- 459 extensions
- 0.8 TB (1200 CDs)

(However, most data are in ~60 extension types)

Chart: % of articles with supplemental data, by publication year
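The supplemental-data figures imply about three files per article and an average file size of about 1 MB. The claim that most data sit in ~60 of 459 extension types can be made precise by counting how many of the most common extensions are needed to reach a coverage threshold. The sketch assumes decimal units and a 650 MB CD; the slide gives neither the CD size nor the threshold behind "most", and the extension counts in the usage line are hypothetical.

```python
# Figures from the "Articles with supplemental data (UKPMC)" slide.
ARTICLES = 235_000
FILES = 718_511
TOTAL_BYTES = 0.8e12  # 0.8 TB, assuming decimal units
CD_BYTES = 650e6      # assumption: a 650 MB CD

files_per_article = FILES / ARTICLES
mean_file_mb = TOTAL_BYTES / FILES / 1e6
cds_needed = TOTAL_BYTES / CD_BYTES  # close to the slide's "1200 CDs"


def extensions_for_coverage(counts: dict[str, int], target: float) -> int:
    """Smallest number of most common extensions covering `target` (0..1) of files."""
    total = sum(counts.values())
    covered = 0
    for n, count in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += count
        if covered / total >= target:
            return n
    return len(counts)


print(f"{files_per_article:.1f} files/article, {mean_file_mb:.1f} MB/file, {cds_needed:.0f} CDs")
# Hypothetical counts, for illustration only.
print(extensions_for_coverage({"pdf": 50, "doc": 30, "xls": 15, "zip": 5}, 0.9))  # 3
```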

Big Data: Deposition
Primary research articles
Big Data: Curated annotation

Managing the public data ecosystem

Structured links

Unstructured data





reuse

analysis

provenance


Open


Citable


Discoverable


Reusable

People

Paula Buttery
Andrew Caines
Norman Cobley
Yuci Gou
Senay Kafkas
Jyothi Katuri
Oliver Kilian
Jee-Hyub Kim
Nikos Marinos
Jo McEntyre
Xingjun Pi
Philip Rossiter
Rebholz Group
Peter Stoehr

University of Manchester
British Library
OpenAIRE/OpenAIRE Plus
NCBI, NLM