Moving forward our shared data agenda: a view from the publishing industry

ICSTI, March 2012

Data and the Scientific Article

Researchers perceive data sets as “important, but hard to access”

Source: Publishing Research Consortium, 2010 (survey of researchers, N = 3824)

Overview: Data & the Scientific Article

- Current approaches
- Thoughts for the future

Supplementary Material

Authors can upload Supplementary Material with their paper.

Pros:
- Coupling of data and article
- Peer review
- Citation mechanism
- Preservation (byte-wise)

Cons:
- Limited data type support
- Compatibility (format support)
- Limited capacity
- Data not centrally stored

Connecting with Data Repositories, 1

Link to CCDC database (indicates that information for this article is available)

Screenshot of journal article on ScienceDirect
(http://dx.doi.org/10.1016/j.jfluchem.2009.07.015)

Article Linking example: CCDC

Connecting with Data Repositories, 2

... clicking on the CCDC logo takes the reader to a page at the CCDC repository with data related to the article.

Screenshot of information page at CCDC (Cambridge Crystallographic Data Centre)
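Article-level linking, as in the CCDC example, amounts to keying repository records by the article's DOI. The sketch below is illustrative only: `REPOSITORY_INDEX` and its record URL are hypothetical placeholders, and it does not describe how ScienceDirect and CCDC actually exchange this information. It strips resolver prefixes such as dx.doi.org, matches DOIs case-insensitively, and decides whether a repository link should be shown on the article page.

```python
from typing import Optional
from urllib.parse import quote

# Resolver prefixes to strip before comparing DOIs (assumed list, not exhaustive).
_RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(doi: str) -> str:
    """Strip whitespace and any resolver prefix, keeping the DOI's original case."""
    doi = doi.strip()
    for prefix in _RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def resolver_url(doi: str) -> str:
    """Build a doi.org resolver URL, percent-encoding characters unsafe in a URL path."""
    return "https://doi.org/" + quote(bare_doi(doi), safe="/()")


# Hypothetical index of articles for which a repository holds related data.
# Keys are lower-cased DOIs; the record URL is a placeholder, not a real CCDC address.
REPOSITORY_INDEX = {
    "10.1016/j.jfluchem.2009.07.015": "https://repository.example.org/records/placeholder",
}


def article_level_link(doi: str) -> Optional[str]:
    """Return the repository link to display on the article page, or None if no data is indexed."""
    return REPOSITORY_INDEX.get(bare_doi(doi).lower())


print(resolver_url("10.1016/S0377-8398(01)00044-5"))  # -> https://doi.org/10.1016/S0377-8398(01)00044-5
print(article_level_link("http://dx.doi.org/10.1016/j.jfluchem.2009.07.015"))
print(article_level_link("10.1016/j.biortech.2010.03.063"))  # -> None
```

Matching on the normalized DOI rather than on a full URL means the same article is recognized whichever resolver form appears in the page or reference list.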

Connecting with Data Repositories, 3

Tagged GenBank entry (genetic sequence)

Screenshot of journal article on ScienceDirect
(http://dx.doi.org/10.1016/j.biortech.2010.03.063)

Entity Linking example: GenBank Accession Number

Connecting with Data Repositories, 4

... clicking on the linked GenBank accession code takes the reader to an information page on the NCBI data repository about that specific genetic sequence.

Screenshot of information page at NCBI (National Center for Biotechnology Information)
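Entity-level linking works on individual identifiers in the text rather than on the whole article. A minimal sketch, assuming a simplified GenBank nucleotide accession pattern (real accession formats are more varied) and NCBI's public nucleotide record URL, `https://www.ncbi.nlm.nih.gov/nuccore/<accession>`; the function name `accession_link` and the `AB123456` format example are illustrative.

```python
import html
import re

# Simplified GenBank nucleotide accession pattern (an assumption for illustration):
# 1 letter + 5 digits, or 2 letters + 6 or 8 digits, with an optional ".version".
# Other formats (e.g. WGS or RefSeq accessions) are deliberately not covered.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?")

# NCBI Nucleotide record pages live under this path.
NCBI_NUCCORE = "https://www.ncbi.nlm.nih.gov/nuccore/"


def accession_link(accession: str) -> str:
    """Turn a tagged accession number into an HTML link to its NCBI record page."""
    acc = accession.strip().upper()
    if not ACCESSION_RE.fullmatch(acc):
        raise ValueError(f"not a recognised GenBank accession: {accession!r}")
    url = NCBI_NUCCORE + acc
    # Escape both the URL and the visible text before embedding them in HTML.
    return f'<a href="{html.escape(url)}">{html.escape(acc)}</a>'


print(accession_link("AB123456"))
# -> <a href="https://www.ncbi.nlm.nih.gov/nuccore/AB123456">AB123456</a>
```

Validating the identifier before building the link keeps malformed tags from producing dead links to the repository.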

Connecting with Data Repositories, 5

Database                      Subject                Type of linking
CCDC                          Crystallography        Article-level
PANGAEA                       Earth Sciences         Article-level*
EMBL Molecular Interactions   Chemistry              Entity, tagging
Molecular INTeraction DB      Chemistry              Entity, tagging
GenBank                       Nucleotides            Entity, tagging
UniProt                       Proteins               Entity, tagging
Protein Data Bank             Proteins               Entity, tagging
ClinicalTrials                Medicine               Entity, tagging
TAIR (Arabidopsis)            Model organism         Entity, tagging
Mendelian Inheritance in Man  Genetics, inheritance  Entity, tagging

* With an application

The Article of the Future

Discovery and Use via SciVerse Applications

Features & Benefits

- Use information from SciVerse and the web
- Support for rich user interfaces
- Integrated directly into the online article
- Simple to build using Content and Framework APIs
- Open standards (Apache Shindig, OpenSocial)

Discovery and Use via SciVerse Applications

- Openness and Interoperability: “Give me your data, my way…”
- Personalization: “Know who I am and what I want…”
- Collaboration and trusted views: “The right contacts, at the right time…”

Libraries can become a focal point for applications.

Researchers can save time and improve their information discovery process.

“Apps interacting with results are very important to help save time…”

Applications can target specific information to facilitate content mining and shorten search time, leaving more time for analysis.

“What faculty is really after is something that ties this all together, so it’s all in one place…”

Applications help researchers extract all information (content, data, figures, etc.) into a single analysis source, which can be a local database at the customer’s institute.

Applications example: NCBI Genome Viewer

- Scans the article and builds a list of sequences based on the NCBI accession numbers tagged in the article
- View/analyze sequence data from genes in the article using the NCBI Sequence Viewer
- See specific information about each strand; zoom in/out; export data

Screenshots of journal article on ScienceDirect
(http://dx.doi.org/10.1016/j.ygeno.2007.07.010)
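The first step of the Genome Viewer application, building the list of sequences from accession numbers tagged in the article, can be sketched as below. The `<accession db="...">` markup is hypothetical (the actual tagging schema is not described here), and handing the list to the NCBI Sequence Viewer is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: the <accession> element and its "db" attribute are
# illustrative only, not the publisher's actual tagging schema.
ARTICLE_XML = """
<article>
  <para>The gene was sequenced (<accession db="genbank">AB123456</accession>) and
  compared with <accession db="genbank">U12345</accession>.</para>
  <para>Protein entry <accession db="uniprot">P12345</accession>; see also
  <accession db="genbank">AB123456</accession>.</para>
</article>
"""


def tagged_accessions(article_xml: str, db: str = "genbank") -> list:
    """Return accession numbers tagged for `db`, in order of first appearance, without duplicates."""
    root = ET.fromstring(article_xml.strip())
    found = [
        el.text.strip()
        for el in root.iter("accession")  # document order
        if el.get("db") == db and el.text and el.text.strip()
    ]
    # dict.fromkeys keeps the first occurrence of each key and preserves insertion order.
    return list(dict.fromkeys(found))


print(tagged_accessions(ARTICLE_XML))  # -> ['AB123456', 'U12345']
```

Keeping the order of first mention lets a viewer present sequences in the same order the reader meets them in the text.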

Applications example: PANGAEA

- The document identifier is sent to PANGAEA, the data repository for earth sciences
- PANGAEA returns a map plotted with the locations where the cited data were collected
- Push-pins open with details of each dataset and a direct link to the data on PANGAEA.de

Screenshots of journal article on ScienceDirect
(http://dx.doi.org/10.1016/S0377-8398(01)00044-5)
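The request/response flow of the PANGAEA application can be sketched with an in-memory stand-in for the repository. Everything here is an assumption for illustration: `STUB_REPOSITORY` holds made-up records (not real PANGAEA data), keyed by the article's DOI as its document identifier, and no real PANGAEA interface is called. The function turns the returned datasets into push-pins and a bounding box for the map view.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    title: str
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive
    url: str


# Hypothetical stand-in for the repository: made-up records keyed by document identifier.
# A real application would send the identifier to PANGAEA and parse its reply instead.
STUB_REPOSITORY: Dict[str, List[Dataset]] = {
    "10.1016/S0377-8398(01)00044-5": [
        Dataset("Illustrative core A", 54.2, -12.5, "https://repository.example.org/a"),
        Dataset("Illustrative core B", 47.9, -20.1, "https://repository.example.org/b"),
    ],
}

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def map_pins(doc_id: str) -> Tuple[List[dict], Optional[BBox]]:
    """Return one push-pin per dataset plus a bounding box for the map view (None if no data)."""
    datasets = STUB_REPOSITORY.get(doc_id, [])
    pins = [{"lat": d.lat, "lon": d.lon, "popup": d.title, "link": d.url} for d in datasets]
    if not datasets:
        return pins, None
    lats = [d.lat for d in datasets]
    lons = [d.lon for d in datasets]
    # Simple min/max box; it does not handle datasets that straddle the 180th meridian.
    return pins, (min(lats), min(lons), max(lats), max(lons))


pins, bbox = map_pins("10.1016/S0377-8398(01)00044-5")
print(len(pins), bbox)  # -> 2 (47.9, -20.1, 54.2, -12.5)
```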

Elsevier Enables Content Mining

CONTENT

Customers may:

- Run extensive searches and use locally loaded content for text mining for their own research.
- Perform extensive mining operations on subscribed content:
  - Structure the input text
  - Derive patterns within the structured text
  - Evaluate and interpret the output
- Extract semantic entities from Elsevier content in order to recognize and classify the relations between them.
- Integrate results on a server used for the customer’s own mining system, for access and use by its researchers through the customer’s internal secure network.
- Enable developers who wish to design and implement applications to analyse our content, or to test applications within Elsevier content as part of their research.
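The mining steps listed above (structure the text, derive patterns, evaluate the output), together with entity extraction and relation classification, can be illustrated with deliberately simple stand-ins: dictionary lookup for entity recognition and sentence-level co-occurrence for candidate relations. This is a generic sketch, not Elsevier's tooling; the entity dictionary, sample text, and gold set are hypothetical placeholders.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical entity dictionary (term -> entity type). Real systems use curated
# vocabularies or trained recognisers; these names are placeholders.
ENTITY_TYPES: Dict[str, str] = {
    "compound-a": "Compound",
    "compound-b": "Compound",
    "protein-x": "Protein",
    "protein-y": "Protein",
}

Pair = Tuple[str, str]


def structure(text: str) -> List[List[str]]:
    """Step 1: split raw text into sentences, each a list of lower-cased tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", s.lower()) for s in sentences if s]


def entities(tokens: List[str]) -> List[str]:
    """Recognise dictionary entities in one sentence, without duplicates."""
    return list(dict.fromkeys(t for t in tokens if t in ENTITY_TYPES))


def derive_patterns(sentences: List[List[str]]) -> Counter:
    """Step 2: count entity pairs that co-occur in a sentence (candidate relations)."""
    pairs: Counter = Counter()
    for tokens in sentences:
        for pair in combinations(sorted(entities(tokens)), 2):
            pairs[pair] += 1
    return pairs


def classify(pair: Pair) -> str:
    """Label a candidate relation by the types of its two entities, e.g. 'Compound-Protein'."""
    return "-".join(sorted(ENTITY_TYPES[e] for e in pair))


def evaluate(predicted: Iterable[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Step 3: precision and recall of predicted pairs against a hand-made gold set."""
    predicted = set(predicted)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


TEXT = ("Compound-A binds Protein-X. Compound-B binds Protein-X and Protein-Y. "
        "Compound-A was compared with Compound-B.")
GOLD = {("compound-a", "protein-x"), ("compound-b", "protein-x"), ("compound-b", "protein-y")}

patterns = derive_patterns(structure(TEXT))
print(evaluate(patterns, GOLD))  # all co-occurrences -> (0.6, 1.0)
binding = [p for p in patterns if classify(p) == "Compound-Protein"]
print(evaluate(binding, GOLD))   # type-filtered -> (1.0, 1.0)
```

Co-occurrence over-generates (here it also pairs the two compounds and the two proteins); classifying pairs by entity type is one cheap way to raise precision before the output is interpreted by a researcher.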


Our Content Mining Solution Suite

CONTENT | DELIVERY | SEARCH & WORKFLOW SOLUTIONS | ANALYSIS


Current initiative overview

- Supplementary Material
- Linking to Data Repositories
- Presentation via Article of the Future
- Discovery and Use via SciVerse Applications
- Empower scientists to mine content and use it locally

***************************

- Data store (600 terabytes at present)
- Executable papers
- Workflow tools
- Etc.


Conclusions: some thoughts for the future

RESEARCHERS | FUNDERS | PUBLISHERS | INSTITUTIONS

Need for aligned strategies and policies, sustainable business models, and concerted collaboration