What is data and why should you care?

Dr. Kalpana Shankar
School of Information and Library Studies, UCD
5 November 2012

What do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

What is research data?

“The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data.”

Australian National Data Service

Examples:

Statistics and measurements
Results of experiments or simulations
Observations, e.g. fieldwork
Survey results, print or online
Interview recordings and transcripts
Images from cameras and scientific equipment

What is ‘data’?

Any information you use in your research

“PhD students lose material all the time…and they are exactly the people who want to be backing up. These are people who are creating data which are life and death important to them.”

Why are we talking about data management?

“The whole thing is incredibly dull.”

Rising volume and complexity of research data

According to the European Bioinformatics Institute, the volume of new biological data is doubling every 5 months

For example, in genomics:
o we can now analyse the equivalent of a human genome every 14 minutes at a cost of $5,000, 400 times quicker than when the draft human genome was first published in 2000
o 1,000 Genomes Project: 200 terabytes, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs
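To get a feel for what “doubling every 5 months” means, this short Python sketch turns a doubling time into a growth factor. It assumes the doubling rate stays constant, which is a simplification for illustration only; the function name is hypothetical.

```python
def growth_factor(months: float, doubling_months: float = 5.0) -> float:
    """Return how many times larger a quantity is after `months`,
    assuming it doubles every `doubling_months` months."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    # One, two and five years at the doubling rate quoted on the slide
    for years in (1, 2, 5):
        factor = growth_factor(12 * years)
        print(f"after {years} year(s): about {factor:,.1f} times as much data")
```

Under that assumption, one year of growth is a factor of 2^(12/5), roughly 5.3, and five years is 2^12 = 4,096.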

A hard drive after 6 years’ research

113 GB
42,699 Files
3,466 Folders

Image by Lindsay Lloyd-Smith
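You can produce the same kind of summary for your own drive. The Python sketch below is one way to do it for a single folder tree; the function name `inventory` is hypothetical, the root folder itself is not counted, and symbolic links are skipped.

```python
import os


def inventory(root: str) -> tuple[int, int, int]:
    """Walk `root` and return (total_bytes, file_count, folder_count).

    The root folder itself is not counted, and symbolic links are skipped
    so that linked files and folders are not counted twice.
    """
    total_bytes = files = folders = 0
    # os.walk does not descend into linked folders by default
    for dirpath, dirnames, filenames in os.walk(root):
        folders += sum(
            1 for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
        )
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total_bytes += os.path.getsize(path)
                files += 1
            except OSError:
                pass  # file vanished or is unreadable; leave it out of the totals
    return total_bytes, files, folders


def summary(root: str) -> str:
    """Format the inventory like the slide: size in GB, files, folders."""
    size, n_files, n_folders = inventory(root)
    return f"{size / 1e9:.1f} GB, {n_files:,} files, {n_folders:,} folders"
```

For example, `print(summary("/path/to/research"))` (the path is a placeholder). The size is in decimal gigabytes (10^9 bytes); tools that count 2^30 bytes per gigabyte will report a slightly smaller figure.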

So, why is data management important for research?

It is increasingly integral to all areas of research
It is a rapidly escalating issue
It is important to research funders
o increased follow-up is likely in the future
It has major resource implications
o which need to be planned for carefully
In short, it creates major challenges which aren’t going to go away!

“Fire” by andrewmalone via flickr: http://www.flickr.com/photos/andrewmalone/2032844649/

What would happen to your data if there was a fire or theft in your office, department or home?

Why data management is important to YOU (II)

Writing a Data Management Plan

1. Formalises the definition of your research data
2. Documents the contextual and technical details of your data
3. Checks your file structure and naming
4. Plans for data sharing, access, and archiving


Your Data Management Plan won’t be perfect

It is not a static document
Change and update it as your research progresses and you understand more about your data
Think about key issues that might affect your data…
o …while you work on them
o …in the future
It’s better to have a plan that covers some aspects than no plan at all
Ask for advice if you’re uncertain

Getting started

Questions to ask yourself

Platform: Windows, Macintosh and/or Unix?
Objective: Store? Manage? Share? Publish?
Extent of collaboration
o Your research group/lab only
o Your group + externals
o Cast of thousands?
Nature of data?
o Level of security?
o Human records (de-identified)?
o Intellectual property?
Amount of data? MB? GB? TB?
Rate of accumulation of data?
How much needs to be online to do useful work?
Period of preservation?
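The questions about amount, rate of accumulation and period of preservation can be combined into a rough storage estimate. This Python sketch assumes data accumulates at a steady rate; the function name and the `copies` parameter (for keeping more than one copy) are hypothetical.

```python
def storage_needed_gb(current_gb: float, gb_per_month: float,
                      months: int, copies: int = 1) -> float:
    """Estimate total storage (GB) after `months` of steady accumulation,
    multiplied by the number of copies you intend to keep."""
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies >= 1")
    return (current_gb + gb_per_month * months) * copies


# Example: 100 GB today, 10 GB more each month, a 3-year project, 3 copies
print(storage_needed_gb(100, 10, 36, copies=3))  # prints 1380
```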

Give your data a structure…

By Anne (Flickr ID: I like): “Voltaire & Rousseau” http://www.flickr.com/photos/ilike/2616342739/ CC BY-NC-ND 2.0

By twechy (Flickr ID): “Library Bookshelf” http://www.flickr.com/photos/twechy/6829994084/ CC BY 2.0

…it makes it easier to find things

Something to try:

Use Post-it notes to create a map of your file structure:
o Write each existing file and folder name onto a Post-it
o Arrange folders on your desk in a sensible hierarchy
o Put your ‘files’ into ‘folders’
o Do you need new folders? Do you have too many?
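If you would rather start from what is already on disk, a short script can list your current folder hierarchy to compare with the map on your desk. This is a minimal Python sketch; `tree_lines` is a hypothetical helper, folders are listed before files, and linked folders are shown but not opened.

```python
from pathlib import Path


def tree_lines(root: Path, prefix: str = "") -> list[str]:
    """Return the contents of `root` as indented tree lines,
    folders first, each group sorted case-insensitively."""
    lines = []
    entries = sorted(root.iterdir(), key=lambda p: (not p.is_dir(), p.name.lower()))
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        connector = "└── " if last else "├── "
        lines.append(prefix + connector + entry.name + ("/" if entry.is_dir() else ""))
        if entry.is_dir() and not entry.is_symlink():
            # Continue the vertical guide line unless this was the last entry
            lines.extend(tree_lines(entry, prefix + ("    " if last else "│   ")))
    return lines
```

For example, `print("\n".join(tree_lines(Path("my_project"))))`, where the folder name is a placeholder.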

What’s in a name?

Names tell us what a file is (contextual information)
Use a combination of different types of information to make context and content clear, e.g.:
o Author (or initials)
o Date
o Data source
o Theme
o Experiment
o Sample
…But try not to let file names get too long
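One way to apply this consistently is to build names from the same components every time. The Python sketch below assumes a hypothetical convention: date as YYYYMMDD (so names sort chronologically), initials, data source, experiment and sample, separated by underscores. The 60-character limit is an arbitrary stand-in for “not too long”.

```python
import re
from datetime import date

MAX_LEN = 60  # hypothetical limit chosen for illustration


def _clean(part: str) -> str:
    """Lower-case a name component, turn spaces into hyphens,
    and drop anything that is not a letter, digit or hyphen."""
    part = re.sub(r"\s+", "-", part.strip().lower())
    return re.sub(r"[^a-z0-9-]", "", part)


def make_filename(initials: str, when: date, source: str,
                  experiment: str, sample: str, ext: str) -> str:
    """Join the cleaned components with underscores and add the extension."""
    parts = [when.strftime("%Y%m%d"), initials, source, experiment, sample]
    cleaned = [_clean(p) for p in parts]
    stem = "_".join(c for c in cleaned if c)  # skip empty components
    name = f"{stem}.{ext.lstrip('.').lower()}"
    if len(name) > MAX_LEN:
        raise ValueError(f"file name is {len(name)} characters; shorten a component")
    return name
```

For example, `make_filename("KS", date(2012, 11, 5), "field survey", "exp2", "s01", "csv")` returns `20121105_ks_field-survey_exp2_s01.csv`.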



Why create documentation?

Creating documentation might seem like a waste of time
Good documentation will include a lot of information that might seem obvious

www.flickr.com/photos/smutjespickles/2434418686/
/

Document your data as you go

If you don’t, it may become impossible for you (or someone else) to understand and re-use the data later on

Question Mark Sign by Colin_K on flickr: http://www.flickr.com/photos/colinkinner/2200500024/
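A low-effort way to document as you go is to keep a running notes file alongside the data. This Python sketch assumes a plain-text README.txt in each data folder; the file name and the `log_note` helper are hypothetical, not a standard.

```python
from datetime import datetime
from pathlib import Path


def log_note(folder: Path, note: str, author: str) -> Path:
    """Append a timestamped note to README.txt in `folder`.

    The file is created if it does not exist yet; existing notes are kept.
    """
    readme = folder / "README.txt"
    stamp = datetime.now().isoformat(timespec="minutes")
    with readme.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}  {author}: {note}\n")
    return readme
```

For example, `log_note(Path("interviews"), "Transcribed interview 3; speaker names removed", "KS")` adds one dated line to `interviews/README.txt` (the folder and note are placeholders).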

What’s obvious now might not be in a few months, years, decades…

Image: http://www.flickr.com/photos/archer10/5692813531/

MAKE SURE YOU CAN UNDERSTAND IT LATER

Make research material understandable

Make research reproducible
Detailing your methodology helps people understand your research better
Explaining your algorithms, search methods etc. makes your work reproducible
Conclusions can be verified

Image by woodleywonderworks on flickr: http://www.flickr.com/photos/wwworks/4588700881/


Make material reusable
Material may be re-used by someone in a different discipline
Provide context to minimise the risk of it being misunderstood/misused

Backing up

Lots Of Copies Keep Stuff Safe (LOCKSS): make multiple back-ups
Keep back-ups in a separate place from the original
Use different types of storage media, e.g. CDs, pen drives, networked storage, external hard drive

From: “Copy Copy Copy” by David Goehring (CarbonNYC) via flickr
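Making copies is only half the job; it is also worth checking that each copy matches the original. The Python sketch below copies one file into several backup folders and compares SHA-256 checksums. The folder paths and the `back_up` helper are hypothetical; backing up whole folders or on a schedule would need more than this.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder (created if missing)
    and report whether each copy's checksum matches the original."""
    original = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, dest_dir / source.name))  # keeps timestamps
        results[copy] = sha256_of(copy) == original
    return results
```

For example, `back_up(Path("data.csv"), [Path("/media/usb/backup"), Path("/mnt/network/backup")])`, where both paths are placeholders for a pen drive and networked storage.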
via
flickr

For everything you keep…

Make sure you can:
find it again later
understand it later

Where to get help

The Earth Institute will be putting up links on its website
Your supervisor
Library
Funding agencies

Oh yes… what do Apollo, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

Questions?

My contact information:
Kalpana Shankar (kalpana.shankar@ucd.ie)