Open data and data curation

Oct 21, 2013

Hamish James

Statistics New Zealand

Outline

1. Setting the scene

2. Open data

3. How open data and data curation are related

[Diagram labels: information; structured, unstructured; digital, analogue]

Quick definitions

data

open data

data curation

Defining data

Data consists of sets of structured values that can be organised, analysed and manipulated by a software application or some other means of calculation. This includes data collected directly through surveys and administrative systems, as well as data created or compiled by aggregating or reanalysing other sources. A defining characteristic of data is that it is machine-readable.
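As a rough illustration of "structured values that can be organised, analysed and manipulated by a software application", the sketch below parses a small table and aggregates it. The column names and figures are invented for the example and do not come from any real survey.

```python
import csv
import io

# Hypothetical survey extract: region and household count (values invented).
RAW = """region,households
North,120
South,80
North,30
"""

def total_by_region(text):
    """Aggregate the structured values by region."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        region = row["region"]
        totals[region] = totals.get(region, 0) + int(row["households"])
    return totals

print(total_by_region(RAW))
```

Because the values are structured, the software needs no human help to organise them. The same numbers in a scanned page image would not be machine-readable in this sense.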

Open data, data curation


Open data is a philosophy based on the idea that data is more valuable if more people can use it, and that technology has made the cost of sharing data negligible

Data curation is a field of research and work focusing on the long-term management of data, built on the argument that the opportunity cost of losing data is high

Open data highlights benefits

Data curation worries about costs


data

knowledge

value

Focus of open data activities


Data collected and held by governments


Data collected or generated through publicly funded research

http://wiki.opengovdata.org/index.php?title=OpenDataPrinciples



Reasons to make data open


The underlying purposes of making publicly funded data more accessible are to:

inform decision making by government, businesses and communities

increase transparency and accountability in government decision making

assist informed participation by the public in government decision making

promote economic development through the innovative application of data collected for one purpose to other tasks

gain greater value from research data



Barriers to reuse of government data


Agency culture (reluctance or hostility to data sharing)


Funding constraints


Ensuring data confidentiality


Shared ownership


Poor dissemination practices

Open Government Data Principles


Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely. Data is made available as quickly as necessary to preserve the value of the data.

4. Accessible. Data is available to the widest range of users for the widest range of purposes.

5. Machine processable. Data is reasonably structured to allow automated processing.

6. Non-discriminatory. Data is available to anyone, with no requirement of registration.

7. Non-proprietary. Data is available in a format over which no entity has exclusive control.

8. License-free. Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
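One way to put the eight principles to work is as a self-assessment checklist. The sketch below is only an illustration: the principle names follow the list above, while the dictionary keys, the function name and the sample dataset are hypothetical.

```python
# Principle names follow the list above; the keys, function name and
# sample assessment are hypothetical.
PRINCIPLES = [
    "complete", "primary", "timely", "accessible",
    "machine_processable", "non_discriminatory",
    "non_proprietary", "license_free",
]

def unmet_principles(assessment):
    """Return the principles not marked True in a self-assessment dict."""
    return [p for p in PRINCIPLES if not assessment.get(p, False)]

# Invented example: aggregate tables in a scanned report, behind a sign-up form.
sample_dataset = {
    "complete": True,
    "primary": False,              # only aggregate tables are published
    "timely": True,
    "accessible": True,
    "machine_processable": False,  # tables are embedded in a scanned report
    "non_discriminatory": False,   # registration is required
    "non_proprietary": True,
    "license_free": True,
}

print(unmet_principles(sample_dataset))
```

A principle missing from the assessment is treated as unmet, which errs on the side of flagging gaps.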



Characteristics of open data

Open data:


Free and open access to the data

Freedom to redistribute the data

Freedom to reuse the data

No restriction of the above based on who someone is (e.g. their nationality) or their field of endeavour (e.g. commercial or non-commercial)

c.f. http://www.okfn.org/about/


Creative Commons

Attribution

Share-alike

No derivative works

Non-commercial

Creative Commons licence conditions

Linked data


Linked data uses semantic web approaches (especially RDF) to describe data and make it accessible to machines


a web of linked data


RDF ‘triples’ are used to describe things

Subject: Hamish

predicate: is a

object: presenter
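The triple above can be sketched in a few lines of plain Python. This is a simplified illustration, not real RDF: actual RDF identifies subjects and predicates with URIs and is normally handled by a dedicated library. The first triple is the slide's example, the other two are added here to show how triples link together, and the `match` helper is hypothetical.

```python
# A triple is (subject, predicate, object). The first triple is the slide's
# example; the other two are added to show how triples link together.
triples = [
    ("Hamish", "is a", "presenter"),
    ("Hamish", "works for", "Statistics New Zealand"),
    ("Statistics New Zealand", "is a", "government agency"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

print(match(triples, subject="Hamish"))
print(match(triples, predicate="is a"))
```

The object of one triple can be the subject of another, which is what turns separate statements into a web of linked data.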

Linking Open Data dataset cloud

What is missing?

Data needs context

Examples


“Which town or city in the UK has the highest proportion of students?”

“Which town or city in the UK is home to one or more university campuses whose registered full or part time (non-distance) students divided by the local population gives the largest percentage?”

http://digitalcuration.blogspot.com/2010/03/linked-data-and-reality.html
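The gap between the two questions can be made concrete with a small sketch. All town names and figures below are invented for illustration and are not real statistics. The point is only that the loose and the precise reading of "proportion of students" can rank the same places differently.

```python
# All figures are invented for illustration; they are not real statistics.
towns = {
    "Town A": {"population": 100_000, "campus_students": 20_000, "distance_students": 15_000},
    "Town B": {"population": 50_000, "campus_students": 12_000, "distance_students": 0},
}

def share_all_students(town):
    """Loose reading: every registered student counts."""
    return (town["campus_students"] + town["distance_students"]) / town["population"]

def share_campus_students(town):
    """Precise reading: only non-distance students count."""
    return town["campus_students"] / town["population"]

def top_town(data, measure):
    """Name of the town with the highest value of the given measure."""
    return max(data, key=lambda name: measure(data[name]))

print(top_town(towns, share_all_students))     # Town A (0.35 vs 0.24)
print(top_town(towns, share_campus_students))  # Town B (0.20 vs 0.24)
```

Without the context that says which definition a dataset uses, either answer could be reported as "the" answer.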

render

explain

re/use

Documentation:


Standards


Meaning


Interpretation

Technology:


Hardware


Formats


Software

data

knowledge

value

Technology to render data

Documentation to explain

What is missing? Context


Data is not self-describing

Who provides the description?

What does it cost to provide the description?

How much of the description is held as tacit knowledge?


Expert’s personal knowledge


Rules and meaning encoded into the data and software
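A minimal sketch of why data is not self-describing: a coded record means little until a codebook supplies the description. The variable names, codes and labels below are hypothetical and invented for illustration.

```python
# Hypothetical coded record and codebook; variable names, codes and labels
# are invented for illustration.
record = {"sex": 2, "lfs": 1, "inc": 4}

codebook = {
    "sex": {"label": "Sex", "codes": {1: "Male", 2: "Female"}},
    "lfs": {"label": "Labour force status",
            "codes": {1: "Employed", 2: "Unemployed", 3: "Not in the labour force"}},
    "inc": {"label": "Income band", "codes": {4: "$30,001 to $40,000"}},
}

def describe(rec, book):
    """Translate coded values into labelled, human-readable text."""
    described = {}
    for variable, value in rec.items():
        entry = book.get(variable)
        if entry is None:
            described[variable] = value  # no documentation: the value stays opaque
        else:
            described[entry["label"]] = entry["codes"].get(value, value)
    return described

print(describe(record, codebook))
```

If the codebook exists only in an expert's head, or only as rules buried in software, the record on its own stays a set of unexplained numbers.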

Data curation

Data curation involves:

Data management

Adding value to data

Data sharing for re-use

Data preservation for later re-use

http://www.dcc.ac.uk/news/what-makes-data-curation



= open data

= data curation

Digital Curation Centre

DDI Alliance

Open data brings benefits and risks

open data

more users

highlights data curation failures

justifies data curation costs

pressure for more user support

expands expert community

increases risk of poor analysis

Complementary ideas


Actively curated data will:


Remain technologically accessible


Be easier to understand (and therefore use)


Data curation will benefit from data being made more open:


Data that is in active use tends to remain usable


Widely used data is better understood than isolated data



Thank you

Hamish James

Manager, Information Management

hamish.james@stats.govt.nz

04 931 4237