Linked Open Data: a new resource for eResearch

Oct 21, 2013

Dr Anne Cregan

eResearch Analyst, Intersect and ANDS

anne.cregan@intersect.org.au

What this talk will cover


Open data


The web of data


RDF triples


RDF graphs


The Linked Open Data project


Publishing to the web of data


Consuming the web of data

Open data


The philosophy and practice of
making data freely available to
everyone, without restrictions from
copyright, patents or other
mechanisms of control.

Why make data open?


Public money was used to fund the
work, so it should be available to the
public.


Facts cannot legally be copyrighted.


Sponsors of research do not get full
value for money unless the resulting
data are made freely available.


In scientific research, the rate of
discovery is accelerated by better
access to data.


Source: How to Make the Dream Come True: The Astronomers' Data Manifesto (Norris, 2007)

How to make open data useful…


Principles


Make it easy to find


Make it available to everyone


Separate it from the applications that
use it


Interlink it with related datasets in a
meaningful way


Make it machine-processable


The web of data


The web of data = a naming model + a data model, on the web


It's a web of interlinked data that machines can read (whereas the web is a web of interlinked documents for people to read)


Also known as the “Semantic Web” because
of its formal semantics for reasoning and its
relationship to meaning


The web of data


It is an initiative of the World Wide Web Consortium (W3C) and is a collaborative effort of many parties


It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.


Like the web, anyone can publish to it: anyone can say anything about anything.

However, they need to say it in RDF, not HTML.

And anything they want to talk about has to be named by a URI.



URI = Uniform Resource Identifier


The naming model for the web of data


A URI is a unique name that identifies a resource


A resource is anything to which we can attach identity


A resource can be an information object, like a document or a webpage, but it can also be a real-world object, like a person. It can be anything at all. For example:

URL: http://www.w3.org/People/Berners-Lee/

URL: http://www.w3.org/TR/rdf-concepts/


A URL is a kind of URI that names the resource and also indicates a means of acting upon or obtaining it via its primary access mechanism (e.g. HTTP, FTP)
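To make the URI/URL distinction concrete, here is a minimal sketch using Python's standard `urllib.parse`. It splits the two example URLs into scheme, host and path; the scheme (`http`) is the access mechanism. The third identifier, a URN taken from the examples in RFC 3986, is added for contrast: it is a URI that names something without saying how to fetch it. The `is_locator` rule is a deliberately crude assumption, not a standard test.

```python
from urllib.parse import urlparse

# The two URLs from the slide, plus a URN (a URI that is not a URL) for contrast.
identifiers = [
    "http://www.w3.org/People/Berners-Lee/",
    "http://www.w3.org/TR/rdf-concepts/",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]

def describe(uri: str) -> dict:
    """Split a URI into its scheme, authority (host) and path."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}

def is_locator(uri: str) -> bool:
    """Crude heuristic (an assumption of this sketch): treat http, https and
    ftp URIs as URLs, i.e. identifiers that also say how to obtain the resource."""
    return urlparse(uri).scheme in {"http", "https", "ftp"}

for uri in identifiers:
    print(describe(uri), "URL" if is_locator(uri) else "URI only")
```

Both example URLs share the scheme `http` and the host `www.w3.org`; only their paths differ, while the URN has a scheme but no host at all.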

RDF = Resource Description Framework


A framework for describing and linking resources on the web


Allows URIs to be connected into a directed graph


Based on the idea of triples: Subject, Predicate, Object

Example triple:

Subject: intersect.org.au/intersect-team/AnneCregan
Predicate: doac:organization
Object: intersect.org.au
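A triple maps naturally onto a three-field record. This Python sketch models the example triple as a `NamedTuple`. The identifiers are kept exactly as they appear on the slide (host-relative, with the `doac:` prefix unexpanded), so they are illustrative labels rather than fully qualified URIs, and the `Triple` class is a hypothetical helper, not part of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str

    def __str__(self) -> str:
        # Space-separated and terminated by " .", in the spirit of N-Triples
        return f"{self.subject} {self.predicate} {self.obj} ."

anne_at_intersect = Triple(
    subject="intersect.org.au/intersect-team/AnneCregan",
    predicate="doac:organization",
    obj="intersect.org.au",
)

print(anne_at_intersect)
# intersect.org.au/intersect-team/AnneCregan doac:organization intersect.org.au .
```

Because it is a tuple, a triple can also be unpacked directly: `s, p, o = anne_at_intersect`.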

Putting triples together creates a graph. A second triple with the same subject:

Subject: intersect.org.au/intersect-team/AnneCregan
Predicate: doac:organization
Object: ands.org.au
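The sketch below shows why putting triples together yields a graph: because both `doac:organization` triples use the identical subject string, they attach to the same node. It is a minimal Python illustration that indexes plain `(subject, predicate, object)` tuples into an adjacency list; `build_graph` is a hypothetical helper name.

```python
from collections import defaultdict

# The two triples from the slides, as plain (subject, predicate, object) tuples.
triples = [
    ("intersect.org.au/intersect-team/AnneCregan", "doac:organization", "intersect.org.au"),
    ("intersect.org.au/intersect-team/AnneCregan", "doac:organization", "ands.org.au"),
]

def build_graph(statements):
    """Index triples as a directed, edge-labelled graph: node -> [(label, target)]."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))  # edge from subject to object, labelled by predicate
    return graph

graph = build_graph(triples)
for predicate, target in graph["intersect.org.au/intersect-team/AnneCregan"]:
    print(predicate, "->", target)
# doac:organization -> intersect.org.au
# doac:organization -> ands.org.au
```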

Nodes of the graph are URIs and literals, e.g.

Subject: intersect.org.au/intersect-team/AnneCregan
Predicate: foaf:firstName
Object: “Anne” (a literal)
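The difference between URI nodes and literal nodes shows up clearly when a triple is written out in the W3C N-Triples syntax: URIs go in angle brackets, literals in double quotes. This is a minimal Python sketch. Two things are assumptions: the slide's host-relative name is taken to be an `http://` URI, and `UriNode`/`LiteralNode` are hypothetical helper classes, not a library API. The FOAF namespace `http://xmlns.com/foaf/0.1/` is the real one.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class UriNode:
    value: str

@dataclass(frozen=True)
class LiteralNode:
    value: str

Node = Union[UriNode, LiteralNode]

FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace

def term_to_nt(term: Node) -> str:
    """Serialise one node in N-Triples syntax: URIs in <>, literals in quotes."""
    if isinstance(term, UriNode):
        return f"<{term.value}>"
    # Escape backslash first, then quotes and line breaks
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def triple_to_nt(s: UriNode, p: UriNode, o: Node) -> str:
    # In this sketch subjects and predicates are URIs; only the object may be a literal.
    return f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ."

# Assumption: the slide's host-relative name is taken to be an http:// URI.
anne = UriNode("http://intersect.org.au/intersect-team/AnneCregan")
print(triple_to_nt(anne, UriNode(FOAF + "firstName"), LiteralNode("Anne")))
# <http://intersect.org.au/intersect-team/AnneCregan> <http://xmlns.com/foaf/0.1/firstName> "Anne" .
```

A literal is a dead end in the graph: it has a value but no outgoing edges, whereas a URI node can itself be the subject of further triples.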

Has a schema to describe relationships between things, called RDF Schema
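One practical payoff of RDF Schema is that new triples can be inferred from declared class relationships. The Python sketch below applies two of the RDFS entailment rules from the W3C RDF Semantics specification (rdfs11: `rdfs:subClassOf` is transitive; rdfs9: instances of a subclass are instances of its superclass) until no new triples appear. The `ex:` classes and individual are hypothetical, and a real RDFS reasoner implements many more rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples: set) -> set:
    """Apply two RDFS entailment rules until nothing new is inferred:
    - rdfs11: if A subClassOf B and B subClassOf C, then A subClassOf C
    - rdfs9:  if x rdf:type A and A subClassOf B, then x rdf:type B
    """
    inferred = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for x, p, a in inferred:
            if p == RDF_TYPE:
                for c, b in sub:
                    if a == c:
                        new.add((x, RDF_TYPE, b))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new

# Hypothetical schema and data, written with prefixed names for brevity.
data = {
    ("ex:Analyst", SUBCLASS, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "foaf:Person"),
    ("ex:AnneCregan", RDF_TYPE, "ex:Analyst"),
}
for t in sorted(rdfs_closure(data) - data):
    print(t)
```

Nobody stated that `ex:AnneCregan` is a `foaf:Person`; that triple follows from the schema.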

Is a World Wide Web Consortium (W3C) Recommendation


Is part of the Semantic Web “stack”

Semantic Web Technology Stack


The Semantic Web standards build on each other


URI is the naming mechanism


RDF, RDF Schema and OWL are the languages for describing resources and relationships between them


SPARQL is a query language for querying RDF graphs

RDF Graphs


Putting triples together creates a directed
graph


RDF Graphs


Graphs can be interconnected by referring
to URIs in other graphs
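Interconnection works because a URI means the same thing wherever it appears. In this minimal Python sketch, two hypothetical graphs (all `example.org` URIs are invented) are published separately; graph B mentions a URI minted in graph A, so merging them by set union lets a query walk from one dataset into the other. Merging by plain union is only safe when there are no blank nodes, which this sketch assumes.

```python
# Hypothetical URIs: two graphs published by different parties.
DRUG = "http://example.org/datasetA/drug/42"
TRIAL = "http://example.org/datasetB/trial/7"
NAME = "http://example.org/vocab/name"
TESTS = "http://example.org/vocab/tests"

# Graph A describes a drug; graph B describes a trial and points at A's drug URI.
graph_a = {(DRUG, NAME, "Example drug")}
graph_b = {(TRIAL, TESTS, DRUG)}

def nodes(graph: set) -> set:
    """All subjects and objects in a graph (URIs and literals alike)."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def merge(*graphs: set) -> set:
    """Merge graphs by set union. This is only safe without blank nodes,
    which would need renaming apart; the sketch assumes there are none."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

shared = nodes(graph_a) & nodes(graph_b)   # the URI that links the two graphs
merged = merge(graph_a, graph_b)

# Follow the link: trial -> tested drug -> that drug's name (from the other graph).
names = [n for _, p, d in merged if p == TESTS
         for d2, p2, n in merged if d2 == d and p2 == NAME]
print(shared, names)
```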


Linking Open Data Project


Community project of the W3C Semantic Web Education and Outreach (SWEO) Interest Group


Started in 2007


Has grown rapidly as members of the community add open datasets


Has created the largest existing RDF graph: over 18 billion triples!

Linking Open Data Project

(Figures: the Linking Open Data cloud diagram as at October 2007, September 2008, July 2009 and April 2010)

Linking Open Data Project


As at May 2009, the project had created a linked open data cloud of 4.7 billion RDF triples; in April 2010, Linked Open Numbers added another 14 billion triples


Datasets include:


DBpedia: a linked data version of Wikipedia


US Census: the 2000 US Census data set


Gene Ontology: annotations from the Gene Ontology database


DrugBank: information about FDA-approved drugs


UniProt: a life sciences data set


Lots of bio/life sciences data sets: the Bio2RDF cloud


More info at http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets





Publishing to the Linked Open Data Cloud


Principles

1. Use URIs to name things

2. Use HTTP URIs so that you can look up those things on the web

3. When someone looks up a URI, provide useful information (make it “dereferenceable”)

4. Include RDF statements that link to other URIs, so that they can discover related things

These principles are from Tim Berners-Lee's 2006 note: http://www.w3.org/DesignIssues/LinkedData.html
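Principle 3 is often implemented with HTTP content negotiation: the same URI returns RDF to a machine that asks for it and HTML to a browser. Below is a minimal standard-library Python sketch of the server side. The resource path, the Turtle and HTML bodies, and the port are hypothetical. The negotiation ignores q-values. It also skips the 303-redirect and hash-URI conventions that Linked Data publishers commonly use to distinguish a thing from the document describing it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data served for one resource, in two representations.
TURTLE = (
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    '<http://localhost:8000/person/anne> foaf:firstName "Anne" .\n'
)
HTML = "<html><body><p>First name: Anne</p></body></html>\n"

def choose_representation(accept: str) -> tuple:
    """Very simplified content negotiation: ignore q-values and return
    Turtle if the client lists text/turtle at all, otherwise HTML."""
    offered = [part.split(";")[0].strip() for part in accept.split(",")]
    if "text/turtle" in offered:
        return "text/turtle", TURTLE
    return "text/html", HTML

class LinkedDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/person/anne":
            self.send_error(404)
            return
        content_type, body = choose_representation(self.headers.get("Accept", ""))
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type + "; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8000) -> None:
    """Run the toy server until interrupted."""
    HTTPServer(("localhost", port), LinkedDataHandler).serve_forever()
```

To try it, call `serve()` and then request the same URI twice, once with `curl -H "Accept: text/turtle" http://localhost:8000/person/anne` and once from a browser.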




Consuming linked open data


Browsing linked data is easy


You need an RDF browser such as Tabulator, Disco, Zitgist, Marbles or OpenLink


Let’s go for a ride on Disco: http://www4.wiwiss.fu-berlin.de/rdf_browser/

Start here: http://www.w3.org/People/Berners-Lee/card#i


We can travel through the linked open data cloud, moving between URIs that are linked using RDF


Marbles is available at http://www5.wiwiss.fu-berlin.de/marbles
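What an RDF browser does as you click from URI to URI is sometimes called "following your nose": look up a URI, read the triples it returns, then look up the URIs those triples point to. This Python sketch mimics that crawl breadth-first. It makes two assumptions. An in-memory dictionary stands in for HTTP dereferencing, and all the `example.org` URIs are hypothetical. It also treats any value starting with `http://` as a URI, which is a simplification.

```python
from collections import deque

EX = "http://example.org/people/"  # hypothetical namespace

# Stand-in for the web: each URI "dereferences" to the triples describing it.
WEB = {
    EX + "alice": [(EX + "alice", "foaf:knows", EX + "bob")],
    EX + "bob": [(EX + "bob", "foaf:knows", EX + "carol"), (EX + "bob", "foaf:name", "Bob")],
    EX + "carol": [(EX + "carol", "foaf:knows", EX + "alice")],
}

def dereference(uri):
    """Pretend HTTP GET: return the RDF description of a URI (empty if unknown)."""
    return WEB.get(uri, [])

def is_uri(value):
    # Simplification: anything starting with http:// is treated as a URI, not a literal.
    return value.startswith("http://")

def follow_your_nose(start, max_hops=2):
    """Breadth-first crawl: look up a URI, then each URI its triples link to."""
    seen = {start}
    queue = deque([(start, 0)])
    collected = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            if is_uri(o) and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return seen, collected
```

With `follow_your_nose(EX + "alice")` the crawl reaches Alice, Bob and Carol. The `max_hops` limit matters on the real web, where the graph is effectively unbounded.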



Consuming linked open data


eResearch example: enabling drug discovery


Data sets published to the data cloud:


Linked CT: Linked Clinical Trials; 60,000 trials in 158 countries


DrugBank: FDA-approved drugs; 5,000 small molecule and biotech drugs


Diseasome: disorders and disease genes; 4,300 disorders, disease genes and associations


DailyMed: chemical structures of marketed drugs; 124,000 triples and 29,600 links


SWAN: Alzheimer’s hypothesis browser and knowledgebase


Consuming linked open data

Using an RDF browser:


See all drugs in trials for Alzheimer’s disease in Linked CT, including a Phase III trial for Varenicline.


Follow a link to data from DailyMed showing that Varenicline is already on the market for nicotine addiction. The typical dose is 1 mg twice daily, and the Linked CT trial used no higher dose than that, so there are no new safety issues.


Follow a link to DrugBank to find that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist.


Diseasome indicates that the corresponding genes are important only in nicotine addiction, not in Alzheimer’s.


But the SWAN Knowledgebase shows there are hypotheses relating Alzheimer’s to nicotinic receptors through amyloid beta.





Consuming linked open data

Using the linked open data cloud with an RDF browser, we are able to:


Browse data relating to companies, clinical trials, drugs, diseases and genetic variation


See when extra data is available


Gain access to data without needing to map identifiers and synonyms: the interlinking has already been done


Gain additional insights about interesting questions to ask

Jentzsch et al., “Enabling Tailored Therapeutics with Linked Data”

events.linkeddata.org/ldow2009/papers/ldow2009_paper9.pdf





Consuming linked open data


Querying using SPARQL Queries


A SPARQL endpoint enables users
(human or other) to query a knowledge
base via the SPARQL language.


Results are typically returned in one or more machine-processable formats.


Examples:

http://wiki.dbpedia.org/OnlineAccess
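Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET with the query text in a `query` parameter. Results in the standard SPARQL JSON format arrive as `head.vars` plus `results.bindings`. The standard-library Python sketch below builds such a request and extracts one variable from a JSON result. The endpoint URL `https://dbpedia.org/sparql` and the resource `http://dbpedia.org/resource/Linked_data` are assumptions chosen for illustration. Only `run()` touches the network, and what it returns depends on the live endpoint.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint; other SPARQL 1.1 endpoints work the same way.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Linked_data> rdfs:label ?label .
} LIMIT 5
"""

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol, query via GET: the query goes in the 'query' parameter,
    and the Accept header asks for the standard JSON results format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def bindings(results_json: str, var: str) -> list:
    """Pull the values bound to one variable out of a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]

def run(endpoint: str = ENDPOINT, query: str = QUERY) -> list:
    """Send the query over the network and return the ?label values."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return bindings(resp.read().decode("utf-8"), "label")
```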



Types of Queries


Selection and extraction queries: retrieve parts of the data based on its content, structure, or position


Reduction queries: specify which parts of the data not to include in the answer


Restructuring queries: restructure data into different formats/serialisations


Aggregation queries: aggregate several data items into one new data item


Combination and inference queries: combine information that is not explicitly connected
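Several of these query types can be imitated over an in-memory list of triples. In SPARQL they would be written with triple patterns, `FILTER`, and `GROUP BY` with `COUNT`. This Python sketch only mimics them to show what each type does. The triples use hypothetical `ex:` names loosely modelled on the drug-discovery example, and `ex:DrugX` is invented. The `match` function does simple triple-pattern matching with `None` as a wildcard.

```python
from collections import Counter

# A few hypothetical triples, loosely modelled on the drug-discovery example.
TRIPLES = [
    ("ex:trial1", "ex:studies", "ex:Alzheimers"),
    ("ex:trial1", "ex:intervention", "ex:Varenicline"),
    ("ex:trial2", "ex:studies", "ex:Alzheimers"),
    ("ex:trial2", "ex:intervention", "ex:DrugX"),
    ("ex:trial3", "ex:studies", "ex:Asthma"),
]

def match(pattern, triples=TRIPLES):
    """Triple-pattern matching: (s, p, o) where None acts as a wildcard."""
    return [t for t in triples
            if all(want is None or want == got for want, got in zip(pattern, t))]

# Selection and extraction: which trials study Alzheimer's?
alz_trials = [s for s, _, _ in match((None, "ex:studies", "ex:Alzheimers"))]

# Reduction: the same data, leaving out every ex:intervention triple.
reduced = [t for t in TRIPLES if t[1] != "ex:intervention"]

# Aggregation: number of trials per condition.
per_condition = Counter(o for _, _, o in match((None, "ex:studies", None)))

# Combination: join two patterns on the shared trial to find drugs tested for Alzheimer's.
alz_drugs = [o for trial in alz_trials
             for _, _, o in match((trial, "ex:intervention", None))]

print(alz_trials, per_condition, alz_drugs)
```

The combination query is the interesting one: no single triple links `ex:Alzheimers` to `ex:Varenicline`. The answer only emerges by joining two patterns on the trial they share.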



Summary


Open data


The web of data


RDF triples


RDF graphs


The Linked Open Data project


Publishing to the web of data


Consuming the web of data

Thank you



More details are at:

http://linkeddata.org/

http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

http://www.w3.org/2001/sw/


Questions and comments may be emailed to anne.cregan@intersect.org.au