Cochrane Linked Data Project: From “Star Trek” to the
Director of Web Development, Cochrane Web Team
Lorne Becker, Cochrane Web Team and Cochrane Innovations
For the attention of
User stories and user research
Further reading and viewing
In May 2011, members of the Web Team, the Cochrane Editorial Unit (CEU), the IMS Team,
Wiley and consultants from Ontoba held the first meeting to explore the use of semantic web
technologies, both to enable more dynamic use of Cochrane content within The Cochrane
Library and to allow Cochrane to connect to the “web of data” and forge potential
partnerships with those working in this new online and technological context. Originally
dubbed the “Star Trek” project because of its futuristic thinking, the project progressed
throughout the remainder of 2011 and into 2012, focusing mainly on showing proof of
concept through an initial set of use cases. Then, from March 2012, the project entered its
second phase and became, officially, the Cochrane Linked Data project.
The original thinking and impetus behind this project grew out of two things: developments
on the web in the area of linking data and injecting “meaning” and structure into content, and
reports from our users, via research done by Wiley and others, that they would like to see
other views of our content. “Thinking outside the container of the Review”, the full-text PDF
presentation that is our current standard, thus became the task. This required “interrogating”
our current RevMan XML structure and looking at how this structure could be improved or
augmented to support doing interesting things with the content, such as providing new ways
to browse and search the content and repackage it for various users in various contexts.
Cochrane Reviews are great, but…
There are problems that limit their use by some people:
Difficult to wade through all of the text
Difficult to understand the figures, terminology, and other bits of the Review
Hard to compare interventions without reading multiple Reviews
Moving from a study in CENTRAL to the Reviews that included that study is difficult
Can be difficult to find the Review you seek
: The basics
The linked data approach makes it possible for a machine (i.e. a computer program) to
query a web page or set of pages and return specific portions of interest to the user. For
example, a semantic web standard called “GoodRelations” uses linked data markup to enrich
search results so that product details can be extracted and presented in search results,
including photos, price, user reviews and ratings and other information that the user can use
to make their purchasing decision. The display of recipes in Google and other search engines
is also being enhanced by linked data markup. For example, if you Google “New York
Cheesecake recipe”, a photo, rating and preview appear with the results.
Machines aren’t good at reading web pages
Data on the web is meant for human consumption
Machines need the data to be structured
Once structured, information can be more easily shared within datasets and across
Fortunately, Cochrane Reviews are structured, but we still need to teach the machines how
to read them, where to find data within them and how the data is related.
The web is moving from a web of documents to a “web of data”. Right now, the links on web
pages are between documents, but the data and content in web pages and in databases are
largely devoid of any “meaning”. The semantic web and linked data are a way to move
toward a web of data that allows for more meaningful connections between things. See the
“Further reading and viewing” section for more info.
Cochrane Semantic Model
The semantic web technology stack uses ontologies (semantic models) to describe a
domain. For example, Cochrane Reviews can be described using OWL, the Web Ontology
Language, and RDFS, RDF Schema, to map the classes and relationships of the various
components. So, a Review includes a number of studies and each study may have, for
example, a risk of bias assessment in a Review. Once these concepts and relationships are
made explicit, a machine can then “understand” the underlying content. The data itself is
held in RDF (Resource Description Framework) format, a simple data model that uses
“triples” to store information that can be queried against a given ontology or set of
ontologies that describe the data.
Here is a simple example:
RDF stores data in triples:
This is the way humans think as well, in sentences.
<Director German Ctr
<Director German Ctr>
So, given the first 2 statements, the machine could infer the 3rd statement.
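The idea of storing statements as triples and deriving a new one can be sketched in a few lines of Python. This is a minimal illustration only: the `ex:` names, the two starting facts and the single hand-written rule are all invented for this sketch and are not taken from the Cochrane ontology, and a real system would use an RDF store and a reasoner rather than a Python set.

```python
# Each fact is a (subject, predicate, object) triple, mirroring an RDF statement.
# All names below are hypothetical and exist only for this illustration.
triples = {
    ("ex:Review1", "ex:includesStudy", "ex:StudyA"),
    ("ex:StudyA", "ex:evaluatesIntervention", "ex:Salbutamol"),
}


def infer_covered_interventions(facts):
    """Apply one hand-written rule to a set of triples.

    (review includesStudy study) and (study evaluatesIntervention drug)
    => (review coversIntervention drug)
    """
    inferred = set()
    for review, first_predicate, study in facts:
        if first_predicate != "ex:includesStudy":
            continue
        for subject, second_predicate, drug in facts:
            if subject == study and second_predicate == "ex:evaluatesIntervention":
                inferred.add((review, "ex:coversIntervention", drug))
    return inferred


# Given the two stated facts, the rule yields a third statement.
new_facts = infer_covered_interventions(triples)
for fact in sorted(new_facts):
    print(fact)
```

In an ontology-driven system the rule would be expressed in the ontology itself (for example in OWL or RDFS) rather than in application code, so that any compliant reasoner could draw the same inference.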
We have created an ontology, a semantic model, for Cochrane Reviews and studies. The
latest version can be found here:
It is still a work in progress and needs to be evaluated and tested to be sure the inferences
it makes are consistent with Cochrane methods and that it can fulfill the use cases and thus
the needs of our various end users.
User stories and user research
Projects already conducted by Wiley and Cochrane have indicated that end users would like
to find and view Cochrane Content in a variety of different ways, and that developers of new
products based on The Cochrane Library would like to be able to select and manipulate
sections of Cochrane Reviews for repackaging. The use of improved XML structure and
semantic web technologies could facilitate the delivery of “dynamic Cochrane content”.
From this research and other thinking within the Linked Data Project, the RevMan Advisory
Committee (RAC) and other groups within Cochrane, we have developed lists of “user
stories” that inform larger sets of use cases. Using industry techniques we learned from our
consultants, Ontoba, we have used various rubrics and tools to arrive at and describe these
user stories and use cases.
‘So that…’ phrases
One way to capture user stories is to use the “So that…” framework to describe what people
want to do with your content, on your website, etc. You translate desired features into the
form: “As a xxx, I want to be able to yyy so that I can zzz.” Here are some examples from
the Linked Data Project:
As a ‘XXX’, I want to see all the information about a study in CRS, so that...
‘Clinician’: I can see if the paper is relevant to my clinical question, before reading
‘Systematic reviewer’: I can screen the paper to see if it is relevant to my review,
without having to read the full report.
‘Anybody’: So that I can easily compare the characteristics of studies, as the CRS
format is common across entries.
As a ‘XXX’, I want to see all risk of bias analysis conducted on a study, so that...
‘Clinician’: I can see if the study is biased, and the results trustworthy.
‘Clinician’: I can see if there are differing opinions on the biases in the study from
authors, and this may help me reach my own conclusions about whether
or not I think the study is biased.
‘Systematic Reviewer’: I can identify whether someone has already done the work
of assessing the risk of bias of a study, and this may save me time. I could use
the information as a starting point and amend if I think it is needed for my own
review, or I could use the information after I have performed my own
assessments to see how they differ.
From groups of user stories, we are able to build out use cases that can inform potential
prototypes of functionality for use on our websites. One example is the idea of an “Asthma
Super Centre”, or browser of the evidence on Asthma. Another one we’ve been working on
is a CENTRAL demonstrator that shows the power of linking between studies and Reviews
and the information in Reviews about the studies they evaluate.
User stories and use cases
For the current linked data project, we have been focusing on two sorts of user stories. One
is the idea of an “Asthma Super Centre”, or browser of the Cochrane evidence on Asthma,
that would address our perception that users would like to find and view Cochrane Content
in a variety of different ways. The generic user story for this section has been the following:
As a reader of Cochrane Reviews, I would like to:
Filter reviews by
to show me the subset most relevant to me
of those reviews in a format that works for me
Link out to selected content (both Cochrane & non-Cochrane) that would
enhance the usefulness of the review material
The second focus for the linked data project has been a demonstrator that explores the
potential for linking between studies and Reviews and the information in Reviews about the
studies they evaluate. The primary user story that we have been addressing in this section is:
As a Cochrane author who has identified a single trial report that is relevant to my review, I
would like to:
See what other published reports from the same trial have been identified in the
“studified” data in CRS
See which other Cochrane Reviews have this as one of their included studies
See the Risk of Bias appraisals of this study from those other Reviews
So that I can improve my review by using the work that others in the Collaboration have
done.
3. The demonstrator
As part of phase 2 of the Cochrane Linked Data Project, the Web Team, CEU and Ontoba
have created a demonstrator site in which we can build out these initial use cases and where
we can have a “sandbox” for demonstrating the power of using linked data with Cochrane
and other external content. At present, the demonstrator only includes a subset of the
asthma reviews produced by the Airways group. The demonstrator is at
and has functionality that relates to both the Asthma
Super Centre and the CRS/CDSR
Searching Reviews by drug name. Currently, variant names of drugs are not
cross-referenced in Cochrane Reviews. We have linked to Drugbank, which includes
most of the variants of drug names, including the different brand names and generic
names used in different countries. We have created a “semantic search” that allows
users to type any name for an asthma medication and find the relevant Cochrane
Reviews. See:
. This functionality would greatly improve the discoverability of Cochrane content in
The Cochrane Library: at present, for example, if you search for “Prozac” you get
zero results, but if you search for “fluoxetine” you get 30 results.
Displaying selected portions of reviews. Clicking on any title on the “List of
Reviews” page in the demonstrator takes you to a custom view of that review that we
have created by including sections of the review suggested during the Strategic
Discussion in Paris. This capability of showing selected portions of a review and
rearranging their order could allow us in future to devise different views for different
user groups, to allow users to customize their own Cochrane view by selecting the
specific components and their order, or to compare reviews by looking at
components from 2 or more Reviews side by side.
Linking out to selected content.
In addition to linking to Drugbank as noted
above, we have linked to SIDER, a linked data set that includes information on side
effects from FDA label information (see
Finding which Cochrane Reviews have included a particular study. Each
review page in the demonstrator includes that review's list of included studies,
with a link to a specific study page for each item on the list. Each study page
includes a list of all of the reviews (in our limited set) that have included that study,
using the unique study identifier from CRS and the links that CRS provides between
studies and Reviews (see
for an example).
See what other published reports from the same trial have been identified
in the “studified” data in CRS. Once again using the links with CRS, each study
page in the demonstrator includes a list of all published reports from the study that
have been identified by Cochrane collaborators and either used in reviews or studified
See the Risk of Bias appraisals of a single study from different Reviews. This
information is also included on each study page in the demonstrator. In some cases
(as in the O’Byrne example above), there is good agreement. Some other examples
have more variation.
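The drug-name search described in the list above can be sketched as a synonym expansion followed by a plain text match. This is a minimal sketch under stated assumptions: the `SYNONYMS` table is a hand-written stand-in for the name variants that an external data set such as Drugbank supplies, the review titles are invented, and the demonstrator itself queries a triple store rather than Python lists.

```python
# Hypothetical synonym table standing in for externally supplied name variants.
SYNONYMS = {
    "fluoxetine": {"fluoxetine", "prozac"},
    "salbutamol": {"salbutamol", "albuterol", "ventolin"},
}

# Hypothetical review titles; real data would come from the linked data repository.
REVIEWS = [
    "Fluoxetine versus placebo for depression",
    "Salbutamol for acute asthma in adults",
    "Inhaled corticosteroids for chronic asthma",
]


def expand(term):
    """Return every known name for the drug the user typed, in lower case."""
    term = term.lower()
    for names in SYNONYMS.values():
        if term in names:
            return names
    return {term}  # Unknown term: fall back to a literal search.


def search(term):
    """Return the titles that mention any variant of the searched drug name."""
    names = expand(term)
    return [title for title in REVIEWS if any(name in title.lower() for name in names)]


prozac_hits = search("Prozac")      # Brand name finds the generic-name review.
ventolin_hits = search("Ventolin")
print(prozac_hits)
print(ventolin_hits)
```

A literal search for the brand name would match none of these titles; expanding the term first is what lets it reach the review indexed under the generic name.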
While the above examples are simple, they demonstrate the proof of concept of
this approach and, critically, the data in the “triple store” beneath this website is completely
dynamic. There are only ca. 40 Reviews on Asthma in there now but if we were to put all
Cochrane Reviews and their related studies in the linked data repository, the queries would
The technology behind the demonstrator
Demonstrator.dev.cochrane.org uses the Drupal open-source content management system
(CMS), the same system used to produce 130+ of the websites for The Cochrane
Collaboration. Drupal “plays nicely” with the semantic web stack, including an RDFx module
and a very powerful module called SPARQL Views, which allows SPARQL queries to be
constructed within the core Drupal Views system. With our triple store linked data repository
running in the background at a canonical data.cochrane.org address and server, we use
Drupal and its RDF and SPARQL modules to quickly create a working website for building
prototypes that can be quickly styled using Drupal’s built-in theming and
4. Future plans
Our experience with the linked data project to date has convinced us that it has potential to
become an “enabling technology” for the Collaboration that could allow us to do more with
our data. However, there are a number of issues that should be explored as we decide on
how best to integrate linked data within the Cochrane IT structure. These include:
Potential additional user stories
The technical architecture including implications for the IMS, Web Team, CRS and our
publisher of increased use of linked data
Adding structure and standardization to Cochrane reviews
Our success in realizing the relatively limited goals of the Linked Data Project to date has
encouraged us to look at additional user stories that might be addressed using this approach,
including several items on the RevMan wish list. For example, RevMan case # 119285,
"Provide easy access from RevMan to relevant sections of other reviews using the
same studies via CRS. E.g. if you were completing the RoB table for a study you could easily
see how other authors have assessed the risk of bias for that study", is very similar to the
CRS/CDSR user story that we have been working on, and case 122027 calls for "Interaction
between RevMan & CRS" without offering specific details.
Some of the user stories that might be explored using this approach include the following:
As a Managing Editor, or as a Cochrane Review user wishing to keep up with a specific area
of content, I want to see the date of publication for a subset of reviews (e.g.
included in an overview, article, guideline, etc) so that I can see if any have been updated
since I last looked at them. This came up as a specific request from an ME, but could easily
apply to writers of Cochrane overviews, guidelines, book chapters, etc.
rs to generate a visual graphic highlighting each treatment
being compared in their review. Each node would be a treatment and each line at least one
RCT, with numbers corresponding to the number of RCTs.
"Calculator for estimating power to overthrow current primary outcome. For
example, for a very potent intervention with high precision it may need a study total of
around 15,000 people with a neutral result to drag the findings back to being null. Should it
be a weak finding a trial of 100 may substantially change the result."
This refers to how we would actually go about building all this out in reality within, alongside
or otherwise in our current systems, workflows and dataflows. An industry standard in
semantic web and linked data technologies is to “not blow up the company” but to innovate
alongside existing tools and technologies, creating a metadata store that better describes
the content while leaving the existing content store(s) alone. But we might want to innovate
in the authoring process and/or other parts of the content production process as well. This
is all still to be determined and will be discussed at the Linked Data Project meeting in
London from 4 to 6 December 2012.
Paul Wilton from Ontoba drew up
a possible technical architecture diagram to provide us
with an example of one way we could consider:
Adding structure and standardization to Cochrane Reviews
The fact that Cochrane Reviews are very structured has been critical to the success of the
linked data project to date. However, this structure could be greatly improved by coding
some key elements in a standard way across reviews. For example, the only way to
determine which interventions have been included in a Review is to parse the text in the title
of each forest plot. A standardized way of coding the I and the C for each analysis would
improve the power and precision of linked data queries of CDSR. Ideally, all elements of the
Population, Interventions, Comparisons and Outcomes covered in the Cochrane Reviews
would be coded using some standard taxonomy.
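To illustrate why coded Interventions and Comparisons make queries more precise than parsing forest plot titles, here is a minimal sketch. Everything in it is hypothetical: the `Analysis` record, the review identifiers and the `drug:` and `outcome:` codes are invented for this example and do not correspond to RevMan's XML or to any existing taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One analysis in a review, with its I, C and O held as coded values."""
    review_id: str
    intervention: str
    comparison: str
    outcome: str


# Hypothetical coded analyses; the identifiers and codes are placeholders.
ANALYSES = [
    Analysis("review-1", "drug:A", "placebo", "outcome:exacerbations"),
    Analysis("review-2", "drug:A", "drug:B", "outcome:exacerbations"),
    Analysis("review-3", "drug:B", "placebo", "outcome:lung-function"),
]


def reviews_comparing(intervention, comparison=None):
    """Return the reviews that analyse an intervention, optionally against a comparator."""
    return sorted({
        analysis.review_id
        for analysis in ANALYSES
        if analysis.intervention == intervention
        and (comparison is None or analysis.comparison == comparison)
    })


all_drug_a = reviews_comparing("drug:A")
drug_a_vs_placebo = reviews_comparing("drug:A", "placebo")
print(all_drug_a)
print(drug_a_vs_placebo)
```

Because the intervention and comparison are exact coded values, the lookup is an equality test rather than a guess based on free text, which is the gain in precision the paragraph above describes.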
Unfortunately, there is no existing taxonomy that adequately addresses this need,
although several widely used taxonomies could partially address our requirements. One
approach to this problem would be for the Collaboration to build on the various CRG topic
lists to develop a Cochrane taxonomy which would not be identical to any individual
taxonomy, but would mirror some specific portions of a handful of key taxonomies in a way
that will allow meaningful linkages to them.
The taxonomy could be built gradually by working with individual CRGs. The process has
already been initiated with the Airways group as part of the Cochrane linked data project.
The CEU browse list would gradually evolve from its current structure to the new taxonomy.
As each CRG completed its section of the taxonomy, the relevant section of the CRG browse
would be replaced. The eventual result would be that the CEU browse would be completely
replaced by the new taxonomy, and each review would have only a single set of topics.
5. Further reading and viewing
Here are some presentations, videos and articles to provide further background on both
linked data and the semantic web as well as the work so far in Cochrane in the “Star Trek”
and Linked Data Project.
Linked Data and Cochrane Reviews: A Report from the “Star Trek” Crew
from Madrid Colloquium, October 2011
Sustainability and Cochrane Reviews: How Technology can Help
by Chris Mavergames
from UK Contributors’ Meeting in Loughborough
Web 3.0: The Semantic Web
Linked Data and the Web of Data
Intro to the Semantic Web
The Semantic Web of Data
Tim Berners-Lee, inventor of the World Wide Web
Here is a glossary of terms related to
linked data as well as a few related to Cochrane.
API
Application Programming Interface. Allows different pieces of software to communicate.
CENTRAL
Cochrane Central Register of Controlled Trials (CENTRAL)
Controlled vocabulary
Commonly known in indexing and cataloguing, controlled vocabularies use predefined,
specific and agreed-upon sets of terms for use in taxonomies, thesauri and other systems to
tag and organize content and data.
Drupal
Open-source Content Management System (CMS). See drupal.org. The Cochrane
Web Team uses Drupal for the 130+ websites it manages.
Drupal themes
Layouts and designs for Drupal
Drupal Views
A module in Drupal that is basically a GUI (Graphical User Interface) for querying the
database (MySQL) behind Drupal for displaying content on a website in more or less any
form you like.
GoodRelations
“GoodRelations is the most powerful
vocabulary for publishing all of the details of your products and services in a way friendly to
search engines, mobile applications, and browser extensions. By adding a bit of extra code
to your Web content, you make sure that potential customers realize all the great features
and services and the benefits of doing business with you, because their computers can
extract and present this information with ease.”
Linked data
Part of the movement known as the Semantic Web or Web 3.0, linked data refers to a set of
concepts and standards for connecting data on the web and across data silos. See:
Linked Life Data
“A semantic data integration platform for the biomedical domain”
It includes the Unified Medical Language System (UMLS) which
includes SNOMED CT as well as Drugbank, both used in the Linked Data Project
demonstrator site at
Metadata
Put simply, “data about data”. Data that describes your content.
Ontology
“An ontology is a specification of a conceptualization.”
Ontologies in the semantic web are used to describe a domain, including the classes and
properties and the relationships between them.
OWL
The Web Ontology Language. See:
semantic repository software or “triple store” currently used in the Cochrane Linked Data
RDF
Resource Description Framework. A data model for storing data in “triples”.
RDFS
RDF Schema language.
Semantic Web
See videos above!
SNOMED CT
Systematized Nomenclature of Medicine Clinical Terms. A controlled vocabulary of medical
terms.
SPARQL
SPARQL Protocol and RDF Query Language. The query language for querying data in RDF.
SPARQL Views
A Drupal module that integrates the SPARQL query language with the Views module to
create displays of content on a website.
Taxonomy
Less formal way of creating a system to organize content. Note: there is substantial debate
about the difference between ontologies, taxonomies, controlled vocabularies and thesauri!
Triples
RDF triples. In the RDF data model, data is stored as triples with a subject, predicate and
object. There are multiple serializations for RDF including RDF/XML, Turtle and N-Triples.
Triple store
A triple store is a purpose-built database for the storage and retrieval of triples,
a triple being a data entity composed of subject-predicate-object, like "Bob is 35" or
"Bob knows Fred".