CCPE09v3 - myExperiment


Feb 2, 2013 (5 years and 5 months ago)




first complete


Concurrency Computat.: Pract. Exper.


Open Science:

The myExperiment a

D. De Roure, Carole Goble, Sergejs Aleksejevs, Sean Bechhofer, Jiten Bhagat, Don Cruickshank, Duncan Hull, Yuwei Lin, Danius Michaelides, David Newman, Rob Procter

* School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK.

School of Computer Science, The University of Manchester, M13 9PL


from original

and structure of this one

myExperiment has set out to provide the social software and services to support the scientific process, focusing on the ‘time to experiment’ phase of the scholarly knowledge cycle rather than the data deluge from new experimental techniques. In the context of open science this phase is also experiencing a deluge of scientific objects: not just data but protocols, methods and the new artefacts of digital science such as workflows, provenance records and ontologies. myExperiment has demonstrated the role of a social web site in addressing this challenge.

In this paper we show how myExperiment facilitates the management and sharing of research workflows, supports a social model for content curation tailored to the scientist and community, supports open science by exposing content and functionality into the users’ tools and applications, and establishes a more general notion of Research Object and the e

Scientific Workflow, Virtual Research Environment, Web 2.0, Open Repositories, Data Curation


While the computer science community has focused on accelerating processing, if we are to accelerate time to discovery then our research effort needs to look at ‘accelerating’ the human part of the discovery cycle. Scientific advance relies on a social process in which scientists share hypotheses, insights and results, and the data and methods that support these. Traditionally, scholarly discourse and dissemination have focused on the publication of peer-reviewed articles, mediated by the scholarly publishing process and established structures such as conferences. However, the scientific process (figure 1) has many other opportunities for supporting

The systematic publication of primary and secondary data sets, along with standard metadata sufficient to support their interpretation, is now becoming an accepted, or at least expected, practice, although tying together published results with the data upon which they are based is still poorly supported [ref]

, software tools and scripts

published through libraries
and Web sites. This includes
community services
such as the



which hosts
contributed re

in the
nanotechnology domain, and OpenWetWare which provides an exchange for techniques in

Data and algorithms are commonly accessible using the Web, disseminated as distributed applications through web services or web servers [ref]. There are now tens of thousands of publicly available web services across business and science.

need to share (and find) not just the digital materials of scientific research but also the digital methods and processes: the protocols, plans, and standard operating procedures of bench science and the scripts, workflows and provenance records of e-Science. Methods are scientific commodities in their own right, with



associated intellectual property, metadata, and life cycles; and as with data and articles, subject to their own forms of authorship, credit and reuse criteria.

By pooling and sharing methods we have the potential to accelerate science by pooling and sharing know-how and best practice, avoiding reinvention and hence reducing time to experiment [ref]

decoupled communities of scientists who are not organized into predetermined Virtual Organisations

workflows, for example, are often complex and challenging to build, and can require specialist expertise that is hard-won and may be outside the skill-set of those who need them [ref CCPE reuse paper]. The workflows in [trps] took a bioinformatics expert six months and over 40 versions to develop; however, once developed they were immediately reusable by other, perhaps less experienced, e-Scientists, accelerating their research [

By combining methods with results we can accelerate discovery by enabling transparent, comparable and reproducible science [ref].
packaging and aggregating methods with

data, results t
members, groups

any form of digital
research object

sharing these across applications
as publication units
we can
work towards an open

that is outside any specific
Research objects
coupled with a

objective or designed to support a


intended to capture a scientific investigation or research question and are expected to be


A data and method deluge demands new techniques, especially in the context of open science. The new instrument that we bring to bear on this challenge is provided by society itself: it is the scale of community participation and the network effects that this brings. This instrument offers new ways of tackling difficult challenges; for example, the ‘decay’ over time of research objects can be addressed by community curation.

Accelerating time to experiment requires social infrastructure for open science. There is great potential in providing social tools to support the scientific process and the sourcing, sharing and continued curation of scientific research objects [wikinomics article]. This is possible because increasingly the various research objects are born or available digitally. Scientists are beginning to use blogs, wikis and social networks to facilitate more rapid and immediate sharing of research, a phenomenon sometimes characterised as Science 2.0 [ref]. The Open Science movement [ref], though currently niche, vocally advocates the large-scale, open, distributed collaboration that is enabled by making data, methods and results freely available on the Web.

By adopting social content sharing tools for research object repositories we can harness a social infrastructure that enables social networking around scientific objects and provides community support for social tagging, comments, ratings and recommendations, social network analysis and reuse mining (what is used with what, for what and by whom), and remixing of new research objects from previously deposited ones. We can take advantage of popular and familiar user interfaces of social content sharing sites such as Flickr, YouTube and Slideshare [

By adopting an open, extensible and participative

for re
search object
can become readily
available for reuse by others

and draw

on other services as
much as possible
should not

oblige the scientist to come to
a repository
, but rather make it as easy as
Figure 1: Acceleration in the open science lifecycle (Cameron Neylon)

Not sure we have space for this




possible to bring
the conten

to the scientist
’s own environment
is essential for adoption

in turn is essential to build
a community and catalyse

Open science is thus the process of opening up content (sharing research objects in controlled and appropriate ways) and opening up applications (sharing research objects and the functionality of their repositories with

myExperiment [ref] is a socially-sourced content repository that supports the sharing and curating of research objects used by scientists, specifically focused on


and experiment plans.

For researchers it provides a social infrastructure that encourages sharing and a platform for conducting research, through familiar user interfaces. For developers it provides an open, extensible and participative environment.

The production
team is made up of software developers, scientific informaticians from a range of disciplines and social
Launched in November 2007

and built using Web 2.0
social networking approaches according to
Web 2.0 design principles [ref]

Facilitates the management and sharing of research workflows. The public repository has established a significant collection of scientific workflows, spanning multiple disciplines (biology, chemistry, social science, music, astronomy) and multiple workflow systems, which has been accessed by over 16,000 users worldwide. At the time of writing the public site has over 600 different workflows (200+ versions), drawn from XX workflow management systems (including Taverna, Kepler, Triana, Trident, etc.). There are 1600 registered users.

How many plans? Include the notion of Objects here, and Packs here as the start of objects.

In section 2 we introduce myExperiment, briefly present our development methodology, highlight the characteristics of workflows and other forms of methods such as plans and standard operating procedures that influence

and compare our work with other method repositories.

Supports a social model for content curation tailored to the scientist and community. Producers of research objects should have incentives to make them available and consumers need to be able to discover and reuse them; all will benefit from self

and community
myExperiment is a fruitful environment
in which to study its use by social scientists which has revealed shortcomings and issues to address and
In section 3 we describe the social model that myExperiment implements and discuss it in practice as identified by a user study that has shadowed and steered the development of the repository. In particular we show that the content is roughly split into a market and a toolbox; and that sharing is desirable and possible but anonymous reuse is

We compare our social content approach to other content services in science and outside science

Supports open science by exposing its content and functionality into the users’ tools and applications and absorbing other interfaces. myExperiment provides an open, extensible environment to permit ease of integration with other software, tools and services, and benefit from participative contribution of software. In contrast to social web sites like Facebook and MySpace, developers can download, reuse and repurpose myExperiment itself, and the codebase is evolving as it is used across multiple projects. In section 4 we show how, by exposing the myExperiment functionality, new interfaces have been built and existing interfaces have incorporated myExperiment functionality, including plug-ins to the Taverna workbench, Facebook applications, an iGoogle gadget-based research dashboard and a Silverlight interface, and Chemistry Electronic Lab Notebooks and ‘blogging the lab’ [ref].

Establishes the more general notion of Research Object and the e

In section 5 we discuss how myExperiment is a first step towards a general notion of Research Object, which captures aggregations of objects and also encompasses the other forms of data in myExperiment, and is part of a greater vision of interoperable e-Laboratories. We describe how
myExperiment makes Research Objects accessible and

beyond the core repository.

Our Research Objects are represented in RDF and we have developed a myExperiment ontology which uses Dublin Core metadata for research objects and FOAF for the social network information. To interwork with other repositories we adopted the Object Reuse and Exchange representation from the Open Archives Initiative, which is based on named RDF graphs. We envisage that the scholarly publishing process will evolve to support this more general notion of scientific research object, which will facilitate reusable and reproducible
What is the appropriate related


an important component in the revolution in
sharing and publishing scientific results,
established itself as a valuable and unique repository

with a growing

international pr

demonstrates the success, and exposes the challenges, of blending modern social curation methods with the



March 2009




demands of researchers sharing hard-won intellectual assets and research works within a scholarly communication lifecycle. We close in Section 6 with discussion and future work.




can you provide material on other repositories like pipeline pilot please?

introduce myExperiment, briefly present our development methodology, highlight the characteristics of workflows and other forms of methods such as plans and standard operating

that influence

compare our work with other method repositories.


The Web provides a platform for delivering not just documents and data but also services which support the research process. Scientific workflows are the means to compose these, providing descriptions of the co-ordinated execution of multiple tasks so that, for example, data analysis and simulations can be repeated and accurately reported. Alongside experiment plans, Standard Operating Procedures and laboratory protocols, automated workflows are one of the most recent forms of scientific digital methods, and one that has gained popularity and adoption in a short time [1]. They represent the methods component of modern research and are valuable and important scholarly assets in their own right
There are many different workflow systems,
each providing its own representation of the workflow descriptions and environment for
executing the
For example Taverna and Trident

The myExperiment

was motivated by observing a clear need to share workflows and also by a frustration with existing systems which missed the social dimension, merely making things available rather than
ng and controlling sharing and
presented complex user interfaces out of line with the popular web sites that people are using on an everyday basis, thereby demanding further skill. The motivation and rationale for the project are discussed in detail in [4, 5] and the design principles in [6].

Crucially we focused on the user experience, providing an attractive and immediately understandable web interface that uses the metaphors and behaviours of popular tools used in everyday life. It is immediately familiar to a new generation of students and researchers.

myExperiment provides the familiar features of a social website, closely tailored to the needs of researchers:


about recent activity in

online communi

and the research objects that they share

Management of the social network through friendships and groups. There is fine control over the visibility and sharing of Research Objects within the network.

Discovery, uploading and execution of workflows, and management of the essential extrinsic information and ‘social metadata’ such as licence, tags, sharing, ratings.

Credits and attributions are an essential feature to support flow of rights and reputation.

Creation of packs of digital items, which can be tagged and shared like individual items.
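The "fine control over visibility and sharing" listed above can be pictured with a small permission check. This is a hypothetical model written for illustration and does not reproduce the myExperiment implementation; the class names, the three actions (view, download, edit) and the policy layout are all assumptions.

```python
from dataclasses import dataclass, field

ACTIONS = ("view", "download", "edit")


@dataclass
class SharingPolicy:
    public: set = field(default_factory=set)    # actions open to anyone
    friends: set = field(default_factory=set)   # actions open to the owner's friends
    groups: dict = field(default_factory=dict)  # group name -> permitted actions


@dataclass
class ResearchObject:
    title: str
    owner: str
    policy: SharingPolicy = field(default_factory=SharingPolicy)


def is_allowed(user, action, obj, friendships, memberships):
    """Decide whether `user` may perform `action` on `obj`.

    friendships: owner -> set of users who are that owner's friends
    memberships: user  -> set of groups the user belongs to
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if user == obj.owner or action in obj.policy.public:
        return True
    if action in obj.policy.friends and user in friendships.get(obj.owner, set()):
        return True
    user_groups = memberships.get(user, set())
    return any(
        action in actions
        for group, actions in obj.policy.groups.items()
        if group in user_groups
    )


# Illustrative data: anyone may view, friends may download, one group may edit.
workflow = ResearchObject(
    "demo workflow",
    "alice",
    SharingPolicy(
        public={"view"},
        friends={"view", "download"},
        groups={"lab": {"view", "download", "edit"}},
    ),
)
friendships = {"alice": {"bob"}}
memberships = {"carol": {"lab"}}
print(is_allowed("bob", "download", workflow, friendships, memberships))
```

Separating viewing from downloading and editing mirrors the distinction drawn later in the usage figures, where publicly visible and publicly downloadable workflows are counted separately.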

2.2 Architecture

The architecture of one instance of myExperiment is shown in Figure 3

Figure 2: The myExperiment social website




In line with our open environment capability, the database server, search server and external workflow enactors are all separate systems to which the main application connects. The interfaces are accessed via a web server that handles load balancing over a cluster of Mongrel application servers. Ultimately scalability will also be achieved by federating multiple instances of myExperiment

myExperiment, which is released under the BSD licence, is built in the Ruby on Rails web application framework and follows the Model View Controller abstractions set out in Rails. By keeping with the architectural design of Rails we were able to leverage many of its capabilities to build features for users rapidly.

The database server, search server and external workflow enactors are all separate systems with simple
to which the main application connects. Various mechanisms for authentication are provided based on the interfaces used; for example, for end users, authentication can be via external OpenID services.

The agile ‘perpetual beta’ development process [8] requires frequent updates to be rolled out to the main service. This is aided by maintaining a separate server for final testing of code, which allows preview and test of new features and checking for performance regressions with automated tools. A test server containing a recent snapshot of the public data from the live site is also provided to developers writing applications that make use of the myExperiment API.

2.3 Related Work

Duncan check here please

Workflow management systems already make workflows available for sharing, through repository stores for workflows developed as part of projects or communities. Unlike myExperiment, they are tied to a particular type of workflow and do not offer programmatic access to the workflows. For example, the Kepler Hydrant is a site (under development) for sharing Kepler workflows. It supports workflow execution and allows users to assign permissions to other users.


Customer Hub has the ambition of enabling Inforsense workflow users to share best practices and leverage community knowledge. However, it does not rely on the social model.


provides a public site where biologists can run analyses and for developers it provides an open framework for tool and data integration. It does not provide social infrastructure to support sharing of

Social networking sites such as

do not support research objects, and the handling of attribution and licensing may not be adequate for scientists. Facebook supports the development of plug-in applications. Similarly, science-specific social networks, like Epernicus, only support the social part of the VRE function.

The research objects of

( are
Figure 3: The myExperiment architecture





used in biology labs and, through use of a wiki, OpenWetWare supports the social model and open
environment. However, it does not itself intend to be a platform for conducting computationally



is a good example of a VRE that takes a portal approach. It focuses on the nanotechnology domain and provides web-based resources for research, education and collaboration. It also provides simulation tools that can be accessed from the browser. In terms of social infrastructure it provides workspaces, online meetings and user groups. In contrast, myExperiment deliberately set out to build a Web 2.0 site which would be familiar to users, choosing a Web application framework (Ruby on Rails) rather than a portal framework. It offers a rich API and remote execution. myExperiment is designed to provide services to a portal and also to be used as a Web 2.0 ‘skin’ over existing portal services.



Dan’s stats are in
this section

Rob and Yuwei’s words are in this section

In section 3 we describe the social model that myExperiment implements and discuss it in practice as identified by a user study that has shadowed and steered the development of the repository. In particular we show that the content is roughly split into a market and a toolbox; and that sharing is desirable and possible but anonymous reuse is

We compare our social content approach to other content services in science and outside

myExperiment is distinctive because it majors on the social dimension, and it can itself be seen as an experiment to explore whether scientific communities share sufficiently in order to benefit from the network effects of a social web site.

To support producers of research objects in contributing to myExperiment we provide members of the site with support for credit and attribution, and fine control over the visibility and sharing of research objects. Early user feedback revealed this to be the most critical factor in making a social web site for use by scientists.

Other members of the site ‘consuming’ the research objects can view, download, tag, review and ‘favourite’ them, which aids their discovery and enhances reputation of the producers. Additionally, content exposed publicly is discoverable through search engines.

Unless they are maintained, workflows and other research objects can cease to be reusable over time; they effectively ‘decay’, though in fact it is their context that is changing. For example, a recent change in gene identifiers by one service provider led to a myExperiment announcement for users of the affected workflows. Useful workflows will be curated by the community that uses them, and the original authors are also encouraged to curate because they are getting credit for use of their work. Workflow decay is a difficult problem and myExperiment provides a new approach through community curation.

Link paragraph...How do we measure this...usage stats, and talking to users


Analysis of Usage

Placeholder for latest stats

Danius to update

Analysis of usage statistics over the period January to July 2008 demonstrates: (i) a rapidly growing community, (ii) extensive use of contributed research objects and (iii) the development of social

(i) Community size. At the time of writing, the site has 1051 activated accounts. There has been a steady growth in the user base during 2008, with about 10 to 20 new users registering a week. Spikes in registrations are due to Taverna workshops that use myExperiment to host their tutorial materials and conferences. 38% of the registered users are regular visitors. In a seven month period the site received approximately 60000 page views in 13500 visits by 8581 unique visitors.

The figures are collected using Google Analytics and do not include accesses made via the API. It is interesting to note that the number of unique visitors is much larger than the number of registered users. This suggests that the publicly visible content on the site is of value to a wider audience. There are

workflows and a further
workflows that are revised
versions. Workflows were downloaded a total of 50934 times, with three workflows commanding over 1000 downloads each.
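The relationships between the traffic figures quoted above can be made explicit with a short calculation. The sketch uses only the numbers reported in the text (the page view count is itself approximate); the variable and function names are our own.

```python
# Figures reported in the text for the seven month period.
page_views = 60000        # approximate
visits = 13500
unique_visitors = 8581
activated_accounts = 1051


def usage_ratios(page_views, visits, unique_visitors, accounts):
    """Derive simple engagement ratios from the raw traffic counts."""
    return {
        "pages_per_visit": page_views / visits,
        "visits_per_visitor": visits / unique_visitors,
        "visitors_per_account": unique_visitors / accounts,
    }


ratios = usage_ratios(page_views, visits, unique_visitors, activated_accounts)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

On these figures a visit spans roughly four to five pages, and there are about eight unique visitors for every activated account, which is the basis for the observation that public content reaches an audience well beyond the registered membership.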

In terms of permissions, 280 (85%) of the workflows are publicly visible whereas 252 (76%) are publicly downloadable. 40% of the workflows with restricted access are entirely private to the user and for the remainder the user has elected to share with individual users and groups. 36 workflows (over 10%) have been shared with the owner granting edit permissions to specific users and groups. In addition there are 53 instances where users have noted that a workflow is based on another workflow on the site. This indicates that the site is supporting collaboration amongst its users and that they are willing to contribute derived works.
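As a consistency check on the permission figures, one can ask which workflow totals would reproduce the quoted percentages. The text does not state the total, so the sketch below is only an inference under the assumption that each percentage was rounded to the nearest whole number; the function name and search range are our own.

```python
def consistent_totals(counts_and_percentages, search_range):
    """Return every total N for which each count rounds to its reported percentage."""
    return [
        n
        for n in search_range
        if all(round(100 * count / n) == pct for count, pct in counts_and_percentages)
    ]


# (count, reported percentage) pairs taken from the text.
reported = [(280, 85), (252, 76)]

# The total cannot be smaller than the largest count.
candidates = consistent_totals(reported, range(280, 500))
print(candidates)

# The 36 workflows with edit permissions are described as "over 10%".
print([round(100 * 36 / n, 1) for n in candidates])
```

Under that rounding assumption the two percentages are mutually consistent, and the "over 10%" figure for edit permissions agrees with the same range of totals.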


Research Methods




Rob’s words start here

It is important to investigate how researchers actually make use of innovative tools such as myExperiment, to explore participatory patterns and cultures and how these shape the ways in which scientific knowledge is crafted and communicated, and to use the insights gained to help myExperiment adapt to users’ needs. Here, we present evidence from an ongoing investigation of myExperiment users’ workflow sharing and re-use practices, their motivations, concerns and potential barriers.

We have been conducting a series of interviews with myExperiment users. The study has been designed not only to gather data about expectations and motivations as researchers join myExperiment but also to provide a longitudinal perspective over a period of 24 months which enables us to track myExperiment users’ attitudes and behaviour as they become more experienced. Interviewees are selected on the basis of myExperiment activity profiles, including workflows uploaded/downloaded; number of friends; group membership; group moderation and discipline.

In total, we expect to conduct between 40 and 50 interviews with 25 to 30 respondents in 4 to 5 waves. In each wave, existing respondents have been re-interviewed and new respondents recruited. To date, we have conducted 34 interviews with 27 users through phone, instant messaging, email and face-to-face meetings. One user has been interviewed three times, five users have been interviewed twice, and the rest have been interviewed once. A summary is given in Table 1.

Interviewees are recruited either via myExperiment or by snowball sampling (i.e., users suggested by


Findings: Motivations for Being a myExperiment User

All interviewees report using myExperiment for publishing and disseminating workflows:

“The ability to be able to publish the workflows […] It’s much more convenient and more organised than just publishing the XML file on my research group’s site […] People who have the same interests or use the same components […] they’ll more likely to find it on myExperiment. So, basically, for things like dissemination it’s fantastic, makes it easy to point people out which I think it’s useful.”

[Interviewer]: “So [publishing on myExperiment] is an alternative way of getting your work known?”
[Interviewee]: “Yes […] in the future, I’ll be writing several workflows. I’ll cite them and myExperiment URLs in the paper. It’s just a personal policy of mine. It’s a bit like gene or protein sequences: I submit my

Some interviewees express a more collaborative ethos:

“When my solutions work, it would be great to share with others.”

“[…] I have the feeling that there’s a better opportunity for sharing scholarly work in that way too.”

Other interviewees are aware of the importance of ‘network effects’:

“[…] we started to realise very quickly that it’s going to be the next kind of big area for different kinds of research […] many people say to us ‘hey, we’re using myExperiment. Can we have access to various different

1.3 Findings: Barriers and Enablers to Sharing Workflows

Research domains have emerged as an important barrier to reuse of workflows. The responses of our sample of users suggest that workflows do not easily migrate across domains, reflecting, perhaps, distinctive ‘patterns’ to research processes in different domains. Many interviewees also commented that their research is relatively advanced or is too specialised for many workflows to be directly helpful to them. This may reflect that the myExperiment community is still evolving and, as yet, is populated by early adopters, such that effects normally attributable to social networks have yet to make themselves felt.

Good documentation is accepted as a key requisite for facilitating sharing and re-use. Our interviewees’ comments suggest that they are still learning what constitutes good documentation for workflow discovery and sharing:

“It’s a compromise how much time a contributor would like to invest to make sharing workflow feasible. If I have to be honest, my own workflows are not documented properly either. My workflow is used by a small group of people. Once in a while when I present my work, that’s how they know how my workflow works.”

“[…] it depends on how well people label various components. We’ve seen quite a few models with no descriptions of what input and output are, or what kind of format and what assumptions they made.”




“The functionality is there, but […] people are being lazy or people don’t understand different ways they can put information in to make that available to other users […] people make too many assumptions over what anyone else knows about the system. It happens everywhere.”

“[…] I found browsing for things to be one of the functionality areas I think need help because I find it difficult to find something […] some features that make it easier to find workflows would be useful.”

“[…] you can start asking questions to collaborators […] there’s flexibility there but it’s something that can be improved […] it’s users rather than myExperiment’s fault.”

Tagging is one example of how myExperiment users are grappling with establishing good documentation

“One of the issues we found is everyone kinds of puts things down as a text mining tag which is not very specific in order to get the different bits and pieces that we need. So users need to put in or people who actually put things there need to start considering how to label things in order to have them being found.”

“I always think of tags as supplements for navigation because you can’t really rely on people conforming to tagging standard ways that other people doing it. So if it’s a low frequency tag you are not necessarily going to get everything even related to it.”

“If they can make it easier to discover workflows, like better using the metadata, promoting users to use metadata, like a paper repository and then being able to search other things, that would be very helpful to me.”

Our interviewees also reveal how myExperiment affords a social solution to the problems of incomplete

“I included the workflow […] in the paper and […] on the myExperiment webpage there is also a link to the paper. That might be the reason for the spike of interest […] I had a few emails from people about how they can get this workflow […] they got in touch to know how the workflow worked. I put the extra instruction on my webpage […] they just need more help.”


Findings: Diverse Requirements for Sharing

We can discern from our data two distinct myExperiment communities when it comes to workflow re-use. The community which we will refer to as workflow ‘consumers’ prefer larger workflows ready to be downloaded and enacted; the community of workflow ‘builders’ prefer smaller, modularised workflows which can be assembled and customised:

“I upload the complete thing and I also upload its parts. The idea I have is that workflow creators will more look
for components […]. As the end user you probably look for the larger workflows.”

“A larger workflow might be too specific for my needs, it might be more worthwhile to look for the smaller parts, to adjust them to my needs. Things in bioinformatics are often so specific, it can be difficult to find the right thing, smaller things are easier to evaluate.”

We might conclude that workflow consumers see myExperiment as a workflow ‘supermarket’; workflow
builders see it as a ‘toolbox’.



The results of our study of myExperiment users reflect a community in formation, whose members’ attitudes, intentions and expectations are diverse and evolving. As with any innovation, significant changes in attitudes, intentions and expectations are likely to occur as the activities of community members filter through the networks of relationships, reinforcing successful innovations and defining good practice. Such ‘social learning’ (Fleck, 1993; Williams et al., 2004) is an important tool for the diffusion of innovations; myExperiment, of course, is both the locus of the innovation and the platform for its diffusion.

As yet, we have not found evidence of a major shift in research cultures and practices such that, for example, researchers are turning to unprompted and spontaneous collaboration through the sharing and re-use of artefacts. It is likely that, as we note above, it is simply too early for such behaviours to emerge. Rather, at present, a major motivation for researchers using myExperiment is to build ‘social capital’ among their peers. This may seem quite a modest aim when compared with the Open Science vision of accelerated time to discovery but it is not an insignificant step along the road to its realisation. Nor does it mean that sharing and re-use will not emerge as significant features of myExperiment users’ behaviour over time as they begin to master its challenges. Indeed, there is some evidence of workflow re-use, and rather more evidence of a desire to re




It is clear that one current inhibiting factor in the sharing of workflows is a lack of adequate documentation and,
ly, lack of understanding as to what constitutes good practice. Underpinning this, of course, is the problem
of trust: “trust indicates a positive belief about the perceived reliability of, dependability of, and confidence in a person, object or process” (Fogg and Tseng, 1999). The question is what evidence would satisfy a potential workflow user that a particular workflow a) matches what they are looking for and b) works reliably? There are a number of different theoretical approaches to the study of trust (Axelrod, 1997; Kipnis, 1996; Luhmann, 1979; 1990). Luhmann suggests that most approaches fail to pay attention to “the social mechanisms which generate trust” (1990:95). The key point is that there is no single mechanism which, a priori, guarantees a solution. What is important, as we see from our study, is that there is a range of ‘trust affordances’ to hand when trust becomes a practical issue for myExperiment users. myExperiment facilitates social interaction, enabling users to switch relatively seamlessly from workflow to workflow author and back again.

Our findings suggest that researchers will take time to adapt and possibly to redefine current practices for
producing and documenting scientific knowledge such that these new practices are compatible with the sharing
and re-use of new kinds of knowledge artefacts such as workflows. They also suggest that members of the
myExperiment community are prepared to grapple with these challenges. Of course, such changes are likely to
become feasible only when individual and community interests align. For example, a recent study of citations in
cancer microarray clinical trial papers showed that those that were linked to publicly available microarray data
received more citations (Piwowar, Day and Fridsma, 2007). There seems to us to be no prima facie reason why
such behaviours should not apply to other kinds of research resource. In the meantime, what is important is that
myExperiment is successful in growing a community of researchers and bootstrapping network effects and, in
this respect, the evidence of usage is promising.


David Newman’s words are in this section

Jits’ Silverlight words are in this section

In section 4 we show how, by exposing the myExperiment functionality, new interfaces have been built and
existing interfaces have incorporated myExperiment functionality, including plug-ins to the Taverna workbench,
Facebook applications, an iGoogle gadget-based research dashboard and a Silverlight interface, and Chemistry
Electronic Lab Notebooks and ‘blogging the lab’ [ref].

Intro paragraph needed...



As well as bringing this capability to the user through the myExperiment interface, the API is designed so that
developers are easily able to build ‘functionality mashups’ over myExperiment for rapid prototyping of tools to
support researchers. These may be prescriptive interfaces for specific tasks, such as running preconfigured

To support the open and extensible environment we provide data access using bas
principles, and in line with the community we are increasingly adopting Atom as a means of delivering content
and synchronising with peer services. These interfaces have wide adoption in the developer community.

Though Ruby on Rails provides a mechanism for automatically providing REST access, we decided to manage
the API separately so that we could respond to the requirements of API users, while also being independent of
codebase evolution. Hence the REST API is driven by an XML specification that can be loaded and edited
within Microsoft Excel. This allows us to create an independent API specification with the added benefit that it
is in one place instead of spread across many model files. It also assists in generating documentation and tests.
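
As a rough illustration of what a specification-driven API can look like, the sketch below loads a small XML description of REST resources and derives both a routing table and documentation lines from that single source. The element names, attribute names and paths are invented for this example; they are assumptions, not the schema or URLs that myExperiment actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format: every element, attribute and path
# below is an assumption made for illustration only.
SPEC_XML = """
<api>
  <resource name="workflow" path="/workflow.xml" method="GET">
    <field name="title"/>
    <field name="description"/>
    <field name="uploader"/>
  </resource>
  <resource name="pack" path="/pack.xml" method="GET">
    <field name="title"/>
    <field name="items"/>
  </resource>
</api>
"""


def load_spec(xml_text):
    """Parse the specification into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    resources = []
    for res in root.findall("resource"):
        resources.append({
            "name": res.get("name"),
            "path": res.get("path"),
            "method": res.get("method"),
            "fields": [f.get("name") for f in res.findall("field")],
        })
    return resources


def build_routes(resources):
    """Map (method, path) pairs to resource names, as a dispatcher would."""
    return {(r["method"], r["path"]): r["name"] for r in resources}


def build_docs(resources):
    """Generate one documentation line per resource from the same spec."""
    return [
        f'{r["method"]} {r["path"]}: returns {", ".join(r["fields"])}'
        for r in resources
    ]


spec = load_spec(SPEC_XML)
routes = build_routes(spec)
docs = build_docs(spec)
```

Because routes and documentation are both generated from one file, a change to the specification propagates to each without touching the model code, which is the benefit the paragraph above describes.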

Given that control of visibility is crucial to myExperiment, we need a means of authenticated API access. This is
achieved by using the OAuth protocol, which does more than authenticate that a user has given a service
consumer access to a service provider: each grant is a specific key that may have certain privileges assigned to
it. With OAuth, a user can create several keys which could be used with one service, and each of those keys may
have a different set of privileges.
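
The idea of several keys per user, each with its own privileges, can be sketched as follows. This is a toy model of the bookkeeping only: it does not implement the OAuth request-signing flow, and the class names, consumer names and privilege labels are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class AccessKey:
    """One key issued by a user to one consumer, with its own privileges."""
    consumer: str
    privileges: frozenset
    token: str = field(default_factory=lambda: secrets.token_hex(16))


class KeyRegistry:
    """Toy provider-side store; a real OAuth provider also verifies signatures."""

    def __init__(self):
        self._keys = {}

    def issue(self, user, consumer, privileges):
        """Create a new key for this user and consumer with the given privileges."""
        key = AccessKey(consumer=consumer, privileges=frozenset(privileges))
        self._keys[key.token] = (user, key)
        return key

    def allowed(self, token, privilege):
        """True only if the token is known and carries the requested privilege."""
        entry = self._keys.get(token)
        return entry is not None and privilege in entry[1].privileges


# The same (hypothetical) user grants two consumers different privileges.
registry = KeyRegistry()
read_only = registry.issue("alice", "tag-cloud-gadget", {"read"})
read_write = registry.issue("alice", "workbench-plugin", {"read", "upload"})
```

Here the gadget's key can read but not upload, while the plug-in's key can do both, even though both keys belong to the same user on the same service.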

A developer community is growing up around the API:

Developing new Interfaces
We have two exercises in building entirely new user interfaces to myExperiment’s functionality. Firstly we are
using Silverlight to build a rich similarity search and socially-driven workspace that uses the myExperiment API
together with other common data sources like Google Search, Scholar, CiteULike, Connotea, PubMed and so on.
Figure 1 shows how a user can initiate a search from the “Top Tags” cloud built from live myExperiment data.
Secondly we have built Google Gadgets for myExperiment. Two of these are shown in Figure 2: the Tag Cloud
gadget, and the Workflows Feed Reader gadget which shows the latest myExperiment workflows.

Bringing myExperiment to existing interfaces
We have integrated with the Taverna workflow workbench by building a Taverna plugin for myExperiment, so
that Taverna users can access the myExperiment capabilities from within the Taverna environment (Figure 3).
We are currently integrating with Microsoft’s Trident Scientific Workflow Workbench [6], and for this we have
developed preliminary support in myExperiment for sharing Windows Workflow Foundation (WWF) workflows.
Finally we are working in conjunction with our open science colleagues in chemistry to bring myExperiment
together with work on Electronic Lab Notebooks and ‘blogging the lab’ [7].

Jits’s Silverlight words...

Our close work with users on improving the usability of the myExperiment website has informed many complex
requirements, one being the need for richer discovery tools. We explored the use of a Rich Internet Application
(RIA) technology, Microsoft Silverlight, in developing a simple search mashup tool.

Silverlight, in a similar vein to Adobe Flash, is an extension to the browser in which rich content and
functionality can be provided to users. This goes beyond existing web standards with the disadvantage that users
need to download additional

Our search mashup presents a clean interface that allows a user to focus on discovery without being
distracted by the other features of myExperiment. We have used the keyword search and tag cloud functionality
(via the API) to allow discovery of all public content from the repository. This includes
workflows, packs, files, groups and users.

Users of the mashup can also preview each individual search result (either inline or in a popup). If interested
they can then click through, which opens up the item in myExperiment (in a new browser window).
Because it runs within a browser, the mashup can be hosted in one place and then be accessed just like any other
web application. Our discussions with users suggest that rich mashups provide easy entry points to repositories
like myExperiment and, by taking a task-oriented approach, allow users to focus on the task at hand. More
complex and personalised workspace-oriented applications could also provide useful interfaces in which
scientists can carry out their work.


Publishing knowledge to the Web in RDF

myExperiment's ability to share information is one of its key advantages when it comes to closing the
experimental lifecycle loop. However, it is important to consider the mechanisms by which this information is
shared. The myExperiment RESTful API has already demonstrated how a machine-oriented sharing mechanism
can allow the development of new interfaces in the form of "mashups" and "gadgets". The RESTful API,
although extensible, is quite rigid and requires any linking-up to be performed client-side; RDF provides a
framework so this can be executed server-side.

myExperiment publishes all its public data as RDF. RDF has a very simple subject-predicate-object (triple)
structure that facilitates linking-up, but these relationships can be formalized using a meta-structure provided
by a schema or ontology. By formalizing, additional information can be inferred rather than having to define it
explicitly. myExperiment uses a modularized ontology set to provide its formalization. The myExperiment
ontology is modularized to promote reuse, with each module designed for a specific sub-domain, e.g. types of
annotation/contribution, attribution and creditation, packs, experiments, etc. As well as promoting reuse, the
myExperiment ontology reuses parts of other ontologies/schemas:

FOAF and SIOC for representing the social network

Creative Commons for contribution licenses

Dublin Core for common metadata properties

ORE for representing packs

Ontology Modules Architecture

Through reuse it is possible to make some sense of myExperiment data outside its domain, allowing data from
different sources to be collated. Making the myExperiment ontology reusable saves reinvention and
allows similar projects to map their data in the same way. Significant effort is being given to representing
experiments and the data they produce in such a way that their insights can be shared across multiple fields. The
Scientific Discourse subgroup of the W3C's Health Care and Life Sciences Interest Group
(pic/HCLSIG/SWANSIOC) has been considering how to reconcile a number of ontologies
that treat experiments as first class objects.

A SPARQL endpoint is a way of providing this server-side linking-up. By collating all myExperiment's RDF
data in a single data structure, known as a triplestore, it is possible to provide a very flexible querying interface
using the Semantic Web querying language SPARQL. myExperiment's SPARQL endpoint allows the execution
of simple queries that could be performed by a RESTful API call with the appropriate parameters, such as
returning all the workflows uploaded by a specific user. However, if you wanted to further restrict that to those
you had also commented on, the RESTful API would not be able to do this without additional functionality
being added.
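
To make the difference concrete, the sketch below imitates how a SPARQL basic graph pattern joins several triple patterns over a triplestore, using a tiny in-memory matcher instead of a real endpoint. The triples and predicate names (`uploadedBy`, `commentsOn`, `madeBy`) are hypothetical stand-ins, not terms from the myExperiment ontology.

```python
# Hypothetical triples; the predicate names are illustrative assumptions.
TRIPLES = {
    ("wf1", "uploadedBy", "alice"),
    ("wf2", "uploadedBy", "alice"),
    ("wf3", "uploadedBy", "bob"),
    ("c1", "commentsOn", "wf1"),
    ("c1", "madeBy", "carol"),
    ("c2", "commentsOn", "wf3"),
    ("c2", "madeBy", "carol"),
}


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Roughly: SELECT ?wf WHERE { ?wf uploadedBy alice . ?c commentsOn ?wf . ?c madeBy carol }
query = [
    ("?wf", "uploadedBy", "alice"),
    ("?c", "commentsOn", "?wf"),
    ("?c", "madeBy", "carol"),
]
results = sorted({b["?wf"] for b in match(query, TRIPLES)})
```

The first pattern alone corresponds to the simple "workflows uploaded by a user" call; adding the two comment patterns narrows the result on the server side without any new API method.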

SPARQL queries are essentially trying to map networks where one or more of the nodes or links are unknown.
myExperiment's RDF provides a listing of components (sources, sinks, processors and links) for Taverna
workflows; SPARQL provides a facility for searching for workflows where these components link up in a
specific user-defined way. This makes it possible to find a workflow that is much more tailored to a searcher's
requirements, which becomes ever more important as the number of workflows grows.

The native SPARQL results format is appropriate if this data is to be used within a Semantic Web application;
however, this is often not the case. Results may need to be ported in a more generic way or understood by a real
person. The SPARQL endpoint allows results to be exported as comma-separated values (CSV) or visualized as
an HTML table. Other more specific use cases have also arisen, in particular a capability to represent SPARQL
queries that return mappings (e.g. between users who are friends) as a matrix that can be exported as CSV. It is
important to understand that although RDF and SPARQL are Semantic Web technologies this should not
prevent them from working seamlessly with applications that are
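
The matrix export mentioned above can be pictured with a short sketch. It assumes query results arrive as (user, friend) pairs and renders them as a 0/1 matrix in CSV; the sample rows and the function name are hypothetical, and the endpoint's actual output layout may differ.

```python
import csv
import io

# Hypothetical query results: each row is one (user, friend) mapping.
rows = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]


def mappings_to_matrix_csv(pairs):
    """Render pairwise mappings as a 0/1 adjacency matrix in CSV form."""
    names = sorted({name for pair in pairs for name in pair})
    linked = set(pairs)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: an empty corner cell followed by the column names.
    writer.writerow([""] + names)
    for row_name in names:
        writer.writerow(
            [row_name] + [1 if (row_name, col) in linked else 0 for col in names]
        )
    return out.getvalue()


matrix_csv = mappings_to_matrix_csv(rows)
```

A spreadsheet or statistics package can open the resulting text directly, which is the kind of use outside Semantic Web tooling that the paragraph has in mind.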



Sean’s words are in this section

In section 5 we discuss how myExperiment is a first step towards a general notion of Research Object, which
captures aggregations of objects and also encompasses the other forms of data in myExperiment, and is part
of a greater vision of interoperable e-Laboratories. Our Research Objects are represented in RDF and we have
developed a myExperiment ontology which uses Dublin Core metadata for research objects and FOAF for the
social network information. To interwork with other repositories we adopted the Object Reuse and Exchange
representation from the Open Archives Initiative, which is based on named RDF graphs. We envisage that the
scholarly publishing process will evolve to support this more general notion of scientific research object, which
will facilitate reusable and reproducible
What is the appropriate related work?

The e

The e-laboratory, or e-lab, is a set of integrated components that, used together, form a distributed and
collaborative space for e-Science, facilitating the planning and execution of in silico experiments. An e-lab
brings together people, materials and methods in order to support scientific
Central to the idea of an e-lab is the Research Object (RO). Research Objects are conceptual objects that bring
together resources used in scientific investigation. An RO is an aggregation of resources (data sets, analysis
methods, workflows, results, people) that tells a particular story about an investigation, experiment or process.



The Research Object captures key information about the lifecycle of the investigation (for example provenance
information about analyses), facilitating reuse of results and repeatability of experiments. ROs are the work
objects that are built, transformed and published in the course of scientific experiments. ROs form a common
currency for e-lab infrastructure, with a particular e-lab being built from components and services that consume
and produce ROs.

ROs play a role in driving both the components and the capabilities within an e-lab. For example, the
e of an RO can be used within a workbench to determine appropriate visualisation methods for the
contents of the RO. ROs are not, however, simply internal to a particular e-lab platform: they will also play a
role in sharing and communicating, not just between services and components within an e-lab, but also with
other e-labs (or e-lab services). The layered approach (see below) is key to achieving this, with services able to
remain agnostic in the face of information they do not understand.

Repeat, Reuse, Replay

An RO tells a story about an investigation. It provides not just an aggregation of resources, but additional
information about the purpose, reason or rationale for the aggregation. ROs thus capture the investigation

As an example, consider researchers working with survey data (for example, a researcher working within the
Obesity e-Lab). Analyses on a complex survey may be encapsulated in statistical scripts, but those scripts are
not generally associated with the data extraction process. As a result, the steps required to cut down and extract
information from a data set before running the analysis can be forgotten. An RO that captures the entire process,
including hypothesis; data extraction; workflows used to analyse and transform data; and ultimately publication
of results, supports the notion of repeatability, providing sufficient information for a third party to validate the
investigation that has been performed.

Another key characteristic of ROs is reuse. The myExperiment experience shows the benefits of publishing
and sharing scientific workflows, where scientists can reuse and repurpose existing workflows. The use of ROs
extends this further, bundling not just workflow but also data sets, intermediate results and hypotheses.

Finally, ROs provide a facility for replay, allowing examination of the process undergone. Although related to
repeatability, replay meets a different need. For example, replay of an RO can offer a coarser-grained level of
detail, showing the major steps undertaken during an experiment; the use of Coverflow-style
previews in the Shared Genomics project [shared
allows replay of an investigation.

A Layered Architecture

The Open Archives Initiative's Object Reuse and Exchange (OAI-ORE) [ore-model] provides a standard for the
description and exchange of aggregations of Web resources. Research Objects provide an extension of ORE,
with RO-specific schemas or ontologies providing vocabulary to be used within instantiations of ORE. An upper
level RO Model describes the common characteristics across ROs (identifying key stages in the lifecycle of
ROs as they move from a draft to published state), while specific RO Domain Schemas describe extensions to
that vocabulary which are relevant to particular domains (e.g. identifying particular transformations that are
performed on social science data sets). This layered approach (see Figure RO-layers) allows components to offer
general RO functionality while remaining agnostic as to the domain-specific content. Use of emerging standards
such as ORE also facilitates interoperation between an e-lab and other digital repositories; examples include the
JSTOR repository [vandesompel09:assets], while repository infrastructures including DSpace and Fedora are
moving towards the support of OAI-ORE as an interchange format.
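
One way to picture the layering is a component that acts only on the aggregation and upper-model terms it knows, and passes everything else through untouched. The sketch below is a minimal illustration under that reading; the term names (`aggregates`, `lifecycleStage`, `surveyExtractionStep`) are assumptions chosen for the example and are not taken from the ORE or RO vocabularies.

```python
from dataclasses import dataclass, field

# Vocabulary layers; all term names below are illustrative assumptions.
ORE_TERMS = {"aggregates"}
RO_MODEL_TERMS = {"lifecycleStage"}


@dataclass
class ResearchObject:
    identifier: str
    properties: dict = field(default_factory=dict)


def generic_view(ro):
    """What a domain-agnostic component sees: ORE and upper RO model terms only."""
    known = ORE_TERMS | RO_MODEL_TERMS
    return {k: v for k, v in ro.properties.items() if k in known}


def publish(ro):
    """Generic lifecycle step; unknown domain terms are carried through unchanged."""
    updated = dict(ro.properties, lifecycleStage="published")
    return ResearchObject(ro.identifier, updated)


draft = ResearchObject(
    "ro-1",
    {
        "aggregates": ["survey.csv", "analysis-workflow", "results.pdf"],
        "lifecycleStage": "draft",
        # Domain-layer term that a generic service does not understand.
        "surveyExtractionStep": "restrict to adults",
    },
)
published = publish(draft)
```

The generic `publish` step never inspects the domain term, yet the term survives the transition, so a domain-aware service further along can still use it.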

myExperiment Packs and publication of know




Nascent Research Objects are already present in myExperiment in the form of Packs (or Encapsulated
myExperiment Objects (EMOs) as referred to in earlier work). Packs allow the aggregation of heterogeneous
content, and have been used to provide bundles of related workflows (along with additional documentation) for
training purposes. Although packs support the collection of workflows with, for example, input data and service
invocation logs, the detailed relationships between the items, which form an essential core of the Research
Object notion, are not explicitly represented.

As discussed in Section 4, public information describing myExperiment objects, including the Pack structures, is
published as RDF data. This publication of Pack information makes use of the
ORE vocabularies and schemas, providing the layered approach as described above. In addition, the
publication of data reuses other ontologies and schemas such as FOAF and SIOC for representing the social
network; Creative Commons for contribution licenses; and Dublin Core for common metadata properties. Again,
this serves to increase interoperability and facilitates the consumption of myExperiment data in other

In [Barend Mons], the problem of "knowledge burying" is highlighted, where knowledge about investigations or
experiments is published in paper form, and text mining techniques are required to extract this knowledge,
leading to inefficient transfer of information (see Figure burying). A view of "Research Object as publication",
packaging and associating data, results and methods as part of the publication process, helps to overcome some
of these issues by ensuring that information and knowledge is not lost during that publication process.


Agreed on chat this will be worked inline more pervasively

important because part of
service approac
h and social curation story

Although the services available are well defined from a protocol and structural point of view, they are often
poorly described in their usage, with very little documentation available on the internet. Much of the
knowledge and information on the use of services is captured within the minds of the bioinformaticians and in
the scientific workflows that utilise the services. There is a severe need to capture this knowledge and allow
l and community curation on it.

BioCatalogue provides an actively curated and centralised registry of Web services. Scientists can use and rely
on BioCatalogue as a starting point in discovering what services they can utilise in their work. BioCatalogue
aims to support users in the whole lifecycle from service discovery to service usage. Whilst myExperiment is a
repository of scientific research objects, BioCatalogue is a registry, where rich and accurate descriptions; loose
and semantic tagging; and in-depth examples add the most value. Monitoring and recording service availability
is another key factor in assessing the usability of web services and adds significant value to the d

Social curation is a major challenge in an environment where many different types of stakeholders exist:
service providers, service developers, end users and professional curators. The design of the technical
functionality requires careful understanding of the different tensions between and priorities of these
stakeholders. We ensure that the BioCatalogue does not endorse any negative community feedback but that we
facilitate quality and expert curation from users and professional curators. This requires not only highly usable
technical features and interfaces but also policies that allow transparency in how we handle
situations of libel and such.

Much like myExperiment, BioCatalogue is also a Web service itself, allowing other applications and clients
(such as the Taverna workflow workbench) to bring the functionality to their users. By integrating BioCatalogue
with myExperiment and other tools that scientists use, we bring BioCatalogue intrinsically into the user's work
patterns and daily tasks, at the point of usage. This allows BioCatalogue to really add value to the user.

The development of the BioCatalogue service follows many of the same principles and methodologies as the
myExperiment project and there is significant overlap between the two projects: code sharing,
collaborative design and regular joint meetings. The release cycle (driven by a perpetual beta process) and
development environment match those of myExperiment, and we take the same user-driven design and
development approach, with a dedicated "friends" community and a community liaison officer who manages this.


The results of our study of myExperiment users reflect a community in formation, whose members’ attitudes,
intentions and expectations are diverse and evolving. As such, significant changes in attitudes, intentions and
expectations are likely to occur as the activities of community members filter through the networks of
relationships, reinforcing successful innovations and defining good practice.





Future work…


Please add people to this alphabetical list

The design of myExperiment and Research Objects has been a collaborative exercise involving a large group
including Mark Borkum, Les Carr, Simon Coles, Phil Couch, Catherine De Roure, Tom Everleigh, Fisher,
Jeremy Frey, Antoon Goderis, Matt Lee, Cameron Neylon, Savas Parastatidis, Meik Poschen, Marco Roos,
Robert Stevens, Franck Tanoh, David Withers, Katy Wolstencroft. myExperiment is funded by the JISC Virtual
Research Environments programme, Microsoft Technical Computing Initiative



Will be managed in endnote

just drop text inline or here f
or now so I know what they are!

[Barend Mons] B. Mons. Which gene did you mean? BMC Bioinformatics 6(), p.142 2005 DOI: 10.1186/1471

[1] Waldrop, M. Mitchell, “Science 2.0: Great New Tool, or Great Risk?”, Scientific American, published online
January 9, 2008.

[2] Borda, Ann, et al. Report of the Working Group on Virtual Research Communities for the OST e-Infrastructure
Steering Group. London, UK, Office of Science and Technology, 46pp. 2006.

[3] Gil, Y., Deelman, E., Ellisman, M. et al. “Examining the Challenges of Scientific Workflows”. IEEE Computer 40(12):
32. 2007.

[4] De Roure, D., Goble, C. and Stevens, R., “Designing the myExperiment Virtual Research Environment for the
Social Sharing of Workflows,” IEEE International Conference on e-Science and Grid Computing, pp.603-610,
10-13 Dec. 2007.

[5] De Roure, D., Goble, C. and Stevens, R., “The Design and Realisation of the myExperiment Virtual Research
Environment for Social Sharing of Workflows”, Future Generation Computer Systems, published online July 2008.

[6] De Roure, D. and Goble, C. Six Principles of Software Design to Empower Scientists. IEEE Software. In Press, 2009.

[7] Oinn, T., Greenwood, M., Addis, M. et al. “Taverna: lessons in creating a workflow environment for the life
sciences,” Concurrency and Computation: Practice and Experience 18(10), Aug. 2006, 1067

[8] Lin, Y., Poschen, M., Procter, R. et al. “Agile Management: Strategies for Developing a Social Networking Site
for Scientists,” in 4th International Conference on e-Social Science, 18-20 June 2008, Manchester, UK.

[9] Neylon, C. Openwetware blog. See

[10] Goderis, A., De Roure, D., Goble, C., Bhagat, J., Cruickshank, D., Fisher, P., Michaelides, D. and Tanoh, F.
“Discovering Scientific Workflows: The myExperiment Benchmarks,” IEEE Transactions on Automation Science and
Engineering. (Submitted 2008)

[11] O’Reilly, T. What is Web 2.0?

Axelrod, R. (1997). Complexity of Co-operation: agent based models of competition and collaboration.
Princeton, NJ: Princeton University Press.

Fleck, J. (1993). Innofusion: feedback in the innovation process. In: Stowell, S.A., West, D. and Howell, J.G.
(eds.) Systems Science: addressing global issues, Kluwer Academic / Plenum Publishers, pp. 169

Fogg, B. J. and Tseng, H. (1999). The elements of computer credibility. In Proceedings of CHI 99,
New York, NY: ACM, pp. 80

Kipnis, D. (1996). Trust and Technology. In R. M. Kramer and T. R. Tyler (eds.): Trust in
Organizations: Frontiers of Theory and Research, London: Sage, pp. 39

Luhmann, N. (1990). Familiarity, Confidence, Trust: Problems and Alternatives. In Gambetta, D.
(ed.): Trust: Making and Breaking Cooperative Relations, Oxford: Basil Blackwell.

Piwowar, H., Day, R. and
Fridsma, D. (2007). Sharing Detailed Research Data Is Associated with Increased
Citation Rate. Nature Precedings : doi:10.1038/npre.2007.361.1

Williams, R., Stewart, J, Slack, R. (2004). Social Learning in Technological Innovation, Cheltenham, Edward