4.6 Test bed with “Real-Life Applications” [SP5]


Oct 1, 2013





Scientific Background

The biotechnological -omics revolution in life sciences research is leading to a bioinformatics revolution as well. New bioinformatics applications, being methods, tools and infrastructure, are essential to -omics life science experimentation. Analogous with biotechnological applications, they are usually developed, tested and used in a local environment with few, yet expert, users. If proven valuable, over time the life sciences community will want to use both biotechnological and bioinformatics applications. This means these applications have to be scaled up and made accessible to the growing user community. Additional intuitive interfaces and advanced visualization techniques to use these applications might be required, as the intended user group extends beyond bioinformatics experts. Scaling bioinformatics applications is not a trivial exercise. It is an often underestimated effort. It requires rigorous testing of every application in an environment with sufficient well-chosen biological, medical or pharmaceutical end-user situations and may require ICT-related research as well.

To explain the function of a test bed area for bioinformatics, Figure 1 depicts a schematic representation of the developmental route bioinformatics applications follow into maturity. Three distinct phases can be distinguished:

1. Research and development
2. Scaling, testing, and validating
3. Support



All phases require an appropriate environment, comprising ICT infrastructure, tools and/or applications, that supports the specific demands of the activities of each phase.

The research and development phase is driven by a biological research question and starts off with the development of a specific life sciences application, aimed towards answering this question in the context of genomics biotechnologies and/or -omics data, that require a bioinformatics application to deal with. Therefore, this first phase involves bioinformatics research to develop and test local bioinformatics applications consisting of methods, tools and/or infrastructure.

These local bioinformatics applications, which are used by (few) expert users, are the starting point for the scaling, testing, and validating phase. In this phase the application will be adapted so as to serve the numerous (non-expert) users it eventually is intended for. Only those applications that are considered to be valuable for an extensive life sciences research community will be considered for scaling, because this will bring on many ICT requirements and demands quite an effort. Scaling can be done in two different ways: direct scaling in an ad hoc approach using a dedicated environment for each application, or scaling via embedding the application in an integrated environment, like the VL-e environment. In both approaches a test bed is needed for testing and validation. This test bed will use well-chosen life science applications to test and validate the scaled bioinformatics applications.

Figure 1: Developmental route of bioinformatics applications

The support phase starts off with scaled, tested and validated bioinformatics applications that will be placed in a support environment. This environment supports both directly scaled and integrated bioinformatics applications by providing ICT support and application expertise. Life science end users will use the application once it is available in the support environment.


Scientific & Technological Innovation

Applications are essential in each developmental phase and every area of biotechniques and bioinformatics innovation. Therefore, there are many types of applications, e.g. life sciences research applications, bioinformatics applications, test bed applications and so on. In general, they provide a focus for developing new methods, instrumentation and software components and are the ultimate reference point in testing and validating the quality and effectiveness of the results. To allow applications to perform their critical role in testing and validation, an innovative and comprehensive test bed environment must be available. In this test bed, applications should be tested on a scale comparable to the “real-life” scale on which the applications eventually will be used. Scale in this context refers to number of users, number of computer resources, bandwidth, network configuration, and amount of data. Establishing the test bed environments for scaling up and validation with “real-life” applications is the key aspect of this subprogram.

International developments

In itself it is not uncommon to secure suitability of any application in a production environment by taking it through a complete developmental chain, from research and development environment, via scaled-up test and validation environment, to production and support environment. In various areas of IT research test bed environments are more established; for instance, all major grid projects to date have recognised the need for such an environment to validate their research.

Obviously, in industrial research and development, test bed environments are standard. However, to our knowledge, in life sciences and in particular in bioinformatics, no major international initiative exists that includes such an explicit test and validation environment, and certainly not in the context of a strong integration with -omics bioinformatics research such as present in the other BioRange subprograms.


Because of the nature of academic research, there is often a major gap between research and development of methodologies, tools and infrastructure and their implementation. Usually, bioinformatics applications will be developed and tested in a local environment, on a local scale, and with local life science applications. If proven successful, they usually are published and employed locally, or are left orphaned, because implementation requires different research expertise and is typically not funded. This severely hampers the dissemination of these bioinformatics applications and expertise throughout the life science community, although many scientists could benefit from them. This was the incentive to include a subprogram in the BioRange project for implementation by scaling and testing in a test bed environment with real-life applications.

Moreover, by introducing the test bed subprogram, the whole chain from research up to a supported environment is in place. Within the BioRange subprograms SP 1-3, bioinformatics applications will be researched and developed. Proven bioinformatics applications from these subprograms, and from other research activities outside the project, that are deemed valuable by enough life science researchers will be scaled and validated within this subprogram’s test bed. Finally, scaled and validated applications will be implemented in the BioASP support environment.

The uniqueness of the test bed originates also from the fact that the tests will be carried out using several challenging “real-life” life sciences applications, among others provided by the

Furthermore, these real-life application test cases are different from the development applications and will be tested in another environment than the local research and development environment, thus further ensuring proficient scaling and validation.

Typically, scaling efforts concern one or a limited number of bioinformatics applications that are independently brought from a local research and development environment to a dedicated environment for general usage. In this subprogram two major approaches towards scaling up and validation are pursued: the direct approach and an integrated approach. Practical considerations concerning the nature of the application (e.g. legacy issues) may lead to the choice for scaling up in a dedicated test and validation environment. Hampering issues will be avoided in this development process and results can be delivered in the timeliest way; however, each application will require a specific approach towards its diversity. Considerations of implementation time and feasibility will play a prominent role. Finally, no interoperability between different applications is established and the consequence will be a collection of individual scaled applications, each in a dedicated environment.

In the integrated approach, applications will be scaled and validated in an integrated environment, such as the VL-e Proof-of-Concept environment. Here, diversity is essential in validating completeness and applicability of VL-e methodology for bioinformatics at all possible scales. Once this has been established, an integrated concept is available that allows bioinformatics applications from the BioRange program as well as others to be assembled as modules of a generic environment. On a national scale, innovative environments can be created on demand, incorporating resources for computing, visualisation, high-bandwidth networks and security. By providing test cases, methods, tools, ICT infrastructure, and expertise, this subprogram will allow a common and globally available infrastructure to be established.

With the key activity of this subprogram, establishing test beds with real-life applications, the BioRange program will base itself and build upon hardware and software infrastructure that is the result of other Bsik programs, in particular VL-e where Grid and basic application support are involved and GigaPort where networking is involved. The integrated test bed environment will be built on top of the VL-e Proof of Concept environment. The connection of a bioinformatics program with strong ICT programs will put the whole BioRange program, but in particular this subprogram, in an excellent starting position to achieve its mission.

The construction of the integrated test bed environment will comprise both hardware components and a comprehensive, consistent and innovative software environment. It not only offers a means for validating the resulting applications of the other BioRange subprograms, but also poses challenging complex system design problems on which bioinformaticians and computer scientists can work together in a multidisciplinary effort. It requires integration of software developed inside the BioRange and VL-e programs with technology developed elsewhere through clear, open-standards-based interfaces and exchange of expertise. The hardware infrastructures, i.e. sets of computers, storage and networks, that will be used by this subprogram will differ in scale, level of heterogeneity, and in the characteristics of communication. Eventually, judging quality and effectiveness of the implemented bioinformatics applications is a task performed in a concerted action by a multidisciplinary team. This cooperation may well be a stimulus for further


Scope of the subprogram in the context above

The key question for this subprogram is: will the bioinformatics methodology developed within BioRange, and the resulting software, scale for use in real-life applications? The objective is to show scalability and reliability of bioinformatics applications using realistic test bed environments with challenging real-life test applications in the realm of life sciences, while at the same time making these bioinformatics applications available in a dedicated or integrated environment to a broad range of life sciences researchers that are non-experts. The -omics test applications are based, among others, on new biotechnologies such as microarrays for genomics and transcriptomics and mass spectrometry for proteomics. In order to realize this objective, the focus will be on several research issues:

- Selecting valuable bioinformatics applications to scale, and selecting appropriate real-life test cases in life sciences to test them.
- Building an integrated problem-solving environment for bioinformatics applications.
- Scaling and validating bioinformatics applications with real-life applications in life sciences.
- Providing suitable non-expert end-user interfacing for bioinformatics applications.

Except for building an integrated problem-solving environment for bioinformatics applications, all issues apply to the dedicated as well as to the integrated approach for constructing a test bed. First of all, criteria have to be drawn up for the selection of bioinformatics applications that will be scaled up. Present software is the set of existing commercial and open source applications, algorithms that are developed using this software, and applications that emanate from the other BioRange subprograms or from research and development activities outside the realm of the BioRange program. The eventual environment delivered by this program will also leverage complementary work that implements emerging and established standards in life science applications, such as MIAME (Minimum Information About a Microarray Experiment) for microarrays. The selection criteria for applications might include the development stage of the application, the acceptance through peer review, the anticipated number of end users, the foreseen impact on life sciences, and the feasibility for scaling. Likewise, real-life applications in the area of life sciences have to be defined that can serve as test cases for the scaling and validation of selected bioinformatics applications. These test cases should really reflect the employment of these applications in everyday life sciences research. Therefore, an inventory of requirements should be set up, concerning the general character of the research question, the number of end users, their localisation and use of the bioinformatics application, and parameters for success.

In order to be able to integrate bioinformatics applications, there has to be a suitable problem-solving environment. In this subprogram, this integrated problem-solving environment will be built intertwined with the proof-of-concept environment of the ICT program VL-e (program “Scaling up to and validating in real-life applications”). A lot of informatics problems have to be addressed in building a problem-solving environment, such as proper integration of software, use of generic and reusable VL-e components, seamless access to globally available resources, and parameters for reliability and availability of the environment.

Once the bioinformatics applications and associated real-life test applications are selected, scaling and validating in the test bed environment can start. In case of the integrated approach a basic problem-solving environment should be in place. There are several practical and technological issues that require attention, such as the chosen scaling approach, the practical execution of the test, the technical aspects of the scaling, and the parameters for validation. Technical scaling issues in this respect involve, among others, cooperative federated information management on the database level, intelligent user interfaces on the middleware level, and high-power computing on the grid level.

Even in the test bed environment the initial users are usually domain and application experts. However, in many cases the ultimate end user is no expert. Therefore effort has to be put into bridging this final gap to the real-life end users by providing suitable non-expert interfacing for the bioinformatics applications. This mainly involves establishing the minimal requirements for intuitive interfacing plus development and evaluation of those interfaces in real-life research.

The scaling-up and validation work is complementary to the four other subprograms and is divided into four research themes that will not be further subdivided into projects.


Knowledge to be gained & skills to be developed

The primary knowledge that is gained and skills that are developed concern the construction of a test bed environment with real-life applications. Obviously there are several technical issues involved, such as fitting the various bioinformatics and ICT standards into the test bed environment. Over the course of this subprogram we anticipate also learning about and implementing an iterative development cycle between bioinformatics application research and development, scaling and testing, and support phases. How to achieve such an iterative developmental cycle is key knowledge to be gained in this subprogram. Furthermore, major advancements are expected in the area of the integrated bioinformatics test environment, which is built in close collaboration with, and as an extension of, the ICT proof-of-concept environment in the VL-e program. Because this subprogram exercises an approach that involves several science areas, there will certainly be significant progress in multidisciplinary collaboration. Therefore, in a more general sense we expect to learn more about:

- value assessment and selection of bioinformatics applications for use by a general population of life sciences researchers.
- technical problems and solutions to achieve dedicated as well as integrated test bed environments.
- use of ICT and bioinformatics standards to realize smooth scaling and integration.
- scaling of bioinformatics applications into integrated environments, such as the VL-e proof-of-concept environment.
- definition and selection of appropriate real-life applications in life sciences that can serve as test cases for scaling and validation of bioinformatics applications.
- implementation of an important developmental cycle between the various environments of each developmental phase, being: research and development, scaling and validation, and support.
- multidisciplinary collaboration necessary to tackle all issues related to scaling and integration of bioinformatics applications.
- the true value of the test bed concept for scaling bioinformatics applications to a real-life scale.


Scientific Approach & Methodology

This subprogram has a strong dependency towards the other subprograms of the BioRange project, as shown in Figure 2. Some of BioRange's applications from subprograms 1 to 3 are developed in a non-specified local environment. Such an environment can be adapted especially to meet the requirements set by the biological question under study, but more often it will be made up of just what is available at an institute.

Other BioRange applications from subprograms 1 to 3 are developed in the VL-e Rapid Prototyping Environment of subprogram 4. This also is a local environment where a large amount of freedom exists in the choice of ICT infrastructure and databases / models, but the applications are developed under the e-science paradigm and from the start the aim is to embed analysis tools and applications in a problem solving environment.

Both environments can use external tools and can be connected to external datasources. External tools can be open source tools, like R for statistical computing, for the analysis and comprehension of -omic data. They can also be commercial applications, e.g. for statistical analysis or for data analysis and visualization. External datasources, like the Swiss-Prot Protein Knowledge Base, are publicly accessible. They can be accessed in different ways (e.g. web interface, ftp access), which puts limitations on their use inside a tool.

Figure 2: Test bed in the BioRange project

The applications in the test bed, whether the result of scaling up applications from the non-specified Local Environment to a Dedicated Test & Validation Environment or from the VL-e Rapid Prototyping Environment to the VL-e Proof of Concept Environment, use these external tools and connect to external datasources. Scaling up an application encompasses the following phases, which correspond with the objectives in this subprogram.

Selection of applications and test cases.

In general, the selection of applications for scaling up will be a weighting of acceptance through peer review, the anticipated number of end users, the foreseen impact on life sciences and feasibility. When an application is deemed to have a high impact on life science, but it seems not yet feasible to integrate it in the VL-e Proof of Concept Environment, a choice for scaling up in the Dedicated Environment may be made. Or initially, when the Problem Solving Environment still has to be built for the greater part, scaling up may be postponed. Some applications, especially with respect to use of programming paradigm, programming language, modularity and adaptation to standards, will be more ready for scaling up than others. This means that practical considerations, i.e. the amount of work to be done, will also determine the choice for (the moment of) scaling up.
Here, the gain in performance that can be achieved by scaling up is also important.

For the VL-e Proof of Concept Environment, an important criterion is the added value that arises from placing the application in an integrated environment. (For example, an application that compares results from different microarray platforms will benefit from an environment in which results from those platforms can be retrieved from external data sources through a similar user interface.)

Selecting an application is a multidisciplinary effort, in which requirements from the biological domain and from the ICT domain have to be weighed against each other.

A start will have to be made by drawing up these requirements, concerning the general character of the research question, the number of end users, their localisation and use of the bioinformatics application, ICT parameters like performance, bandwidth and stability, and parameters for success. This inventory can then also be used to choose the real-life test cases.

Note that if scaled-up applications offer major advantages in terms of interoperability, performance or stability, it will be easier to persuade researchers to test (work) with them.
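
The weighting of selection criteria described in this section can be illustrated with a minimal sketch. The criterion names follow the text, but the numeric weights, the 0-to-1 scoring scale and the example candidates are hypothetical assumptions made only for illustration; the proposal does not prescribe any particular formula.

```python
from dataclasses import dataclass

# Hypothetical weights: the text names the criteria, not their relative importance.
WEIGHTS = {
    "peer_review_acceptance": 0.25,
    "expected_end_users": 0.25,
    "life_science_impact": 0.30,
    "scaling_feasibility": 0.20,
}


@dataclass
class Candidate:
    """A bioinformatics application considered for scaling up."""
    name: str
    scores: dict  # criterion name -> score between 0.0 and 1.0 (assumed scale)


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of the candidate's criterion scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name}: missing scores for {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates, weights=WEIGHTS):
    """Order candidates from most to least suitable for scaling up."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


if __name__ == "__main__":
    # Purely illustrative candidates, not applications from the project.
    pool = [
        Candidate("tool_a", {"peer_review_acceptance": 0.9, "expected_end_users": 0.4,
                             "life_science_impact": 0.8, "scaling_feasibility": 0.3}),
        Candidate("tool_b", {"peer_review_acceptance": 0.6, "expected_end_users": 0.8,
                             "life_science_impact": 0.7, "scaling_feasibility": 0.9}),
    ]
    for c in rank(pool):
        print(c.name, round(weighted_score(c), 3))
```

In practice such a score would only support the multidisciplinary discussion; a high-impact application with low feasibility for integration might still be routed to the dedicated environment, as noted above.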

Determination of the requirements set by the application looking at the scale of its intended use. Configuration of the application in the scaled environment. Validation of the scaled application.

In the local environments, constraints set to programming paradigms, programming languages,
used bandwidth and used standards are, if any, local. On the test bed, this freedom will be
limited, but there will be a gain in adaptability, interoperability, security and u

Reconfiguration and, where necessary, recoding of (parts of) the applications will in the integrated approach be done in close collaboration with the VL-e project. In the dedicated approach this is not the case and ad hoc collaboration with skilled informaticians in the domain of high-performance computing and networking will have to be established.

In the context of this project it is remarkable and telling that standards are at the centre of attention in both the ICT and the bioinformatics domain. The possibilities that modularity offers for connecting subdomains apply to both disciplines. Standards in the biology/bioinformatics domain are understood to be standards that model a particular aspect of reality, e.g. microarray (MAGE-OM) or PPI experiments (PEDRo, Proteomics Experiment Data Repository, Nature Biotechnology, March 2003). Other formal descriptions of biological reality also fall under this heading (e.g. Open Biological Ontologies, http://obo.sourceforge.net/main.html). In bioinformatics the standards are still emerging; part of this subprogram is to examine how these standards can be used. This holds for both approaches, although it is of vital importance for the VL-e PSE.

The use of these standards in, or the integration of standards in the underlying datamodel of, the scaled-up environment has a number of advantages:


- acceptance and validation. Standards are accepted by the user community. By virtue of the general agreement within a standard on means of measurement and information content of experiments, standards, where available, will offer the core of an environment for validating methodology and software. Users are familiar with standards, which makes it easier to build user interfaces. Standards also offer the possibility to collaborate internationally with contributors to the standard.
- developmental speed. Models that underlie standards often are the product of a lengthy cooperation of a panel of international domain experts. Adopting standards in the PSE can therefore decrease both the time needed for the assembly of the validation infrastructure and the developmental time of the Proof of Concept.
- Standards are to be used over a prolonged period of time. A good standard therefore is modelled at an abstraction level that is able to foster new technologies that are not available yet. Within the BioRange project, where new methodologies are to be added into the scaled environments, it is this quality of standards that may speed up their creation to a great extent.
- Software developers in open source as well as in the commercial domain tend to conform to standards. Hence, (part of) the underlying datamodel of these software applications is identical. Moreover, APIs may be available that facilitate software integration through a standard-driven model.

The result of this program line is knowledge about how to mould these (different) standards into the scaled applications and PSEs.

Besides standards in the above sense, there are also de facto standards among analysis tools. Algorithms for visualisation, statistics and methods in statistical learning are complex. Implementations of these algorithms often reach a state of stability only after a long time of recoding and error removal. Therefore, to ease the burden of validating these algorithms, in the granular and integrated approach it must be made possible to run unmodified implementations of these algorithms. The means of setting up, interfacing and upscaling basic mathematical and statistical applications like Matlab (http://www.mathworks.com), R, S and SAS (http://support.sas.com/rnd/app) is important knowledge to be acquired in this subprogram.
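
One way to run an unmodified implementation is to leave the tool untouched and wrap it as an external process. The following is a minimal sketch of that idea, not the project's actual interfacing layer: the function name `run_unmodified_tool` and the stand-in tool are hypothetical, and a small Python one-liner that computes a mean takes the place of a validated statistical package.

```python
import subprocess
import sys


def run_unmodified_tool(command, input_text, timeout_s=60):
    """Run an external analysis tool as-is and return what it writes to stdout.

    The tool itself is not changed; the wrapper only feeds data to its
    standard input and collects its standard output.
    """
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"tool exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Hypothetical stand-in for a validated external tool: reads one number per
# line from stdin and prints the mean.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys; xs = [float(l) for l in sys.stdin if l.strip()]; print(sum(xs) / len(xs))",
]

if __name__ == "__main__":
    print(run_unmodified_tool(STAND_IN_TOOL, "1\n2\n3\n").strip())
```

In a real setting the command list would point at the installed package instead, for example an R script run through `Rscript` (the script itself would be site-specific). Validation effort can then concentrate on the wrapper, since the algorithm's implementation is unchanged.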

Although all BioRange applications will be made available as open source software, this does not imply that all constituents in the Proof of Concept will be open source software or even be software using open standards. In bioinformatics a number of broadly used applications are commercial. SRS for data integration, Matlab (http://www.mathworks.com) for numerical computations with matrices and vectors, Rosetta Resolver for microarray analysis and Spotfire for data analysis and visualisation are a few examples. The depth to which commercial applications will be integrated in the scaled environments cannot be addressed in general, because it depends on the make of the commercial software. However, when an algorithm or application from one of the BioRange program lines uses a commercial application, integration is mandatory. Because in commercial software the building of user interfaces often has received much attention, building the scaled application can, especially in this realm, benefit from integration of commercial software.

Because mirrors of important external datasources run at SARA, fitting them into the scaled environment is always possible.

Embedding the application in the Problem Solving Environment and extending the Problem Solving Environment by it.

Certification for use of the scaled software in the grid environment will follow the route set by the VL-e program.

In the integrated approach the phasing is important: a functional set of grid software components is made available early on in the project, as a basis to create a BioRange real-life environment. A first version shall necessarily include many components imported from Open Source Grid projects in the Netherlands, Europe and the US. Over time, this set of software will be enhanced with results emanating from VL-e, such that the application-level PSEs can optimally match the globally available software components using open standards. All grid software will be available as open source, as is now customary worldwide, for applications and developers.

Development of a Problem Solving Environment, i.e. an application that supports all stages of the experimental process, requires a coordinated cooperation between informaticians (e.g. experts in the field of collaborative information management, intelligent user interfaces, algorithm parallelization etc.) and domain experts. Therefore Program line 4 of the VL-e project and Program line 5 of BioRange must be closely intertwined.

Input from the bioinformatics domain in this cooperation will be focused on embedding of specific BioRange applications in a more generic framework.

With regard to infrastructure and databases/models, the focus will be on the use of (biological/bioinformatics) standards, the combined use of several of these standards, model optimisation and contributing to international initiatives.

With regard to the development of analysis tools: their validation will be made easier if as many reusable components as possible are used (either from the VL-e layers, or by using wrappers that implement already validated algorithms).

Intelligent user interfaces are a means to arrive at a more uniform environment that can support the end user to a large extent in the experimental process.

In close cooperation with program line 4 from the VL-e project, applications must be embedded in the PSE.

Design of non-expert end user interfacing for bioinformatics applications.

Design of user interfacing is part of the construction of a PSE, but it is also important for the dedicated approach.

Design of interfaces is a research area in itself.

Excerpt from original chapter 6:

Even the best software fails to fulfil its promise when its users have to approach the system through a non-intuitive, labour-intensive, or even plain incomprehensible interface. For NBIC, the software constitutes a set of tools. To design a good interface we need a better understanding of the intended user group, biologists. Within the group, there will be individual preferences. We will approach the design of NBIC interfaces as an integrated design effort that combines behavioural and technical research to arrive at interfaces that allow users to work with minimal overhead. It is the combination of behavioural and technical research, applied to a committed and well-characterised user group, which makes this research unique in the world. Never before have behavioural and technical aspects been considered on a par and in an integrated way in a project of this size that will produce concrete results to be offered to the target population. The state of the art in the fields addressed by this WP can be characterised as on their way to maturity.