10.20: Alan Robinson: Middleware Technologies in Bioinformatics


Oct 2, 2013


Middleware Technologies in Bioinformatics

Alan Robinson

Talk Outline


What is middleware?


Why middleware?


What does middleware offer bioinformatics?


Who is middleware for?


Some middleware technologies


Some practical uses of middleware

What is middleware?


The “stuff” between:


Hardware & software


OS & applications


Applications


Properties of middleware:


“Glue”


Integration


Interoperability


Infrastructure


Enables distributed computing.


Why Middleware?


Bioinformatics requires:


Access to multiple distributed resources


Up-to-date information


Minimal data redundancy


Robust applications


Extendable applications


Monolithic App. vs. Components


Portable software

Why Middleware?


Bioinformatics must contend with
increasing amounts of information


Bioinformatics must adapt to changing and
new technologies


Bioinformatics can learn from IT
experiences in other domains


For example:


Microarray data accentuates these problems of integration and interoperability, but they apply to HTP sequencing and proteomics as well.

Pathways

EnsEMBL: Human Genome Gene Annotation
EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL: Protein Sequences
Array-Express: Microarray Expression Data
EMSD: Macromolecular Structure Data
IntAct: Protein Interactions

What do we know about other molecules involved in that pathway?

Interoperation & Integration

Middleware Technologies


CORBA:


Object-oriented / components


Web services:


XML-based


Grid:


Next generation?

CORBA Overview


CORBA allows the interconnection of
databases and applications, regardless of:


The computer language of the applications
that provide or use the objects


The machine architecture of the computers
involved


The geographical location of the computer
(connection through the Internet/Intranet)



CORBA

[Diagrams: CORBA request path. Labels: Client, Stub, ORB, Skel, Object; in the two-ORB version the ORBs are linked by IIOP]
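
The stub/skeleton arrangement in the diagram can be sketched in plain Python. This is not CORBA and uses no ORB library: it is a minimal illustration that assumes a hypothetical SequenceStore object, uses JSON in place of the IIOP wire format, and replaces the network with a direct function call.

```python
import json


class SequenceStore:
    """Hypothetical server-side object (the 'Object' in the diagram)."""

    def __init__(self):
        self._seqs = {"seq1": "ATGGCC"}

    def length(self, seq_id):
        return len(self._seqs[seq_id])


class Skeleton:
    """Server side: unmarshals a request and invokes the real object."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, wire_bytes):
        request = json.loads(wire_bytes.decode("utf-8"))
        method = getattr(self._servant, request["op"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client side: looks like the object, but only marshals calls."""

    def __init__(self, transport):
        self._transport = transport  # callable: bytes -> bytes

    def __getattr__(self, op):
        def remote_call(*args):
            wire = json.dumps({"op": op, "args": list(args)}).encode("utf-8")
            reply = self._transport(wire)
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


# The "ORB" here is just a direct function call (an assumption for the
# sketch); a real ORB would carry the bytes to another process or machine.
skeleton = Skeleton(SequenceStore())
store = Stub(skeleton.dispatch)
remote_length = store.length("seq1")
print(remote_length)
```

The client code only ever touches the stub, which is why the language, machine architecture and location of the real object can vary without the client changing.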

CORBA Services


Naming service


Trader service


Event service


Notification service


Object transaction service


Security service


LifeCycle service


Relationship service


Persistent state service


Externalisation service


Object query service


Object properties service


Concurrency service


Licensing service


Secure time service


Object collection service


...

Object Management Architecture

[Diagram: Object Request Broker connecting Application objects, Vertical CORBA facilities, Horizontal CORBA facilities and CORBA services]

Web services


A collection of XML-based technologies developed by the e-business community to address issues of:


Service discovery - Business processes


Interoperability - Data exchange


Major developers include:


Apache, IBM, HP, SUN & Microsoft (.NET)


http://www.webservices.org/


http://www.ibm.com/developerworks/webservices/

Web Services Architecture

Web Services Stack

SOAP


Simple Object Access Protocol


A lightweight protocol for exchange of
information in a decentralized, distributed
environment


A design goal is to encapsulate RPC calls
using the extensibility and flexibility of XML



http://www.w3c.org/TR/SOAP/



XML Messaging Using SOAP
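
As a rough illustration of an RPC call wrapped in a SOAP message, the sketch below builds and reads back a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the application namespace, the getSequence operation and its accession parameter are hypothetical, and nothing is sent over a network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:sequence-service"  # hypothetical application namespace


def build_request(operation, params):
    """Wrap an RPC-style call (operation + parameters) in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]
    operation = call.tag.split("}", 1)[1]
    params = {child.tag: child.text for child in call}
    return operation, params


message = build_request("getSequence", {"accession": "X56734"})
operation, params = read_request(message)
print(message)
```

Because the whole call travels as an XML document, the receiver needs only an XML parser, not the sender's language or object model.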

WSDL


Web Services Description Language



A specification to describe networked XML-based services


A simple way for service providers to
describe the basic format of requests to
their systems regardless of the underlying
protocol


http://www.w3.org/TR/wsdl/
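
To make the idea of a machine-readable request format concrete, here is a minimal sketch that reads a small WSDL 1.1 document and lists the operations it declares. The WSDL namespace is the real 1.1 one; the SequenceService description itself is a made-up example, trimmed to messages and a portType with no bindings.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical service description, for illustration only.
SAMPLE_WSDL = """<definitions name="SequenceService"
    targetNamespace="urn:example:sequence-service"
    xmlns:tns="urn:example:sequence-service"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="getSequenceRequest">
    <part name="accession" type="xsd:string"/>
  </message>
  <message name="getSequenceResponse">
    <part name="sequence" type="xsd:string"/>
  </message>
  <portType name="SequencePort">
    <operation name="getSequence">
      <input message="tns:getSequenceRequest"/>
      <output message="tns:getSequenceResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return {operation name: (input message, output message)}."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            operations[op.get("name")] = (inp.get("message"), out.get("message"))
    return operations


operations = list_operations(SAMPLE_WSDL)
print(operations)
```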

UDDI


Universal Description, Discovery and
Integration


UDDI creates a platform-independent, open framework & registry using the Internet for:


Describing services


Discovering businesses


Integrating business services



http://www.uddi.org/

WSFL


Web Services Flow Language



An XML language for the description of
Web Services compositions



Describes how Web services may be
composed into new Web services to
support business processes



http://www-4.ibm.com/software/solutions/webservices/pdf/WSFL.pdf

Web Services Stack

Proposed Specifications for Web services


Quality of service (WS-Quality)


Transactions


Service level agreements (WS-SLA)


Security (WS-Security)


Interactivity (WSXL)


...



Serious investment & momentum in WS.

Semantic Web


An evolution of the current Web


Information is given well-defined meaning, better enabling computers and people to work in co-operation


Data defined and linked for more effective
discovery, automation, integration, and reuse
across various applications


Enable users/agents to locate, select, employ, compose, and monitor Web-based services automatically


http://www.semanticweb.org/

DAML+OIL


A markup language with a rich set of
constructs that allow for the creation of
complex and robust ontologies


Written in RDF & RDF Schema, but provides richer modelling primitives


Intent is to provide additional machine-processable semantics for resources



http://www.w3.org/TR/daml+oil-reference

Example of DAML



"Parenthood is a more general relationship than motherhood" and "Mary is the mother of Bill" together allow a DAML system to conclude that "Mary is the parent of Bill"


If a user asks a DAML search system "Who are Bill's parents?"


The system can respond that "Mary is one of Bill's parents", even though that fact is not stated anywhere and can only be derived by a DAML application.
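
The inference in this example can be sketched in a few lines of plain Python. This is not DAML syntax and uses no reasoner library; it only mimics the sub-property rule that DAML+OIL inherits from RDF Schema. The property names motherOf and parentOf are assumptions chosen to match the example.

```python
def infer(facts, sub_property_of):
    """Close a set of (subject, property, object) facts under sub-properties.

    `sub_property_of` maps a property to its more general parent property.
    """
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, prop, obj in list(inferred):
            parent = sub_property_of.get(prop)
            if parent and (subject, parent, obj) not in inferred:
                inferred.add((subject, parent, obj))
                changed = True
    return inferred


# "Parenthood is a more general relationship than motherhood"
sub_property_of = {"motherOf": "parentOf"}
# "Mary is the mother of Bill"
facts = {("Mary", "motherOf", "Bill")}

knowledge = infer(facts, sub_property_of)

# "Who are Bill's parents?"
bills_parents = sorted(s for s, p, o in knowledge if p == "parentOf" and o == "Bill")
print(bills_parents)
```

The answer comes from a derived fact: the stated facts never mention parentOf for Mary and Bill.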

DAML-S


Supplies Web service providers with a core set of markup language constructs for describing the properties and capabilities of their Web services in an unambiguous, computer-interpretable form for use by agents



Facilitate the automation of Web service tasks
including automated discovery, execution,
interoperation, composition and monitoring


Builds on top of DAML+OIL


http://www.daml.org/services/

DAML for Web Services

What is the Grid?


“An environment that enables
geographically distributed scientists to
achieve research goals more effectively,
while enabling their results to be used in
developments elsewhere”



Typified by access to HPC & HPN


Globus & AccessGrid.

OGSA


Open Grid Services Architecture




A proposed evolution of the current
Globus toolkit towards a Grid system
architecture based on an integration of
Grid & Web service technologies




http://www.globus.org/ogsa/

OGSA


Architecture defines a uniform exposed
service semantics (the Grid service)


Defines standard mechanisms for creating,
naming, and discovering transient Grid
service instances


Provides location transparency and multiple
protocol bindings for service instances


Supports integration with underlying native
platform facilities.


OGSA


OGSA defines WSDL interfaces and the associated conventions and mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification



Still embryonic.


What is e-Science?


e-Science is not just high bandwidth communication and HPC running simulations linked through “the GRID”


e-Science is about:


Exploiting digital technology to support all aspects
of scientific activity


Support for large-scale science through distributed global collaborations


Formation of virtual co-laboratories allowing scientists to work together irrespective of location


Universal access to scientific resources


Support for scientific community.

Prof. Tom Rodden - Nottm.

The Bottom Line...


Middleware enabling interoperability


The development of a communication
and computational infrastructure to
underpin the work of scientists


Distributed Annotation System


A web-based protocol for a distributed sequence annotation system developed by Lincoln Stein et al. at CSHL


A single server is the “reference server” and
serves essential genome structural
information


physical map, sequence, authorship information


Sequence annotation decentralised among multiple third-party annotators and integrated on an "as-needed" basis by client-side software


Distributed Annotation System

The DAS System


Interrogate annotation servers to retrieve and add features to the sequence retrieved from the reference server


Need a standard format to describe
sequence features


The format must be able to deal with relative co-ordinates in which annotations are related to arbitrary hierarchical landmarks


Assumes a good sequence related by mapping info


Distributed Annotation System

[Diagram: Client, Reference server, Annotation server 1, Annotation server 2]

DAS Annotations


Annotations relate to a region of sequence


Each annotation is unambiguously located
by defining its position relative to a
reference sequence


Annotation co-ordinates are stored relative to the smallest sequencing unit, since these are more stable than co-ordinates based on links or chromosomes

Client/Server Interactions


DAS is web-based


Clients query the reference and annotation servers by sending a formatted URL request to the server


Request composed of site, data source, command and arguments


Servers process the request and return the response as a formatted XML document

http://stein.cshl.org/das/elegans/features?ref=CHROMOSOME_I&start=1000&stop=20
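
The request/response cycle above can be sketched with the Python standard library. The helper names are hypothetical, the start and stop values are illustrative rather than those of the slide's URL, and the sample XML is only loosely modelled on a DAS features document. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def das_url(site, data_source, command, **arguments):
    """Compose a DAS request from site, data source, command and arguments."""
    url = f"{site}/das/{data_source}/{command}"
    if arguments:
        url += "?" + urlencode(arguments)
    return url


# Hypothetical response body; element names are an assumption based on
# the general shape of a DAS features reply.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF>
    <SEGMENT id="CHROMOSOME_I" start="1000" stop="5000">
      <FEATURE id="feature1" label="example feature">
        <START>1200</START>
        <END>1800</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return (feature id, start, end) for every FEATURE in the document."""
    root = ET.fromstring(xml_text)
    return [
        (f.get("id"), int(f.findtext("START")), int(f.findtext("END")))
        for f in root.iter("FEATURE")
    ]


url = das_url("http://stein.cshl.org", "elegans", "features",
              ref="CHROMOSOME_I", start=1000, stop=5000)
features = parse_features(SAMPLE_RESPONSE)
print(url)
print(features)
```

A client would issue the same kind of request to the reference server and to each annotation server, then merge the returned features on the shared reference co-ordinates.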

Soaplab


Soaplab is a set of Web Services providing programmatic access to applications on remote computers.


Soaplab uses a specification for an Analysis Service (based on OMG's Biomolecular Sequence Analysis specification)


The EMBL-EBI has a Soaplab service running on top of several tens of analyses (most coming from EMBOSS)


Soaplab does not access individual analysis programs directly but uses a general-purpose package, AppLab, that hides all details about finding, starting, controlling, and using applications.


http://industry.ebi.ac.uk/soaplab/

The myGrid Project



myGrid aims to design, develop & demonstrate higher level functionalities over an existing Web services & Grid infrastructure that support scientists in making use of complex distributed resources



The exemplar domain is bioinformatics



http://www.mygrid.org.uk/

Converging Technologies

Agents

Grid Computing

Web Technologies

Globus, Sun Grid Engine, Condor, DS (Jini, CORBA)

SOAP, WSDL, UDDI, WSFL

DAML+OIL, OWL, RDF(S)

ACL, methodology

An early adopter for OGSA

myGrid Features


The myGrid project seeks to make the use of on-line services easier by improving:


Publication of services


Service repository


Discovery of services


Ontology & metadata


Use & access to services


Interoperation


Personalisation


Personal repository


Annotation


Configuration


Sharing


Workflow & process


Composition


Enactment


Storage


Management of provenance


Recording of the process


Attribution


Notification


Tracking changes to services


Update of services


Security & trust.

Conclusions...