Web services for Bioinformatics “The good, the bad and the ugly”

Sep 29, 2013

Pieter Neerincx • Laboratory of Bioinformatics • WUR
1
Intro “Interoperability and the Glue”

Goal

Linking data and tools

Interoperability problem

Strategies

Centralised systems

Distributed systems
2
Intro “Interoperability and the Glue”

Centralized systems: “Fetch, modify & install locally”

Expensive Infrastructure

Fast

Updating

More work

Less dependent on other service providers
3
Intro “Interoperability and the Glue”

Distributed systems: “Leave data & tools where they are”

Cheaper Infrastructure

Slow

Updates by original content provider

Less work

More dependent on others
4
The good: Web services

Programmatic interfaces for Application to Application
communication over the WWW
(W3C)

Remote modules (*.pm, *.jar, etc.) / Lego bricks

Horrible, horrible term! (Tom Oinn)
5

The good: Web services

Web services != services offered by web servers and designed for
human access through a web browser
6
The good: Web services

Use self-describing data format for:

Data description

Service description & discovery

Service execution

Platform independent

Programming/Scripting language independent

Protocol independent

Firewall proof
7
The good: Web services

eXtensible Markup Language (XML)

Document declaration
<?xml version="1.0" encoding="UTF-8"?>

Elements
<gene>ALK1</gene>
<gene />

Attributes
<gene id="ALK1" />

Raw text
<gene>ALK1</gene>

Comments
<!-- <gene> blabla </gene> -->
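
The constructs listed above can be inspected programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the wrapping `<genes>` root element is an assumption added only to make the fragments one well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document combining the constructs from the slide.
# The <genes> root is an assumption; XML needs exactly one root element.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<genes>
  <!-- <gene> blabla </gene> -->
  <gene id="ALK1">ALK1</gene>
  <gene />
</genes>"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(DOC.encode("utf-8"))
genes = root.findall("gene")

for gene in genes:
    # .get() reads an attribute, .text holds the raw text (None if empty).
    print(gene.tag, gene.get("id"), gene.text)
```

The default parser skips the comment, so only the two real `gene` elements are visible to the program.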
8
The good: Web services

XML versus HTML
HTML:
<html>
  <head><title>Books</title></head>
  <body>
    <h2>Books</h2>
    <hr/>
    <ul>
      <li><i>XML in a Nutshell</i>,
        Elliotte Rusty Harold &amp; W. Scott Means
        O’Reilly, 2002, 2nd edition
      </li>
    </ul>
  </body>
</html>

XML:
<?xml version="1.0" encoding="UTF-8"?>
<Books>
  <Book>
    <Title>XML in a Nutshell</Title>
    <Author>Elliotte Rusty Harold</Author>
    <Author>W. Scott Means</Author>
    <Publisher>O’Reilly</Publisher>
    <Edition id="2" yearOfPublication="2002" />
  </Book>
</Books>
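
Because the XML tags name the content rather than its layout, a program can ask for fields by name instead of guessing from formatting. A minimal sketch, assuming the book record from the slide and a hypothetical helper `read_books`:

```python
import xml.etree.ElementTree as ET

# The book record from the slide (plain apostrophe used for simplicity).
BOOKS_XML = """<Books>
  <Book>
    <Title>XML in a Nutshell</Title>
    <Author>Elliotte Rusty Harold</Author>
    <Author>W. Scott Means</Author>
    <Publisher>O'Reilly</Publisher>
    <Edition id="2" yearOfPublication="2002" />
  </Book>
</Books>"""


def read_books(xml_text):
    """Return one dict per <Book>, addressing every field by its tag name."""
    books = []
    for book in ET.fromstring(xml_text).iter("Book"):
        edition = book.find("Edition")
        books.append({
            "title": book.findtext("Title").strip(),
            "authors": [a.text.strip() for a in book.findall("Author")],
            "publisher": book.findtext("Publisher").strip(),
            "edition": int(edition.get("id")),
            "year": int(edition.get("yearOfPublication")),
        })
    return books


books = read_books(BOOKS_XML)
print(books[0]["authors"])
```

Doing the same with the HTML version would mean splitting the text of a list item on commas and ampersands, which breaks as soon as the page layout changes.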
9
The good: Web services

eXtensible Markup Language (XML)

Advantages:

“Self describing data format”
Tags describe content, not formatting!

Easily readable by both humans and machines

Disadvantage:

Overhead
10
The good: Web services

eXtensible Markup Language (XML)

Raw text:

Encode special characters

Character   Encoded
<           &lt;
>           &gt;
&           &amp;
"           &quot;
'           &apos;

Or use CDATA section:
<![CDATA[ bla bla ]]>
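
Both options can be tried out with Python's standard library. This is a small sketch: `encode_text` is a hypothetical helper name, and the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() handles &, < and > by default; quotes are passed in explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def encode_text(raw):
    """Encode the five special characters so raw text is safe inside XML."""
    return escape(raw, QUOTES)


raw = 'a < b && name == "ALK1"'
encoded = encode_text(raw)
print(encoded)

# Option 1: encoded text inside an element.
from_entities = ET.fromstring("<note>" + encoded + "</note>").text

# Option 2: the untouched text inside a CDATA section.
from_cdata = ET.fromstring("<note><![CDATA[" + raw + "]]></note>").text
```

Either way the parser hands back the original string; CDATA only fails when the text itself contains `]]>`.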
11
The good: Web services

XML Building blocks:

Communication: Simple Object Access Protocol (SOAP)

Service Description: Web Services Description Language (WSDL)

Service Discovery:
Universal Description, Discovery and Integration protocol (UDDI)
Web Services Inspection Language (WSIL)
BioMoby...
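
To make the communication layer concrete, here is a rough sketch of what a SOAP 1.1 request message looks like when built by hand. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `runBlast` operation and its parameters are hypothetical and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.org/blast"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names, for illustration only.
request = build_soap_request("runBlast", {"program": "blastp", "sequence": "MKT"})
print(request)
```

In practice a SOAP toolkit generates this message from the WSDL file, so client code rarely assembles envelopes manually.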
12
The good: Web services for Bioinformatics

Soaplab: wrappers for EMBOSS (EBI)

EBI WS

BioMart: query oriented WS for MySQL, Oracle & Postgres

Entrez Utilities & EFetch (NCBI)

KEGG (Kanehisa Lab)

Lots of labs providing a service or two

BioMoby

The services above are described with WSDL
BioMoby: WSDL + Ontologies = Service Discovery
13
The good: BioMoby

Ontology:

Services

Location & Operation

Inputs & Outputs

Data / Objects

Structure
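[Diagram: a BioMoby service publishes itself to BioMoby Central; a client discovers services at BioMoby Central and then executes the BioMoby service; all three interactions exchange XML]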
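
The publish, discover and execute cycle can be mimicked with a toy in-memory registry. This is only a sketch of the idea and not the BioMoby API: the `Service` and `Registry` classes, the type names and the two example services are all hypothetical, and plain function calls stand in for the XML messages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Service:
    name: str
    input_type: str   # name of a data type from a shared ontology
    output_type: str
    run: Callable[[str], str]


class Registry:
    """Toy stand-in for a central registry of services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, service: Service) -> None:
        self._services[service.name] = service

    def discover(self, input_type: str, output_type: str) -> List[Service]:
        # Match services on the types they consume and produce.
        return [s for s in self._services.values()
                if s.input_type == input_type and s.output_type == output_type]


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(seq: str) -> str:
    return str((seq.count("G") + seq.count("C")) / len(seq))


central = Registry()
central.publish(Service("reverse_complement", "DNASequence", "DNASequence",
                        reverse_complement))
central.publish(Service("gc_content", "DNASequence", "Float", gc_content))

# Discover by data type, then execute the first match.
matches = central.discover("DNASequence", "DNASequence")
result = matches[0].run("ATGC")
print(matches[0].name, result)
```

The client never hard-codes a service name; it asks for "something that turns a DNA sequence into a DNA sequence", which is what the shared ontologies make possible.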
14
The good: SOA

Service Oriented Architecture

Software architecture that defines the use of services to support the
requirements of software users (Wikipedia)

Simple Open Affordable

Marketing motto from Red Hat related to service-oriented
architecture but redefined to represent Red Hat's key principles
regarding the market (Wikipedia)

Sexueel Overdraagbare Aandoening (Dutch for "sexually transmitted disease")

An even more horrible term!
15
The good: Workflows

Workflows:

Chains of web services that answer interesting biological questions

Workflow management:

Building workflows

Service discovery

Failover & retry

Electronic lab journals / logging
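
The workflow management tasks above can be sketched in a few lines. This is a simplified illustration and not how any particular workflow engine works: `call_with_retry`, `run_workflow` and the example services are hypothetical, and local functions stand in for remote web services.

```python
import time


def call_with_retry(providers, payload, retries=2, delay=0.0):
    """Try each provider in turn (failover), each up to `retries` times."""
    last_error = None
    for provider in providers:
        for _attempt in range(retries):
            try:
                return provider(payload)
            except Exception as error:  # a remote call can fail in many ways
                last_error = error
                time.sleep(delay)
    raise RuntimeError("all providers failed") from last_error


def run_workflow(steps, payload, log):
    """Chain the steps: the output of one step is the input of the next."""
    for name, providers in steps:
        payload = call_with_retry(providers, payload)
        log.append((name, payload))  # minimal lab-journal entry
    return payload


# Hypothetical services for illustration.
def broken_transcribe(seq):
    raise ConnectionError("service down")


def backup_transcribe(seq):
    return seq.replace("T", "U")


def sequence_length(seq):
    return len(seq)


log = []
answer = run_workflow(
    [("transcribe", [broken_transcribe, backup_transcribe]),
     ("length", [sequence_length])],
    "ATGT",
    log,
)
print(answer, log)
```

The first provider of the transcribe step always fails here, so the result comes from the backup provider, and the log records what each step produced.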
16
The good: The Semantic Web

The Semantic Web is an evolving extension of the World Wide Web
in which web content can be expressed not only in natural
language, but also in a format that can be read and used by
software agents, thus permitting them to find, share and integrate
information more easily. (Wikipedia)

Automatic Service Discovery

XML by itself does not provide semantics

No explicit meaning of nesting of tags

Need data model: Book has a Title
17
The good: The Semantic Web

No explicit meaning of nesting of tags
<?xml version="1.0" encoding="UTF-8"?>
<Books>
  <Book>
    <Title>XML in a Nutshell</Title>
    <Author>Elliotte Rusty Harold</Author>
    <Author>W. Scott Means</Author>
    <Publisher>O’Reilly</Publisher>
    <Edition id="2" yearOfPublication="2002" />
  </Book>
</Books>

<?xml version="1.0" encoding="UTF-8"?>
<Fire>
  <Product>Diesel</Product>
  <Product>Wood</Product>
  <Product>Water</Product>
  <Product>Foam</Product>
</Fire>

Book has a Title
Fire fuelled by Diesel
Fire extinguished by Foam
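
One way to make the relationships explicit is to store them as subject, predicate, object statements, which is the basic idea behind RDF. The sketch below uses plain Python tuples rather than an RDF library; the statements for Wood and Water are assumptions added for illustration, since the slide only spells out Diesel and Foam.

```python
# Each statement is (subject, predicate, object).
TRIPLES = {
    ("Book", "has a", "Title"),
    ("Fire", "fuelled by", "Diesel"),
    ("Fire", "fuelled by", "Wood"),         # assumption, not on the slide
    ("Fire", "extinguished by", "Water"),   # assumption, not on the slide
    ("Fire", "extinguished by", "Foam"),
}


def objects(subject, predicate):
    """Return all objects linked to a subject by the given predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


print(objects("Fire", "extinguished by"))
print(objects("Fire", "fuelled by"))
```

With the nested `<Product>` tags alone, a program cannot tell Diesel from Foam; with named predicates it can.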
18
The good: The Semantic Web

Resource Description Framework
(RDF)

Web Ontology Language
(OWL)

BioMOBY ontologies
19
The bad: yet another standard

Interoperability is paramount!

Don’t create yet another standard!!!

XML’s extensibility is strength but also weakness

50+ BLAST services

text_plain versus text-plain
20
The bad: documentation

Interoperability is paramount!

Automatically generated WSDL files
without description of inputs, outputs & operations

BioMOBY Central ontologies sometimes not sufficient to figure out
how things work

Open Source academic projects: documentation 1.0 for code 3.0

No “Semantic Web” yet!
21
The ugly: XML parsing

Document Object Model (DOM)
“creates a tree in memory”

Simple API for XML (SAX)
“stream based parsing”

Hybrid solutions
“chunk based parsing”
22
The ugly: XML parsing

DOM

Access to all elements: Convenient

Requires lots of memory

SAX

Small memory footprint

No access to previously parsed elements: Inconvenient

Most implementations use DOM: scalability issues
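
The three parsing styles can be compared side by side with Python's standard library. This is a sketch on a generated test document: `make_document` and the counting functions are hypothetical helpers, and `ElementTree` stands in for a full DOM implementation as the tree-in-memory approach.

```python
import io
import xml.etree.ElementTree as ET
import xml.sax


def make_document(n):
    """Generate a test document with n <gene> elements."""
    genes = "".join('<gene id="G%d">seq%d</gene>' % (i, i) for i in range(n))
    return ("<genes>" + genes + "</genes>").encode("utf-8")


def count_tree(data):
    """Tree style: the whole document is held in memory at once."""
    root = ET.fromstring(data)
    return len(root.findall("gene"))


class GeneCounter(xml.sax.ContentHandler):
    """SAX style: react to events, keep nothing but a counter."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "gene":
            self.count += 1


def count_sax(data):
    handler = GeneCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_chunked(data):
    """Hybrid style: build one element at a time, then discard its content."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "gene":
            count += 1
            elem.clear()  # drop the text and attributes once processed
    return count


data = make_document(1000)
print(count_tree(data), count_sax(data), count_chunked(data))
```

All three give the same answer; the difference is how much of the document is in memory while they compute it, which matters once documents reach genome scale.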
23
Conclusion

Not mature yet

Semantic web still a fairy tale

Doesn’t scale to things like the Ensembl pipeline yet

But...
24
Conclusion

Hundreds of services already available

Works well for small and mid-sized distributed systems

Prevents firewall issues

No more brittle HTML parsing converters
25