BioinfoWF WEB SERVICES AND WORKFLOW MANAGEMENT FOR BIOINFORMATICS ANALYSIS

M.A. Genaev*1, D.A. Afonnikov1,2

1 Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
2 Novosibirsk State University, Novosibirsk, Russia

e-mail: mag@bionet.nsc.ru
* Corresponding author


Key words:
bioinformatics, workflow, grid processing, XML


Motivation and Aim:
The analysis of biological data in bioinformatics usually consists of several steps performed in sequence by different programs. As the analysis progresses, the output of one calculation module serves as the input of the next. Thus, the overall procedure can be organized as a workflow [1, 2]. For example, calculating a phylogenetic tree for a protein family requires extracting protein sequences from databases, multiple sequence alignment, and phylogeny estimation. Most individual steps can be performed by different routines; for example, a sequence alignment can be obtained with ClustalW, Mafft, Muscle, or T-Coffee. The user's choice of program often depends on the data under analysis and the aim of the task.
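
The chaining described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the step functions, the `ALIGNERS` table, and the toy data are hypothetical stand-ins for real database queries and tools. The sketch only shows how each step's output becomes the next step's input, and how one step can be swapped for an alternative routine.

```python
from typing import Callable, Dict, List

# All functions below are hypothetical toy stand-ins, not real tools.

def extract_sequences(family: str) -> List[str]:
    """Pretend database lookup: return toy protein sequences for a family."""
    toy_db = {"globin": ["MVLSPADK", "MVHLTPEEK", "MGLSDGEW"]}
    return toy_db[family]

def pad_right(seqs: List[str]) -> List[str]:
    """Toy 'aligner' A: pad sequences with gaps on the right."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]

def pad_left(seqs: List[str]) -> List[str]:
    """Toy 'aligner' B: pad sequences with gaps on the left."""
    width = max(len(s) for s in seqs)
    return [s.rjust(width, "-") for s in seqs]

# Interchangeable routines for the alignment step.
ALIGNERS: Dict[str, Callable[[List[str]], List[str]]] = {
    "pad_right": pad_right,
    "pad_left": pad_left,
}

def build_tree(alignment: List[str]) -> str:
    """Toy 'phylogeny': a star tree in Newick format over sequence indices."""
    leaves = ",".join(f"seq{i}" for i in range(len(alignment)))
    return f"({leaves});"

def run_workflow(family: str, aligner: str = "pad_right") -> str:
    """Run the steps in order, feeding each output into the next step."""
    steps = [extract_sequences, ALIGNERS[aligner], build_tree]
    data = family
    for step in steps:
        data = step(data)
    return data
```

Calling `run_workflow("globin", aligner="pad_left")` runs the same three steps with a different alignment routine, which mirrors the choice between alternative programs for a single step.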

Methods and Algorithms:
To support workflow-based data processing in bioinformatics, we developed the BioinfoWF system. It is written in Perl and consists of two parts. The first is an XML description of the program options and the input and output data for a single workflow step. The second describes the workflow scheme, specifies the data files, and records the execution status of each step. BioinfoWF runs from the command line on UNIX-like systems or as a web service. A workflow, or part of one, can also be executed on multiprocessor cluster systems under Sun Grid Engine.
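
To make the two parts described above more concrete, here is a small Python sketch using the standard `xml.etree.ElementTree` module. BioinfoWF itself is written in Perl, and its actual XML schema is not given here, so every element name, attribute name, file name, and the program name `aligner` below is an assumption made purely for illustration. Representing the workflow scheme in XML as well is also an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical description of one step: program, options, input, output.
# Element and attribute names are assumptions, not the BioinfoWF schema.
STEP_XML = """
<step name="align">
  <program>aligner</program>
  <option flag="-in" value="seqs.fasta"/>
  <option flag="-out" value="aln.fasta"/>
  <input file="seqs.fasta"/>
  <output file="aln.fasta"/>
</step>
"""

# Hypothetical workflow scheme: ordered steps with an execution status.
WORKFLOW_XML = """
<workflow name="phylogeny">
  <step ref="extract" status="done"/>
  <step ref="align" status="pending"/>
  <step ref="tree" status="pending"/>
</workflow>
"""

def command_line(step_xml: str) -> List[str]:
    """Build an argument list from the step description."""
    step = ET.fromstring(step_xml)
    argv = [step.findtext("program")]
    for opt in step.findall("option"):
        argv += [opt.get("flag"), opt.get("value")]
    return argv

def next_pending(workflow_xml: str) -> Optional[str]:
    """Return the first step that is not marked done, or None."""
    workflow = ET.fromstring(workflow_xml)
    for step in workflow.findall("step"):
        if step.get("status") != "done":
            return step.get("ref")
    return None

def mark_done(workflow_xml: str, ref: str) -> str:
    """Return the workflow XML with the given step marked as done."""
    workflow = ET.fromstring(workflow_xml)
    for step in workflow.findall("step"):
        if step.get("ref") == ref:
            step.set("status", "done")
    return ET.tostring(workflow, encoding="unicode")
```

Keeping the step description separate from the workflow state means a step definition can be reused across workflows, while the status record lets an interrupted run resume from the first unfinished step.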

Results:
We used BioinfoWF to develop the Computer System for Analysis of Molecular Evolution Modes of Protein Families (SAMEM) and a workflow for detecting functionally important SNPs in the regulatory regions of eukaryotic genes.

Conclusion:
BioinfoWF can be used to organize workflow management for various bioinformatics tasks.

Availability:
BioinfoWF is available at http://pipeline.bionet.nsc.ru.

Acknowledgements:
This work was supported by SB RAS Integration projects 1, 26, 113, and 119, and by RAS programs No. 22 (project 8) and "Biosphere origin and evolution".

References:

1. Ríos J., Karlsson J. and Trelles O. (2009) Magallanes: a web services discovery and automatic workflow composition tool, BMC Bioinformatics, 10:334.

2. Oinn T., Addis M., Ferris J., Marvin D., Senger M., Greenwood M., Carver T., Glover K., Pocock M.R., Wipat A., Li P. (2004) Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics, 20:3045-3054.