Bioinformatics eScience gateway services using the HP-SEE ...

Oct 2, 2013

www.hp-see.eu

HP-SEE

HP-SEE project and the HPC Bioinformatics Life Science gateway

M. KOZLOVSZKY

Obuda University

The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499

Overview

The HP-SEE project
HP-SEE Life Sciences Virtual Community
HP-SEE Bioinformatics Life Science gateway
Sequence alignment applications
Workflow based online bioinformatics services
Working with workflows/gUSE

Summer School on Workflows and Gateways for Grids and Clouds 2012, Budapest, Hungary, 2-6.07.2012

Pan-European e-Infrastructures vision

The Research Network infrastructure provides fast interconnection and advanced services among Research and Education institutes of different countries.

The Research Distributed Computing Infrastructure (Grid, HPC) provides a distributed environment for sharing computing power, storage, instruments and databases through the appropriate software (middleware) in order to solve complex application problems.

This integrated environment is called electronic infrastructure (eInfrastructure), allowing new methods of global collaborative research, often referred to as electronic science (eScience).

The creation of the eInfrastructure is one of the key objectives to facilitate building of the European Research Area.

[Diagram labels: Network Infrastructure; DCI Infrastructure; e-Science Collaborations]


Context: the Model - Converged Communication & Service Infrastructure for South-East Europe

[Diagram labels: SEE-LIGHT & GEANT; Comp physics, Comp chem, Life sciences; Seismology, Meteorology, Environment; HP-SEE]


Context: Timeline and funding


HP-SEE: Project

Contract: RI-261499
Project type: CP & CSA
Call: INFRA-2010-1.2.3: VRCs
Start date: 01/09/2010
Duration: 24 + 9 months
Total budget: 3 885 196
Funding from the EC: 2 100 000
Total funded effort, PMs: 539.5
Web site: www.hp-see.eu



HP-SEE: Partnership

Contractors (14)
Third Party / JRU mechanism used
Associate universities / research centres


HP-SEE: Project Objectives

Objective 1: Empowering multi-disciplinary virtual research communities
Objective 2: Deploying integrated infrastructure for virtual research communities
  Including a GEANT link to Southern Caucasus
Objective 3: Policy development and stimulating regional inclusion in pan-European HPC trends
Objective 4: Strengthening the regional and national human network


The HP-SEE Life Science VRC and its objectives

Main goal:
  Match the combined HPC resources to the regional needs of the life/bioscience communities, foster research in the field within the region with the help of the large-scale, high-availability infrastructure, and facilitate cooperation between the sparsely distributed life science research centres.

Data and limitations
  The Life Sciences domain has been revolutionized by advances in both computer hardware and software algorithms.
    Assembling the Human Genome
    Gene-expression chips to understand cellular processes
  Exponential growth in the amount of publicly available genomic data.
    GenBank
  Traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types.
  Existing computational tools were created by experimentalists dealing with data sets that were minuscule in comparison to those available today. As a result, software that was once perfectly adequate now performs slowly or is incapable of successful analysis on traditional computational platforms.


Accessible infrastructure

HP-SEE Supercomputing infrastructure
SEE-GRID-SCI Grid infrastructure

Country            Center            Computing cores   Teraflops
Bulgaria           BG Blue Gene/P    8192              27.85
                   HPCG              576               3.23
FYR of Macedonia   FINKI SC          2016              9
Hungary            NIIFI SC          144               0.5
                   Pecs SC           1152              10
                   Debrecen SC       3078              18
                   Szeged            2112              14
Romania            InfraGRID         400               2.5
                   IFIN_BIO          256               2.72
                   IFIN_BC           368               3.9
                   NCIT              562               3.4
                   UVT Blue Gene/P   4096              13.9
Serbia             PARADOX           672               6.26
TOTAL                                23624             115.26
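
As a quick consistency check, the per-site figures in the table can be summed and compared with the TOTAL row. The sketch below only re-enters the numbers from the slide; the function names `totals` and `cores_by_country` are illustrative and not part of any HP-SEE tool.

```python
# Per-site figures copied from the slide's table:
# (country, center, computing cores, teraflops).
SITES = [
    ("Bulgaria", "BG Blue Gene/P", 8192, 27.85),
    ("Bulgaria", "HPCG", 576, 3.23),
    ("FYR of Macedonia", "FINKI SC", 2016, 9.0),
    ("Hungary", "NIIFI SC", 144, 0.5),
    ("Hungary", "Pecs SC", 1152, 10.0),
    ("Hungary", "Debrecen SC", 3078, 18.0),
    ("Hungary", "Szeged", 2112, 14.0),
    ("Romania", "InfraGRID", 400, 2.5),
    ("Romania", "IFIN_BIO", 256, 2.72),
    ("Romania", "IFIN_BC", 368, 3.9),
    ("Romania", "NCIT", 562, 3.4),
    ("Romania", "UVT Blue Gene/P", 4096, 13.9),
    ("Serbia", "PARADOX", 672, 6.26),
]


def totals(sites):
    """Return (total cores, total teraflops rounded to two decimals)."""
    cores = sum(site[2] for site in sites)
    tflops = round(sum(site[3] for site in sites), 2)
    return cores, tflops


def cores_by_country(sites):
    """Sum computing cores per country, keeping the table's country order."""
    result = {}
    for country, _center, cores, _tflops in sites:
        result[country] = result.get(country, 0) + cores
    return result


if __name__ == "__main__":
    print(totals(SITES))
    print(cores_by_country(SITES))
```

The sums agree with the TOTAL row of the table (23624 cores, 115.26 teraflops).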


HP-SEE’s LS Applications

7 applications from 5 countries

Greece:
  Searching for novel miRNA genes and their targets (miRs)
  Network models of short and long term memory (CMSLTM)
Montenegro:
  DNA Multi-core Analysis (DNAMA)
Hungary:
  Deep sequencing for short fragment alignment (DeepAligner) - gUSE & workflow based
  In-silico Disease Gene Mapper (DiseaseGene) - gUSE & workflow based
Georgia:
  Modeling of some biochemical processes with the purpose of realization of their thin and purposeful synthesis (MSBP)
Armenia:
  Molecular Dynamics Study of Complex systems (MDSCS)



Why gUSE/WS-PGRADE

Infrastructure
  HP-SEE infrastructure
  Based on gLite and ARC as middleware
  Authentication procedures are painful (as usual)
  Interoperability with grids is a plus
Application
  Workflow-like process with embedded (legacy) applications
  Restricted input parameter sets for the algorithms
  Service-like operation
  Portal features for a community
Knowledge, licensing & support
  Open source software environment needed
  Knowledge transfer required for the application specific modules
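
The two application requirements above, a workflow-like process and restricted input parameter sets, can be illustrated with a small sketch. It is not gUSE or WS-PGRADE code: the job names, the `ALLOWED_PARAMS` whitelist and both functions are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later) rather than any gateway API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "split_reads": set(),
    "align_chunk": {"split_reads"},
    "merge_hits": {"align_chunk"},
    "publish_report": {"merge_hits"},
}

# Hypothetical whitelist: the portal only exposes a restricted set of
# values for each algorithm parameter.
ALLOWED_PARAMS = {
    "algorithm": {"blast", "bwa"},
    "max_mismatches": {0, 1, 2},
}


def validate_params(params):
    """Raise ValueError if a parameter or its value is not whitelisted."""
    for name, value in params.items():
        if name not in ALLOWED_PARAMS:
            raise ValueError(f"unknown parameter: {name}")
        if value not in ALLOWED_PARAMS[name]:
            raise ValueError(f"value {value!r} not allowed for {name}")
    return True


def execution_order(workflow):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    validate_params({"algorithm": "bwa", "max_mismatches": 1})
    print(execution_order(WORKFLOW))
```

Checking parameters before anything is submitted keeps the embedded legacy application from being started with inputs it was never tested against, which is one plausible reason for restricting the parameter sets in a service-like deployment.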





HP-SEE Bioinformatics eScience Gateway

HP-SEE Bioinformatics eScience Gateway hosted at Obuda University, operated by MTA SZTAKI.
gUSE+WS-PGRADE (v3.3.2) - Liferay based
SEE region’s supercomputing & grid infrastructure used
Accessible at: http://ls-hpsee.nik.uni-obuda.hu:8080/liferay-portal-6.0.5



Architecture and application porting steps

Unified porting steps of the applications:


DeepAligner - Deep sequencing for short fragment alignment

Description & Objectives
  Mapping short fragment reads to open-access eukaryotic genomes is solvable by a group of algorithms (BLAST, BWA, PatternHunter, and other sequence alignment tools).
  BLAST (mpiBLAST or ScalaBLAST) is one of the most frequently used tools in bioinformatics; the others are relatively new, fast, lightweight tools that align short sequences. Local installations of these algorithms typically cannot handle problems of this size, so the procedure runs slowly, while web based implementations cannot accept a high number of queries. The HP-SEE infrastructure allows access to massively parallel architectures, and the sequence alignment code is distributed free for academia.
Result
  Online workflow based short sequence alignment service
Impact
  Freely available service/code for large scale short sequence alignment
Collaborations
  Hungarian Bioinformatics Association, Semmelweis University
HP-SEE infrastructure used: Hungarian HPC, NIIF’s supercomputing sites
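
One common way to spread a large set of short reads over a parallel machine is to cut the query file into chunks and align each chunk as a separate job. The sketch below shows only that splitting step and is not the DeepAligner code: it assumes the reads arrive in FASTA format, and the names `read_fasta` and `chunk_records` are illustrative.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records, chunk_size):
    """Group records into lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


if __name__ == "__main__":
    reads = [">r1", "ACGT", ">r2", "GG", "TT", ">r3", "A"]
    for number, chunk in enumerate(chunk_records(read_fasta(reads), 2)):
        print(number, chunk)
```

Each chunk could then be handed to one alignment job, with the per-chunk hits merged afterwards; how DeepAligner actually partitions its input is not stated on the slide.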



DeepAligner - Deep sequencing for short fragment alignment (contd.)

Small scale launch (Home cluster): PBS/Linux Cluster at the Obuda University John von Neumann Faculty of Informatics.
Activity and technical assistance in pre-production stage: Technical assistance was provided by MTA SZTAKI and NIIF.
Porting: The application was ported using Perl/C. The workflow and GUI for the application were created by Obuda University.
Benchmarking
  Scaled from 32 cores to 96 cores (MPI).

DeepAligner Status
  The online service uses two sites of NIIF’s supercomputing infrastructure (Budapest and Szeged).
  Foreseen activities: optimization of the parameter assignments in the GUI, more scientific publications about short sequence alignment. Further scaling is planned with performance analysis.
  More information: http://hpseewiki.ipb.ac.rs/index.php/DeepAligner
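
The slide reports scaling from 32 to 96 cores but gives no timings. For the planned performance analysis, relative speedup and parallel efficiency are the usual figures to compute; the sketch below shows the arithmetic. The run times in the example are invented placeholders, not DeepAligner measurements, and the function names are illustrative.

```python
def relative_speedup(t_base, t_scaled):
    """Speedup of the scaled run relative to the baseline run."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("run times must be positive")
    return t_base / t_scaled


def parallel_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Measured speedup divided by the ideal speedup (the core ratio)."""
    if cores_base < 1 or cores_scaled < 1:
        raise ValueError("core counts must be at least 1")
    ideal = cores_scaled / cores_base
    return relative_speedup(t_base, t_scaled) / ideal


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, for illustration only.
    t_32, t_96 = 600.0, 250.0
    print(relative_speedup(t_32, t_96))
    print(parallel_efficiency(t_32, 32, t_96, 96))
```

An efficiency close to 1 would mean the extra cores are used almost ideally; values well below 1 usually point to communication or I/O overhead that further scaling would amplify.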



Development & working on gUSE/WS-PGRADE

Pros
  Close collaboration and useful support
  ARC middleware connector was developed from scratch by MTA SZTAKI on request
  ASM and ARC submitter related bugs have been found and reported
  Helpful and skilled support & development team
Cons
  Internal ARC middleware problems are hard to find




Future plans

Additional plug-in-like online bioinformatics services
  More sequence alignment workflows
  More multiple sequence alignment workflows
  Sequence database quality measurement workflows
Open up the gateway for users outside the SEE region

Thank you for your attention!

Questions?




gUSE/WS-PGRADE architecture

[Diagram labels: ASM (Application specific Module); WS-PGRADE; DeepAligner; DiseaseGene]
