GridQTL: Using the NGS to map genes through a web portal

Jean-Alain Grunchec


University of Edinburgh

Plan


GridQTL Team and users


Introduction to the GridQTL project


Description of the computing infrastructure and the software behind the scenes


Short demonstration of the Grid-enabled service

GridQTL Team

* Sara Knott
+ Ian White
× Jules Hernandez-Sanchez
# Jean-Alain Grunchec
# Kashif Saleem
* Chris Haley
* Dirk-Jan de Koning
× Wenhua Wei
× Burak Karacaoeren
× Susan Rowe
* Jano van Hemert
# John Allen

× Biologist
+ Mathematician
# Computer scientist

GridQTL Users


443 registered external users in 44 countries


211 have used the core services; 67 have used LDLA

Plan


GridQTL Team and users


Purpose of the GridQTL project



Description of the computing infrastructure and the software behind the scenes


Short demonstration of the Grid-enabled service

QTL mapping


Aim: to detect and locate genes (QTL) having an effect on a quantitative trait

Quantitative trait: a trait with a continuous measurement (size, weight, concentration)

QTL (quantitative trait locus): a gene or DNA segment having an effect on a quantitative trait
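To make the aim concrete, here is a minimal sketch of a single-marker scan: each marker's genotypes are correlated with the trait, and the marker explaining the most trait variance is reported. This is only a toy illustration, not the models GridQTL runs (the slides do not show them). The genotype matrix, the phenotype values and the function names are all made up for the example.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy * sxy / (sxx * syy)


def scan_markers(genotypes, phenotypes):
    """Score every marker; return (index of best marker, all scores)."""
    # zip(*genotypes) turns rows (individuals) into columns (markers).
    scores = [r_squared(column, phenotypes) for column in zip(*genotypes)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical data: 6 individuals, 3 markers coded as allele counts 0/1/2.
genotypes = [
    [0, 0, 2],
    [1, 0, 0],
    [2, 1, 1],
    [0, 1, 1],
    [1, 2, 0],
    [2, 2, 2],
]
# Hypothetical trait values, constructed to track the second marker.
phenotypes = [10.1, 9.9, 13.2, 12.8, 16.1, 15.9]

best, scores = scan_markers(genotypes, phenotypes)
print("best marker index:", best)
```

Real analyses fit far richer models at many more positions, which is where the computational cost discussed later comes from.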

Rationale for QTL analysis


To understand genetic variation by dissecting complex traits:

fundamental knowledge of gene actions and interactions

applications in agriculture

applications in medicine

[Figure: histogram of stature in women (x-axis: stature, 145.0 to 190.0; y-axis: count, 0 to 700). Mean = 169.1, Std. Dev = 6.40, N = 1785]

History: QTL Express


Web portal to map QTL in experimental populations

Based on Java servlets, using a dedicated pool of 6 computers

100+ users

Growing computational demand significantly degraded the quality of service: 6 computers are not enough!


Most recent developments in genetics


New models are computationally intensive (100s of CPU hours per analysis)

Potential for models that can be applied to complex pedigrees (real-life populations, e.g. LDLA)

Potential for more complex genetic models (multiple QTLs, e.g. epistasis)

Now feasible:

100,000s of marker genotypes per individual

10,000s of phenotypes

1000s of individuals

Current approaches may be inadequate to analyse the resulting large data sets

High-throughput analyses: 10,000s of CPU hours per analysis



Plan


GridQTL Team and users


Introduction to the GridQTL project


Description of computing infrastructure


Short demonstration of the Grid-enabled service

Increasing the computational capacity available to GridQTL

2006: Condor pool of 10 computers

2007: NGS-1 (500 CPUs)

2008: ECDF + NGS-2 (> 2600 CPUs)
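A rough calculation shows what this growth means for a single 10,000 CPU-hour analysis. The sketch assumes perfect parallelism (no queuing, no transfer overhead), one CPU per computer in the two small pools, and the whole pool dedicated to one analysis. None of these assumptions holds exactly in practice, so the figures are lower bounds on wall-clock time.

```python
def ideal_wall_clock_hours(cpu_hours, cpus):
    """Best-case elapsed time if the work splits evenly over all CPUs."""
    return cpu_hours / cpus


# CPU counts taken from the slides; 2600 is the stated lower bound for 2008.
pools = {
    "QTL Express (6 computers)": 6,
    "2006 Condor pool": 10,
    "2007 NGS-1": 500,
    "2008 ECDF + NGS-2": 2600,
}

for name, cpus in pools.items():
    hours = ideal_wall_clock_hours(10_000, cpus)
    print(f"{name}: {hours:,.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions the same analysis drops from roughly ten weeks on the original six machines to under four hours on the 2008 infrastructure.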

Grid Infrastructure

[Diagram: a server linked to the National Grid Service and to local resources in Edinburgh]

LEEDS: 256 CPUs
WESTMINSTER: 128 CPUs
MANCHESTER: 256 CPUs
RAL: 256 CPUs
OXFORD: 256 CPUs
CARDIFF
LANCASTER
ECDF: 1456 CPUs
PETRA (Condor): 10 CPUs

Operating systems: Linux (FC, RHAT, SUSE), Solaris, IRIX

Software description

[Diagram: software components and the connections between them]

JSR-168 portlet in browser
JSP, HTML, JavaScript
AJAX: JavaScript / servlet
Server running GridSphere / Tomcat
SWARM Meta-Scheduler
Compute resources: NGS/ECDF, Condor pool
Connections: globus, gsissh, ssh
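The slides name the SWARM meta-scheduler but do not describe its algorithm; the cited paper's title says its goal is to minimise job queuing duration. As a purely illustrative sketch of that idea, the code below picks the site with the fewest queued jobs per CPU. The heuristic, the `Site` class and the queue lengths are assumptions made for the example; only the site names and CPU counts come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    cpus: int
    queued_jobs: int  # hypothetical snapshot of the local queue


def pick_site(sites):
    """Return the site with the lowest queued-jobs-per-CPU ratio."""
    return min(sites, key=lambda site: site.queued_jobs / site.cpus)


sites = [
    Site("LEEDS", cpus=256, queued_jobs=512),       # ratio 2.0
    Site("WESTMINSTER", cpus=128, queued_jobs=64),  # ratio 0.5
    Site("ECDF", cpus=1456, queued_jobs=1456),      # ratio 1.0
]

chosen = pick_site(sites)
print("submit next job to:", chosen.name)
```

A real meta-scheduler would also need to account for job duration, site availability and failures, which this sketch ignores.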

How do we use the NGS?

Our users log on to the website and are identified by their unique user name.

They run queries by clicking buttons on the web interface.

These buttons trigger Java functions that call Globus Toolkit routines.

The NGS authorises these routines by recognising an NGS portal certificate, which identifies the web server and its administrator.

The administrator has to put an accounting system in place for usage audits.

Job submission


    globus-job-submit \
        -env JAVA_HOME=/usr/local/Cluster-Apps/java/jdk1.6.0_01 \
        -env host_name=ngs.wmin.ac.uk \
        ngs.wmin.ac.uk/jobmanager-pbs \
        -stderr /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.err.0000004696.4702.527.txt \
        -stdout /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.out.0000004696.4702.527.txt \
        -maxtime 18 \
        -np 2 -host-count 1 \
        -dir /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere \
        -x "&(jobtype=single)(minMemory=1100)" \
        -l /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/LDLA_GridSphere.sh \
        "4;4702;527;0000004696;achatzipli;0;server.cap.ed.ac.uk;qtlportlets/public/01573ec42120bf1301218d475e67021b/;"
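The portal itself is written in Java and its submission code is not shown, so the following is only a sketch of how such a command line could be assembled programmatically. The function name and its parameters are hypothetical; the options and their order are copied from the example above, and no other `globus-job-submit` behaviour is assumed. The command is built and printed, not executed.

```python
import shlex


def build_submit_command(contact, script, script_args, workdir, job_tag,
                         env=None, maxtime=18, np=1, host_count=1,
                         min_memory=None):
    """Return the argument list for one globus-job-submit call."""
    argv = ["globus-job-submit"]
    for name, value in (env or {}).items():
        argv += ["-env", f"{name}={value}"]
    argv.append(contact)
    # Per-job output files, named after the job tag as in the example.
    argv += ["-stderr", f"{workdir}/.err.{job_tag}.txt"]
    argv += ["-stdout", f"{workdir}/.out.{job_tag}.txt"]
    argv += ["-maxtime", str(maxtime), "-np", str(np),
             "-host-count", str(host_count), "-dir", workdir]
    # Extra RSL clause passed through -x, as in the example.
    rsl = "&(jobtype=single)"
    if min_memory is not None:
        rsl += f"(minMemory={min_memory})"
    argv += ["-x", rsl, "-l", script, *script_args]
    return argv


workdir = "/home/ngs/ngs0739/production_new_modules/LDLA_GridSphere"
argv = build_submit_command(
    contact="ngs.wmin.ac.uk/jobmanager-pbs",
    script=f"{workdir}/LDLA_GridSphere.sh",
    script_args=["4;4702;527;0000004696;achatzipli;0;server.cap.ed.ac.uk;"
                 "qtlportlets/public/01573ec42120bf1301218d475e67021b/;"],
    workdir=workdir,
    job_tag="0000004696.4702.527",
    env={"JAVA_HOME": "/usr/local/Cluster-Apps/java/jdk1.6.0_01",
         "host_name": "ngs.wmin.ac.uk"},
    maxtime=18, np=2, host_count=1, min_memory=1100,
)
print(shlex.join(argv))  # shell-quoted form, for logging only
```

Keeping the command as a list avoids shell quoting problems: the RSL string contains `&` and the final argument contains `;`, both of which a shell would otherwise interpret.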

Profiling the application


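Profiling software: gprof

Your own script:

    #!/bin/csh
    # Start the application in the background
    ./myapplication.sh &
    # Pass its process ID ($!) to the profiler, which samples it while it runs
    ./profiler.csh $! &
    # Wait for both background jobs to finish
    wait

Memory: ps -o vsize,comm,user

Disk usage can also be monitored (some NGS clusters have large temporary storage facilities)

Computational load: uptime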
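The slides do not show the contents of `profiler.csh`. As a sketch of what such a profiler might do, the code below launches a command, polls its virtual memory size with `ps -o vsize` while it runs, and reports the peak. The function names and the polling interval are assumptions. The sampler is passed in as a parameter so that it can be swapped for a different measurement, such as disk usage.

```python
import subprocess
import time


def ps_vsize_kb(pid):
    """Virtual memory size of `pid` in KiB, read with `ps -o vsize`."""
    result = subprocess.run(
        ["ps", "-o", "vsize=", "-p", str(pid)],  # "=" suppresses the header
        capture_output=True, text=True,
    )
    text = result.stdout.strip()
    return int(text) if text else 0  # empty once the process has exited


def run_and_profile(cmd, sample=ps_vsize_kb, interval=1.0):
    """Run `cmd`, sampling it until it exits; return (exit_code, peak)."""
    proc = subprocess.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        peak = max(peak, sample(proc.pid))
        time.sleep(interval)
    return proc.returncode, peak
```

For example, `run_and_profile(["./myapplication.sh"])` would return the script's exit code and its peak virtual size. The peak helps choose a memory request for later submissions, such as the `minMemory` value in the job description.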




Failures happen!

Output failures: data corrupted during file transfer or on the nodes, out of memory, etc.

Duration failures: jobs terminated because they ran longer than the reserved duration

Submission failures: network failure during job submission

Server failures: partly handled
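The slides list the failure types but not how the portal recovers from them. One common way to cope with transient submission failures is to retry with an increasing delay; the sketch below shows that pattern only. The exception class, function name, attempt count and delays are all hypothetical, and `submit` stands for whatever callable performs the real submission.

```python
import time


class SubmissionError(Exception):
    """Raised by the (hypothetical) submit callable on a network failure."""


def submit_with_retry(submit, attempts=3, delay_s=30.0, sleep=time.sleep):
    """Call `submit` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except SubmissionError:
            if attempt == attempts:
                raise  # give up and let the caller record the failure
            sleep(delay_s * attempt)  # wait longer after each failure


# Demonstration with a fake submitter that fails twice, then succeeds.
calls = []
waits = []


def flaky_submit():
    calls.append(1)
    if len(calls) < 3:
        raise SubmissionError("network unreachable")
    return "job accepted"


result = submit_with_retry(flaky_submit, sleep=waits.append)
print(result, "after", len(calls), "attempts")
```

Duration and output failures need different handling (for example a longer reservation or a checksum on transferred files), which this sketch does not cover.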








Plan


GridQTL Team and users


Introduction to the GridQTL project


Description of computing infrastructure



Short demonstration of the Grid-enabled service

Linkage Disequilibrium Linkage Analysis


Good for complex pedigrees

For instance, a population of feral sheep from St Kilda

Or others, even plants (diploids)

Well suited to populations that would be too expensive (or unethical) to breed for experimental purposes

3,447 analyses run (198,142 jobs on the Grid)

11,005 hours of CPU time

900 hours of user time
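A few averages follow directly from these totals. The short calculation below derives them; it assumes the CPU time is spread over all jobs, and since the slides do not define "user time", that figure is left out.

```python
analyses = 3_447
grid_jobs = 198_142
cpu_hours = 11_005

jobs_per_analysis = grid_jobs / analyses
cpu_hours_per_analysis = cpu_hours / analyses
cpu_minutes_per_job = cpu_hours * 60 / grid_jobs

print(f"{jobs_per_analysis:.1f} jobs per analysis")
print(f"{cpu_hours_per_analysis:.2f} CPU hours per analysis")
print(f"{cpu_minutes_per_job:.2f} CPU minutes per job")
```

On average, an analysis was split into about 57 Grid jobs and used about 3.2 CPU hours, with each job running for a little over 3 CPU minutes.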

Thank you!


J. Hernández-Sánchez, J.A. Grunchec and S. Knott. A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25(11): 1377-1383 (2009).

J.A. Grunchec, J. Hernández-Sánchez and S. Knott. SWARM: A meta-scheduler to minimize job queuing duration in a Grid portal. Accepted by the International Conference of Cluster and Grid Computing Systems, Oslo, Norway, July 2009.

G. Seaton, J. Hernandez, J.A. Grunchec, I. White, J. Allen, D.J. De Koning, Wenhua Wei, D. Berry, C. Haley, S. Knott. GridQTL: a Grid portal for QTL mapping of compute intensive datasets. 8th World Congress on Genetics Applied to Livestock Production, August 13-18, 2006, Belo Horizonte, MG, Brasil.


Portal

http://cleopatra.cap.ed.ac.uk/gridsphere/gridsphere



http://gridqt1.cap.ed.ac.uk:8080/gridsphere/gridsphere

Email: jgrunche@staffmail.ed.ac.uk