TeraGrid Science Gateways


Nancy Wilkins-Diehr

TeraGrid Area Director for Science Gateways

wilkinsn@sdsc.edu

TeraGrid Round Table, October 7, 2010

What have the gateways been up to?


Ultrascan


Borries Demeler, UT ; Suresh Marru, Raminder Singh, IU


Gateway software listing


Wrap up of support for Arroyo, RENCI science portal


But hopefully not the end of TG usage by those groups


Dark Energy Survey


Jim Myers, Michelle Gower, NCSA


CUE presentation


Derek Simmel, PSC


login - env, build, comm, math, tg



GRAM5


Making use of TG portal user forum for discussion


Interest in sharing experiences with OSG


Update on Inca tests (able to recreate load from “Gateway Debug 2007”)


Gateway experiences


hung processes when errors pile up


SGE job manager issues


Nice work by David Carver (TACC), Suresh Marru (IU), Stu Martin (ANL)


Expressed Sequence Tag gateway


Archit Kulshrestha, IU


CIPRES


Over 600 users on TG, Apr-June


2.7M hours awarded 7/1/10, “model gateway proposal”


But able to use much more than this


Gateways in the extension year


Gateway study


Analytical Ultracentrifugation

Emerging computational tool for the study of proteins


Samples from researchers all over the world


Some (Germany, Australia) have their own ultracentrifuges and use only the analysis capabilities; others send samples to UT to spin


Spin the samples at high speeds, learn about macromolecule properties


Monte Carlo simulations


Observations are electronically digitized and stored for further mathematical analysis


Source: Suresh Marru, IU

The Center for Analytical Ultracentrifugation of Macromolecular Assemblies, UT Health Sciences


Comprehensive data analysis environment


Management of analytical ultracentrifugation data for single users or entire facilities


Support for storage, editing, sharing and analysis of data


HPC facilities used for 2-D spectrum analysis and genetic algorithm analysis


TeraGrid (~2M CPU hours used)


Technical University of Munich


Juelich Supercomputing Center


Portable graphical user interface


MySQL database backend for data management


Over 30 active institutions


Source: Suresh Marru, IU

Gateway and ASTA support, a growing trend


TeraGrid advanced support


Fault tolerance


Workflows


Use of multiple TG resources (using Lonestar, expanding to QueenBee and Ranger, using Quarry for test server, waiting for GRAM5 on Ranger)


Community account implementation


Remote steering


Improved UI (no manual specification of CPU time)


Applying lessons learned from GridChem, LEAD, incorporating new features into OGCE


LEAD is portlet-based, GridChem is a Java Swing client-side app, Ultrascan is a PHP- and Perl-based gateway; all can use OGCE


Big MPI app that forks off many independent runs; improvements here will be tackled by TG's advanced support team



Source: Suresh Marru, IU

Gateway software listing


Populate TeraGrid’s information service with gateway software information


Similar to RP software listings


But RP listings are maintained at RPs; IIS pulls from those sources


With gateways, we are thinking they fill in a form and push the info to IIS


http://www.renci.org/~jdr0887/gawsr-howto/


Dark Energy Survey


We know the universe is expanding, but the expansion is accelerating for unknown reasons


DES is a telescope experiment to constrain various theories: 4m telescope in Chile, Fermi and others developing new lens, working with simulated data until telescope goes online in 2011


200 TB raw data over 5 years, 4 PB of derived products (lots of filtering)


Thousands of jobs run on TeraGrid each week with very few failures


Removing light from bright stars, airplanes, clouds; calibration. Telescope operated by staff; users will use the portal to do queries for particular stars/regions of the sky afterward


Source: Jim Myers and Michelle Gower, NCSA


Condor DAGMan, Condor-G, pre-WS GRAM, GridFTP, Elf/OgreScript for monitoring (developed at NCSA), Oracle


Challenges


Efficiently managing small jobs in big batch world


Database stresses: block updates instead of individual transactions for better performance, indexing strategies, narrow vs wide tables


~100 front end users, expected to grow in production; changing paradigms from Sloan Digital Sky Survey, as data are now too large for bulk downloads and full table scans


Source: Jim Myers and Michelle Gower, NCSA
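
The "block updates instead of individual transactions" point can be illustrated with a minimal sketch. The slides name Oracle as the database; the example below uses Python's built-in sqlite3 module as a stand-in, and the table name, column names, and function names are hypothetical, chosen only for illustration. It contrasts committing after every row with committing once per block of rows.

```python
import sqlite3
from typing import Sequence, Tuple

Row = Tuple[int, float]  # (object_id, magnitude): hypothetical columns


def make_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical catalog table used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(object_id INTEGER PRIMARY KEY, magnitude REAL)"
    )
    conn.commit()


def insert_one_by_one(conn: sqlite3.Connection, rows: Sequence[Row]) -> int:
    """Individual transactions: one commit per row. Returns the commit count."""
    commits = 0
    for row in rows:
        conn.execute("INSERT INTO objects VALUES (?, ?)", row)
        conn.commit()
        commits += 1
    return commits


def insert_in_blocks(
    conn: sqlite3.Connection, rows: Sequence[Row], block_size: int = 1000
) -> int:
    """Block updates: one commit per block of rows. Returns the commit count."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    commits = 0
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO objects VALUES (?, ?)", block)
        commits += 1
    return commits
```

With 2,500 rows and a block size of 1,000, the blocked version issues 3 commits where the row-by-row version issues 2,500. How much time that saves depends on the database and storage, so the sketch counts commits rather than claiming a speedup.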

Expressed Sequence Tag (EST) Pipeline


Integrate existing computational biology software


Expand compute capacity by using TeraGrid


Take raw genome data in the FASTA format and run a series of applications on it


RepeatMasker, PaCE, CAP3 and BLAST used to generate the final assembled output


EST Pipeline is based on the SWARM Web Service, which provides a web service interface to clients and also manages bulk job submission, using the Birdbath API to submit to Condor


Workflow is configured using a PHP-based gateway that allows users to upload input data and select programs to run



Source: Archit Kulshrestha, IU

Expressed Sequence Tag Assembly


ESTs are a collection of random cDNA sequences, sequenced from a cDNA library or sequencing devices.


Typical inputs are of the order of millions of sequences


Newer 454 devices produce higher volume and are relatively easier to obtain and operate


Stored in a file using the FASTA format


The ESTs are clustered and assembled to form contigs.


The contigs are then used to identify potential unknown genes by BLASTing against a known protein database.


Application     Purpose
RepeatMasker    Cleaning sequences
PaCE            Clustering
CAP3            Assembly
BLAST           Identification

Source: Archit Kulshrestha, IU


Application Runtime Characteristics


RepeatMasker

Serial

Execution on split input

E.g., 1000 jobs for 2 million sequences


PaCE

MPI

Runtime of several hours

Exponential growth in time with growth in input data. Increasing the number of procs works quite well


CAP3

Serial runs on clusters generated by PaCE

Clusters can be combined

Varied sizes with varied resource requirements (run times of milliseconds to days)


BLAST

Serial

Takes CAP3 results. Number of jobs controlled by adjusting the number of sequences per job.


Source: Archit Kulshrestha, IU
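
The idea of running serial tools on split input, with the number of jobs controlled by the number of sequences per job, can be sketched as follows. This is a minimal illustration in Python, not the gateway's actual code; the function names are hypothetical, and the FASTA parsing handles only the basic ">header" plus sequence-lines layout.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_into_jobs(
    records: Iterable[Record], seqs_per_job: int
) -> Iterator[List[Record]]:
    """Group records into batches of at most seqs_per_job sequences."""
    if seqs_per_job < 1:
        raise ValueError("seqs_per_job must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == seqs_per_job:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def job_count(total_sequences: int, seqs_per_job: int) -> int:
    """Number of jobs needed, rounding up for a partial final batch."""
    return -(-total_sequences // seqs_per_job)
```

Assuming equal-sized batches, `job_count(2_000_000, 2000)` gives 1000, which matches the "1000 for 2 million" example; the batch size of 2,000 is an inference from those two numbers rather than a figure stated in the slides.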


Results

Program         No. of jobs    Wait time + run time
RepeatMasker    1000           11:56
PaCE            1              01:22
CAP3            4073           25:44
BLAST           893            49:00

The results are from a single run on 2 million sequences and hence may not be an accurate model of the wait time. However, other than in the case of BLAST, the wait times were not a significant component of the total time.


Long waits due to long queue times for small jobs.


Previous run times: 5 days compared to 2. Serial waits eliminated.


Had hooks to Inca to determine when jobs were down


Failure rate quite low: 10-12 out of thousands

Source: Archit Kulshrestha, IU


Cyberinfrastructure for Phylogenetic Research (CIPRES)


Enables large-scale phylogenetic reconstructions


Parallel versions of applications such as MrBayes, RAxML and GARLI run on TeraGrid


Easy to use graphical user interface


Current Status:


CIPRES Portal users consumed 1,200,000 TeraGrid CPU hours between Dec 2009 and June 2010. This was 3 times our projected use.


A new award of 2.7 million CPU hours was made on July 1, 2010.


The portal provides access to parallel versions of MrBayes, RAxML, and GARLI, which all scale well on TG resources. The portal staff has worked with TG special projects group personnel and community developers to provide access to the fastest versions of MrBayes and RAxML available anywhere.


Access to BEST, a variant of MrBayes, is planned in the near future. A GPU platform called BEAGLE will be used to provide access to BEAST on TeraGrid (Lincoln), also in the near future.


The toolkit will be expanded to provide access to other community codes that are appropriate for use on TeraGrid

Source: Mark Miller, SDSC


Usage Statistics for CIPRES Portal on TG, 12/1/2009 - 5/31/2010

[Figure: two charts by month, December through May: SUs consumed (axis ticks at 2e5 and 4e5) and jobs submitted (axis ticks at 1e3, 2e3 and 3e3).]
Source: Mark Miller, SDSC


Intellectual Merit:



The CIPRES portal is cited in at least 35 publications


This includes publications in Nature, PNAS, and Cell.


Highlights of scientific findings:


New Family Tree for Arthropoda: A team of scientists compared genetic sequences from 75 arthropod species and drew a new family tree for the most successful phylum of animals on Earth. This work represents an important advance in the century-old problem of arthropod evolution.


Genome Sequence of a Transitional Eukaryote: A group of scientists sequenced the genome of Naegleria gruberi, a single-cell organism that is a key transitional species between prokaryotes and eukaryotes. This work provides new insights into the origins of subcellular organelles.


Co-evolution of Beetles and Flowering Plants: A group of researchers studied the evolutionary history of angiosperms and the beetles that interact with them. The work provided compelling experimental evidence for the long-postulated co-evolution of these two symbiotic groups.


Source: Mark Miller, SDSC


Broad Impacts:



77% of all jobs have been submitted from locations in the USA. Submissions are received regularly from researchers at top-tier institutions such as Harvard, Yale, and Stanford.


Jobs are received regularly from academic institutions in 17 EPSCoR states.


Job submissions have been received from 34 countries on 5 continents.


At least 5 undergraduate classes are known to use the portal routinely. This is likely an underestimate (based on Web log patterns).


More than 45,000 jobs have been run on the Portal over its lifetime. Between Dec 1, 2009 and June 30, 2010, users ran 6,108 parallel jobs on the TeraGrid.


Source: Mark Miller, SDSC


Broad Impacts:


Impacts on Productivity:


Average wall time for RAxML and GARLI jobs decreased 3- to 4-fold with the shift to TeraGrid resources.


Moreover, the number of RAxML jobs has doubled relative to the rate of submission on the CIPRES Portal running on the CIPRES cluster alone.


Thus, TeraGrid access is helping users finish their jobs faster and also to make more runs per unit time.


The average wall time for MrBayes jobs increased 2-fold on the TeraGrid, but the number of jobs decreased by approximately 33%. This trend reflects users’ ability to run much larger and longer jobs on TeraGrid than on the CIPRES cluster. The increased maximum run-time limit for MrBayes submissions to Abe (168 hours on Abe vs. 72 hours on the CIPRES cluster) allowed users to complete their long runs with a single large submission, thus eliminating the need to make smaller, incremental runs.


Source: Mark Miller, SDSC
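
The effect of the run-time limits on the number of submissions can be worked through with a small sketch. The 160-hour analysis below is a hypothetical figure, and the calculation assumes a run can be resumed from where the previous submission stopped with no restart overhead; only the 72-hour and 168-hour limits come from the slide.

```python
import math


def submissions_needed(total_hours: float, max_hours_per_job: float) -> int:
    """Number of submissions needed to finish a run of total_hours
    when each submission may run for at most max_hours_per_job."""
    if total_hours <= 0 or max_hours_per_job <= 0:
        raise ValueError("hours must be positive")
    return math.ceil(total_hours / max_hours_per_job)


# Hypothetical 160-hour MrBayes analysis under each limit.
on_cipres_cluster = submissions_needed(160, 72)   # 72-hour limit
on_abe = submissions_needed(160, 168)             # 168-hour limit
```

Under these assumptions the same analysis takes three incremental submissions on the CIPRES cluster and one on Abe, which is consistent with fewer but longer MrBayes jobs after the move to TeraGrid.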


Broad Impacts:


Improved User Access to TG: 100-150 new users per month access TG resources; the number of repeat users is growing.

[Figure: two charts by month, December through May: total users (axis ticks from 100 to 500) and number of users (axis ticks from 50 to 200), with legend entries for repeat users and new users. May is a partial month (18 days); the error bar on the total-users chart projects the full month.]
Source: Mark Miller, SDSC


New gateway activities in the extension year


Helpdesk support expanded


From 0.2 FTE in PY5 to 1.7 FTE in the extension [NCSA, Purdue]


Helpdesk and Condor support, new GIS communities, SimpleGrid extensions


Accounting


Improved views for gateways now that we have attributes [TACC]


Community accounts


Continued work toward improved standardization [NICS]


Prebuilt VMs with gateway software


OGCE, SimpleGrid [IU, NCSA]


Online tutorials with CI Tutor and the EOT team


OGCE, SimpleGrid [IU, NCSA]


More example-based documentation


Less talk, more action, short videos, based on user feedback [NCSA, SDSC]


Remote vis for gateways [ORNL]


Targeted Support in the Extension

All staff available for assignments as new projects come in


Cactus


Meet the needs of several groups with large TG allocations [LSU]


GridChem, PolarGrid, Ultrascan


Scheduling, vis, Matlab processing, processing of centrifuge data for large international project [IU]


CCSM-ESG


Continuing work to combine capabilities [NCAR, Purdue]


Uintah, computational fluids [NCAR, Utah]


SNS [ORNL]


CIPRES [SDSC]


OpenSocial for gateways [U Chicago]


Improved use of remote vis resources [ORNL]


Condor and cloud support [Purdue]


Gateway Sustainability Study

Small, non-TG, EAGER grant


Characteristics of short funding cycles


Build exciting prototypes with input from scientists


Work with early adopters to extend capabilities


Tools are publicized, more scientists interested


Funding ends


Scientists who invested their time to use new tools are disillusioned


Less likely to try something new again


Start again on new short-term project


Need to break this cycle


EAGER grant to look at characteristics of successful gateways and domain areas where a gateway could have a big impact


4 focus group meetings over 2 years

First 2 held in June 2010


www.sciencegateways.org



Thank you for your attention!

Questions?




Nancy Wilkins-Diehr, wilkinsn@sdsc.edu

www.teragrid.org