Presentation slides - Indiana University at SC12


Supercomputing and Bioinformatics at IU


Abhinav Thota

Senior Analyst

Research Technologies/UITS


November 13, 2012


Outline


Three compute-intensive bioinformatics projects at IU


I think these are really good success stories


These use novel and unconventional methods


mlRho - a program for estimating the population mutation and recombination rates from shotgun-sequenced diploid genomes


The researchers at IU need ~6 million CPU hours


Get the CPU hours


Optimize code


Trinity RNA-Seq Assembler Performance Optimization


Very popular tool used by biotechnologists


Optimized the code


Galaxy @ IU


A web-based platform for data-intensive biomedical research



mlRho - a program for estimating the population mutation and recombination rates


Prof. Mike Lynch’s group at IU came to us with the project


A program for estimating the population mutation and recombination rates


Each mlRho job is serial, but the jobs work on different parts of the genome profiles


Thus they are embarrassingly parallel


We started by getting the researchers onto XSEDE


We benchmarked the code on different machines


Scaled up to 128 cores using BigJob, a pilot-job tool
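
To make the pilot-job pattern concrete, here is a minimal sketch (our own C++, not BigJob itself; the input file names and mlRho arguments are hypothetical). The point is that many independent serial runs share one scheduler allocation, one per core:

    // Minimal sketch of the pilot-job pattern (illustrative only, not BigJob):
    // launch one serial mlRho instance per core inside a single allocation.
    #include <cstdlib>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        const int n_cores = 16;                  // cores in the allocation (assumed)
        std::vector<std::thread> workers;
        for (int i = 0; i < n_cores; ++i) {
            workers.emplace_back([i] {
                // each worker runs one serial instance on its own
                // slice of the genome profile (hypothetical arguments)
                std::string cmd = "mlRho -I profile_part_" + std::to_string(i) + ".pro";
                std::system(cmd.c_str());
            });
        }
        for (auto& w : workers) w.join();        // wait for all instances to finish
        return 0;
    }

A real pilot-job tool like BigJob does this across many nodes, queuing new tasks onto cores as they free up, which is how the runs scaled to 128 cores.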


Successful XSEDE Proposal


XSEDE is supported by the National Science Foundation.


It replaces and expands on the NSF TeraGrid project.


More than 10,000 scientists used the TeraGrid to complete thousands of research projects, at no cost to the scientists.


We asked for 6.3 million CPU hours and received 6.2 million CPU hours


The benchmarking and scaling study was really important


Our efforts to make the code faster


Another important factor was the use of a pilot-job tool to run the serial jobs concurrently, utilizing all of the cores in a processor


Our tests were run on Ranger, an XSEDE resource at TACC, and Mason, an NCGAS resource at IU




Benchmarking and Scaling tests on Ranger


Three genome profiles, representing three size categories: small, medium and large (700 MB, 10 GB and 30 GB in size)


The X-axis shows the number of concurrent instances of mlRho


The Y-axis shows the cumulative distance travelled by all the mlRho instances running concurrently


The distance travelled, irrespective of the genome size, increases linearly with the number of concurrent instances being run


We can therefore predict the number of CPU hours we need to complete the science
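
As a rough formalization of this prediction (the notation is ours, not from the slides): if one instance covers genome distance $r$ per core-hour and throughput scales linearly, then

\[ d(n, t) = n\,r\,t \qquad\Longrightarrow\qquad H \approx \frac{D}{r}, \]

where $d(n,t)$ is the cumulative distance covered by $n$ concurrent instances after $t$ hours, $D$ is the total distance for the genome, and $H$ is the core-hours required, independent of $n$.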

Compilers and Optimization


It is always a good idea to do a basic profile and trace of an application


Using the right compiler and optimization flags can make a big difference


Here are the runtimes with different compilers:

Compiler   Ranger   Mason
Intel      405      364
GNU        450      414

Just by using the right compilers we saw a 10% improvement in performance
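
For illustration, the two builds might be invoked roughly as follows (a sketch; the slide does not list the exact flags used):

    icc -O3 -xHost -o mlRho *.c          # Intel compiler, tuned for the host CPU
    gcc -O3 -march=native -o mlRho *.c   # GNU compiler equivalent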

mlRho trace using Vampir

[Figure: Vampir trace of mlRho, with a Memory Usage panel]

Benefits of Tracing


We were able to look at the time spent in each function


Allows us to focus on the parts of the application that are taking up time


Lots of relevant information like memory usage at every step, the FLOPS achieved
and various other counters
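
A hedged sketch of the tracing workflow (VampirTrace-style compiler wrappers; the exact commands and trace names used at IU are not given in the slides):

    vtcc -O3 -o mlRho_vt *.c    # build with the VampirTrace wrapper around the C compiler
    ./mlRho_vt ...              # running the instrumented binary records a trace
    vampir mlRho_vt.otf         # inspect per-function time and counters in the Vampir GUI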


Updates to the code

42 times faster!


The code is now 42 times faster


It is an iterative process


Major changes include the way the application reads and stores data


Pre-computing some of the tasks that are common to a particular genome


Whether or not substantial improvements materialize, it is always useful to look at the
code


Even a 10% speed-up makes a big difference if the overall CPU usage is in the millions of hours
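
To illustrate the pre-computation idea, here is a sketch of our own (not the actual mlRho code; the function, table bound and cached quantity are hypothetical):

    // Illustrative only: cache a per-genome quantity once instead of
    // recomputing it for every site in the likelihood loop.
    #include <cmath>
    #include <vector>

    // before: called inside the per-site loop for every site
    double site_factor(int coverage) { return std::lgamma(coverage + 1.0); }

    int main() {
        const int max_coverage = 256;             // assumed bound for this genome
        std::vector<double> table(max_coverage + 1);
        for (int c = 0; c <= max_coverage; ++c)   // precompute once per genome
            table[c] = site_factor(c);
        // ... the per-site loop now does a cheap table lookup instead
        return 0;
    }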

Trinity RNA-Seq Assembler Performance Optimization

Robert Henschel, Matthias Lieber, Le-Shin Wu, Philip M. Nista, Brian J. Haas, Richard D. LeDuc

Plan


Reproduce results from previous performance papers



Perform general optimizations


Optimize components



Publish results


Make source code available


Trinity


De novo reconstruction of the transcriptome from RNA-seq data


Using next generation sequencing on RNA



Identify and catalog all expressed genes


Capture gene expression level

Test Configuration


Trinity base version: 2012-03-17


Base configuration:


Standard makefile


GCC compiler version 4.4



Running Trinity using the following command line:


--CPU 20 --kmer_method jellyfish --max_memory 4G --bfly_opts "--edge-thr=0.05" --min_contig_length 300


Input/Output and temporary files on Lustre



Performance

[Figure: runtime comparison for our test]

Optimizing Components


Optimizing Inchworm


Intel's OpenMP runtime seems superior to GCC's, for this workload


Optimizing GraphFromFastA


Parallelizing read counting phase


Optimized file input to reduce OpenMP critical section


10x faster on 32 cores


Optimizing ReadsToTranscripts


Changing buffered C++ I/O to regular C I/O


Changing OpenMP scheduler directive (see the sketch after this list)


2x faster on 32 cores


Optimizing QuantifyGraph


Thousands of embarrassingly parallel tasks, with runtimes of 160 ms to 25 min


Optimized relational operator "<"


Reducing "system()" calls


Reducing the read buffer from 200 MB to 1 kB


5x faster on 32 cores
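
As a sketch of the scheduling change mentioned above (our own illustration, not Trinity's code; process_component() is a hypothetical stand-in for one QuantifyGraph task):

    // With task runtimes spanning 160 ms to 25 min, schedule(dynamic)
    // hands out one task at a time as threads free up; a static split
    // would leave most threads idle behind the long-running tasks.
    #include <cstdio>
    #include <vector>

    static void process_component(int id) {       // hypothetical per-task work
        std::printf("component %d done\n", id);
    }

    int main() {
        std::vector<int> tasks(1000);
        for (int i = 0; i < (int)tasks.size(); ++i) tasks[i] = i;
        #pragma omp parallel for schedule(dynamic, 1)
        for (int i = 0; i < (int)tasks.size(); ++i)
            process_component(tasks[i]);
        return 0;
    }

(compile with -fopenmp)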

Trinity @ PSC Blacklight




SGI Altix UV 1000


Global shared memory, cc-NUMA architecture


Two partitions, each consisting of 2048 cores with 16 TByte of memory


Each "blade" contains:


Two 8-core Intel Xeon processors at 2.27 GHz


128 GByte of RAM


Final Results



Conclusion


Significantly reduced runtime, while maintaining correctness of results


Results are published


Source code is committed to the official SourceForge repository



Working on establishing a continued collaboration between IU/ZIH/Broad to further optimize Trinity




Galaxy instance at IU


Galaxy is an open, web-based platform for data-intensive biomedical research.


Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.


NCGAS deployed a Galaxy instance at IU


The jobs are run on Mason


Questions?

Acknowledgements and Disclaimer

Research Technologies is a division of University Information Technology Services and is affiliated with the Pervasive Technology Institute

This work was supported in part by the Lilly Endowment, Inc. and the Indiana University
Pervasive Technology Institute

Any opinions presented here are those of the presenter(s) and do not necessarily
represent the opinions of the National Science Foundation or any other funding agencies

License Terms

Please cite as: Thota, A., Introduction to HPC. Tutorial presented at XSEDE12 (Chicago,
IL, 16 July 2012).

Items indicated with a © are under copyright and used here with permission. Such items
may not be reused without permission from the holder of copyright except where license
terms noted on a slide permit reuse.

Except where otherwise noted, contents of this presentation are copyright 2011 by the
Trustees of Indiana University.

This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following condition: attribution (you must attribute the work in the manner specified by the author or licensor, but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.