Oct 4, 2013

WV-INBRE
West Virginia IDeA Network of Biomedical Research Excellence

Managing the NextGen data pipeline

Jim Denvir, Ph.D.

NextGen data challenges

- NextGen sequencing produces very large data sets
  - On the order of terabytes (10^12 bytes) per run
- Data analysis requires considerable computing power and specialist management
- The main challenge is distilling useful information from the raw data
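To make the terabyte scale concrete, here is a back-of-the-envelope sketch of how raw sequence output grows with run parameters, and how many runs a fixed amount of storage holds (42 TB is the usable space quoted for the Genomics Core server). The read count, read length, and two-bytes-per-base figure are hypothetical illustrative values, not specifications of any particular sequencer, and the estimate covers only base calls and quality scores, not image files.

```python
def run_size_bytes(reads: int, read_length: int, bytes_per_base: int = 2) -> int:
    """Rough uncompressed size of base calls plus quality scores for one run.

    Assumes (hypothetically) one byte per base and one byte per quality score,
    and ignores image files, read headers, compression, and analysis output.
    """
    return reads * read_length * bytes_per_base


def to_terabytes(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 10**12


def runs_that_fit(capacity_bytes: int, per_run_bytes: int) -> int:
    """Number of whole runs of a given size that fit in a given capacity."""
    return capacity_bytes // per_run_bytes


if __name__ == "__main__":
    # Hypothetical run: 6 billion reads, each 100 bases long.
    size = run_size_bytes(reads=6_000_000_000, read_length=100)
    print(f"{to_terabytes(size):.2f} TB per run")  # 1.20 TB per run
    print(runs_that_fit(42 * 10**12, size), "such runs fit in 42 TB")  # 35 such runs fit in 42 TB
```

Integer byte counts are used throughout so the capacity division is exact rather than subject to floating-point rounding.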

WV-INBRE West Virginia IDeA Network of Biomedical Excellence

Core Facility support

- The Bioinformatics and Genomics core facilities support investigators who need NextGen sequencing data analyzed
  - Perform the early stages of the analysis pipeline
  - Perform downstream analysis, or provide support and software for individual investigators
  - The split depends on the needs and expertise of the investigator

NextGen Analysis Pipeline

- Image Analysis, Base Calling: Real Time Analysis performed by RTA software on the sequencer
- Demultiplexing*, Alignment, SNP calling or RNA read counting: CASAVA (Illumina) or open source (Tuxedo Suite, R/Bioconductor)
- Statistical Analysis: Partek or R/Bioconductor
- Functional Analysis: IPA

* May require custom scripts

Performed by: Automated, Core Facility, Investigator
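Demultiplexing is marked on the pipeline slide as a step that may require custom scripts. The sketch below shows the core idea such a script might implement: assign each read to a sample by comparing its index (barcode) sequence with the known barcodes, allowing at most one mismatch, and send unknown or ambiguous reads to an "undetermined" bin. The barcodes, sample names, and the (read_id, index_seq, sequence) tuple layout are hypothetical; a real script would parse FASTQ files, and where the index sequence lives depends on the run configuration.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq, barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches index_seq, else None."""
    hits = [sample for sample, bc in barcodes.items()
            if hamming(index_seq, bc) <= max_mismatches]
    # No hits means an unknown barcode; several hits means the read is ambiguous.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (read_id, index_seq, sequence) tuples into per-sample bins."""
    bins = defaultdict(list)
    for read_id, index_seq, seq in reads:
        sample = assign_sample(index_seq, barcodes, max_mismatches)
        key = sample if sample is not None else "undetermined"
        bins[key].append((read_id, seq))
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical 6-base barcodes for two samples.
    barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
    reads = [
        ("r1", "ACGTAC", "GATTACA"),  # exact match to sample_A
        ("r2", "TGCATC", "CCGGTTA"),  # one mismatch from sample_B
        ("r3", "AAAAAA", "TTTTGGG"),  # matches neither barcode
    ]
    for sample, assigned in demultiplex(reads, barcodes).items():
        print(sample, [rid for rid, _ in assigned])
```

Rejecting a read that lies within one mismatch of two barcodes, rather than guessing, keeps cross-sample contamination out of the per-sample files; barcode sets designed with a minimum pairwise distance of 3 avoid that ambiguity at one allowed mismatch.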

Commercial Tools

- Examples: RTA, CASAVA, Partek, IPA
- Pros:
  - Short learning curve
  - Can potentially be used by individual investigators
  - Usually come with technical support and training
- Cons:
  - Expensive
  - Closed, proprietary source code

Open Source

- Examples: R/Bioconductor, Tuxedo Suite
- Pros:
  - Free
  - Open source code
  - Enables rapid, community-led improvement
  - Potentially more open to academic review
- Cons:
  - Steeper learning curve
  - Typically prohibitive for individual investigators
  - Sparse technical support

Tools developed on site

- Pros:
  - Can fill in functionality missing from available tools
  - Customized exactly to our needs
  - Potential source of revenue
- Cons:
  - Development is very time-consuming

Roadmap

- Experience from microarray data analysis suggests:
  - Start with commercial tools
    - Rapid start-up enables us to focus on learning the scientific basis for the analyses
  - Transition to open-source tools for some parts of the pipeline
    - Probably mid-2012 to mid-2014
    - Provides financial savings further down the road
    - Sometimes better received by journal reviewers
    - The initial steps of the analysis pipeline, and functional analysis, will still be handled by commercial software
  - Develop custom solutions only when needed

Storing Data

- Archiving data from NextGen experiments requires a large amount of disk space
- Once analysis is complete, some raw image data will be deleted
  - Storing the data is more expensive than re-running the experiment!
  - Exceptions will be considered for experiments that cannot be repeated
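The retention rule on this slide can be written down as a small decision function. This is a hypothetical sketch: the `RunFile` record, its file categories, and the `analysis_complete` and `repeatable` flags are assumptions read from the slide, not the core facility's actual tooling, and it treats every non-repeatable experiment as an exception whose raw images are kept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunFile:
    path: str
    kind: str                # hypothetical categories, e.g. "raw_image", "fastq", "bam"
    analysis_complete: bool  # has analysis of this run finished?
    repeatable: bool         # False marks an experiment that cannot be re-run


def should_delete(f: RunFile) -> bool:
    """Delete raw image data only after analysis is complete.

    Files from non-repeatable experiments are treated as exceptions and kept.
    """
    return f.kind == "raw_image" and f.analysis_complete and f.repeatable


def files_to_delete(files: List[RunFile]) -> List[str]:
    """Paths that this hypothetical retention rule marks for deletion."""
    return [f.path for f in files if should_delete(f)]


if __name__ == "__main__":
    files = [
        RunFile("run01/images/tile_0001.tif", "raw_image", True, True),
        RunFile("run01/fastq/sample_A.fastq.gz", "fastq", True, True),
        RunFile("run02/images/tile_0001.tif", "raw_image", False, True),
        RunFile("run03/images/tile_0001.tif", "raw_image", True, False),
    ]
    print(files_to_delete(files))  # ['run01/images/tile_0001.tif']
```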


NextGen analysis server

- The Genomics Core has a Linux server for managing analysis and storing data
  - Housed in Drinko Library and managed by central campus IT staff
  - Has 42 terabytes of usable disk space
  - Uses a redundant storage system that can survive a drive failure without losing data
  - IT also backs up the data off-site
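The slide does not say which redundancy scheme the server uses. As a hypothetical illustration of how redundancy lets a disk array survive a drive failure, the sketch below uses single XOR parity, the idea behind RAID 5: the parity block is the byte-wise XOR of the data blocks, so any one lost block can be rebuilt by XOR-ing the parity with the surviving blocks. Real arrays also stripe data and rotate parity across drives, which is omitted here, and parity is one reason usable capacity is smaller than raw capacity.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks: List[bytes]) -> bytes:
    """Parity block: the XOR of every data block in the stripe."""
    return reduce(xor_blocks, blocks)


def rebuild(surviving: List[bytes], parity_block: bytes) -> bytes:
    """Recover one missing data block by XOR-ing the parity with the survivors."""
    return reduce(xor_blocks, surviving, parity_block)


if __name__ == "__main__":
    # Hypothetical stripe: one small data block on each of three drives.
    drives = [b"ACGT", b"TTGA", b"CCAG"]
    p = parity(drives)
    # Simulate losing drive 1, then rebuild it from drives 0 and 2 plus parity.
    recovered = rebuild([drives[0], drives[2]], p)
    print(recovered == drives[1])  # True
```

Single parity protects against only one simultaneous drive failure, which is why off-site backup remains a separate safeguard.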

Things to remember

- Core facilities are there to help!
- At the experimental design stage, be sure you understand what analysis the core facility will perform
  - Would you prefer the core to run IPA, or would you prefer to control that stage yourself?
  - If the latter, do you need training and/or support?

Questions

Presentation available at http://users.marshall.edu/~denvir/presentations.html