Transcriptome analysis


2 Oct 2013


Transcriptome analysis with a reference


Challenging due to the size and complexity of the datasets

Many tools are available, driven by biomedical research

GATK and R/Bioconductor offer many options

Start by mapping reads to the reference genome with a mapping/alignment tool that can deal with exon-intron junctions

Reconstruct transcripts from the mapped reads, dealing with alternative splicing products

Calculate the relative abundance of different transcripts

Estimate biological significance based on annotation

Example tools: Bowtie/TopHat, Cufflinks, Myrna
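
The "relative abundance" step can be illustrated with a small sketch. The slides do not name a unit, so this assumes transcripts per million (TPM), one common length-normalised measure; the transcript names, counts and lengths are made-up values, and the function name `tpm` is hypothetical.

```python
def tpm(counts, lengths):
    """Convert per-transcript read counts to transcripts per million (TPM).

    counts  -- dict: transcript id -> number of reads assigned to it
    lengths -- dict: transcript id -> transcript length in bases
    """
    # Reads per kilobase: longer transcripts attract more reads,
    # so divide each count by the transcript length in kb.
    rpk = {t: counts[t] / (lengths[t] / 1000.0) for t in counts}
    total = sum(rpk.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so that the values sum to one million.
    return {t: 1e6 * value / total for t, value in rpk.items()}


# Illustrative (made-up) counts and lengths for three transcripts.
counts = {"tx_A": 100, "tx_B": 300, "tx_C": 600}
lengths = {"tx_A": 1000, "tx_B": 3000, "tx_C": 2000}
abundance = tpm(counts, lengths)
```

Here tx_A and tx_B end up with the same abundance even though tx_B has three times as many reads, because tx_B is three times as long.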


BIT 815: Analysis of Deep Sequencing Data

Workflow summary from the review “From RNA-seq reads to differential expression results”, by Oshlack et al., Genome Biol 11:220, 2010.

Note the emphasis on statistical analysis methods; an equal emphasis should be placed on experimental design.

The ‘Tuxedo’ suite of programs: Bowtie, TopHat, Cufflinks and CummeRbund

See Trapnell et al., Nature Protocols 7:562-578, 2012 for details



TopHat maps reads

Cufflinks assembles transcripts

Cuffmerge merges transcript data detected in different treatments

Cuffdiff evaluates differential expression

CummeRbund provides visualization tools
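
The order of the steps above can be sketched as a small Python helper that only builds the command lines and does not run anything. The sample names, file paths and the function name `tuxedo_commands` are hypothetical; the options shown (`-G`, `-o`, `-g`, `-s`, `-L`) are the basic ones used in the Trapnell et al. protocol, and should be checked against the installed tool versions before use.

```python
def tuxedo_commands(samples, genome_index, genome_fasta, annotation_gtf):
    """Build (but do not run) Tuxedo command lines.

    samples maps a condition label to a list of (sample_name, fastq_path).
    Returns the list of commands and the per-sample assembly paths that
    would need to be written, one per line, to assemblies.txt.
    """
    commands = []
    assemblies = []
    bams_by_condition = {}
    for condition, replicates in samples.items():
        for name, fastq in replicates:
            # TopHat maps reads; -G supplies known gene models.
            commands.append(["tophat", "-G", annotation_gtf,
                             "-o", f"{name}_thout", genome_index, fastq])
            bam = f"{name}_thout/accepted_hits.bam"
            # Cufflinks assembles transcripts from the mapped reads.
            commands.append(["cufflinks", "-o", f"{name}_clout", bam])
            assemblies.append(f"{name}_clout/transcripts.gtf")
            bams_by_condition.setdefault(condition, []).append(bam)
    # Cuffmerge merges the per-sample assemblies listed in assemblies.txt.
    commands.append(["cuffmerge", "-g", annotation_gtf,
                     "-s", genome_fasta, "assemblies.txt"])
    # Cuffdiff compares conditions; replicates are comma-separated.
    labels = ",".join(bams_by_condition)
    groups = [",".join(bams) for bams in bams_by_condition.values()]
    commands.append(["cuffdiff", "-o", "diff_out", "-L", labels,
                     "merged_asm/merged.gtf", *groups])
    return commands, assemblies


# Hypothetical two-condition experiment with one replicate each.
samples = {"C1": [("C1_R1", "C1_R1.fq")], "C2": [("C2_R1", "C2_R1.fq")]}
commands, assemblies = tuxedo_commands(
    samples, "genome", "genome.fa", "genes.gtf")
for cmd in commands:
    print(" ".join(cmd))
```

CummeRbund is an R package that reads the Cuffdiff output directory, so it is not part of this command list.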



Why merge data across treatments?


Differential transcript abundance mechanisms

Transcriptome analysis without a reference


First step is assembly

Transcriptome assembly pipelines:

Velvet/Oases: Oases is a post-assembly processor for Velvet

Trans-ABySS (BCGSC): based on the ABySS parallel assembler

Rnnotator: based on Velvet

Trinity (Broad Institute): a set of three programs

Common strategy: assembly at multiple k-values, then merging of the resulting contigs, followed by refinement

Once an assembly is available, continue with analysis as before
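
The "merge contigs from multiple k-values" idea can be illustrated with a toy sketch. This is not what Oases, Trans-ABySS or the other pipelines actually do internally; it only assumes the simplest possible merge rule, dropping any contig that is contained (on either strand) in a longer contig that has already been kept. The sequences and the function names are made up.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_contigs(assemblies):
    """Pool contigs from several assemblies and drop redundant ones.

    A contig is dropped when it, or its reverse complement, is a
    substring of a longer contig that has already been kept.
    """
    # Deduplicate, then visit the longest contigs first.
    pooled = sorted({c for contigs in assemblies for c in contigs},
                    key=lambda s: (-len(s), s))
    kept = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


# Made-up contigs from assemblies at three k-values.
assemblies_by_k = {
    21: ["ATGGCGTACGTTAGC", "GGGTTTAAACCC"],
    25: ["ATGGCGTACGTTAGCTTGA", "TTTAAACC"],
    31: ["TCAAGCTAACGTACGCCAT"],  # reverse complement of the k=25 contig
}
merged = merge_contigs(assemblies_by_k.values())
```

Real pipelines also join overlapping contigs and then refine the result, which this containment filter does not attempt.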


After transcriptome assembly…



Some amount of analysis of differential splicing versus differential promoter activity is possible, but conclusions may be less robust in the absence of a reference

The fraction of the total number of genes that can be discovered by RNA-seq depends on the diversity of tissue types and developmental stages analyzed, as well as the depth of sequencing


330 million SOLiD reads from a human cell line detect only about 67% of all annotated transcripts in the human genome.

Labaj et al., “Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling”, Bioinformatics 27:i383-91, 2011
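
One simple way to see why sequencing depth limits discovery is to treat the number of reads from a transcript as Poisson-distributed. This is an illustrative assumption, not the analysis in Labaj et al.; the read fraction used below is a made-up value and the function name is hypothetical.

```python
import math


def detection_probability(fraction_of_reads, total_reads, min_reads=1):
    """Probability of seeing at least min_reads reads from a transcript.

    Assumes the read count is Poisson with mean
    fraction_of_reads * total_reads (a simplifying assumption).
    """
    lam = fraction_of_reads * total_reads
    # P(X < min_reads) is the sum of the first min_reads Poisson terms.
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below


# A rare transcript contributing one read in 100 million (made-up value).
rare = 1e-8
p_shallow = detection_probability(rare, 10_000_000)
p_deep = detection_probability(rare, 330_000_000)
print(f"10M reads: {p_shallow:.3f}, 330M reads: {p_deep:.3f}")
```

Under this model a transcript that is absent from the sampled tissue has a read fraction of zero and is never detected at any depth, which is why tissue and developmental-stage diversity matter as well.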

Transcriptome analysis with RSEM

RNA-Seq with Expectation Maximization

Li & Dewey, BMC Bioinformatics 12:323, 2011


(a) Allows estimation of transcript abundance without a reference genome, based on alignments to assembled transcripts, although the transcripts can be taken from a reference genome sequence if one is available.

(b) Uses the Bowtie aligner by default, but considers reads that map to multiple locations in the reference transcript collection.

(c) For each sample, files of estimated transcript and isoform abundance are produced, along with SAM files of alignments.

(d) The files of transcript and isoform abundance can be used to evaluate differential expression using tools from R and Bioconductor.
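
The way expectation maximization handles reads that map to multiple transcripts, as in point (b), can be sketched with a toy example. This is not RSEM's actual model, which also accounts for factors such as transcript length and read quality; it assumes only that each read comes with the set of transcripts it aligns to. The read data and the function name `em_abundance` are made up.

```python
def em_abundance(read_compat, transcripts, n_iter=200):
    """Estimate the proportion of reads coming from each transcript.

    read_compat -- list of sets; each set holds the transcripts that
                   one read aligns to (several for multi-mapping reads)
    """
    # Start from equal proportions.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: share each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            norm = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / norm
        # M-step: new proportions are the normalised expected counts.
        total = sum(expected.values())
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Made-up data: 6 reads unique to A, 2 unique to B, 4 mapping to both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_abundance(reads, ["A", "B"])
```

The ambiguous reads end up shared in the same ratio as the uniquely mapping evidence, rather than being discarded or split evenly.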