Next Gen Sequencing Platforms


24 Nov 2013


Data from the GS FLX and the SOLiD
Next Gen Sequencing Platforms

Jo Stanton

UO High Throughput DNA
Sequencing Unit

Anatomy and Structural Biology

University of Otago

http://sequence.otago.ac.nz

Libraries and Emulsion PCR



Remove Oil (emulsion breaking)


Enrich for beads with amplified DNA


Ready to load!

GS FLX


Whole genomes


de novo


Transcriptomes


Amplicons


Methylation


Metagenomes


Amplicons


deep sequencing of target
regions


Etc…

GS FLX or 454


Read lengths range from 200 bp to 500 bp


Two chemistries: Standard and Titanium


Run Times <8 hours


100Mb to 500Mb per run



Raw data is processed
from a series of
individual images.


Each well’s data is
extracted, quantified,
and normalized.



Read data is converted
into flowgrams.

[Figure: example flowgram. Labels: flow order T, A, C, G; signal levels 1-mer, 2-mer, 3-mer, 4-mer; bases T, A, G, C, T.]

Signal strength is determined by homopolymer length
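As a rough illustration of how a flowgram maps to bases, the sketch below rounds each normalized flow signal to the nearest whole number and treats that number as the homopolymer length for the nucleotide flowed at that step. The function name, the rounding rule and the example values are assumptions made for this sketch; the instrument software applies further corrections before calling bases.

```python
FLOW_ORDER = "TACG"  # nucleotide flow cycle shown on the slide


def call_bases(flow_signals, flow_order=FLOW_ORDER):
    """Translate normalized flow signals into a base sequence.

    Each signal is rounded to the nearest whole number, which is taken as
    the homopolymer length for the nucleotide flowed at that position.
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        homopolymer_length = max(0, int(signal + 0.5))
        bases.append(nucleotide * homopolymer_length)
    return "".join(bases)


# Hypothetical flowgram values, not instrument data
example_flowgram = [1.02, 0.05, 0.98, 1.10, 0.03, 2.07, 0.01, 0.96]
print(call_bases(example_flowgram))  # prints TCGAAG
```

A signal near 2 in the A flow yields "AA", which is why errors in signal height show up as homopolymer length errors.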

Instrument Output


dataRunParams.parse


General info about sequencing run


imageLog.parse


Log of all the images


#####.pif


Raw image files that show photon count for
each pixel.


Image processing


Background subtraction at pixel level


Locate sequencing wells on PTP


Extract raw signal from image
corresponding to each well


Write flow signals to ‘raw wells’ file
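The steps above can be sketched in a few lines. This is only a toy model: the median-based background estimate, the function names, the well map and the image values are all assumptions for illustration, and locating wells on the PTP is replaced by a hand-written list of pixel coordinates.

```python
from statistics import median


def subtract_background(image):
    """Subtract a per-image background estimate from every pixel.

    The background is estimated as the median pixel value (an assumption
    made for this sketch); results are clipped at zero.
    """
    background = median(pixel for row in image for pixel in row)
    return [[max(0, pixel - background) for pixel in row] for row in image]


def extract_well_signal(image, well_pixels):
    """Sum the corrected photon counts over the pixels assigned to one well."""
    return sum(image[r][c] for r, c in well_pixels)


# Hypothetical 4x4 image of photon counts with one bright well
image = [
    [10, 10, 10, 10],
    [10, 60, 70, 10],
    [10, 65, 75, 10],
    [10, 10, 10, 10],
]
# Hypothetical well map: well name -> (row, column) pixels it covers
wells = {"well_0": [(1, 1), (1, 2), (2, 1), (2, 2)]}

corrected = subtract_background(image)
raw_wells = {name: extract_well_signal(corrected, pixels)
             for name, pixels in wells.items()}
print(raw_wells)
```

Repeating this for the image of every flow gives one signal per well per flow, which is the kind of table the 'raw wells' file holds.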

Image processing output


analysisParms.parse


Parameter file


revisedRegions.parse


Shows region of sequencing on PTP, corresponds to
loading gasket


region.wells


Contains background-subtracted signals for all nt and
PPi flows for all active wells on all images.

Signal Processing

Uses region.wells output to:


Correct interwell crosstalk


Correct known “out-of-phase” errors


Correct signal droop and residual background
subtraction


Filter (pass/fail) processed reads on signal quality


Trim read ends for low quality and primer
sequence.


Generate ‘flowgrams’

Filters


Keypass


Checks that the sequence has the Key


Dots


Excludes a well when no nt is incorporated across 4 flows


Mixed


Removes wells containing more than one sequence


Signal Intensity


Identifies wells with low signal readings and trims the reads


Primer


Scans end of each read and trims off the GS FLX
adaptor sequence.
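Three of the filters above are simple enough to sketch. The key sequence "TCAG", the 0.5 signal threshold, the reading of "4 flows" as four flows in a row, and the adaptor string are all assumptions made for this example, not values taken from the instrument software.

```python
KEY = "TCAG"  # assumed library key sequence


def passes_keypass(read, key=KEY):
    """Keypass: the read must start with the key sequence."""
    return read.startswith(key)


def passes_dots(flow_signals, max_empty_flows=4, threshold=0.5):
    """Dots: fail a well once `max_empty_flows` flows in a row fall below
    `threshold`, i.e. show no nucleotide incorporation."""
    empty_run = 0
    for signal in flow_signals:
        empty_run = empty_run + 1 if signal < threshold else 0
        if empty_run >= max_empty_flows:
            return False
    return True


def trim_adaptor(read, adaptor):
    """Primer: cut the read at the first occurrence of the adaptor, if any."""
    position = read.find(adaptor)
    return read if position == -1 else read[:position]


# Hypothetical reads, signals and adaptor
print(passes_keypass("TCAGGATTACA"))
print(passes_dots([1.0, 0.1, 0.0, 0.1, 0.0, 1.1]))
print(trim_adaptor("TCAGGATTACAACGTACGT", "ACGTACGT"))
```

The outcome of each check is what a file such as region.trimInfo would record per well: pass or fail, plus trim positions.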

Quality Score


Phred equivalent



$Q = -10 \log_{10}\left[1 - P(\geq n \mid s)\right]$

where s is the observed signal and n is a possible
homopolymer length.
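A short worked example of the formula, reading P(≥n|s) as the probability that the homopolymer is at least n bases long given the observed signal s. The function name and the example probabilities are assumptions for illustration.

```python
import math


def quality_score(p_at_least_n):
    """Q = -10 * log10(1 - P(>=n | s)), as on the slide."""
    error = 1.0 - p_at_least_n
    if error <= 0.0:
        raise ValueError("probability must be below 1 for a finite score")
    return -10.0 * math.log10(error)


# Hypothetical probabilities: each extra "9" adds about 10 to the score
for p in (0.9, 0.99, 0.999):
    print(p, round(quality_score(p), 1))
```

So a base called with 99% confidence gets roughly Q20 and one with 99.9% confidence roughly Q30, the same scale used for Phred scores.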

Signal Processing Output


region.wells


Contains processed flow-by-flow data for all wells


region.trimInfo


Parallel to region.wells. Lists the pass/fail and trim
results of filters


454RuntimeMetrics.txt


Metrics for run performance (troubleshooting)


454QualityFilterMetrics.txt


Contains quality filter metrics

Signal Processing Output (cont.)


454BaseCallerMetrics.txt


Number of high-quality reads, average read length, region
info


454RuntimeMetricsAll.txt


Run metrics including Control Beads


region.key.454Reads.fna


FASTA file of basecalled reads


region.key.454Reads.qual


Quality scores for the reads in region.key.454Reads.fna


uaccnoRegion.sff


Standard Flowgram Format files of sequence trace data
for high quality reads.
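The .fna and .qual files are parallel: each read has a FASTA record in one and a record of space-separated quality scores under the same header in the other. A minimal sketch of reading the pair together is given below; the function names, read names and values are made up for the example, and the header layout is assumed to be a name followed by optional extra fields.

```python
def parse_records(lines):
    """Yield (read_name, body_lines) pairs from FASTA-style text."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # Keep only the first word of the header as the read name
            name, body = (line[1:].split() or [""])[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def read_fna(lines):
    """Map read name -> base sequence."""
    return {name: "".join(body) for name, body in parse_records(lines)}


def read_qual(lines):
    """Map read name -> list of integer quality scores."""
    return {name: [int(q) for q in " ".join(body).split()]
            for name, body in parse_records(lines)}


# Hypothetical file contents
fna_text = """>read_1 length=8
TCAGGATT
>read_2 length=6
TCAGAC
"""
qual_text = """>read_1 length=8
40 40 38 37 35 30 28 20
>read_2 length=6
40 39 36 33 25 18
"""

reads = read_fna(fna_text.splitlines())
quals = read_qual(qual_text.splitlines())
for name, sequence in reads.items():
    # One quality score per base is expected
    print(name, sequence, len(sequence) == len(quals[name]), min(quals[name]))
```

Both readers accept any iterable of lines, so an open file handle can be passed in place of the in-memory strings used here.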



Signal processing output files are used by
researchers for subsequent analysis


Total size 50Gb


Download from the secure server via a user account
(scp protocols)

SOLiD 3


Whole genome sequencing


Transcriptomes


Resequencing


Methylation studies


Etc…

SOLiD 3


Read length 50bp or 2 x 50bp


Run times from 3.5 days to 10 days


Up to 15Gb per run


http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx

Why worry about flow space and
colour space?



Accuracy


details are lost in translation!
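One way to see what is lost in translation: in two-base (colour space) encoding each colour describes a pair of adjacent bases, so decoding to bases chains every call to the one before it. The sketch below is a simplified model, assuming the usual four-colour scheme expressed as an XOR of base codes (A=0, C=1, G=2, T=3) and starting from the first base of the sequence itself; the function names and sequence are made up for the example.

```python
BASES = "ACGT"  # position in this string is the base code: A=0, C=1, G=2, T=3


def encode_colour_space(sequence):
    """Return (first base, colours) for a base sequence in two-base encoding."""
    codes = [BASES.index(base) for base in sequence]
    colours = [a ^ b for a, b in zip(codes, codes[1:])]
    return sequence[0], colours


def decode_colour_space(first_base, colours):
    """Rebuild the base sequence by walking the colours from the first base."""
    code = BASES.index(first_base)
    decoded = [first_base]
    for colour in colours:
        code ^= colour
        decoded.append(BASES[code])
    return "".join(decoded)


first_base, colours = encode_colour_space("ATGGCA")  # hypothetical sequence
print(colours)
print(decode_colour_space(first_base, colours))

# Introduce a single colour miscall at the second position
miscalled = list(colours)
miscalled[1] = 2
print(decode_colour_space(first_base, miscalled))
```

With the single miscall, every base after the error decodes differently from the original, even though only one measurement was wrong. Analysing in colour space keeps that error confined to one position, and the same argument applies to keeping 454 data in flow space rather than only as called bases.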