15 Nov 2013

Next generation sequencing: an overview

A I Bhat
Indian Institute of Spices Research, Calicut

DNA sequencing

Chain termination method (Sanger et al., 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains, which terminate at specific nucleotide positions.

Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.

Chain termination method

Dye-terminator sequencing

Labels the chain-terminating ddNTPs, which permits sequencing in a single reaction.

Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye (ddA green, ddT red, ddG yellow and ddC blue), and each dye fluoresces at a different emission wavelength.

The fragment terminating at each base position is detected on the gel by a powerful laser beam.

Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay of automated sequencing.
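
As a rough illustration of how a dye-terminator trace becomes a sequence, the sketch below maps detected dye colours to bases using the colour assignment given on this slide. The `trace` list is a made-up example, and real base callers work on continuous peak data rather than clean colour labels.

```python
# Dye assignment as given on the slide (colour -> terminating base).
DYE_TO_BASE = {"green": "A", "red": "T", "yellow": "G", "blue": "C"}

def call_bases(colours_by_fragment_length):
    """Translate dye colours, ordered from shortest to longest fragment, into bases."""
    return "".join(DYE_TO_BASE[c] for c in colours_by_fragment_length)

# Hypothetical colours seen as fragments pass the detector, shortest first.
trace = ["green", "blue", "yellow", "red", "red", "green"]
print(call_bases(trace))
```

Because fragments are separated by length, reading the colours from the shortest fragment to the longest gives the sequence of the newly synthesized strand.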

Capillary electrophoresis

The Sanger method can sequence only 1000-1200 bp in one reaction.

Figure: view of a dye-terminator read

Genome sequencing

1970s: bacteriophage

1995: the bacterium Haemophilus influenzae

Followed by several other bacteria and archaea

1992: the first eukaryotic chromosome sequence (yeast)

Many eukaryotes, including several plants and their pathogens

2006: human genome

Until 2006, all genome sequencing used Sanger chemistry



Shotgun sequencing

Human Genome Project

Genomic DNA is enzymatically or mechanically broken down

Cloned into sequencing vectors

Sequenced individually

Numerous fragments of DNA sequenced

BIRTH OF GENOME INFORMATICS AND NEXT GENERATION SEQUENCING

Whole genome sequencing

The core philosophy of massively parallel sequencing used in next-generation sequencing (NGS) is adapted from shotgun sequencing.

NGS: breaking the entire genome into small pieces

Ligating the DNA to designated adapters

DNA synthesis (sequencing-by-synthesis)

Massively parallel sequencing

Coverage: the number of short reads that overlap each other within a specific genomic region

Sufficient coverage is critical for accurate assembly of the genomic sequence.

To ensure the correct identification of genetic variants, short-read coverage of at least 30× is recommended in whole-genome scans (Zhang et al., 2011. J Genet Genomics, 38:95-109).
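
The 30× recommendation can be turned into a read budget with simple arithmetic: average coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a genome of 3 billion bp and 100 bp reads purely for illustration; neither figure comes from the slide, and the function names are hypothetical.

```python
import math

def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest whole number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

genome = 3_000_000_000   # assumed genome size (roughly human-sized)
read_len = 100           # assumed read length in bp
n = reads_needed(30, read_len, genome)
print(n, mean_coverage(n, read_len, genome))
```

This is an average only. Reads are not spread evenly, so some regions will fall below the mean depth.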


Next generation sequencing

Enables a genome to be sequenced within hours to days.

The 454 FLX Pyrosequencer from Roche Applied Sciences was the first next-generation sequencer to become commercially available, in 2004.

The Solexa 1G Genetic Analyzer from Illumina was commercialized in 2006.

The SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems was launched in 2007.

Next-next generation or third generation sequencing

Single molecule sequencing



Platforms on NGS technologies

Technology | Amplification | Read length | Throughput | Sequence by synthesis

Currently available
Roche/GS-FLX Titanium | Emulsion PCR | 400-600 bp | 500 Mbp/run | Pyrosequencing
Illumina/HiSeq 2000, HiScan | Bridge PCR (Cluster PCR) | 2 x 100 bp | 200 Gbp/run | Reversible terminators
ABI/SOLiD 5500xl | Emulsion PCR | 50-100 bp | >100 Gbp/run | Sequencing-by-ligation (octamers)
Polonator/G.007 | Emulsion PCR | 26 bp | 8-10 Gbp/run | Sequencing-by-ligation (monomers)
Helicos/Heliscope | No | 35 (25-55) bp | 21-37 Gbp/run | True single-molecule sequencing (tSMS)

In development
Pacific BioSciences/RS | No | 1000 bp | N/A | Single-molecule real time (SMRT)
Visigen Biotechnologies | No | >100 Kbp | N/A |
U.S. Genomics | No | N/A | N/A | Single-molecule mapping
Genovoxx | No | N/A | N/A | Single-molecule sequencing by synthesis
Oxford Nanopore Technologies | No | 35 bp | N/A | Nanopores/exonuclease-coupled
NABsys | No | N/A | N/A | Nanopores
Electronic BioSciences | No | N/A | N/A | Nanopores
BioNanomatrix/nanoAnalyzer | No | 400 Kbp | N/A | Nanochannel arrays
GE Global Research | No | N/A | N/A | Closed complex/nanoparticle
IBM | No | N/A | N/A | Nanopores
LingVitae | No | N/A | N/A | Nanopores
Complete Genomics | No | 70 bp | N/A | DNA nanoball arrays
base4innovation | No | N/A | N/A | Nanostructure arrays
CrackerBio | No | N/A | N/A | Nanowells
Reveo | No | N/A | N/A | Nano-knife edge
Intelligent BioSystems | No | N/A | N/A | Electronics
LightSpeed Genomics | | N/A | | Direct-read sequencing by EM

Next (2nd) generation platforms

Instrument | Vendor | Read length x reads per run | Applications
3130XL | Applied Biosystems | 700 bp x 96 | Specific targets (PCR products, clones)
GS-FLX Titanium | Roche | 400 bp x 1 million | De novo sequencing
Genome Analyser | Illumina | 100 bp x 2 billion | Re-sequencing (de novo sequencing also possible)
SOLiD | Applied Biosystems | 50 bp x 2.4 billion | Re-sequencing (de novo sequencing also possible)
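
The "read length x number of reads" figures in this comparison multiply out to a per-run yield, which makes the gap between capillary and next-generation instruments easier to see. The sketch below only does that multiplication; the dictionary and function names are hypothetical.

```python
# (read length in bp, reads per run) as listed in the comparison above.
PLATFORMS = {
    "3130XL": (700, 96),
    "GS-FLX Titanium": (400, 1_000_000),
    "Genome Analyser": (100, 2_000_000_000),
    "SOLiD": (50, 2_400_000_000),
}

def yield_bp(read_length_bp, reads_per_run):
    """Total bases per run, ignoring quality filtering and failed reads."""
    return read_length_bp * reads_per_run

for name, (length, reads) in PLATFORMS.items():
    print(f"{name}: {yield_bp(length, reads):,} bp per run")
```

The results (about 67 kbp, 400 Mbp, 200 Gbp and 120 Gbp) are nominal upper bounds, since the calculation ignores quality filtering and failed reads.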




Roche GS-FLX 454 Genome Sequencer

Longest short reads (600 bp) among all the NGS platforms

Generates ~400-600 Mb of sequence reads per run

Suited to de novo assembly of microbes in metagenomics

Reported raw base accuracy is very good (over 99%)

Chemistry

Nucleotide incorporation releases pyrophosphate (PPi).

ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´ phosphosulfate.

This ATP fuels the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP.

The light produced in the luciferase-catalyzed reaction is detected by a camera and analyzed by software.

Unincorporated nucleotides and ATP are degraded by apyrase, and the reaction can restart with another nucleotide.
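
The chemistry above implies that the light from one nucleotide flow scales with the number of bases incorporated in that flow. The sketch below simulates an idealized, noise-free version of this. The flow order `TACG`, the function name and the example sequence are assumptions for illustration, and the sequence is written as the strand being synthesized.

```python
def simulate_flowgram(template, flow_order="TACG", n_flows=8):
    """Return (nucleotide, light units) per flow for the strand being synthesised.

    `template` is written here as the sequence of the growing strand (5'->3'),
    so a flow emits light when its nucleotide matches the next base(s).
    """
    pos = 0
    flows = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        run = 0
        # A homopolymer is extended within a single flow, releasing one PPi per base.
        while pos < len(template) and template[pos] == nuc:
            run += 1
            pos += 1
        flows.append((nuc, run))
    return flows

print(simulate_flowgram("TTACGG"))
```

In this idealized model a run of two identical bases gives twice the signal of a single base. Homopolymer length therefore has to be read from signal intensity.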

Illumina/Solexa Genome Analyzer

Superior data quality and suitable read lengths have made it the system of choice for many genome sequencing projects.

The majority of published NGS papers used the Genome Analyzer.

Uses a proprietary reversible terminator-based method that enables detection of single bases as they are incorporated into growing DNA strands.

A fluorescently labeled terminator is imaged as each dNTP is added and is then cleaved to allow incorporation of the next base.

Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias.

The end result is true base-by-base sequencing that enables the industry’s most accurate data for a broad range of applications.
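
Since exactly one terminator is incorporated and imaged per cycle, base calling can be pictured as picking the brightest of four dye channels in each cycle. The sketch below is a simplified illustration with invented intensity values and hypothetical function names. It is not Illumina's actual base-calling pipeline, which also corrects for channel cross-talk and phasing.

```python
BASES = "ACGT"

def call_cycle(intensities):
    """Pick the base whose dye channel is brightest in one imaging cycle."""
    a_c_g_t = dict(zip(BASES, intensities))
    return max(a_c_g_t, key=a_c_g_t.get)

def call_read(cycles):
    """One base per cycle: the terminator blocks further extension until it is cleaved."""
    return "".join(call_cycle(c) for c in cycles)

# Hypothetical channel intensities (A, C, G, T) for four cycles of one cluster.
cycles = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.9, 0.1),
    (0.1, 0.2, 0.1, 0.7),
]
print(call_read(cycles))
```

The two consecutive G calls come from two separate cycles. This is why reversible-terminator chemistry does not have to infer homopolymer length from signal strength.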

Solexa-based Whole Genome Sequencing

Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

ABI SOLiD platform

The latest model is the 5500xl SOLiD system (previously known as SOLiD4hq).

It can generate over 2.4 billion reads per run with a raw base accuracy of 99.94%.

The SOLiD4 platform probably provides the best data quality as a result of its sequencing-by-ligation approach, but the DNA library preparation procedures prior to sequencing can be tedious and time consuming.

Preferred for re-sequencing rather than de novo sequencing.

(Zhang et al., 2011)

Next generation sequencing using Roche 454

Sample preparation

Nucleic acid isolation

Double-stranded cDNA synthesis

Rapid library preparation

Fragmentation (nebulization/shearing) into smaller fragments of 400 to 1000 bp

Addition of adapters

Removal of small fragments (<300 bp)

Library quality assessment
















Emulsion-based clonal amplification (emPCR)

Preparation of reagents and of emulsion oil

Preparation of amplification mix (addition of additive, amplification mix, primers, enzyme mix and PPiase)

DNA library capture (one molecule of DNA per bead and one bead per aqueous microreactor, insulated from other beads by the surrounding oil)

Emulsification (shaking the captured library to form a water-in-oil mixture)

Amplification (emulsified beads are clonally amplified)

Bead recovery and enrichment



Sequencing

Clonally amplified fragments are loaded onto a PicoTiterPlate device for sequencing (the diameter of the plate wells allows only one bead per well).

After addition of sequencing enzymes, the fluidics subsystem of the sequencing instrument flows individual nucleotides in a fixed order across all wells.

Addition of one (or more) nucleotide(s) complementary to the template strand results in a chemiluminescent signal recorded by the CCD camera within the instrument.

During the nucleotide flow, thousands of beads, each carrying millions of copies of a single-stranded DNA molecule, are sequenced in parallel.

Each 10-h sequencing run will typically produce over 1,000,000 flowgrams (one flowgram per bead).








Base calling (to check quality of each read)

Trimming primer sequence

Production of contigs
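
The first two processing steps can be pictured on a single flowgram: each flow signal is rounded to a whole number of incorporated bases, and a known leading primer is then cut off. This is a minimal sketch under stated assumptions. The flow order `TACG`, the signal values and the primer `TCG` are invented for illustration, and real base callers also model noise and assign quality scores.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Round each flow signal to a homopolymer length and emit that many bases."""
    bases = []
    for i, signal in enumerate(signals):
        n = int(signal + 0.5)           # nearest whole number of incorporations
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)

def trim_prefix(read, primer):
    """Remove a leading primer/key sequence if the read starts with it."""
    return read[len(primer):] if read.startswith(primer) else read

# Hypothetical normalised signals for eight flows (T, A, C, G, T, A, C, G).
signals = [1.02, 0.05, 0.97, 1.10, 0.03, 2.08, 0.96, 0.00]
read = call_flowgram(signals)
print(read, trim_prefix(read, "TCG"))
```

The trimmed reads are what an assembler would then merge into contigs. That step is not sketched here.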

NGS platforms under development (3rd generation sequencers)

Aim: single DNA molecule sequencing (without amplification)

Provides accurate data with long reads

i) Fluorescence-based single molecule sequencing (Pacific Biosciences; US Genomics)

ii) Nanotechnologies for single molecule sequencing (Oxford Nanopore Technologies, NABsys, BioNanomatrix, Electronic BioSciences, CrackerBio)

iii) Electronic detection for single molecule sequencing (Reveo, Intelligent BioSystems)

iv) Electron microscopy for single molecule sequencing (LightSpeed Genomics, Halcyon Molecular, ZS Genetics)


Single Molecule Sequencing (Helicos Biosciences, USA)

Billions of single molecules of sample DNA are captured on an application-specific proprietary surface and serve as templates for sequencing-by-synthesis.

Polymerase and one fluorescently labeled nucleotide (C, G, A or T) are added. The polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates.

After a wash step, which removes all free nucleotides, the incorporated nucleotides are imaged and their positions recorded.

The fluorescent group is removed in a highly efficient cleavage process, leaving behind the incorporated nucleotide.

The process continues through each of the other three bases.

Multiple four-base cycles result in complementary strands greater than 25 bases in length synthesized on billions of templates, providing a greater than 25-base read from each of those individual templates.
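
The add, wash, image, cleave loop can be sketched for a single template as follows. The sketch assumes that at most one labelled nucleotide is incorporated per addition and that nucleotides are offered in the order C, G, A, T. Both are simplifying assumptions, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def tsms_read(template, four_base_cycles, order="CGAT"):
    """Grow the complementary strand by adding one labelled nucleotide type at a time."""
    # `template` is listed in the order the polymerase reads it.
    expected = [COMPLEMENT[b] for b in template]
    read = []
    for _ in range(four_base_cycles):
        for nuc in order:
            # Assumption: at most one labelled nucleotide is incorporated per addition.
            if len(read) < len(expected) and expected[len(read)] == nuc:
                read.append(nuc)   # imaged, then the dye is cleaved
    return "".join(read)

print(tsms_read("GCTTA", four_base_cycles=3))
```

Read length here depends on how many four-base cycles are run and on the template sequence. This fits the slide's description of reads longer than 25 bases arising only after multiple cycles.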


Ion Sequencing (Rothberg et al., Life Technologies: Nature, July 2011)

Non-optical method of DNA sequencing of genomes

Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis, using all-natural nucleotides, on a massively parallel semiconductor-sensing device or ion chip.

The ion chip contains 1.2 million ion-sensitive wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions.

Performance of the system was shown by sequencing three bacterial genomes and one human genome.

World’s smallest solid state pH meter

DNA is fragmented, ligated to adapters, and clonally amplified onto beads.

Sequencing primers and DNA polymerase are then bound to the templates and pipetted into the chip’s loading port. Individual beads are loaded into individual sensor wells by spinning. The well depth allows only a single bead to occupy a well.

All four nucleotides are provided in a stepwise fashion during an automated run. When the nucleotide in the flow is complementary to the template base directly downstream of the sequencing primer, the nucleotide is incorporated into the nascent strand by the bound polymerase.

This increases the length of the sequencing primer by one base (or more, if a homopolymer stretch is directly downstream of the primer) and results in the hydrolysis of the incoming nucleotide triphosphate, which causes the net liberation of a single proton for each nucleotide incorporated during that flow.

The release of protons produces a shift in the pH of the surrounding solution proportional to the number of nucleotides incorporated in the flow (0.02 pH units per single base incorporation). This is detected by the sensor on the bottom of each well, converted to a voltage and digitized by off-chip electronics. Signal generation and detection occur over 4 s.

After the flow of each nucleotide, a wash is used to ensure nucleotides do not remain in the well.
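
Because the pH shift is proportional to the number of bases added in a flow, a flow can in principle be decoded by dividing the measured shift by 0.02. The sketch below does this for a few invented measurements. It works directly on pH values for clarity, whereas the instrument reports a digitized voltage, and the function names are hypothetical.

```python
PH_SHIFT_PER_BASE = 0.02  # pH units per single-base incorporation, from the text above

def bases_incorporated(delta_ph):
    """Estimate how many nucleotides were added in one flow from the pH shift."""
    return int(abs(delta_ph) / PH_SHIFT_PER_BASE + 0.5)

def call_flows(flows):
    """`flows` is a list of (nucleotide, measured pH shift) pairs."""
    return "".join(nuc * bases_incorporated(shift) for nuc, shift in flows)

# Hypothetical measurements for five consecutive flows.
flows = [("T", 0.021), ("A", 0.001), ("C", 0.039), ("G", 0.019), ("T", 0.062)]
print(call_flows(flows))
```

As with pyrosequencing, long homopolymers are the hard case. The difference between, say, 7 and 8 incorporations is small relative to measurement noise.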


Sequencing methods

Mining NGS data to obtain meaningful information

An average NGS experiment generates gigabytes to terabytes of raw data.

The functions of existing bioinformatics tools fit into several general categories: (1) alignment of reads to a reference sequence, (2) de novo assembly, (3) reference-based assembly, (4) genetic variation detection (such as SNVs and indels), (5) genome annotation and (6) utilities for data analysis.

The most important step in NGS data analysis is the successful assembly or alignment of reads to a reference genome.

After successful alignment and assembly, the next step is to interpret the large number of putative novel genetic variants (or mutations) present by chance.

Recognition of functional variants is at the center of NGS data analysis and bioinformatics.
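
To make categories (1) and (4) concrete, the toy sketch below aligns short reads to a reference without gaps and reports positions where the majority base in the pile-up differs from the reference. The sequences, function names and `min_depth` threshold are invented for illustration. Production tools use indexed alignment and statistical variant models rather than this brute-force approach.

```python
from collections import Counter

def best_position(read, reference):
    """Ungapped alignment: offset with the fewest mismatches (first one on ties)."""
    offsets = range(len(reference) - len(read) + 1)
    return min(offsets, key=lambda o: sum(a != b for a, b in zip(read, reference[o:])))

def call_snvs(reads, reference, min_depth=2):
    """Pile up aligned reads and report positions where the majority base differs."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        start = best_position(read, reference)
        for i, base in enumerate(read):
            pileup[start + i][base] += 1
    snvs = []
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) >= min_depth:
            base, _ = counts.most_common(1)[0]
            if base != reference[pos]:
                snvs.append((pos, reference[pos], base))
    return snvs

reference = "ACGTACGGTCAT"
reads = ["CGTAAGG", "TAAGGTC", "AAGGTCAT"]  # hypothetical reads carrying C->A at position 5
print(call_snvs(reads, reference))
```

Positions are 0-based. The depth threshold is one reason coverage matters: a variant seen in a single read cannot be told apart from a sequencing error.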
Thanks