NGS cancer genomics data processing and analysis
Somak Roy, MD
Clinical fellow
Division of Urologic Surgical Pathology
University of Pittsburgh Medical Center
Outline
http://www.genome.gov/sequencingcosts/
Background
NGS:
Epigenetic profiling
Gene fusion detection
Mutation profiling
Copy number variations
Structural variants
Application in Cancer Genomics
NGS in clinical domain
Sequence the sample DNA to obtain a string of characters (ATGC).
Compare the obtained sequence to the reference sequence (expected normal).
Any deviation from the reference (single or multiple bases) is a variant.
Theme of DNA Sequencing
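The compare-to-reference idea can be sketched in a few lines. This is only an illustration: it assumes the sample and reference are already aligned and of equal length, which real reads are not, and the function name `find_variants` is hypothetical.

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Assumes equal-length, pre-aligned sequences.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy sequences invented for illustration only.
variants = find_variants("ATGCATGC", "ATGGATGC")
print(variants)
```

Insertions and deletions shift the coordinates, so a real pipeline needs an alignment step before any comparison of this kind.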
Evolution of Sequencing
Robison. Nat Biotechnol 2011;29:805-7
Rothberg et al. Nature. 2011;475:348-52
Semiconductor Sequencing
Arch Pathol Lab Med. 2012;136:000–000; doi: 10.5858/arpa.2012-0107-RA
Optics-based Sequencing
NGS data processing elements
Signal processing
Raw signal
Normalization
Base calling
Quality control
FASTQ / unaligned BAM
Signal Processing - Non-optical
CCGCTAGCTATATTATATCGGGGCCCTAGATAGCTAGATATAGAGGGCTCTAGAGATCGATAGCTAGAG
CTAGCTCG
CCGGGGCCCTAGAGTATATTATAGGCTCTAGAGATCGATAGCTGATAGCTAGATATAAGAG
ATATAAGCGCGGCTCGATCGGTCTAGAGAGGCCCTAGAGTATATTA
CTAGCT
TAAGCTGATAGCTAGAG
Signal Processing - Optical
CCGCTAGCTATATTATATCGGGGCCCTAGATAGCTAGATATAGAGGGCTCTAGAGATCGATAGCTAGAG
CTAGCTCG
CCGGGGCCCTAGAGTATATTATAGGCTCTAGAGATCGATAGCTGATAGCTAGATATAAGAG
ATATAAGCGCGGCTCGATCGGTCTAGAGAGGCCCTAGAGTATATTA
CTAGCT
TAAGCTGATAGCTAGAG
Signal Processing - Homopolymer
Take a Peek into FASTQ!
Header: Sequence ID, additional info
Sequence
Optional header
Quality score
Phred score / Phred-like score: $Q = -10 \log_{10} p$
Per-base call score
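The four-line record layout listed above (header, sequence, optional header, quality) can be read with a short parser. This is a minimal sketch for a single, well-formed record; the names `FastqRecord` and `parse_fastq_record` and the example read are made up for illustration.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    header: str           # sequence ID plus any additional info (without the leading '@')
    sequence: str         # base calls
    optional_header: str  # text after '+', often empty
    quality: str          # one ASCII character per base


def parse_fastq_record(text):
    """Parse exactly one four-line FASTQ record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) != 4:
        raise ValueError("expected exactly four lines")
    header, sequence, optional_header, quality = lines
    if not header.startswith("@") or not optional_header.startswith("+"):
        raise ValueError("line 1 must start with '@' and line 3 with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return FastqRecord(header[1:], sequence, optional_header[1:], quality)


# Hypothetical record, not taken from real data.
example = "@read_1 sample=demo\nCTAGCTCG\n+\nIIIIIIII\n"
record = parse_fastq_record(example)
print(record.header, record.sequence, record.quality)
```

For real files, an established parser such as Biopython's `SeqIO` is a safer choice than hand-rolled code.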
Take a Peek into FASTQ!
$Q = -10 \log_{10} p$
$30 = -10 \log_{10}(10^{-3})$
$20 = -10 \log_{10}(10^{-2})$
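The two worked values (Q30 and Q20) follow directly from the Phred formula. A small sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def phred_from_error_prob(p):
    """Q = -10 * log10(p), where p is the probability that the base call is wrong."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def error_prob_from_phred(q):
    """Inverse of the formula above: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for p in (1e-3, 1e-2):
    print(f"p = {p:g} gives Q = {phred_from_error_prob(p):.0f}")
```

In words, Q30 corresponds to an expected 1 error in 1,000 base calls, and Q20 to 1 in 100.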
What are these characters?
ASCII format
$67 = -10 \log_{10}(p)$, $p = ?$
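The quality characters are ASCII-encoded scores. The sketch below decodes them under the assumption of a +33 offset (the Sanger / Phred+33 convention); the slide does not state which offset applies, and older Illumina files used +64. The function name `decode_quality` is hypothetical.

```python
def decode_quality(quality_string, offset=33):
    """Convert each quality character to a Phred score: Q = ord(char) - offset."""
    scores = [ord(char) - offset for char in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("character below the assumed offset; wrong encoding?")
    return scores


# 'I', '5' and '+' have ASCII codes 73, 53 and 43.
for char, q in zip("I5+", decode_quality("I5+")):
    p = 10 ** (-q / 10)  # error probability implied by the score
    print(f"{char!r}: ASCII {ord(char)}, Q = {q}, p = {p:.4g}")
```

If the 67 above is read as an ASCII code (the character `C`), it decodes to Q = 34 under the +33 assumption and to Q = 3 under +64, so the offset has to be known before interpreting a file.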
Read Pile-up
Depth of coverage: 5x, 3x
Variant (G) frequency = 3/5 (60%)
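The 3/5 (60%) figure is the count of reads carrying the variant base divided by the depth of coverage at that position. A minimal sketch, where the pile-up column is a plain list of base calls and the non-variant base `A` is invented for illustration:

```python
from collections import Counter


def variant_frequency(pileup_bases, variant_base):
    """Return (variant_count, depth, frequency) for one pile-up column."""
    depth = len(pileup_bases)
    if depth == 0:
        raise ValueError("no reads cover this position")
    variant_count = Counter(pileup_bases)[variant_base]
    return variant_count, depth, variant_count / depth


# Five reads cover the position; three of them carry G.
count, depth, frequency = variant_frequency(["G", "G", "G", "A", "A"], "G")
print(f"{count}/{depth} reads = {frequency:.0%}")
```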
Mapping, Assembly & Variant Identification
Mapping / Alignment
Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
Mapping / Alignment
Pabinger et al. Briefings in Bioinformatics. Jan 2013
Mapping / Alignment - QC
Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
Variant identification
Variant identification
Pabinger et al. Briefings in Bioinformatics. Jan 2013
Variant identification - QC
Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
Annotation
1522648G>A
44512584G>C
8124526T>A
2544856_2544860 AATGC
..
55124785GA>CC
Public / custom databases
Nomenclature
Biological implication (Gene, transcript and protein level)
Genotype-phenotype correlations
Prognostic implication
Predictive implication
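Before a variant can be looked up in a database, its raw notation has to be split into position, reference allele, and alternate allele. The sketch below handles only the substitution-style strings shown above (such as `1522648G>A`); the regular expression and the name `parse_substitution` are assumptions, not a standard nomenclature parser.

```python
import re

# position, reference base(s), '>', alternate base(s)
SUBSTITUTION = re.compile(r"^(\d+)([ACGT]+)>([ACGT]+)$")


def parse_substitution(text):
    """Return (position, ref, alt), or None if the string is not a simple substitution."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    position, ref, alt = match.groups()
    return int(position), ref, alt


for raw in ("1522648G>A", "44512584G>C", "8124526T>A", "55124785GA>CC"):
    print(raw, "->", parse_substitution(raw))
```

Range-style entries such as `2544856_2544860 AATGC` return `None` here and would need their own pattern.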
Report level | Definition / explanation
1 | Missense SNVs, insertions, and deletions occurring in the coding, untranslated, and splice site regions, which are of known clinical significance.
2 | Missense SNVs, insertions, and deletions occurring in the coding, untranslated, and splice site regions, which are of uncertain clinical significance.
3 | Variant(s) not classifiable into levels 1, 2, 4, and 5 automatically by the program. A pathologist is required to review and reclassify them appropriately based on available evidence.
4 | Synonymous SNVs, SNV(s) with established evidence of benign biological and/or clinical outcome, and intronic variants (except splice site and deep intronic variants with known significance).
5 | Sequence variants that have been proven to be platform-specific sequencing errors based on repeated experimental data, Sanger sequencing confirmation, and thorough manual review of read pile-ups (e.g., using IGV).
Variant Classification
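The five report levels can be expressed as an ordered set of rules. The sketch below is one possible reading of the definitions, not the program the slide refers to: the dictionary keys (`platform_error`, `consequence`, `region`, `significance`) and their values are hypothetical, and the order in which the rules are checked is an assumption.

```python
def report_level(variant):
    """Assign a report level (1-5) to a variant described by a dict of hypothetical keys."""
    # Level 5: confirmed platform-specific sequencing error.
    if variant.get("platform_error", False):
        return 5

    consequence = variant.get("consequence")    # e.g. "missense", "insertion", "synonymous"
    region = variant.get("region")              # e.g. "coding", "utr", "splice_site", "intronic"
    significance = variant.get("significance")  # "known", "uncertain", "benign" or None

    # Levels 1 and 2: missense SNVs and indels in coding, UTR or splice site regions.
    reportable = consequence in ("missense", "insertion", "deletion")
    in_scope = region in ("coding", "utr", "splice_site")
    if reportable and in_scope:
        if significance == "known":
            return 1
        if significance == "uncertain":
            return 2

    # Level 4: synonymous, established benign, or intronic without known significance.
    deep_intronic_known = region == "deep_intronic" and significance == "known"
    intronic = region in ("intronic", "deep_intronic") and not deep_intronic_known
    if consequence == "synonymous" or significance == "benign" or intronic:
        return 4

    # Level 3: everything else goes to the pathologist for manual review.
    return 3


example = {"consequence": "missense", "region": "coding", "significance": "known"}
print(report_level(example))
```

Level 3 is the deliberate fallback, so anything the rules do not recognize is sent for manual review rather than silently filtered out.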
Visualization
Result Reporting, Management and Sharing
Targeted sequencing
Whole exome sequencing
Whole genome sequencing