Bioinformatics pipeline for detection of immunogenic cancer ...

2 Oct 2013

Towards Personalized Genomics-Guided Cancer Immunotherapy

Ion Mandoiu

Department of Computer Science & Engineering


Joint work with Sahar Al Seesi (CSE), Jorge Duitama (CIAT), Fei Duan, Tatiana Blanchard, Pramod K. Srivastava (UCHC)

Mandoiu Lab

Main Research Areas:


Bioinformatics Algorithms


Development of Computational Methods for Next-Gen Sequencing Data Analysis

Ongoing Projects

RNA-Seq Analysis (NSF, NIH, Life Technologies)
- Novel transcript reconstruction
- Allele-specific isoform expression

Viral quasispecies reconstruction (USDA)
- IBV evolution and vaccine optimization

Genome assembly and scaffolding, LD-based genotype calling, local ancestry inference, metabolomics, …

More info & software at http://dna.engr.uconn.edu

- Computational deconvolution of heterogeneous samples

Genomics-Guided Cancer Immunotherapy

CTCAATTGATGAAATTGTTCTGAAACT
GCAGAGATAGCTAAAGGATACCGGGTT
CCGGTATCCTTTAGCTATCTCTGCCTC
CTGACACCATCTGTGTGGGCTACCATG
AGGCAAGCTCATGGCCAAATCATGAGA

mRNA Sequencing

SYFPEITHI

ISETDLSLL

CALRRNESL



Tumor Specific Epitopes

Peptide Synthesis

Immune System Stimulation

Mouse Image Source: http://www.clker.com/clipart-simple-cartoon-mouse-2.html

Tumor Remission

T-Cell Response

Bioinformatics Pipeline

Read Alignment: Hybrid alignment strategy (HardMerge)

Data Cleaning: Clipping alignments & removal of PCR artifacts

Variant Detection: Bayesian model based on quality scores (SNVQ)

Haplotyping: Max-Cut algorithm (RefHap)

Epitope Prediction: PWM and ANN algorithms (NetMHC)

Hybrid Read Alignment Approach

http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png

[Figure labels: mRNA reads; Transcript Library Mapping; Genome Mapping; Transcript mapped reads; Genome mapped reads; Read Merging; Mapped reads]

More efficient than spliced alignment onto the genome

Stringent filtering: reads with multiple alignments are discarded
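
The merging step can be sketched roughly as follows. This is only an illustration, not the HardMerge implementation: the slide states only that reads with multiple alignments are discarded. The function name `hard_merge`, the input layout (read id mapped to a set of genomic loci, with transcript alignments already converted to genome coordinates), and the rule that a read is kept only when all of its alignments agree on one locus are assumptions.

```python
def hard_merge(genome_hits, transcript_hits):
    """Merge two alignment sets, keeping only unambiguous reads.

    Both arguments are dicts: read_id -> set of loci, where a locus is any
    hashable value such as (chromosome, position). Transcript alignments are
    assumed to be already expressed in genome coordinates (hypothetical layout).
    """
    merged = {}
    for read_id in genome_hits.keys() | transcript_hits.keys():
        g = genome_hits.get(read_id, set())
        t = transcript_hits.get(read_id, set())
        # Stringent filter: multiple alignments in either mapping -> discard.
        if len(g) > 1 or len(t) > 1:
            continue
        loci = g | t
        # Assumed rule: keep the read only if both mappings agree on one locus
        # (or only one of the two mappings placed the read at all).
        if len(loci) == 1:
            merged[read_id] = next(iter(loci))
    return merged


genome = {
    "r1": {("chr1", 100)},
    "r2": {("chr1", 200), ("chr5", 50)},  # multi-mapped on the genome
    "r3": {("chr2", 10)},
    "r5": {("chr4", 1)},
}
transcripts = {
    "r1": {("chr1", 100)},  # agrees with the genome alignment
    "r3": {("chr2", 99)},   # disagrees with the genome alignment
    "r4": {("chr3", 7)},    # found only through the transcript library
}
kept = hard_merge(genome, transcripts)
```

Under these assumptions `kept` retains r1, r4 and r5, and drops r2 (multiple alignments) and r3 (discordant placements).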

Clipping Alignments

[Figure: percentage of reads with mismatches (axis 0 to 2.5) by read position (axis 1 to 73), for Lane 1, Lane 2 and Lane 3]

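
A per-position mismatch profile like the one plotted here could be computed along these lines. This is a sketch under assumptions: alignments are taken to be ungapped (read, reference) string pairs, and both the trimming rule and the threshold passed to `clip_bounds` are hypothetical, since the slide does not say how clipping points are chosen.

```python
def mismatch_percent_by_position(alignments):
    """Percentage of reads with a mismatch at each read position.

    `alignments` is an iterable of (read_seq, ref_seq) pairs of equal length,
    assumed to be ungapped alignments (an assumption for this sketch).
    """
    totals, mismatches = [], []
    for read, ref in alignments:
        for i, (a, b) in enumerate(zip(read, ref)):
            if i == len(totals):
                totals.append(0)
                mismatches.append(0)
            totals[i] += 1
            mismatches[i] += a != b
    return [100.0 * m / t for m, t in zip(mismatches, totals)]


def clip_bounds(profile, max_percent):
    """Trim positions from both read ends while the mismatch rate is too high.

    Returns (start, end) as a 0-based half-open interval of positions to keep.
    """
    start, end = 0, len(profile)
    while start < end and profile[start] > max_percent:
        start += 1
    while end > start and profile[end - 1] > max_percent:
        end -= 1
    return start, end


toy = [("ACGT", "ACGT"), ("TCGA", "ACGT"), ("ACGA", "ACGT"), ("ACGT", "ACGT")]
profile = mismatch_percent_by_position(toy)
bounds = clip_bounds(profile, max_percent=30.0)
```
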
Removal of PCR Artifacts

Variant Detection and Genotyping

Reference genome:

AACGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC

Reads R_i covering locus i:

AACGCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAG
  CGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCCGGA
   GCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAGGGA
      GCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCT
          GCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAA
               CTTCTGTCGGCCAGCCGGCAGGAATCTGGAAACAAT
                      CGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACA
                         CCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
                         CAAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
                            GCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC

Variant Detection and Genotyping


Pick genotype with the largest posterior probability
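
A minimal sketch of a quality-score-based genotype call is given below. It uses a common textbook error model, not necessarily the exact SNVQ formulation, which is not shown in this text. The assumptions are: a base with Phred quality Q is wrong with probability 10^(-Q/10), errors are spread evenly over the three other bases, each read samples either allele of a diploid genotype with probability 1/2, and the prior over genotypes is uniform. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_posteriors(pileup):
    """Posterior over the 10 diploid genotypes at one locus.

    `pileup` is a list of (base, phred_quality) pairs for the reads covering
    the locus. Uniform genotype prior (an assumption of this sketch).
    """
    log_post = {}
    for genotype in combinations_with_replacement("ACGT", 2):
        lp = 0.0
        for base, q in pileup:
            err = 10 ** (-q / 10)
            # Each allele of the genotype is sampled with probability 1/2.
            like = sum((1 - err) if base == allele else err / 3
                       for allele in genotype) / 2
            lp += math.log(like)
        log_post[genotype] = lp
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / norm for g, v in log_post.items()}


def call_genotype(pileup):
    """Pick the genotype with the largest posterior probability."""
    post = genotype_posteriors(pileup)
    return max(post, key=post.get)


# Hypothetical pileup: 7 reads show C and 3 show G, all with quality 30.
example = [("C", 30)] * 7 + [("G", 30)] * 3
call = call_genotype(example)
```

With the hypothetical pileup above the heterozygous genotype ('C', 'G') has the largest posterior, because three high-quality G bases are very unlikely under a homozygous C/C genotype.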

Accuracy as Function of Coverage

Haplotyping


Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome

Novel mutations are present on only one chromosome copy

For epitope prediction we need to know if nearby mutations appear in phase

Locus   Mutation    Alleles
1       SNV         C,T
2       Deletion    C,-
3       SNV         A,G
4       Insertion   -,GC

Locus   Mutation    Haplotype 1   Haplotype 2
1       SNV         T             C
2       Deletion    C             -
3       SNV         A             G
4       Insertion   -             GC

RefHap Algorithm

Reduce the problem to Max-Cut

Solve Max-Cut

Build haplotypes according to the cut

Locus   1   2   3   4   5
f1      *   0   1   1   0
f2      1   1   0   *   1
f3      1   *   *   0   *
f4      *   0   0   *   1

[Figure: graph on fragments f1, f2, f3, f4 with edge weights 3, 1, 1, -1, -1]

h1   00110
h2   11001
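
The three steps can be illustrated on the four fragments above. This is a sketch, not the RefHap code: the edge weight (mismatches minus matches over shared loci) is an assumption, chosen because it reproduces the five edge weights in the figure, and exhaustive search stands in for the Max-Cut heuristic a real tool would need on larger inputs. All function names are hypothetical.

```python
from itertools import product

FRAGMENTS = ["*0110", "110*1", "1**0*", "*00*1"]  # f1..f4, '*' = locus not covered


def edge_weight(f, g):
    """Mismatches minus matches over loci covered by both fragments (assumed score)."""
    w = 0
    for a, b in zip(f, g):
        if a != "*" and b != "*":
            w += 1 if a != b else -1
    return w


def max_cut_bruteforce(frags):
    """Try every bipartition (fragment 0 fixed on side 0); only feasible for tiny inputs."""
    n = len(frags)
    best_side, best_value = None, None
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        value = sum(edge_weight(frags[i], frags[j])
                    for i in range(n) for j in range(i + 1, n)
                    if side[i] != side[j])
        if best_value is None or value > best_value:
            best_side, best_value = side, value
    return best_side, best_value


def build_haplotypes(frags, side):
    """Majority vote per locus; fragments on side 1 vote with flipped alleles."""
    h1 = []
    for locus in range(len(frags[0])):
        votes = 0
        for f, s in zip(frags, side):
            if f[locus] == "*":
                continue
            allele = int(f[locus]) ^ s  # flip the allele for side-1 fragments
            votes += 1 if allele == 1 else -1
        h1.append("1" if votes > 0 else "0")  # ties default to allele 0
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2


side, value = max_cut_bruteforce(FRAGMENTS)
print(build_haplotypes(FRAGMENTS, side))  # ('00110', '11001')
```

Here the best cut separates f1 from f2, f3 and f4 with total weight 5, and the resulting haplotypes match h1 and h2 above.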


Epitope Prediction

J.W. Yewdell, E. Reits, and J. Neefjes. Making sense of mass destruction: quantitating MHC class I antigen presentation. Nature Reviews Immunology, 3:952-961, 2003

C. Lundegaard et al. MHC Class I Epitope Binding Prediction Trained on Small Data Sets. In Lecture Notes in Computer Science, 3239:217-225, 2004

Profile weight matrix (PWM) model
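
A PWM scores a peptide by summing one position-specific weight per residue. The sketch below shows that mechanic only: the weights in `TOY_PWM` are invented for illustration and are not taken from SYFPEITHI or NetMHC, and the helper `candidate_peptides`, which enumerates the 9-mers overlapping a mutated residue, is a hypothetical addition.

```python
def pwm_score(peptide, pwm, default=0.0):
    """Sum of position-specific weights; `pwm` is a list of dicts, one per position."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match the number of PWM positions")
    return sum(column.get(aa, default) for aa, column in zip(peptide, pwm))


def candidate_peptides(protein, mut_pos, k=9):
    """All k-mers of `protein` that contain the 0-based position `mut_pos`."""
    first = max(0, mut_pos - k + 1)
    last = min(mut_pos, len(protein) - k)
    return [protein[s:s + k] for s in range(first, last + 1)]


# Invented weights: only positions 2 and 9 of a 9-mer carry any signal here.
TOY_PWM = [{} for _ in range(9)]
TOY_PWM[1] = {"Y": 10.0, "F": 6.0}
TOY_PWM[8] = {"I": 10.0, "L": 10.0, "V": 6.0}

scores = {p: pwm_score(p, TOY_PWM) for p in ("SYFPEITHI", "ISETDLSLL", "CALRRNESL")}
```

With these toy weights the score depends only on the residues at positions 2 and 9, so the numbers say nothing about real binding affinity.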

[Figure: NetMHC score vs. SYFPEITHI score for H2-Kd, R² = 0.5333]

Results on Tumor Data

Tumor Type                                      MethA    CMS5
RNA-Seq Reads (Million)                         105.8    23.4
Genome Mapped                                   75%      54%
Transcriptome Mapped                            83%      59%
HardMerge Mapped                                50%      36%
HardMerge Mapped Bases (Gb)                     3.18     0.41
High-Quality Heterozygous SNVs in CCDS Exons    1,504    232
Non-synonymous                                  1,160    182
Missense                                        1,096    178
Nonsense                                        63       4
No-stop                                         1        -
NetMHC Predicted Epitopes                       836      142

[Figure: mean tumor diameter (mm) by days after tumor challenge; AUC (mm²); P < 0.0001]


Tumor rejection potential of the identified epitopes is currently being evaluated experimentally in the Srivastava lab

Ongoing Work


Sequencing of spontaneous tumors (TRAMP mice)

Detecting other forms of variation: indels, gene fusions, novel transcripts

Incorporating predictions of TAP transport efficiency and proteasomal cleavage in epitope prediction

Integration of mass-spectrometry data

Monitoring immune response by TCR sequencing