MAlign, a tool for aligning mRNA sequences to genomic sequences

Biotechnology, 2 Oct 2013

Structural Modelling and Bioinformatics in Drug Discovery and Infectious Disease

Shoba Ranganathan

Professor and Chair, Bioinformatics
Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute
Macquarie University, Sydney, Australia
(shoba.ranganathan@mq.edu.au)

Adjunct Professor
Dept. of Biochemistry, Yong Loo Lin School of Medicine
National University of Singapore, Singapore
(shoba@bic.nus.edu.sg)

Visiting scientist @ Institute for Infocomm Research (I2R), Singapore




Bioinformatics is …

Bioinformatics is the study of living systems through computation.

Data in Bioinformatics (in the main) and their management and analysis

Networks, pathways and systems
Sequences
Genomes
Transcriptomes
Databases, ontologies
Data & text mining
Evolution and phylogenetics
Maths/Stats
Algorithms
Physics/Chemistry
Genetics and populations
Structures

What is Immunoinformatics?

Using bioinformatics to address problems in immunology

Applying bioinformatics to accelerate immune system research has the potential to deliver vaccines and immunotherapeutics.

Computational systems biology of the immune response

Immunoinformatics draws on Immunology, Computer Science and Biology

Summary

Introduction
Structural immunoinformatics database development
Data analysis
Computational models
Applications

Networks, pathways and systems; Genetics and populations; -omics; Basic immunology; Clinical immunology

The immune system

Composed of many interdependent cell types, organs and tissues
2nd most complex system in the human body
Figure by Dr. Standley LJ

Two types:
1. Innate Immune System
2. Adaptive Immune System



It is a numbers game…

>10^13 MHC class I haplotypes (IMGT/HLA)
10^7-10^15 T cell receptors (Arstila et al., 1999)
>10^9 combinatorial antibodies (Jerne, 1993)
10^12 B cell clonotypes (Jerne, 1993)
10^11 linear epitopes composed of nine amino acids
>>10^11 conformational epitopes
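The estimate of about 10^11 linear nonamer epitopes is the number of sequences of length nine over the 20 standard amino acids. A quick check, assuming every residue can occupy every position independently (no processing or MHC-binding constraints are applied):

```python
# Count of distinct linear nonamers over the 20 standard amino acids,
# assuming each of the 9 positions can independently hold any residue.
N_AMINO_ACIDS = 20
NONAMER_LENGTH = 9

n_nonamers = N_AMINO_ACIDS ** NONAMER_LENGTH
print(f"{n_nonamers:,} nonamers (~{n_nonamers:.1e})")
```

This gives 5.12 × 10^11 sequences, i.e. of the order of 10^11.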

Adaptive immune system

Major Histocompatibility Complex (MHC class I and II)
Human Leukocyte Antigen (HLA) in humans

Peptide binding to MHC → recognition of the pMHC complex by the TCR → activation of T cells

MHC class I → CD8+ cytotoxic T cells
MHC class II → CD4+ helper T cells

www.immunologygrid.org

1. Epitope   2. MHC   3. T cell receptor

How to generate a T cell-mediated immune response

1. Degradation of antigen
2. Peptide binding to MHC
3. Recognition of the peptide-MHC complex by T cells

Yewdell et al., Annu. Rev. Immunol. (1999)

20% processed → 0.5% bind MHC → 50% CTL response
0.05% chance of immunogenicity

Antigen processing pathway: peptides, MHC, T cells

Physico-chemical properties affect MHC-peptide binding
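The 0.05% figure is the product of the three stage-wise fractions. A minimal check, assuming the stages act as independent, sequential filters on a candidate peptide:

```python
# Stage-wise fractions quoted above (Yewdell et al., 1999), treated here as
# independent, sequential filters on a candidate peptide.
p_processed = 0.20   # peptide is generated by antigen processing
p_binds_mhc = 0.005  # processed peptide binds MHC
p_ctl = 0.50         # MHC-bound peptide elicits a CTL response

p_immunogenic = p_processed * p_binds_mhc * p_ctl
print(f"{p_immunogenic:.2%}")  # 0.05%
```

In other words, roughly one candidate peptide in 2000 is expected to be immunogenic.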




Suggest candidate epitopes by in silico screening of entire proteins and even proteomes with specificity at:
the allele level
the supertype level
disease-implicated alleles alone.

Minimize the number of wet-lab experiments
Cut down the lead time involved in epitope discovery and vaccine design

Computational models can help identify T cell epitopes

1. Sequence-based approach
Pattern recognition techniques: binding motifs, matrices, artificial neural networks (ANN), hidden Markov models (HMM), support vector machines (SVM)
Main limitations:
Require large amounts of data for training
Cannot be applied to data with limited sequence conservation

2. Structure-based approach
Rigid backbone modeling techniques
Flexible docking techniques
Main advantage: large training datasets unnecessary

Predicting MHC-binding peptides

Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8: 96-108

Our aim: structure-based prediction of MHC-binding peptides

Great potential to:
generate biologically meaningful data for analysis
predict candidate peptides for alleles that have not been widely studied, where sequence-based approaches fail or are not attempted
predict binding affinity of peptides
predict non-contiguous epitopes

Structure determination through experimental methods is both expensive and time-consuming
Structure-based prediction has not been extensively studied due to high computational costs and development complexity

Why structure?


Protein Threading [Altuvia et al. 1995; Schueler-Furman et al. 2000]
Homology Modeling [Michielin et al. 2000]
Rigid/Flexible Docking [Rosenfeld et al. 1993; Sezerman et al. 1996; Rognan et al. 1999; Desmet et al. 2000; Michielin et al. 2003]

Existing Structure-based Prediction Techniques


Hypothesis for epitope selection

Peptides bound to MHC alleles are similar to substrates bound to enzymes
"Lock-and-key" mechanism for peptide selection:
Shape
Size
Electrostatic characteristics


Introduction
Structural immunoinformatics database development
Data analysis
Computational models
Applications

Sequences; Databases, ontologies; Basic immunology; Genetics and populations; Structures


MPID: MHC-Peptide Interaction Database
Govindarajan et al. (2003) Bioinformatics, 19: 309-310

Relational database (RDB) of 82 curated pMHC complexes (class I: 64; class II: 18)

Peptide/MHC interaction characteristics:
Interface area
Intermolecular hydrogen bonds
Gap volume
Gap index = gap volume / interface area
Interacting residues
Peptide length

MPID-T: MHC-Peptide-T Cell Receptor Interaction Database
Tong et al. (2006) Applied Bioinformatics, 5: 111-114

187 curated pMHC complexes
16 with TCR
Human: 110, Murine: 74 and Rat: 3
Alleles: 40

(interface area, H bonds, gap volume and gap index)

101 new entries
187 entries (Human: 110; Murine: 74; Rat: 3)
134 non-redundant entries (class I: 100; class II: 34)
121 class I and 41 class II entries
26 HLA alleles (class I: 18; class II: 8)
14 rodent alleles (class I: 8; class II: 6)
16 TCR/peptide/MHC complexes

Distribution of MHC by allele

Peptide/MHC binding motifs

Conserved peptide properties in solution structures
Classified according to:
Alleles
Peptide length

Polar / Amide / Basic / Acidic / Hydrophobic

1. There were only 36 crystal structures of unique MHC alleles (as of 2006) vs. 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive and time-consuming
3. Homology model building for alleles with no structural data!

How to obtain structures of experimentally unsolved alleles?



Introduction
Structural immunoinformatics database development
Data analysis of pMHC class I complexes
Computational models
Applications

Data & text mining; Maths/Stats; Structures


Class I peptides
N-terminal residues: 0.02-0.29 Å
C-terminal residues: 0.00-0.25 Å

Class II binding registers
Only 9 residues fit in the binding groove
N-terminal residues: 0.01-0.22 Å
C-terminal residues: 0.02-0.27 Å

Conservation of nonamer peptide backbone conformation


Introduction
Structural immunoinformatics database development
Data analysis
Computational models
Applications

Maths/Stats; Structures; Sequences; Physics/Chemistry

1. Finding the best-fit conformation (docking) of peptides within the MHC binding groove
2. Screening potential binders from the background

Two-step approach to predict MHC-binding peptides


Docking is a computationally intensive procedure
Large number of possible peptide conformations:
3 global translational degrees of freedom
3 global rotational degrees of freedom
1 conformational degree of freedom for each rotatable bond

[Figure: peptide backbone atoms (N, Cα, C, O) and side chain R in an x, y, z coordinate frame]

>10^10 possible conformations for a 10-residue peptide
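A back-of-the-envelope count shows how quickly the search space grows. This sketch assumes two backbone torsions (φ, ψ) per residue and a coarse, hypothetical discretization of three states per torsion; the choice of ten extra side-chain torsions is likewise only an illustration:

```python
def degrees_of_freedom(n_rotatable_bonds: int) -> int:
    """3 translational + 3 rotational rigid-body DOF, plus 1 per rotatable bond."""
    return 3 + 3 + n_rotatable_bonds


def conformation_count(n_residues: int, n_side_chain_torsions: int = 0,
                       states_per_torsion: int = 3) -> int:
    """Crude count of internal conformations for a peptide.

    Assumes two backbone torsions (phi, psi) per residue plus any side-chain
    torsions, each discretized into `states_per_torsion` values. Rigid-body
    placement in the groove is not included.
    """
    n_torsions = 2 * n_residues + n_side_chain_torsions
    return states_per_torsion ** n_torsions


print(degrees_of_freedom(20))               # 26
print(f"{conformation_count(10):.2e}")      # 3.49e+09 (backbone only)
print(f"{conformation_count(10, 10):.2e}")  # 2.06e+14 (+10 side-chain torsions)
```

Even at this coarse resolution the backbone alone gives about 3.5 × 10^9 states, and a handful of side-chain torsions pushes the count well beyond 10^10, before the six rigid-body degrees of freedom are sampled at all.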

Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13: 2523-2532

1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking)
2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains)

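Benchmarking with existing techniques

Author | Technique | Peptide | RMSD(a) (Å) | RMSD(b) (Å)
Rognan et al. | Simulated Annealing | TLTSCNTSV | 1.04 | 0.46
Rognan et al. | Simulated Annealing | FLPSDFFPSV | 1.59 | 1.10
Rognan et al. | Simulated Annealing | GILGFVFTL | 0.46 | 0.32
Rognan et al. | Simulated Annealing | ILKEPVHGV | 0.87 | 0.87
Rognan et al. | Simulated Annealing | LLFGYPVYV | 0.78 | 0.33
Desmet et al. | Combinatorial Buildup Algorithm | RGYVYQGL | 0.56 | 0.32
Rosenfeld et al. | Multiple Copy Algorithm | FAPGNYPAL | 2.70 | 0.40
Rosenfeld et al. | Multiple Copy Algorithm | GILGFVFTL | 1.40 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | LLFGYPVYV | 1.40 | 0.33
Sezerman et al. | Combinatorial Buildup Algorithm | ILKEPVHGV | 1.30 | 0.87
Sezerman et al. | Combinatorial Buildup Algorithm | GILGFVFTL | 1.60 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | TLTSCNTSV | 2.20 | 0.46

(a) RMSD of peptide backbone obtained from the respective authors.
(b) RMSD of peptide backbone obtained in our work from redocking bound complexes and single template, respectively.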
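The RMSD values above compare predicted and reference peptide backbone coordinates. A minimal sketch of the calculation, assuming both structures are already in the same frame (as when redocking into a fixed MHC groove, so no further superposition is applied) and that atoms are matched one-to-one in the same order; the coordinates are toy values:

```python
import numpy as np


def backbone_rmsd(predicted: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between matched atom coordinates, in the input units (e.g. Å).

    Both arrays have shape (n_atoms, 3), with atoms in the same order
    (e.g. N, CA, C, O for each residue). No superposition is performed.
    """
    if predicted.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    diff = predicted - reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy example: every atom displaced by (0.3, 0.0, 0.4), i.e. 0.5 Å.
ref = np.zeros((4, 3))
pred = ref + np.array([0.3, 0.0, 0.4])
print(round(backbone_rmsd(pred, ref), 2))  # 0.5
```

If the predicted peptide were placed in a different coordinate frame, an optimal superposition (e.g. the Kabsch algorithm) would be needed before this step.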


Quantitative separation of binders from non-binders: empirical free energy scoring function




$$\Delta G_{bind} = \alpha\,\Delta G_{H} + \beta\,\Delta G_{S} + \gamma\,\Delta G_{EL} + C$$

ΔG_bind = binding free energy
ΔG_H = hydrophobic term
ΔG_S = decrease in side-chain entropy
ΔG_EL = electrostatic term
C = entropy change in the system due to external factors

α, β, γ optimized by least-squares multivariate regression against experimental binding affinities (IC50) of MHC-binding peptides in the training dataset (Rognan et al., 1999)
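Once the three energy terms are computed for each training complex, fitting α, β, γ and C is an ordinary linear least-squares problem. A sketch with NumPy, assuming the experimental target is converted to a free energy via ΔG ≈ RT ln(IC50), i.e. treating IC50 as an approximate dissociation constant; the term values and IC50s are hypothetical, and any weighting used in the original work is not reproduced:

```python
import numpy as np

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.15         # temperature, K


def ic50_to_dg(ic50_nm: np.ndarray) -> np.ndarray:
    """Approximate binding free energy in kcal/mol, treating IC50 as a Kd."""
    return R_KCAL * T * np.log(ic50_nm * 1e-9)  # convert nM to M


def fit_scoring_function(terms: np.ndarray, dg_exp: np.ndarray) -> np.ndarray:
    """Least-squares fit of dG_bind = alpha*dG_H + beta*dG_S + gamma*dG_EL + C.

    `terms` has shape (n_complexes, 3) with columns dG_H, dG_S, dG_EL.
    Returns the array [alpha, beta, gamma, C].
    """
    design = np.column_stack([terms, np.ones(len(terms))])  # constant column for C
    coeffs, *_ = np.linalg.lstsq(design, dg_exp, rcond=None)
    return coeffs


# Hypothetical computed terms for five training complexes (not real data).
terms = np.array([[-10.0, 2.0, -3.0],
                  [-8.0, 1.2, -2.0],
                  [-12.0, 2.5, -4.5],
                  [-6.0, 1.0, -1.0],
                  [-9.0, 2.2, -2.5]])
ic50_nm = np.array([50.0, 400.0, 10.0, 3000.0, 200.0])  # hypothetical IC50s

alpha, beta, gamma, c = fit_scoring_function(terms, ic50_to_dg(ic50_nm))
print(f"alpha={alpha:.3f} beta={beta:.3f} gamma={gamma:.3f} C={c:.3f}")
```

A real training set (such as the binding and non-binding conformations used here) would have many more rows than fitted parameters, which keeps the regression well determined.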

Quantitative separation of binders from non-binders: empirical free energy scoring function

Test case: MHC Class II DQ8

DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
Celiac disease
Insulin-dependent diabetes mellitus (IDDM)
IDDM-associated periodontal disease
Autoimmune polyendocrine syndrome type II

Data used
Structure: 1JK8 - DQ3.2β-insulin B9-23 complex
Dataset I: 127 peptides with experimentally determined IC50 values [70 high-affinity (IC50 < 500 nM), 13 medium-affinity (500 nM < IC50 < 1500 nM) and 23 low-affinity (1500 nM < IC50 < 5000 nM) binders, and 21 non-binders (IC50 > 5000 nM)] derived from biochemical studies; 87 with known binding registers.
Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, 7 of which elicit DQ3.2β-restricted T-cell proliferation.



Training
56 binding conformations with known registers
30 non-binding conformations from 3 non-binders

Testing
Test set 1: 68 peptides from biochemical studies (16 strong; 13 medium; 21 weak; 18 non-binders)
Test set 2: 12 peptides from functional studies (7 elicit T-cell proliferation)

Scoring: Training & testing datasets

E285B 112-126 peptide: YQTIEENIKIFEEDA

Core sequence | Binding energy
YQTIEENIK | -23.12
QTIEENIKI | -21.34
TIEENIKIF | -25.32
IEENIKIFE | -29.53
EENIKIFEE | -32.27
ENIKIFEED | -21.72
NIKIFEEDA | -22.95

Screening class II binding register: a sliding window approach
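The sliding window can be sketched as follows: enumerate every 9-residue core of the 15-mer, score each one, and take the most favourable (lowest) energy as the predicted register. Here the scores are simply the binding energies listed for E285B 112-126 rather than the output of docking, and choosing the minimum is an assumption about how the register is selected:

```python
def nonamer_cores(peptide: str, core_len: int = 9) -> list[str]:
    """All contiguous cores of length `core_len` (candidate binding registers)."""
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


peptide = "YQTIEENIKIFEEDA"  # E285B 112-126
# Binding energies per core, as listed for this peptide.
energies = {
    "YQTIEENIK": -23.12, "QTIEENIKI": -21.34, "TIEENIKIF": -25.32,
    "IEENIKIFE": -29.53, "EENIKIFEE": -32.27, "ENIKIFEED": -21.72,
    "NIKIFEEDA": -22.95,
}

cores = nonamer_cores(peptide)
assert cores == list(energies)  # 7 windows, in the same order as the table
best = min(cores, key=energies.__getitem__)
print(best, energies[best])  # EENIKIFEE -32.27
```

A 15-mer always yields 15 - 9 + 1 = 7 candidate registers; the residues outside the chosen core are the flanking residues.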


Docking: 4-step protocol used
A. Anchoring root fragments (probes) to reduce search space
B. Loop modeling
C. Refinement of binding register
D. Extension of flanking residues for MHC Class II


Sensitivity (SE) = fraction of binders correctly predicted = TP/AP = TP/(TP+FN)
Specificity (SP) = fraction of non-binders correctly predicted = TN/AN = TN/(TN+FP)
(AP = all actual binders; AN = all actual non-binders)

Accuracy estimates

Area under the ROC (receiver operating characteristic) curve:
>90% excellent
>80% good
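The accuracy measures above can be sketched in a few lines. The AUC here is computed with the Mann-Whitney pairwise formulation (equivalent to the area under the ROC curve), assuming that a lower predicted binding energy means a stronger predicted binder; the counts and energies are hypothetical:

```python
def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN): fraction of actual binders predicted as binders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP): fraction of actual non-binders predicted as non-binders."""
    return tn / (tn + fp)


def roc_auc(binder_scores: list[float], nonbinder_scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Scores are predicted binding energies, so lower (more negative) means a
    stronger predicted binder. The AUC is the probability that a random binder
    scores lower than a random non-binder; ties count as one half.
    """
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical counts: 50 binders and 18 non-binders (the sizes of Test Set 1),
# with a made-up split into correct and incorrect predictions.
print(sensitivity(tp=40, fn=10), specificity(tn=16, fp=2))

# Hypothetical binding energies for 4 binders and 3 non-binders.
print(roc_auc([-32.3, -29.5, -25.3, -21.0], [-22.0, -18.5, -15.2]))
```

In the toy AUC example, 11 of the 12 binder/non-binder pairs are ranked correctly, giving about 0.92, which would fall in the "excellent" band.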

Results for Training set
High SE (good for most predictions)
Very few FPs, but also fewer predictions

Group | LMH | MH | H
AROC | 0.88 | 0.93 | 0.93

Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set I

Classification of binding peptides
High-affinity binders (H): IC50 < 500 nM
Medium-affinity binders (M): 500 nM < IC50 < 1500 nM
Low-affinity binders (L): 1500 nM < IC50 < 5000 nM
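The affinity classes can be assigned directly from IC50 values. A minimal sketch using the 500, 1500 and 5000 nM boundaries above; how a value lying exactly on a boundary is assigned is an assumption here:

```python
def affinity_class(ic50_nm: float) -> str:
    """Assign H/M/L/non-binder class from an IC50 in nM.

    Boundaries follow the 500, 1500 and 5000 nM cut-offs; values exactly on a
    boundary go to the weaker class (an assumption).
    """
    if ic50_nm < 500:
        return "H"
    if ic50_nm < 1500:
        return "M"
    if ic50_nm < 5000:
        return "L"
    return "non-binder"


print([affinity_class(x) for x in (120, 800, 2500, 12000)])
```

The groups used in the results (LMH, MH, H) are then unions of these classes, e.g. LMH treats every peptide below 5000 nM as a binder.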

Test Set 1: Improved detection of binders lacking position-specific binding motifs

Binding registers: 20/23 (87%)
Only register (aa 4-12) from Test Set 2 (Der p 2: 1-20)
(SE=0.80; SP(LMH)=0.90)

Top 5 predictions are experimental positives at very stringent threshold criteria (SE=0.95; SP(H)=0.63)

T-cell proliferation

Multiple registers (SP=0.95, SE(LMH)=0.81): 58% of Test Set 1
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402

Is this why binding motifs are unsuccessful?


Introduction
Structural immunoinformatics database development
Data analysis
Computational models developed
Applications


Autoimmune blistering skin disorder
Characterized by autoantibodies targeting desmoglein-3 (Dsg3)
Strong association with DR4 and DR6 alleles

Pemphigus vulgaris (PV)

http://www.medscape.com; adam.about.com; www.aafp.org

Who are the major players in PV?

DR4 PV-implicated alleles (in Semitic populations)
DRB1*0401
DRB1*0402
DRB1*0404
DRB1*0406

DR6 PV-implicated alleles (in Caucasians)
DRB1*1401
DRB1*1404
DRB1*1405
DQB1*0503


DR4 PV-implicated alleles (DRB1*0401, *0402, *0404, *0406)

High sequence conservation
97.9-99.0% identity
98.4-99.5% similarity

High structural conservation
Cα RMSD <0.22 Å for all key binding pockets

7 polymorphic residues within the binding cleft
Pocket 1 (β86)
Pocket 4 (β70, 71, 74)
Pocket 6 (β11)
Pocket 7 (β71)
Pocket 9 (β37)

What is known about DR4?
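Percent identity figures like those above depend on how the sequences are aligned and which positions are counted. A minimal sketch, assuming the sequences are already aligned to equal length and that positions with a gap ('-') in either sequence are excluded from the denominator (one common convention among several); the sequences are toy strings, not HLA data. Similarity additionally requires a substitution grouping or matrix and is not shown:

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free positions of two pre-aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments (not real HLA sequences).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
```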

DR6 PV-implicated alleles (DRB1*1401, *1404, *1405, DQB1*0503)

High sequence conservation
85.8-94.1% identity
83.2-97.3% similarity

High structural conservation
Cα RMSD <0.22 Å for all key binding pockets

14 polymorphic residues within the binding clefts
Pocket 1 (β86)
Pocket 4 (β13, 70, 71, 74, 78)
Pocket 6 (β11)
Pocket 7 (β28, 30, 67, 71)
Pocket 9 (β9, 37, 57, 60)

What is known about DR6?


9 stimulatory Dsg3 peptides tested on PV patients possessing DR4 and DR6 PV-implicated alleles

1. Dsg3 96-112 (DR4, DR6)
2. Dsg3 191-205 (DR4, DR6)
3. Dsg3 206-220 (DR4, DR6)
4. Dsg3 252-266 (DR4, DR6)
5. Dsg3 342-356 (DR4, DR6)
6. Dsg3 380-394 (DR4, DR6)
7. Dsg3 763-777 (DR4, DR6)
8. Dsg3 810-824 (DR4)
9. Dsg3 963-977 (DR4)

Clues…

DR4 PV
8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402
Atomic clashes with all other investigated DR4 subtypes

DR6 PV
6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503
Atomic clashes with all other investigated DR6 subtypes

HLA association in DR6 PV more likely to be at the DQ than the DR locus
Consistent with experimental work by Sinha et al. (2002, 2005, 2006)

Disease-associated alleles vs. innocent bystanders

Tong et al. (2006) Immunome Research, 2: 1


Only 1/9 investigated Dsg3 peptides fits existing binding motifs
Flanking residues: clashes in fitting binding register

Register shift for Peptide V (Dsg3 342-356)
Detected binding register: Dsg3 346-354
Binding motifs: Dsg3 347-355 (Veldman et al., 2003); Dsg3 345-353 (Sinha et al., 2006)

Whither sequence motifs (again!)?


Docking of 936 15-mer Dsg3 peptides generated using a sliding window of size 15 across the entire Dsg3 glycoprotein

Large-scale screening of Dsg3 peptides

[Figure: Dsg3 peptide (sliding window width 15) from N- to C-terminus, with binding register (sliding window width 9) and flanking residues]

Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7
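The two nested sliding windows can be sketched as follows. With a step of one residue (an assumption), a sequence of length L yields L - 15 + 1 peptides, so 936 windows correspond to 950 scanned residues, and each 15-mer contains 7 candidate 9-mer registers. The helper names and the toy sequence are hypothetical:

```python
def sliding_windows(sequence: str, width: int) -> list[str]:
    """All contiguous subsequences of `width` residues, N- to C-terminus, step 1."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def peptides_with_registers(protein: str, peptide_len: int = 15, core_len: int = 9):
    """Yield (start, peptide, candidate 9-mer registers) for every 15-mer.

    Residues of a peptide that lie outside the chosen register are its flanks.
    """
    for start, peptide in enumerate(sliding_windows(protein, peptide_len)):
        yield start, peptide, sliding_windows(peptide, core_len)


toy = "ACDEFGHIKLMNPQRSTVWYAC"  # hypothetical 22-residue sequence, not Dsg3
screen = list(peptides_with_registers(toy))
print(len(screen), len(screen[0][2]))  # 8 7
```

Each (peptide, register) pair would then be docked and scored, as in the single-peptide sliding window example for HLA-DQ8.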


Training set: 8 peptides each, with experimental IC50 values and known binding registers (5 binders and 3 non-binders)

Large-scale screening of Dsg3 peptides

Common epitopes possibly responsible for inducing disease in DR4 & DR6 patients

Significant level of cross-reactivity observed between DRB1*0402 and DQB1*0503 (AROC=0.93)

57% of peptides investigated in this study predicted to bind to both alleles with high affinity
90% of known Dsg3 peptides predicted to bind to both alleles

12/20 top predicted DQB1*0503-specific Dsg3 peptides from the transmembrane region
All top predicted DRB1*0402-specific Dsg3 peptides from extracellular regions

Disease initiation implications: DR4 from the extracellular domain (ECD); DR6 from the transmembrane (TM) region


Multiple binding registers revisited

76% (410/539) of predicted high-affinity binders to DRB1*0402 possess >2 binding registers
57% (384/673) of predicted high-affinity binders to DQB1*0503 possess >2 binding registers
66% (354/539) bind both alleles at different registers
Similar proportion (70%) detected in known binders to both alleles

Both alleles bind similar peptides via different binding registers

What next?

We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values.
The model yields excellent results for test data (AROC=0.93).
Application to determine immunological hot spots for HIV-1 p24 gag and gp160 env proteins shows binding energies similar to those for HLA-A and HLA-B.

Conclusions

Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.

While computations can never completely replace "wet-lab" experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.

Acknowledgements

Dr. (Victor) J.C. Tong, I2R, Singapore
A/Prof. Tin Wee Tan, NUS
Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA
Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN266200400085C)
All of you!