Structural bioinformatics for glycobiology

Oct 2, 2013

Structural glycoinformatics approaches

Structural modeling
  Comparative modeling of glycoproteins
  Complex modeling: glycoprotein replacement
  Modeling of complexes of glycans with GBPs and GTs: docking
  Analysis of interaction specificities
  Key residues vs. specific glycan conformations

Molecular dynamics
  Modeling the dynamics of the recognition of glycans by GBPs
  Modeling the enzymology of GTs: quantum mechanical calculations

Approaches to predicting protein structures

[Flowchart: obtain sequence (target); fold assignment by sequence-sequence alignment or sequence-structure alignment; comparative modeling (high identity, long alignment) or ab initio modeling (low identity, fragment alignment); build, assess model]

Comparative modeling of proteins

Definition: Prediction of three dimensional structure of a target protein from the amino acid sequence (primary structure) of a homologous (template) protein for which an X-ray or NMR structure is available.



Why a Model: A model is desirable when X-ray crystallography or NMR spectroscopy cannot determine the structure of a protein in time, or at all. The built model provides a wealth of information on how the protein functions, with information at the level of residue properties, e.g. the interaction with ligands, such as GBPs/GTs with glycans.

Comparative Modeling (or homology modeling)

Target sequence (structure unknown, marked "?"):
KQFTKCELSQNLYDIDGYGRIALPELICTMF
HTSGYDTQAIVENDESTEYGLFQISNALWCK
SSQSPQSRNICDITCDKFLDDDITDDIMCAK
KILDIKGIDYWIAHKALCTEKLEQWLCEKE

Template sequence:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK
FESNFNTQATNRNTDGSTDYGILQINSRWWCND
GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

[Figure: the two proteins share a similar sequence and are homologous, so the known structure is used as a template to build the model. Structure labels in the figure: 8lyz, 1alc.]

Homology models have RMSDs less than 2Å more than 70% of the time.

Homology models can be very smart!
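The RMSD figure quoted above measures how far model atoms lie from the matching atoms of a reference structure. A minimal sketch of the calculation follows, assuming the two structures have already been superimposed and that atoms are paired by position in the list; the coordinates are invented for illustration and do not come from any real structure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two coordinate sets are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model, reference))  # every atom is displaced by 1 angstrom
```

In practice the structures would first be optimally superimposed (for example with a least-squares fit) before the deviation is measured.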

[Figure: percentage sequence identity/similarity (axis 0 to 100) plotted against number of residues aligned (axis 0 to 250), with one region labelled "Sequence identity implies structural similarity" and another labelled "Don't know region". Source: B. Rost, Columbia, New York]

Sequence similarity implies structural similarity?

Step 1: Fold Identification

Aim: To find one or more template structures in the Protein Data Bank (PDB)

Pairwise sequence alignment: finds high-homology sequences (BLAST)

Improved multiple sequence alignment methods improve sensitivity for remote homologs (PSI-BLAST, CLUSTAL)

Fold recognition programs: find low-homology sequences (threading, profile-profile alignment)
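Pairwise sequence alignment, the simplest of the methods listed above, can be sketched as a Needleman-Wunsch global alignment followed by a percent-identity count. The scoring values (match +1, mismatch -1, gap -2) and the choice to divide by the full alignment length are assumptions made for this sketch; tools such as BLAST use substitution matrices and local alignment heuristics instead.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a simple match/mismatch/gap score."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length (gaps included)."""
    same = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * same / len(aligned_a)

# N-terminal fragments of the two sequences shown on the earlier slide.
a, b = global_align("KQFTKCELSQNLY", "KVFGRCELAAAMK")
print(a)
print(b)
print(percent_identity(a, b))
```

Other definitions of percent identity divide by the shorter sequence length or by the number of ungapped columns, so values from different tools are not directly comparable.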

Step 2: Model Construction

Aim: To build three-dimensional (3D) structures of proteins, i.e. the coordinates of every atom of the homologous proteins

Approach 1: protein structure buildup: cores, loops and sidechains

Approach 2: whole protein modeling: constraint-based optimization

Commonly used programs:
  Modeller (http://salilab.org/modeller/)
  Swiss-model (http://swissmodel.expasy.org/)
  Geno3D (http://geno3d-pbil.ibcp.fr/)
  …


Step 3: Model Construction

Modeling of glycan-protein complexes

Template: glycan-protein complex

Case 1: same glycan, different protein
  Glycoprotein replacement: comparative modeling of protein structure
  Energy minimization, allowing structural flexibility of glycans

Case 2: same protein, different glycan
  Flexible docking of glycans

Case 3: different protein and different glycan
  Comparative modeling of proteins
  Flexible docking of glycan

Can also be applied without a template of the complex

Flexible docking

Semi-flexible (rigid protein, flexible ligand)
  Useful for drug screening
  >150 programs: Dock, AutoDock, FlexX/FlexE, …

Flexible protein: mainly sidechains (hard)

Two elements of semi-flexible docking algorithms

Ligand sampling methods
  Pattern matching: Genetic Algorithm, Molecular Dynamics, Monte Carlo…

Treatment of intermolecular forces:
  Simplified scoring functions: empirical, knowledge-based and molecular mechanics, e.g. AMBER, CHARMM, GROMOS, ...
  Very simple treatment of solvation and entropy, or completely ignored!


Flexible docking of glycans to proteins

Glycan structure sampling
  Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)

Docking of each glycan conformation to the GBP: scoring schemes
  Empirical scores
  Forcefield
  GLYCAM: modified AMBER forcefield / MD tools for glycans (R. Woods group)

Challenge: water molecules

Flexibility of molecules

Atoms connected by covalent bonds

Bond lengths and bond angles are rigid

Torsion (dihedral) angles are flexible

Frequently used definitions of glycosidic torsion angles

Angle               NMR style        C−1 crystallographic style   C+1 crystallographic style
ϕ                   H1-C1-O-C′x      O5-C1-O-C′x                  O5-C1-O-C′x
ψ                   C1-O-C′x-H′x     C1-O-C′x-C′(x−1)             C1-O-C′x-C′(x+1)
ψ [(1→6)-linkage]   C1-O-C′6-C′5     C1-O-C′6-C′5                 C1-O-C′6-C′5
ω [(1→6)-linkage]   O-C′6-C′5-H′5    O-C′6-C′5-C′4                O-C′6-C′5-O′5
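Each torsion angle in the table is the dihedral angle defined by four atoms. A minimal sketch of that calculation from Cartesian coordinates is given below; the function name and the test coordinates are invented for illustration, and the sign follows the usual convention (positive for a clockwise rotation when looking from the second atom toward the third).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: for a crystallographic-style phi one would pass
# the positions of O5, C1, O and C'x in that order.
print(dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

The same function serves for ψ and ω by passing the four atoms listed in the corresponding table row.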

sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/

Induced fit? Rigid receptor hypothesis

Preferred torsion angles of glycans

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)

M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162

Combine structural analysis with the glycan array analysis: providing structural insights.

M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162

Ligand binding by the scavenger receptor C-type lectin (SRCL) and LSECtin

M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162

Binding of multiple classes of ligands to DC-SIGN and the macrophage galactose receptor. Model of the binding site in the macrophage galactose receptor with a bound GalNAc residue, based on the structure of the galactose-binding mutant of mannose-binding protein that was created by insertion of key binding site residues from the galactose-binding receptor.

M. E. Taylor and K. Drickamer, Glycobiology 2009, 19(11):1155-1162

Mechanisms of mannose-binding protein interaction with ligands.

Molecular Dynamics: simulation of molecular motions

Energy model of conformation

Two main approaches:
  Monte Carlo: stochastic
  Molecular dynamics: deterministic

Understand molecular function and interactions
  Catalysis of enzymes
  Complementary to experiments
  Obtain a movie of the interacting molecules



Basic Concepts of simulation of molecular motion

1. Compute energy for the interaction between all pairs of atoms.
2. Move atoms to the next state.
3. Repeat.



Energy Function

Target function that MD uses to govern the motion of molecules (atoms)

Describes the interaction energies of all atoms and molecules in the system

Always an approximation

Closer to real physics: more realistic, but more computation time (i.e. smaller time steps and more interactions increase accuracy)

Scale in Simulations

[Figure: simulation domains arranged by length and time scale. Domains: quantum chemistry (Hψ = Eψ), molecular dynamics (F = ma), Monte Carlo (exp(−ΔE/kT)), mesoscale, continuum. Length scale: 10^-10 m, 10^-8 m, 10^-6 m, 10^-4 m. Time scale: 10^-12 s, 10^-8 s, 10^-6 s.]

Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah
http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt

The energy model

The NIH Guide to Molecular Modeling: http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html

Proposed by Linus Pauling in the 1930s

Bond angles and lengths are almost always the same

Energy model broken up into two parts:

Covalent terms
  Bond distances (1-2 interactions)
  Bond angles (1-3)
  Dihedral angles (1-4)

Non-covalent terms
  Forces at a distance between all non-bonded atoms



The energy equation

Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy

These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field.

Bond Stretching Energy

k_b is the spring constant of the bond.

r_0 is the bond length at equilibrium.

Unique k_b and r_0 assigned for each bond pair, e.g. C-C, O-H

Bending Energy

k_θ is the spring constant of the bend.

θ_0 is the bond angle at equilibrium.

Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)

Torsion Energy

A controls the amplitude of the curve

n controls its periodicity

φ shifts the entire curve along the rotation angle axis (τ).

The parameters are determined from curve fitting.

Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)

Non-bonded Energy

A determines the degree of attractiveness

B determines the degree of repulsion

q is the charge
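The four energy terms can be sketched as small functions. The slides name the parameters but the equations themselves did not survive, so the functional forms below are assumptions: harmonic springs for stretching and bending, a cosine series term for torsion, and a 6-12 Lennard-Jones term plus a Coulomb term (with the Coulomb constant folded into the charges) for non-bonded pairs. All parameter values are invented and unitless.

```python
import math

def bond_energy(r, k_b, r0):
    """Assumed harmonic stretching term: k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2

def angle_energy(theta, k_theta, theta0):
    """Assumed harmonic bending term: k_theta * (theta - theta0)^2, angles in radians."""
    return k_theta * (theta - theta0) ** 2

def torsion_energy(tau, A, n, phi):
    """Assumed periodic torsion term: A * (1 + cos(n * tau - phi))."""
    return A * (1.0 + math.cos(n * tau - phi))

def nonbonded_energy(r, A, B, q_i, q_j):
    """Assumed pair term: attraction -A/r^6, repulsion +B/r^12, plus Coulomb q_i*q_j/r."""
    return -A / r ** 6 + B / r ** 12 + q_i * q_j / r

# Hypothetical values for a single bond, angle, torsion and non-bonded pair.
total = (bond_energy(1.54, 300.0, 1.53)
         + angle_energy(math.radians(111.0), 60.0, math.radians(109.5))
         + torsion_energy(math.radians(60.0), 1.4, 3, 0.0)
         + nonbonded_energy(3.5, 500.0, 900000.0, 0.1, -0.1))
print(total)
```

A real force field sums these terms over every bond, angle, torsion and non-bonded pair in the system, with parameters looked up by atom type.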

Simulating In A Solvent

The smaller the system, the more particles on the surface
  1000-atom cubic crystal, 49% on surface
  10^6-atom cubic crystal, 6% on surface

Would like to simulate infinite bulk surrounding the N-particle system

Two approaches:
  Implicitly
  Explicitly

Periodic boundary conditions

Schematic representation of periodic boundary conditions.

http://www.ccl.net/cca/documents/molecular-modeling/node9.html
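The surface percentages quoted above can be checked by counting atoms on the faces of a simple cubic lattice, and periodic boundary conditions are commonly implemented with the minimum-image convention. Both are sketched below; the simple cubic lattice and the function names are assumptions made for illustration.

```python
def surface_fraction(n_per_edge):
    """Fraction of atoms on the surface of an n x n x n simple cubic crystal."""
    total = n_per_edge ** 3
    interior = max(n_per_edge - 2, 0) ** 3   # atoms not on any face
    return (total - interior) / total

def minimum_image(dx, box):
    """Shortest separation along one axis in a periodic box of length `box`."""
    return dx - box * round(dx / box)

print(surface_fraction(10))    # 1000 atoms
print(surface_fraction(100))   # 10^6 atoms
print(minimum_image(9.0, 10.0))
```

With the minimum-image convention, each particle interacts with the nearest periodic copy of every other particle, which removes the artificial surface from the simulated system.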

Parameters for MD: Forcefield

Derived from direct experimental measurements on small molecules (~10 atoms)

Commonly used: AMBER, CHARMM, GROMOS, etc.

GLYCAM for MD of glycoconjugates (derived from AMBER forcefield)

Monte Carlo

Explore the energy surface by randomly probing the configuration space with a Markov chain approach

Metropolis method (avoids local minima):

1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by a random displacement.
3. Calculate the change of potential energy, ΔE, corresponding to this displacement.
4. If ΔE < 0, accept the new coordinates and go to step 2.
5. Otherwise, if ΔE ≥ 0, select a random R in the range [0,1] and:
   1. If e^(−ΔE/kT) > R, accept and go to step 2
   2. If e^(−ΔE/kT) ≤ R, reject and go to step 2
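The Metropolis steps can be sketched for a single coordinate as follows. Downhill moves are always accepted, and uphill moves are accepted with probability e^(−ΔE/kT). The one-dimensional system, the step size and the function names are simplifications assumed for this sketch; a real simulation would move one atom of many in three dimensions.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random.random):
    """Return True if a move with energy change delta_e is accepted."""
    if delta_e < 0:
        return True                             # downhill: always accept
    return rng() < math.exp(-delta_e / kT)      # uphill: accept with Boltzmann probability

def metropolis_1d(energy, x0, kT, n_steps, max_step=0.5, seed=0):
    """Sample one coordinate x from exp(-energy(x)/kT) with the Metropolis method."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)   # random displacement
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng.random):
            x, e = x_new, e_new                        # accept the new coordinates
        samples.append(x)                              # a rejected move repeats the old state
    return samples

# Hypothetical harmonic well, for illustration only.
samples = metropolis_1d(lambda x: x * x, x0=2.0, kT=1.0, n_steps=1000)
print(len(samples))
```

Recording the old state again after a rejection matters: dropping rejected steps would bias the sampled distribution.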

Deterministic Approach

Provides us with a trajectory of the system.

From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.

Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.

Typical simulations of small proteins, including surrounding solvent, are on the picosecond time scale.


Deterministic / MD methodology



There are efficient methods for integrating these elementary steps, with the Verlet and leapfrog algorithms being the most commonly used.

MD algorithm

Initialize system
  Ensure particles do not overlap in initial positions (can use lattice)
  Randomly assign velocities.

Move and integrate.

{r(t), v(t)} → {r(t+Δt), v(t+Δt)}

Leapfrog algorithm
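The update {r(t), v(t)} → {r(t+Δt), v(t+Δt)} can be sketched with a leapfrog scheme written in its kick-drift-kick form, which is algebraically equivalent to velocity Verlet. The slide's own equations are not preserved, so this form, the single particle in one dimension, and the harmonic test force are all assumptions made for illustration.

```python
import math

def leapfrog(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns the list of (x, v) at each full step."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half kick: velocity at t + dt/2
        x = x + dt * v_half            # drift: position at t + dt
        a = force(x) / mass            # force evaluated at the new position
        v = v_half + 0.5 * dt * a      # half kick: velocity at t + dt
        trajectory.append((x, v))
    return trajectory

# Hypothetical harmonic oscillator with k = 1 and m = 1, starting at x = 1, v = 0.
traj = leapfrog(1.0, 0.0, lambda x: -x, mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
print(x_end, math.cos(10.0))                      # numerical vs. analytical position at t = 10
print(0.5 * v_end ** 2 + 0.5 * x_end ** 2)        # total energy, which started at 0.5
```

The total energy stays close to its starting value over the run, which is the main reason schemes of this family are preferred for MD over a plain Euler update.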

MD studies of prion proteins

Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases: scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans

Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.

The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure, and FTIR and CD spectra indicate it has a significantly increased content of β-sheet conformation compared with PrPC

Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)



PrP is a glycoprotein

Available NMR structures are for non-glycosylated PrPC only

Objective: study of the influence of two N-linked glycans (Asn181 and Asn197) and of the GPI anchor attached to Ser230

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

MD simulations

Molecular dynamics simulations on the C-terminal region of human prion protein, HuPrP(90-230), with and without the three glycans

AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions

HuPrP(127-227) is stabilized overall by addition of the glycans, specifically by extensions of two helices and reduced flexibility of the linking turn containing Asn197

The stabilization appears indirect, arising from reduced mobility of the surrounding water molecules and not from specific interactions such as H bonds or ion pairs.

Asn197 has a stabilizing role, while Asn181 is within a region with already stable secondary structure

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)

A retrospective analysis

MD simulation of glycan binding of influenza HAs

A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA

Modeling the ligand-bound state of H5N1 HA using the isolate VN1194 bound to α2,3-sialyllactose as previously crystallized

Excess mutual information was computed between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the "background" mutual information.

Combine these results with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation, which have some capacity to vary but are subject to selective pressure relating their mutations. These residues may be richer targets for changing ligand specificity than residues that are absolutely conserved or residues that display uncorrelated mutations (involved in immune escape).

Kasson et al., JACS, 2009, 131(32), pp 11338-11340
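The excess mutual information described above can be sketched for discretized trajectories: mutual information between a residue and the ligand, minus the average mutual information between that residue and the other residues. The slides do not say how conformations were discretized or which estimator was used, so the per-frame integer states, the plug-in estimator and the function names here are assumptions, and the data are invented.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two equal-length state sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi

def excess_mutual_information(residue, ligand, other_residues):
    """MI(residue, ligand) minus the mean MI between the residue and the other residues."""
    background = sum(mutual_information(residue, other) for other in other_residues)
    background /= len(other_residues)
    return mutual_information(residue, ligand) - background

# Hypothetical per-frame conformational states (0 or 1), for illustration only.
residue_states = [0, 1, 0, 1]
ligand_states = [0, 1, 0, 1]        # moves in step with the residue
others = [[0, 0, 1, 1]]             # statistically independent of the residue
print(excess_mutual_information(residue_states, ligand_states, others))
```

Subtracting the background term is meant to separate residues specifically coupled to the ligand from residues that are simply mobile and therefore correlate weakly with everything.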

Experimentally identified ligand-binding mutations in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow.

The top three mutations from the ligand dissociation analyses in yellow. A modeled α2,3-sialyllactose is shown in orange.

Prediction of dissociation rate for HA mutants (in silico mutagenesis)

Bayesian analysis methods to predict dissociation rates based on extensive simulation of each mutant and evaluate whether a mutant has a faster dissociation rate than the influenza clinical isolate that we use as a wild-type reference.

These simulations were used to estimate the dissociation rate for each mutation.

The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.