
Dec 2, 2013

All-atom molecular simulations of protein folding and unfolded-state dynamics and structure with accelerated calculations on GPU

Cezary Czaplewski

Faculty of Chemistry, University of Gdańsk, Poland

The 10th Protein Folding Winter School, KIAS, February 7-11, 2011


Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)

Vincent A. Voelz (1), Gregory R. Bowman (2), Kyle Beauchamp (2), Vijay S. Pande (1,2,3)

(1) Department of Chemistry, Stanford University
(2) Biophysics Program, Stanford University
(3) Department of Structural Biology, Stanford University

J. Am. Chem. Soc. 2010, 132, 1526-1528


Computer simulations, validated by experiment, can
help gain a complete understanding of how proteins
fold.


Over a million-fold range in folding rates suggests possible diversity in folding mechanism.

Folding@Home on GPUs allowed several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio with an all-atom MD model to date.


Insights into folding mechanism based on Markov
state model (MSM).


[Figure: timescale axis from 10^-15 s (femto), 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro), 10^-3 s (milli) to 10^0 s, with labels: all-atom MD step, bond vibration, sidechain rotation, loop closure, helix formation, folding of β-hairpins, protein folding]

GPU


A processor attached to a graphics card, dedicated to calculating floating-point operations

Incorporates stream processing microchips, which provide special mathematical operations

Stream processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.

CPU vs. GPU

CPU: 4 cores

[Figure: Floating-Point Operations per Second for the CPU and GPU]

Proteins folded ab initio by all-atom MD

Trp-cage, 4.1 μs: Pitera, Swope, PNAS 2003
Fip35 WW, 13 μs: Ensign, Pande, Biophys. J. 2009
Villin headpiece, 10 μs: Zagrovic, Snow, Shirts, Pande, JMB 2002
Fast-folding villin variant, <1 μs: Ensign, Kasson, Pande, JMB 2007
NTL9(1-39), ~1.5 ms experimental folding time


Folding@Home using Gromacs with the OpenMM library, written specially for GPUs, allowing dramatically longer trajectories

AMBER ff96 with Onufriev, Bashford, Case GBSA

Up to 10000 parallel MD simulations at 300, 330, 370 and 450 K

Starting from native, random coil, extended

Aggregate 1.52 ms

Out of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight <4 Å RMSD


The number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process:

$\langle n \rangle = \int M(t)\, k\, \exp(-M(t)\, k\, t)\, dt$

M(t) is the number of parallel simulations that reach time t.

k is the ~640/s experimental folding rate.
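
As a rough numerical illustration, the integral for the expected number of folding events can be evaluated with the trapezoid rule. The sketch below uses the integrand exactly as it is written on the slide. The function name `expected_folding_events`, the time grid, and the constant M(t) = 1 are illustrative assumptions, not the survival curve of the actual simulations. Only k ≈ 640/s is taken from the slide.

```python
import math

def expected_folding_events(times, n_sims, k):
    """Trapezoid-rule estimate of <n> = integral of M(t) k exp(-M(t) k t) dt.

    times  : increasing time points, in seconds
    n_sims : M(t) at each time point (simulations that reach time t)
    k      : folding rate, in 1/s
    """
    if len(times) != len(n_sims):
        raise ValueError("times and n_sims must have the same length")
    # Integrand in the form printed on the slide.
    integrand = [m * k * math.exp(-m * k * t) for t, m in zip(times, n_sims)]
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (integrand[i] + integrand[i - 1]) * dt
    return total

# Hypothetical example: a single simulation (M = 1) followed for 10 ms.
k_exp = 640.0                              # 1/s, rate quoted on the slide
grid = [i * 1e-6 for i in range(10001)]    # 0 to 10 ms in 1 us steps
n_single = expected_folding_events(grid, [1] * len(grid), k_exp)
```

For constant M = 1 the integral has the closed form 1 - exp(-kT), which gives a simple check on the numerical result.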

Distributions of RMSD for native-state simulations of NTL9(1−39) after 10 μs

The number of parallel simulations at 370 K that reach time t.

Posterior predictions of the folding rate

A snapshot from a folding trajectory at 3.1 Å RMSD

Non-native and native-like hydrophobic core arrangements

Markov state model (MSM)


MSM constitutes a kinetic clustering

Conformations that can interconvert rapidly are grouped into the same state

Conformations that can only interconvert slowly are grouped into separate states

Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states

Transition probability matrix T propagates state probabilities p: $p(t + \tau) = p(t)\, T$

An implied timescale $\tau_k$ for a given lag time $\tau$ can be calculated from the eigenvalues $\mu_k$ of matrix T: $\tau_k = -\tau / \ln \mu_k$
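
The last two points can be sketched on a toy model. The example below is a minimal illustration, assuming a hypothetical 2-state system (unfolded, folded) with made-up transition probabilities and a 10 ns lag time. The helper names `propagate` and `implied_timescale_two_state` are invented for this sketch and are not part of MSMBuilder.

```python
import math

def propagate(p, T):
    """One lag-time step: p(t + tau) = p(t) T, with p a row vector."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

def implied_timescale_two_state(T, lag):
    """Implied timescale -lag / ln(mu_2) for a 2x2 row-stochastic matrix.

    The eigenvalues of such a matrix are 1 and trace(T) - 1.
    """
    mu2 = T[0][0] + T[1][1] - 1.0
    if not 0.0 < mu2 < 1.0:
        raise ValueError("second eigenvalue must lie in (0, 1)")
    return -lag / math.log(mu2)

# Hypothetical transition matrix; rows sum to 1.
T = [[0.99, 0.01],
     [0.02, 0.98]]
lag_ns = 10.0

p = [1.0, 0.0]           # start fully unfolded
for _ in range(3):       # three lag times = 30 ns
    p = propagate(p, T)

t_implied = implied_timescale_two_state(T, lag_ns)  # in ns
```

In a real model with thousands of states the eigenvalues would come from a numerical eigensolver. The 2x2 case is used here only because its second eigenvalue has a closed form.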


Detail of MSMBuilder package

100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm

The remaining 90% of the data was then assigned to these clusters

The resulting microstates had an average radius of ~4.5 Å

A macrostate model was generated by lumping microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm


Although only a few folding trajectories were observed
directly, a network of many possible pathways can be
inferred from the overlapping sampling of local transitions.
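
The clustering step described above (k-centers on a subsample, then assignment of the remaining data) can be sketched as follows. This is a minimal illustration of greedy farthest-point k-centers, not MSMBuilder's implementation. It assumes hypothetical 2D points with Euclidean distance in place of conformations with an RMSD metric, and the function names `k_centers` and `assign` are invented for the example.

```python
import math

def k_centers(points, k):
    """Greedy farthest-point k-centers; returns indices of the chosen centers."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # assumption: the first point seeds the first center
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        new = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new)
        nearest = [min(d, math.dist(p, points[new]))
                   for p, d in zip(points, nearest)]
    return centers

def assign(points, center_points):
    """Assign each point to the index of its nearest center."""
    return [min(range(len(center_points)),
                key=lambda c: math.dist(p, center_points[c]))
            for p in points]

# Hypothetical data: a subsample to cluster, and the rest to assign.
subsample = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
remaining = [(0.2, 0.1), (9.8, 0.3), (4.9, 5.2)]

idx = k_centers(subsample, 3)
centers = [subsample[i] for i in idx]
labels = assign(remaining, centers)
```

Clustering only a subsample keeps the cost of picking centers manageable. Assigning the rest is a single nearest-center pass over the data.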


Top 10 folding fluxes, calculated by a greedy backtracking
algorithm

Implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns: 100,000-microstate model and 2000-macrostate model

A scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes

A 2000-state Markov State Model (MSM). The top 10 folding pathways account for 25% of the total flux and transit 14 of the 2000 macrostates


Contact profile subspaces used to calculate Q_a, Q_12 and Q_13

[Equation: Q defined from the contact profile c and the native contact profile c^nat]

c(x): contact profile, indexed by x = (i, j)


The 14 macrostates plotted along structural and kinetic reaction coordinates

Contact profiles for the 14 macrostates involved in the top folding pathways

Values of Q for each of the 14 macrostates involved in the top ten folding pathways

Q-values plotted versus p_fold (committor) values

Macrostates l, m and n have very similar structural ensembles and similar p_fold values

These states differ mostly in their hairpin registrations and packing of the hairpin loop.
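
The committor p_fold used as the kinetic coordinate is the probability that a trajectory starting in a given state reaches the folded state before the unfolded state. The sketch below shows how it could be computed from a transition matrix. It assumes a hypothetical 5-state chain with unbiased hops between neighbours. The function name `committor` and the iterative solver are choices made for the example, not the method used in the original work.

```python
def committor(T, unfolded, folded, tol=1e-12, max_iter=100000):
    """Solve q_i = sum_j T[i][j] q_j with q = 0 on unfolded and q = 1 on folded."""
    n = len(T)
    q = [1.0 if i in folded else 0.0 for i in range(n)]
    fixed = set(unfolded) | set(folded)
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            if i in fixed:
                continue  # boundary values stay at 0 or 1
            new = sum(T[i][j] * q[j] for j in range(n))
            change = max(change, abs(new - q[i]))
            q[i] = new  # Gauss-Seidel update: reuse fresh values immediately
        if change < tol:
            return q
    raise RuntimeError("committor iteration did not converge")

# Hypothetical chain: state 0 is unfolded, state 4 is folded.
T = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]

p_fold = committor(T, unfolded={0}, folded={4})
```

For this symmetric chain the committor rises linearly from 0 to 1 across the states. States with similar p_fold values sit at a similar kinetic distance from the folded state, even when their structures differ.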

Conclusions


Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.

There need not be a single pathway or a single, dominant mechanism for the folding of a given protein.

Multiple mechanisms could be simultaneously present.

The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is seen.

Take-home message

GPUs can speed up your simulations 10 times

Existing force field models using implicit solvent are accurate enough to fold proteins during MD.

With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using the Markov State Model.

There can be several pathways for the folding of a given protein.

Multiple folding mechanisms (diffusion-collision or nucleation-condensation) could be simultaneously present.