All-atom molecular simulations of protein folding and unfolded-state dynamics and structure with accelerated calculations on GPU
Cezary Czaplewski
Faculty of Chemistry, University of Gdańsk, Poland
The 10th Protein Folding Winter School, KIAS, February 7-11, 2011
Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)
Vincent A. Voelz (1), Gregory R. Bowman (2), Kyle Beauchamp (2), Vijay S. Pande (1,2,3)
(1) Department of Chemistry, Stanford University
(2) Biophysics Program, Stanford University
(3) Department of Structural Biology, Stanford University
J. Am. Chem. Soc. 2010, 132, 1526–1528
• Computer simulations, validated by experiment, can help gain a complete understanding of how proteins fold.
• Folding rates span more than a million-fold range, which suggests a possible diversity in folding mechanisms.
• Folding@Home using GPUs yields several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio with all-atom MD to date.
• Insights into the folding mechanism are based on a Markov state model (MSM).
Figure: time-scale axis in seconds, from 10^-15 (femto) through 10^-12 (pico), 10^-9 (nano), 10^-6 (micro) and 10^-3 (milli) to 10^0; labeled processes: all-atom MD step, bond vibration, sidechain rotation, loop closure, helix formation, folding of β-hairpins, protein folding
GPU
• A processor attached to a graphics card, dedicated to floating-point calculations
• Incorporates stream-processing chips that provide special mathematical operations
• Stream processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.
CPU vs. GPU
CPU: 4 cores
Figure: floating-point operations per second for the CPU and GPU
Proteins folded ab initio by all-atom MD
Trp-cage, 4.1 μs (Pitera, Swope, PNAS 2003)
Fip35 WW, 13 μs (Ensign, Pande, Biophys. J. 2009)
Villin headpiece, 10 μs (Zagrovic, Snow, Shirts, Pande, JMB 2002)
Fast-folding villin variant, <1 μs (Ensign, Kasson, Pande, JMB 2007)
NTL9(1-39), ~1.5 ms experimental folding time
• Folding@Home using Gromacs with the OpenMM library, written specially for GPUs, allowing dramatically longer trajectories
• AMBER ff96 force field with the Onufriev-Bashford-Case GBSA implicit solvent model
• Up to 10,000 parallel MD simulations at 300, 330, 370 and 450 K
• Starting from native, random-coil and extended conformations
• Aggregate simulation time: 1.52 ms
• Of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight reach <4 Å RMSD
• The number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process:
$\langle n \rangle = \int M(t)\, k\, \exp(-M(t)\, k\, t)\, dt$
where M(t) is the number of parallel simulations that reach time t, and k is the experimental folding rate, ~640 s^-1.
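As a rough cross-check of the two-state picture, the sketch below estimates how many folding events to expect from a set of independent trajectories. It is a simplified stand-in, not the integral over M(t) quoted above: it assumes each trajectory folds independently with first-passage probability 1 - exp(-k t), and it treats the total count as approximately Poisson. The function names and the trajectory lengths are hypothetical, and the rate at the simulation temperature may differ from the experimental value of ~640 s^-1.

```python
import math


def expected_folding_events(trajectory_lengths_s, rate_per_s):
    """Expected number of trajectories that fold at least once.

    Assumes independent two-state folding: a trajectory of length t
    folds with probability 1 - exp(-k * t).
    """
    return sum(1.0 - math.exp(-rate_per_s * t) for t in trajectory_lengths_s)


def prob_at_least(n_min, mean):
    """Poisson probability of observing n_min or more events."""
    below = sum(
        math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_min)
    )
    return 1.0 - below


# Hypothetical ensemble: 3000 trajectories of 0.5 microseconds each.
lengths = [0.5e-6] * 3000
k_exp = 640.0  # experimental folding rate in 1/s, taken from the slide

mean_events = expected_folding_events(lengths, k_exp)
print(f"expected folding events: {mean_events:.3f}")
print(f"P(at least 2 events):    {prob_at_least(2, mean_events):.3f}")
```

Because k t is much smaller than 1 for every trajectory here, the expected count is close to k times the aggregate simulation time, which is why many short trajectories can stand in for one long one in this model.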
Figure: distributions of RMSD for native-state simulations of NTL9(1−39) after 10 μs
Figure: the number of parallel simulations at 370 K that reach time t
Figure: posterior predictions of the folding rate
Figure: a snapshot from a folding trajectory, 3.1 Å RMSD
Figure: non-native and native-like hydrophobic core arrangements
Markov state model (MSM)
• An MSM constitutes a kinetic clustering
• Conformations that can interconvert rapidly are grouped into the same state
• Conformations that can only interconvert slowly are grouped into separate states
• Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states
• The transition probability matrix T propagates the state probabilities p
• An implied timescale k for a given lag time t can be calculated from the eigenvalues m of the matrix T
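The MSM ingredients listed above can be illustrated with a minimal sketch, assuming a discrete trajectory of state indices. It counts transitions at a chosen lag, row-normalizes the counts into T, propagates a probability vector, and applies the standard relation timescale = -lag / ln(eigenvalue). To stay within the standard library, the eigenvalue step is restricted to a two-state model, where the non-unit eigenvalue of a stochastic matrix equals trace(T) - 1. All function names are illustrative and are not taken from MSMBuilder.

```python
import math


def count_transitions(traj, n_states, lag):
    """Count transitions i -> j between frames separated by `lag` steps."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(traj[:-lag], traj[lag:]):
        counts[a][b] += 1
    return counts


def row_normalize(counts):
    """Turn a count matrix into a row-stochastic transition matrix T."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state with no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def propagate(p, T):
    """One lag-time step of the state probabilities: p_next = p T."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


def implied_timescale_two_state(T, lag_time):
    """Implied timescale of a 2x2 model from its non-unit eigenvalue."""
    eigenvalue = T[0][0] + T[1][1] - 1.0
    if not 0.0 < eigenvalue < 1.0:
        raise ValueError("eigenvalue must lie strictly between 0 and 1")
    return -lag_time / math.log(eigenvalue)


# Toy two-state example with a hand-written transition matrix.
T = [[0.9, 0.1], [0.2, 0.8]]
print(propagate([1.0, 0.0], T))                     # probabilities after one lag
print(implied_timescale_two_state(T, lag_time=1.0))  # in units of the lag time
```

For models with many states the same relation is applied to each eigenvalue of T below 1, normally with a numerical eigenvalue solver; a timescale that stops changing as the lag time grows is the usual sign that the model is approximately Markovian at that lag.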
Details of the MSMBuilder package
• 100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm
• The remaining 90% of the data was then assigned to these clusters
• The resulting microstates had an average radius of ~4.5 Å
• A macrostate model was generated by lumping the microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm
• Although only a few folding trajectories were observed directly, a network of many possible pathways can be inferred from the overlapping sampling of local transitions.
• Top 10 folding fluxes, calculated by a greedy backtracking algorithm
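The k-centers step mentioned above can be sketched as a greedy farthest-point procedure: start from one center, then repeatedly promote the point that lies farthest from all current centers. This is a minimal illustration under stated assumptions, not MSMBuilder's implementation: the points are 2D tuples and the metric is Euclidean, whereas the real analysis would use a structural metric such as RMSD between conformations. The `assign` helper mirrors the idea of assigning the remaining data to existing clusters. All names are hypothetical.

```python
import math


def k_centers(points, k, distance=math.dist):
    """Greedy farthest-point clustering.

    Returns (center_indices, assignments, distances_to_center).
    """
    centers = [0]  # arbitrary choice: the first point seeds the clustering
    dist_to_center = [distance(p, points[0]) for p in points]
    assignments = [0] * len(points)
    while len(centers) < k:
        # The next center is the point farthest from every existing center.
        new_center = max(range(len(points)), key=dist_to_center.__getitem__)
        if dist_to_center[new_center] == 0.0:
            break  # every point already coincides with a center
        centers.append(new_center)
        for i, p in enumerate(points):
            d = distance(p, points[new_center])
            if d < dist_to_center[i]:
                dist_to_center[i] = d
                assignments[i] = len(centers) - 1
    return centers, assignments, dist_to_center


def assign(point, center_points, distance=math.dist):
    """Assign a held-out point to its nearest cluster center."""
    return min(
        range(len(center_points)),
        key=lambda c: distance(point, center_points[c]),
    )


# Toy data standing in for conformations.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, labels, radii = k_centers(points, k=3)
print(centers, labels, max(radii))
print(assign((9.0, 0.5), [points[c] for c in centers]))
```

Because each new center is the current worst-covered point, the largest cluster radius shrinks as k grows, which is what makes the average radius of the microstates a natural quality measure.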
Figure: implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns, for the 100,000-microstate model and the 2000-macrostate model
Figure: a scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes
Figure: a 2000-state Markov State Model (MSM). The top 10 folding pathways account for ∼25% of the total flux and transit 14 of the 2000 macrostates
Figure: contact profile subspaces used to calculate Q_a, Q_12 and Q_13
c(x): contact profile indexed by x = (i, j)
Figure: the 14 macrostates plotted along structural and kinetic reaction coordinates
Figure: contact profiles for the 14 macrostates involved in the top folding pathways
Figure: values of Q for each of the 14 macrostates involved in the top ten folding pathways
Figure: Q values plotted versus p_fold (committor) values
Macrostates l, m and n have very similar structural ensembles and similar p_fold values. These states differ mostly in their hairpin registrations and packing of the hairpin loop.
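The p_fold (committor) of a state is the probability that a trajectory starting there reaches the folded state before the unfolded state. Given a transition matrix T, it satisfies q_i = sum_j T_ij q_j for intermediate states, with q = 0 in the unfolded set and q = 1 in the folded set. The sketch below solves these equations by simple fixed-point iteration on a toy chain of states. It is an illustration under those assumptions only; the function name, tolerance and toy matrix are hypothetical, and a production analysis would typically solve the linear system directly.

```python
def committor(T, unfolded, folded, tol=1e-12, max_iter=100000):
    """Iteratively solve q_i = sum_j T[i][j] * q[j] with fixed boundaries.

    `unfolded` states are pinned to q = 0 and `folded` states to q = 1.
    """
    n = len(T)
    q = [0.0] * n
    for i in folded:
        q[i] = 1.0
    fixed = set(unfolded) | set(folded)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            if i in fixed:
                continue
            new_value = sum(T[i][j] * q[j] for j in range(n))
            max_change = max(max_change, abs(new_value - q[i]))
            q[i] = new_value  # updated in place, so later rows see it
        if max_change < tol:
            return q
    raise RuntimeError("committor iteration did not converge")


# Toy 4-state chain: state 0 is unfolded, state 3 is folded.
T = [
    [1.0, 0.0, 0.0, 0.0],
    [0.3, 0.4, 0.3, 0.0],
    [0.0, 0.3, 0.4, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]
print(committor(T, unfolded=[0], folded=[3]))
```

States with nearly equal committor values, like macrostates l, m and n on the slide, sit at the same stage of folding kinetically even if their structures differ in detail.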
Conclusions
• Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.
• There need not be a single pathway or a single dominant mechanism for the folding of a given protein.
• Multiple mechanisms could be present simultaneously.
• The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is seen.
Take-home message
• GPUs can speed up your simulations 10-fold.
• Existing force field models using implicit solvent are accurate enough to fold proteins in MD simulations.
• With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using the Markov State Model.
• A given protein can fold along several pathways.
• Multiple folding mechanisms (such as diffusion-collision or nucleation-condensation) could be present simultaneously.