Basics of molecular dynamics simulations

rawsulkyInternet and Web Development

Dec 11, 2013 (3 years and 6 months ago)

116 views

Basics of molecular dynamics simulations


Reduction of the quantum problem to a classical one


Parameterization of force fields


Boundaries, energy minimization, integration
algorithms


Analysis of trajectory data


Lecture notes are at :

www.physics.usyd.edu.au/~serdar/ws


1

Quantum, classical and stochastic description

of molecular systems


Quantum mechanics (
Schroedinger

equation)


The most fundamental approach but feasible only for few atoms (~10).


Approximate methods (e.g. density functional theory) allows treatment of
larger systems (~1000) and dynamic simulations for picoseconds.


Classical mechanics (Newton’s equation of motion)


Most atoms are heavy enough to justify a classical treatment (except H).
The main problem is finding accurate potential functions (force fields).


MD simulation of 100,000 atoms for many nanoseconds is now feasible.


Stochastic mechanics (
Langevin

equation)


Most biological processes occur in the range of
microsec

to
millisec
.


Thus to describe such processes, a simpler (coarse
-
grained)
representation of atomic system is essential (e.g. Brownian dynamics).

2














































,
2
2
2
2
2
2
2
2
2
,
,
i
i
i
ne
e
j
i
j
i
j
i
i
i
i
n
i
i
ne
e
n
e
z
U
e
m
H
e
z
z
M
H
E
U
H
H
r
R
r
r
R
R
r
R
r
R


Many
-
body Schroedinger eq. for a molecular system

Here m and M
i

are the mass of the electrons and nuclei,

r


and R
i

denote the electronic and nuclear coordinates,

and



and

i

denote the respective gradients.

nuclear hamiltonian

electronic hamiltonian

elect
-
nucl. interact.

(1)

3


















r
R
R
r
R
,
,
i
e
i
n
i

























r
R
R
r
R
R
,
,
i
e
i
n
i
e
i
n
ne
e
n
E
U
H
H



Separation of the electronic wave function

Nuclei are much heavier and hence move much slower than electrons.
This allows decoupling of their motion from those of electrons.
Introduce the product wave function:

Substituting this ansatz in the Schroedinger eq. gives

For fixed nuclei, the electronic part gives



















r
R
R
r
R
,
,
i
e
i
e
i
e
ne
e
E
U
H


(2)

4





























r
R
R
r
R
R
R
,
,
i
e
i
n
i
e
i
n
i
e
n
E
E
H


Substitute the electronic part back in the Schroedinger eq.

Eqs. (2, 3) need to be solved simultaneously, which is a formidable

problem for most systems. For two nuclei, there is only one coordinate

for R (the distance), so it is feasible. But for three
-
nuclei, there are

4 coordinates (in general for N nuclei, 3N
-
5 coordinates are required),

which makes numerical solution very difficult.

Born
-
Oppenheimer (adiabatic) approximation consists of neglecting

the cross terms arising from





(which are of order m/M), so that the nuclear part becomes







r
R
,
i
e
n
H














i
n
i
n
i
e
n
E
E
H
R
R
R




(3)

5













i
e
j
i
j
i
j
i
i
i
i
i
i
E
e
z
z
U
N
i
U
dt
d
M
R
R
R
R
R
R








2
2
2
,
,
1

Classical approximation for nuclear motion

Nuclei are heavy so their motion can be described classically, that is,

instead of solving the Schroedinger Eq. (3), we solve the corresponding

Newton’s eq. of motion

At zero temperature, the potential can be minimized with respect to the

nuclear coordinates to find the equilibrium conformation of molecules.

At finite temperature, Eqs. (2) and (4) form the basis of ab initio MD

(ignores quantum effects in nuclear motion and electronic exc. at finite T)

(4)

6

Methods of solution for the electronic equation

Two basic methods of solution:

1.
Hartree
-
Fock (HF) based methods: HF is a mean field theory.


One finds the average, self
-
consistent potential in which electrons move.


Electron correlations are taken into account using various methods.

2. Density functional theory: Solves for the density of electrons.


Better scaling than HF (which is limited to ~10 atoms); 100’s of atoms.


Car
-
Parrinello MD (DFT+MD) has become popular in recent years.

(2)
























r
R
R
r
R
r
R
r
r
,
,
2
,
2
2
2
2
i
e
i
e
i
e
i
i
i
i
E
e
z
e
m






















Electronic part of the Schroedinger eq. (2) has the form

7

e
e
n





)
(
)
(
r
r
r



)]
(
[
'
'
)
'
(
)
(
2
)]
(
[
)
(
)
(
3
3
2
3
2
,
2
r
r
r
r
r
r
r
r
R
r
r
r
R
n
E
r
rd
d
n
n
e
n
T
H
r
d
n
e
z
e
z
U
U
H
E
xc
e
i
i
i
e
i
i
i
e
ne
e
ne
e
e
e

























Density functional theory (DFT)

In DFT one uses the density function of electrons, n(r) instead of
their wavefunction

Expectation value of the electronic energy in the ground state becomes

8

N
i
dt
d
M
N
i
j
ij
i
i
i
,
,
1
,
2
2






F
F
r
Classical mechanics

Molecular dynamics (MD) is the most popular method for simulation
studies of biomolecules. It is based on Newton’s equation of motion.

For N interacting atoms, one needs to solve N coupled Diff. Eq’s:

Force fields are determined from experiments and ab initio methods.

Analytically this is an intractable problem for N>2,

but we can solve it easily on a computer using numerical methods.

Current computers can handle N ~ 10
5
-
6

particles, which is large
enough for description of most biomolecules.

Integration time, however is still a bottleneck (10
6

steps @ 1 fs = 1 ns)

9

i
i
i
i
i
i
i
m
dt
d
m
R
v
F
r




2
2
Stochastic mechanics

In order to deal with the time bottle
-
neck in MD, one has to simplify
the simulation system (coarse graining). This is achieved by
describing parts of the system as continuum with dielectric constants.

Examples:



transport of ions in electrolyte solutions (water


continuum)



protein folding (water


continuum)



function of transmembrane proteins (lipid, water


continuum)

To include the effect of the atoms in the continuum, modify Newton’s
eq. of motion by adding frictional and random forces:

Langevin equation

10

Molecular dynamics (MD) simulations

MD programs consist of over 100,000 lines of coding. To understand

what is going on in an MD simulation, we need to consider several

topics:

1.

Force fields (potential functions among atoms)

2.

Boundaries and computation of long
-
range interactions

3.
Molecular mechanics (T=0, energy minimization)

4.
MD simulations (integration algorithms, ensembles)

5.
Analysis of trajectory data to characterize structure and
function of proteins


11

Empirical force fields (potential functions)

Problem: Reduce the ab initio potential energy surface among the atoms














j
i
k
j
i
k
j
i
j
i
i
U
U
U
)
,
,
(
)
,
(
3
2
R
R
R
R
R
R
To a classical many
-
body interaction in the form









i
e
j
i
j
i
j
i
i
E
e
z
z
U
R
R
R
R





2
Such a program has been tried for water but so far it has failed to

produce a classical potential that works. In strongly interacting systems,

it is difficult to describe the collective effects by summing up the

many
-
body terms.

Practical solution: Truncate the series at the two
-
body level and assume

(hope!) that the effects of higher order terms can be absorbed in U
2

12

Non
-
bonded interactions

Interaction of two atoms at a distance R = |
R
i

-

R
j
| can be decomposed

into 4 pieces

1.
Coulomb potential

2.
Induced polarization

3.
Attractive dispersion (van
der

Waals) (1/R
6
)

4.
Short range repulsion (e

R/a
)

The last two terms are combined into a 6
-
12
Lennard
-
Jones potential



Combination rules for different atoms:























6
12
4
R
R
U



LJ
2
/
)
(
j
i
ij
j
i
ij









13

R
q
q
U
j
i
0
4


Coul
12
-
6 Lennard
-
Jones potential (U is in kT, r in Å)

3
3.5
4
4.5
5
-0.5
0
0.5
1
1.5
2
2.5
r
U(r)
A
kT
3
,
4
1




14

Because the polarization interaction is many
-
body and requires iterations,

it has been neglected in current generations of force fields.

The non
-
bonded interactions are thus represented by the Coulomb and

12
-
6 LJ potentials.

Model

R
O
-
H

(Å)


q
HOH

q
H

(e)



(kT)



(Å)

m

(D)

eps (T=298)


SPC


1.0




109.5

0.410

0.262

3.166

2.27

65
±
5


TIP3P


0.957



104.5

0.417

0.257

3.151

2.35

97
±
7


Exp.












1.86

80


SPC: simple point charge

TIP3P: transferable intermolecular potential with 3 points

Popular water models (rigid)

15

Covalent bonds

In molecules, atoms are bonded together with covalent bonds, which
arise from sharing of electrons in partially occupied orbitals. If the
bonds are very strong, the molecule can be treated as rigid as in
water molecule. In most large molecules, however, the structure is
quite flexible and this must be taken into account for a proper
description of the molecules. This is literally done by replacing the
bonds by springs. The nearest neighbour interactions involving 2, 3
and 4 atoms are described by harmonic and periodic potentials









0
2
0
2
0
cos
1
2
2


q
q
q










ijkl
ijkl
ijkl
ijk
ijk
ijk
ij
ij
r
ij
bond
n
V
k
r
r
k
U
bond stretching


bending



torsion (not very good)

16

Interaction of two H atoms in the ground (1s) state can be described

using Linear Combinations of Atomic Orbitals (LCAO)

)
1
(
)
1
(
2
1
s
c
s
c
B
A







)
1
(
)
1
(
2
1
s
s
B
A






From symmetry, two solutions with lowest and highest energies are:


+ Symmetric

-

Anti
-
symmetric


17

The R dependence of the potential energy is approximately given by
the Morse potential



2
)
(
0
1
R
R
e
e
D
U





Morse
2
0
)
(
2
1
R
R
k
U
bond


where

D
e
: dissociation energy

R
0
: equilibrium bond distance


Controls the width of the potential

Classical representation:

e
D
k
2
/


k

18

An in
-
depth look at the force fields

Three force fields, constructed in the eighties, have come to dominate


the MD simulations

1.
CHARMM (Karplus @ Harvard)


Optimized for proteins, works well also for lipids and nucleic acids

2.
AMBER (Kollman & Case @ UCSF)


Optimized for nucleic acids, otherwise quite similar to CHARMM

3. GROMOS (Berendsen & van Gunsteren @ Groningen)


Optimized for lipids and does not work very well for proteins


(due to smaller partial charges in the carbonyl and amide groups)

The first two use the TIP3P water model and the last one, SPC model.

They all ignore the polarization interaction (polarizable versions are
under construction)

19

Charm parameters for alanine






Partial charge (e)


ATOM N
-
0.47



|

ATOM HN 0.31 HN

N

ATOM CA 0.07



| HB1

ATOM HA 0.09



|


/

GROUP


HA

CA

CB

HB2

ATOM CB
-
0.27



|


\

ATOM HB1 0.09



| HB3

ATOM HB2 0.09



O==C

ATOM HB3 0.09



|

ATOM C 0.51

ATOM O
-
0.51

Total charge: 0.00

20

Bond lengths :





k
r

(kcal/mol/Å
2
)


r
0
(Å)

N

CA

320.


1.430

(1)

CA

C

250.


1.490

(2)

C

N

370.


1.345

(2)

(peptide bond)


O

C

620.


1.230

(2)

(double bond)

N

H

440.


0.997

(2)

HA

CA

330.


1.080

(2)

CB

CA

222.


1.538

(3)

HB

CB

322.


1.111

(3)

1.
NMA (N methyl acetamide) vibrational spectra

2.
Alanine dipeptide ab initio calculations

3.
From alkanes

2
0
)
(
r
r
k
U
r


bond
21

Bond angles :





k
q

(kcal/mol/rad
2
)

q
0
(deg)

C

N CA

50.


120.

(1)

C

N H

34.


123.

(1)

H

N CA

35.


117.

(1)

N

CA C

50.


107.

(2)

N

CA CB

70.


113.5

(2)

N

CA HA

48.


108.

(2)

HA CA CB

35.


111.

(2)

HA CA C

50.


109.5

(2)

CB

CA C

52.


108.

(2)

N

C CA

80.


116.5

(1)

O

C CA

80.


121.

(2)

O

C N

80.


122.5

(1)

2
0
)
(
q
q
q


k
U
angle
Total 360 deg.

Total 360 deg.

22

Basic dihedral configurations

trans





cis

Definition of the dihedral

angle for 4 atoms A
-
B
-
C
-
D



)
cos(
1
2
0





n
V
U
n
dihed
Dihedrals:

23

Dihedral parameters:








V
n

(kcal/mol)


n


0
(deg)


Name


C N CA C

0.2

1

180.







N CA C N

0.6

1 0.








CA C N CA

1.6

1 0.




w


H N CA CB

0.0

1 0.


H N CA HA

0.0

1 0.

C N CA CB

1.8

1 0.

CA C N H

2.5

2

180.

O C N H

2.5

2

180.


O C N CA

2.5

2

180.



)
cos(
1
2
0





n
V
U
n
dihed
24

Boundaries

In macroscopic systems, the effect of boundaries on the

dynamics of biomolecules is minimal. In MD simulations,

however, the system size is much smaller and one has to

worry about the boundary effects.


Using nothing is not realistic for bulk simulations.


Minimum requirement: water beyond the simulation box must be
treated using a continuum representation


Most common solution: periodic boundary conditions.


The simulation box is replicated in all directions just like in a
crystal. The cube and rectangular prism are the obvious choices
for a box shape.

25

Periodic boundary conditions in two dimensions: 8 nearest neighbours

Particles in the box freely move to the next box, which means when
one moves out another appears from the opposite side of the same
box. In 3
-
D, there are 26 nearest neighbours.

26

Treatment of long
-
range interactions

Problem: the number of non
-
bonded interactions grows as N
2

where N is
the number of particles. This is the main computational bottle neck
that limits the system size.

Because the Coulomb potential between two unit charges, has long range
[U=560 kT/r (Å)], use of any cutoff is problematic and should be
avoided.

All the modern MD codes use the Ewald sum method to treat the long
-
range interactions without cutoffs. Here
one replaces the point
charges with Gaussian distributions, which leads to a much faster
convergence of the sum in the reciprocal (Fourier) space.

The remaining part (point particle


Gaussian) falls exponentially in real
space, hence can be computed easily.


27

Molecular mechanics

Molecular mechanics deals with the static features of

biomolecular systems at T=0 K, that is, the energy is given

solely by the potential energy of the system.

Two important applications of molecular mechanics are:

1.
Energy minimization (geometry optimization):


Find the coordinates for the configuration that minimizes the
potential energy.

2.
Normal mode analysis:


Find the vibrational excitations of the system built on the absolute
minimum using the harmonic approximation.

28

Molecular dynamics

In MD simulations, one follows the trajectories of N particles

according to Newton’s equation of motion:



where U(
r
1
,…,
r
N
) is the potential energy function

consisting of bonded and non
-
bonded interactions.

We need to consider:


Integration algorithms, e.g., Verlet, Leap
-
frog


Initial conditions and choice of the time step


MD simulations at constant temperature and pressure


Constrained dynamics for rigid molecules (SHAKE)

)
,
,
(
,
2
2
N
i
i
i
i
i
i
U
dt
d
m
r
r
F
F
r




29

2
2
)
(
)
(
)
(
)
(
t
m
t
t
t
t
t
t
i
i
i
i
i







F
v
r
r
2
2
)
(
)
(
)
(
)
(
)
(
)
(
)
(
t
m
t
t
t
t
t
t
t
m
t
t
t
t
i
i
i
i
i
i
i
i
i












F
v
r
r
F
v
v
Integration algorithms

Given the position and velocities of N particles at time t, integration
of Newton’s equation of motion yields at t+

t

In the popular Verlet algorithm, one eliminates velocities using the
positions at t
-

t,

Adding eq. (*) yields:

2
)
(
)
(
)
(
2
)
(
t
m
t
t
t
t
t
t
i
i
i
i
i








F
r
r
r
(*)

30

t
t
t
t
t
t
t
m
t
t
t
t
t
t
t
i
i
i
i
i
i
i
i
















)
2
/
(
)
(
)
(
)
(
)
2
/
(
)
(
)
(
2
v
r
r
F
v
r
r
t
t
t
t
t
t
t
m
t
t
t
t
t
i
i
i
i
i
i
i














)
2
/
(
)
(
)
(
)
(
)
2
/
(
)
2
/
(
v
r
r
F
v
v
In the Leap
-
frog algorithm, the positions and velocities are
calculated at different times separated by

t/2

To show its equivalence to the Verlet algorithm, consider

Subtracting the two equations yields the Verlet result.

If required, velocity

at t is obtained from:

)]
2
/
(
)
2
/
(
[
2
1
)
(
t
t
t
t
t
i
i
i






v
v
v
31

To iterate these equations, we need to specify the initial conditions.


The initial configuration of biomolecules can be taken from the
Protein Data Bank (PDB) (if available).


In addition, membrane proteins need to be embedded in a lipid
bilayer.


All the MD codes have facilities to hydrate a biomolecule, i.e., fill
the void in the simulation box with water molecules at the correct
density.


Ions can be added at random positions.

After energy minimization, these coordinates provide the positions at
t=0.

Initial velocities are sampled from a Maxwell
-
Boltzmann distribution:



kT
v
m
kT
m
v
P
ix
i
i
ix
2
exp
2
)
(
2
2
/
1









32

MD simulations at constant temperature and pressure

MD simulations can be performed in the NVE ensemble, where all 3
quantities are constant. Due to truncation errors, keeping the energy
constant in long simulations can be problematic. To avoid this
problem, the alternative NVT and NPT ensembles are employed. The
temperature of the system can be obtained from the average K. E.

Thus an obvious way to keep the temperature constant at T
0

is to
scale the velocities as:

)
(
),
(
)
(
0
t
T
T
t
v
t
v
i
i




NkT
K
2
3

Because K. E. has considerable fluctuations, this is a rather crude
method.

33

A better method which achieves the same result more smoothly is
the Berendsen method, where the atoms are weakly coupled an
external heat bath with the desired temperature T
0

If T(t) > T
0

, the coupling term is negative, which invokes a viscous
force slowing the velocity, and vice
-
versa for T(t) < T
0


Similarly in the NPT ensemble, the pressure can be kept constant
by simply scaling the volume. Again a better method (Langevin
piston), is to weakly couple the pressure difference to atoms using a
force as in above, which will maintain the pressure at the desired
value (~1 atm).

dt
d
t
T
T
m
dt
d
m
i
i
i
i
i
i
r
F
r









1
)
(
0
2
2

34

Analysis of trajectory data

1. Fundamental quantities


Total energy, temperature, pressure, volume (or density)

2. Structural quantities


Root Mean Square Deviation (RMSD)


Distribution functions (e.g., pair and radial)


Conformational analysis (e.g., Ramachandran plots, rotamers)

3. Dynamical quantities


Time correlation functions & transport coefficients


Free energy calculations using perturbation, umbrella sampling and
steered MD methods

35

Fundamental quantities


Total energy
: strictly conserved but due to numerical errors it may
drift.


Temperature
: would be constant in a macroscopic system
(fluctuations are proportional to 1/

N, hence negligible). But in a
small system,
they will fluctuate around a base line. Temperature
coupling ensures that the base line does not drift from the set
temperature.


Pressure
: similar to temperature, though fluctuations are much
larger compared to the set value of 1 atm.


Volume

(or density): fixed in the NVT ensemble but varies in NPT.
Therefore, it needs to be monitored during the initial equilibration
to make sure that the system has converged to the correct volume.
Relatively quick for globular proteins but may take a long time (~1
ns) for systems involving lipids (membrane proteins).

36

Root Mean Square Deviation

For an N
-
atom system, variance and RMSD at time t are defined as

Where
r
i
(0) are the reference coordinates. Usually they are taken
from the first frame in an MD simulation, though they can also be taken
from the PDB or a target structure. Because side chains in proteins are
different, it is common to use a restricted set of atoms such as
backbone or C

. RMSD is very useful in monitoring the approach to
equilibrium (typically, signalled by the appearance of a broad shoulder).

It is a good practice to keep monitoring RMSD during production runs

to ensure that the system stays near equilibrium.










RMSD
,
)
0
(
)
(
1
)
(
1
2
N
i
i
i
t
N
t
r
r
37

Evolution of the RMSD for the backbone atoms of ubiquitin

38



r
r
r
N
r
g
water
ion




2
4
)
(
)
(
Distribution functions

Pair distribution function g
ij
(r) gives the probability of finding a pair
of atoms (i, j) at a distance r. It is obtained by sampling the distance
r
ij

in MD simulations and placing each distance in an appropriate bin,
[r, r+

r]. Pair distribution functions are used in characterizing
correlations between pair of atoms, e.g., hydrogen bonds, cation
-
carbonyl oxygen. The peak gives the average distance and the width,
the strength of the interaction.

When the distribution is isotropic (e.g., ion
-
water in an electrolyte),
one samples all the atoms in the spherical shell, [r, r+

r]. Thus to
obtain the radial distribution function (RDF), the sampled number
needs to be normalized by the volume ~4

r
2

r (as well as density)

39

Conformational analysis

In proteins, the bond lengths and bond angles are more or less fixed.

Thus we are mainly interested in conformations of the torsional
angles. As the shape of a protein is determined by the backbone
atoms, the torsional angles,

(N−C

) and



(C−C

), are of particular
interest. These are conveniently analyzed using the Ramachandran
plots. Conformational changes in a protein during MD simulations can
be most easily revealed by plotting these torsional angles as a
function of time.

Also of interest are the torsional angles of the side chains,
c
1
, c
2
,
etc.

Each side chain has several energy minima (called rotamers), which
are separated by low
-
energy barriers (~ kT). Thus transitions
between different rotamers is readily achievable, and they could play
functional roles, especially for charged side chains.

40

Time correlation functions & transport coefficients

At room temperature, all the atoms in the simulation system are in a
constant motion characterized by their average kinetic energy:
(3/2)kT.
Free atoms or molecules diffuse according to the equation

Dt
t
2
6
)
(

r
dt
t
D




0
)
(
)
0
(
3
1



v
v
While this relationship can be used to determine the diffusion
coefficient, more robust results can be obtained using correlation
functions, e.g., D is related to the velocity autocorrelation function as

Similarly, conductance of charged particles is given by the Kubo formula

i
i
i
q
dt
t
VkT







v
J
J
J






,
)
(
)
0
(
3
1
0
(current)

41

2
)
0
(
)
(
)
0
(
)
(









t
t
C
Bound atoms or groups of atoms fluctuate around a mean value.

Most of the high
-
frequency fluctuations (e.g. H atoms), do not directly

contribute to the protein function. Nevertheless they serve as
“lubricant” that enables large scale motions in proteins (e.g. domain
motions) that do play a significant role in their function. As mentioned
earlier, large scale motions occur in a ~ ms to s time scale, hence are
beyond the present MD simulations. (Normal mode analysis provides an
alternative.) But torsional fluctuations occur in the ns time domain, and
can be studied in MD simulations using time correlation functions, e.g.

These typically decay exponentially and such fluctuations can be

described using the Langevin equation.

42

Free energy calculations

Free energy is the most important quantity that characterizes a
dynamical process. Calculation of the absolute free energies is difficult
in MD simulations. However free energy differences can be estimated
more easily and several methods have been developed for this purpose.
The starting point for most approaches is Zwanzig’s perturbation
formula for the free energy difference between two states A and B:






If the two states are not similar enough, there is a large hysteresis
effect and the forward and backward results are not equal.

)
(
)
(
/
)]
(
exp[
ln
)
(
/
)]
(
exp[
ln
)
(
A
B
G
B
A
G
kT
H
H
kT
A
B
G
kT
H
H
kT
B
A
G
B
B
A
A
A
B


















43

Alchemical transformation and free energy perturbation

To obtain accurate results with the perturbation formula, the energy
difference between the states should be < 2 kT, which is not satisfied
for most biomolecular processes. To deal with this problem, one
introduces a hybrid Hamiltonian



and performs the transformation from A to B gradually by changing
the parameter

from 0 to 1 in small steps. That is, one divides [0,1]
into n subintervals with {

i
,
i = 0
,
n}, and for each

i

value, calculates
the free energy difference from the ensemble average


B
A
H
H
H






)
1
(
)
(
i
kT
H
H
kT
G
i
i
i
i





/
))]
(
)
(
(
exp[
ln
)
(
1
1








44









1
0
1
)
(
n
i
i
i
G
G


The total free energy change is then obtained by summing the
contributions from each subinterval

The number of subintervals is chosen such that the free energy
change at each step is < 2 kT, otherwise the method may lose its
validity.






1
0
)
(



d
H
G
Thermodynamic integration

: Another way to obtain the free energy

difference is to integrate the derivative of the hybrid Hamiltonian:

This integral is evaluated most efficiently using a Gaussian quadrature.

45

Absolute free energies from umbrella sampling

Here one samples the densities along a reaction coordinate and
determines the potential of mean force (PMF) from the Boltzmann eq.














)
(
)
(
ln
)
(
)
(
)
(
)
(
0
0
/
)]
(
)
(
[
0
0
z
z
kT
z
W
z
W
e
z
z
kT
z
W
z
W




Here z
0

is a reference point, e.g. a point in bulk where W vanishes.

In general, a particle cannot be adequately sampled at high
-
energy

points. To counter that, one introduces harmonic potentials, which
restrain the particle at desired points, and then unbias its effect. For
convenience, one introduces umbrella potential at regular intervals along
the reaction coordinate (e.g. ~0.5
Å
). The PMF’s obtained in each
interval are unbiased and optimally combined using the Weighted
Histogram Analysis Method (WHAM).

46

Steered MD (SMD) simulations and Jarzynski’s equation

Steered MD is a more recent method where a harmonic force is
applied to an atom on a peptide and the reference point of this force
is pulled with a constant velocity. It has been used to study unfolding
of proteins and binding of ligands. The discovery of
Jarzynski’s
equation in 1997 enabled determination of PMF from SMD, which has
boosted its applications.

)]
(
[
,
.
0
/
/
t
k
W
e
e
f
i
kT
W
kT
F
v
r
r
F
ds
F









Jarzynski’s equation:


Work done by

the harmonic force

This method seems to work well in simple systems and when

F is large


but beware of its applications in complex systems!

47

A critical look at simulation methods

Molecular dynamics is the “standard model” of biomolecules.


A higher level (quantum) description is simply not feasible for any
biomolecular system.


A lower level description sacrifices the atomic detail and cannot
be expected to succeed without guidance from MD.

Nevertheless, MD simulations alone cannot provide a complete

description of biomolecules, and hence we need to appeal to both

higher and lower level theories to make progress.


Quantum description (in a mixed QM/MM scheme) is required for

1.
Development and testing of force fields

2.
Enzyme reactions that involve making or breaking of bonds

3.
Transport of light particles (e.g., protons and electrons)

48


Stochastic description and coarse
-
graining are required for any
process that is beyond the current capabilities of MD simulations,
that is, more than 10
5
-
6

atoms and longer than microseconds.

1.
Protein
-
protein interactions, protein aggregation (water in
continuum, protein may be rigid, coarse
-
grained, semi
-
flexible or
flexible)

2.
Aggregation of membrane proteins (water and lipid in continuum,
protein as above)

3.
Ion channels and transporters: modelling of ion permeation (water
and lipid in continuum, protein may be rigid or semi
-
flexible)

4.
Protein folding (water in continuum, protein coarse
-
grained)

5.
Self assembly of lipid bilayers and micelles (water and lipid
molecules coarse
-
grained)

6.
DNA condensation (water in continuum, DNA coarse
-
grained)

49

The guiding principle:
Occam’s razor

Use the simplest method that yields the answer, provided:


The method is justified (domain of validity)


The parameters employed can be derived from a more fundamental
theory.

Most of the people who use molecular dynamics or other simulation

methods treat them as black boxes and apply them to biomolecules

without worrying too much whether they are valid or relevant.

You also need to watch out for experts who publish only the good

results!

A cautionary tale
: Continuum description of ion channels using the

Poisson
-
Nernst
-
Planck (drift
-
diffusion) equations.

50

Here,

: potential, r: charge density,
J
: flux,
n
: concentration of ions

PNP equations are solved simultaneously, yielding potential and flux

Assumption
: Ions are represented by an average charge density,
z
v
en
v


which invokes a mean
-
field approximation.

This is alright in a bulk electrolyte, but in ion channels with radius

smaller than the Debye length (8 Å for 0.15 M), one should worry about

the validity of PNP equations.

Poisson
-
Nernst
-
Planck (PNP) Equations



































kT
en
z
n
D
J
en
z
ext
0
Poisson equation



Nernst
-
Planck equation

51

+

+

+

+

+

+

+



= 2



= 80

Induced charges

at the water
-
protein interface

Image force on an ion

In application of PNP to Ca channels, diffusion coeff. is reduced by 10
-
4
!

The problem
: Ions induce like
-
charges at a water
-
protein interface, which

repel them. When ions are represented as a continuous charge density,

this image force (or dielectric self
-
energy) is severely underestimated.

water


protein

52

A simple test: PNP vs Brownian dynamics (BD)

Control study:

Set artificially
ε

= 80

in the protein. No induced charges on the


boundary, hence no discrepancy between the two methods

r = 4
Å

Na
+

Cl
-

53

Realistic case:

In the realistic case (
ε

= 2

in the protein), ions do not enter the
channel in BD due to the dielectric self
-
energy barrier.

Only in large pores (r > 10 Å), validity of PNP is restored.

r = 4
Å

PNP

BD

54