Molecular Modeling Methods & Ab Initio Protein Structure Prediction

By Haiyan Jiang

Oct. 16, 2006


About me

2003, Ph.D. in Computational Chemistry, University of Science and Technology of China

Research: New algorithms in molecular structure optimization

2004~2006, Postdoc, Computational Biology, Dalhousie University

Research: Protein loop structure and the evolution of protein domains

Publications

Haiyan Jiang, Christian Blouin, Ab Initio Construction of All-atom Loop Conformations, Journal of Molecular Modeling, 2006, 12, 221-228.

Ferhan Siddiqi, Jennifer R. Bourque, Haiyan Jiang, Marieke Gardner, Martin St. Maurice, Christian Blouin, and Stephen L. Bearne, Perturbing the Hydrophobic Pocket of Mandelate Racemase to Probe Phenyl Motion During Catalysis, Biochemistry, 2005, 44, 9013-9021. (Responsible for building the simulation model and performing molecular dynamics study)

Yuhong Xiang, Haiyan Jiang, Wensheng Cai, and Xueguang Shao, An Efficient Method Based on Lattice Construction and the Genetic Algorithm for Optimization of Large Lennard-Jones Clusters, Journal of Physical Chemistry A, 2004, 108, 3586-3592.

Xueguang Shao, Haiyan Jiang, Wensheng Cai, Parallel Random Tunneling Algorithm for Structural Optimization of Lennard-Jones Clusters up to N=330, Journal of Chemical Information and Computer Sciences, 2004, 44, 193-199.

Publications

Haiyan Jiang, Wensheng Cai, Xueguang Shao, New Lowest Energy Sequence of Marks’ Decahedral Lennard-Jones Clusters Containing up to 10000 atoms, Journal of Physical Chemistry A, 2003, 107, 4238-4243.

Wensheng Cai, Haiyan Jiang, Xueguang Shao, Global Optimization of Lennard-Jones Clusters by a Parallel Fast Annealing Evolutionary Algorithm, Journal of Chemical Information and Computer Sciences, 2002, 42, 1099-1103.

Haiyan Jiang, Wensheng Cai, Xueguang Shao, A Random Tunneling Algorithm for Structural Optimization Problem, Physical Chemistry and Chemical Physics, 2002, 4, 4782-4788.

Xueguang Shao, Haiyan Jiang, Wensheng Cai, Advances in Biomolecular Computing, Progress in Chemistry (Chinese), 2002, 14, 37-46.

Haiyan Jiang, Longjiu Cheng, Wensheng Cai, Xueguang Shao, The Geometry Optimization of Argon Atom Clusters Using a Parallel Genetic Algorithm, Computers and Applied Chemistry (Chinese), 2002, 19, 9-12.

Unpublished work

Haiyan Jiang, Christian Blouin, The Emergence of Protein Novel Fold and Insertions: A Large Scale Structure-based Phylogenetic Study of Insertions in SCOP Families, Protein Science, 2006. (under review)

Contents

Molecular modeling methods and applications in ab initio protein structure prediction

Potential energy function

Energy Minimization

Monte Carlo

Molecular Dynamics

Ab initio protein loop modeling

Challenge

Recent progress

CLOOP



Molecular Modeling Methods

Molecular modeling methods are the theoretical methods and computational techniques used to simulate the behavior of molecules and molecular systems.

Molecular Forcefields

Conformational Search methods

Energy Minimization

Molecular Dynamics

Monte Carlo simulation

Genetic Algorithm


Ab Initio Protein Structure Prediction

Ab initio protein structure prediction methods build protein 3D structures from sequence based on physical principles.

Importance

The ab initio methods are important even though they are computationally demanding.

Ab initio methods predict protein structure from physical models, so they are an indispensable complement to knowledge-based approaches.

E.g., knowledge-based approaches fail under the following conditions:

No structural homologues are available

The protein may adopt a new, undiscovered fold

Applications of MM (molecular modeling) in Ab Initio PSP (protein structure prediction)

Basic idea

Anfinsen’s theory: The native structure of a protein corresponds to the state with the lowest free energy of the protein-solvent system.

General procedures

Potential function

Evaluate the energy of a protein conformation

Select the native structure

Conformational search algorithm

Produce new conformations

Search the potential energy surface and locate the global minimum (native conformation)

Protein Folding Funnel

Local minima

Global minimum

Native Structure


Potential Functions for PSP

Potential function

Physics-based energy function

Empirical all-atom forcefields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS

Parameterization: quantum mechanical calculations, experimental data

Simplified potential: UNRES (united residue)

Solvation energy

Implicit solvation model: Generalized Born (GB) model, surface-area-based model

Explicit solvation model: TIP3P (computationally expensive)

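General Form of All-atom Forcefields

$$V_{total} = \sum_{bonds} K_b (r - r_0)^2 + \sum_{angles} K_\theta (\theta - \theta_0)^2 + \sum_{dihedrals} K_\phi \left[1 + \cos(n\phi - \delta)\right] + \sum_{Hbonds} \left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right] + \sum_{pairs\ i,j} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] + \sum_{pairs\ i,j} \frac{q_i q_j}{\varepsilon r_{ij}}$$

Terms, from left to right: bond stretching term, angle bending term, dihedral term, H-bonding term, van der Waals term, electrostatic term.

[Figure: diagrams of the individual terms, labelled with the bond length r, the bond angle Θ, the dihedral angle Φ, an O...H hydrogen bond, and the interatomic distance r.]

The non-bonded pair terms (van der Waals and electrostatic) are the most time-demanding part.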
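As a rough illustration of why the pairwise non-bonded sums dominate the cost, the van der Waals and electrostatic terms can be sketched in Python as a loop over all atom pairs, whose count grows as N(N-1)/2. This is only a minimal sketch, not any package's implementation: the function name `nonbonded_energy`, the single `A` and `B` coefficients shared by every pair, the `dielectric` argument, and the reduced units (no Coulomb conversion constant) are assumptions made for illustration.

```python
import math
from itertools import combinations


def nonbonded_energy(coords, charges, A, B, dielectric=1.0):
    """Sum the 12-6 van der Waals and Coulomb terms over all atom pairs.

    coords  : list of (x, y, z) tuples
    charges : list of partial charges q_i
    A, B    : one repulsion/attraction coefficient pair used for every atom pair
              (a simplification; real forcefields tabulate A_ij, B_ij per atom type)
    """
    e_vdw = 0.0
    e_elec = 0.0
    # N atoms give N*(N-1)/2 pairs, which is why this part is the bottleneck.
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        e_vdw += A / r**12 - B / r**6
        e_elec += charges[i] * charges[j] / (dielectric * r)
    return e_vdw, e_elec


# Hypothetical three-atom system in arbitrary units
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
charges = [1.0, -1.0, 0.5]
e_vdw, e_elec = nonbonded_energy(coords, charges, A=1.0, B=2.0)
print(e_vdw, e_elec)
```

Production codes avoid the full double loop with distance cutoffs and neighbor lists; that refinement is left out here to keep the sketch short.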


Search Potential Energy Surface

We are interested in the minimum points on the potential energy surface (PES).

Conformational search techniques

Energy Minimization

Monte Carlo

Molecular Dynamics

Others: Genetic Algorithm, Simulated Annealing

Energy Minimization

Energy minimization

Methods

First-order minimization: steepest descent, conjugate gradient minimization

Second-derivative methods: Newton-Raphson method

Quasi-Newton methods: L-BFGS

Local minimum
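To make the point about local minima concrete, here is a minimal Python sketch of steepest descent on a toy one-dimensional double-well surface. The function `energy`, its analytic `gradient`, the fixed step size, and the starting points are all hypothetical choices for illustration; they are not taken from the slides.

```python
def energy(x):
    """Toy 1-D double-well 'potential energy surface' (hypothetical)."""
    return x**4 - 4.0 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy energy."""
    return 4.0 * x**3 - 8.0 * x + 1.0


def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Walk downhill along the negative gradient until it nearly vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


# Two starting points end in two different minima.
left = steepest_descent(-2.0)
right = steepest_descent(2.0)
print(left, energy(left))
print(right, energy(right))
```

The two runs settle in different wells, and only one of them is the global minimum. A minimizer alone therefore cannot locate the native conformation; it has to be combined with a search method.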


Monte Carlo

Monte Carlo

In molecular simulations, ‘Monte Carlo’ is an importance sampling technique.

1. Make a random move and produce a new conformation

2. Calculate the energy change ΔE for the new conformation

3. Accept or reject the move based on the Metropolis criterion

$$P = \exp\left(-\frac{\Delta E}{kT}\right)$$ (Boltzmann factor)

If ΔE < 0, then P > 1: accept the new conformation.

Otherwise, accept if P > rand(0,1); else reject.

Monte Carlo

Monte Carlo (MC) algorithm

Generate initial structure R and calculate E(R);

Modify structure R to R’ and calculate E(R’);

Calculate ΔE = E(R’) − E(R);

IF ΔE < 0, then R ← R’;

ELSE

    Generate random number RAND = rand(0,1);

    IF exp(−ΔE/kT) > RAND, then R ← R’;

    ENDIF

ENDIF

Repeat for N steps;

Monte Carlo Minimization (MCM) algorithm

Parallel Replica Exchange Monte Carlo algorithm
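The basic MC loop with the Metropolis criterion can be sketched in Python as follows. This is an illustrative sketch only: the "structure" is reduced to a single coordinate, and the toy energy `double_well`, the move size `max_move`, the value of `kT`, and the fixed random seed are assumptions, not part of the original algorithm description.

```python
import math
import random


def metropolis_mc(energy, x0, n_steps, kT=1.0, max_move=0.5, seed=0):
    """Metropolis Monte Carlo on a 1-D coordinate; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    accepted = 0
    for _ in range(n_steps):
        # Modify structure R to R'
        x_new = x + rng.uniform(-max_move, max_move)
        e_new = energy(x_new)
        delta_e = e_new - e
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-delta_e / kT).
        if delta_e < 0 or math.exp(-delta_e / kT) > rng.random():
            x, e = x_new, e_new
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e, accepted


def double_well(x):
    """Toy energy function with two minima (hypothetical)."""
    return x**4 - 4.0 * x**2 + x


best_x, best_e, accepted = metropolis_mc(double_well, x0=2.0, n_steps=20000)
print(best_x, best_e, accepted)
```

Because uphill moves are sometimes accepted, the walk can cross barriers that would trap a pure minimizer. Raising `kT` makes such crossings more frequent at the cost of a broader sampled distribution.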


Molecular Dynamics

Molecular Dynamics (MD)


MD simulates the movements of all the particles in a molecular system by iteratively solving Newton’s equations of motion.
















MC is like viewing many frozen butterflies in a museum; MD is like watching the butterfly fly.

Molecular Dynamics

Algorithm

For atom i, Newton’s equation of motion is given by

$$\mathbf{F}_i = m_i \mathbf{a}_i \quad (1)$$

$$\mathbf{F}_i(t) = m_i \frac{d^2 \mathbf{r}_i(t)}{dt^2} \quad (2)$$

Here, r_i and m_i represent the position and mass of atom i, and F_i(t) is the force on atom i at time t.

F_i(t) can also be expressed as the negative gradient of the potential energy

$$\mathbf{F}_i = -\nabla_i V \quad (3)$$

V is the potential energy. Newton’s equation of motion can then relate the derivative of the potential energy to the changes in position as a function of time.

$$-\nabla_i V = m_i \frac{d^2 \mathbf{r}_i}{dt^2} \quad (4)$$

Molecular Dynamics

Algorithm (continued)

To obtain the trajectory of each atom, numerous numerical algorithms have been developed for integrating the equations of motion (e.g. the Verlet algorithm and the leap-frog algorithm).

Verlet algorithm

The algorithm uses the positions and accelerations at time t, and the positions from the previous step, r(t − Δt), to calculate the new positions:

$$\mathbf{r}(t + \Delta t) = 2\mathbf{r}(t) - \mathbf{r}(t - \Delta t) + \mathbf{a}(t)\,\Delta t^2$$

Selection of time step

The time step is approximately one order of magnitude smaller than the period of the fastest motion.

Hydrogen vibration ~ 10 fs (1 fs = 10^-15 s); time step = 1 fs
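The Verlet recurrence r(t + Δt) = 2r(t) − r(t − Δt) + a(t)Δt² can be tried out on a one-dimensional harmonic oscillator. The sketch below is illustrative only: the function name `verlet`, the Taylor-expansion bootstrap of the first previous position, and the unit mass and force constant are assumptions chosen so that the exact period is 2π.

```python
import math


def verlet(x0, v0, accel, dt, n_steps):
    """Integrate a 1-D trajectory with the position Verlet recurrence."""
    # Verlet needs two positions to start, so r(t - dt) is estimated
    # from a Taylor expansion around the initial state.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt**2
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt**2
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory


# Harmonic oscillator with unit mass and force constant: a(x) = -x
dt = 0.01
n_steps = round(2 * math.pi / dt)  # roughly one full period
traj = verlet(x0=1.0, v0=0.0, accel=lambda x: -x, dt=dt, n_steps=n_steps)
print(traj[-1])  # close to the starting position after one period
```

The time step here is about 1/600 of the period, well below the "one order of magnitude" guideline. Increasing `dt` toward the period makes the trajectory drift and eventually become unstable, which is the reason for the time-step rule.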

Molecular Dynamics

MD Software

CHARMM (Chemistry at HARvard Molecular Mechanics) is a program for macromolecular simulations, including energy minimization, molecular dynamics and Monte Carlo simulations.

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.

http://www.ks.uiuc.edu/Research/namd/

Application in PSP

Advantage: Deterministic; provides details of the folding process

Limitation: Protein folding reactions take place on the μs time scale, which is at the limit of accessible simulation times

It is still difficult to simulate the whole process of protein folding using the conventional MD method.
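A quick back-of-the-envelope calculation shows why long time scales are hard to reach. The sketch below counts the integration steps needed for a given simulated time at a 1 fs time step; the helper name `md_steps` and the choice of target durations are illustrative assumptions.

```python
def md_steps(log10_seconds, timestep_fs=1):
    """Integration steps needed to simulate 10**log10_seconds seconds."""
    # 1 s = 10**15 fs, so the duration in fs is 10**(15 + log10_seconds).
    duration_fs = 10 ** (15 + log10_seconds)
    return duration_fs // timestep_fs


for label, exponent in [("1 ns", -9), ("1 us", -6), ("1 ms", -3)]:
    print(label, md_steps(exponent), "steps")
```

Each step requires a full force evaluation over the system, so going from nanoseconds to microseconds or milliseconds multiplies the cost by a factor of a thousand to a million.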


Time Scales of Protein Motions and MD

[Figure: time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, showing the MD time scale alongside bond stretching, elastic vibrations of proteins, α-helix folding, β-hairpin folding, and protein folding.]

MD is fun!


A small protein folding movie, simulated with NAMD/VMD

Other Conformational Search Algorithms

Global optimization algorithms



“Optimization” here refers to finding the global energy minimum of a potential energy surface.

Genetic Algorithm (GA)

Simulated Annealing (SA)

Tabu Search (TS)

Ant Colony Optimization (ACO)

A model system: Lennard-Jones clusters
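As an illustration of the model system, the sketch below evaluates the Lennard-Jones energy of a small cluster in reduced units and runs a short simulated annealing search on four atoms. It is a toy under stated assumptions: the geometric cooling schedule, the move size, the starting geometry, and the fixed seed are hypothetical choices, and this is not one of the algorithms from the publications listed earlier.

```python
import math
import random
from itertools import combinations


def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy: sum of 4*eps*[(s/r)^12 - (s/r)^6] over pairs."""
    total = 0.0
    for a, b in combinations(coords, 2):
        sr6 = (sigma / math.dist(a, b)) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


def simulated_annealing(coords, n_steps=5000, t_start=0.5, t_end=0.01,
                        max_move=0.2, seed=0):
    """Metropolis moves of single atoms while the temperature is lowered."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    e = lj_energy(coords)
    best, best_e = list(coords), e
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        i = rng.randrange(len(coords))
        trial = list(coords)
        trial[i] = tuple(x + rng.uniform(-max_move, max_move) for x in coords[i])
        e_new = lj_energy(trial)
        delta = e_new - e
        if delta < 0 or math.exp(-delta / t) > rng.random():
            coords, e = trial, e_new
            if e < best_e:
                best, best_e = list(coords), e
    return best, best_e


start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
best, best_e = simulated_annealing(start)
print(lj_energy(start), best_e)
```

For four atoms the global minimum is a regular tetrahedron with energy -6 in these units, so the printed value can be compared against that bound. For clusters of hundreds of atoms the number of local minima grows very rapidly, which is what makes them a demanding benchmark for global optimization.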


Applications of MM methods in PSP

Application in PSP

Combination of several conformational search techniques

Recent developments

Simplified force field: united residue force field

Segment assembly

Secondary structure prediction is quite reliable, so conformations can be produced by assembling the segments

Ab initio PSP software

Rosetta is a five-stage fragment insertion Metropolis Monte Carlo method

ASTRO-FOLD is a combination of the deterministic αBB global optimization algorithm and a molecular dynamics approach in torsion angle space

LINUS uses a Metropolis Monte Carlo algorithm and a simplified physics-based force field

ASTRO-FOLD

References

Hardin C, et al. Ab initio protein structure prediction. Curr Opin Struct Biol. 2002, 12(2):176-81.

Floudas CA, et al. Advances in protein structure prediction and de novo protein design: A review. Chemical Engineering Science, 2006, 61: 966-988.

Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three dimensional structures of proteins from the amino acid sequence. Biophysical Journal, 2003, 85: 2119-2146.

Ab Initio Protein Loop Prediction

Protein loop

Protein loops are polypeptides connecting more rigid structural elements of proteins like helices and strands.

Challenge in Loop Structure Prediction

Loops are important to protein folding and protein function even though they are small, usually <20 residues

Loops exhibit greater structural variability than helices and strands

Loop prediction is often a limiting factor for fold recognition methods

Ab Initio Protein Loop Prediction

Ab initio methods have recently received increased attention in the prediction of protein loops

Potential energy function

A molecular mechanics force field is usually better than a statistical potential in protein loop modeling.

Recent progress

Dihedral angle sampling

Clustering

Select representative structures from ensembles

Ab Initio Loop Prediction Methods

Loopy

Random tweak

Colony energy

Fiser’s method

MM methods:

Physical energy function

Energy Minimization + MD + SA

Forrest & Woolf

Predict membrane protein loops

MM methods: MC + MD

Review:

Floudas C.A. et al., Advances in protein structure prediction and de novo protein design: A review, Chemical Engineering Science, 2006, vol. 61, 966-988.

CLOOP: Ab Initio Loop Modeling Method

CLOOP builds an all-atom ensemble of protein loop conformations (it is not a true protein loop prediction method)

Paper

Haiyan Jiang, Christian Blouin, Ab Initio Construction of All-atom Loop Conformations, Journal of Molecular Modeling, 2006, 12, 221-228.

CLOOP methods

Energy function: CHARMM

Dihedral sampling

Potential smoothing technique

The designed minimization (DM) strategy

Divided loop conformation construction

The Energy Function of CHARMM Forcefield

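CHARMM

$$E_{CHARMm} = E_{bonds} + E_{UB} + E_{angle} + E_{dihe} + E_{imp} + E_{vdw} + E_{elec}$$

$$E_{bonds} = \sum_{bonds} k_b (b - b_0)^2$$

$$E_{UB} = \sum_{UB} k_{UB} (S - S_0)^2$$

$$E_{angle} = \sum_{angle} k_\theta (\theta - \theta_0)^2$$

$$E_{dihe} = \sum_{dihe} k_\phi \left(1 + \cos(n\phi - \delta)\right)$$

$$E_{imp} = \sum_{imp} k_{imp} (\omega - \omega_0)^2$$

$$E_{vdw} = \sum_{nonbond} \varepsilon_{ij} \left[\left(\frac{R_{min,ij}}{r_{ij}}\right)^{12} - 2\left(\frac{R_{min,ij}}{r_{ij}}\right)^{6}\right]$$

$$E_{elec} = \sum_{nonbond} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$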

CLOOP

Dihedral sampling

The loop main-chain dihedrals φ and ψ are generated by sampling main-chain dihedral angles from a restrained φ/ψ set.

The restrained dihedral range has 11 pairs of φ/ψ dihedral sub-ranges. It was obtained by adding a 100 degree variation to each state of the 11-state φ/ψ set developed by Moult and James for loop modeling.

Side chain conformations are built randomly.
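The sampling step might be sketched in Python as below. Two things here are assumptions and should be read as such: the state centers in `EXAMPLE_STATES` are hypothetical placeholders (the slides do not list the 11 published states, so only three made-up centers are used), and the "100 degree variation" is interpreted as a window of total width 100 degrees (plus or minus 50) around each center, which the slides do not spell out.

```python
import random

# HYPOTHETICAL (phi, psi) state centers in degrees, for illustration only.
# These are NOT the 11 states of the published set.
EXAMPLE_STATES = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def sample_backbone(n_residues, states=EXAMPLE_STATES, width=100.0, rng=None):
    """Draw one (phi, psi) pair per residue from windows around the states."""
    rng = rng or random.Random()
    half = width / 2.0  # assumed: the variation is split evenly around each center
    dihedrals = []
    for _ in range(n_residues):
        phi0, psi0 = rng.choice(states)
        phi = wrap(phi0 + rng.uniform(-half, half))
        psi = wrap(psi0 + rng.uniform(-half, half))
        dihedrals.append((phi, psi))
    return dihedrals


loop = sample_backbone(8, rng=random.Random(1))
for phi, psi in loop:
    print(f"{phi:8.1f} {psi:8.1f}")
```

Restricting the draw to windows around a small set of states keeps the initial conformations in plausible regions of φ/ψ space while still giving each residue considerable freedom.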





CLOOP

Potential smoothing technique

A soft core potential provided in the CHARMM software package was applied to smooth the non-bonded interactions:

$$E_{nonbond} = \begin{cases} E^{CHARMM}_{nonbond}, & r \ge r_{soft} \\ k\,(r_{soft} - r) + E^{CHARMM}_{nonbond}, & r < r_{soft} \end{cases}$$

r_soft is the switching distance for the soft core potential

r is the distance between the two interacting atoms
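The effect of the soft core can be seen on a single 12-6 pair term. The following Python sketch is not CHARMM's implementation: the values of `r_soft` and `k` are arbitrary, the 12-6 function stands in for the full non-bonded energy, and the unmodified term is assumed to be evaluated at `r_soft` inside the core so that the two branches join continuously.

```python
def lj(r, eps=1.0, rmin=1.0):
    """12-6 term in the R_min form: eps * [(rmin/r)^12 - 2 (rmin/r)^6]."""
    x6 = (rmin / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


def soft_core(r, r_soft=0.8, k=50.0, potential=lj):
    """Replace the steep repulsive wall inside r_soft by a linear ramp."""
    if r >= r_soft:
        return potential(r)
    # Assumption: the original term is taken at r_soft, so the energy is
    # continuous at the switching distance and stays finite down to r = 0.
    return k * (r_soft - r) + potential(r_soft)


for r in (0.1, 0.5, 0.8, 1.0, 1.5):
    print(r, lj(r), soft_core(r))
```

With the ramp in place, two overlapping atoms feel a bounded, constant push apart instead of an enormous repulsion, so randomly built loops with clashes can still be minimized.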

CLOOP


The designed minimization (DM) strategy

Minimization methods:

Steepest descent, conjugate gradient, and adopted basis Newton-Raphson minimization

Two stages:

1. Minimize the internal energy terms of the loop conformations, including bond, angle, dihedral, and improper terms

2. Further minimize the candidates with the full CHARMM energy function, including the van der Waals and electrostatic energy terms.
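The idea of minimizing in two stages can be illustrated on a toy system. The Python sketch below is not CLOOP or CHARMM: it uses a hypothetical three-atom chain on a line (a fixed anchor at 0 plus two movable atoms), two harmonic bonds as the internal energy, one 12-6 term between the chain ends as the non-bonded energy, and a simple steepest descent with step halving in place of the minimizers named above.

```python
def internal_energy(x):
    """Two harmonic bonds of ideal length 1: anchor(0)-x[0] and x[0]-x[1]."""
    return (abs(x[0]) - 1.0) ** 2 + (abs(x[1] - x[0]) - 1.0) ** 2


def nonbonded_energy(x, rmin=2.5):
    """12-6 term between the anchor at 0 and the last atom."""
    x6 = (rmin / abs(x[1])) ** 6
    return x6 * x6 - 2.0 * x6


def full_energy(x):
    return internal_energy(x) + nonbonded_energy(x)


def numerical_gradient(f, x, h=1e-6):
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def minimize(f, x, n_iter=2000, step=0.1):
    """Steepest descent; the step is halved until the energy decreases."""
    x, e = list(x), f(x)
    for _ in range(n_iter):
        g = numerical_gradient(f, x)
        s = step
        while s > 1e-12:
            trial = [xi - s * gi for xi, gi in zip(x, g)]
            e_trial = f(trial)
            if e_trial < e:
                x, e = trial, e_trial
                break
            s *= 0.5
        else:
            break  # no downhill step found: treat as converged
    return x, e


start = [0.3, 0.4]                                  # badly clashing start
stage1_x, stage1_e = minimize(internal_energy, start)      # stage 1
stage2_x, stage2_e = minimize(full_energy, stage1_x)       # stage 2
print(full_energy(start), full_energy(stage1_x), stage2_e)
```

In this toy, stage 1 repairs the geometry without seeing the huge clash energy of the starting point, and stage 2 then relaxes the remaining non-bonded strain from a reasonable structure.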


CLOOP

Divided loop conformation construction










Generate position of middle residue

Build initial conformation of main chain with dihedral sampling

Build side chain conformation

Run DM and produce closed loop conformation


CLOOP


Performance of CLOOP

CLOOP was applied to construct the conformations of 4-, 8-, and 12-residue loops in Fiser’s loop test set. The average main-chain root mean square deviations (RMSD) obtained in 1000 trials for the 10 different loops of each size are 0.33, 1.27 and 2.77 Å, respectively.

The performance of CLOOP was investigated in two ways: one calculates the loop energy with a buffer region, and the other uses the loop only. The buffer region extends up to 10 Å around the loop atoms. In energy minimization, only the loop atoms were allowed to move, and all non-loop atoms, including those in the buffer region, were fixed.
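For reference, the RMSD between a built loop and the native loop can be computed as in the Python sketch below. It assumes both coordinate sets are already expressed in the same frame and list the same main-chain atoms in the same order; the slides do not state the superposition convention, so no fitting step is included, and the example coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
model = [(0.1, 0.0, 0.0), (1.4, 0.2, 0.0), (3.2, 0.4, 0.1)]
print(round(rmsd(native, model), 3))
```

Averaging this value over the loops of a given length yields the kind of per-size figure quoted above.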






Loop Conformations built by CLOOP

a. 1gpr_123-126

b. 135l_84-91

c. 1pmy_77-88

Performance of CLOOP


Conclusion


CLOOP can be applied to build a good all-atom conformation ensemble of loops with sizes up to 12 residues.

Good efficiency: CLOOP is faster than RAPPER.

The contribution of the protein to which a loop is attached (i.e. the ‘buffer region’) facilitates the discrimination of near-optimal loop structures.

The soft core potentials and the DM strategy are effective techniques in building loop conformations.

Thanks!