Protein structure and folding

baconossifiedMechanics

Oct 29, 2013


H-bonds

D-H … A

Assignment: if D…A < 3.7 Å in crystal structure (normally 2.7-3.1 Å).

Energy of stabilization: -12 to -40 kJ/mol.

Tends to be linear.

Only weakly stabilize proteins. (!?)
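
The distance criterion above can be sketched in a few lines of Python. This is only an illustration: the function name `is_hbond` and the coordinates are made up for the example, and the check uses the donor-acceptor distance alone, ignoring the preference for linear geometry.

```python
import math

HBOND_CUTOFF = 3.7  # Å, donor-acceptor (D…A) cutoff quoted on the slide


def is_hbond(donor_xyz, acceptor_xyz, cutoff=HBOND_CUTOFF):
    """Return True if the donor-acceptor distance (Å) is below the cutoff."""
    return math.dist(donor_xyz, acceptor_xyz) < cutoff


# Hypothetical coordinates (Å): a backbone N as donor, a carbonyl O as acceptor
n_atom = (0.0, 0.0, 0.0)
print(is_hbond(n_atom, (2.9, 0.0, 0.0)))  # 2.9 Å apart, inside the usual 2.7-3.1 Å range
print(is_hbond(n_atom, (4.5, 0.0, 0.0)))  # 4.5 Å apart, beyond the cutoff
```

A stricter assignment would typically add an angle test on D-H…A, since the bond tends to be linear.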

A survey of H-bonds in globular proteins (J. Mol. Biol. (1992) 226, 1143)

Local H-bonds?

The authors made this conclusion: “Most H-bonds are local.”

Should be more critically reviewed.

Most H-bonds are between backbone atoms.




Source: K. Schulten Group, University of Illinois Urbana-Champaign

Protein folding, dynamics and
structural evolution

Chapter 9

Questions


How does a peptide sequence find its
native, functional conformation?


Is there a set of fundamental principles?


We discussed several factors determining protein structure. Can we *predict* the structure from sequence information yet?

Determinants of Protein Folding


We will discuss the following factors, as listed in V&V
chapter 9.


Space Packing.


Directed mainly by internal residues


Protein structures are hierarchically organized.


Protein structures are highly adaptable.


Secondary structure can be context dependent.


Changing the fold of a protein.


Still, keep in mind that most of them are based on observations and deductions.


Exceptions are possible.


Whether the statistical methods/criteria are acceptable is
another question.

Compactness



Proteins are like
liquids and glasses,
instead of crystalline
solids.


The reverse
statement is, does
compactness serve
as a factor
determining protein
structure?



Compactness helps, but not
enough.



Folding is directed mainly by
internal residues


Mutations that change surface residues
are accepted more frequently and are
less likely to affect protein conformations
than are changes of internal residues.


This is consistent with the idea of hydrophobic force-driven folding.

Protein structures are quite
“resistant” to mutations


A large number of single residue
mutations do not yield a very different
structure.


A complete study was done in phage T4
lysozyme by B. W. Matthews.


Homologous proteins share some sequence identity and are often structurally similar.


Voet Biochemistry 3e
© 2004 John Wiley & Sons, Inc.

Table 9-1. Propensities and Classifications of Amino Acid Residues for α Helical and β Sheet Conformations.

Page 300

Secondary structure


Secondary structure prediction can be
done with more sophisticated algorithms.


Artificial intelligence methods such as neural networks or support vector machines.

Basically, they look at a local sequence and recognize its pattern.

Usually such methods need a training set, i.e. they are knowledge-based methods.

Figure 9-6. NMR structure of protein GB1. (Voet, Biochemistry 3e, page 280)


Green: residues 23-33.

Cyan: residues 42-53.

Chm-alpha: a new sequence replaces the green part.

Chm-beta: the same new sequence replaces the cyan part.

Both are structurally similar to native GB1.

The same sequence can be either an alpha helix or a beta sheet structure, depending on its context.

Homology


Proteins that share some sequence
identity may be structurally similar.


One piece of evidence that supports evolution.


Proteins with as little as 20% sequence
identity may have similar structure.


How much should be changed for a
protein to assume a different structure?
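
For reference, sequence identity between two already-aligned sequences can be computed as in the sketch below. The function name and the two short sequences are invented for illustration, and the choice to ignore gapped columns in the denominator is an assumption; other conventions exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Made-up aligned fragments: 7 ungapped columns, 5 of them identical
print(round(percent_identity("MKTAYIAK", "MRTA-IGK"), 1))
```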

Figure 9-7. X-ray structure of Rop protein, a homodimer of αα motifs that associate to form a 4-helix bundle. (Voet, Biochemistry 3e, page 281)


GB1 and Rop are structurally different.

50% of the residues of GB1 are changed, yielding a new polypeptide that assumes a Rop-like structure.

This new peptide has 41% sequence identity with native Rop.

The idea of protein design and engineering.

Protein Folding


Levinthal’s paradox


Suppose each residue has only two degrees of freedom (φ, ψ).

Assume each can have only 3 stable values.

This leads to $3^{2n}$ possible conformations for a protein of $n$ residues.

Suppose a protein can explore $10^{13}$ conformations per second (10 per picosecond).

A random search still requires an astronomical amount of time to fold a protein.

This is impossible. So a protein must fold in a way that does not randomly explore every possible conformation.
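
The arithmetic behind the paradox is easy to reproduce. The sketch below uses the figures from the slide (3 states per angle, 2 angles per residue, 10^13 conformations per second); the chain lengths and the function name are chosen here only for illustration.

```python
SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds


def levinthal_time_years(n_residues, states_per_angle=3,
                         angles_per_residue=2, rate_per_second=1e13):
    """Years needed to visit every conformation once at the given rate."""
    n_conformations = states_per_angle ** (angles_per_residue * n_residues)
    return n_conformations / rate_per_second / SECONDS_PER_YEAR


# Illustrative chain lengths (not taken from the slides)
for n in (10, 50, 100):
    print(f"n = {n:3d}: {levinthal_time_years(n):.2e} years")
```

Even for a modest 100-residue chain the exhaustive search time exceeds the age of the universe by many orders of magnitude, which is the point of the paradox.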

Molten Globule


Much of the secondary structure that is present in a native protein forms within a few milliseconds.

This is called hydrophobic collapse.

The resulting state is called the “molten globule”.

Slightly larger (5-15% in radius) than the native conformation.


Significant amount of secondary structure formed.


Side chains are still not ordered/packed.


Structure fluctuation is much larger. Not very
thermodynamically stable.

Are proteins sticky tapes?


Are they simply heteropolymers that like to form H-bonds and hydrophobic interactions with each other?


Proteins are not just any random heteropolymers


By observation:


every protein has a very stable native structure,


while polymers are usually random in their
conformation.


Interesting observation for simple models: the “designability”. The following material is from:

R. Helling et al., The designability of protein structures, J. Mol. Graphics and Modelling, 19, 157 (2001).

J. Miller et al., Emergence of highly designable protein-backbone conformations in an off-lattice model, Proteins, 47, 506 (2002).

Steven S. Plotkin and Jose N. Onuchic, Understanding protein folding with energy landscape theory Part I: Basic concepts, Quart. Rev. of Biophys., 35, 2 (2002), 111.

A 3D lattice HP model


Enumerate all $2^{27}$ possible sequences.

Each sequence has a lowest-energy structure.

Some sequences share the same structure. Count the number of sequences per unique structure, $N_S$.

Plot the distribution of $N_S$.

Assuming only two kinds of residue, H and P.

Well-studied before.

$E_{HH} = -2.3$, $E_{HP} = -1$, $E_{PP} = 0$
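
The counting procedure can be tried on a much smaller system. The sketch below is a toy version, not the 3×3×3 calculation from the cited work: it uses a 2D 3×3 lattice (9 residues, 2^9 sequences), assumes attractive (negative) H-H and H-P contact energies of -2.3 and -1, and treats two walks as the same structure when they have the same set of non-bonded contacts. All function names are made up for the example.

```python
from collections import Counter
from itertools import product

# Contact energies, assumed attractive (negative) for H-H and H-P
ENERGY = {("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0, ("P", "P"): 0.0}
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def compact_walks(side=3):
    """Yield every self-avoiding walk that fills a side x side square lattice."""
    def extend(path, visited):
        if len(path) == side * side:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < side and 0 <= nxt[1] < side and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(side), repeat=2):
        yield from extend([start], {start})


def contact_map(walk):
    """Pairs of chain positions that are lattice neighbours but not bonded."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(side=3):
    """Count, per structure, the sequences having it as unique ground state."""
    structures = {contact_map(w) for w in compact_walks(side)}
    counts = Counter()
    for seq in product("HP", repeat=side * side):
        energies = [(sum(ENERGY[seq[i], seq[j]] for i, j in s), s)
                    for s in structures]
        lowest = min(e for e, _ in energies)
        ground = [s for e, s in energies if abs(e - lowest) < 1e-9]
        if len(ground) == 1:  # skip sequences with a degenerate ground state
            counts[ground[0]] += 1
    return counts


counts = designability()
for structure, n_s in counts.most_common():
    print(n_s, sorted(structure))
```

Each printed line gives $N_S$ for one structure followed by its contact list; sequences whose lowest energy is shared by several structures are not counted, since they have no unique ground state.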


(a) Histogram of $N_S$ for the 3×3×3 system. (b) Average energy gap between the ground state and the first excited state versus $N_S$ for the 3×3×3 system.

$N_S$: number of sequences corresponding to a structure S

3794 different sequences share ONE structure!

On average, these structures have large energy gaps between the lowest-energy structure and the next lowest one.

Some structures are very “popular” for many different sequences


Such a property does not depend
on the model used


Very similar behavior is seen in the 2D 6×6 HP model and in 2D or 3D models with 20 different amino acids.


Off the lattice: 23-mer, 3-state model

Zinc finger of 1PSV

Off-lattice model: results

a: Backbone configuration of the 11th most designable 23-mer structure

b: Backbone configuration of the zinc finger 1NC8, truncated to 23 amino acids.


What does it mean?

Sequence

Structure

The energy landscape


Proteins are in a special subset of
heteropolymers


Such that the number of possible structures is greatly reduced.

Evolution!

Therefore protein structure prediction is not as hard as it appears (still a hard problem, though).

That also explains why knowledge-based methods work.

Nevertheless, the tools developed offer valuable clues for the structure of a new protein.

Computer Simulation


Goals:

1) Structure prediction: from primary sequences to tertiary structures (so that we can infer function).

2) Known structure (from X-ray, NMR, or another simulation). Want: dynamics (how it moves at room temperature, with a ligand, or with a mutation).

(1) above is difficult but doable. We will discuss some of the methods.

(2) is often done with the same methods developed for (1).

Computers


Deal with numbers and logical operations.

Need some “principles” (written as mathematical equations).

For protein simulations there are different approaches:

1. Physics-based

2. Knowledge-based

Physics

Laws for particle motion and interaction

Classical Mechanics (Newton’s equation of motion)

Quantum Mechanics (Schrödinger’s equation)

Many developments in physical chemistry can be used.

Physics-based protein simulation

An all-quantum-mechanics (QM) calculation of a whole protein is not feasible.

QM can be applied to a small set of atoms.

Modeling of an active site (other atoms: not treated, or treated as a dielectric continuum)

Can get total energies (binding vs. non-binding, pKa, etc.) and wave functions (charge distribution).

QM/MM simulations (other atoms: treated with Molecular Mechanics)

An example of using QM (Case et al., J. Biol. Inorg. Chem. 2002, 7, 632)

Rieske iron-sulfur protein in bc-type cytochromes

Calculations based on density functional theory (DFT) were performed.

pKa and redox potentials can be obtained from the total energies of several states.

Changes in pKa (proton binding) and redox potential (electron binding) are strongly coupled, as observed in experiments.

Using classical mechanics for
protein structure and dynamics


Ignore electrons,
assigning (empirical)
force fields for atoms
(or clusters of atoms).


A very simple potential:

Force fields: bond stretching and bending

A. R. Leach, “Molecular Modelling”, 1996

Torsional potential

A. R. Leach, “Molecular Modelling”, 1996
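
The slides do not reproduce the functional forms, so the sketch below uses the generic textbook terms as an assumption: a harmonic bond-stretching term, a harmonic angle-bending term, and a cosine torsional term. The function names and all parameter values are placeholders, not taken from any particular force field.

```python
import math


def bond_energy(length, k, l0):
    """Harmonic bond stretching: (k/2) * (l - l0)^2."""
    return 0.5 * k * (length - l0) ** 2


def angle_energy(theta, k, theta0):
    """Harmonic angle bending: (k/2) * (theta - theta0)^2, angles in radians."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(omega, v_n, n, gamma=0.0):
    """Cosine torsional term: (V_n/2) * (1 + cos(n*omega - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(n * omega - gamma))


# Placeholder parameters, for illustration only
print(bond_energy(1.55, k=300.0, l0=1.52))
print(angle_energy(math.radians(112.0), k=80.0, theta0=math.radians(109.5)))
print(torsion_energy(math.radians(60.0), v_n=2.0, n=3))
```

A full empirical force field sums such bonded terms over all bonds, angles, and torsions and adds non-bonded (van der Waals and electrostatic) terms.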

Ab initio QM results

3 point charges

N2 molecules: known to have an electric quadrupole moment

5 point charges

Polarization: many-body effect

Physics-based: methods


Energy Minimization


Steepest descent


Conjugate gradient


Monte Carlo Simulation


Random sampling


Simulated annealing


Molecular Dynamics


Compute conformational
changes.
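
As a minimal illustration of energy minimization, the sketch below runs steepest descent on a one-dimensional double-well function. The function, step size, and starting points are arbitrary choices made for this example; a real force field has thousands of coordinates. The result depends on where the search starts, because the method only ever moves downhill.

```python
def energy(x):
    """Toy double-well energy with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy energy."""
    return 4 * x**3 - 6 * x + 1


def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Follow the negative gradient until it (nearly) vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


for start in (2.0, -2.0):
    x_min = steepest_descent(start)
    print(f"start {start:+.1f} -> x = {x_min:+.4f}, E = {energy(x_min):+.4f}")
```

Starting on the right-hand side, the search settles in the shallower well and never reaches the deeper one on the left.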

Energy surface of two torsional
angles


Very shallow valleys.


Similar in energy.


Determining the (ψ, φ) conformations of peptide backbones is even more complicated.

Trapping at a local minimum


Standard practice: use Monte Carlo (random sampling) with simulated annealing techniques.
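
A minimal sketch of Metropolis Monte Carlo with simulated annealing follows, applied to a toy one-dimensional double-well energy. The energy function, the geometric cooling schedule, the move size, and the seed are all assumptions made for this example; temperature is expressed in the same units as the energy.

```python
import math
import random


def energy(x):
    """Toy double-well energy: local minimum near x = 1.13, global near x = -1.30."""
    return x**4 - 3 * x**2 + x


def simulated_annealing(x0, n_steps=20_000, t_start=2.0, t_end=0.01,
                        max_move=0.5, seed=0):
    """Metropolis sampling while the temperature is lowered geometrically."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        temperature = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        delta = e_trial - e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Start inside the shallower (local) well
best_x, best_e = simulated_annealing(2.0)
print(f"best x = {best_x:+.3f}, best E = {best_e:+.3f}")
```

Because uphill moves are accepted at high temperature, the walker can cross the barrier out of the local well before the system is cooled.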

Using classical mechanics for
protein structure and dynamics


With a force field $V(\mathbf{r}^N)$, for the lowest-energy structure:

Find the structure that gives the energy minimum. Hopefully this is done within a finite amount of computer resources. And hopefully this energy minimum gives the desired native protein structure.

For protein dynamics: calculate trajectories (Newton’s eq.) under thermal conditions and find the averaged physical quantities.
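
Integrating Newton's equation to obtain a trajectory can be sketched with the velocity Verlet scheme, a common choice in molecular dynamics (the slides do not say which integrator is meant, so this is an assumption). The example uses a single particle in a harmonic well with unit mass and force constant; all names and values are illustrative.

```python
def force(x, k=1.0):
    """Force from a harmonic potential V(x) = k x^2 / 2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, n_steps, mass=1.0):
    """Integrate Newton's equation and return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


trajectory = velocity_verlet(x=1.0, v=0.0, dt=0.01, n_steps=1000)
energies = [total_energy(x, v) for x, v in trajectory]
print(f"energy drift: {max(energies) - min(energies):.2e}")
```

Averaged physical quantities are then obtained by averaging over the stored trajectory; the near-constant total energy is a basic sanity check on the time step.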

Questions to ask:


Is the energy function correct?


Precise enough to discriminate against non-native structures.

Yet simple enough for computers to carry out efficiently.


Is the conformational search good enough
to cover the global minimum?

Take-home messages:

Physics-based methods


Protein folding without any prior
knowledge about protein structure is a
difficult task.


Protein structure prediction is often quoted as an “NP-complete problem”: with known search methods, the cost of the problem grows exponentially as the number of residues increases.

Structures of small proteins (~10 to 100 a.a.) can be solved in principle.