CS612 - Algorithms in Bioinformatics

Spring 2012 - Class 22
April 17, 2012
From a Rigid Ligand to a Flexible Ligand
Torsional (Dihedral) Degrees of Freedom (DOF)
Nurit Haspel
Robotics-inspired Approach to Protein Flexibility
Similarity between proteins and robots: exploration of a
complex high-dimensional space
Similarity exploited to sample conformations with spatial
constraints
(Figure: articulated manipulator alongside an extended protein backbone)
Exploration of protein conformational space has parallels in
robotics
0/1 collisions for robots versus an energy field for proteins
(Figures adapted from J.-C. Latombe, Stanford, and P. Smith, KSU)
Dimensionality of configuration space
DOFs (rigid-body transformations and DOFs of the ligand)
Too many DOFs mean that the configuration space of the
ligand is high-dimensional and difficult to search
Similar issue when planning motions for an articulated robotic
chain in a cluttered environment
Geometric complexity of the free space
Difficult to determine whether a ligand conformation and
specific position and orientation result in a good fit
Similar issue for an articulated robot
Address: Plan motions in the configuration space but compute in
workspace (protein surface or cavity)!
Probabilistic Roadmap Motion Planning (PRM)
Configurations are sampled by picking coordinates at random
Sampled configurations are tested for collision (in workspace!)
The collision-free configurations are retained as "milestones"
Each milestone is linked by straight paths to its k-nearest neighbors
The collision-free links are retained to form the PRM
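The PRM steps above can be sketched on a toy problem. This is only an illustration under assumptions that are not in the slides: a 2D unit-square configuration space, circular obstacles, Euclidean distance, and hypothetical helper names (`collision_free`, `segment_free`, `build_prm`).

```python
import math
import random

def collision_free(q, obstacles):
    """A configuration is free if it lies outside every circular obstacle."""
    return all(math.dist(q, center) > radius for center, radius in obstacles)

def segment_free(a, b, obstacles, step=0.01):
    """Check a straight path by testing evenly spaced points along it."""
    n = max(1, math.ceil(math.dist(a, b) / step))
    return all(
        collision_free(tuple(x + (y - x) * i / n for x, y in zip(a, b)), obstacles)
        for i in range(n + 1)
    )

def build_prm(n_samples, k, obstacles, rng):
    # Sample configurations by picking coordinates at random.
    samples = [(rng.random(), rng.random()) for _ in range(n_samples)]
    # Test each sample for collision; keep the free ones as milestones.
    milestones = [q for q in samples if collision_free(q, obstacles)]
    # Link each milestone to its k nearest neighbors; keep collision-free links.
    edges = set()
    for i, q in enumerate(milestones):
        others = sorted((j for j in range(len(milestones)) if j != i),
                        key=lambda j: math.dist(q, milestones[j]))
        for j in others[:k]:
            if segment_free(q, milestones[j], obstacles):
                edges.add((min(i, j), max(i, j)))
    return milestones, edges

obstacles = [((0.5, 0.5), 0.2)]  # assumed toy workspace: one disc in the unit square
milestones, edges = build_prm(200, 5, obstacles, random.Random(0))
```

The brute-force neighbor search keeps the sketch short; a spatial index would be the usual choice for larger roadmaps.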
Application of PRM to Protein-Ligand Docking
Protein is assumed to be rigid
A fixed coordinate system P is attached to the protein
Ligand is a small flexible molecule
A moving coordinate system L is defined using three bonded atoms in the ligand
A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand
A. P. Singh, J. C. Latombe, and D. L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999
Roadmap Construction:Node Generation
The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein)
The energy of each sampled conformation is $E = E_{\mathrm{interaction}} + E_{\mathrm{internal}}$, where $E_{\mathrm{interaction}}$ is the electrostatic term and $E_{\mathrm{internal}}$ is the vdW term
A sampled conformation is retained with probability:
$$p = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}$$
Results in denser distribution of nodes in low-energy regions of conformational space
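A minimal sketch of the energy-based retention rule, assuming E_min < E_max. The function names, the one-dimensional toy energy, and the uniform sampler are hypothetical stand-ins, not the energy function used in the cited work.

```python
import random

def retention_probability(energy, e_min, e_max):
    """Piecewise-linear retention probability; assumes e_min < e_max."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(energy_fn, sampler, n_trials, e_min, e_max, rng):
    """Draw conformations uniformly and keep each one with probability p(E)."""
    nodes = []
    for _ in range(n_trials):
        q = sampler(rng)
        if rng.random() < retention_probability(energy_fn(q), e_min, e_max):
            nodes.append(q)
    return nodes

# Toy usage (assumed): a 1-D "conformation" q with energy q^2.
rng = random.Random(1)
nodes = sample_nodes(lambda q: q * q, lambda r: r.uniform(-3.0, 3.0),
                     1000, e_min=1.0, e_max=4.0, rng=rng)
```

In the toy run, every retained node has energy at most `e_max`, and nodes cluster near the energy minimum at q = 0.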
Roadmap Construction:Edge Generation
Each node is connected to its closest neighbors by straight edges
Each edge is discretized so that between $q_i$ and $q_{i+1}$ no atom moves by more than some $\varepsilon = 1$ Å
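One way to discretize an edge is to bisect it in parameter space until no atom moves by more than the threshold between consecutive conformations. The sketch below assumes a toy planar chain (a position plus joint angles) in place of a real ligand; `atom_positions`, the 1.5 Å bond length, and the bisection strategy are illustrative assumptions.

```python
import math

def atom_positions(q, bond=1.5):
    """Toy planar chain: q = (x, y, angle_1, ..., angle_n) -> atom coordinates."""
    x, y, heading = q[0], q[1], 0.0
    atoms = [(x, y)]
    for angle in q[2:]:
        heading += angle
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        atoms.append((x, y))
    return atoms

def max_atom_move(qa, qb):
    """Largest displacement of any single atom between two conformations."""
    return max(math.dist(a, b)
               for a, b in zip(atom_positions(qa), atom_positions(qb)))

def discretize(qa, qb, eps=1.0):
    """Bisect the straight edge qa -> qb until no atom moves more than eps per step."""
    if max_atom_move(qa, qb) <= eps:
        return [qa, qb]
    mid = tuple((a + b) / 2 for a, b in zip(qa, qb))
    left = discretize(qa, mid, eps)
    return left[:-1] + discretize(mid, qb, eps)

qa = (0.0, 0.0, 0.0, 0.0)
qb = (3.0, 0.0, math.pi / 2, -math.pi / 2)
path = discretize(qa, qb)  # intermediate conformations along the edge
```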
Querying the Roadmap
For a given goal node $q_g$ (e.g., binding conformation), Dijkstra's single-source shortest-path algorithm computes the lowest-weight paths from $q_g$ to each node (in either direction) in $O(N \log N)$ time, where $N$ = number of nodes
Various quantities can then be easily computed in $O(N)$ time, e.g., average weights of all paths entering $q_g$ and of all paths leaving $q_g$ (binding and dissociation rates $K_{on}$ and $K_{off}$)
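The roadmap query can be sketched with a heap-based Dijkstra run twice: once on the directed roadmap for paths leaving the goal, and once on the reversed roadmap for paths entering it. The three-node graph, its weights, and the helper names are made up for illustration; the averaged weights only stand in for the rate-like quantities mentioned above.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]}; lowest path weight to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse(graph):
    """Flip every directed edge so that paths *into* a node can be searched."""
    rev = {u: [] for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs:
            rev.setdefault(v, []).append((u, w))
    return rev

def mean_path_weight(dist, goal):
    """O(N) pass over the Dijkstra result, skipping the goal itself."""
    others = [d for node, d in dist.items() if node != goal]
    return sum(others) / len(others)

# Hypothetical directed roadmap with goal node "g".
graph = {"a": [("b", 1.0), ("g", 4.0)], "b": [("g", 1.5)], "g": [("a", 2.0)]}
leaving = dijkstra(graph, "g")            # lowest-weight paths from g
entering = dijkstra(reverse(graph), "g")  # lowest-weight paths into g
```

With a binary heap the run time is O((N + E) log N), which matches the O(N log N) bound on the slide when each node has a bounded number of neighbors.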
Computing Binding Conformations
Sample many (several 1000s) ligand conformations at random around protein
Repeat several times:
Select lowest-energy conformations that are close to protein surface
Re-sample around them
Retain k (approx. 10) lowest-energy conformations whose centers of mass are at least 5 Å apart
(Figure: lactate dehydrogenase, active site?)
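The final retention step can be sketched as a greedy pass in order of increasing energy. The greedy strategy, the function name, and the `(energy, center_of_mass)` representation are assumptions; the slides do not say how the separated set is chosen.

```python
import math

def select_binding_candidates(conformations, k=10, min_separation=5.0):
    """conformations: list of (energy, center_of_mass) pairs.

    Greedily keep the lowest-energy conformations whose centers of mass are
    at least min_separation (in Angstrom) from every conformation already kept.
    """
    chosen = []
    for energy, com in sorted(conformations, key=lambda c: c[0]):
        if all(math.dist(com, other) >= min_separation for _, other in chosen):
            chosen.append((energy, com))
            if len(chosen) == k:
                break
    return chosen

# Made-up energies and centers of mass for illustration.
confs = [(-5.0, (0.0, 0.0, 0.0)), (-4.0, (1.0, 0.0, 0.0)),
         (-3.0, (6.0, 0.0, 0.0)), (-1.0, (0.0, 7.0, 0.0))]
top2 = select_binding_candidates(confs, k=2)
```

Here the second-lowest conformation is skipped because its center of mass is only 1 Å from the lowest one.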
Testing on Three Complexes
PDB ID: 1ldm. Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues). Ligand: Oxamate (6 atoms, 7 DOFs)
PDB ID: 4ts1. Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues). Ligand: L-leucyl-hydroxylamine (13 atoms, 9 DOFs)
PDB ID: 1stp. Receptor: Streptavidin (901 atoms, 121 residues). Ligand: Biotin (16 atoms, 11 DOFs)
From Flexible Ligand to Flexible Receptor?
Protein receptors are flexible and in water probably look like this!
The target receptor is big and has many DOFs (in the thousands). We need to
somehow find and focus only on the relevant motions.
The dimensionality of the protein conformational space is much larger than that of a small ligand
PRM-based methods that sample thousands of conformations to get a good view of the ligand conformational space are not sufficient
Challenge: from 7-10 DOFs to thousands of DOFs
Goal: Model protein flexibility to capture relevant conformations of the flexible receptor
Flexibility limited to one or a few amino acids on or near the binding site of the receptor
Soft docking: docking ligand conformations to a single average receptor conformation
Ensemble docking: docking of ligand conformations to individual protein conformations
Modeling Limited Receptor Flexibility
Selection of specific degrees of freedom, such as on designated amino acids on the binding site
Shown here: Acetylcholinesterase: Phe330 flexible - acts as a swinging gate
Moving larger number of amino acids (illustration on
acetylcholinesterase)
Critical Assessment of Protein Interactions (CAPRI)
Protein-protein docking competition
The equivalent of CASP for protein-protein docking
Community-wide experiment that started in 2001
Interesting review of docking methods: S. Vajda & C. J. Camacho. Protein-protein docking: is the glass half-full or half-empty? Trends in Biotechnology, 22(3):110-116, 2004.
Finding Folding Pathways Using PRM
Degrees of freedom - number of rotatable backbone dihedral angles (approx. 2N, where N is the number of amino acids)
Nodes generated in a similar manner as the docking scheme above.
Sampling cannot be done at random due to high dimensionality - sampling is done from a set of distributions around the native state.
Edges connect neighboring nodes in a similar manner to the one described above.
Can be used to discover folding pathways, intermediate structures and other folding events.
G. Song, N. Amato, RECOMB 2001
Modeling Loops Using Inverse Kinematics
Goal: Model the ensemble of conformations of a protein.
It is known that proteins are not rigid but fluctuate about an ensemble of structures under equilibrium conditions.
Focus mostly on loop regions, as they are the most flexible ones.
Inverse kinematics: Manipulate the degrees of freedom of an articulated chain to satisfy some end-constraints.
In this case - manipulate the rotational degrees of freedom of a loop region to find possible loop conformations that attach to the rest of the protein.
Cyclic Coordinate Descent (CCD): solve for and rotate one dihedral at a time.
Canutescu A. A., and Dunbrack R. L. Protein Science 12, 2003
Cyclic Coordinate Descent: solve for and rotate one dihedral at a time
Given: atom at current position M, target position F
Goal: Solve for dihedral $\theta$ s.t. $\lVert F - M \rVert^2 = S(\theta) < \varepsilon$ (threshold)
Time complexity: linear in the number of DOFs to solve for all dihedrals of a chain
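A minimal planar sketch of the CCD idea, not the 3D dihedral version used for protein loops. It assumes a 2D chain of revolute joints anchored at the origin, where the single-joint update has a closed form. Function names and parameter values are illustrative.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain anchored at the origin."""
    pts, x, y, heading = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for a, length in zip(angles, lengths):
        heading += a
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        pts.append((x, y))
    return pts

def ccd(angles, lengths, target, eps=1e-6, max_sweeps=200):
    """Sweep over the joints, solving for one angle at a time."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            (jx, jy), (mx, my) = pts[i], pts[-1]
            # Rotating joint i by the angle between (M - joint) and (F - joint)
            # minimizes S = |F - M|^2 with respect to this single angle.
            angles[i] += (math.atan2(target[1] - jy, target[0] - jx)
                          - math.atan2(my - jy, mx - jx))
        end = forward(angles, lengths)[-1]
        if math.dist(end, target) ** 2 < eps:  # S below threshold
            break
    return angles

lengths = [1.0, 1.0, 1.0]
target = (1.5, 1.0)
solution = ccd([0.3, 0.3, 0.3], lengths, target)
```

Each sweep touches every joint once, so its cost is linear in the number of degrees of freedom.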
Since there is redundancy, many solutions are feasible.
Find rotations to satisfy spatial constraints on atoms.
Combine with energy minimization to obtain physical structures.
Example: Chymotrypsin inhibitor 2
Equilibrium Fluctuations
More DOFs than spatial constraints can be exploited to generate fragment fluctuations
Example: Chymotrypsin inhibitor 2
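Sample equilibrium fluctuations:
Spatially constrained through Cyclic Coordinate Descent
Energetically constrained to be feasible
Local Fluctuations in α-Lactalbumin
Boltzmann ensemble average:
$$\langle \mathrm{RMSD} \rangle_x = \sum_{\mathrm{Confs}} \mathrm{RMSD}(C, C_{\mathrm{native}}) \, \frac{e^{-\beta \Delta E_c}}{Q}$$
$$\Delta E_c = E_c - E_{\mathrm{native}}, \qquad Q = \sum_{\mathrm{Confs}} e^{-\beta \Delta E_c}$$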
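A small sketch of a Boltzmann-weighted ensemble average over sampled conformations. It assumes energies are measured relative to the native state and that `beta` is an inverse-temperature factor in matching units. The function name and the toy numbers are illustrative.

```python
import math

def boltzmann_average(values, energies, e_native, beta=1.0):
    """Average of values weighted by exp(-beta * (E_c - E_native)) / Q."""
    weights = [math.exp(-beta * (e - e_native)) for e in energies]
    q = sum(weights)  # partition function over the sampled conformations
    return sum(v * w for v, w in zip(values, weights)) / q

# Two toy conformations: RMSD 1.0 at the native energy, RMSD 3.0 at higher energy.
avg = boltzmann_average([1.0, 3.0], [0.0, math.log(3.0)], e_native=0.0)
```

The higher-energy conformation receives a third of the weight of the native-energy one, so the average is pulled toward the low-energy RMSD.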
-Lactalbumin (-Lac)
123 residues
Hydrogen exchange
protection factors available
Ubiquitin
76 residues NMR
information on uctuations
available