CS612 - Algorithms in Bioinformatics

Spring 2012 - Class 22

April 17, 2012

From a Rigid Ligand to a Flexible Ligand

Torsional (Dihedral) Degrees of Freedom (DOF)

Nurit Haspel


Robotics-inspired Approach to Protein Flexibility

Similarity between proteins and robots: exploration of a complex high-dimensional space

Similarity exploited to sample conformations with spatial constraints

[Figure: articulated manipulator vs. protein extended backbone]


Robotics-inspired Approach to Protein Flexibility

Exploration of protein conformational space has parallels in robotics

Binary (0/1) collision checks for robots versus an energy field for proteins

Figures adapted from J.-C. Latombe, Stanford, and P. Smith, KSU


Robotics-inspired Approach to Protein Flexibility

Dimensionality of the configuration space

DOFs (rigid-body transformations and DOFs of the ligand)

Too many DOFs mean that the configuration space of the ligand is high-dimensional and difficult to search

Similar issue when planning motions for an articulated robotic chain in a cluttered environment

Geometric complexity of the free space

Difficult to determine whether a ligand conformation at a specific position and orientation results in a good fit

Similar issue for an articulated robot

To address both: plan motions in the configuration space but compute in the workspace (protein surface or cavity)!


Probabilistic Roadmap Motion Planning (PRM)

Configurations are sampled by picking coordinates at random

Sampled configurations are tested for collision (in workspace!)

The collision-free configurations are retained as "milestones"

Each milestone is linked by straight paths to its k-nearest neighbors

The collision-free links are retained to form the PRM
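The PRM construction steps can be sketched in a toy setting. The example below assumes a 2D unit-square configuration space with circular obstacles, so that configuration space and workspace coincide; the function names (`collision_free`, `link_free`, `build_prm`) and all parameter values are illustrative choices, not part of the original slides.

```python
import math
import random

def collision_free(q, obstacles):
    """True if point q lies outside every circular obstacle (cx, cy, radius)."""
    return all(math.hypot(q[0] - cx, q[1] - cy) > r for cx, cy, r in obstacles)

def link_free(a, b, obstacles, step=0.01):
    """Check a straight link by testing evenly spaced points along it."""
    n = max(1, int(math.dist(a, b) / step))
    return all(
        collision_free((a[0] + (b[0] - a[0]) * i / n,
                        a[1] + (b[1] - a[1]) * i / n), obstacles)
        for i in range(n + 1)
    )

def build_prm(n_samples, k, obstacles, seed=0):
    rng = random.Random(seed)
    # Sample configurations at random and keep the collision-free ones as milestones.
    samples = [(rng.random(), rng.random()) for _ in range(n_samples)]
    milestones = [q for q in samples if collision_free(q, obstacles)]
    # Link each milestone to its k nearest neighbors; keep only collision-free links.
    edges = set()
    for i, q in enumerate(milestones):
        nearest = sorted((j for j in range(len(milestones)) if j != i),
                         key=lambda j: math.dist(q, milestones[j]))[:k]
        for j in nearest:
            if link_free(q, milestones[j], obstacles):
                edges.add((min(i, j), max(i, j)))
    return milestones, edges

obstacles = [(0.5, 0.5, 0.2)]  # one disc in the middle of the square
milestones, edges = build_prm(200, 5, obstacles)
```

The brute-force neighbor search is quadratic in the number of milestones; a spatial index would normally replace it for larger roadmaps.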


Application of PRM to Protein-Ligand Docking

Protein is assumed to be rigid

A fixed coordinate system P is attached to the protein

Ligand is a small flexible molecule

A moving coordinate system L is defined using three bonded atoms in the ligand

A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand

A. P. Singh, J.-C. Latombe, and D. L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999
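One possible way to build the moving frame L from three bonded atoms is sketched below. The convention used (origin at the middle atom, x-axis along the first bond, z-axis normal to the plane of the three atoms) is an assumption made for illustration; the slides do not specify one. The name `ligand_frame` and the coordinates are hypothetical.

```python
import numpy as np

def ligand_frame(a, b, c):
    """Right-handed orthonormal frame from three bonded atoms a-b-c.

    Assumed convention: origin at b, x along b->a, z normal to the a-b-c plane.
    Returns (origin, R) where the columns of R are the frame axes.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    x = (a - b) / np.linalg.norm(a - b)
    z = np.cross(x, c - b)
    z /= np.linalg.norm(z)          # requires the three atoms not to be collinear
    y = np.cross(z, x)
    return b, np.column_stack([x, y, z])

# Position and orientation of L (6 rigid-body DOFs) plus one value per torsion.
origin, R = ligand_frame([1.2, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.5, 1.1, 0.0])
torsions = np.radians([60.0, -120.0])   # two hypothetical rotatable bonds
n_dofs = 6 + len(torsions)
```

With this parameterization a ligand with one rotatable bond has 7 DOFs and one with five rotatable bonds has 11.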


Roadmap Construction: Node Generation

The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein)

The energy of each sampled conformation is $E = E_{\text{interaction}}$ (electrostatic) $+\ E_{\text{internal}}$ (vdW)

A sampled conformation is retained with probability:

$$
p =
\begin{cases}
0 & \text{if } E > E_{\max} \\
\dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\
1 & \text{if } E < E_{\min}
\end{cases}
$$

Results in denser distribution of nodes in low-energy regions of conformational space
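The retention rule is a clamped linear ramp between the two energy thresholds. A minimal sketch, assuming `e_min < e_max` and using hypothetical function names:

```python
import random

def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation (assumes e_min < e_max)."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)

def keep_sample(energy, e_min, e_max, rng=random):
    """Accept the sample with the probability given above."""
    return rng.random() < retention_probability(energy, e_min, e_max)
```

Because low-energy samples are almost always kept and high-energy ones are almost always dropped, the retained nodes concentrate in low-energy regions.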


Roadmap Construction: Edge Generation

Each node is connected to its closest neighbors by straight edges

Each edge is discretized so that between $q_i$ and $q_{i+1}$ no atom moves by more than some $\varepsilon = 1$ Å
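The discretization criterion can be illustrated with a toy model. The sketch below assumes a planar chain of atoms as a stand-in ligand (`chain_atoms` is hypothetical) and simply doubles the number of segments until no atom moves more than `eps` between consecutive intermediate conformations; linear interpolation of angles ignores wrap-around, which a real implementation would need to handle.

```python
import numpy as np

def chain_atoms(q, bond=1.5):
    """Toy 'ligand': planar chain whose joint angles are q (illustrative only)."""
    angles = np.cumsum(q)
    steps = bond * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])

def max_atom_move(q0, q1, atoms_of):
    """Largest displacement of any atom between two conformations."""
    return np.linalg.norm(atoms_of(q1) - atoms_of(q0), axis=1).max()

def discretize_edge(q_a, q_b, atoms_of, eps=1.0, max_segments=4096):
    """Split the straight edge q_a -> q_b until no atom moves more than eps per step."""
    q_a, q_b = np.asarray(q_a, dtype=float), np.asarray(q_b, dtype=float)
    n = 1
    while n <= max_segments:
        path = [(1.0 - t) * q_a + t * q_b for t in np.linspace(0.0, 1.0, n + 1)]
        if all(max_atom_move(p, q, atoms_of) <= eps for p, q in zip(path, path[1:])):
            return path
        n *= 2
    raise ValueError("edge could not be discretized finely enough")

path = discretize_edge([0.0, 0.0, 0.0], [1.0, 0.5, -0.5], chain_atoms, eps=1.0)
```

Each intermediate conformation on the returned path can then be scored or collision-checked in the workspace.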


Querying the Roadmap

For a given goal node $q_g$ (e.g., binding conformation), Dijkstra's single-source shortest-path algorithm computes the lowest-weight paths from $q_g$ to each node (in either direction) in O(N log N) time, where N = number of nodes (with k-nearest-neighbor linking the roadmap has O(N) edges)

Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering $q_g$ and of all paths leaving $q_g$ (binding and dissociation rates $K_{on}$ and $K_{off}$)
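A minimal sketch of the query step, assuming the roadmap is stored as a directed adjacency list with non-negative edge weights. Paths leaving the goal come from running Dijkstra on the graph itself, and paths entering it from running Dijkstra on the reversed graph. The tiny `roadmap` and its weights are made up for illustration, and the plain averages computed here are not claimed to be the rate estimates themselves.

```python
import heapq

def dijkstra(graph, source):
    """Lowest path weight from source to every reachable node.

    graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse(graph):
    """Graph with every directed edge flipped."""
    rev = {u: [] for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs:
            rev.setdefault(v, []).append((u, w))
    return rev

roadmap = {"a": [("g", 2.0), ("b", 1.0)], "b": [("g", 0.5)], "g": [("a", 4.0)]}
leaving = dijkstra(roadmap, "g")             # lowest weights of paths g -> node
entering = dijkstra(reverse(roadmap), "g")   # lowest weights of paths node -> g

others = [n for n in roadmap if n != "g"]
avg_leaving = sum(leaving[n] for n in others) / len(others)
avg_entering = sum(entering[n] for n in others) / len(others)
```

The two averages are computed in a single pass over the nodes once the shortest-path weights are known.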


Computing Binding Conformations

Sample many (several thousand) ligand conformations at random around the protein

Repeat several times:

Select lowest-energy conformations that are close to the protein surface

Re-sample around them

Retain k (approx. 10) lowest-energy conformations whose centers of mass are at least 5 Å apart

[Figure: active site of lactate dehydrogenase]
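The final retention step can be sketched as a greedy filter: visit conformations from lowest to highest energy and keep one only if its center of mass is far enough from every conformation already kept. The greedy strategy and the name `select_binding_candidates` are assumptions for illustration; the slides state only the criterion.

```python
import numpy as np

def select_binding_candidates(centers, energies, k=10, min_sep=5.0):
    """Indices of up to k lowest-energy conformations with centers >= min_sep apart."""
    centers = np.asarray(centers, dtype=float)
    chosen = []
    for idx in np.argsort(energies):          # lowest energy first
        if all(np.linalg.norm(centers[idx] - centers[j]) >= min_sep for j in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return chosen

# Four hypothetical conformations: centers of mass (in Å) and energies.
centers = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [6.0, 0.0, 0.0], [12.0, 0.0, 0.0]]
energies = [-3.0, -5.0, -4.0, -1.0]
picked = select_binding_candidates(centers, energies, k=3)
```

Here conformation 0 is skipped because it lies only 1 Å from the lower-energy conformation 1.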


Testing on Three Complexes

PDB ID: 1ldm. Receptor: lactate dehydrogenase (2386 atoms, 309 residues). Ligand: oxamate (6 atoms, 7 DOFs)

PDB ID: 4ts1. Receptor: mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues). Ligand: L-leucyl-hydroxylamine (13 atoms, 9 DOFs)

PDB ID: 1stp. Receptor: streptavidin (901 atoms, 121 residues). Ligand: biotin (16 atoms, 11 DOFs)


From Flexible Ligand to Flexible Receptor?


Protein receptors are flexible and in water probably look like this!


From Flexible Ligand to Flexible Receptor?

Target receptor is big and has many DOFs (in the thousands). Need to somehow find and focus only on relevant motions.


From Flexible Ligand to Flexible Receptor?

The dimensionality of the protein conformational space is much larger than that of a small ligand

PRM-based methods that sample thousands of conformations to get a good view of the ligand conformational space are not sufficient

Challenge: from 7-10 DOFs to thousands of DOFs

Goal: model protein flexibility to capture relevant conformations of the flexible receptor


From Flexible Ligand to Flexible Receptor?

Flexibility limited to one or a few amino acids on or near the binding site of the receptor

Soft docking: docking ligand conformations to a single average receptor conformation

Ensemble docking: docking of ligand conformations to individual protein conformations


Modeling Limited Receptor Flexibility

Selection of specific degrees of freedom, such as those of designated amino acids at the binding site

Shown here: acetylcholinesterase, where the flexible Phe330 acts as a swinging gate


Modeling Limited Receptor Flexibility

Moving a larger number of amino acids (illustration on acetylcholinesterase)


Critical Assessment of PRediction of Interactions (CAPRI)

Protein-protein docking competition

The equivalent of CASP for protein-protein docking

Community-wide experiment that started in 2001

Interesting review of docking methods: S. Vajda & C. J. Camacho. Protein-protein docking: is the glass half-full or half-empty? Trends in Biotechnology, 22(3):110-116, 2004.


Finding Folding Pathways Using PRM

Degrees of freedom: number of rotatable backbone dihedral angles (approx. 2N, where N is the number of amino acids)

Nodes generated in a similar manner to the docking scheme above.

Sampling cannot be done uniformly at random due to high dimensionality; sampling is done from a set of distributions around the native state.

Edges connect neighboring nodes in a similar manner to the one described above.

Can be used to discover folding pathways, intermediate structures and other folding events.

G. Song, N. Amato, RECOMB 2001
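Sampling from a set of distributions around the native state could look like the sketch below. It assumes Gaussian perturbations of the native backbone dihedrals with several standard deviations; the Gaussian form, the specific widths, and the name `sample_around_native` are illustrative assumptions, not values taken from the cited work.

```python
import numpy as np

def sample_around_native(native_dihedrals, sigmas_deg=(5.0, 10.0, 20.0, 40.0),
                         per_sigma=10, seed=0):
    """Perturb native dihedrals (degrees) with Gaussian noise of increasing width."""
    rng = np.random.default_rng(seed)
    native = np.asarray(native_dihedrals, dtype=float)
    batches = []
    for sigma in sigmas_deg:
        noise = rng.normal(0.0, sigma, size=(per_sigma, native.size))
        # Wrap the perturbed angles back into the [-180, 180) range.
        batches.append((native + noise + 180.0) % 360.0 - 180.0)
    return np.vstack(batches)

# Hypothetical (phi, psi) values for a 3-residue fragment, i.e. 2N = 6 DOFs.
native = [-60.0, -45.0, -120.0, 130.0, -65.0, -40.0]
samples = sample_around_native(native)
```

Narrow distributions yield near-native nodes while wider ones reach progressively more unfolded conformations, so the roadmap covers the space between them.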


Modeling Loops Using Inverse Kinematics

Goal: model the ensemble of conformations of a protein.

It is known that proteins are not rigid but fluctuate about an ensemble of structures under equilibrium conditions.

Focus mostly on loop regions, as they are the most flexible ones.


Modeling Loops Using Inverse Kinematics

Inverse kinematics: manipulate the degrees of freedom of an articulated chain to satisfy some end-constraints.

In this case, manipulate the rotational degrees of freedom of a loop region to find possible loop conformations that attach to the rest of the protein.

Cyclic Coordinate Descent (CCD): solve for and rotate one dihedral at a time.

Canutescu A. A. and Dunbrack R. L., Protein Science 12, 2003


Modeling Loops Using Inverse Kinematics

Cyclic Coordinate Descent: solve for and rotate one dihedral at a time

Given: atom at current position M, target position F

Goal: solve for dihedral $\theta$ s.t. $|F - M|^2 = S(\theta) < \varepsilon$ (threshold)

Time complexity: linear in the number of DOFs to solve for all dihedrals of a chain
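A single-atom version of CCD can be sketched as follows. For a rotation about the bond axis with unit vector $u$, let $O$ be the projection of M onto the axis, $r = M - O$ and $f = F - O$; then $S(\theta)$ is minimized at $\theta = \operatorname{atan2}(f \cdot (u \times r),\ f \cdot r)$. The code assumes a toy chain of 3D points in which every bond is rotatable and only the last atom is constrained; the names `rotate_about_axis` and `ccd_close` and the coordinates are hypothetical.

```python
import numpy as np

def rotate_about_axis(points, origin, axis, theta):
    """Rodrigues rotation of points about the line through origin along unit axis."""
    v = points - origin
    return (origin + v * np.cos(theta)
            + np.cross(axis, v) * np.sin(theta)
            + np.outer(v @ axis, axis) * (1.0 - np.cos(theta)))

def ccd_close(chain, target, eps=1e-3, max_sweeps=200):
    """Rotate one bond at a time so the last atom of chain approaches target."""
    chain = np.array(chain, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(max_sweeps):
        for i in range(len(chain) - 2):
            a, b = chain[i], chain[i + 1]
            u = (b - a) / np.linalg.norm(b - a)      # rotation axis: bond i -> i+1
            m = chain[-1]
            o = a + ((m - a) @ u) * u                # projection of M onto the axis
            r, f = m - o, target - o
            theta = np.arctan2(f @ np.cross(u, r), f @ r)   # minimizes S(theta)
            chain[i + 2:] = rotate_about_axis(chain[i + 2:], a, u, theta)
        if np.sum((target - chain[-1]) ** 2) < eps:  # S below threshold: stop
            break
    return chain

loop = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0],
                 [3.4, 1.6, 0.5], [3.9, 3.0, 0.7], [5.3, 3.2, 1.2]])
target = np.array([3.5, 3.0, 1.5])
closed = ccd_close(loop, target)
```

Each update is the exact one-dimensional minimum of $S$, so the distance to the target never increases, and each sweep visits every dihedral once, which gives the linear cost per sweep. Bond lengths are unchanged because only rigid rotations about bonds are applied.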


Modeling Loops Using Inverse Kinematics

Since there is redundancy, many solutions are feasible.

Find rotations to satisfy spatial constraints on atoms

Combine with energy minimization to obtain physical structures

Example: Chymotrypsin inhibitor 2


Equilibrium Fluctuations

More DOFs than spatial constraints can be exploited to generate fragment fluctuations

Example: Chymotrypsin inhibitor 2


Equilibrium Fluctuations

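Sample equilibrium fluctuations:

Spatially constrained through Cyclic Coordinate Descent

Energetically constrained to be feasible

Local fluctuations in α-lactalbumin

Boltzmann ensemble average:

$$
\langle \mathrm{RMSD} \rangle_x = \sum_{C \in \text{Confs}} \mathrm{RMSD}(C, C_{\text{native}})\, \frac{e^{-\beta \Delta E_c}}{Q}
$$

$$
\Delta E_c = E_c - E_{\text{native}}, \qquad Q = \sum_{C \in \text{Confs}} e^{-\beta \Delta E_c}
$$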
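The ensemble average is a weighted mean with Boltzmann weights normalized by Q. A minimal sketch, assuming energies are given in units where `beta` multiplies them directly (the function name and the numbers are made up):

```python
import numpy as np

def boltzmann_average(values, energies, native_energy, beta=1.0):
    """Boltzmann-weighted average of values over a set of conformations."""
    values = np.asarray(values, dtype=float)
    delta = np.asarray(energies, dtype=float) - native_energy
    # Shifting by the minimum avoids overflow and cancels between numerator and Q.
    weights = np.exp(-beta * (delta - delta.min()))
    return float(np.sum(values * weights) / np.sum(weights))

rmsds = [1.0, 2.0, 3.0]                                         # RMSD to native
flat = boltzmann_average(rmsds, [0.0, 0.0, 0.0], 0.0)           # equal weights
peaked = boltzmann_average(rmsds, [0.0, 100.0, 100.0], 0.0)     # first one dominates
```

With equal energies the result is the plain mean; when one conformation is much lower in energy, the average collapses onto its value.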


Equilibrium Fluctuations

α-Lactalbumin (α-Lac)

123 residues

Hydrogen exchange protection factors available

Ubiquitin

76 residues

NMR information on fluctuations available
