Works Cited - Computational Science Laboratory

Oct 1, 2013 (3 years and 10 months ago)

Drug Docking

Seth Holladay

Project Write-Up

CS 618



Introduction

Drug docking is an in silico method for narrowing down which drugs or other small molecules (ligands) might fit a given protein (the receptor). By predicting which candidates are likely to fit, it greatly reduces the amount of (potentially harmful) in vivo testing that is necessary. However, as is generally the case in bioinformatics, drug docking usually does not find the exact solution and can be very computationally expensive, especially with test sets larger than a few molecules. “The only way to overcome such limitations is to incorporate as much biological knowledge as possible.” (Periwal, 2006)

In order to compute docks in a reasonable amount of time, I implemented the DOCK algorithm. It represents possible docking sites in a receptor with low-density spheres.


Background

A lot of research has addressed drug docking solutions (Lilien, et al.) in the last couple of decades. As described in (Schneidman-Duhovny, et al., 2004), the general procedure is to take a target protein and match ligands from a ligand database to it. First, one must compute a representation for the protein and ligand surfaces, next extract features on each that can narrow down the search space, then find a best fit for the ligand and receptor based on those features, and finally compute a score of how well the ligand matches.

The search for matching features can be done by brute force (sped up with Fourier transforms), by matching local geometric or chemical features, or by evolutionary methods. Scoring functions evaluate the best fits at an atomic level and fall into either empirical or knowledge-based categories; they can be applied during or after matching (Schneidman-Duhovny, et al., 2004).

Bioinformatics (Orengo, et al., 2003) discusses how a Root Mean Square Deviation function works with the DOCK algorithm. A major implementation drawback arises when the molecules’ shapes change upon interaction. (Orengo, et al., 2003) also addresses ligand-protein matching (Chapter 13), clarifying that it can be solved at an atomic level, which distinguishes it from protein-protein matching, which generally cannot. It gives a strong overview of setting up the grid representation and search algorithms. It also shows a method, the DOCK algorithm, that finds favorable binding sites on the protein.



More recent works give more accurate and alternative methods for drug docking. Newer methods include taking into account the free-energy change from separate to bound states (Yang, et al., 2009) and using actual protein sequences in the matching equation (Bleakley, et al., 2009). They demonstrate that applying current algorithms to large sets of proteins and ligands still yields less than desirable results.


Some molecular-level characteristics of interest for docking are the solvent radius of the molecule and the van der Waals radius of each atom, measured in angstroms (Å). The solvent radius (typically 1.4 Å) determines the distance from the molecule at which solvent molecules (e.g. ligands) can fit (Gerber, 1998). A molecule can be modeled as an implicit surface, the 3-dimensional area of influence of all the atoms in the molecule. In other words, the surface bounds the overlapping atoms in the molecule. Connolly’s method (Connolly, 1983) represents each atom as a sphere whose radius is the sum of the atom’s van der Waals radius and the solvent radius of the molecule (Fig. 1). It then makes an isocontour (level-set) surface out of the metaball spheres.


PyMol Molecule Visualizer

PyMol is a molecular viewer graphics program (DeLano, 2008) that automates the importation, visualization, manipulation, and animation of molecular data (Fig. 2). Not only is it useful for visualizing molecules such as proteins and ligands, but it is also extensible through its Python code base. It reads in molecule data files and can access information about the atoms in the molecules, such as ordering, ID, position, and radius.


I spent significant time figuring out the accessible attributes (for example, the radius of an atom is ‘vdw’ for “van der Waals”). From there, I manipulated the accessed data in homemade Python functions to implement the DOCK algorithm (Kuntz, et al., 1982). I had to figure out how to create new geometry and molecules and manipulate their positions and attributes to represent my results. One of the best things about using PyMol is that it renders those results easily to the PyMol viewer. Final results can be ray-traced to view them in high quality.


The PyMol viewer is helpful for visual debugging. For example, while generating DOCK spheres, I was able to verify both numerically and visually that my DOCK spheres were tangent to the surface. At one point, I thought it worked numerically, but it was wrong visually. In digging deeper, I found that the camera and world spaces were not matching up. I could also visually debug my errors in lining up the ligands with the docking sites.


Finally, PyMol allowed me to add some interactivity to my final docking program. The user is able to visually inspect a protein’s surface and indicate probable docking sites with the stroke of a mouse. Though it is ideal for the algorithm to automatically find the most likely surface areas for docking, it can also help if the user culls out unwanted information before the algorithm even starts.






Protein and Ligand Data

To find ligand and protein models to import into PyMol, I gathered data from online small-molecule repositories. (Kirchmair, et al.) points to various sites with ligand repositories and drug docking applications. The Protein Data Bank (Rutgers and UCSD, 1971-2009) has a Ligand Expo (RCSB PDB) where one can search for single ligands or for groups of similar ligands. I also used the search site SuperTarget (Structural Bioinformatics Group). KEGG BRITE and BRENDA (Bleakley, et al., 2009), PSMDB (Wallach, et al., 2009), and PDB-Ligand (IDRTech, 2004) are other possible databases.

I found ligand data at the “Protein-Small-Molecule DataBase” (Lilien, et al.), which is built on the paper by (Wallach, et al., 2009). For development, I used Phospholipase A2 (PDB 2pws) and its accompanying ligand, Ibuprofen (PDB IBP) (Fig. 2).




Figure 1, Molecular Representation in PyMol

a) Sphere representation of a molecule’s atoms

b) Connolly’s surface

c) Selecting atoms in a molecule

d) Surface representation of the selection




Drug Docking Process

I implemented the DOCK algorithm (Kuntz, et al., 1982), which I will refer to as DOCK. My implementation follows the summaries of DOCK found in (Wolfson, 2002) and (Fred), which are based on Kuntz’s DOCK algorithm. Though I followed their outline, some of the implementation details are my own work and experimentation.


First, the user imports a protein into PyMol and interactively selects a subset of the protein (a set of amino acids) where there are possible docking sites. Second, given that receptor and a set of ligands, DOCK creates spheres adjacent to the receptor and ligand surfaces. Those spheres represent a simplified model of possible docking sites on the receptor and a simplified model of the ligand. Third, it finds all possible matches between the spheres representing the ligand and the receptor spheres. Finally, it computes the transformation matrix and overlap error for each of the matches. The transformation matrices are applied to the corresponding ligand.


I used Python inside the PyMol viewer to compute each of the steps. A major part of my problem solving dealt with figuring out how to take advantage of PyMol’s features to implement the docking and display the results.

The details of the implementation are as follows (Wolfson, 2002) (Fred):


1. Receptor and Ligand Model Representation

Set Up Molecular Surface Representation with Connolly’s Method

DOCK and PyMol both use Connolly’s method to create/display the molecule’s implicit surface. PyMol also queries the solvent radius and the radius of each atom from the imported molecule.


PyMol does not give direct, queryable access to the points (vertices) of a molecule’s surface display. DOCK setup needs those points, and I found that PyMol can write the surface out to an OBJ file (a 3D model format). I wrote a function that writes out the current selection’s surface and then parses it back in. Moreover, DOCK needs to know which vertices of the surface correspond to which atom, so I write out and parse in the surface for each atom independently, which gives me back exactly what I need (Fig. 3).

Figure 2, Receptor Protein Phospholipase A2 (light blue) and Ligand Ibuprofen (yellow)

a) Receptor as spheres, with ligand

b) Receptor as cartoon, with ligand

c) Side view of receptor and ligand


A problem I ran into is that when PyMol writes out surface data, it writes the vertex positions relative to the camera origin and orientation, not relative to the world-space coordinate system. To parse in the surface data, I wrote a function that grabs the camera information and converts each vertex position back to world space.



Figure 3, Accessing Atom and Amino Acid Surface Data in PyMol

a) User selects an amino acid

b) Atoms in the amino acid

c) Each atom is singled out

d) Connolly surface of the atom for export to OBJ




I implemented an Atom class to keep track of each atom’s position, radius, surface vertices, and so forth. The bounding box of the entire molecule is divided into a 3D hash table, or voxel grid, into which I hash each atom based on its position. The voxels are 0.9 angstroms (Å) across in all dimensions, which is smaller than the radii of the atoms, so the number of vertices in each voxel stays small.
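The voxel hashing described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the function names (`voxel_key`, `build_grid`, `neighbors_within`) are hypothetical, and a dictionary keyed by integer voxel indices stands in for the 3D hash table. Only the 0.9 Å voxel size comes from the text.

```python
import math
from collections import defaultdict

VOXEL_SIZE = 0.9  # angstroms; the voxel edge length quoted in the text


def voxel_key(position, origin, size=VOXEL_SIZE):
    """Map a 3D position to integer voxel indices relative to the grid origin."""
    return tuple(int(math.floor((p - o) / size)) for p, o in zip(position, origin))


def build_grid(points, size=VOXEL_SIZE):
    """Hash point indices into voxels; the origin is the bounding-box minimum."""
    origin = tuple(min(p[i] for p in points) for i in range(3))
    grid = defaultdict(list)
    for index, p in enumerate(points):
        grid[voxel_key(p, origin, size)].append(index)
    return origin, grid


def neighbors_within(points, origin, grid, query, radius, size=VOXEL_SIZE):
    """Indices of points within `radius` of `query`, visiting only nearby voxels."""
    reach = int(math.ceil(radius / size))  # how many voxels to step out
    cx, cy, cz = voxel_key(query, origin, size)
    found = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                # .get avoids creating empty voxels in the defaultdict
                for index in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if math.dist(points[index], query) <= radius:
                        found.append(index)
    return sorted(found)
```

A range query touches only the voxels within `reach` steps of the query voxel, which is what keeps neighbor searches independent of the total number of points.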


Set Up “Pseudo-Atom” Spheres to Represent Docking Sites and Ligand

With the cached Connolly surface, we calculate the SPHGEN model (Kuntz, et al., 1982) to represent the molecule’s cavities. SPHGEN fills cavities of the receptor’s surface with spheres. Each sphere is tangent to exactly two points on the Connolly surface, so no sphere can exist outside of a cavity. The SPHGEN-generated spheres are separated into groups based on connectivity (isolated chains of touching spheres). See Fig. 4.



e) Another atom isolated, and so forth

f) Connolly surface

g) Entire amino acid exported

h) Surface vertices and normals read back in


Figure 4, SPHGEN Spheres, diagram by (Fred)

SPHGEN spheres (blue) generated from Connolly surface (red). Up to one SPHGEN sphere exists per atom (black).





To find SPHGEN spheres, every vertex is in principle matched with every other vertex. A sphere tangent to both vertices is created only if its center lands on the normal of one of the pair’s vertices (normals point outside the surface). I ended up matching vertices only with other vertices that are within a certain distance, stepping through nearby voxels in the grid to pick out only vertices within that range. This avoids having to match every vertex with every other vertex (thousands) on the surface. The resulting spheres are stored in a list. Each Sphere object remembers the vertices that created it. I stored as much information as possible to get as much speed out of the algorithm as I could. For example, a Sphere keeps track of its index and the angle between its two vertices. Each Vertex points back to the Sphere(s) it is attached to, if any.
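One way to construct such a sphere for a vertex pair is sketched below. It assumes the normal is unit length and points away from the molecule, and the names `tangent_sphere` and `angle_at_center` are hypothetical. The radius follows from requiring the center c = p_i + r·n_i to be equidistant from both vertices, which gives r = |d|² / (2 n_i·d) with d = p_j − p_i.

```python
import math


def tangent_sphere(p_i, n_i, p_j):
    """Sphere touching the surface at p_i (center on its unit normal n_i)
    and passing through p_j. Returns (center, radius), or None if no such
    sphere exists on the outward side of the surface."""
    d = [b - a for a, b in zip(p_i, p_j)]
    d_sq = sum(c * c for c in d)
    n_dot_d = sum(n * c for n, c in zip(n_i, d))
    if d_sq == 0.0 or n_dot_d <= 0.0:
        return None  # identical points, or p_j lies behind the tangent plane
    radius = d_sq / (2.0 * n_dot_d)
    center = tuple(p + radius * n for p, n in zip(p_i, n_i))
    return center, radius


def angle_at_center(center, p_i, p_j):
    """Angle in degrees between the two vertices as seen from the sphere center."""
    u = [a - c for a, c in zip(p_i, center)]
    v = [b - c for b, c in zip(p_j, center)]
    cos_angle = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

The angle helper corresponds to the per-sphere angle that the text says each Sphere caches.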


Unwanted SPHGEN spheres are then culled out. If a vertex has multiple Spheres on its normal, only the smallest Sphere is kept (to get rid of Spheres interpenetrating the molecular surface). If the angle between a sphere’s two vertices, measured at the sphere’s center, is greater than 90°, that sphere is discarded (this gets rid of spheres that are not in a cavity). In the end, each atom can have only one corresponding SPHGEN sphere (to keep computation times during the matching phase acceptable), so each Atom goes through each of its Vertices and keeps only the Sphere with the largest radius. We end up with Fig. 5b,e.
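The three culling rules can be read as successive filters. The sketch below is only an illustration under assumptions: each sphere is a plain dictionary with hypothetical keys `vertex`, `atom`, `radius`, and `angle` (in degrees), and the rules are applied in the order the text lists them.

```python
def cull_spheres(spheres):
    """Apply the three culling rules in order and return the surviving spheres."""
    # Rule 1: for each generating vertex, keep only the smallest sphere.
    smallest = {}
    for s in spheres:
        best = smallest.get(s["vertex"])
        if best is None or s["radius"] < best["radius"]:
            smallest[s["vertex"]] = s

    # Rule 2: discard spheres whose angle at the center exceeds 90 degrees.
    in_cavity = [s for s in smallest.values() if s["angle"] <= 90.0]

    # Rule 3: for each atom, keep only the largest remaining sphere.
    largest = {}
    for s in in_cavity:
        best = largest.get(s["atom"])
        if best is None or s["radius"] > best["radius"]:
            largest[s["atom"]] = s

    return sorted(largest.values(), key=lambda s: s["atom"])
```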






Figure 5, SPHGEN Spheres for Receptor Protein and Ligand

a) Cached surface and normals of receptor selection

b) Tangent SPHGEN spheres to receptor cavities

c) Cached surface and normals of ligand

d) Tangent SPHGEN spheres inside ligand surface

e) Close front-view of receptor SPHGEN

f) Close front-view of ligand SPHGEN


The resulting receptor spheres, or “pseudo-atoms,” are grouped into groups of touching spheres. The molecule’s surface may have small cavities here and there that are represented by a small number of touching Spheres, but groups with long chains of intersecting Spheres pinpoint “canyons,” or long cavities running across the molecule. I extended the implementation to discard Sphere chains with only one sphere, which speeds up processing without affecting the results.
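Grouping by connectivity amounts to finding connected components in a graph whose edges join touching spheres. The sketch below is one possible reading, assuming two spheres "touch" when the distance between their centers is at most the sum of their radii; `group_touching` and `drop_singletons` are hypothetical names.

```python
import math


def group_touching(spheres):
    """Group spheres, given as (center, radius) tuples, into connected chains.
    Returns a list of sorted index lists."""
    unvisited = set(range(len(spheres)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, group = [start], [start]
        while stack:
            i = stack.pop()
            center_i, radius_i = spheres[i]
            touching = [
                j for j in unvisited
                if math.dist(center_i, spheres[j][0]) <= radius_i + spheres[j][1]
            ]
            for j in touching:
                unvisited.remove(j)
                stack.append(j)
                group.append(j)
        groups.append(sorted(group))
    return groups


def drop_singletons(groups):
    """Discard chains that contain only one sphere."""
    return [g for g in groups if len(g) > 1]
```

This pairwise version is quadratic in the number of spheres; the voxel grid could be reused to limit the touching test to nearby spheres.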


The receptor SPHGEN representation is now set up, but the ligand also needs a SPHGEN representation, since the matching step fits the ligand Spheres into receptor Spheres that have a similar arrangement. Representing the ligand with Spheres generally follows the same process as for the receptor Spheres, except that the normals on the ligand surface are inverted to point inside the surface.


(Fred) mentions there are other Sphere representations of the ligand that may work better than SPHGEN. Based on the idea of creating Spheres from the actual atom positions, I came up with a representation. I take atoms that are at the ends of the ligand molecule or that are connected to more than two other atoms, so the Spheres represent the major crossroads of the molecule.
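The selection rule can be expressed in terms of each atom’s bond count. The sketch below assumes the ligand is available as a list of bonded atom pairs (how that list is obtained, and whether hydrogens are included, is not specified in the text); `branching_atoms` is a hypothetical name.

```python
def branching_atoms(bonds):
    """Return atoms that are chain ends (one neighbor) or branch points
    (more than two neighbors), given bonds as (atom_a, atom_b) pairs."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return sorted(
        atom for atom, adjacent in neighbors.items()
        if len(adjacent) == 1 or len(adjacent) > 2
    )
```

For a chain 0-1-2-3 with an extra atom 4 bonded to atom 2, the rule keeps the ends 0, 3, and 4 and the branch point 2, and skips atom 1.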



2. Matching

Once the SPHGEN representations are set up, the next step is to compute all possible matches between the ligand Spheres and any subset of the receptor Spheres. The closest matches, within epsilon, are docking areas, where the ligand surface touches the maximal area of the receptor surface.


The matching algorithm takes any sphere from the receptor set and any sphere from the ligand set, resulting in a receptor-ligand sphere pair. Then, the algorithm goes through all other possible receptor-ligand Sphere pairs to find a match to our first Sphere pair. Another pair matches the original if the receptor spheres in the two pairs are the same distance apart as the ligand spheres in the two pairs, within some epsilon. This creates a set of matching pairs. To add additional pairs to the set, we continue through all possible receptor-ligand pairs and add each one that has matching receptor-to-receptor and ligand-to-ligand distances (within some epsilon) for each pair in the existing set. See Figures 6 and 7.


For a successful receptor-ligand match-up, not all ligand Spheres need to have a matching receptor sphere, but the number of matching pairs has to exceed some threshold for the pair set to be kept. Our threshold was set to five.







Figure 6, Matching ligand spheres to receptor

a) Receptor and ligand SPHGEN spheres. b) (R0, L0) = (3, 0). c) Looking at (R, L) pair (1, 1), receptor distance 3-1 equals ligand distance 0-1, so (R1, L1) = (1, 1). d) Looking at (R, L) pair (2, 2), receptor distance 2-1 equals ligand distance 2-1 and receptor distance 2-3 equals ligand distance 2-0, so (R2, L2) = (2, 2).




There are as many initial matching pairs as there are combinations of receptor-ligand spheres. Since this is a permutation problem, it can have exponential complexity. To speed things up, a distance matrix is first created for each of the receptor and the ligand. The distance matrix caches the distance from each Sphere in the molecule to every other Sphere. To test the distances between the receptor Spheres and the ligand Spheres for each pair, one only needs to do a lookup in each respective distance matrix.
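The distance-matrix lookup might look like the following. This is a sketch with hypothetical names (`distance_matrix`, `pair_compatible`); pairs are written as (receptor index, ligand index) and the tolerance is the epsilon discussed in the text.

```python
import math


def distance_matrix(centers):
    """Cache the distance from every sphere center to every other one."""
    n = len(centers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = math.dist(centers[i], centers[j])
    return matrix


def pair_compatible(rec_d, lig_d, pair_a, pair_b, epsilon):
    """True if the two receptor spheres are as far apart as the two ligand
    spheres, within epsilon. Each pair is (receptor_index, ligand_index)."""
    (rec_a, lig_a), (rec_b, lig_b) = pair_a, pair_b
    return abs(rec_d[rec_a][rec_b] - lig_d[lig_a][lig_b]) <= epsilon
```

Each compatibility test is then two table lookups and a subtraction, rather than two square-root computations.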


(Wolfson, 2002) talks about using an interpretation tree or a matching graph. I decided to find the pair sets recursively. I take the first Sphere of the ligand (L0) and iteratively match it with each of the receptor Spheres (R0). Each of those initial pairs (L0, R0) is passed separately into the recursive function.


The recursive function loops through the list, L, of remaining ligand Spheres, and for each Li it also loops through the list, R, of remaining receptor Spheres to find whether there is an Ri such that distance(L0, Li) == distance(R0, Ri), within epsilon, for all already matching pairs (L0, R0). See Fig. 6. If a match (Li, Ri) is found, then it is added to the set of matching pairs, Li and Ri are removed from L and R, and they are passed into the recursive function. If no new match is found given some set of matching pairs, the recursive function is not called again, and it returns the matching pair set.

Figure 7, Molecular Representation in PyMol

a) Original position of ligand SPHGEN spheres

b) Ligand spheres (red) mapped to receptor subset (blue)

c) All together

d) Another view of the original position


This algorithm can become exponential at worst, but I take advantage of the fact that considering all pairs can return multiple matching pair sets that are equal (ordering of the pairs does not matter). This means that many permutations can be ignored since they are simply reorderings of already found pair sets. To achieve this, even receptor Spheres, Ri, that do not find a matching Li are dropped out of R so that they will not be retested if the recursive function is called again.
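A compact sketch of a recursive matcher in this spirit follows. It is not the original implementation: instead of pruning the receptor list, it swaps in a simpler way of ignoring reorderings, remembering every pair set already explored as an order-independent `frozenset`. The name `match_sets` and the `min_pairs` parameter are hypothetical, pairs are written (ligand index, receptor index), and the inputs are precomputed distance matrices.

```python
def match_sets(lig_d, rec_d, epsilon, min_pairs):
    """Return all pair sets that cannot be extended further and contain at
    least `min_pairs` (ligand, receptor) pairs, as a set of frozensets."""
    n_lig, n_rec = len(lig_d), len(rec_d)
    seen = set()     # pair sets already explored, regardless of ordering
    results = set()

    def compatible(pairs, lig, rec):
        # The new pair must agree in distance with every pair already in the set.
        return all(
            abs(lig_d[lig][pl] - rec_d[rec][pr]) <= epsilon for pl, pr in pairs
        )

    def grow(pairs):
        key = frozenset(pairs)
        if key in seen:
            return  # a reordering of this set was already explored
        seen.add(key)
        used_lig = {pl for pl, _ in pairs}
        used_rec = {pr for _, pr in pairs}
        extended = False
        for lig in range(n_lig):
            if lig in used_lig:
                continue
            for rec in range(n_rec):
                if rec in used_rec:
                    continue
                if compatible(pairs, lig, rec):
                    extended = True
                    grow(pairs + [(lig, rec)])
        if not extended and len(pairs) >= min_pairs:
            results.add(key)

    for lig0 in range(n_lig):
        for rec0 in range(n_rec):
            grow([(lig0, rec0)])
    return results
```

The recursion depth is bounded by the number of ligand spheres, but the number of explored sets can still grow exponentially in the worst case.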


We have to take into account some error in the overlap of our ligand representation onto the receptor, since it is a simplified model. The value recommended for the distance-comparison epsilon is somewhere between 1 Å and 2 Å, since (Kuntz, et al., 1982) found that to be a reasonable middle ground between finding potential matches and not finding too many false matches.


3. Docking and Culling

Each set of matching receptor-ligand pairs guarantees that its receptor Spheres and its ligand Spheres correspond spatially (within some epsilon). However, the relative orientation of the receptor and ligand Spheres is completely arbitrary, so the ligand spheres need to be rotated to line up with the receptor spheres. That rotation is then used to line up the actual ligand with the receptor protein. See Fig. 8.


The method used to line up two sets of points in 3D space is based on (Arun, et al., 1987). It computes the least-squares rotation fit of the point sets. It translates both point sets to the origin by subtracting out the average of the point positions (their centroid) for each set respectively. Next, it passes the matrix product of both point arrays (which results in a 3x3 matrix) into a Singular Value Decomposition (SVD) function. The SVD yields the least-squares rotation matrix and the error used to compute the Root Mean Square Deviation.
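For reference, the least-squares fit of (Arun, et al., 1987) can be written out as below. The symbols are introduced here for illustration: l_i and r_i denote the N matched ligand and receptor sphere centers, and the RMSD line is the standard definition rather than a formula quoted from the write-up.

```latex
\begin{align*}
\bar{l} &= \frac{1}{N}\sum_{i=1}^{N} l_i, \qquad
\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i
  && \text{(centroids)} \\
q_i &= l_i - \bar{l}, \qquad q'_i = r_i - \bar{r}
  && \text{(centered points)} \\
H &= \sum_{i=1}^{N} q_i \, {q'_i}^{T}
  && \text{($3 \times 3$ correlation matrix)} \\
H &= U \Lambda V^{T}
  && \text{(singular value decomposition)} \\
R &= V U^{T}
  && \text{(rotation, valid when } \det R = +1\text{)} \\
t &= \bar{r} - R\,\bar{l}
  && \text{(translation)} \\
\mathrm{RMSD} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\lVert R\,q_i - q'_i \right\rVert^{2}}
\end{align*}
```

If det R = -1, the SVD has produced a reflection rather than a rotation, and that case has to be handled separately, as the original paper discusses.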


I then take the resulting rotation matrix and combine it into a transformation matrix that moves and rotates a copy of the ligand to each matching receptor docking site. This transformation translates the ligand to the origin, rotates it to match the receptor docking site orientation, and then translates it to the center of the docking site.
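The three-step transformation (translate to the origin, rotate, translate to the site) can be sketched in plain Python as below. The names `centroid` and `apply_rigid_fit` are hypothetical, and the rotation is assumed to be a 3x3 matrix given as a list of rows, such as the one produced by the SVD fit.

```python
def centroid(points):
    """Average position of a list of 3D points."""
    n = float(len(points))
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def apply_rigid_fit(points, rotation, source_center, target_center):
    """Map each point p to R (p - source_center) + target_center."""
    moved = []
    for p in points:
        # Step 1: translate so the source center sits at the origin.
        local = [p[i] - source_center[i] for i in range(3)]
        # Step 2: rotate about the origin.
        rotated = [sum(rotation[row][k] * local[k] for k in range(3))
                   for row in range(3)]
        # Step 3: translate to the center of the docking site.
        moved.append(tuple(rotated[i] + target_center[i] for i in range(3)))
    return moved
```

In practice the three steps are usually folded into a single 4x4 homogeneous matrix so that one matrix can be handed to the viewer; the per-point form above is only meant to show the order of operations.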






Results

Data Structure Set Up

Setting up data structures for the protein stays within reasonable setup times. It averages 0.087 seconds per atom, approximately constant time per atom, so the setup time is O(Na), where Na is the number of atoms. My test cases read in between 100 and 1000 atoms, which keeps the setup time under 2 minutes.


Selecting potential docking sites by manually selecting amino acids works well with smaller proteins, where there might be only one or two major cavities, but on larger proteins (e.g. 3DMK), this is not so effective without selecting the entire protein.



Figure 8, Receptor-Ligand Sphere Alignment

a) Ligand SPHGEN spheres at original position

b) Possible subset of receptor SPHGEN spheres that will match ligand SPHGEN spheres

c) Receptor (purple) and ligand (red) spheres match within some epsilon

d) Ligand copied and repositioned to match ligand spheres


SPHGEN

With the voxel hash table I set up, SPHGEN runs in O(R^2) time, where a vertex searches out R voxels into the voxel grid, a distance related to its atom’s radius, with R << grid size. Only vertices in those voxels are tested against. In my tests, it averages around 0.01 seconds to compute spheres for each vertex. This is much quicker than O(Nv^2), where Nv is the number of vertices, which numbers in the thousands.


Table 1, Some example numbers of generated data

         # Atoms   # Vertices   # Receptor Spheres   # Ligand Spheres   # Receptor Sets   # Ligand Sets
Case 1   261       2599         122                  13                 6                 2
Case 2   333       3112         147                  11                 8                 1


The SPHGEN spheres successfully represented the receptor’s cavities, once I was able to compute a proper transformation of the vertices to world space (Fig. 9). I meticulously made sure this step was working. The sphere groupings turned out especially useful because the large groups pinpoint large cavity “canyons,” which turn out to be the best potential docking sites. This led me to experiment with getting rid of small groups of SPHGEN Spheres. Getting rid of groups of 1 appears to have no effect. Getting rid of groups of 2 removed only an insignificant number of potential docking areas, but I still kept them and deleted only groups of 1.


Figure 9, SPHGEN Sphere Tangency


Tangency of receptor SPHGEN sphere to the receptor surface.


Matching

The recursive algorithm did not have memory problems, since its maximum depth is the number of spheres in the ligand, which for Ibuprofen is between 11 and 13. However, speed-wise, even though the number of ligand spheres seems reasonably small, with exponential run time it takes hours per ligand without the speedups (I never finished timing it, though it was heading for 4-6 hours with around 250 receptor atoms).


With the speedups, each initial pair passed into the recursive algorithm takes an average of 0.085 seconds in a test case of 71 receptor spheres and 13 ligand spheres, though most pairs come out of the recursive function very quickly. Pairs finding successful docking sites took an average of 0.37 seconds to traverse deep enough to find matches. In a test case of 129 receptor spheres and 13 ligand spheres, each initial pair took an average of 0.35 seconds. The pairs that find a docking site average 1.15 seconds. This is with an epsilon of 1.5 Å. As the number of receptor spheres grows, the compute time grows faster than linearly. With ~1300 atoms selected in PDB 3DMK, there were 740 spheres, so average times were up to ~16 seconds per initial pair. That was too long to run sufficient multiple-ligand tests in the time I had.



One of the less defined factors is the epsilon that defines the allowed superposition error when matching up the receptor and ligand spheres. There will not be perfect match-ups of the ligand spheres on any of the subsets of receptor spheres, so there needs to be an epsilon. Too high an epsilon can result in too many false matches. However, if the epsilon is too low, it misses some matches of ligands that may still be a good fit, even as it culls out incorrect matches (Fig. 10). I found the range of good epsilons to be approximately between 1 Å and 2 Å, as (Fred)’s results did, where 1 Å ended up being a bit too low and 2 Å too high in my case.


Figure 10, IBP Best Matches

a) Best match for Epsilon = 1.4. Original ligand = green, matched ligands = blue

b) Best match for Epsilon = 1.5. Original ligand = green, matched ligands = blue



The matches with the lowest error do not always prove to be the best matches. For example, matches with lower sphere counts may have low error, but the end of the ligand where no spheres matched may penetrate the surface (Fig. 11). I think this could be resolved by giving higher priority to matches with more pairs, by testing for penetration, or by following some suggestions I read in the optimization segment of (Fred) but did not have time to implement (Table 2).


Table 2, Results with differing epsilons

Drug   Root Mean Sq. Deviation   Minimum RMSD   Average RMSD

Visually closest matching transformation, Epsilon=1.4
IBP    1.254                     1.254          1.343
BFL    0.732, 0.654              0.606          0.798

Visually closest matching transformation, Epsilon=1.5
IBP    1.01                      1.01           1.39
BFL    0.654, 0.663              0.606          0.817



Figure 11, Penetrating Docking Ligand


Ligand penetrates because matching spheres weight to one side



Finding an appropriate epsilon is easier if I already have a known protein-drug pair, such as PDB 2PWS and Ibuprofen. I can lower it until I see the best matches disappearing. Of course, this is not ideal when we do not know any drugs that fit.


Using the actual ligand atoms as sphere placements worked better than the ligand SPHGEN Spheres, as (Fred) mentioned might be the case. My method of pinpointing atoms at branching intersections of the ligand represented the general shape of the ligand more closely than SPHGEN and took fewer spheres in the 2pws test case (11 instead of 13 for Ibuprofen). See Fig. 12.


Figure 12, SPHGEN vs. Branching Methods for Ligand Spheres

a) Ligand spheres generated with SPHGEN. The original ligand (blue) and matched (yellow)

b) Ligand spheres generated with branching method. The original ligand (blue) and matched (yellow)


Until the last minute, there was a flaw in my logic of always having the 0th ligand sphere as part of the initial receptor-ligand match pair, since the 0th sphere may not always be included in the best match. However, it was enough to get an idea of how well it works. I fixed it to loop through every sphere in the ligand set as part of the initial pair, with similar results, as seen in Figure 13. That run used the branching method to generate ligand spheres and had an epsilon of 1.5. This takes longer to simulate, of course, since the run time is multiplied by the number of ligand spheres.


Figure 13, Matching Algorithm Fix





I implemented DOCK so that it could compare multiple ligands, but the Trisomy 21 protein PDB 3DMK ran out of memory with 12 ligands, though it got close to the end.


Future Work

After all my work, I only scratched the surface of the experimental code stage, but here are some ideas of where the research might go next, based on others’ ideas as well as my own.


There are other methods of narrowing down potential cavities for docking sites before running the DOCK algorithm, besides manually selecting areas or depending completely on the long chains of SPHGEN spheres. Another method of finding docking sites uses a flood-filled voxel grid to determine where pockets are (Li, et al., 2008). Dr. Clement came up with the idea of using similar DNA sequences for a protein that appear in a variety of species. One could also experiment to see what happens when docking sites are found without taking into account the surface solvent distance.


For future work, one could also experiment with how the receptor SPHGEN spheres are used. Right now, we match ligand sphere positions to receptor sphere positions. However, another possibility would be to align them completely based on the skeleton of the receptor spheres. Maybe more than one sphere per atom could be allowed to produce a more faithful skeleton.


As for additional matching ideas, it would be good to experiment with a more intelligent method of producing an epsilon value than hard-coding it. Also, I wonder if the voxel grid could be useful during matching, similar to how it helps narrow down possible docking sites. More work could be done on culling out more SPHGEN spheres, such as those that are too far away from the major canyons.


I was not able to finish the final optimization step, where possible matches are culled down to the most appropriate one. It is well-documented in (Fred).


Memory management must be addressed more, since it is necessary for reasonable implementation speed.


Conclusion

The DOCK algorithm is a strong base for docking ligands to proteins. It is comparatively fast, though the tradeoff of memory for speed leaves something to be desired on the memory end. However, with the base implementation, I feel there are many possibilities for progress in the realm of drug docking.




Works Cited


Arun K. S., Huang T. S. and Blostein S. D. Least-Squares Fitting of Two 3-D Point Sets [Journal] // IEEE Transactions on Pattern Analysis and Machine Intelligence. - [s.l.] : IEEE, Sep 1987. - 5 : Vol. 9. - pp. 698-700.

Bleakley Kevin and Yamanishi Yoshihiro Supervised prediction of drug-target interactions using bipartite local models [Journal]. - [s.l.] : Bioinformatics, 2009. - 18 : Vol. 25.

Connolly M. L. Solvent-Accessible Surfaces of Proteins and Nucleic-Acids [Journal]. - [s.l.] : Science, 1983. - Vol. 221. - pp. 709-713.

DeLano W. L. The PyMOL Molecular Viewer [Computer Program]. - San Carlos, CA, USA : DeLano Scientific, 2008. - http://pymol.sourceforge.net/.

Fred The Docking Algorithm [Online]. - http://fred.bioinf.uni-sb.de:9180/AH/Members/Dock/script/part4.pdf/download.

Gerber Paul Moloc Home Page -> Molecular Surfaces [Online] // UCSF Macromolecular Structure Group. - University of California, San Francisco, Jan 1998. - http://www.msg.ucsf.edu/local/programs/moloc/home.html.

IDRTech [Online] // PDB-Ligand. - IDRTech Inc., 2004. - 2009. - http://www.idrtech.com/PDB-Ligand/.

Kirchmair Johannes [et al.] PDB Tools [Online] // Universitat Innsbruck. - 2009. - http://www.uibk.ac.at/pharmazie/phchem/camd/pdbtools.html.

Kuntz Irwin D. [et al.] A Geometric Approach to Macromolecule-ligand Interactions [Journal] // Journal of Molecular Biology. - [s.l.] : Elsevier, Oct 25, 1982. - 2 : Vol. 161. - pp. 269-288.

Li Bin [et al.] Characterization of Local Geometry of Protein Surfaces with the Visibility Criterion [Journal] // Proteins. - 2008. - Vol. 71. - pp. 670-683.

Lilien Ryan and Wallach Izhar Downloads Page [Online] // UofT PSMDB. - University of Toronto. - http://compbio.cs.toronto.edu/psmdb/.

Orengo C. A., Jones D. T. and Thornton J. M. Bioinformatics: Genes, Proteins, and Computers [Book]. - Oxford : BIOS Scientific Publishers Limited, 2003.

Periwal Vipul Computational Constraints on Modeling in Systems Biology [Book Section] // System Modeling in Cellular Biology: From Concepts to Nuts and Bolts / book auth. Szallasi Zoltan, Stelling Jorg and Periwal Vipul. - Boston : The MIT Press, 2006.

RCSB PDB [Online] // Ligand Expo. - http://ligand-expo.rcsb.org/.

Rutgers and UCSD [Online] // RCSB Protein Data Bank. - Research Collaboratory for Structural Bioinformatics (non-profit consortium), 1971-2009. - 2009. - http://www.rcsb.org/pdb/home/home.do.

Schneidman-Duhovny Dina, Wolfson Haim J. and Nussinov Ruth Predicting Molecular Interactions in silico: II. Protein-Protein and Protein-Drug Docking [Journal] // Current Medicinal Chemistry. - 2004. - pp. 91-107.

Structural Bioinformatics Group Institute of Molecular Biology and Bioinformatics [Online] // SuperTarget. - Universitätsklinikum Berlin. - 2009. - http://bioinf-tomcat.charite.de/supertarget/.

Wallach Izhar and Lilien Ryan The Protein Small-Molecule Database (PSMDB), A Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding [Journal]. - [s.l.] : Bioinformatics, 2009. - 5 : Vol. 25. - pp. 615-20.

Wolfson Haim J. Rigid and Flexible Docking [Online] // BioInfo3D. - Tel Aviv University, January 2002. - October 2009. - http://bioinfo3d.cs.tau.ac.il/Education/CS99b/class_notes/class6.html.

Yang Chao-Yie [et al.] Importance of Ligand Reorganization Free Energy in Protein-Ligand Binding-Affinity Prediction [Journal]. - [s.l.] : American Chemical Society, 2009.