BMC Bioinformatics


Dec 5, 2012

Thinking Outside the Box: Applications Including Finding Off-targets for Major Pharmaceuticals

Philip E. Bourne

pbourne@ucsd.edu

Agenda


Overall Theme - Thinking differently about proteins:

Spherical harmonics and phylogeny

The Gaussian Network Model and new modes of motion

The Geometric Potential for Describing Ligand Binding Sites

SOIPPA for finding off-site targets

The Curse of the Ribbon

The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but it's time for new views. It is not how a ligand sees a protein, after all.

Limitations


A local viewpoint does not capture the global properties of a protein

Cartesian coordinates do not necessarily capture the properties of the protein

Comparative analysis is limited


Protein Kinase A - Open Book View

Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49

Superfamily Members - The Same But Different

Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49

An Alternative Approach: Multipolar Representation

Roots in spherical harmonics

Parameter space and boundary conditions can be a variety of properties

Order of the multipoles defines the granularity of the descriptors

Bottom line: the multipoles can be interpreted as shape descriptors

Gramada & Bourne 2006 BMC Bioinformatics 7:242

Geometric Comparison Does Not Reflect Biological Reality

Gramada & Bourne 2006 BMC Bioinformatics 7:242

Results - Protein Kinase Like Superfamily Alignment

Clear distinction between families.

Some clustering is seen inside the TPKs that resembles the various groups, even though there is little shape discrimination at this level.

Gramada & Bourne 2006 BMC Bioinformatics 7:242


Possibilities - Structure Based Phylogenetic Analysis

Scheeff & Bourne

Multipoles

Gramada & Bourne 2007 PLoS ONE, submitted


Protein Motion

Ordered Structures

Disordered Structures

Structures exist in a spectrum from order to disorder

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Obtaining Protein Dynamic Information

Protein Structures Treated as a 3-D Elastic Network

Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.

Gaussian Network Model

Each Cα is a node in the network.

Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å).

Decompose protein fluctuation into a summation of different modes.
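
The Gaussian Network Model statements above can be sketched numerically. This is a minimal illustration, assuming NumPy is available and using a hypothetical six-node chain rather than a real structure; the function names (`kirchhoff`, `gnm_modes`, `mode_fluctuations`) are illustrative and not taken from the cited paper. It builds the connectivity matrix with the 7 Å cutoff, diagonalizes it, and writes each node's fluctuation as a sum over modes (up to a constant prefactor).

```python
import numpy as np

def kirchhoff(coords, cutoff=7.0):
    """Connectivity (Kirchhoff) matrix: -1 for node pairs within the cutoff, node degree on the diagonal."""
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    gamma = -(dists <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)                 # ignore self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # diagonal = number of neighbors
    return gamma

def gnm_modes(gamma, tol=1e-8):
    """Eigen-decompose the Kirchhoff matrix and drop the zero-eigenvalue (rigid-body) mode."""
    evals, evecs = np.linalg.eigh(gamma)
    keep = evals > tol
    return evals[keep], evecs[:, keep]

def mode_fluctuations(evals, evecs):
    """Contribution of each mode to each node's mean-square fluctuation (constant prefactor omitted)."""
    return (evecs ** 2) / evals  # shape: (n_nodes, n_modes)

# Hypothetical toy input: six pseudo C-alpha atoms spaced 3.8 A apart along the x axis.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
gamma = kirchhoff(coords)
evals, evecs = gnm_modes(gamma)
per_mode = mode_fluctuations(evals, evecs)
total = per_mode.sum(axis=1)  # summation over all modes gives the per-node fluctuation
print(np.round(total, 3))
```

In this toy chain the end nodes have fewer neighbors than the interior nodes, so their summed fluctuation comes out larger, which is the qualitative behavior expected of chain termini.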


Functional Flexibility Score


Utilize correlated movements to help define regional flexibility with functional importance.

Functionally Flexible Score

For each residue:

1. Find maximum and minimum correlation.

2. Use these to scale the normalized fluctuation to determine functional importance.

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
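
The two steps above can be illustrated with a small sketch. The slide does not give the scaling formula, so multiplying the normalized fluctuation by the range (maximum minus minimum) of a residue's correlations is purely an assumption made here for illustration; the function name `flexibility_score` and the 3×3 covariance matrix are likewise hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def flexibility_score(cov):
    """Illustrative per-residue score from a covariance matrix of residue fluctuations."""
    cov = np.asarray(cov, dtype=float)
    fluct = np.diag(cov)                           # mean-square fluctuation per residue
    corr = cov / np.sqrt(np.outer(fluct, fluct))   # normalized cross-correlation
    off_diag = corr.copy()
    np.fill_diagonal(off_diag, np.nan)             # exclude self-correlation
    c_max = np.nanmax(off_diag, axis=1)            # step 1: maximum correlation
    c_min = np.nanmin(off_diag, axis=1)            # step 1: minimum correlation
    norm_fluct = fluct / fluct.max()               # normalized fluctuation
    # ASSUMPTION: scale by the correlation range; the published scaling may differ.
    return norm_fluct * (c_max - c_min)

# Hypothetical covariance matrix for three residues.
cov = [[2.0, 0.5, -0.8],
       [0.5, 1.0, 0.1],
       [-0.8, 0.1, 1.5]]
scores = flexibility_score(cov)
print(np.round(scores, 3))
```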

Identifying FFRs in HIV Protease

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Other Examples: BPTI and Calmodulin

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Side Note: Gaussian Network Model vs Molecular Dynamics


GNM relatively coarse-grained


GNM fast to compute vs MD


Look over larger time scales


Suitable for high throughput


Motivation


What if we can characterize a protein-ligand binding site from a 3D structure (primary site) and search for that site on a proteome-wide scale?

We could perhaps find alternative binding sites (secondary sites) for existing pharmaceuticals?

We could use it for lead optimization and possible ADME/Tox prediction

Background


PDB Contains Major Pharmaceuticals Bound to Receptors

Generic Name   Other Name           Treatment                             PDB ID
Lipitor        Atorvastatin         High cholesterol                      1HWK, 1HW8…
Testosterone   Testosterone         Osteoporosis                          1AFS, 1I9J ..
Taxol          Paclitaxel           Cancer                                1JFF, 2HXF, 2HXH
Viagra         Sildenafil citrate   ED, pulmonary arterial hypertension   1TBF, 1UDT, 1XOS..
Digoxin        Lanoxin              Congestive heart failure              1IGJ

Background


Superfamily (Derived from Structure) Covers 38% of the Human Proteome

http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY

Background


Advantage to Using Functional Site Similarity

Protein Sequence/Structure Similarity
  Not adequately reflecting functional relationship
  Not directly addressing drug design problem

Small Molecule Similarity
  Poor correlation between structure and activity
  Infinite chemical space

Protein Functional Site Similarity
  Build closer structure-function relationships
  Limit chemical space through co-evolution

Overview of Algorithm




Protein structure is represented with Cα atoms only and is characterized with a geometric potential

  tolerant to protein flexibility and model uncertainty

Optimum superimposition is achieved with a maximum weighted sub-graph algorithm with geometric constraints

  sequence order independent to detect cross-fold relationships

  to identify sub-site similarity

Functional site similarity is measured with both evolutionary correlation and physicochemical similarity

  to distinguish divergent and convergent evolution

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Characterization of the Ligand Binding Site - Conceptual

1. Represent the protein structure

2. Determine the environmental boundary

3. Determine the protein boundary

4. Computation of the geometric potential

5. Computation of the virtual ligand

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9


Characterization of the Ligand Binding Site - Conceptual

Initially assign each Cα atom a value that is its distance to the environmental boundary

Update the value with those of surrounding Cα atoms, dependent on distances and orientation

$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$$

where P is the initial value of the Cα atom, and P_i, D_i and α_i are the initial value, distance and orientation angle of neighbor i; the neighbors i are the Cα atoms within a 10 Å radius

Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
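
As a worked illustration of the geometric potential update, the sketch below evaluates GP = P + Σ [P_i / (D_i + 1.0)] × [(cos(α_i) + 1.0) / 2.0] for one Cα atom. The function name, the tuple layout for neighbors, and the numbers are hypothetical, and the angle α_i is simply taken as given (in radians) because the slide does not define how the orientation is measured.

```python
import math

def geometric_potential(p, neighbors, radius=10.0):
    """Update a C-alpha atom's initial value p with contributions from nearby atoms.

    neighbors: iterable of (p_i, d_i, alpha_i) tuples, i.e. the neighbor's initial
    value, its distance in Angstroms, and its orientation angle in radians.
    Neighbors farther away than `radius` are ignored.
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        if d_i > radius:
            continue
        total += (p_i / (d_i + 1.0)) * ((math.cos(alpha_i) + 1.0) / 2.0)
    return total

# Hypothetical neighbors of one atom whose initial value is 2.0.
neighbors = [
    (4.0, 3.0, 0.0),           # 4/4 * 1.0 -> contributes 1.0
    (6.0, 5.0, math.pi),       # 6/6 * 0.0 -> contributes 0.0
    (9.0, 8.0, math.pi / 2),   # 9/9 * 0.5 -> contributes 0.5
    (5.0, 12.0, 0.0),          # outside the 10 A radius, ignored
]
gp = geometric_potential(2.0, neighbors)
print(round(gp, 6))
```

A neighbor that is close and favorably oriented (α near 0) contributes most, while one with α near π contributes almost nothing, which is how the value comes to depend on both distance and orientation.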

Discrimination Power of the Geometric Potential

[Figure: geometric potential plotted for two series, binding site and non-binding site; Geometric Potential Scale from 0 to 100]

Geometric potential can distinguish binding and non-binding sites

Boundary Accuracy of Ligand Binding Site Prediction

[Figure: two histograms, Distribution (%) against Sensitivity (%) and Distribution (%) against Specificity (%)]

~90% of the binding sites can be identified with above 50% sensitivity

The specificity of ~70% of the binding sites identified is above 90%

So Far…


Geometric potential dependent on local environment of a residue, relative to other residues and the environmental boundary

Geometric potential reasonably good at discriminating between ligand binding sites and non-ligand binding sites

Boundary of the binding site reasonably well defined

How to compare sites?




Identification of Functional Similarity with Local Sequence Order Independent Alignment

Geometric and graph characterization of the protein structure

Chemical similarity matrix and evolutionary relationship with profile-profile comparison

Optimum alignment with maximum-weight sub-graph algorithm

Xie and Bourne 2007 PNAS, Submitted

Similarity Matrix of Alignment

Chemical Similarity

Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)

Amino acid chemical similarity matrix

Evolutionary Correlation

Amino acid substitution matrix such as BLOSUM45

Similarity score between two sequence profiles

$$d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i$$

f_a, f_b are the 20 amino acid target frequencies of profile a and b, respectively

S_a, S_b are the PSSM of profile a and b, respectively

Xie and Bourne 2007 PNAS, Submitted
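
The profile-profile score d = Σ_i f_a^i S_b^i + Σ_i f_b^i S_a^i can be evaluated directly once the frequency and PSSM columns are available. The sketch below is only an illustration: the function name is hypothetical, and the example uses a made-up three-letter alphabet to keep the numbers readable, whereas real profile columns have 20 entries.

```python
def profile_score(f_a, s_a, f_b, s_b):
    """Score one column pair: frequencies of each profile against the PSSM of the other."""
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must cover the same alphabet")
    a_against_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_against_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_against_b + b_against_a

# Hypothetical columns over a three-letter alphabet.
f_a, s_a = [0.5, 0.5, 0.0], [2.0, 1.0, -3.0]
f_b, s_b = [1.0, 0.0, 0.0], [3.0, -1.0, -2.0]
d = profile_score(f_a, s_a, f_b, s_b)
print(d)
```

The score is symmetric in the two profiles, so swapping a and b leaves d unchanged.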

Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm

[Figure: Structure A and Structure B, each shown with residues L E R and V K D L]

Build an associated graph from the graph representations of two structures being compared. Each of the nodes is assigned a weight from the similarity matrix

The maximum-weight clique corresponds to the optimum alignment of the two structures
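
The association-graph idea can be shown on a toy case. This is a sketch under stated assumptions, not the SOIPPA implementation: the compatibility rule (two residue pairings are compatible when their intra-structure distances agree within a tolerance), the exhaustive clique search, and all names and coordinates are illustrative choices. An exhaustive search is only practical for very small graphs.

```python
import math
from itertools import product

def association_graph(coords_a, coords_b, similarity, tol=0.5):
    """Nodes are residue pairings (i in A, j in B); edges join geometrically compatible pairings."""
    nodes = [(i, j) for i, j in product(range(len(coords_a)), range(len(coords_b)))
             if similarity[i][j] > 0]

    def compatible(u, v):
        (i, j), (k, l) = u, v
        if i == k or j == l:          # each residue may be matched only once
            return False
        gap = math.dist(coords_a[i], coords_a[k]) - math.dist(coords_b[j], coords_b[l])
        return abs(gap) <= tol

    adjacency = {u: {v for v in nodes if compatible(u, v)} for u in nodes}
    weights = {(i, j): similarity[i][j] for i, j in nodes}
    return adjacency, weights

def max_weight_clique(adjacency, weights):
    """Exhaustive search for the clique with the largest total node weight."""
    best_weight, best_clique = 0.0, []

    def extend(clique, total, candidates):
        nonlocal best_weight, best_clique
        if total > best_weight:
            best_weight, best_clique = total, clique
        for v in sorted(candidates):
            extend(clique + [v], total + weights[v], candidates & adjacency[v])
            candidates = candidates - {v}   # never revisit v at this level

    extend([], 0.0, set(adjacency))
    return best_weight, best_clique

# Hypothetical example: B holds the same three points as A, listed in a different order.
coords_a = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
coords_b = [(0, 4, 0), (0, 0, 0), (3, 0, 0)]
similarity = [[1.0] * 3 for _ in range(3)]
adjacency, weights = association_graph(coords_a, coords_b, similarity)
total_weight, alignment = max_weight_clique(adjacency, weights)
print(total_weight, alignment)
```

Because the pairing is driven by pairwise distances rather than list position, the recovered alignment maps each point of A to its counterpart in B even though the order differs, which is the sense in which the alignment is sequence-order independent.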

Efficient Functional Site Comparison with Evolutionary and Geometric Constraints

The search space is segmented with the residue clusters determined from the geometric potential

The nodes and edges are greatly reduced with the robust residue boundary orientation and neighbors

The time complexity is almost linearly dependent on the number of residues

Improved Performance of Alignment Quality and Search Sensitivity and Specificity

[Figure, first panel: Frequency (%) against RMSD (Ångströms) for four scores: Amino Acid Grouping, Chemical Similarity, Substitution Matrix, Profile-Profile]

[Figure, second panel: True Positive Ratio and False Positive Ratio for the same four scores]

RMSD distribution of the aligned common fragments of ligands from 247 test cases showing four scores: amino acid grouping, chemical similarity, substitution matrix and profile-profile.

So What is the Potential of this Methodology?

Lead Discovery from Fragment Assembly

Privileged molecular moieties in medicinal chemistry

Structural genomics and high throughput screening generate a large number of protein-fragment complexes

Similar sub-site detection enhances the application of fragment assembly strategies in drug discovery

1HQC: Holliday junction migration motor protein from Thermus thermophilus

1ZEF: Rio1 atypical serine protein kinase from A. fulgidus

Lead Optimization from Conformational Constraints

Same ligand can bind to different proteins, but with different conformations

By recognizing the conformational changes in the binding site, it is possible to improve the binding specificity with conformational constraints placed on the ligand

1ECJ: amido-phosphoribosyltransferase from E. coli

1H3D: ATP-phosphoribosyltransferase from E. coli

Finding Secondary Binding Sites for Major Pharmaceuticals

Scan known binding sites for major pharmaceuticals bound to their receptors against the human proteome

Try to correlate strong hits with known data from the literature, databases, clinical trials, etc. to provide molecular evidence of secondary effects

A Case Study

Selective Estrogen Receptor Modulators (SERM)

One of the largest classes of drugs

Breast cancer, osteoporosis, birth control etc.

Amine and benzene moiety

Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Adverse Effects of SERMs

cardiac abnormalities

thromboembolic disorders

ocular toxicities

loss of calcium homeostasis

?????

Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Ligand Binding Site Similarity Search On a Proteome Scale

Searching human proteins covering ~38% of the druggable genome against SERM binding site

Matching Sarcoplasmic Reticulum (SR) Ca2+ ion channel ATPase (SERCA) TG1 inhibitor site

ERα ranked top with p-value < 0.0001 from reversed search against SERCA

[Figure: Density against Score, with ERα and SERCA marked]

Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Structure and Function of SERCA


Regulating cytosolic calcium levels in cardiac and skeletal muscle

Cytosolic and transmembrane domains

Predicted SERM binding site is located in the TM domain, inhibiting Ca2+ uptake

Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.

Binding Poses of SERMs in SERCA from Docking Studies

Salt bridge interaction between amine group and GLU

Aromatic interactions for both N- and C-moieties

6 SERMs A-F (red)

Off-Target of SERMs

cardiac abnormalities

thromboembolic disorders

ocular toxicities

loss of calcium homeostasis

SERCA!

in vivo and in vitro Studies

TAM plays a role in regulating calcium uptake activity of cardiac SR

TAM reduces intracellular calcium concentration and release in the platelets

Cataract results from TG1 inhibited SERCA up-regulation

EDS increases intracellular calcium in lens epithelial cells by inhibiting SERCA

in silico Studies

Ligand binding site similarity

Binding affinity correlation

Conclusion


By thinking differently about how to represent proteins we have seen potential value in:

Phylogenetic analysis

The study of the dynamics of proteins

Improvements to the drug discovery process

Acknowledgements

Jenny Gu

Protein Motions

Apostol Gramada

Multipole Analysis

Support Open Access

Lei Xie


Jian Yang

Swiss-Prot - 20 Year Celebration

www.pdb.org • info@rcsb.org

Implications on Drug Development

SERM                       Affinity (ER Site)   Affinity (SERCA)   Affinity Difference
Bazedoxifene (BAZ)         -9.44 +/- 0.54       -7.23 +/- 0.13     2.21
Lasofoxifene (LAS)         -8.66 +/- 0.40       -6.54 +/- 0.20     2.12
Ormeloxifene (ORM)         -8.67 +/- 0.18       -5.84 +/- 0.33     2.83
Raloxifene (RAL)           -8.08 +/- 0.64       -5.78 +/- 0.23     2.30
4-hydroxytamoxifen (OHT)   -7.67 +/- 0.47       -5.40 +/- 0.15     2.27
Tamoxifen (TAM)            -7.30 +/- 0.28       -5.64 +/- 0.28     1.66
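
The Affinity Difference column appears to be the SERCA value minus the ER-site value. The short check below recomputes it from the mean affinities listed for the six SERMs; the variable names are illustrative and the reported uncertainties are left out.

```python
# (abbreviation, affinity at the ER site, affinity at SERCA), mean values as listed above.
affinities = [
    ("BAZ", -9.44, -7.23),
    ("LAS", -8.66, -6.54),
    ("ORM", -8.67, -5.84),
    ("RAL", -8.08, -5.78),
    ("OHT", -7.67, -5.40),
    ("TAM", -7.30, -5.64),
]

# Difference = SERCA affinity minus ER-site affinity, rounded to two decimals.
differences = {name: round(serca - er, 2) for name, er, serca in affinities}

# The compound with the smallest gap between target and off-target affinity.
smallest_gap = min(differences, key=differences.get)

for name, diff in differences.items():
    print(f"{name}: {diff:.2f}")
print("smallest gap:", smallest_gap)
```

A smaller gap means the off-target (SERCA) affinity is closer to the on-target (ER) affinity, which is the quantity lead optimization would aim to widen.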


Taking account of both target and off-target for lead optimization


Drug delivery and administration regime

A Protein is More than the Union of its Parts

Breaking the protein into parts changes the object of the comparison

This is interpreted in many cases to imply that the rmsd measure is inadequate.

The reality is that it is the aligning of structure that breaks the triangle inequality and not the measure per se. The reason for failure is that we effectively compare different objects than we say we do.

From Røgen & Fain (2003), PNAS 100:119-124

New Tricks - Protein Representation

An Alternative Approach: Multipolar Representation

Roots in Spherical Harmonics

Parameterization + boundary conditions

Charge distribution (i.e. structure) → $\{q_{lm}^{out};\ M_{lm}^{in};\ q_{lm}^{i};\ M_{lm}^{i}\}$

Scalar potential

Gramada & Bourne 2006 BMC Bioinformatics 7:242

An Alternative Approach: Multipolar Representation

Spatial distribution of a scalar quantity

“Out” Multipoles

$$q_{lm} = \sum_{i=1}^{N} r_i^{l}\, Y^{*}_{lm}(\theta_i, \phi_i), \qquad l = 0, \cdots, \infty; \quad m = -l, \cdots, l$$

For a given rank l, they form a 2l+1 dimensional vector under 3D rotations

$$q_l = \{q_{l,m}\}_{m=-l,\cdots,l}$$

Vector algebra applies => metric properties

Gramada & Bourne 2006 BMC Bioinformatics 7:242
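
The "out" multipoles q_lm = Σ_i r_i^l Y*_lm(θ_i, φ_i) can be computed with the standard library alone. The sketch below is an illustration under stated assumptions: unit weight per point, the physics convention for the spherical harmonics (θ polar, φ azimuthal, Condon-Shortley phase), hypothetical coordinates, and illustrative function names. It also checks numerically that the norm of the (2l+1)-component vector q_l does not change when the points are rotated.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x) for 0 <= m <= l (Condon-Shortley phase included)."""
    pmm = 1.0
    root = math.sqrt(max(0.0, (1.0 - x) * (1.0 + x)))
    for k in range(m):
        pmm *= -(2 * k + 1) * root
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pm1 = pm1, (x * (2 * ll - 1) * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1

def sph_harm(l, m, theta, phi):
    """Spherical harmonic Y_lm with theta the polar angle and phi the azimuthal angle."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()

def multipoles(points, l):
    """Return [q_{l,-l}, ..., q_{l,l}] for a set of unit-weight points."""
    q = []
    for m in range(-l, l + 1):
        total = 0j
        for x, y, z in points:
            r = math.sqrt(x * x + y * y + z * z)
            theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
            phi = math.atan2(y, x)
            total += r ** l * sph_harm(l, m, theta, phi).conjugate()
        q.append(total)
    return q

def vector_norm(q):
    return math.sqrt(sum(abs(c) ** 2 for c in q))

# Hypothetical coordinates, and the same points rotated by 0.7 rad about the x axis.
points = [(1.0, 0.5, -0.3), (-0.7, 1.2, 0.4), (0.2, -0.9, 1.1), (-0.5, -0.8, -1.2)]
a = 0.7
rotated = [(x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))
           for x, y, z in points]
for l in range(3):
    print(l, vector_norm(multipoles(points, l)), vector_norm(multipoles(rotated, l)))
```

The two norms printed for each rank agree to rounding error, which is what makes distances between multipole vectors usable as an orientation-independent measure. Note that the multipoles do depend on the choice of origin, so in practice the coordinates would need a consistent centering convention.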

An Alternative Approach: Multipolar Representation

The multipoles can be interpreted as shape descriptors

In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, i.e. the structure with a geometric level of detail

The partitioning of the multipole series according to various representations of the rotational group allows for a multi-scale description of the structure

Gramada & Bourne 2006 BMC Bioinformatics 7:242