Chemical Data and Computer-Aided Drug Discovery


Oct 16, 2013

Mike Gilson

School of Pharmacy

mgilson@ucsd.edu

2-0622

Outline

Overview of drug discovery



Structure-based computational methods

When we know the structure of the targeted protein




Ligand-based computational methods

When we don’t know the protein’s structure

What is a drug?

Small Molecule Drugs

Aspirin

Sildenafil (Viagra)

Glipizide (Glucotrol)

Taxol

Digoxin

Darunavir

Nanoparticles (e.g., packaged small-molecule drugs)

Doxil (liposome package, extended circulation time, milder toxicity)

Abraxane (albumin-packaged taxol)

http://www.doxil.com/about_doxil.html

http://www.abraxane.com/professional/nab-technology.aspx

Biopharmaceuticals

Erythropoietin (EPO)

Stabilized variant of a natural protein hormone

Etanercept (Enbrel)

Protein with TNF receptor + Ab Fc domain

Scavenges TNF, diminishes inflammation

http://www.ganfyd.org/index.php?title=Erythropoietin_beta

http://en.wikipedia.org/wiki/File:Enbrel.jpg

How are drugs discovered?

Natural Products

Digoxin (Foxglove)

Aspirin (Willow)

Taxol (Pacific Yew)

How Aspirin Works

inflammation

platelet activation

Aspirin

platelet inactivation

Biomolecular Pathways and Target Selection

E.g. signaling pathways

http://www.isys.uni-stuttgart.de/forschung/sysbio/insulin/index.html

Target protein

Empirical Path to Ligand Discovery

Compound library (commercial, in-house, synthetic, natural)

High throughput screening (HTS)

Hit confirmation

Lead compounds (e.g., µM Kd)

Lead optimization (Medicinal chemistry)

Potent drug candidates (nM Kd)

Animal and clinical evaluation

Compound Libraries

Commercial (also in-house pharma)

Government (NIH)

Academia

Computer-Aided Ligand Design

Aims to reduce number of compounds synthesized and assayed


Lower costs


Less chemical waste


Faster progress

HIV Protease/KNI-272 complex

Scenario 1

Structure of Targeted Protein Known: Structure-Based Drug Discovery

Protein-Ligand Docking

Structure-Based Ligand Design

Energy terms: VDW, dihedral, screened Coulombic (between + and - charges)

Potential function

Energy as function of structure

Docking software

Search for structure of lowest energy
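
As a rough illustration of "energy as a function of structure" and of docking as a search for the lowest-energy structure, the sketch below sums a Lennard-Jones (VDW) term and a screened Coulombic term over protein-ligand atom pairs, then picks the best of a few candidate poses. The parameter values (epsilon, sigma, a distance-dependent dielectric of 4r) and the toy coordinates are assumptions chosen for illustration, not values from the lecture, and the dihedral term is left out.

```python
import math

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2), the usual unit-conversion constant


def pair_energy(r, q1, q2, epsilon=0.1, sigma=3.5):
    """Lennard-Jones plus screened Coulomb energy (kcal/mol) at distance r (Angstrom).

    epsilon, sigma and the distance-dependent dielectric (4*r) are
    illustrative assumptions.
    """
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_K * q1 * q2 / (4.0 * r * r)  # dielectric constant = 4r
    return lj + coulomb


def pose_energy(ligand_atoms, protein_atoms):
    """Sum pair energies over all ligand-protein atom pairs.

    Each atom is a tuple (x, y, z, charge).
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for px, py, pz, pq in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            total += pair_energy(r, lq, pq)
    return total


def best_pose(poses, protein_atoms):
    """Return the candidate pose with the lowest energy (brute-force search)."""
    return min(poses, key=lambda pose: pose_energy(pose, protein_atoms))


# Toy system: one positively charged protein atom, one negatively charged ligand atom.
protein = [(0.0, 0.0, 0.0, +1.0)]
poses = [
    [(2.0, 0.0, 0.0, -1.0)],   # too close: steric clash dominates
    [(4.0, 0.0, 0.0, -1.0)],   # near contact distance
    [(12.0, 0.0, 0.0, -1.0)],  # far away: weak attraction only
]
chosen = best_pose(poses, protein)
print(chosen, round(pose_energy(chosen, protein), 2))
```

Real docking programs search continuous translations, rotations and torsions rather than a fixed list of poses; the list here only keeps the example short.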

Energy Determines Probability (Stability)

Boltzmann distribution: p(x) ∝ exp(-E(x)/RT)

[Figure: energy and probability plotted against the coordinate x]
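
To make the link between energy and probability concrete, here is a minimal sketch that converts a few state energies into normalized Boltzmann probabilities, p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT). The three energies are made-up values, and RT is taken as roughly 0.593 kcal/mol (about 298 K).

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def boltzmann_probabilities(energies, rt=RT):
    """Normalized Boltzmann probabilities for a list of state energies (kcal/mol)."""
    e_min = min(energies)  # shifting by the minimum avoids overflow in exp()
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical energies of three structures, in kcal/mol.
energies = [0.0, 1.0, 3.0]
probs = boltzmann_probabilities(energies)
for e, p in zip(energies, probs):
    print(f"E = {e:.1f} kcal/mol -> p = {p:.3f}")
```

Lower-energy structures come out as more probable, which is the sense in which energy determines stability.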

Structure-Based Virtual Screening

Compound database

3D structure of target (crystallography, NMR, modeling)

Virtual screening (e.g., computational docking)

Candidate ligands

Experimental assay

Ligands

Ligand optimization

Med chem, crystallography, modeling

Drug candidates

Fragmental Structure-Based Screening

“Fragment” library

3D structure of target (crystallography, NMR, modeling)

Fragment docking

Compound design

http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html

Experimental assay and ligand optimization

Med chem, crystallography, modeling

Drug candidates

Physics-Based

Knowledge-Based

Potential Functions for Structure-Based Design

Energy as a function of structure

Physics-Based Potentials

Energy terms from physical theory

Van der Waals interactions (shape fitting)

Bonded interactions (shape and flexibility)

Coulombic interactions (charge-charge complementarity)

Hydrogen-bonding

Common Simplifications Used in Physics-Based Docking

Quantum effects approximated classically

Protein often held rigid

Configurational entropy neglected

Influence of water treated crudely

Proteins and Ligands Are Flexible

Protein + Ligand ⇌ Complex, ΔG°

Binding Energy and Entropy

Unbound states

Bound states

E_Free

E_Bound

Energy part

Entropy part
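
The standard binding free energy ΔG° connects to the dissociation constants quoted earlier (µM for leads, nM for potent candidates) through the textbook relation ΔG° = RT ln(Kd/C°), with C° = 1 M. The sketch below applies it; the temperature of 298.15 K is an assumption.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from Kd in mol/L, with C = 1 M."""
    standard_concentration = 1.0  # mol/L
    return R * temperature * math.log(kd_molar / standard_concentration)


dg_lead = binding_free_energy(1e-6)  # a micromolar lead compound
dg_drug = binding_free_energy(1e-9)  # a nanomolar drug candidate
print(f"1 uM: {dg_lead:.1f} kcal/mol, 1 nM: {dg_drug:.1f} kcal/mol")
```

Going from µM to nM affinity corresponds to only about 4 kcal/mol, which gives a sense of the accuracy a scoring function would need.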

Structure-Based Discovery

Physics-oriented approaches

Weaknesses

Fully physical detail becomes computationally intractable

Approximations are unavoidable

Parameterization still required


Strengths

Interpretable, provides guides to design

Broadly applicable, in principle at least

Clear pathways to improving accuracy


Status

Useful, far from perfect


Multiple groups working on fewer, better approximations

Force fields, quantum

Flexibility, entropy

Water effects

Moore’s law: hardware improving

Knowledge-Based Docking Potentials

Ligand carboxylate

Aromatic stacking

Probability ↔ Energy

Boltzmann: p(r) ∝ exp(-E(r)/RT)

Inverse Boltzmann: E(r) = -RT ln p(r) + constant

Example: ligand carboxylate O to protein histidine N

1. Find all protein-ligand structures in the PDB with a ligand carboxylate O

2. For each structure, histogram the distances from O to every histidine N

3. Sum the histograms over all structures to obtain p(r_O-N)

4. Compute E(r_O-N) from p(r_O-N)

“PMF”, Muegge & Martin, J. Med. Chem. 42:791, 1999
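
The four steps above can be sketched in a few lines. This is a simplified illustration, not the published PMF procedure: it applies the plain inverse Boltzmann relation E(r) = -RT ln p(r) to summed histograms and omits the reference-state and volume normalization that practical knowledge-based potentials include. The distance lists, bin width and cutoff are made-up values.

```python
import math
from collections import Counter

RT = 0.593  # kcal/mol at roughly 298 K


def distance_histogram(distances, bin_width=0.5, r_max=8.0):
    """Step 2: histogram the O-N distances (Angstrom) found in one structure."""
    n_bins = int(r_max / bin_width)
    counts = Counter()
    for r in distances:
        if 0.0 <= r < r_max:
            counts[int(r / bin_width)] += 1
    return [counts[i] for i in range(n_bins)]


def inverse_boltzmann(structures, bin_width=0.5, r_max=8.0, rt=RT):
    """Steps 3-4: sum histograms over structures, then E(r) = -RT ln p(r).

    Returns one energy per bin; empty bins get None because ln(0) is undefined.
    """
    n_bins = int(r_max / bin_width)
    total = [0] * n_bins
    for distances in structures:
        hist = distance_histogram(distances, bin_width, r_max)
        total = [t + h for t, h in zip(total, hist)]
    n_obs = sum(total)
    energies = []
    for count in total:
        if count == 0:
            energies.append(None)
        else:
            energies.append(-rt * math.log(count / n_obs))
    return energies


# Step 1 stand-in: hypothetical O-N distances from three structures.
structures = [[2.7, 2.9, 5.1], [2.8, 6.3], [2.6, 5.2, 7.7]]
energies = inverse_boltzmann(structures)
for i, e in enumerate(energies):
    if e is not None:
        print(f"{i * 0.5:.1f}-{(i + 1) * 0.5:.1f} A: E = {e:.2f} kcal/mol")
```

The most populated distance bin receives the lowest energy, which is how frequently observed contacts end up scored as favorable.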

Knowledge-Based Docking Potentials

A few types of atom pairs, out of several hundred total

Atom-atom distance (Angstroms)

Nitrogen+/Oxygen-

Aromatic carbons

Aliphatic carbons

Structure-Based Discovery

Knowledge-based potentials

Weaknesses

Accuracy limited by availability of data

Accuracy may also be limited by overall approach


Strengths

Relatively easy to implement

Computationally fast


Status

Useful, far from perfect


May be at point of diminishing returns

Limitations of Knowledge-Based Potentials

1. Statistical limitations (e.g., to pairwise potentials)

2. Even if we had infinite statistics, would the results be accurate? (Is inverse Boltzmann quite right? Where is entropy?)

[Figure: 10 bins (r_1, r_2, ..., r_10) for a histogram of O-N distances, r_O-N]

[Figure: 100 bins for a joint histogram of O-N & O-C distances, r_O-N vs. r_O-C]
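
The step from 10 bins to 100 bins is why the statistics tend to restrict these potentials to pairwise terms: with a fixed amount of structural data, each added distance multiplies the number of bins and thins out the counts per bin. A small sketch of that arithmetic follows; the figure of 10,000 observations is an arbitrary assumption.

```python
def average_counts_per_bin(n_observations, bins_per_axis, n_distances):
    """Return (number of bins, average observations per bin) for a joint histogram."""
    n_bins = bins_per_axis ** n_distances
    return n_bins, n_observations / n_bins


# Hypothetical data set of 10,000 observations, 10 bins per distance axis.
one_d = average_counts_per_bin(10_000, 10, 1)    # O-N distances only
two_d = average_counts_per_bin(10_000, 10, 2)    # O-N and O-C jointly
three_d = average_counts_per_bin(10_000, 10, 3)  # one more distance added
print(one_d, two_d, three_d)
```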

e.g. MAP Kinase Inhibitors

Using knowledge of existing inhibitors to discover more

Scenario 2

Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery

Why Look for Another Ligand if You Already Have Some?

Experimental screening generated some ligands, but they don’t bind tightly

A company wants to work around another company’s chemical patents

A high-affinity ligand is toxic, is not well-absorbed, etc.

Ligand-Based Virtual Screening

Compound Library

Known Ligands

Molecular similarity

Machine learning

Etc.

Candidate ligands

Assay

Actives

Optimization

Med chem, crystallography, modeling

Potent drug candidates

Sources of Data on Known Ligands

Journals, e.g., J. Med. Chem.

Some Binding and Chemical Activity Databases

PubChem (NIH): pubchem.ncbi.nlm.nih.gov

ChEMBL (EMBL): www.ebi.ac.uk/chembl

BindingDB (UCSD): www.bindingdb.org

BindingDB

www.bindingdb.org

Finding Protein-Ligand Data in BindingDB

e.g., by Name of Protein “Target”

e.g., by Ligand Draw


Search

Sample Query Results

BindingDB to PDB

PDB to BindingDB

Download data in machine-readable format

Sample Query Results

Machine-Readable Chemical Format

Structure-Data File (SDF)

PDB Format Lacks Chemical Bonding

SDF Format Defines Chemical Bonds

There are Many Other Chemical File Formats

Interconvert with Babel
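
To show what "SDF defines chemical bonds" means in practice, the sketch below reads the atom and bond blocks of a single V2000 record using the format's fixed-width columns. The two-atom record (methanol without hydrogens) and the function name parse_sdf_record are made up for illustration; a real workflow would rely on a toolkit such as Babel rather than a hand-written parser, and this one ignores data fields, charges and the V3000 variant.

```python
def parse_sdf_record(text):
    """Return (atoms, bonds) from one V2000 SDF record.

    atoms: list of (symbol, x, y, z); bonds: list of (atom1, atom2, order),
    with 1-based atom numbers as written in the file.
    """
    lines = text.splitlines()
    counts = lines[3]  # three header lines come before the counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# A hand-written example record (heavy atoms of methanol only).
record = "\n".join([
    "methanol",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
    "$$$$",
])
atoms, bonds = parse_sdf_record(record)
print(atoms)
print(bonds)  # explicit bond table: atom 1 bonded to atom 2, single bond
```

The bond table is the part a PDB coordinate record does not carry for ligands, which is why SDF is the more convenient exchange format for chemical data.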

Chemical Similarity

Ligand-Based Drug Discovery

Compounds (available/synthesizable)

Similar

Test experimentally

Don’t bother

Chemical Fingerprints

Binary Structure Keys

Molecule 1

Molecule 2



Chemical Similarity from Fingerprints

Tanimoto Similarity or Jaccard Index, T

T = N_I / N_U = 2/8 = 0.25

N_I = 2 (Intersection)

N_U = 8 (Union)

Molecule 1

Molecule 2
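
The Tanimoto (Jaccard) index is the number of bits set in both fingerprints divided by the number set in either one. A minimal sketch follows; the two 10-bit fingerprints are invented so that they reproduce the slide's counts of 2 shared bits and 8 bits in the union.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length binary fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    n_intersection = sum(1 for a, b in zip(fp1, fp2) if a and b)
    n_union = sum(1 for a, b in zip(fp1, fp2) if a or b)
    return n_intersection / n_union if n_union else 0.0


# Hypothetical structure-key fingerprints for two molecules.
mol1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
mol2 = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
print(tanimoto(mol1, mol2))  # 2 shared bits / 8 bits in the union
```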

Hashed Chemical Fingerprints

Based upon paths in the chemical graph

1-atom paths: C, F, N, H, S, O

2-atom paths: F-C, C-C, C-N, C-S, S-O, C-H

3-atom paths: F-C-C, C-C-N, C-N-H, C-S-O

C

S
-
O

etc.

Each path sets a pseudo-random bit pattern in a very long molecular fingerprint
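
A minimal sketch of the idea, under simplifying assumptions: the molecule is a plain graph of element symbols and bonds, paths of up to three atoms are enumerated, and each path is hashed to a single bit of a 1024-bit fingerprint. Production fingerprints typically use longer paths, include bond orders, and may set several bits per path; the four-atom fragment F-C-C-N is a toy input.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_atoms=3):
    """Return the set of linear atom paths with up to max_atoms atoms.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Each path is stored in a direction-independent form.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    paths = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        paths.add(min(forward, backward))  # same label whichever way it is walked
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths


def hashed_fingerprint(atoms, bonds, n_bits=1024, max_atoms=3):
    """Hash every path to one bit position (a simplification: one bit per path)."""
    bits = [0] * n_bits
    for path in enumerate_paths(atoms, bonds, max_atoms):
        digest = hashlib.sha256(path.encode("ascii")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


# Toy fragment: F-C-C-N
atoms = ["F", "C", "C", "N"]
bonds = [(0, 1), (1, 2), (2, 3)]
paths = enumerate_paths(atoms, bonds)
fingerprint = hashed_fingerprint(atoms, bonds)
print(sorted(paths))
print(sum(fingerprint), "bits set out of", len(fingerprint))
```

Because different paths can hash to the same bit, a set bit does not identify a unique substructure, unlike the binary structure keys shown earlier.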

Maximum Common Substructure

N_common = 34

Potential Drawbacks of Plain Chemical Similarity

May miss good ligands by being overly conservative



Too much weight on irrelevant details

Scaffold Hopping

Zhao, Drug Discovery Today 12:149, 2007

Identification of synthetic statins by scaffold hopping

Abstraction and Identification of Relevant Compound Features

Ligand shape

Pharmacophore models


Chemical descriptors


Statistics and machine learning

Pharmacophore Models

Φάρμακο (drug) + Φορά (carry)

A 3-point pharmacophore: +1 charge, bulky hydrophobe, aromatic (example distance constraint: 3.2 ± 0.4 Å)
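
A 3-point pharmacophore can be checked against a candidate molecule by asking whether it has the three feature types at mutually compatible distances. In the sketch below only the 3.2 ± 0.4 Å constraint comes from the slide, and which pair of features it belongs to is a guess; the other two distance ranges, the feature coordinates and the function name matches_pharmacophore are assumptions for illustration.

```python
import math
from itertools import permutations

MODEL_TYPES = ("positive", "hydrophobe", "aromatic")
# (index_a, index_b): (target distance in Angstrom, tolerance)
MODEL_DISTANCES = {
    (0, 1): (5.0, 1.0),  # assumed value
    (0, 2): (6.5, 1.0),  # assumed value
    (1, 2): (3.2, 0.4),  # slide example, assigned to this pair arbitrarily
}


def matches_pharmacophore(features, types=MODEL_TYPES, distances=MODEL_DISTANCES):
    """True if some assignment of molecule features satisfies the model.

    features: list of (feature_type, (x, y, z)) for one conformation.
    """
    for candidate in permutations(features, len(types)):
        if any(f[0] != t for f, t in zip(candidate, types)):
            continue  # feature types do not line up with the model
        if all(
            abs(math.dist(candidate[i][1], candidate[j][1]) - target) <= tol
            for (i, j), (target, tol) in distances.items()
        ):
            return True
    return False


hit = [
    ("positive", (0.0, 0.0, 0.0)),
    ("hydrophobe", (5.0, 0.0, 0.0)),
    ("aromatic", (5.0, 3.2, 0.0)),
]
miss = [
    ("positive", (0.0, 0.0, 0.0)),
    ("hydrophobe", (5.0, 0.0, 0.0)),
    ("aromatic", (5.0, 6.0, 0.0)),  # aromatic ring too far from the hydrophobe
]
print(matches_pharmacophore(hit), matches_pharmacophore(miss))
```

Because the test depends only on feature types and geometry, two molecules with different scaffolds can both match, which is what makes pharmacophores useful for scaffold hopping.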

Molecular Descriptors

More abstract than chemical fingerprints

Physical descriptors


molecular weight


charge


dipole moment


number of H-bond donors/acceptors

number of rotatable bonds

hydrophobicity (log P and clogP)

Topological

branching index

measures of linearity vs. interconnectedness


Etc. etc.

Rotatable bonds

A High-Dimensional “Chemical Space”

Each compound is at a point in an n-dimensional space

Compounds with similar properties are near each other

[Figure: a point representing a compound in descriptor space; visible axis labels: Descriptor 2, Descriptor 3]
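
One simple way to make "near each other" concrete is a Euclidean distance between descriptor vectors after scaling each descriptor to zero mean and unit variance, so that a large-valued descriptor such as molecular weight does not dominate. The three compounds and their descriptor values (molecular weight, H-bond donors, log P) are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(vectors):
    """Scale each descriptor (column) to zero mean and unit variance."""
    columns = list(zip(*vectors))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)] for vec in vectors]


def nearest_neighbor(query_index, vectors):
    """Return (index, distance) of the compound closest to the query."""
    scaled = standardize(vectors)
    query = scaled[query_index]
    best = None
    for i, vec in enumerate(scaled):
        if i == query_index:
            continue
        d = math.dist(query, vec)
        if best is None or d < best[1]:
            best = (i, d)
    return best


# Hypothetical compounds: [molecular weight, H-bond donors, log P]
names = ["compound_A", "compound_B", "compound_C"]
vectors = [
    [300.0, 2, 2.5],
    [310.0, 2, 2.7],
    [550.0, 6, -1.0],
]
index, distance = nearest_neighbor(0, vectors)
print(names[index], round(distance, 3))
```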

Statistics and Machine Learning

Some examples

Partial least squares


Support vector machines


Genetic algorithms for descriptor selection

Summary

Overview of drug discovery

Computer-aided methods

Structure-based

Ligand-based

Interaction potentials

Physics-based

Knowledge-based (data driven)

Ligand-protein databases, machine-readable chemical formats

Ligand similarity and beyond

Mike Gilson, School of Pharmacy, mgilson@ucsd.edu, 2-0622

Activities and Discussion Topics

BindingDB: Advil

Machine-readable format, Binding activities

PDB/BindingDB

2ONY at PDB

BindingDB

Substructure search

Related data

Similarity search

Combined computational approaches

(physics + knowledge)-based docking potentials

(ligand + structure)-based computational discovery

Other data-driven methods where it may be hard to get enough statistics

Validation of computational methods

Protein-ligand databases: getting data and assessing data quality





Drug Discovery Pipeline (One Model)

Target identification

Target validation

Assay development

Lead compound (ligand) discovery

Lead optimization

Animal Pharmacokinetics, Toxicity

Phase I Clinical (safety, metabolism, PK)

Phase II Clinical (efficacy)

Phase III Clinical (comparison with existing therapy)

Muegge, J. Med. Chem. 49:5895, 2006

Updated Knowledge-Based PMF Potential