Slides

underlingbuddhaΒιοτεχνολογία

2 Οκτ 2013 (πριν από 4 χρόνια και 9 μέρες)

75 εμφανίσεις

Lecture 13

Structural Aspects of
Protein
-
Protein Interactions


PHAR 201/Bioinformatics I

Philip E. Bourne

Department of Pharmacology, UCSD


Slides modified from JoLan Chung






Agenda



Understand the importance of studying
protein
-
protein interactions at the
structural level


Classify the various types of interactions


Look in detail at one structure
-
based
method for predicting protein
-
protein
interactions


LINK

Still seems the best review

Types of protein
-
protein interactions (PPI)

Obligate PPI


the protomers are
not

found as stable
structures on their
own
in vivo

Non
-
obligate PPI


Obligate
homodimer

P22 Arc
repressor
DNA
-
binding

Obligate
heterodimer

Human cathepsin D

1LYB

Non
-
obligate homodimer

Sperm lysin

Non
-
obligate heterodimer

RhoA

and
RhoGAP

signaling complex

Types of protein
-
protein interactions (PPI)

Obligate PPI


usually
permanent

the protomers are
not

found as stable
structures on their
own
in vivo

Non
-
obligate PPI


Obligate
heterodimer

Human cathepsin D

Non
-
obligate transient
homodimer, Sperm lysin
(interaction is broken and
formed continuously)

Permanent

(many enzyme
-
inhibitor
complexes)

dissociation constant
K
d
=[A][B] / [AB]

10
-
7

-

10
-
13

M

Transient

Weak

(electron
transport
complexes)

K
d

mM
-

M


Non
-
obligate
permanent
heterodimer

Thrombin

and
rodniin
inhibitor


Intermediate

(antibody
-
antigen, TCR
-
MHC
-
peptide,
signal transduction PPI),
K
d


M
-
nM

Strong

(require a molecular
trigger to shift the
oligomeric
equilibrium)

K
d

nM
-
fM


Bovine G protein dissociates into
G


and
G



subunits
upon GTP, but forms a stable trimer upon GDP

Types of protein
-
protein interactions (PPI)

Obligate PPI


usually
permanent


the protomers are
not

found as stable
structures on their
own
in vivo

Non
-
obligate PPI


Obligate
heterodimer

Human cathepsin D

Non
-
obligate transient
homodimer, Sperm lysin
(interaction is broken and
formed continuously)

Permanent

(many enzyme
-
inhibitor
complexes)

dissociation constant
K
d
=[A][B] / [AB]

10
-
7

÷

10
-
13

M

Transient

Weak

(electron
transport
complexes)

K
d

mM
-

M


Non
-
obligate
permanent
heterodimer

Thrombin

and
rodniin
inhibitor


Intermediate

(antibody
-
antigen, TCR
-
MHC
-
peptide,
signal transduction PPI),
K
d


M
-
nM

Strong

(require a molecular
trigger to shift the
oligomeric
equilibrium)

K
d

nM
-
fM


Bovine G protein dissociates into
G


and
G



subunits
upon GTP, but forms a stable trimer upon GDP

One Approach Using Structural Bioinformatics:


Exploiting Using Sequence and Structure
Homologs to Identify Protein
-
Protein Binding Sites

Work of Jo
-
Lan Chung (Former Graduate
Student Chemistry/Biochemistry)

Co
-
mentored by Wei Wang

Proteins: Structure Function and
Bioinformatics

2006 62:630
-
640 [
PDF
]

Current Situation



We have a relatively small number of protein
complexes



We have a large number of predominantly apo form
structures being determined by structural genomics
and functionally driven structure determination



Exploiting the information obtained from these apo
structures to identify functional sites, such as protein
-
protein binding sites, is an important question.

General Properties of Protein
-
Protein Binding Sites


More hydrophobic than the rest of the protein surface.



Relatively flat (except for enzyme
-
substrate binding
sites).



The binding free energy is not distributed equally
across the protein interfaces: a small subset of
residues at the interfaces forms energy hot spots,
enriched in tyrosine, tryptophan, and arginine.



Structurally conserved residues, especially polar
resides, correspond to the energy hot spots.


Previous Methods to Identify Protein
-
Protein
Binding Sites



Sequence conservation e.g. Consurf


Docking


Threading and homology modeling


Evolutionary tracing


Correlated mutations


Properties of patches


Hydrophobicity


Neural networks and support vector machines



The above mentioned methods use evolutionary
information from sequence alignments, and/or
residue properties, and/or the geometric
information from structures.



It is difficult to judge the predictive powers of
these various methods due to the differences of
learning sets, the different definitions of binding
sites and the lack of benchmarks.





Characteristics of Previous Methods



None of the above methods consider the
residues which are spatially conserved on the
surfaces among their structure homologs.



These residues are reported to have a
correspondence to the energy hot spots on
protein interfaces and can be derived from
multiple structure alignments.



Structurally Conserved Surface Residues?

Approach



Survey the structurally conserved surface
residues in the non
-
redundant chains of hetero
complexes from the Protein Data Bank (PDB).




Incorporate the structurally conserved surface
residues with the information derived from
sequence alignments and single structures to
identify protein
-
protein binding sites.




Dataset


All non
-
redundant hetero
-
complexes were
collected from the PDB (<30% sequence identity)



A pair of chains was then selected if the reduction of
the accessible solvent area (ASA) was at least 450 Å
2
upon binding
.


The pairs containing chains belong to SCOP class >=
8 (9 and 10 did not exist) or with length less than 80
amino acids were discarded



Dataset


Retrieve structure homologs at SCOP family level from Astral
database at 40% sequence purge level.


Align each chain with its homologs with CEMC.


A chain was selected if at least 4 members in its alignment were
aligned together over 60% of its length and with Z score > 4.


Discard the chains with less than 20 interfacial contacts or with
constant B
-
factors.


The final dataset was composed of 274 non
-
redundant chains of

hetero
-
complexes. Each of these chains was accompanied with a

structure alignment with at least 4 members.

Derive the Structurally Conserved Residues




The structural conservation scores were
derived from the multiple structural
alignments.



Each position in the alignment has a
structural conservation score, which
represents the conservation in 3D space.



A position has a high conservation score
if the aligned residues are spatially
conserved.








The Structural Conservation Score


Raw structural conservation score





where









if a is not gap and b is not gap


otherwise





where

N

is

the

total

number

of

aligned

structures,

s
i
(
x
)

is

the

amino

acid

at

position

x


in

the

i
th

structure

in

the

alignment,

M

is

a

modified

PET

substitution

matrix

calculated



by

Valdar

et

al
.

d

is

the

distance

between

Calpha

atoms
.









The Structural Conservation Score




The B
-
factors determined by X
-
ray crystallographic
experiments provide an indication of the degree of mobility and
disorder of an atom in a protein structure



Raw structural conservation scores were weighted by the
normalized B
-
factors (
B
norm, i
) to consider the structure flexibility





where











Does the Structural Conservation Score Discriminate
Interface from Non
-
interface Residues?

60,834 Residues from 274 Chains

Incorporating the Structural Conservation Scores to Predict the
Interface Residues



A surface residue



Sequence profile + ASA + Structural conservation score


in a window of 13 residues

(The residue to be predicted and 12 spatially nearest surface residues)



Support vector machine classifier trained to distinguish surface

non
-
interface residues vs surface residues



Interface or non
-
interface residue ?



Clustering of sites

Predictor 1: Sequence profile + ASA.

Predictor 2: Sequence profile + ASA + structural conservation score

The Performances of the Predictors

Clustering of Results


Clustering was used to remove isolated interface
residues and include non
-
interface residues
surrounded by several interface residues


Any 2 residues were clustered if the distance
between the C beta

s was less than 6.5A


Clusters with only 1or 2 residues were removed


A non
-
interface residue was converted to an
interface residue if the C beta of the residue and
at least 3 interface residues were located within
6A

Precise prediction: at least 70% interface residues were identified.

Correct prediction: at least 50 % interface residues were identified.

Partial prediction: some but less than 50 % interface residues were identified.

Wrong prediction: no interface residues were identified.

The Performances of the Predictors



The Predicted Binding Sites (Example 1)


Protein : domain 1 of the human coxsackie and adenovirus receptor


(CAR D1)
-

yellow


Mediate adenoviruses and coxsackie virus B infection.


CAR is an integral membrane protein expressed in a broad range of human and
murine cell type. CAR D1 is one of its two extracellular domains.


Binding partner: knob domain of the adenoviruses serotype 12 (Ad12)
-

blue

Predictor 1

Predictor 2


The Predicted Binding Sites

(example 1)


Each CAR D1 binds at the interface between two adjacent Ad12 knob domain


Consistent with the observation that most neutralizing antibodies to knob
are directed against the trimer, rather than monomer.



The Predicted Binding Sites

(example 2)


Protein : adrendoxin (Adx)



In mitochondria of the adrenal cortex, the steroid hydroxylating system
requires the transfer of electrons from the membrane
-
attached flavoprotein
AR via the soluble Adx to the membrane
-
integrated cytochrome P450 of
the CYP 11 family.



Binding partner: adrenodoxin reductase (AR)
-

blue

Predictor 1

Predictor 2

The Predicted Binding Sites

(example 3)


Protein : fibroblast growth factor receptor 2 (FGFR2) Ser252Trp Mutant (Gray)


Apert syndrome (AS) is caused by substitution of one of two adjacent
residues, Ser252Trp or Pro253Arg (green).


Binding partner: fibroblast growth factor (FGF2) (blue)

Predictor 1

Predictor 2

The Predicted Binding Sites

(example 4)


Protein : Nitrogenase molybdenum
-
iron protein of Clostridium pasteurianum

ß subunit

Binding partner: Nitrogenase molybdenum
-
iron protein of Clostridium

pasteurianum


s畢畮楴



Predictor 1

Predictor 2

Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?

Hemoglobin


chain

Left: automatic multiple structure alignments (CE
-
MC)

Right: expert curetted

multiple structure alignments (HOMSTRAD)

Murine t
-
cell receptor variable domain

Left: automatic multiple structure alignments (CE
-
MC)

Right: expert curetted

multiple structure alignments (HOMSTRAD)

Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?

Adrendoxin

Left: automatic multiple structure alignments (CE
-
MC)

Right: expert curetted

multiple structure alignments (HOMSTRAD)

Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?

What Effects the Performance of
Predictor 2?



Poor structure alignment


eg.
More than 30% of the residues of the methane monooxygenase
hydroxylase


subunit were in the gapped positions of its structure
alignment.



Binding residues located in flexible loops


eg.
The deteriorated prediction of the interfaces on VP 1 in
p1/mahoney poliovirus may be caused by its long and flexible loops
contacting the other two coat proteins, VP 2 and VP 3.


Conclusions

1.
Analysis of the structurally conserved surface residues



The conserved residues were measured by the structural
conservation scores derived from the multiple structure alignments
followed by a weighting process with the B
-
factors.



Although derived from the alignments of single structures, these
structurally conserved residues did differentiate the protein
interfaces and the rest of the surfaces.


Conclusions

2. Incorporating the structural conservation score improved the
prediction significantly.


Before clustering, the recall increases about 10% at precision 50%,
increases about 18% at precision 40%.



After clustering, at precision about 50 %, the number of correctly
predicted binding sites increase close to 16%, the number of
precisely predicted binding sites increase close to 13%.



53.0% of the binding sites were precisely predicted, 75.9% of the
binding sites were correctly predicted, and 20.4% of the binding
sites were partially predicted.




Conclusions

3. This study was an initial trial that exploits multiple structure
alignments on a large scale for the prediction of functional
regions.


4. A more suitable scoring function or weighting function could be
developed in the future.


5. This method can be used to guide experiments, such as site
-
specific mutagenesis or combined with docking procedures to
limit the search space.

Acknowledgments



Dr. Wei Wang

Dr. Chittibabu Guda

Dr. BVB Reddy

All the members in Bourne group

Dr. Philip E. Bourne


This work was supported by

the Molecular Biophysics Training Grants at UCSD