Protein Interface Identification

Oct 4, 2013

Identification of protein-protein binding motifs
Felipe Leal Valentim

felipe.lealvalentim@wur.nl

Aalt-Jan van Dijk

aaltjan.vandijk@wur.nl

Plant Research International

Applied Bioinformatics

Protein-protein binding interfaces

[Figure: protein structures labelled Surface, Core, Interface, Ligand binding site and DNA-binding site]

Properties:

Exposed on the protein surface;

Functionally/structurally important residues are more highly conserved;

Core structural residues

[van Dijk AD et al., PLoS Comput Biol. 2010] - Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction

Changing the specificity of the protein interaction

Protein-protein binding motifs

Protein binding interfaces are composed of residues that are highly conserved and exposed on the surface.

The interface can be represented by short sequence motifs, which are thought to be overrepresented in pairs of interacting proteins.


Identification of binding interfaces from structures

[Hubbard SJ, Thornton JM] Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations

[Figure: Protein 1, Protein 2 and Complex 1-2; binding interface marked on Protein 1 and on Protein 2]
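One common way to turn accessible-area calculations for Protein 1, Protein 2 and Complex 1-2 into a binding interface is to keep the residues that lose accessible area when the complex forms. The sketch below assumes that criterion; the function name, the 1.0 square angstrom cutoff and the example values are hypothetical and not taken from the slides or from Naccess.

```python
def interface_residues(asa_free, asa_complex, min_loss=1.0):
    """Return residues whose accessible area drops on complex formation.

    asa_free and asa_complex map a residue number to its solvent accessible
    area (square angstroms) in the isolated protein and in the complex.
    min_loss is an assumed cutoff, not a value from the slides.
    """
    interface = []
    for residue, free_area in asa_free.items():
        # A residue missing from the complex table is treated as unchanged.
        bound_area = asa_complex.get(residue, free_area)
        if free_area - bound_area >= min_loss:
            interface.append(residue)
    return sorted(interface)


# Hypothetical per-residue areas for three residues of "Protein 1".
protein1_free = {10: 85.2, 11: 40.0, 12: 0.0}
protein1_in_complex = {10: 12.7, 11: 40.0, 12: 0.0}
print(interface_residues(protein1_free, protein1_in_complex))  # [10]
```

The per-residue areas would come from running the accessibility calculation once per isolated chain and once for the complex.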

[Figure: Arabidopsis Histidine Kinase 4 and Arabidopsis Trans Zeatin, with the interface marked]

Structural information available in the PDB

Sequence- and interactome-based pipeline to locate binding sites in Arabidopsis proteins


Sequences -> Evolutionary conservation;

Sequences -> Residue surface accessibility;

Interactome -> Overrepresented motifs;

Motifs that are: likely to be exposed on the surface; conserved across species; and overrepresented in pairs of interacting proteins.
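The third criterion, overrepresentation in pairs of interacting proteins, can be illustrated with a simple enrichment ratio. This is a minimal sketch under assumptions: the slides do not state which statistic the pipeline uses, and the function name, the ratio itself and the toy sequences are hypothetical.

```python
from itertools import combinations


def pair_enrichment(motif_a, motif_b, sequences, interactions):
    """Ratio of motif-pair frequency among interacting pairs to its
    frequency among all protein pairs (assumed statistic, for illustration).
    """
    def has_pair(p, q):
        s, t = sequences[p], sequences[q]
        return (motif_a in s and motif_b in t) or (motif_b in s and motif_a in t)

    all_pairs = list(combinations(sorted(sequences), 2))
    hits_all = sum(has_pair(p, q) for p, q in all_pairs)
    hits_interacting = sum(has_pair(p, q) for p, q in interactions)
    if hits_all == 0 or not interactions:
        return 0.0
    return (hits_interacting / len(interactions)) / (hits_all / len(all_pairs))


# Toy data: the motif pair occurs in both interacting pairs.
toy_sequences = {"P1": "MKLVDE", "P2": "GGRRWW", "P3": "AKLVAA", "P4": "TTRRWT"}
toy_interactions = [("P1", "P2"), ("P3", "P4")]
ratio = pair_enrichment("KLV", "RRW", toy_sequences, toy_interactions)
print(round(ratio, 2))  # 1.5
```

A ratio above 1 means the motif pair is seen more often among interacting pairs than among protein pairs in general; a real analysis would also need a significance test.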

[Figure: SHY2, IAA16, IAA7, IAA18, TPL, IAA1, IAA2, IAA11]

Input FASTA sequences

>Protein sequence1
>Protein sequence2
...
>Protein sequenceN

Input interacting list

Protein1-Protein2
Protein2-Protein4
...
ProteinN-ProteinM
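The two inputs, FASTA sequences and a list of interacting pairs, could be read as sketched below. The function names are hypothetical, and the "-" separator is taken from the slide layout; identifiers that themselves contain a hyphen would need a different separator.

```python
def parse_fasta(text):
    """Parse FASTA text into a {identifier: sequence} dictionary."""
    sequences, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after ">" as the identifier.
            header = line[1:].split()[0]
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line
    return sequences


def parse_interactions(text, sep="-"):
    """Parse lines such as 'Protein1-Protein2' into (left, right) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        left, right = line.split(sep, 1)
        pairs.append((left.strip(), right.strip()))
    return pairs


fasta_text = ">Protein1\nMKLV\nDE\n>Protein2\nGGRRWW\n"
interaction_text = "Protein1-Protein2\nProtein2-Protein4\n"
print(parse_fasta(fasta_text))
print(parse_interactions(interaction_text))
```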

Calculate conservation score: AL2CO [3]

Find orthologs for each protein sequence: OrthoMCL [1], reciprocal best BLAST hit [2]
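The reciprocal best hit idea can be sketched as follows: two proteins are paired when each is the other's top-scoring hit. The sketch assumes the hits are already available as (query, subject, bit score) tuples; the function names and gene identifiers are hypothetical, and this is not the OrthoMCL procedure.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical hit tables for species A against species B, and the reverse.
a_vs_b = [("AT1", "OS1", 250.0), ("AT1", "OS2", 90.0), ("AT2", "OS2", 120.0)]
b_vs_a = [("OS1", "AT1", 245.0), ("OS2", "AT1", 95.0), ("OS2", "AT2", 80.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('AT1', 'OS1')]
```

AT2 is dropped because its best hit, OS2, points back to AT1 rather than to AT2.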

Conservation: Conservation Protein 1, Conservation Protein 2, ..., Conservation Protein N
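To show what a per-residue conservation score looks like, the sketch below scores each alignment column with a normalized Shannon entropy. This is a simplified stand-in, not the AL2CO implementation; the function name, the gap handling and the example alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(aligned):
    """Score each alignment column in [0, 1]; 1.0 means fully conserved.

    aligned is a list of equal-length aligned sequences; '-' marks a gap.
    """
    max_entropy = math.log2(20)  # upper bound for 20 amino acid types
    scores = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned if seq[i] != "-"]
        if not column:
            scores.append(0.0)  # all-gap column: no evidence of conservation
            continue
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical alignment of one protein with two orthologs.
alignment = ["MKLV", "MKIV", "MRLV"]
scores = column_conservation(alignment)
print([round(s, 3) for s in scores])
```

Columns 1 and 4 are invariant and score 1.0, while columns 2 and 3 each contain one substitution and score lower.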

Predict residue surface accessibility (RSA): SABLE [4]

RSA: RSA Protein 1, RSA Protein 2, ..., RSA Protein N
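Once per-residue RSA values are available, surface-exposed positions can be selected with a cutoff. The sketch below assumes RSA is given as a fraction between 0 and 1, one value per residue; the function name and the 0.25 cutoff are hypothetical, and the SABLE output format is not reproduced here.

```python
def exposed_positions(rsa, threshold=0.25):
    """Return 1-based positions whose predicted RSA meets the cutoff.

    threshold is an assumed value, not one stated in the slides.
    """
    return [i + 1 for i, value in enumerate(rsa) if value >= threshold]


# Hypothetical RSA predictions for a four-residue stretch.
rsa_protein1 = [0.05, 0.40, 0.25, 0.10]
print(exposed_positions(rsa_protein1))  # [2, 3]
```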


Assessment of the pipeline's performance

[Figure: Non-interface motifs, Interface motifs and Predicted motifs, with False Positives (FP) and True Positives (TP) marked]

Precision = TP/(TP + FP)


Coverage: up to 42%, 22% and 42%, respectively, for the human, yeast and Arabidopsis subsets.

Precision: up to 58%, 96% and 100%.
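The formula Precision = TP/(TP + FP) can be computed from a set of predicted motifs and a set of known interface motifs, as in the sketch below. The function name and the example motifs are hypothetical; a predicted motif counts as a true positive when it is also in the interface set.

```python
def precision(predicted, interface):
    """Precision = TP / (TP + FP) for predicted motifs."""
    predicted, interface = set(predicted), set(interface)
    true_positives = len(predicted & interface)
    false_positives = len(predicted - interface)
    if true_positives + false_positives == 0:
        return 0.0  # nothing predicted, so precision is reported as 0
    return true_positives / (true_positives + false_positives)


# Hypothetical motifs: two of the four predictions are interface motifs.
predicted_motifs = {"KLV", "RRW", "GGT", "PPS"}
interface_motifs = {"KLV", "RRW", "DEE"}
print(precision(predicted_motifs, interface_motifs))  # 0.5
```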

Locating interaction binding sites in Arabidopsis sequences at a large scale

Overview

Predicted motifs: 1498 interactions among 985 proteins

36% of the proteins in the interactome and ~5.5% of all Arabidopsis proteins

Validation and bioinformatics analysis

Comparison with single nucleotide polymorphism (SNP) data

[Figure: nsSNPs and predicted protein-protein binding sites mapped onto a protein sequence]

nsSNPs (protein sequence): 2.2% > nsSNPs (binding sites): 1.6%

Functional constraints

Intermolecular coevolution

Comparison with annotation of amino acid mutagenesis

[Figure: amino acid mutagenesis sites mapped onto a protein sequence, alongside protein-protein binding sites, DNA binding sites and other functionally important sites]

Proteins with a predicted motif (n=985)

Mutagenesis annotation (UniProt) (n=38)

16 cases: predicted motifs overlap the mutated amino acid
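Checking whether a predicted motif overlaps a mutated amino acid amounts to testing whether the mutated position falls inside any occurrence of the motif. This is a minimal sketch with hypothetical names and data; it assumes motifs are plain substrings and positions are 1-based.

```python
def motifs_covering_position(sequence, motifs, position):
    """Return the motifs with an occurrence that spans the 1-based position."""
    covering = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            # The occurrence covers 1-based positions start+1 .. start+len(motif).
            if start + 1 <= position <= start + len(motif):
                covering.append(motif)
                break
            start = sequence.find(motif, start + 1)
    return covering


# Hypothetical protein with a mutagenesis annotation at position 8.
protein = "MKLVDEKLVG"
print(motifs_covering_position(protein, ["KLV", "DEK"], 8))  # ['KLV']
```

Here the second occurrence of KLV covers positions 7 to 9 and so contains the mutated residue, while DEK (positions 5 to 7) does not.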

Some interesting cases

Master's Project Proposal: Cross-species analysis of protein-protein binding motifs

Questions?

Practical assignment


Perl scripting for