Identification of protein-protein binding motifs
Felipe Leal Valentim (felipe.lealvalentim@wur.nl)
Aalt-Jan van Dijk (aaltjan.vandijk@wur.nl)
Plant Research International, Applied Bioinformatics
Protein-protein binding interfaces
[Figure labels: Surface, Core, Interface, Ligand binding site, DNA-binding site]
Properties:
Exposed on the protein surface;
Functionally/structurally important residues are more highly conserved;
Core structural residues
[van Dijk AD et al., PLoS Comput Biol. 2010] Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction
Changing the specificity of the protein interaction
Protein-protein binding motifs
[Figure label: Interface]
Protein binding interfaces are composed of highly conserved residues that are exposed on the surface.
The interface can be represented by short sequence motifs, which are thought to be overrepresented in pairs of interacting proteins.
Identifying binding interfaces from structures
[Hubbard SJ, Thornton JM] Naccess V2.1.1: Atomic Solvent Accessible Area Calculations
[Figure: Protein 1, Protein 2 and Complex 1-2; the binding interface is labelled on Protein 1 and on Protein 2]
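One common way to read an interface off accessibility calculations is to compare each residue's solvent accessible area in the free protein with its area in the complex, and to call a residue interfacial when the area drops. The sketch below illustrates that comparison only. It does not call Naccess; the per-residue areas, the residue keys and the `min_loss` threshold are hypothetical values chosen for illustration.

```python
def interface_residues(free_asa, complex_asa, min_loss=1.0):
    """Return residues whose accessible area drops by more than
    min_loss (square angstroms, an assumed cutoff) on complex formation."""
    return sorted(
        res
        for res, area in free_asa.items()
        # A residue missing from the complex table is treated as unchanged.
        if area - complex_asa.get(res, area) > min_loss
    )


# Hypothetical per-residue accessible areas keyed by (chain, residue number).
protein1_free = {("A", 10): 85.0, ("A", 11): 40.0, ("A", 12): 5.0}
protein1_in_complex = {("A", 10): 12.0, ("A", 11): 39.5, ("A", 12): 5.0}

print(interface_residues(protein1_free, protein1_in_complex))
```

Only residue 10 loses more than the cutoff here, so it is the only one reported. The same function would be run once per partner to obtain both binding interfaces.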
[Figure: Arabidopsis Histidine Kinase4, Arabidopsis Trans Zeatin, Interface]
Structural information available in the PDB
Sequence- and interactome-based pipeline to locate binding sites in Arabidopsis proteins
Sequences -> evolutionary conservation;
Sequences -> residue surface accessibility;
Interactome -> overrepresented motifs.
Motifs that are:
likely to be exposed on the surface;
conserved across species; and
overrepresented in pairs of interacting proteins.
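The third criterion, overrepresentation in interacting pairs, can be illustrated with a simple enrichment ratio. The sketch below is an assumption about how such a ratio could be computed, not the scoring used in the pipeline. A pair of proteins "carries" a motif pair when one partner contains the first motif and the other contains the second, and the ratio compares that frequency among interacting pairs with the frequency among all possible pairs. The function name, sequences and interactions are hypothetical.

```python
from itertools import combinations


def pair_enrichment(motif_a, motif_b, sequences, interactions):
    """Frequency of the motif pair among interacting pairs divided by
    its frequency among all protein pairs (exact substring matching)."""
    has_a = {name for name, seq in sequences.items() if motif_a in seq}
    has_b = {name for name, seq in sequences.items() if motif_b in seq}

    def fraction(pairs):
        pairs = list(pairs)
        if not pairs:
            return 0.0
        hits = sum(
            1
            for p, q in pairs
            if (p in has_a and q in has_b) or (p in has_b and q in has_a)
        )
        return hits / len(pairs)

    observed = fraction(interactions)
    background = fraction(combinations(sorted(sequences), 2))
    if background == 0.0:
        return 0.0  # motif pair never co-occurs, so no enrichment is defined
    return observed / background


# Hypothetical toy data.
sequences = {
    "P1": "MKLVDE",
    "P2": "AAGRRW",
    "P3": "TTLVDQ",
    "P4": "GRRPPP",
    "P5": "MMMMMM",
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P5")]

print(round(pair_enrichment("LVD", "GRR", sequences, interactions), 3))
```

In this toy set the motif pair occurs in 2 of 3 interacting pairs but only 4 of 10 possible pairs, so the ratio is above 1. A real analysis would also need a significance test, which is left out here.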
[Figure: proteins SHY2, IAA16, IAA7, IAA18, TPL, IAA1, IAA2, IAA11]
Input FASTA sequences:
>Protein sequence1
>Protein sequence2
...
>Protein sequenceN
Input interacting list:
Protein1 - Protein2
Protein2 - Protein4
...
ProteinN - ProteinM
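The two inputs above could be read as follows. This is a minimal sketch, assuming a plain FASTA file and an interaction list with one pair per line separated by " - ", as shown on the slide. The function names and the example text are hypothetical, and real identifiers that themselves contain " - " would need a different separator.

```python
def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    sequences, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def parse_interactions(text, sep=" - "):
    """Return (protein, protein) tuples; lines without the separator are skipped."""
    pairs = []
    for line in text.splitlines():
        if sep in line:
            left, right = line.split(sep, 1)
            pairs.append((left.strip(), right.strip()))
    return pairs


# Hypothetical example input.
fasta_text = ">Protein1\nMKLV\nDE\n>Protein2\nGRRW\n"
interaction_text = "Protein1 - Protein2\nProtein2 - Protein4\n...\n"

print(parse_fasta(fasta_text))
print(parse_interactions(interaction_text))
```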
Find orthologs of each protein sequence: OrthoMCL [1], reciprocal best BLAST hit [2]
Calculate conservation score: Al2CO [3]
Conservation:
Conservation Protein 1
Conservation Protein 2
...
Conservation Protein N
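To make the idea of a per-residue conservation score concrete, the sketch below scores each column of an alignment of orthologs with a normalised Shannon entropy. This is a simple stand-in chosen for illustration, not the Al2CO program itself; the function name, the normalisation by log2(20) and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(aligned_seqs):
    """Score each alignment column in [0, 1]; 1.0 means fully conserved.
    Gaps ('-') are ignored; an all-gap column scores 0.0."""
    scores = []
    for i in range(len(aligned_seqs[0])):
        residues = [seq[i] for seq in aligned_seqs if seq[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(residues).values()
        )
        # Normalise by the maximum entropy over 20 amino acids.
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Hypothetical alignment of one protein with three orthologs (equal lengths).
alignment = ["MKL", "MRL", "MKI", "MKL"]
scores = column_conservation(alignment)
print([round(s, 3) for s in scores])
```

The first column is identical in all sequences and gets the top score, while the two columns with one substitution each score lower.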
Predict residue surface accessibility (RSA): SABLE [4]
RSA:
RSA Protein 1
RSA Protein 2
...
RSA Protein N
Predicted motifs: interface motifs are True Positives (TP); non-interface motifs are False Positives (FP)
Precision = TP / (TP + FP)
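The precision formula can be applied directly once predicted motifs are compared against motifs known to lie in an interface. The sketch below is illustrative only: the function name and the motif identifiers are hypothetical, and returning 0.0 when nothing is predicted is an assumed convention.

```python
def precision(predicted, known_interface):
    """Precision = TP / (TP + FP) over a set of predicted motifs."""
    predicted = set(predicted)
    if not predicted:
        return 0.0  # assumed convention when there are no predictions
    true_positives = len(predicted & set(known_interface))
    false_positives = len(predicted) - true_positives
    return true_positives / (true_positives + false_positives)


# Hypothetical motif identifiers.
predicted_motifs = {"m1", "m2", "m3", "m4"}
interface_motifs = {"m1", "m3", "m9"}

print(precision(predicted_motifs, interface_motifs))
```

Two of the four predictions fall in the interface set, giving TP = 2 and FP = 2.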
Assessment of the pipeline's performance
Coverage: up to 42%, 22% and 42% for the human, yeast and Arabidopsis subsets, respectively.
Precision: up to 58%, 96% and 100%.
Locating interaction binding sites in Arabidopsis sequences at a large scale: Overview
Predicted motifs: 1498 interactions among 985 proteins
36% of the proteins in the interactome and ~5.5% of all Arabidopsis proteins
Validation and bioinformatics analysis
Comparison with single nucleotide polymorphism (SNP) data
[Figure: nsSNPs mapped onto the protein sequence and the predicted protein-protein binding sites]
nsSNPs (protein sequence): 2.2% > nsSNPs (binding sites): 1.6%
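A comparison of this kind amounts to computing the fraction of positions carrying an nsSNP over the whole sequence and over the predicted binding sites only. The sketch below shows that calculation on made-up data. The function name, the positions and the site coordinates are hypothetical and are not taken from the study.

```python
def snp_fraction(snp_positions, positions):
    """Fraction of the given positions that carry an nsSNP."""
    positions = set(positions)
    if not positions:
        return 0.0
    return len(positions & set(snp_positions)) / len(positions)


# Hypothetical protein of 100 residues (0-based positions).
seq_length = 100
snps = {3, 17, 70, 80, 95}
binding_sites = [(10, 20), (40, 60)]  # half-open intervals, assumed layout

all_positions = range(seq_length)
site_positions = [p for start, end in binding_sites for p in range(start, end)]

whole = snp_fraction(snps, all_positions)
in_sites = snp_fraction(snps, site_positions)
print(round(whole, 3), round(in_sites, 3))
```

A lower fraction inside the binding sites than over the whole sequence, as in this toy example, is the pattern one would expect if the sites are under functional constraint.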
Functional constraints
Intermolecular coevolution
Comparison with annotation of amino acid mutagenesis
[Figure: amino acid mutagenesis sites on the protein sequence: protein-protein binding sites, DNA binding sites, other functionally important sites]
Proteins with a predicted motif (n=985)
Mutagenesis annotation (UniProt) (n=38)
16 cases: predicted motifs overlap the mutated amino acid
Some interesting cases
Master's Project Proposal: Cross-species analysis of protein-protein binding motifs
Questions?
Practical assignment: Perl scripting for