Bioinformatics Research and Resources at the University of ...


Oct 2, 2013 (3 years and 9 months ago)


JM - http://folding.chmcc.org

Knowledge-based protocols for protein structure prediction: from protein threading to solvent accessibility prediction and back to protein structure prediction by threading


Jarek Meller


Division of Biomedical Informatics,

Children’s Hospital Research Foundation

& Department of Biomedical Engineering, UC


Outline of the talk




Protein structure and complexity of conformational search: from de novo structure prediction to similarity-based methods

Protein structure prediction by sequence-to-structure matching (threading and fold recognition)

Secondary structure and solvent accessibility prediction

Improving fold recognition and de novo simulations with accurate solvent accessibility prediction

A story from our backyard: predicting interaction between pVHL and RNA Pol II



Polypeptide chains: backbone and side-chains

[Figure labels: C-ter, N-ter]


Distinct chemical nature of amino acid side-chains

[Figure labels: ARG, PHE, GLU, VAL, CYS, C-ter, N-ter]


Hydrogen bonds and secondary structures

[Figure labels: α-helix, β-strand]


Tertiary structure and long range contacts: annexin


Domains, interactions, complexes: VHL


Multiple alignment and PSSM


Protein folding problem


The protein folding problem consists of predicting the three-dimensional structure of a protein from its amino acid sequence

Hierarchical organization of protein structures helps to break the problem into secondary structure, tertiary structure and protein-protein interaction predictions

Computational approaches for protein structure prediction: similarity-based and de novo methods


Ab initio (or de novo) folding simulations

Ab initio folding simulations consist of a conformational search with an empirical scoring function (“force field”) to be maximized (or minimized)

Computational bottleneck: exponential search space and sampling problem (global optimization!)

Fundamental problem: inaccuracy of empirical force fields and scoring functions (folding potentials)

Importance of mixed protocols, such as Rosetta by D. Baker and colleagues (Monte Carlo fragment assembly)
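
To make the "exponential search space" point concrete, here is a minimal sketch that counts the conformations of a short chain on a 2D square lattice (self-avoiding walks). The lattice simplification and the name `count_conformations` are illustrative assumptions, not part of the talk; real ab initio protocols search a continuous space that is larger still.

```python
def count_conformations(n_bonds):
    """Count self-avoiding walks with n_bonds steps on a 2D square lattice."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(pos, visited, remaining):
        # A chain with no bonds left to place is one complete conformation.
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:  # the chain may not cross itself
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    return extend((0, 0), {(0, 0)}, n_bonds)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, count_conformations(n))
```

The count grows by a factor of roughly 2.6 to 3 with every added bond, so exhaustive enumeration quickly becomes infeasible and sampling methods such as Monte Carlo are needed.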


Similarity-based approaches to structure prediction: from sequence alignment to fold recognition

High level of redundancy in biology: sequence similarity is often sufficient to use the “guilt by association” rule: if similar sequence, then similar structure and function

Multiple alignments and family profiles can detect evolutionary relatedness at much lower sequence similarity, which is hard to detect with pairwise sequence alignments: Psi-BLAST by S. Altschul et al.

Many structures are already known (see PDB) and one can match sequences directly with structures to enhance structure recognition: fold recognition (not for new folds!)

For both fold recognition and de novo simulation, prediction of intermediate attributes such as secondary structure or solvent accessibility helps to achieve better sensitivity and specificity



Why “fold recognition”?



Divergent (common ancestor) vs. convergent (no ancestor) evolution

PDB: virtually all proteins with at least 30% sequence identity have similar structures; however, most similar structures share only up to 10% sequence identity!
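
The sequence identity figures quoted above can be computed from a pairwise alignment as sketched below. This is a minimal illustration: the function name `percent_identity` is hypothetical, and the choice of denominator (columns where neither sequence has a gap) is an assumption, since several conventions are in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```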



Going beyond sequence similarity: threading and fold recognition

When sequence similarity is not detectable, use a library of known structures to match your query with target structures.

One needs a scoring (“energy”) function that measures compatibility between sequences and structures.


Scoring alternative conformations with empirical (knowledge-based) folding potentials

[Figure: energy (E) axis with the native and misfolded states labeled]

Ideally, each misfolded structure should have an energy higher than the native energy, i.e.:

$E_{\text{misfolded}} - E_{\text{native}} > 0$


Simple contact model for protein structure prediction

Each amino acid is represented by a point in 3D space and two amino acids are
said to be in contact if their distance is smaller than a cutoff distance, e.g. 7 Å.
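
The contact definition above can be sketched in a few lines. This is a minimal illustration, assuming one coordinate triple per residue; the function name `contact_map` and the `min_separation` parameter (which lets you skip trivially close sequence neighbors) are hypothetical additions, not part of the talk.

```python
import math


def contact_map(coords, cutoff=7.0, min_separation=1):
    """Return the set of residue index pairs (k, l), k < l, closer than cutoff (in Angstroms)."""
    contacts = set()
    for k in range(len(coords)):
        for l in range(k + min_separation, len(coords)):
            if math.dist(coords[k], coords[l]) < cutoff:
                contacts.add((k, l))
    return contacts


if __name__ == "__main__":
    # Four made-up points standing in for residues.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
    print(sorted(contact_map(toy)))
    print(sorted(contact_map(toy, min_separation=2)))
```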




Sequence-to-structure matching with contact models

Generalized string matching problem: aligning a string of amino acids against a string of “structural sites” characterized by other residues in contact

Finding an optimal alignment with gaps using inter-residue pairwise models,

$E = \sum_{k<l} e_{kl}$,

is NP-hard because of the non-local character of scores at a given structural site (identity of the interaction partners may change depending on location of gaps in the alignment)

R.H. Lathrop, Protein Eng. 7 (1994)
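
The non-local character of pairwise scores can be illustrated with a small sketch. It sums a pair term over the contacts of a template for a given placement of residues on structural sites. The names `threading_energy` and `hp_energy`, and the toy hydrophobic/polar energy table, are assumptions made for illustration only.

```python
def hp_energy(a, b):
    """Toy pair energy (assumption): only hydrophobic-hydrophobic contacts are rewarded."""
    return -1.0 if a == "H" and b == "H" else 0.0


def threading_energy(sequence, site_to_residue, contacts, pair_energy):
    """Sum pair energies over template contacts (k, l), k < l.

    site_to_residue[k] is the index of the residue placed at site k,
    or None when the alignment leaves that site empty (a gap).
    """
    energy = 0.0
    for k, l in contacts:
        i, j = site_to_residue[k], site_to_residue[l]
        if i is None or j is None:
            continue  # an empty site contributes nothing
        energy += pair_energy(sequence[i], sequence[j])
    return energy


if __name__ == "__main__":
    contacts = [(0, 3), (1, 4), (2, 5)]  # made-up template contacts
    seq = "HHPHP"
    print(threading_energy(seq, [0, 1, 2, 3, 4, None], contacts, hp_energy))
    print(threading_energy(seq, [0, None, 1, 2, 3, 4], contacts, hp_energy))
```

Moving the gap from the last site to the second site changes which residues face each other across every contact, so the score of one site cannot be evaluated independently of the rest of the alignment.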


Hydrophobic contact model and sequence-to-structure alignment

[Figure labels: H, P, H, PP, and a gap (-)]

Solutions to yet another instance of the global optimization problem:

a) Heuristic (e.g. frozen environment approximation)

b) “Profile” or local scoring functions (folding potentials)
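
Option b) can be sketched as follows: when the score of a residue at a site depends only on that site, an optimal gapped alignment is found by standard dynamic programming. This is a minimal sketch under stated assumptions: each structural site is described only by its number of contacts, the toy `site_energy` rewards hydrophobic residues at well-contacted sites, and the linear `gap_penalty` is arbitrary. None of these choices is taken from LOOPP or any other published potential.

```python
def site_energy(residue, n_contacts):
    """Toy local energy (assumption): each contact at the site rewards a hydrophobic residue."""
    return -float(n_contacts) if residue == "H" else 0.0


def align_energy(sequence, site_contacts, gap_penalty=1.0):
    """Lowest total energy of a global alignment of a sequence to structural sites."""
    n, m = len(sequence), len(site_contacts)
    # best[i][j]: lowest energy aligning the first i residues with the first j sites
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        best[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = min(
                best[i - 1][j - 1] + site_energy(sequence[i - 1], site_contacts[j - 1]),
                best[i - 1][j] + gap_penalty,  # residue i left unaligned
                best[i][j - 1] + gap_penalty,  # site j left empty
            )
    return best[n][m]


if __name__ == "__main__":
    print(align_energy("HPH", [3, 0, 3]))
    print(align_energy("HH", [3, 0, 3]))
```

The cost is proportional to the product of the sequence and template lengths, in contrast to the NP-hard pairwise formulation.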


Implementing threading protocols: LOOPP


LOOPP in CAFASP4

About average for all fold recognition targets (missing some easy targets, recognized by PsiBlast)

Third best server in the category of difficult targets

Best predictions among the servers for 3 difficult targets

Further improvements necessary to make the predictions more robust

Joint work with Ron Elber


Using sequence similarity, predicted secondary structures and contact potentials: fold recognition protocols

In practice fold recognition methods are often mixtures of sequence matching and threading, with compatibility between a sequence and a structure measured by:

i) sequence alignment

ii) contact potentials

iii) predicted secondary structures (compared to the secondary structure of a template)



Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

SABLE server: http://sable.cchmc.org

POLYVIEW server: http://polyview.cchmc.org

a) Multiple alignment and family profiles improve prediction of local structural propensities

b) Use of advanced machine learning techniques, such as Neural Networks or Support Vector Machines, improves results as well

B. Rost and C. Sander were the first to achieve more than 70% accuracy in three-state (H, E, C) classification, applying a) and b).


Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

[Figure labels: PDB, Sable, PsiPred, Prof]

Relative solvent accessibility prediction is typically cast as a classification problem


Variability in surface exposure for structurally
equivalent residues does not support classification


Neural Network-based regression for relative solvent accessibility (RSA) prediction


Accuracy of predictions depends on the level of
surface exposure: error measures and fine tuning


Overall accuracy of different regression models



Model      S163                 S156                 S135                 S149
           cc / MAE / RMSE      cc / MAE / RMSE      cc / MAE / RMSE      cc / MAE / RMSE
SABLE-a    0.65 / 15.6 / 20.8   0.64 / 15.9 / 21.0   0.66 / 15.3 / 20.5   0.64 / 16.0 / 21.0
SABLE-wa   0.66 / 15.5 / 21.2   0.64 / 15.7 / 21.3   0.67 / 15.3 / 20.9   0.65 / 15.8 / 21.4
LS         0.63 / 16.3 / 21.0   0.62 / 16.5 / 21.1   0.65 / 15.9 / 20.5   0.62 / 16.5 / 21.2
SVR1       0.62 / 15.9 / 21.3   0.61 / 16.1 / 21.4   0.64 / 15.6 / 20.8   0.62 / 16.2 / 21.5
SVR2       0.62 / 16.6 / 22.8   0.61 / 16.7 / 22.7   0.64 / 16.4 / 22.5   0.61 / 16.9 / 23.0

Non-linear models: Rafal Adamczak; Linear models: Michael Wagner;
Datasets and servers: Aleksey Porollo and Rafal Adamczak
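
For readers who want to reproduce measures of this kind on their own predictions, the sketch below computes three standard quantities. It assumes that "cc" denotes the Pearson correlation coefficient and that MAE and RMSE are the mean absolute error and root mean square error between predicted and observed RSA values (in percent). The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(predicted, observed):
    """Return (Pearson cc, mean absolute error, root mean square error)."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    cc = cov / math.sqrt(var_p * var_o)
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return cc, mae, rmse


if __name__ == "__main__":
    # Made-up RSA values in percent, for illustration only.
    print(regression_metrics([10.0, 20.0, 30.0, 40.0], [20.0, 30.0, 40.0, 50.0]))
```

Note that a constant offset leaves the correlation at 1 while MAE and RMSE are nonzero, which is one reason to report all three.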


Regression vs. two-class classification

Method              S163           S156           S135           S149
ACCpro server 25%   70.4% / 0.41   69.8% / 0.41   70.6% / 0.42   71.1% / 0.43
SABLE-wa BS62       71.7% / 0.43   71.1% / 0.42   72.2% / 0.44   72.2% / 0.44
SABLE-wa binary     71.4% / 0.42   70.9% / 0.41   71.9% / 0.43   72.1% / 0.44
SABLE-2c 25%        76.7% / 0.53   75.8% / 0.52   77.1% / 0.54   76.4% / 0.53
SABLE-wa            77.3% / 0.54   76.5% / 0.52   77.3% / 0.54   76.6% / 0.53
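
A real-valued RSA prediction can be projected onto two classes (buried vs. exposed) by thresholding, which is how a regression model can be compared with a classifier. The sketch below is an illustration under stated assumptions: a 25% threshold, "exposed" meaning RSA at or above the threshold, and the Matthews correlation coefficient as the second score. The slide does not label the second number in each cell, so treating it as a Matthews coefficient is a guess. The name `two_class_scores` is hypothetical.

```python
import math


def two_class_scores(predicted_rsa, observed_rsa, threshold=25.0):
    """Return (accuracy in percent, Matthews correlation) after thresholding RSA values."""
    tp = tn = fp = fn = 0
    for p, o in zip(predicted_rsa, observed_rsa):
        pred_exposed = p >= threshold
        obs_exposed = o >= threshold
        if pred_exposed and obs_exposed:
            tp += 1
        elif not pred_exposed and not obs_exposed:
            tn += 1
        elif pred_exposed:
            fp += 1
        else:
            fn += 1
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no residues to score")
    accuracy = 100.0 * (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when a row or column of the table is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Made-up RSA values in percent, for illustration only.
    print(two_class_scores([5.0, 30.0, 60.0, 10.0], [10.0, 40.0, 20.0, 50.0]))
```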



Predicting transmembrane domains




Now back to threading and folding simulations



Applications in filtering out incorrect models in both de novo simulations and fold recognition

Domain structure prediction, protein-protein interactions

Better sensitivity in finding correct matches in threading: one story as an example



Modeling the RNA Polymerase II Interaction with the von Hippel-Lindau Protein: from experimental clues to structure prediction and back to experiment.

Jarek Meller

Children’s Hospital Research Foundation

Joint work with M. Czyzyk-Krzeska and her group, College of Medicine, University of Cincinnati


A play of life (script and beyond):


Stage: protein society or proteosome


Rules of life: proteins are assembled and degraded:


nursery (ribosome) vs. police and guillotine (ubiquitination and proteasome)


Social order: one look at the equilibrium in the system:

Holy scriptures (DNA)

Army of scribers (middle class proteins)

Temple priests (selected proteins)

Transcription

Translation

“I think we need to adjust the interpretation of the script …”

(regulation of replication and transcription)

Law and oppression


Hypoxia-induced stabilization of Hif-1α

Graphics from R.K. Bruick and S.L. McKnight, Science 295


Experimental clues:


Observation: correlation between pVHL levels and transcript elongation of the tyrosine hydroxylase gene (M. Czyzyk-Krzeska)

Could pVHL influence transcription by interacting with elongation complex co-factors?

Where to start? Experiment without a model is usually not a very good idea. Could an in silico study and bioinformatics help?


Searching for pVHL interaction targets:


Hif-1α ODD interacts with pVHL

Other pVHL targets should have domains structurally resembling that of the Hif-1α ODD

Use the Hif-1α ODD sequence as a query in order to find other structures that are compatible with it

[Figure labels: Rpb1, Rpb6, Hif-1α ODD, Pro-OH, pVHL]


RNA Polymerase II in the act of transcription,

Gnatt, Kornberg et al., Science 292 (2001)


[Figure labels: C-ter Rpb1, Rpb6]

The C-terminal of Rpb1 and Rpb6 form a pocket on the surface of the RNA Polymerase II complex. The C-ter of Rpb1 and Rpb6 are represented by cartoons.



Could the Hif ODD fragment resemble the C-terminal fragment of RNA Polymerase II?

A motif similar to that of the ODD was found, but that could occur by chance. We used sequence alignments and threading to measure similarity between these fragments.

Sequences about 25% identical for a short fragment of about 50 aa: not significant.

Predicted secondary structures similar.

Suggestive but still not significant similarity.

However, a weak match between the adjacent Rpb6 and the consecutive part of the Hif-1α sequence was observed in threading (3D-PSSM, LOOPP).

Prediction: the ODD shares 3D structure with the C-ter fragment of Rpb1 and Rpb6.

Implication: VHL is likely to interact with Rpb1/Rpb6!


Experimental results (MCK):


RNA Pol II peptides suggested by computational analysis do bind to pVHL and this binding is controlled by hydroxylation of the critical PRO residue.

Co-immunoprecipitations of hyper-phosphorylated RNA Pol II and pVHL observed: interaction confirmed.

Ubiquitination of Rpb1 confirmed.

Biological meaning?