Center for Computational Biology and Bioinformatics

Bioinformatics and Intrinsically Disordered Proteins (IDPs)




A. Keith Dunker

Biochemistry and Molecular Biology &

Center for Computational Biology / Bioinformatics

Indiana University School of Medicine

Presented at:


October 22, 2010

Outline


What are “Intrinsically Disordered Proteins” ?


Bioinformatics Applications to IDPs


Why don’t IDPs form structure?


Predicting IDPs from amino acid sequence


Some important results from IDP prediction


An improved order / disorder amino acid scale


Predicting phosphorylation sites


Disorder and function: two examples


Importance of bioinformatics to IDP research

Definitions: Intrinsically Disordered Proteins (IDPs) and ID Regions (IDRs)

Whole proteins and regions of proteins are intrinsically disordered if they lack stable 3D structure under physiological conditions, but exist instead as highly dynamic, rapidly interconverting ensembles without particular equilibrium values for their coordinates or bond angles and with non-cooperative conformational changes.


Why are IDPs / IDRs unstructured?

From the 1950s to now, >> 1,000 IDPs / IDRs studied and characterized

Visit: http://www.disprot.org



Why do IDPs & IDRs lack structure?

Lack a ligand or partner?
Denatured during isolation?
Folding requires conditions found inside cells?
Lack of folding encoded by amino acid sequence?

Amino Acid Compositions

[Figure: amino acid compositions; labels: Surface, Buried]

Why are IDPs / IDRs unstructured?

To a first approximation, amino acid composition determines whether a protein folds or remains intrinsically disordered.

Given a composition that favors folding, the sequence details determine which fold.

Given a composition that favors not folding, the sequence details provide motifs for biological function.

Prediction of Intrinsic Disorder

[Flowchart, listed in the order the stages are applied]
Ordered / Disordered Sequence Data
Attribute Selection or Extraction: Aromaticity, Hydropathy, Charge, Complexity
Separate Training and Testing Sets
Predictor Training: Neural Networks, SVMs, etc.
Predictor Validation on Out-of-Sample Data
Prediction
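
The attribute step named on this slide (aromaticity, hydropathy, charge, complexity) can be illustrated with a minimal sketch. The talk does not say how each attribute was computed, so everything here is an assumption: the Kyte-Doolittle hydropathy scale, Shannon entropy as a stand-in for complexity, a window width of 21, and the function names `window_features` and `sliding_features`.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed choice of scale, not from the talk).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(window):
    """Return four illustrative attributes for one sequence window."""
    n = len(window)
    counts = Counter(window)
    # Fraction of aromatic residues (F, W, Y).
    aromaticity = sum(counts[a] for a in "FWY") / n
    # Mean hydropathy over the window.
    hydropathy = sum(KD[a] for a in window) / n
    # Net charge per residue: K and R count +1, D and E count -1.
    net_charge = (counts["K"] + counts["R"] - counts["D"] - counts["E"]) / n
    # Shannon entropy in bits, a simple proxy for sequence complexity.
    complexity = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "aromaticity": aromaticity,
        "hydropathy": hydropathy,
        "net_charge": net_charge,
        "complexity": complexity,
    }


def sliding_features(sequence, width=21):
    """Compute the attributes for every full-width window of a sequence."""
    return [
        window_features(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]
```

Each window's feature dictionary would then be one input row for a classifier such as a neural network or SVM.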

First Machine-learning Predictor

SDR/MDR/LDR Predictors

1. Short Disordered Regions (SDR): 7 to 21 missing AA
   Medium Disordered Regions (MDR): 22 to 44
   Long Disordered Regions (LDR): 45 or more
2. SDR / MDR / LDR predictors: Neural networks
3. Training dataset: proteins with missing AA
   SDR: 34 proteins, 11,050 AA, 38 IDR, 411 IDAA
   MDR: 20 proteins, 4,764 AA, 22 IDR, 464 IDAA
   LDR: 7 proteins, 2,069 AA, 7 IDR, 465 IDAA
4. Feature selection: standard sequential forward selection
5. Accuracy: 59 to 67% estimated by 5-fold cross-validation
6. Better than chance; better on self than on not self

Romero P, et al. Proc. IEEE International Conference on Neural Networks 1:90-95 (1997)
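
The length classes on this slide can be written down directly. The following is a small sketch, assuming a per-residue Boolean mask in which `True` marks a residue with missing coordinates; the function names are hypothetical.

```python
def classify_region(length):
    """Map a disordered-region length to the slide's SDR/MDR/LDR classes."""
    if length >= 45:
        return "LDR"
    if length >= 22:
        return "MDR"
    if length >= 7:
        return "SDR"
    return None  # shorter than 7 residues: not in any class on the slide


def disordered_regions(missing):
    """Return (start, length) for each run of True in a per-residue mask."""
    regions = []
    start = None
    for i, flag in enumerate(missing):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - start))
            start = None
    if start is not None:
        regions.append((start, len(missing) - start))
    return regions
```

For example, a mask with a 30-residue run of missing coordinates yields one region that `classify_region` labels as MDR.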

Next: PONDR® VL-XT

[Diagram: VL-XT (2) combines XN (1), VL1 (2), and XC (1); sequence positions labelled 11, 14, N-11, N-14]

XN, VL1, and XC: neural networks

Input features:
XN: 8
VL1: 10
XC: 8

(1) Li X et al., Genome Informat. 9:201-213 (1999)
(2) Romero P et al., Proteins 42:38-48 (2001)

Inputs for PONDR® VL-XT

XN:   Coordination No., V, VIYFW, M, N, H, D, PEVK
VL1:  Coordination No., Net charge, WFY, W, Y, F, D, E, K, R
XC:   Coordination No., Hydropathy, VIYFW, M, T, H, PEVK, R

Accuracy (ACC) = (% Corr-O)/2 + (% Corr-D)/2

ACC (estimated by cross-validation) ~ 72 ± 4%

Li X. et al., Genome Informat. 9:201-213 (1999)
Romero P. et al., Proteins 42:38-48 (2001)
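
The ACC definition on this slide averages the percentage of correctly predicted ordered residues and the percentage of correctly predicted disordered residues. A minimal sketch, assuming per-residue labels encoded as the characters "O" and "D" (an encoding chosen here for illustration):

```python
def balanced_accuracy(true_labels, predicted_labels):
    """ACC = (% correct on O)/2 + (% correct on D)/2, in percent."""
    correct = {"O": 0, "D": 0}
    total = {"O": 0, "D": 0}
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    pct_corr_o = 100.0 * correct["O"] / total["O"]
    pct_corr_d = 100.0 * correct["D"] / total["D"]
    return pct_corr_o / 2 + pct_corr_d / 2
```

Because each class contributes half of the score, a predictor that always answers "O" scores 50% regardless of how rare disorder is in the test set.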

Disorder Prediction in CASP


Critical Assessment of Structure Prediction


http://predictioncenter.org



CASP1 (1994) to CASP9 (2010)

Experimentalists provide amino acid sequences as they are determining the structures of proteins

Groups register and make structure predictions

After structures are determined, predictions are evaluated

Disorder predictions introduced in CASP5 (2002)



CASP PREDICTIONS ARE TRULY BLIND!!!

Disorder Prediction in CASP

[Figure: disorder-predictor performance by CASP round. Figure note: CASP5 (2002), sensitivity replaced AUC. Labelled predictors: VSL2, VSL2, PreDisorder]

Our Performance in CASP

Used VL-XT in CASP5: performed poorly on short disordered regions but very well on long disordered regions.

VL-XT was trained mainly on long disordered regions.

Changed predictor in CASP6 and CASP7; the new predictor ranked #1. Big improvement!

Did not participate in CASP8, but would not have ranked #1 with current predictors.

What was the change that led to the large improvement in CASP6?

Predictors of Natural Disordered Regions

PONDR® VL-XT and PONDR® VSL2

[Diagram: VL-XT (2) combines N (1), VL1 (2), and C (1); sequence positions labelled 11, 14, N-11, N-14. VSL2 (3) combines VL2 (3) and VS2 (3) using M1 (3); diagram labels: O_L, O_S, O_M, 1 - O_M]

VSL2 Score = O_L × O_M + O_S × (1 - O_M)

N, VL1, and C are neural networks
N-term: 8 inputs
VL1: 10 inputs
C-term: 8 inputs

M1, VSL2-L, and VSL2-S are support vector machines
M1: 54 inputs
VL2: 20 inputs
VS2: 20 inputs

(1) Li X et al., Genome Informat. 9:201-213 (1999)
(2) Romero P et al., Proteins 42:38-48 (2001)
(3) Peng K et al., BMC Bioinfo. 7:208 (2006)
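
The VSL2 score formula on this slide is a weighted mix of two component outputs, with the weight supplied by a third. A minimal sketch of that arithmetic, assuming all three outputs are per-residue values in [0, 1]; the function and parameter names are hypothetical, and the component predictors themselves are not modelled.

```python
def vsl2_score(o_long, o_short, o_meta):
    """Combine outputs as O_L * O_M + O_S * (1 - O_M)."""
    return o_long * o_meta + o_short * (1.0 - o_meta)


def combine_profiles(long_scores, short_scores, meta_scores):
    """Apply the combination residue by residue along a sequence."""
    return [
        vsl2_score(o_l, o_s, o_m)
        for o_l, o_s, o_m in zip(long_scores, short_scores, meta_scores)
    ]
```

When O_M is near 1 the result follows O_L, and when O_M is near 0 it follows O_S, so the weighting output decides which component dominates at each residue.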

Comparison on CASP 8 Dataset

Zhang P, et al. (unpublished results; not quite same as CASP evaluation)

ACC = 80%
ACC = (% Corr-O)/2 + (% Corr-D)/2

AUC = Area Under Curve
AUC = 0.89
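
AUC can be read as the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered residue. The sketch below computes it by direct pairwise comparison; this is a standard equivalent of the area under the ROC curve, not necessarily the procedure used for the number on this slide. Labels are assumed to be 1 for disordered and 0 for ordered.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic in the number of residues, which is fine for illustration; a rank-based version would be preferable for large test sets.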

XPA: PONDR® VL-XT, PONDR® VSL2B and PreDisorder

[Figure: disorder predictions for XPA; legend: (+) Disordered, (-) Structured]

Iakoucheva L et al., Protein Sci 3: 561-571 (2001)
Dunker AK et al., FEBS J 272: 5129-5148 (2005)
Deng X., et al., BMC Bioinformatics 10:436 (2009)

Published Predictors of Disordered Proteins

[Figure: published predictors plotted by year, with CASP 5, 6, 7, and 8 marked; labels include "PONDRS" and "# +,- / # phobics"]

PONDRs:
- VSL1: Ranked #1 in CASP 6 (2004)
- VSL2: Ranked #1 in CASP 7 (2006)

He B, et al., Cell Res 19: 929-949 (2009)

How Abundant are IDRs/IDPs?

To estimate the abundance of IDPs/IDRs: predict on whole proteomes from many organisms.

ALERT!! Lack of membrane-protein-specific disorder predictors means that estimates of disorder will be too low by a small percentage.


VSL2 Prediction of Abundance** of Intrinsically Disordered Proteins

Organisms            # Orgs.  # Proteins     Avg. # Proteins  % Disordered AA  % Proteins IDR >30  % Proteins Natively Unfolded
Archaea              73       536 to 4234    2199             12.5 to 37.2%    0 to 60.0%          3.2 to 31.5%
Bacteria             951      182 to 9320    3331             12.0 to 36.1%    11.5 to 53.7%       2.7 to 29.2%
Single-cell Eukarya  58       1909 to 16365  9098             22.3 to 49.9%    17.0 to 76.8%       16.8 to 47.6%
Multi-cell Eukarya   51       1775 to 35942  11295            10.4 to 49.0%    4.4 to 66.5%        6.9 to 48.7%

**Are organism-specific predictors sometimes needed?
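
Two of the table's columns, "% Disordered AA" and "% Proteins IDR >30", can be derived from per-residue predictor output. The sketch below assumes each protein is a list of disorder scores in [0, 1] and that a residue counts as disordered at a score of 0.5 or above; the threshold, the function names, and the dictionary keys are assumptions, and the "Natively Unfolded" column is not reproduced.

```python
def longest_run(flags):
    """Length of the longest consecutive run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def proteome_summary(proteome_scores, threshold=0.5, min_idr=30):
    """Summarize predicted disorder over a list of per-protein score lists."""
    total_residues = 0
    disordered_residues = 0
    proteins_with_long_idr = 0
    for scores in proteome_scores:
        flags = [s >= threshold for s in scores]
        total_residues += len(flags)
        disordered_residues += sum(flags)
        # "IDR >30": at least one predicted region longer than 30 residues.
        if longest_run(flags) > min_idr:
            proteins_with_long_idr += 1
    return {
        "pct_disordered_aa": 100.0 * disordered_residues / total_residues,
        "pct_proteins_long_idr":
            100.0 * proteins_with_long_idr / len(proteome_scores),
    }
```

Running this once per proteome and taking the minimum and maximum over organisms in a group would give ranges of the kind shown in the table.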

Archaea Phylogenetic Tree

[Figure: archaeal phylogenetic tree annotated with predicted disorder levels: >30%, >21%, >17%, >14%, <14%. Tree: Todd Lowe (http://archaea.ucsc.edu/)]

Predicted Disorder vs. Proteome Size

Why So Much Disorder?

Hypothesis: Disorder Used for Signaling

Sequence → Structure → Function
    Catalysis,
    Membrane transport,
    Binding small molecules.

Sequence → Disordered Ensemble → Function
    Signaling,
    Regulation,
    Recognition,
    Control.

Dunker AK, et al., Biochemistry 41: 6573-6582 (2002)
Dunker AK, et al., Adv. Prot. Chem. 62: 25-49 (2002)
Xie H, et al., Proteome Res. 6: 1882-1932 (2007)

A New Order / Disorder AA Scale, Part 1

Collect equal numbers of O and D windows of length 21.
Calculate the value of attribute, x, for each window.
For each interval of x, count how many windows are O and D; from this, determine P(O | x) and P(D | x).
Plot P(O | x) and P(D | x) versus x.
Determine the area between the two curves.
Area Ratio Value (ARV) = (area between curves / total area)
Apply to 517 AA scales: http://www.genome.jp/aaindex
Rank scales from smallest to largest.

Campen A, et al., Protein Pept Lett 15: 956-963 (2008)
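
The Area Ratio Value procedure on this slide can be sketched in a few lines. The slide does not give the binning or define "total area" precisely, so this version makes assumptions: equal-width intervals, intervals containing no windows are skipped, and total area is the area of the remaining intervals (height 1). The function names are hypothetical.

```python
def window_attribute(window, scale):
    """Attribute x for one window: the mean scale value of its residues."""
    return sum(scale[aa] for aa in window) / len(window)


def area_ratio_value(o_values, d_values, n_bins=10):
    """ARV from attribute values of ordered (O) and disordered (D) windows."""
    lo = min(min(o_values), min(d_values))
    hi = max(max(o_values), max(d_values))
    if hi == lo:
        return 0.0  # the attribute cannot separate anything
    width = (hi - lo) / n_bins

    def bin_index(x):
        # Clamp so that x == hi falls in the last interval.
        return min(int((x - lo) / width), n_bins - 1)

    o_counts = [0] * n_bins
    d_counts = [0] * n_bins
    for x in o_values:
        o_counts[bin_index(x)] += 1
    for x in d_values:
        d_counts[bin_index(x)] += 1

    area_between = 0.0
    total_area = 0.0
    for n_o, n_d in zip(o_counts, d_counts):
        n = n_o + n_d
        if n == 0:
            continue  # P(O | x) and P(D | x) are undefined here
        # |P(O | x) - P(D | x)| times the interval width.
        area_between += abs(n_o / n - n_d / n) * width
        total_area += 1.0 * width
    return area_between / total_area
```

An attribute that separates the two classes perfectly gives an ARV of 1, and one whose O and D distributions coincide gives 0.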

A New Order / Disorder AA Scale, Part 2

Overall idea: make random changes to a scale, test for higher ARV, repeat until no larger value is found.

Genetic Algorithm Pseudocode:
    Choose initial population
    Repeat
        Evaluate the fitness of each individual
        Select a certain portion of best-ranking individuals
        Breed new population through crossover + mutation
    Until terminating condition

ARV improved from 0.69 for the best of the 517 scales to 0.76 for the new scale, called TOP-IDP.

Campen A, et al., Protein Pept Lett 15: 956-963 (2008)
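
The genetic algorithm pseudocode on this slide maps onto a short program. This is a generic sketch, not the authors' implementation: the population size, number of survivors, mutation rate, single-point crossover, and fixed generation count as the terminating condition are all assumptions. The `fitness` argument is any function that scores a list of 20 scale values; in the setting of this talk it would be the ARV of the scale.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_scale(fitness, pop_size=30, keep=10, generations=50,
                 sigma=0.1, seed=0):
    """Search for a 20-value amino acid scale that maximizes fitness."""
    rng = random.Random(seed)
    # Choose initial population: random scales with values in [-1, 1].
    population = [
        [rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Evaluate fitness and select the best-ranking individuals.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:keep]
        # Breed the rest of the new population by crossover and mutation.
        children = []
        while len(children) < pop_size - keep:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, len(AMINO_ACIDS))
            child = mother[:cut] + father[cut:]
            child = [
                v + rng.gauss(0.0, sigma) if rng.random() < 0.1 else v
                for v in child
            ]
            children.append(child)
        # Parents survive unchanged, so the best fitness never decreases.
        population = parents + children
    best = max(population, key=fitness)
    return dict(zip(AMINO_ACIDS, best))
```

A caller would pass a function that builds O and D window attributes from the candidate scale and returns their ARV.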

P(D | x) and P(O | x) Versus x Plots: Area Between Curves Used to Rank Attributes, x

[Figure: four panels (Flexibility; Positive Charge; Extracellular Protein AA Composition; TOP-IDP). ARV labels shown: ARV = 0.69, Rank = #1/517; ARV = 0.07, Rank = #517/517; ARV = 0.36, Rank = #238/517; ARV = 0.76]

Campen A et al., Protein & Peptide Lett 15: 956-963 (2008)

Analysis of the disorder propensity in p53 by TOP-IDP (A), PONDR® VLXT (B) and PONDR® VSL1 (C).

Chronology of Amino Acid Evolution

DISORDER TO ORDER, NON-LIFE TO LIFE

Di Mauro E, et al., in Genesis: Origin of Life on Earth and Other Planets (In press)

New Phosphorylation Predictor

KNN: similarity to known sites (+ / -) of phosphorylation
Disorder Scores: used VSL2
AA frequencies: at sequence positions before and after phosphorylation sites

Gao J et al., Mol and Cell Proteomics (In press)
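
The KNN component listed on this slide scores a candidate site by its similarity to known phosphorylated (+) and non-phosphorylated (-) sites. The sketch below illustrates the idea only: the similarity measure (fraction of identical positions), the value of k, and the function names are assumptions, and the disorder-score and amino acid frequency features are not included.

```python
def similarity(window_a, window_b):
    """Fraction of positions at which two equal-length windows agree."""
    matches = sum(a == b for a, b in zip(window_a, window_b))
    return matches / len(window_a)


def knn_score(window, positives, negatives, k=3):
    """Fraction of the k most similar known sites that are phosphorylated."""
    labelled = [(similarity(window, w), 1) for w in positives]
    labelled += [(similarity(window, w), 0) for w in negatives]
    # Most similar known sites first.
    labelled.sort(key=lambda pair: pair[0], reverse=True)
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

Windows here would be sequence segments centered on a candidate S, T, or Y residue; a score near 1 means the candidate resembles known phosphorylation sites more than known non-sites.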

Disorder Score vs. Phosphorylation

[Figure: histograms of occurrence versus disorder score (0 to 1) in six panels: (A) Phospho-S/T in H. sapiens; (B) Non-phospho-S/T in H. sapiens; (C) Phospho-S/T in A. thaliana; (D) Non-phospho-S/T in A. thaliana; (E) Phospho-Y in H. sapiens; (F) Non-phospho-Y in H. sapiens. Annotations: 91.3% > 0.5, 87.6% > 0.5, 54.9% > 0.5, 50.5% > 0.5]

[Figure axis: Residue Positions, -6 to +6]

Gao J et al., Mol & Cell Proteomics 9 (Epub) (2010)

Signaling Example 1: Calcineurin and Calmodulin

[Figure labels: A-Subunit, B-Subunit, Autoinhibitory Peptide, Active Site]

Kissinger C et al., Nature 378:641-644 (1995)
Meador W et al., Science 257: 1251-1255 (1992)

Example 2: p27kip1: A Disordered Domain

[Figure labels: Cyclin A, CDK, p27kip1 (69 residues)]

3D Structure: Russo AA et al., Nature 382: 325-331 (1996)
DD: Tompa P et al., Bioessays 4: 328-340 (2008)

The p27kip1 Disordered Domain: Used for Signal Integration

[Diagram: p27 states Y88 / T187, then pY88 / T187, then pY88 / pT187, then pY88 / pT187 with Ub'n, linked by steps 1 to 4; other labels: ATP and three "?" marks]

1. NRTK phosphorylation @ Y88, signal #1.
2. Intra-molecular phosphorylation @ T187, #2.
3. Ubiquitination @ several possible loci, #3.
4. Proteasome digestion of p27, then cell cycle progression.

Galea CA et al., J Mol Biol 376: 827-838 (2008)
Dunker AK & Uversky VN, Nat Chem Biol 4: 229-230 (2008)

Importance of Bioinformatics to IDP and Protein Research

Thousands of IDPs and IDRs have been found.

Not one IDP or IDR is discussed in any current biochemistry textbook!

Why? IDPs and IDRs don’t fit Sequence → Structure → Function

New paradigm developed from bioinformatics: Sequence → Disordered Ensemble → Function

IDP prediction is changing fundamental views of structure-function relationships!











Thank You!

Collaborators

Indiana University: Bin Xue, Jake Chen, Bill Sullivan, Predrag Radivojac, Jennifer Chen, Pedro Romero, Marc Cortese, Derrick Johnson, Chris Oldfield, Amrita Mohan, Yunlong Liu, Ann Roman, Tom Hurley, Anna DePaoli-Roach, Yuro Tagaki, Siama Zaidi, Jingwei Meng, Wei-Lun Hsu, Hua Lu, Fei Huang, Vladimir Uversky
UCSD: Lilia Iakoucheva Sebat
Temple University: Zoran Obradovic, Slobodan Vucetic, Vladimir Vacic, Kang Peng, Hiongbo Xie, Siyuan Ren, Uros Midic
Enzyme Institute: Peter Tompa, Zsuzsanna Dosztanyi, Istvan Simon, Monika Fuxreiter
USU: Robert Williams
Harbin Engineering University: Bo He, Kejun Wang
University of Idaho: Celeste J. Brown, Chris Williams
Molecular Kinetics: Yugong Cheng, Tanguy LeGall, Aaron Santner
Plant and Food Research: Xaiolin Sun
USF: Gary Daughdrill
Wright State University: Oleg Paliy