Third Indo-French Bioinformatics Meeting - DSIMB - Inserm


Dec 14, 2013 (3 years and 6 months ago)


Analysis and Prediction of Short Loops in Terms of a Structural Alphabet

Alexandre G. de Brevern

INSERM U726 - Équipe de Bioinformatique Génomique et Moléculaire (EBGM)
Université Paris 7
75251 PARIS Cedex 05
FRANCE

1 - Quick presentation of the EBGM team
2 - Some questions about secondary structures
3 - Another concept: the structural alphabet
4 - The short loops

Équipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM U726, Université Paris 7

~15 permanent positions
2 professors
3 full-time researchers
6 assistant professors
3 research engineers
5 PhD students

Background
Cell biologist
Mathematician
Statistician
Biochemist
Physical chemist (theoretical)
Bioinformatician

Axis 1 - Transcriptome

[Figure: expression profiles (log2(ratio) versus time) for genes g1 to g5; gene expression similarity graph; gene expression similarity in 3 dimensions (X, Y, Z) with functions]

Lelandais et al. (2006) Bioinformatics

New methods: kinetic data, biclustering
Application to biological questions

Axis 3 - Structural genomics

Mechanosensitive Channel of Large conductance (MscL)
Homology modelling: M. tuberculosis, E. coli

Molecular dynamics
Normal Mode Analysis
Docking

Valadié et al. (2003) J Mol Biol

[Figure: E. coli and M. tuberculosis at 0 ns and 10 ns]




Secondary structures

Helical state: 28% to 35%
Extended state: 18% to 26%
Coil state: 40% to 50%

Secondary structures

α-helix, β-strand, random coil

Protein 3D structure: H-bonds, distances, angles, dihedral angles, volume

Secondary structures

Different assignment methods:

Greer & Levitt (1977): distance
DSSP (Kabsch & Sander, 1983): H-bond
DEFINE (Kundrot & Richards, 1988): distance
PCURVE (Sklenar, Etchebest and Lavery, 1989): axis
CONSENSUS (Colloc'h, Etchebest et al., 1993): mean
STRIDE (Frishman & Argos, 1995): H-bond / dihedral
PSEA (Labesse et al., 1997): distance / angle
PROSS (Srinivasan & Rose, 1999): dihedral
XTLSSTR (King & Johnson, 1999): distance / angle
DSSPcont (Andersen et al., 2001): H-bond / dihedral
SECSTR (Fodje & Al-Karadaghi, 2002): H-bond / dihedral
VORO3D (Dupuis et al., 2004): volume
KAKSI (Martin et al., 2005): distance / dihedral
SEGNO (Cubellis et al., 2005): angle / multiple
Beta-Spider (2005), PALSSE (2005), Delaunay tessellation (2005)

Secondary structures

Different approaches: does it make a difference?

Systematic comparison of secondary structure assignment methods:
1 non-redundant structural protein databank
Assignments with all available (with software) secondary structure assignment methods

Secondary structures

AA      WDKYAQEVYEMNFGEKPEGDITQVNEKTIPDHDILCAGFP
DSSP3   CCHHHHHHHHHHHCCCCCCCHHHCCCCCCCCCCEEEEECC
STRID3  CCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCEEEEECC
PSEA    EEHHHHHHHHHHCCEEEEECCCCCCCCCCCCCCEEEECCC
DEFINE  EEHHHHHHHHHHHHEEEEEEHHHHHHHHEEEEEEEEEEEE
PCURVE  CCHHHHHHHHHHCCEEEECCCCCCCCCCCCCCEEEEEEEE
cons.   ..**********.....................****...
PB      bfklmmmmmmmnopacdedfklpcfklpccdfbdcddddf
[C93]   CCHHHHHHHHHHHCEEEECCHHHCCCCCCCCCEEEEEEEE
XTLSS.  CHHHHHHHHHHHHEEEPPCNNNNCGGGGPPPCEEEECCPP
SECSTR  CCHHHHHHHHHHHCCCCECCGGGCCCCCCCCCCEEEEECC
DSSP    CCHHHHHHHHHHHSCCCBCCGGGSCTTTSCCCSEEEEECC
STRIDE  CCHHHHHHHHHHHCCCCBCTTTTTTTTTTCCCCEEEEECC

Example of secondary structure assignments for the protein 10MH with DSSP, STRIDE, PSEA, DEFINE, PCURVE, XTLSSTR and SECSTR.

Fourrier, Benros & de Brevern (2004) BMC Bioinformatics, 5, 58.


Secondary structures

C3 values: percentage of agreement between two methods

[Figure: pairwise C3 agreement between methods. Hydrogen-bond-based methods DSSP, STRIDE, SECSTR: 95%, 93%, 92%. PDB: 87%-91%. KAKSI: 83%-84%. XTLSSTR: 78%-80%. PSEA: 79%-81%. Other values shown: 76%, 80%, 83%.]

No consensus between methods

J. Martin, G. Letellier, J.F. Taly, A. Martin, A.G. de Brevern & J.F. Gibrat (2005) BMC Structural Biology, 5, 17.
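The C3 agreement described on this slide can be illustrated with a minimal sketch. It assumes C3 is simply the percentage of positions that carry the same three-state label in two assignments. The reduction rule used here (G counted as helix, B and every other state counted as coil) is inferred only from the DSSP and DSSP3 rows of the 10MH example; the function names are illustrative and do not come from the talk.

```python
def reduce_to_three_states(assignment, helix="HG", strand="E"):
    """Map a multi-state assignment onto H (helix), E (strand), C (coil).

    The default grouping is an assumption inferred from the 10MH example;
    other reduction rules exist and give different C3 values.
    """
    return "".join(
        "H" if s in helix else "E" if s in strand else "C" for s in assignment
    )


def c3(assignment_a, assignment_b):
    """Percentage of positions with the same three-state label."""
    if len(assignment_a) != len(assignment_b):
        raise ValueError("assignments must have the same length")
    same = sum(a == b for a, b in zip(assignment_a, assignment_b))
    return 100.0 * same / len(assignment_a)


# 10MH fragment from the slide, written as run lengths to keep it readable.
dssp = ("CC" + "H" * 11 + "S" + "CCC" + "B" + "CC" + "GGG" + "S" + "C"
        + "TTTS" + "CCC" + "S" + "EEEEE" + "CC")
dssp3 = "CC" + "H" * 11 + "C" * 7 + "HHH" + "C" * 10 + "EEEEE" + "CC"
strid3 = "CC" + "H" * 11 + "C" * 20 + "EEEEE" + "CC"

print(reduce_to_three_states(dssp) == dssp3)  # True
print(c3(dssp3, strid3))                      # 92.5
```

On this 40-residue fragment, the two three-state strings differ only at the three positions that DSSP labels as a 3-10 helix (GGG).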

Secondary structures

          DSSP    STRIDE  XTLSSTR PSEA    DEFINE  SECSTR  KAKSI   SEGNO
DSSP      --      89.03   76.39   85.48   59.30   84.28   74.70   87.55
STRIDE    94.49   --      79.33   88.53   59.78   85.12   79.07   91.22
XTLSSTR   81.32   79.55   --      83.76   58.46   74.50   77.22   85.53
PSEA      69.27   67.53   63.81   --      59.75   65.77   73.26   76.16
DEFINE    40.80   39.87   38.71   50.52   --      38.56   46.38   44.51
SECSTR    90.73   86.20   75.17   87.50   60.11   --      77.14   87.03
KAKSI     71.78   72.40   68.53   88.12   59.40   67.68   --      77.40
SEGNO     84.11   82.77   77.30   89.83   59.29   77.64   76.74   --

Also consequences on β-turns
A. Bornot & A.G. de Brevern (2006) Bioinformation, in press.

And repercussions on the analysis of molecular dynamics

Secondary structures

Structural disagreement == sequence disagreement?

Secondary structures

Some major works on capping regions:

Richardson, J.S. and Richardson, D.C. (1988) Amino acid preferences for specific locations at the ends of alpha helices. Science, 240, 1648-1652.

Aurora, R. and Rose, G.D. (1998) Helix capping. Protein Sci., 7, 21-38.

Kumar, S. and Bansal, M. (1998) Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins, 31, 460-476.

E. Kruus, C. Thumfort, C. Tang, N.S. Wingreen (2005) Gibbs sampling and helix-cap motifs. Nucleic Acids Res., 33, 5343-53.

Secondary structures

Structural disagreement == sequence disagreement?

1 non-redundant structural protein databank
Assignment of secondary structure
Computation of occurrences for helix and sheet, central and capping regions

Do the different assignments lead to significantly different amino acid propensities?
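The question on this slide can be made concrete with a small sketch of a position-specific propensity. It assumes the propensity is the frequency of an amino acid at a fixed offset from the first helical residue, divided by its background frequency in the whole databank. The statistic used in the talk is not given in the slides, so the ratio, the offset convention and the function names below are illustrative assumptions.

```python
from collections import Counter


def helix_starts(assignment):
    """Indices of the first residue of every helix (runs of 'H')."""
    return [
        i for i, s in enumerate(assignment)
        if s == "H" and (i == 0 or assignment[i - 1] != "H")
    ]


def position_propensities(pairs, offset):
    """Propensity of each amino acid at `offset` from the helix start.

    pairs  : list of (sequence, three-state assignment) of equal length
    offset : 0 is the first helical residue, -1 the residue before it
    A value above 1 suggests over-representation, below 1 under-representation.
    """
    at_position, background = Counter(), Counter()
    for sequence, assignment in pairs:
        background.update(sequence)
        for start in helix_starts(assignment):
            j = start + offset
            if 0 <= j < len(sequence):
                at_position[sequence[j]] += 1
    n_pos = sum(at_position.values())
    n_bg = sum(background.values())
    return {
        aa: (count / n_pos) / (background[aa] / n_bg)
        for aa, count in at_position.items()
    }


# Toy data, not taken from the talk.
toy = [("SAAAG", "CHHHC"), ("DAAAG", "CHHHC")]
print(position_propensities(toy, offset=-1))
```

Running the same computation with assignments from two different methods and comparing the resulting dictionaries is one way to check whether a structural frameshift changes the observed amino acid preferences.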

10 non-redundant databanks

Stability of the databanks in terms of secondary structure assignments

Secondary structures

DSSP (wrongly) taken as standard

DSSP     CCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEEECC
ANOTHER  CCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCEEEEEECC

α-helix Ccap: frameshift +1
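The frameshift in this example can be measured with a short sketch. It assumes that a frameshift is the difference, in residues, between the boundaries of overlapping segments in two assignments, counted as positive when the second assignment places the boundary later in the sequence. The function names are illustrative.

```python
import re


def segments(assignment, state):
    """(start, end) spans of maximal runs of `state`, end exclusive."""
    return [m.span() for m in re.finditer(f"{state}+", assignment)]


def boundary_shifts(reference, other, state="H"):
    """(N-terminal shift, C-terminal shift) for each pair of overlapping runs."""
    shifts = []
    for r_start, r_end in segments(reference, state):
        for o_start, o_end in segments(other, state):
            if o_start < r_end and r_start < o_end:  # the two runs overlap
                shifts.append((o_start - r_start, o_end - r_end))
    return shifts


# The two assignments from the slide, written as run lengths.
dssp = "CC" + "H" * 10 + "C" * 21 + "E" * 5 + "CC"
another = "CC" + "H" * 11 + "C" * 19 + "E" * 6 + "CC"

print(boundary_shifts(dssp, another, "H"))  # [(0, 1)]
print(boundary_shifts(dssp, another, "E"))  # [(-1, 0)]
```

Here the helix keeps the same N-terminal boundary but ends one residue later (+1 at the C-terminal end), while the strand starts one residue earlier.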

[Figure: 80%]

[Figure: 34%, 7%]

Secondary structures

Analysis of the "informativity": over-representation and under-representation

Secondary structures

α-helix Ncap (positions N3', N2', N1', Ncap, N1, N2)

Over-represented amino acids (+), in slide order:
DSSP: G, M, PSTND, WPE, AQDE, QDE
STRIDE: PG, M, PGSTND, P, ADE, QDE
SECSTR: PG, MP, STND, PE, ADE, AQDE
XTLSSTR: G, P, STND, PSTND, APE, QDE
PSEA: G, MG, GSTND, PE, AQDE, AQDE
DEFINE: D, PSD, AE, E, AE
KAKSI: P, G, P, PSTND, APE, ADE
SEGNO: G, MP, GSTND, WPE, AQDE, AQDE
PBs: PSTND, PD, DE, QDE, ILAF, LAQERK

Under-represented amino acids (-), in slide order:
DSSP: IVLMAFYQERK, GN, IVLFG, PGN
STRIDE: IVLAFYWQERK, GN, IVLFG, PGN
SECSTR: IVLMAFYERK, GN, IVLFG, PGN
XTLSSTR: VAEK, IVLAF, VGTN, IPGN
PSEA: IVLAFQERK, GN, IVLG, PGN
DEFINE: IV, N, PG
KAKSI: IVLMAFK, GN, IVG
SEGNO: IVLMAFYQERK, GTN, IVLG, PGN
PBs: IVLAFQERK, IVL, IVLC, IP, PGTD, PGS

[Figure: α-helix, N cap and C cap]

Conclusion

Not so simple: a structural frameshift does not imply a sequence frameshift.

Simplified results:

α-helix:
Ncap: clear frameshift +1 for KAKSI & XTLSSTR.
Ccap: clear frameshift -1 for KAKSI.

β-strand:
Ncap: only STRIDE & SECSTR are equivalent to DSSP; mix of -1 / +1 frameshifts for PSEA, XTLSSTR, SEGNO and KAKSI [N2 to N1 positions really distinct].
Ccap: STRIDE, SECSTR & KAKSI are equivalent to DSSP; major differences at position C1 for the others.

But …

See also Manoj Tyagi … soon and Bernard Offmann later


Structural alphabet

A structural alphabet is a set (or library) of small prototypes which approximate every part of protein structures. It is composed of a limited number of recurrent structural elements of proteins. The associations between these structural "letters" are governed by logic rules and form the words of protein structures.

A structural alphabet makes no a priori assumption regarding secondary structures, i.e. it is not a categorization of the coil state.

de Brevern A.G., Camproux A.C., Hazout S., Etchebest C. and Tuffery P. (2001), Protein structural alphabets: beyond the secondary structure description, Recent Adv. In Prot. Eng., 1:319-331.

de Brevern A.G., Etchebest C. & Hazout S. (2000), Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, Proteins, 41(3):271-287.

Structural alphabet

[Figure: Protein Blocks PB d and PB m]

Structural alphabet

>153L [structural alphabet]
ZZmnopfklpccebjafklmmmnopabecjklmmmmmmmmm
mmmmmmmmmmmmmmmmnopafklmmmmmmmmnooolapgeh
iafkopagcjkopafklmccehjfklmklmmmmmmmmmmmm
mmmmmmmmmbcfklmmmmmmmmmmnomklmnbfklmmgoia
hilmmmmmmmmmmmmmmnoZZ

>153L [secondary structures]
HHHHCCCCCCCCCCCCCCCHHHHHHCCCCCCCHHHHHHH
HHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHH
HHHHHHHHHCCCCCHHHHHHHHHHHCCCCEEEEECCCCC
CCCHHHHHHHHHHHHHHCCCC



Structural alphabet

Local protein structure approximation (Proteins 2000, ISB 2005)
Local structure prediction (Proteins 2000, 2005, ISB 2004, J Biosc 2006)
Local protein structure approximation for longer fragments (Protein Science 2002, Bioinformatics 2003)
Local protein structure prediction for longer fragments (Proteins 2006a)
Superimposition of protein structures (Proteins 2006b, Nucleic Acids Res 2006)
Molecular modeling (Bioch Biophys Acta 2005)

Structural alphabet

Short loops: series of 2 to 6 PBs between two series of PBs mm and/or dd

Follows a previous work:
Fourrier, Benros & de Brevern (2004) BMC Bioinformatics, 5, 58.
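The short-loop definition above can be sketched as a scan over a Protein Block string. This reading assumes that a "series of PBs mm and/or dd" is a maximal run of at least two consecutive m or two consecutive d, and that a short loop is whatever lies between two consecutive such runs when it is 2 to 6 PBs long. The function name and parameters are illustrative.

```python
import re


def short_loops(pb_sequence, min_len=2, max_len=6):
    """Return (start index, loop PBs) for every short loop found.

    Regular regions are taken to be maximal runs of 'mm...' or 'dd...'
    (an assumption about how the flanking series are delimited).
    """
    runs = [m.span() for m in re.finditer(r"m{2,}|d{2,}", pb_sequence)]
    loops = []
    for (_, previous_end), (next_start, _) in zip(runs, runs[1:]):
        length = next_start - previous_end
        if min_len <= length <= max_len:
            loops.append((previous_end, pb_sequence[previous_end:next_start]))
    return loops


# Fragment of the 153L PB sequence shown earlier in the talk.
print(short_loops("mmmmbcfklmmmm"))  # [(4, 'bcfkl')]
# A helix-to-strand connection (toy example).
print(short_loops("mmmmnopacdddd"))  # [(4, 'nopac')]
```

Segments shorter than 2 or longer than 6 PBs between two regular runs are ignored, which matches the 2 to 6 range stated on the slide.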

Structural alphabet

Bayesian prediction

Local folds (PB): sequence windows of 15 residues, e.g.
SYARMDIGTTHDDYA
RMDIGTTHDDYANDV
IHEVLAPGCLDAFPL
GRDTSVEGSEMVPGK
VIGLLEPMKKSMVPV
CVMLKSRGSRGHVRF
GRLGLGEGAEEKSIP
HLWVHQEGIYRDEYQ
LMWQLYPEERYMDNN
MWQLYPEERYMDNNS
QIAKYFDRKQIGNAM ...

Occurrence matrix (amino acids x positions)

New sequence:
SFITPVPGGVGPMTVFLEMDLT
NKNVIFVADKRKGGPGGIIANI
CVHTFNSWLDVEPRVAIEANKN
GAIWKLDLAIWKLDLGTLEAIE
WWDSHIGAFLDKPKMENAQGQG
NGLRYGLSSDAHTAVIGLPSGL
ESAVIGLPSGLESWSFFFAVYD
GHAGSQVAKY...

Bayes' theorem: prediction and confidence index
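The pipeline on this slide (occurrence matrices learned per Protein Block, then Bayes' theorem applied to a new sequence window) can be sketched as follows. This is a simplified illustration, not the published method: it assumes window positions are independent given the PB, uses a pseudocount for unseen amino acids, and does not reproduce the confidence index of the talk. All names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train(examples, window, pseudocount=1.0):
    """Build one occurrence matrix per PB from (pb, window sequence) pairs."""
    counts = defaultdict(lambda: [Counter() for _ in range(window)])
    pb_totals = Counter()
    for pb, sequence in examples:
        pb_totals[pb] += 1
        for position, aa in enumerate(sequence):
            counts[pb][position][aa] += 1
    n_examples = sum(pb_totals.values())
    model = {}
    for pb, total in pb_totals.items():
        denominator = total + pseudocount * len(AMINO_ACIDS)
        log_likelihood = [
            {aa: math.log((counts[pb][pos][aa] + pseudocount) / denominator)
             for aa in AMINO_ACIDS}
            for pos in range(window)
        ]
        model[pb] = (math.log(total / n_examples), log_likelihood)
    return model


def predict(model, sequence):
    """Return the most probable PB and the posterior P(PB | window)."""
    # Only the 20 standard amino acids are handled in this sketch.
    scores = {
        pb: log_prior + sum(ll[pos][aa] for pos, aa in enumerate(sequence))
        for pb, (log_prior, ll) in model.items()
    }
    top = max(scores.values())
    weights = {pb: math.exp(score - top) for pb, score in scores.items()}
    total = sum(weights.values())
    posterior = {pb: weight / total for pb, weight in weights.items()}
    return max(posterior, key=posterior.get), posterior


# Toy training set with 5-residue windows (the slide uses 15 residues).
toy = [("m", "AAAAA"), ("m", "AAAAL"), ("d", "VVVVV"), ("d", "VIVIV")]
model = train(toy, window=5)
print(predict(model, "AAAAA")[0])  # m
print(predict(model, "VVVIV")[0])  # d
```

The posterior returned by `predict` could serve as a rough indication of reliability, but it should not be read as the confidence index mentioned on the slide.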

Structural alphabet

10 non-redundant structural protein databanks

1000 independent simulations to randomly split the databanks between learning and validation sets

Prediction rate: proportion of positions at which the true Protein Block is correctly predicted, i.e. the equivalent of Q3 for secondary structures.
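The evaluation protocol above can be sketched in a few lines. The split proportion is not given in the slides, so the `learning_fraction` default below is an arbitrary assumption, as are the function names.

```python
import random


def prediction_rate(true_pbs, predicted_pbs):
    """Percentage of positions where the predicted PB equals the true PB."""
    if len(true_pbs) != len(predicted_pbs):
        raise ValueError("sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_pbs, predicted_pbs))
    return 100.0 * correct / len(true_pbs)


def random_split(proteins, learning_fraction=2 / 3, seed=None):
    """Shuffle the proteins and cut them into learning and validation sets.

    learning_fraction is an assumed value, not taken from the talk.
    """
    rng = random.Random(seed)
    shuffled = list(proteins)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * learning_fraction)
    return shuffled[:cut], shuffled[cut:]


print(prediction_rate("mmdd", "mmda"))  # 75.0
learning, validation = random_split(range(9), seed=0)
print(len(learning), len(validation))   # 6 3
```

Repeating `random_split` with different seeds and averaging `prediction_rate` over the validation sets gives the kind of repeated simulation described on the slide.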

Structural alphabet

Results:

The prediction rate does not increase greatly with the maximum sequence identity rate, i.e. prediction with NR 20 is close to prediction with NR 90.

Over-training for short loops becomes very important, but the prediction rate for the validation set is correlated with the over-training, i.e. the better the prediction rate, the lower the over-training.

Predicting short loops per kind of short loop gives a better prediction rate, both in terms of PBs and in terms of rmsd values.

No particular bias in the prediction (PB frequencies).

Structural alphabet

More analyses on PB prediction results … (best and average)
Details of amino acid specificities.
http://condor.ebgm.jussieu.fr/~debrevern/LOOPS/

In a second step: do SVMs + PSSMs really improve the prediction?

… Manoj Tyagi
(who must finish writing his PhD …)

Pictures from IFBM 2004: B. Offmann, N. Srinivasan

Acknowledgments (2)

Pr. Serge Hazout
Pr. Catherine Etchebest
Patrick Fuchs, Ass. Prof.
Cristina Benros, PhD (postdoc @IBCP)
Juliette Martin, PhD (postdoc @EBGM)
Aurélie Bornot, Master
Joelle Hochez, Ms. Network

http://www.ebgm.jussieu.fr/~debrevern/

Thank you

SUP. SLIDES

[Figure: Sequence, Structure, Function]

Axis 2 - Local protein structures

See also Manoj Tyagi … soon and Bernard Offmann later

Axis 1 - Transcriptome

Response to chemical stress (benomyl) in S. cerevisiae

Folding (11)
REDOX control (7): TRR1, GLR1, GRX1
Transcription factors (5): YRR1, CIN5
Transmembrane proteins (3): YCF1, PCA1, MRS4
Wall (1): ECM4
Permeases MFS (4)
ABC (2)

Lelandais et al. (2005) Molecular & Cellular Biology


Fragment f (5 aa, 8 dihedral angles):

ψ(i-2), φ(i-1), ψ(i-1), φ(i), ψ(i), φ(i+1), ψ(i+1), φ(i+2)

Distance computation => 16 scores; the lowest one is selected.

Example: PB a

>153L
ZZmnopfklpccebjafklmmmnopabecjklmmmmmmmmm
mmmmmmmmmmmmmmmmnopafklmmmmmmmmnooolapgeh
iafkopagcjkopafklmccehjfklmklmmmmmmmmmmmm
mmmmmmmmmbcfklmmmmmmmmmmnomklmnbfklmmgoia
hilmmmmmmmmmmmmmmnoZZ
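The assignment step described here (one distance per prototype, keep the lowest) can be sketched as below. It assumes the distance is a root mean square deviation on the 8 dihedral angles with wrap-around at ±180°. The two prototypes are illustrative placeholders, not the published Protein Block angle values, and the function names are hypothetical.

```python
import math


def angle_difference(a, b):
    """Smallest signed difference between two angles in degrees."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def rmsda(angles_a, angles_b):
    """Root mean square deviation on angular values."""
    squared = [angle_difference(a, b) ** 2 for a, b in zip(angles_a, angles_b)]
    return math.sqrt(sum(squared) / len(squared))


def assign_pb(fragment_angles, prototypes):
    """Name of the prototype with the lowest distance to the fragment."""
    return min(prototypes, key=lambda pb: rmsda(fragment_angles, prototypes[pb]))


# Placeholder prototypes in the order psi(i-2), phi(i-1), ..., phi(i+2).
# These are NOT the real PB definitions; a real run would use the 16 PBs.
prototypes = {
    "helix_like": [-45.0, -60.0] * 4,
    "strand_like": [135.0, -120.0] * 4,
}

fragment = [-40.0, -65.0, -42.0, -58.0, -47.0, -63.0, -41.0, -57.0]
print(assign_pb(fragment, prototypes))  # helix_like
```

Sliding this over every 5-residue window of a structure yields one letter per position, which is how a PB string such as the 153L example is obtained.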

Secondary structures

[Figure: 35%, 34%, 12%]

[Figure: 58%, 3%]

[Figure: β-sheet, N cap and C cap]

Why am I here … ?

Paris

Saint-Denis de la Réunion

Bangalore