Bioinformatics Tools

underlingbuddhaBiotechnology

Oct 2, 2013

Protein Secondary Structure Prediction

Structural Bioinformatics

On 12.12.2012 there were 80,480 protein structures in the protein structure database.

This is a great increase, but still an order of magnitude lower than the total number of protein sequences in UniProt (over 500,000).

The first high-resolution structure of a protein: myoglobin

It was solved in 1958 by Max Perutz and John Kendrew of Cambridge University (who won the 1962 Nobel Prize in Chemistry).


What can we do to bridge the gap?

Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).

However, we can predict the secondary structure with relatively high precision.

MERFGYTRAANCEAP….

What do we mean by Secondary Structure?

Secondary structures are the building blocks of the protein structure.

What do we mean by Secondary Structure?

Secondary structure is usually divided into three categories:

Alpha helix

Beta strand (sheet)

Anything else: turn/loop

Alpha Helix: Pauling (1951)

A consecutive stretch of 5-40 amino acids (average 10).

A right-handed spiral conformation.

3.6 amino acids per turn.

Stabilized by hydrogen bonds.

[Figure: alpha helix, labeled "3.6 residues" and "5.6 Å"]
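The "3.6 amino acids per turn" figure can be turned into simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the only number taken from the slide is 3.6 residues per turn. It computes how many turns a helix of a given length makes and the angular position of each residue when looking down the helix axis (a helical wheel).

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def helix_turns(n_residues: int) -> float:
    """Number of full turns made by a helix of n_residues amino acids."""
    return n_residues / RESIDUES_PER_TURN


def wheel_angles(n_residues: int) -> list[float]:
    """Angle (degrees) of each residue around the helix axis.

    Each residue advances 360 / 3.6 = 100 degrees relative to the previous one.
    Values are rounded to suppress floating-point noise.
    """
    step = 360.0 / RESIDUES_PER_TURN
    return [round((i * step) % 360.0, 6) for i in range(n_residues)]
```

For the average helix of 10 residues mentioned above, `helix_turns(10)` gives a little under three turns.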

Beta Strand: Pauling and Corey (1951)

> An extended polypeptide chain is called a β-strand (it consists of 5-10 amino acids).

> The strands are connected together by hydrogen bonds to form a β-sheet.

[Figure: β-strand and β-sheet]

Loops

Connect the secondary structure elements (alpha helices and beta strands).

Have various lengths and shapes.

The different secondary structures are combined together to form the tertiary structure of the protein.

[Figure: secondary and tertiary structure of RBP and globin]

How do the (secondary and tertiary) structures relate to the primary protein sequence?


- Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen).

- Protein structure is more conserved than protein sequence and more closely related to function.

How can different amino acid sequences determine similar protein structures?

Lesk and Chothia, 1980

The Globin Family

Different sequences can result in similar structures.

[Figure: structures 1ecd and 2hhd]


By comparing the sequences and structures, we can learn about the important features which determine structure and function.

The Globin Family

Why is Proline 36 conserved in all the globin family?

Where are the gaps?

The gaps in the pairwise alignment are mapped to the loop regions.

How are remote homologs related in terms of their structure?

[Figure: structures of retinol-binding protein, odorant-binding protein, apolipoprotein D and β-lactoglobulin]

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3    WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1    MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55   DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
            + I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60   NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115  KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
            +WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113  MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
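The summary line of the alignment (length, identities, gaps) can be recomputed from the two aligned strings. The following is a minimal sketch, not PSI-BLAST's own code: `alignment_stats` is a hypothetical helper, and the two strings are the Query and Sbjct rows above joined end to end. "Positives" are not computed because they depend on the scoring matrix, which the slide does not give.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count columns, identical columns and gapped columns of a pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, subject))
    return {
        "length": len(pairs),
        # a column is an identity when both rows carry the same residue
        "identities": sum(1 for q, s in pairs if q == s and q != "-"),
        # a column is a gap when either row has the gap character
        "gaps": sum(1 for q, s in pairs if q == "-" or s == "-"),
    }


# Aligned rows copied from the PSI-BLAST output above (three blocks concatenated).
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)

STATS = alignment_stats(QUERY, SBJCT)
```

With these strings the sketch gives 170 columns and 19 gapped columns, matching the "Gaps = 19/170 (11%)" line. Note that most gapped columns fall in a few contiguous runs, which is what the "gaps map to loop regions" observation relies on.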

[Figure: structures of the retinol-binding protein and β-lactoglobulin]

Secondary Structure Prediction

Given a primary sequence

ADSGHYRFASGFTYKKMNCTEAA

what secondary structure will it adopt (alpha helix, beta strand or random coil)?

Secondary Structure Prediction Methods

Statistical methods: based on amino acid frequencies

Machine learning methods: SVM, neural networks

HMM (Hidden Markov Model)

Chou and Fasman (1974)

Name            P(a)   P(b)   P(turn)
Alanine          142     83       66
Arginine          98     93       95
Aspartic Acid    101     54      146
Asparagine        67     89      156
Cysteine          70    119      119
Glutamic Acid    151     37       74
Glutamine        111    110       98
Glycine           57     75      156
Histidine        100     87       95
Isoleucine       108    160       47
Leucine          121    130       59
Lysine           114     74      101
Methionine       145    105       60
Phenylalanine    113    138       60
Proline           57     55      152
Serine            77     75      143
Threonine         83    119       96
Tryptophan       108    137       96
Tyrosine          69    147      114
Valine           106    170       50

The table gives the propensity of each amino acid to be part of a certain secondary structure. For example, Proline has a low propensity of being in an alpha helix or a beta sheet: it is a "breaker".

Success rate of 50%.
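To make the use of the propensity table concrete, here is a small sketch that stores the values from the table above and looks up which structure each residue favours. It is an illustration rather than the full Chou-Fasman procedure; the names `CHOU_FASMAN`, `preferred_state` and `mean_propensity` are hypothetical, and ties are resolved in favour of the first column.

```python
# (P(a), P(b), P(turn)) per one-letter amino acid code, copied from the table.
CHOU_FASMAN = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")


def preferred_state(residue: str) -> str:
    """Structure with the highest propensity for a single residue."""
    scores = CHOU_FASMAN[residue]
    return STATES[scores.index(max(scores))]


def mean_propensity(segment: str) -> dict:
    """Average propensity of a segment for each of the three structures."""
    totals = [0, 0, 0]
    for residue in segment:
        for i, value in enumerate(CHOU_FASMAN[residue]):
            totals[i] += value
    return {state: total / len(segment) for state, total in zip(STATES, totals)}
```

Proline comes out as a turn former here, consistent with its role as a helix and sheet breaker.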

Secondary Structure Method Improvements

"Sliding window" approach

Most alpha helices are ~12 residues long.

Most beta strands are ~6 residues long.

Look at all windows of size 6/12.

Calculate a score for each window. If the score is above a threshold, predict that this is an alpha helix/beta sheet.

TGTAGPOLKCHIQWMLPLKK
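The sliding-window idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the window score is taken to be the mean helix propensity P(a) from the Chou and Fasman table, the threshold of 100 is a hypothetical choice, and residues missing from the table (such as the "O" in the example sequence) are given a neutral value of 100.

```python
# Helix propensities P(a) from the Chou and Fasman table.
P_HELIX = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def window_scores(seq, propensity, window, default=100.0):
    """Mean propensity of every window of the given size along the sequence."""
    # Unknown residues get the neutral default (an assumption of this sketch).
    values = [propensity.get(residue, default) for residue in seq]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def predict(seq, propensity, window, threshold):
    """Mark with 'H' every residue covered by a window scoring above threshold."""
    labels = ["-"] * len(seq)
    for start, score in enumerate(window_scores(seq, propensity, window)):
        if score > threshold:
            for i in range(start, start + window):
                labels[i] = "H"
    return "".join(labels)


# Helix scan of the example sequence with the helix-sized window from the slide.
HELIX_CALL = predict("TGTAGPOLKCHIQWMLPLKK", P_HELIX, window=12, threshold=100.0)
```

The same two functions work for beta strands by passing the P(b) column and a window of 6.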

Improvements since the 1980s

Adding information from conservation in MSA.

Smarter algorithms (e.g. machine learning, HMM).

Success rate: 75%-80%
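Success rates such as 50% or 75%-80% are usually reported per residue. The sketch below assumes that reading (the slides do not define the measure): it compares a predicted state string with the observed one and returns the fraction of positions that agree, often called Q3 for three-state predictions. The function name is hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(observed)
```

For example, a prediction that gets 7 of 8 residues right scores 0.875.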

Machine learning approach for predicting Secondary Structure (PHD, PSIpred)

Step 1: Generating a multiple sequence alignment

[Figure: the query is searched against SwissProt, giving an alignment of the query with several subject sequences]

Step 2: Additional sequences are added using a profile. We end up with an MSA which represents the protein family.

[Figure: the query and seed alignment are extended with more subject sequences to form the MSA]
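A profile is, in its simplest form, a table of residue frequencies per alignment column. The sketch below builds such a table from an MSA; it is a bare-bones illustration (no pseudocounts or sequence weighting, which real tools apply), and the function name and the toy alignment in the usage line are hypothetical.

```python
from collections import Counter


def column_profile(msa):
    """Residue frequencies for each column of a multiple sequence alignment.

    msa is a list of equal-length aligned strings; '-' marks a gap and is
    ignored when computing frequencies.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Toy alignment, invented for illustration only.
TOY_PROFILE = column_profile(["ACD", "ACE", "A-E"])
```

Columns where one residue has a frequency close to 1 are the conserved positions that the later prediction step can exploit.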


Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.

[Figure: the MSA built from the query and seed alignment, together with the known structures, is the input to the machine learning approach]


HMM approach for predicting Secondary Structure

An HMM enables us to calculate the probability of assigning a sequence to a secondary structure.

TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB

p = ?

The table is built from a large database of known secondary structures. It contains:

The probability of beginning with an α-helix.

The probability of an α-helix residue being followed by another α-helix residue.

The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn (= 0.15).

The probability of observing a given residue, e.g. Alanine, as part of a β-sheet.


Example

What is the probability that the sequence TGQ will be in a helical structure?

TGQ
HHH

p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995
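The product in the example can be reproduced in a few lines. This sketch assumes the usual HMM reading of the six factors, which the slide does not spell out: 0.45 is the probability of beginning with a helix, 0.8 is the helix-to-helix transition, and 0.041, 0.028 and 0.0635 are the probabilities of observing T, G and Q in a helix. The dictionary and function names are hypothetical. Multiplying the six factors gives roughly 2.1e-5.

```python
# Parameters taken from the worked example; their roles are assumed as
# described above (start, transition and emission probabilities).
START = {"H": 0.45}
TRANSITION = {("H", "H"): 0.8}
EMISSION = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def path_probability(sequence: str, states: str) -> float:
    """Joint probability of a residue sequence and one state path in an HMM."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equally long")
    # first residue: start probability times emission
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    # every later residue: transition from the previous state times emission
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


P_TGQ_HELIX = path_probability("TGQ", "HHH")
```

Comparing this value with the probabilities of other state paths for the same sequence is what lets an HMM choose the most likely secondary structure assignment.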


What can we learn from secondary structure predictions?

Mad Cow Disease: PrPc to PrPsc

[Figure: PrPc (NMR structure) and PrPsc (model)]