An Introduction to Bioinformatics

fleagoldfishBiotechnology

Oct 2, 2013 (3 years and 8 months ago)


Protein Structure Prediction

Aims

- Understand the use of algorithms
- Recognize different approaches
- Understand the limitations

Objectives

- Predict the occurrence of aspects of structure
- Select appropriate tools

Introduction

Structure has several levels

1° primary
2° secondary
3° tertiary
4° quaternary

1° primary

Amino acid sequence

NH2-MRLSWYDPDFQARLTRSNSKCQGQLEVYLKDGWHMVC
SQSWGRSSKQWEDPSQASKVCQRLNCGVPLSLGPFLVTYTP
QSSIICYGQLGSFSNCSHSRNDMCHSLGLTCLE-COOH

2° secondary

Localized organisation - α-helices and β-sheets

3° tertiary

Three-dimensional organisation

4° quaternary

Multi-protein assembly

The problem…

- The best way to determine a structure is experimentally, by X-ray crystallography, NMR etc.
- Structure databases only hold about 10,000+ structures
- Therefore devise programs to deduce structural solutions
- Complex!

Secondary Structure prediction

- Signal peptides
- Intracellular targeting
- Transmembrane α-helices
- α-helices and β-sheets
- Super-secondary structure (motifs)



Signal peptides

- Short N-terminal amino acid sequences
- Direct the protein to the membrane
- Cleaved after translocation

SignalP

- Nobel Prize 1999: Günter Blobel
- SignalP predicts signal peptide cleavage sites
- Uses only the first 50-70 residues
- Uses neural networks

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
  max. C    25         0.910   0.37     YES
  max. Y    25         0.861   0.34     YES
  max. S    12         0.960   0.88     YES
  mean S    1-24       0.892   0.48     YES
# Most likely cleavage site between pos. 24 and 25: SRA-LE
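
The table above can be read mechanically: each measure is compared with its cutoff. The following is a minimal Python sketch of that reading, assuming (from the table alone) that a measure is reported as YES when its value exceeds the cutoff. The function names and the data layout are illustrative and are not part of SignalP.

```python
# Values and cutoffs copied from the SignalP output above.
# Assumption: a measure is "YES" when its value is greater than its cutoff.
SIGNALP_ROWS = [
    ("max. C", 0.910, 0.37),
    ("max. Y", 0.861, 0.34),
    ("max. S", 0.960, 0.88),
    ("mean S", 0.892, 0.48),
]


def conclusions(rows):
    """Return {measure: 'YES' or 'NO'} by comparing each value with its cutoff."""
    return {name: ("YES" if value > cutoff else "NO") for name, value, cutoff in rows}


def split_at_cleavage(sequence, last_signal_position):
    """Split a sequence after a 1-based position (24 in the output above).

    Returns (signal_peptide, mature_protein).
    """
    return sequence[:last_signal_position], sequence[last_signal_position:]


if __name__ == "__main__":
    for measure, verdict in conclusions(SIGNALP_ROWS).items():
        print(measure, verdict)
```

With the predicted cleavage site between positions 24 and 25, `split_at_cleavage(sequence, 24)` would return the first 24 residues as the signal peptide and the remainder as the mature protein.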

Intracellular targeting

- TargetP
- Predicts the subcellular location of eukaryotic proteins
- Presequences
  - Chloroplasts
  - Mitochondria
  - Signal peptide

Transmembrane Domains

- Lots of programs
- TMHMM
  - α-helices: hydrophobic
  - helix topology: R or K +ve charge on the cytoplasmic side
  - Hidden Markov Modelling

Paste as FASTA file

- e.g. Serotonin Receptor

Predicts the transmembrane domains and orientation
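
The idea that transmembrane helices are hydrophobic can be illustrated with a sliding-window hydropathy profile. This is a minimal Python sketch of the older Kyte-Doolittle approach, not the hidden Markov model used by TMHMM. The window of 19 residues and the threshold of 1.6 are the values conventionally used with this scale; the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values (more positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows hydrophobic enough to span a membrane."""
    profile = hydropathy_profile(sequence, window)
    return [start for start, mean in enumerate(profile) if mean >= threshold]


if __name__ == "__main__":
    # Hypothetical test sequence: a hydrophobic stretch flanked by lysines.
    print(candidate_tm_windows("KKKKK" + "L" * 19 + "KKKKK"))
```

A profile like this only flags hydrophobic stretches. It says nothing about orientation, which is where the positive charges (R or K) on the cytoplasmic side come in.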


α-helices and β-sheets

- GOR algorithm
  - Assigns each residue to one conformational state: α-helix, extended chain, reverse turn or coil
  - 64.4% accurate
- Many other sites
  - most use multiple alignments
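
Accuracy figures such as the 64.4% above are usually per-residue scores: the percentage of residues whose predicted state matches the state observed in a known structure. Assuming that is what is meant here, a minimal Python sketch of the calculation follows. The state strings in the example are made up for illustration.

```python
def per_residue_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state.

    Both arguments are strings with one state letter per residue,
    e.g. 'h' (helix), 'e' (extended) and 'c' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


if __name__ == "__main__":
    # Hypothetical prediction versus a hypothetical observed assignment.
    print(per_residue_accuracy("hhhhcccc", "hhhhccee"))
```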



α-helices and β-sheets

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MKFSWRTALLWSLPLLVVGFFFWQGSFGGADANLGSNTANTRMTYGRFLEYVDAGRITSVDLYENGRTAI
cccceeeeeecccceeeeeeeeccccccccccccccccccchhhhcceeeeccccceeeeeeccccceee

VQVSDPEVDRTLRSRVDLPTNAPELIARLRDSNIRLDSHPVRNNGMVWGFVGNLIFPVLLIASLFFLFRR
eeccccccchhhhccccccccchhhhhhhhhccccccccceecccceeeeecccccchhhhhhhhheeec

SSNMPGGPGQAMNFGKSKARFQMDAKTGVMFDDVAGIDEAKEELQEVVTFLKQPERFTAVGAKIPKGVLL
cccccccccchhhhcchhhhhhhhccceeeecchhhhhhhhhhhhhhhhhhcccchhhhhcccccceeee

VGPPGTGKTLLAKAIAGEAGVPFFSISGSEFVEMFVGVGASRVRDLFKKAKENAPCLIFIDEIDAVGRQR
ecccccchhhhhhhhhcccccceeecccccceeeeeecccchhhhhhhhhcccccceeeecchhhhcccc

GAGIGGGNDEREQTLNQLLTEMDGFEGNTGIIIIAATNRPDVLDSALMRPGRFDRQVMVDAPDYSGRKEI
ccccccccchhhhhhhhhhhhhcccccccceeeeeeccccchhhhhhccccccceeeeecccccccchhh

LEVHARNKKLAPEVSIDSIARRTPGFSGADLANLLNEAAILTARRRKSAITLLEIDDAVDRVVAGMEGTP
hhhhhhhhccccccchhhhccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeecccccc

LVDSKSKRLIAYHEVGHAIVGTLLKDHDPVQKVTLIPRGQAQGLTWFTPNEEQGLTTKAQLMARIAGAMG
cccccccchhhhhcccceeeeeecccccccceeeecccccccceeccccccccchhhhhhhhhhhhhhhh

GRAAEEEVFGDDEVTTGAGGDLQQVTEMARQMVTRFGMSNLGPISLESSGGEVFLGGGLMNRSEYSEEVA
hhhhhhhcccccceeeccccchhhhhhhhhhhhhhhccccccccccccccceeeecccccccccchhhhh

TRIDAQVRQLAEQGHQMARKIVQEQREVVDRLVDLLIEKETIDGEEFRQIVAEYAEVPVKEQLIPQL
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhcccccccccccc
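
Output like this is easier to work with once the per-residue state string is collapsed into segments. The following is a minimal Python sketch, assuming the usual reading of the letters (h = helix, e = extended, c = coil); the function names are illustrative.

```python
from collections import Counter
from itertools import groupby


def state_segments(states):
    """Collapse a per-residue state string into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def state_composition(states):
    """Percentage of residues assigned to each state."""
    counts = Counter(states)
    total = len(states)
    return {state: 100.0 * n / total for state, n in counts.items()}


if __name__ == "__main__":
    # First 14 predicted states from the output above.
    example = "cccceeeeeecccc"
    print(state_segments(example))
    print(state_composition(example))
```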


Super-secondary Structure

Secondary structure elements combined into specific geometric arrangements known as motifs

Beta corner

Super-secondary Structure

Several programs/websites for specific domains, e.g.

- PAIRCOIL and MULTICOIL - detect coiled-coil regions
  - regions separating domains
- TRESPASSER - detects Leucine Zippers
  - Leu-X6-Leu-X6-Leu-X6-Leu protein interaction domain
- NPS@nalysis - Helix-Turn-Helix
  - Protein interaction/DNA binding
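
The leucine zipper pattern Leu-X6-Leu-X6-Leu-X6-Leu translates directly into a regular expression. This is a minimal Python sketch of a scan for that pattern only. It is not how TRESPASSER works internally (that is not described here), and a pattern match alone does not prove that a zipper exists.

```python
import re

# Leu, any 6 residues, Leu, any 6, Leu, any 6, Leu.
# The lookahead (?=...) lets overlapping matches be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence):
    """Return (1-based start position, matched segment) for every pattern match."""
    return [(m.start() + 1, m.group(1)) for m in LEUCINE_ZIPPER.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence containing one heptad-spaced run of leucines.
    print(find_leucine_zippers("MK" + "LAAAAAA" * 3 + "L" + "GG"))
```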

Integrated structure prediction

- One stop shop!
- PredictProtein at EBI
  - secondary structure
  - solvent accessibility
  - globular regions
  - transmembrane helices
  - coiled-coil regions
  - a multiple sequence alignment
  - ProSite sequence motifs
  - low-complexity regions
  - ProDom domain assignments

Tertiary Structure Prediction

- Homology modelling
- Fold recognition
- Threading
- Model building

Protein sequence (primary structure)
Database searching for homologues
Homologue of known structure: comparative modelling
No homologue of known structure: fold prediction, ab initio methods etc.
3D-structure

Homology Modelling

- Method of choice following BLAST search
- SWISS-MODEL is a good WWW interface
  URL: http://www.expasy.ch/swissmod/SWISS-MODEL.html
- Requires at least one sequence of known 3D-structure with significant similarity to the target sequence.
- Compare the target sequence with the database - FastA and BLAST.
- Sequences with a FastA score 10.0 standard deviations above the mean of the random scores, or a P(N) lower than 10^-5 (BLAST), are considered for the model building
- Restrict to those which share at least 30% residue identity
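
The selection rules above amount to a simple filter over the search hits. This is a minimal Python sketch under stated assumptions: the `Hit` record, its field names and the example hits are hypothetical, and only the BLAST P(N) and identity thresholds from the list are applied. It is not SWISS-MODEL's own code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit of known 3D-structure (hypothetical record layout)."""
    template_id: str
    blast_p: float            # BLAST P(N) value
    percent_identity: float   # residue identity to the target, in percent


def select_templates(hits, max_p=1e-5, min_identity=30.0):
    """Keep hits with P(N) below max_p and at least min_identity % identity."""
    return [
        hit for hit in hits
        if hit.blast_p < max_p and hit.percent_identity >= min_identity
    ]


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        Hit("templateA", 1e-40, 62.0),   # passes both thresholds
        Hit("templateB", 1e-8, 24.0),    # significant, but below 30% identity
        Hit("templateC", 0.02, 35.0),    # identity is fine, P(N) is too high
    ]
    print([hit.template_id for hit in select_templates(hits)])
```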

Homology Modelling



- Framework construction
  - compare atom positions - Cα's
- Build non-conserved loops
- Complete backbone - add other atoms
- Add side chains
- Refine

Insulin-like gene from C. elegans

Red = Insulin
Blue = ILGF1

What if I have no homologue?

Ab initio methods - Threading

- Sequence of unknown structure
- Thread through a sequence of known structure
- Move query sequence through residue by residue and compare computationally
  - include thermodynamic criteria, solvent accessibility, secondary structure information
- Computing intensive

http://www.cs.bgu.ac.il/~bioinbgu/form.html
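
The "move the query through residue by residue" step can be sketched as follows. This is a deliberately toy Python example: the template is reduced to a string of buried (b) and exposed (e) positions, and the scoring rule (hydrophobic residues prefer buried positions) is a stand-in for the thermodynamic and structural criteria a real threading program would use. The names, the residue grouping and the scores are all assumptions made for illustration.

```python
# Assumption: a crude hydrophobic/polar split of the amino acids.
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue, environment):
    """+1 if a hydrophobic residue is buried or a polar residue is exposed, else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    is_buried = environment == "b"
    return 1 if is_hydrophobic == is_buried else -1


def thread(query, template_environment):
    """Slide the query along the template without gaps.

    Returns (best 0-based offset, best score), or None if the query
    is longer than the template.
    """
    best = None
    for offset in range(len(template_environment) - len(query) + 1):
        score = sum(
            position_score(residue, template_environment[offset + i])
            for i, residue in enumerate(query)
        )
        if best is None or score > best[1]:
            best = (offset, score)
    return best


if __name__ == "__main__":
    # Hypothetical query and template environment string.
    print(thread("LKLK", "eebebeee"))
```

Even this toy version evaluates every offset against every residue. Real methods also allow gaps and use far richer scoring, which is why threading is computing intensive.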