
22 Oct 2013


PREDICTING PROTEIN STRUCTURE AND BEYOND ...

P. V. Balaji

Biotechnology Center

I.I.T., Bombay

Organization of the talk

1. Why predict the structure?

2. Methods for structure prediction

3. What next?

Genome Size is not Proportional to the Complexity of the Organism

[Plot: Size of the Genome vs. Complexity]

Molecular Logic of Life is the Same

Biochemically, all living things (animals, plants, bacteria, viruses, etc.) are remarkably similar

English

26-letter alphabet

Only one grammar

Extremely diverse literature

Genome

4-letter alphabet

Only one grammar

Extremely diverse organisms

Genome Sequencing and Analysis: One of the Key Steps in Deciphering the Logic of Life

Even minute details have to be analyzed

Hang him, not let him go

Hang him not, let him go

Humans: NeuNAc (CH3)

Chimpanzees: NeuNGc (CH2OH)

Innovations in Technology Have Made Genome Sequencing a Routine Affair

Genome sequencing

Completed: ~70 organisms

In the pipeline: Several more

“ … it is unlikely that the base sequence of more
than a few percent of such a complex DNA will
ever be determined …”

C W Schmid & W R Jelinek, Science, June 1982

One Aspect of Genome Sequence Analysis is to Assign Functions to Proteins

(Reverse Genetics)

Proteins are the workhorses of the cell

They are involved in every aspect of living systems

Function of a Protein can be Defined at Different Levels

Example: Lysozyme

Biochemical level: Hydrolyzes C-O bond

Physiological level: Breaks down the cell wall

Cellular level: Defense against infection

Different Analysis Tools Provide Functions at Different Levels

Hallmark of Proteins: Specificity

Proteins know exactly which small molecule (ligand) they should bind to or interact with

They also know which part of a macromolecule they should bind to

Origin of Specificity

1ruv.pdb

Function is critically dependent on structure

Structure: Key to Dissect Function

Interaction interfaces

Crystal packing

Functional oligomerization

Location of mutants, conserved residues, SNPs

Evolutionary relationships

Fold

Relative juxtaposition

Catalytic clusters

Motifs

Catalytic mechanism

Clefts (active sites)

Antigenic sites, surface patches

Surface shape & charge

Dynamics (breathing)

Christian B. Anfinsen: Nobel Prize in Chemistry (1972)

1 KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES
LADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTT
QANKHIIVACEGNPYVPVHFDASV 124

Sequence Determines Structure

1ruv.pdb

Sequence

Structure

Function

How Does Sequence Specify Structure?

Structure has to be determined experimentally

The Protein Folding Problem

(second half of the genetic code)

?

Functional Genomics

Experimental Methods of Structure Determination

X-ray crystallography

Provides a static picture

Obtaining crystals that diffract

Solubilization of the over-expressed protein

Nuclear Magnetic Resonance spectroscopy

Provides a dynamic picture

Size limit is a major factor

Solubilization of the over-expressed protein

Limitations of Experimental Methods: Consequences

Annotated proteins in the databank: ~100,000

Total number including ORFs: ~700,000

Proteins with known structure: ~5,000!

An ORF, or Open Reading Frame, is a region of the genome that codes for a protein
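The ORF definition above can be illustrated with a small scan. This is a minimal sketch, not the method used by any genome project: it assumes an ORF is simply an ATG followed in-frame by a stop codon, it reads only the three forward frames, and the function name `find_orfs`, the `min_codons` cutoff and the example sequence are all hypothetical.

```python
# Toy ORF scan. Assumption: an ORF is ATG ... stop codon in one of the
# three forward reading frames. Real gene finders also scan the reverse
# strand and deal with introns; this sketch does neither.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each forward-strand ORF.

    start/end are 0-based and end-exclusive; the span includes the stop
    codon. min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


example = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
print(find_orfs(example))
```

An ORF found this way is only a candidate: whether it actually encodes a protein, and what that protein does, still has to be established.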

ORFs have been identified by whole-genome sequencing efforts

ORFs with no known function are termed orphans

Dataset for analysis

Structural Biology Consortia:
Brute Force Approach Towards Structure Elucidation

Employ battalions of Ph.D.s & postdoctoral researchers

Aim to solve about 400 structures a year

Large-scale expression & crystallization attempts

Basic strategies remain the same

No (known) new tricks

Enhances the statistical base for inferring sequence-structure relationships

“Unrelenting” ones will be ignored

Predicting Protein Structure:

1. Comparative Modeling

(formerly, homology modeling)

Query sequence, structure unknown (?); in this example α-lactalbumin (1alc):

KQFTKCELSQNLYDIDGYGRIALPELICTMF
HTSGYDTQAIVENDESTEYGLFQISNALWCK
SSQSPQSRNICDITCDKFLDDDITDDIMCAK
KILDIKGIDYWIAHKALCTEKLEQWLCEKE

Sequence with known structure, lysozyme (8lyz):

KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK
FESNFNTQATNRNTDGSTDYGILQINSRWWCND
GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

The two share similar sequence, so they are homologous: use the known structure as template & model

Comparative Modeling

Basis

Structure is much more conserved than sequence during evolution

Higher the similarity, higher is the confidence in the modeled structure

Limited applicability

A large number of proteins and ORFs have no similarity to proteins with known structure
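Since confidence in a comparative model rises with sequence similarity, a first check is to align the query against the candidate template and measure identity. The sketch below is an assumption-laden illustration, not the talk's procedure: it uses a plain Needleman-Wunsch global alignment with made-up match/mismatch/gap scores, where real pipelines would use a substitution matrix and affine gap penalties. The function names are hypothetical; the two sequences are the ones shown in the talk.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings.

    The scoring values are illustrative assumptions only.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical residues as a percentage of columns without a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# The two sequences shown in the talk (1alc query, 8lyz template).
query = ("KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCK"
         "SSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE")
template = ("KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCND"
            "GRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL")
aligned_query, aligned_template = global_align(query, template)
print(f"{percent_identity(aligned_query, aligned_template):.1f}% identity")
```

The identity value depends on the scoring scheme chosen, so it is best read as a rough guide to how much the template can be trusted rather than as a fixed property of the pair.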

Predicting Protein Structure:

Alternative Methods

Threading or Fold Recognition

Ab initio

Both these methods depend heavily on the analysis of known protein structures

In addition, establishing sequence-structure relationships is also important

Input from people trained in statistics, pattern recognition and related areas of computer science is very critical

Statistical Analysis of Protein Structures:
Microenvironment Characterization

Atom based properties: Type, Hydrophobicity, Charge

Residue based properties: Type, Hydrophobicity

Chemical group: Hydroxyl, Amide, Carbonyl, etc.

Secondary structure: α-Helix, β-Strand, Turn, Loop

Other properties: VDW volume, B-factor, Mobility, Solvent accessibility

Describe structures at multiple levels of detail using a comprehensive set of properties
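As a small illustration of residue-based properties, the sketch below annotates each residue with its type, a hydrophobicity value and a formal charge, then smooths hydrophobicity over a sliding window. The talk does not say which scales were used: the Kyte-Doolittle hydropathy scale, the neutral-pH charge assignment (histidine treated as neutral) and the function names are assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Formal side-chain charge near neutral pH; His is treated as neutral here.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def residue_properties(seq):
    """One record per residue: type, hydrophobicity and charge."""
    return [
        {"type": aa, "hydrophobicity": KD[aa], "charge": CHARGE.get(aa, 0)}
        for aa in seq
    ]


def hydropathy_profile(seq, window=5):
    """Mean hydropathy over each window of consecutive residues."""
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First residues of the ribonuclease A sequence shown earlier in the talk.
fragment = "KETAAAKFERQHMDSS"
for record in residue_properties(fragment)[:3]:
    print(record)
print(hydropathy_profile(fragment))
```

Atom-based, chemical-group and structural properties such as B-factor or solvent accessibility need 3D coordinates, so they are outside the scope of this sequence-only sketch.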

Predicting Protein Structure:

2. Threading or Fold Recognition

Basis

It is estimated that there are only around 1,000 to 10,000 stable folds in nature

Irrespective of the amino acid sequence, a protein has to adopt one of these folds

Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds

Select the best sequence-fold alignment using a fitness scoring function

NP-complete problem

Fold of a Protein

Refers to the spatial arrangement of its secondary structural elements (α-helices and β-strands)

1l45.pdb

4bcl.pdb

1mbl.pdb

α/β-barrel

β-barrel

α/β-sandwich

Threading: Basic Strategy

[Diagram: a query sequence (dhgakdflsdfjaslfkjsdlfjsdfjasd) is fitted to each template in a library of folds; labels: Sequence, Template, Spatial Interactions, Scoring & selection]
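The strategy of fitting a query to every fold in a library and keeping the best score can be sketched as follows. This is a deliberately crude toy, not a real threading program: each template is reduced to a string of buried (`B`) and exposed (`E`) positions, the fitness function only rewards hydrophobic residues in buried positions and polar residues in exposed ones, and no gaps are allowed. Real methods score pairwise spatial interactions and allow gaps, which is where the NP-completeness comes from. All names, the residue classification and the two-fold library are hypothetical.

```python
# Assumed (simplified) set of hydrophobic residues.
HYDROPHOBIC = set("AVLIMFWC")


def position_score(aa, env):
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    buried = env == "B"
    return 1 if (aa in HYDROPHOBIC) == buried else -1


def thread(seq, template):
    """Best ungapped placement of seq on template.

    Returns (score, offset), or None if the template is shorter than seq.
    """
    best = None
    for offset in range(len(template) - len(seq) + 1):
        total = sum(
            position_score(aa, template[offset + k])
            for k, aa in enumerate(seq)
        )
        if best is None or total > best[0]:
            best = (total, offset)
    return best


def recognize_fold(seq, library):
    """Score seq against every template; return (fold_name, (score, offset))."""
    results = {name: thread(seq, tmpl) for name, tmpl in library.items()}
    results = {name: r for name, r in results.items() if r is not None}
    best_name = max(results, key=lambda name: results[name][0])
    return best_name, results[best_name]


# Hypothetical two-fold library and query.
library = {"fold_A": "BBEEBBEE", "fold_B": "EEEEBBBB"}
print(recognize_fold("LLKKVVDD", library))
```

Even in this toy the cost grows with library size times the number of placements; adding gaps and pairwise terms is what turns selection into a hard search problem.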

Predicting Protein Structure:

3. Ab Initio Methods

[Flowchart: Sequence, Secondary structure, Prediction, Tertiary structure, Low energy structures, Predicted structure; step labels: Energy Minimization, Validation, Mean field potentials]
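The idea of searching for low-energy structures directly from sequence can be shown with the classic two-dimensional HP lattice model, used here as a stand-in and not as the method of the talk. Assumptions: residues are reduced to hydrophobic (`H`) or polar (`P`), the chain lives on a square lattice, the energy is -1 for every pair of `H` residues that touch without being chain neighbours, and exhaustive enumeration replaces real energy minimization. It is only practical for short, non-empty chains; the example sequence is hypothetical.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(seq, coords):
    """-1 per pair of H residues adjacent on the lattice but not in the chain."""
    index_at = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                total -= 1
    return total


def fold(seq):
    """Enumerate all self-avoiding walks; return (lowest_energy, coords)."""
    best = (1, None)  # any real conformation has energy <= 0

    def extend(coords, occupied):
        nonlocal best
        if len(coords) == len(seq):
            e = energy(seq, coords)
            if e < best[0]:
                best = (e, list(coords))
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best


lowest, conformation = fold("HPHPPHHPH")  # hypothetical HP sequence
print(lowest, conformation)
```

The number of conformations grows exponentially with chain length, which is why practical ab initio methods rely on secondary structure prediction, simplified potentials and sampling rather than enumeration.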

Small molecules and/or metal ions are an integral part of certain proteins

1a6g.pdb

Predicting the structure of such proteins is an entirely different challenge

Proof of the Pudding: CASP Meetings

Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction

CASP4

Predictions; not Post-dictions

Easy and medium targets: ~100% success

Hard targets: ~50% success

Significant increase from CASP3

OK, I can predict the structure correctly! Is that it?

Well, no!!

Strict structure-function correlation exists only for a subset of proteins

Some folds (ferredoxin, TIM barrel, …) are very popular: several protein families, with diverse functions, adopt these folds

Despite high similarity in sequence and structure, proteins may act on different substrates (hence different functions) due to subtle changes in the active site (β1→3-GalT and β1→3-GlcNAcT)

Detailed biochemical characterization is required

Inferring Function from Structure: Caveats

Similar structure, mutually exclusive function: Lysozyme & α-lactalbumin (8lyz.pdb, 1alc.pdb)

Same function, completely different structures: Carbonic anhydrases from M. thermophila (1thj.pdb) and mouse (1dmx.pdb)

“Moonlighting” proteins: one structure(?), multiple functions

Glyceraldehyde 3-phosphate dehydrogenase

Glycolysis

Binding protein for plasmin, fibronectin and lysozyme

Transcriptional control of gene expression, DNA replication and repair

Flocculation

Gal1p: Kinase as well as regulator of Gal-gene expression

Gal3p: 70% similar; does not have kinase activity

Same fold, different oligomerization

[Figure: panels labeled Dimerization and Tetramerization; structures labeled ConA, PNA and GSIV]

Ligand Induced Conformational Changes are Quite Common

Binding of the first substrate redefines the active site and creates the binding pocket for the second substrate and the metal ion

[Figure: flexible loop, shown Before and After binding]

Take Home Message

Predicting protein structure is a key component of genome sequence analysis

Structure is a very important link in deciphering function

Are new tools required? Or is a larger training dataset required?

Acknowledgement

The organizers, for giving me this opportunity

Sujatha and Jayadeva Bhat, for helping me put together this talk

A Few Useful Links

http://guitar.rockefeller.edu/modeller/modeller.html

http://www.biochem.ucl.ac.uk/bsm/cath-new/index.html

http://predictioncenter.llnl.gov/

http://insulin.brunel.ac.uk