Medicago Genomics and Bioinformatics

Protein Structure Analysis - I

Liangjiang (LJ) Wang

ljwang@ksu.edu



April 8, 2005

PLPTH 890 Introduction to Genomic Bioinformatics



Lecture 20

Outline


Basic concepts.



How are protein structures determined?


X-ray crystallography.


NMR spectroscopy.



Protein structure databases (PDB, MMDB).



Protein structure visualization (RasMol,
Cn3D, etc).



Protein structure classification (SCOP and
CATH).

Structural Bioinformatics


A subdiscipline of bioinformatics that
focuses on the representation, storage,
visualization, prediction and evaluation of
structural information.



References:


Baxevanis and Ouellette. 2005. Bioinformatics - A
practical guide to the analysis of genes and proteins.
3rd edition. Chapter 9 and part of chapter 8.



Pevsner. 2003. Bioinformatics and functional
genomics. Chapter 9.



Bourne and Weissig. 2003. Structural bioinformatics.

Protein Primary Structures


Amino acid sequence of a
polypeptide chain.




20 amino acids, each with a
different side chain (R).



Peptide units are building
blocks of protein structures.



The angle of rotation around
the N-Cα bond is called phi
(φ), and the angle around the
Cα-C bond from the same
Cα atom is called psi (ψ).

(Branden and Tooze, 1998)
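The phi and psi angles are dihedral (torsion) angles, so each can be computed from the coordinates of four consecutive backbone atoms. The following is a minimal sketch, not taken from the lecture. It assumes the usual convention that phi uses C(i-1), N(i), Cα(i), C(i) and psi uses N(i), Cα(i), C(i), N(i+1). The function name and the sample coordinates are made up for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass
    N(i), CA(i), C(i), N(i+1). This atom order is an assumed convention.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)  # the bond being rotated about
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(b1_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the two outer bonds are perpendicular when viewed
# down the central bond.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to every residue of a structure gives the (phi, psi) pairs that a Ramachandran plot displays.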



Protein Secondary Structures


Local substructures as a result of hydrogen
bond formation between neighboring amino
acids (backbone interactions).



The amino acid side chains affect secondary
structure formation.



Types of secondary structures:


α helix,


β sheet,


Loop or random coil.



α Helix


Most abundant secondary structure.



3.6 amino acids per turn, with a hydrogen bond
formed between residue i and residue i+4.



Often found on the surface of proteins.
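The helix numbers above can be turned into a short worked calculation. This is only an illustrative sketch: the 3.6 residues per turn comes from the slide, while the function names and the 1-based residue numbering are assumptions made for the example.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def degrees_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Rotation about the helix axis contributed by each residue."""
    return 360.0 / residues_per_turn


def helix_hbond_pairs(n_residues):
    """Backbone hydrogen-bond partners (i, i+4) in an ideal helix.

    Residues are numbered from 1 (an assumption for this sketch).
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(degrees_per_residue())
print(helix_hbond_pairs(8))
```

With 3.6 residues per turn, each residue advances the helix by about 100 degrees, and an eight-residue ideal helix has four i to i+4 pairs.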



β Sheet


Hydrogen bonds formed between adjacent
polypeptide chain segments (β strands).



The strand directions can be the same (parallel
sheet), opposite (antiparallel), or mixed.

Loop or Coil


Regions between α helices and β sheets.



Various lengths and 3-D configurations.



Often functionally significant (e.g., part of
an active site).


(Branden and Tooze, 1998)

The active site of open α/β-barrel structures is in a crevice
outside the carboxy ends of the β strands.


Protein Tertiary Structure


The 3-D structure of a protein is assembled from
different secondary structure components.



Tertiary structure is determined primarily by
hydrophobic interactions between side chains.



Different classes of protein structures:

Hemoglobin (3HHB): all α

T cell CD8 (1CD8): all β

Thermolysin (7TLN): mixed

Protein Tertiary Structure (Cont’d)


Fold: a certain type of 3-D arrangement of
secondary structures.



Protein structures evolve more slowly
than primary amino acid sequences.

Four-helix bundles: E. coli cytochrome b562 (256B)
and human growth hormone (1HUW).

Three-helix bundle: Drosophila engrailed
homeodomain (1ENH).

Protein Quaternary Structure


Two or more independent tertiary structures
are assembled into a larger protein complex.



Important for understanding protein-protein
interactions.

E. coli ribosome (1ML5)

Horse spleen ferritin (1IES)

Biological Knowledge from Structures

(Bourne, 2004)

X-Ray Crystallography


Basic steps: gene targets → expression and purification →
proteins → crystallization → X-ray diffraction →
structure solution.


Advantages:


High-resolution structures.


Large protein complexes or membrane proteins.



Disadvantages:


Molecules in a solid-state (crystal) environment.


Requirement for crystals.

Nuclear Magnetic Resonance (NMR)


NMR reveals the neighborhood information of
atoms in a molecule, and the information can
be used to construct a 3-D model of the
molecule.



Advantages:


No requirement for crystals.


Proteins in a liquid state (near physiological state).



Disadvantages:


Limited by molecule size (up to 30 kD).


Membrane proteins may not be studied.


Inherently less precise than X-ray crystallography.


Protein Data Bank (PDB)


The primary repository for protein structures.




Established in 1971 (the first bioinformatics
database, set up with 7 protein structures).



Contained 30,179 structures as of March 22, 2005.



Supports services for structure submission,
search, retrieval, and visualization.



Search options:


SearchLite: PDB ID and key word search.


SearchFields: advanced search.

(PDB can be accessed at http://www.rcsb.org/pdb/)
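Retrieval by PDB ID can also be scripted. The sketch below only checks that an identifier looks like the four-character IDs used in this lecture (3HHB, 1CD8, 2TRX, and so on) and builds a download address. The `files.rcsb.org` URL pattern is an assumption about the current RCSB file service and is not part of the original slides. The function name is hypothetical.

```python
import re

# Assumption: RCSB's present-day download pattern, not taken from the lecture.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"

# Four characters, starting with a digit, as in 3HHB or 2TRX.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_id):
    """Return a download URL for a four-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())


print(pdb_download_url("2trx"))
```

The resulting address could then be passed to any HTTP client, for example `urllib.request.urlopen`, to fetch the structure file.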

PDB Content Growth

Figure: number of PDB structures per year, 1972 to 2005
(y-axis up to 30,000). Last updated: 06-Mar-2005.

Access to Structures through NCBI


MMDB (Molecular Modeling Database):


Structures obtained from PDB.


Data in NCBI’s ASN.1 format.


Integrated into NCBI’s Entrez system.



Cn3D (“see in 3D”): NCBI’s 3-D protein
structure viewer.



VAST (Vector Alignment Search Tool): for
direct comparison of 3-D protein structures.

(NCBI at http://www.ncbi.nlm.nih.gov/)

Ramachandran Plot



Used to assess the quality of structures.

Good structures show tight clustering patterns.

Figure: Ramachandran plot of thioredoxin (2TRX), with PHI and PSI
axes and the β sheet and α helix regions labeled
(Baxevanis and Ouellette, 2005).
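The quality check described above can be approximated by counting how many residues fall in the helix and sheet regions of the plot. The sketch below is illustrative only: the rectangular region boundaries are rough assumptions chosen for the example, not the boundaries used by any validation program, and the function names are hypothetical.

```python
def ramachandran_region(phi, psi):
    """Rough region label for one (phi, psi) pair in degrees.

    The rectangles are illustrative assumptions, not a standard.
    """
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"


def fraction_in_named_regions(angles):
    """Fraction of (phi, psi) pairs labeled alpha or beta."""
    angles = list(angles)
    if not angles:
        return 0.0
    hits = sum(1 for phi, psi in angles if ramachandran_region(phi, psi) != "other")
    return hits / len(angles)


# Made-up angle pairs: typical helix, typical sheet, and an outlier.
sample = [(-57, -47), (-139, 135), (60, 45), (-60, -40)]
print([ramachandran_region(phi, psi) for phi, psi in sample])
print(fraction_in_named_regions(sample))
```

A structure whose residues cluster tightly would score close to 1.0 under a check of this kind; many outliers would suggest problems with the model.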

3-D Visualization Tool - RasMol


An open source software package, and the
most popular tool for viewing 3-D structures.



RasMol represented a major breakthrough in
software-driven 3-D structure visualization.



Structure file formats supported by RasMol:


PDB file format: outdated but human-readable.


mmCIF: a new and robust data representation,
but supported by few software tools.



RasTop: provides a user-friendly graphical
interface to RasMol. RasTop is available at
http://www.geneinfinity.org/rastop/.
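The PDB file format is human-readable because each ATOM record is a fixed-width text line. The sketch below reads the main fields by column position, following the published PDB column layout. The function names are hypothetical, and the sample record is built in code with made-up values rather than taken from a real entry.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record by its fixed column positions."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def parse_atoms(pdb_text):
    """Return the parsed ATOM/HETATM records from PDB-format text."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Build a sample record with explicit field widths (values are made up):
# record name, serial, atom name, altLoc, residue name, chain, residue
# number, insertion code, then x, y, z.
SAMPLE = "{:<6}{:>5} {:<4}{:1}{:<3} {:1}{:>4}{:1}   {:>8.3f}{:>8.3f}{:>8.3f}".format(
    "ATOM", 1, " N", "", "SER", "A", 1, "", 21.389, 25.406, -4.628
)

print(SAMPLE)
print(parse_atoms(SAMPLE + "\nEND\n"))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together without a separating space.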

Cn3D: NCBI’s Structure Viewer


Cn3D (“see in 3D”): allows interactive
exploration of 3-D structures, sequences
and alignments.



Can be used to produce high-quality
molecular images.



Limitation: only accepts structure files in
NCBI’s ASN.1 format (from MMDB).



Cn3D is available at
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml.

Other 3-D Visualization Tools


Chime: a Netscape plug-in for 3-D structure
visualization; based on RasMol source code.



Protein Explorer (http://www.proteinexplorer.org/):


A Chime-based software package.


Particularly user friendly and feature-rich.



Swiss-Pdb Viewer (Deep View, available at
http://us.expasy.org/spdbv/):


Probably the most powerful, freely available
molecular modeling and visualization package.


Supports homology modeling, site-directed
mutagenesis, structure superposition, etc.

Protein Structure Comparison


Why is structure comparison important?


To understand structure-function relationships.


To study the evolution of many key proteins
(structure is more conserved than sequence).



Comparing 3-D structures is much more
difficult than sequence comparison.



Protein structure classification:


SCOP: Structural Classification of Proteins.


CATH: Class, Architecture, Topology and
Homology.



Protein structure alignment: DALI and VAST.

SCOP


SCOP is based on expert definition of protein
structural similarities, and is manually curated.



Classification hierarchy:




Class


Fold


Superfamily


Family



SCOP has 7 major classes:
all α, all β, α/β, α+β,
multi-domain proteins (α and β), membrane and
cell surface proteins, and small proteins.



Domain is the base unit of the SCOP hierarchy,
and proteins with multiple domains may
appear at different places in the hierarchy.



SCOP at http://scop.mrc-lmb.cam.ac.uk/scop/.
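Because the domain is the base unit of SCOP, a multi-domain protein can map to several places in the Class, Fold, Superfamily, Family hierarchy. The sketch below shows one way to represent that. It is not SCOP's own data model. The protein, domain, fold, superfamily, and family names are hypothetical placeholders, not real SCOP entries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopDomain:
    """One classified domain (all field values below are placeholders)."""
    protein: str
    domain: str
    scop_class: str
    fold: str
    superfamily: str
    family: str

    def path(self):
        """Position in the hierarchy: class, fold, superfamily, family."""
        return (self.scop_class, self.fold, self.superfamily, self.family)


def placements_by_protein(domains):
    """Map each protein to the set of hierarchy positions of its domains."""
    placements = defaultdict(set)
    for d in domains:
        placements[d.protein].add(d.path())
    return dict(placements)


# Hypothetical two-domain protein and a hypothetical single-domain protein.
domains = [
    ScopDomain("protX", "d1", "all alpha", "fold-A", "superfamily-A1", "family-A1a"),
    ScopDomain("protX", "d2", "all beta", "fold-B", "superfamily-B1", "family-B1a"),
    ScopDomain("protY", "d1", "all alpha", "fold-A", "superfamily-A1", "family-A1b"),
]

for protein, paths in placements_by_protein(domains).items():
    print(protein, len(paths))
```

Here the placeholder protein `protX` appears under two different classes because its two domains are classified separately.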

An Example of the SCOP Hierarchy

(Bourne, 2004)


SCOP fold definition:


Same major secondary structures.


Same arrangement.


Same topology.

CATH


Classification hierarchy:



Class (C)


Architecture (A)


Topology (T)





Homologous superfamily (H)



Based on secondary structure content (for C),
literature (for A), structure connectivity and
general shape (for T, using the SSAP
algorithm), and sequence similarity (for H).



Multi-domain proteins are partitioned into their
constituent domains before classification.



CATH at http://www.biochem.ucl.ac.uk/bsm/cath/.

An Example of the CATH Hierarchy

(Pevsner, 2003)


CATH classes:


mainly α.


mainly β.


mixed α and β.


Few secondary structures.
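CATH entries are commonly written as four dot-separated numbers, one per level (C.A.T.H). The sketch below splits such a code into its levels and names the class. The dotted notation and the numbering of classes 1 to 4 in the order listed above are assumptions not stated on the slides. The function name and the code string in the example are for illustration only.

```python
# Assumption: classes are numbered 1-4 in the order listed on the slide.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "few secondary structures",
}


def parse_cath_code(code):
    """Split a dotted C.A.T.H code into its four hierarchy levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    return {
        "class": str(c),
        "class_name": CATH_CLASSES.get(c, "unknown"),
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": f"{c}.{a}.{t}.{h}",
    }


# An example string in the dotted format.
print(parse_cath_code("3.40.50.300"))
```

Each level's identifier includes its parents, so two domains share a topology exactly when the first three numbers match.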

Summary


Protein structures are important for
addressing many biological questions.



Protein Data Bank (PDB) is the primary
repository for protein structures.



Powerful software tools (e.g., RasMol) are
available for viewing 3-D protein structures.



SCOP and CATH are two curated databases for
structure classification (SCOP is manually curated;
CATH also uses algorithms such as SSAP).



Next: structure alignment and prediction.