Oct 1, 2013

Bioinformatics Project: Visualization of Conserved Regions of Proteins

30 Points


This exercise was created by Wendy Shuttleworth at Lewis and Clark State College, and Celeste Brown at the University of Idaho. It was adapted for the NGBW by Celeste Brown and Mark Miller.


Intended Audience: Upper-division Biology/Biochemistry Courses.


Flash walkthrough of the laboratory: www.ngbw.org/labs/myoglobin/myoglobin.htm





Goal: The goal of this project is to introduce biochemistry students to some easily accessible protein database tools and allow the students to explore these databases so they get a glimpse of what is available. In this exercise, the protein myoglobin is explored, since it and hemoglobin were discussed in detail in the previous class lectures. Other proteins with structures in PDB could be used as well. Multiple myoglobin sequences are lined up against human myoglobin to determine which residues are conserved across all species. These conserved residues are then located on the human myoglobin structure. This exercise uses the Next Generation Biology Workbench (http://www.ngbw.org) for multiple sequence alignment (MSA) tools and for viewing structures. It therefore requires only internet access and a relatively current Java installation (Java 1.4 or greater; this is standard equipment). The exercise was adapted for the NGBW by Dr. Mark Miller, UCSD.


Overview: One of the most powerful bioinformatics tools for the study of proteins is the ability to search databases for similar sequences and to make multiple alignments of those sequences. Increased computing power in the last few years has made these techniques readily available to anyone with a PC and internet access. Study of these alignments identifies conserved regions; these can reveal much about the structure-function relationships in a given protein. Moreover, the alignments can also be used to explore the relationship between the organisms (to be addressed in another bioinformatics exercise).


In this exercise, the protein myoglobin will be explored, as this, along with hemoglobin, has been discussed in detail in the CHEM 481 lectures. Multiple myoglobin sequences will be lined up against human myoglobin to determine which residues are conserved across all species. These conserved residues will then be located on the human myoglobin structure.











Procedure:


Part 1 Alignment


1. It is assumed here that you can access the NGBW site, create an account, and log in. Log onto the San Diego Supercomputer Center Next Generation Biology Workbench by typing in the URL: http://www.ngbw.org



2. Set up a free account by clicking the register button. Simply follow the instructions, and be sure to remember your user name and password. If you have any trouble, you can access Flash presentations showing how to do this under the help section (http://www.ngbw.org/help.php).


3. The NGBW is based on the use of folders for data and tasks. Once you log in, the folders will appear on the left-hand side of the screen. Click on Create a New Folder for this project, and then give your folder a name, and a description if you like. The NGBW also has Flash help files to assist you in undertaking the analyses described below.


4. In this exercise, we will explore conserved residues in myoglobin, so first we will retrieve some myoglobin sequences from one of the public databases stored in the NGBW. To do this, click on the Data icon attached to the folder you just created, and when the Data Management pane appears, click the “Search for Data” button. When the Data Search pane appears, type the exact string human myoglobin into the query window. Use the two drop-down menus to specify that you wish to search for a Protein (the NGBW calls this the “Entity Type”) Sequence (and this is the “Data Type”). You will be presented with a third drop-down menu with a list of “Data Sets” that contain Protein Sequences. Select Swissprot and click “Submit Search”. A list of results will be displayed. Select:

P02144 MYG_HUMAN Myoglobin Homo sapiens 82 SWISSPROT

by checking the box to the left of the sequence, and then click the “Save Results” button. You will receive a green success message at the top of the page when the sequence is transferred to your data area.


5. Now you want to search for related myoglobin sequences. In this step you will do this by comparing the sequence of Human Myoglobin to all Protein sequences (currently 5,324,740) in the Swissprot database, and selecting the ones that are similar, in order of relative similarity. Swissprot is a highly curated sequence database that doesn’t have many of the “junk” sequences that are found in less heavily vetted databases. Comparisons between sequences are accomplished using algorithms that measure similarity. Today you will use BLAST (Basic Local Alignment Search Tool), but there are other tools to do this as well, such as FASTA. Each has its own algorithm for comparing sequences and measuring similarity. BLAST happens to be one of the fastest ones.


6. To run a BLAST search to compare a protein sequence to a set of protein sequences, one uses the BlastP tool. To do this in the NGBW, click on the Tasks folder for this project. When the Task Management pane opens, click the “Create a New Task” button. When the Task Creation pane opens, enter some descriptive text in the “Description” box, and click the “Set Description” button.


7. Now click on the “Select Input Data” button. Find the Myoglobin data file, and select it by checking the box on the left of the sequence, and then clicking the “Select Data” button at the bottom of the page. This will return you again to the Task Creation pane.


8. Now click the “Select Tool” button. Under the Toolkit pane, find and click on the “BlastP” tool. It is under the “Protein Tools” tab. This will return you to the Task Creation pane. The most important part of creating a BLAST job is to specify the database you will be searching. To do this, click on the “Set Parameters” button, and when the Parameters pane opens, find the “protein db” dropdown, and satisfy yourself that it is set to search “SWISSPROT”.


9. While you are on the Parameters pane, set the “Expect value” to “0.1” (this reduces the number of poor matches). It is just below the protein database dropdown. Now expand the Advanced Parameters section by clicking the link. Check under Scoring Options to see that the default setting for the Matrix (-M) is “Blosum62”. The other settings are left at the default values. Click “Save Parameters” (at the bottom of the page).


10. Once the parameters are saved, click on the “Save and Run Task” button at the bottom of the page. This will deploy your job and return you to the Task Management pane. On this pane you can watch your job progress (it will take a few minutes to complete this job). While you are waiting, you can begin creating the next job, or just click the “Refresh Tasks” button until you see the text in the right-most column change from “View Status” to “View Output”.


11. To view your results, click the “View Output” button, and this will expose all the results produced by your search. Click on the link to “blast2.txt”, and this will expose the list of sequences with strong similarity to the Human Myoglobin sequence. Glance briefly at the header of this file, which tells you what analysis you ran. Find how many sequences you compared yours to, and confirm that the sequence you searched with is the one you intended. These checks are part of good practice. Now scroll down slightly and you will see a list of the top matches, with the measure of similarity (E-value) in the right-hand column. Satisfy yourself that there is a steady decrease in similarity (that is, the E-values get larger) as you go down the list.


12. To select individual sequences for additional exploration, click on the “View” link at the top of the page. It is inside a black and white box. The program will display a list of sequences identified by the BLAST search; the first sequence is the protein whose sequence was used as the query.


13. Select a number of the sequences from the search. You can do this as follows: change the number of sequences displayed from 20 (default) to 200, using the drop-down at the top of the results pane. Choose sequences at random from the list, or go down and pick your favorite critters. Stick to myoglobin (at the lower part of the list you will see hemoglobins, etc.), and bear in mind that you want to align a good cross section of myoglobin molecules, so choose some of those with the least similarity to human myoglobin. Avoid selecting entries that say “partial sequences” in their description. Now click on the “Save Results” button to transfer the data to your personal data area.

NOTE: If you select the Human Myoglobin sequence at the top of this list, it will be listed in your data area twice. You must be careful not to select it twice in the next step, or the CLUSTALW program will fail.


14. Now return to the Task Management pane by clicking on the “Tasks” folder. Create a task, just as above, except when you choose your data, select all of the Myoglobin sequences you saved. Be careful not to select Human Myoglobin twice: CLUSTALW will not accept sequences with the same name. Then click the “Select Data” button, and select “CLUSTALW_P” from the “Protein Sequence” tool list. Once the task is constructed, click “Save and Run Task”. CLUSTALW is a multiple sequence alignment tool: each sequence is first aligned with every other sequence to give a best fit, and then the most similar sequences are aligned together first, with the others added until all sequences are aligned to each other.


15. It may take a short while for the alignment to run, especially if the server is busy. When the job is complete, go to the Results pane, and eyeball your alignment by clicking on the “outfile.aln” link. You can see changes and large areas that are identical, and sometimes blank spaces (gaps) are inserted in some sequences to optimize the alignment. Note: if any of the sequences you chose are fragments, you will need to remove these from the alignment, as the program will not show consensus in the regions where there are no amino acids; I ran into trouble with a giant panda sequence that was not complete, so it messed up the alignment in the missing areas.


16. You can color code the conserved residues to make them easier to view, using the tool Boxshade. Start by saving the alignment to your data area by clicking the “Save to Current Folder” button. Give the data item a name, and tell the application it is a “Protein” “Sequence Alignment” in “Clustal” format.


17. Create the Boxshade task just as you created the others. Click the “Return to Task List” button, then “Create New Task”, and follow the procedure for task creation, selecting your alignment as Input Data, and Boxshade, which is under “Phylogeny/Alignment Tools”, as the Tool. Open the Parameter pane and you can set the coloring yourself. Choose HTML output. Be sure to check the box for “Special label for identical residues in all sequences”, and it is a good idea to check the boxes for a consensus line and a ruler. For colors, typically blue is reserved for completely conserved residues, green is for identical residues, yellow is for residues that are similar (i.e. the chemical properties are similar), and black on white is for residues that show no consensus. However, play with these until you feel you can glance at the data and understand basically what patterns are found there. There are asterisks at the bottom of the completely conserved residues.


18. Once the job is done, use “View Results” to see the output. You should now save the file as “Protein” “Sequence Alignment” “HTML” (or “Unknown”), since the NGBW cannot use this data for anything except display. Now you are ready to move on to the second part of this exercise, where you will examine the human myoglobin structure and look to see where the completely conserved residues are found on this structure. You can start by using the ruler on the Boxshade output to construct a list of all the residues that are conserved across all myoglobins you chose.
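
The list that step 18 asks for can also be cross-checked outside the NGBW. The following is a minimal sketch, assuming the alignment has already been read into a dictionary of equal-length gapped strings; the function name, the `-` gap character, and the toy sequences are illustrative assumptions, not NGBW output. Residue numbers count only the non-gap residues of the reference (human) sequence, so they follow the same idea as the Boxshade ruler.

```python
def conserved_positions(aligned, reference):
    """List (residue_number, residue) for columns identical in every sequence.

    `aligned` maps sequence names to gapped strings of equal length.
    Numbering counts only the non-gap residues of `reference`, starting at 1.
    """
    ref_seq = aligned[reference]
    if any(len(seq) != len(ref_seq) for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    ref_number = 0
    for col, ref_res in enumerate(ref_seq):
        if ref_res == "-":
            continue  # a gap in the reference has no residue number
        ref_number += 1
        if all(seq[col] == ref_res for seq in aligned.values()):
            conserved.append((ref_number, ref_res))
    return conserved


if __name__ == "__main__":
    # Made-up toy alignment, not real myoglobin data.
    toy = {
        "human": "MG-LSDH",
        "toyA": "MGALSEH",
        "toyB": "MG-LTDH",
    }
    for number, residue in conserved_positions(toy, "human"):
        print(f"{residue}{number}")
```

Whether residue 1 of the sequence record matches residue 1 in a structure file should be checked by eye in the sequence viewer before relying on the numbers.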

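Steps 13 to 15 warn about two problems: CLUSTALW rejects sequences that share a name, and fragments spoil the consensus. If the saved sequences were exported as FASTA text (an assumption; the NGBW workflow itself is point-and-click), a small filter along these lines could catch both before alignment. The function names and the keyword check on the description line are hypothetical and only illustrate the idea.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_for_alignment(records):
    """Drop repeated names and entries described as fragments or partial."""
    seen = set()
    kept = []
    for header, seq in records:
        words = header.split()
        name = words[0] if words else ""
        lowered = header.lower()
        if "fragment" in lowered or "partial" in lowered:
            continue  # incomplete sequences leave holes in the consensus
        if name in seen:
            continue  # keep only the first copy of each name
        seen.add(name)
        kept.append((header, seq))
    return kept


if __name__ == "__main__":
    # Made-up toy records, not real database entries.
    toy = """>MYG_HUMAN Myoglobin
MGLSDGEWQL
>MYG_TOY1 Myoglobin (Fragment)
GLSD
>MYG_HUMAN Myoglobin
MGLSDGEWQL
"""
    for header, seq in clean_for_alignment(parse_fasta(toy)):
        print(header, len(seq))
```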

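Step 9 sets the Expect value to 0.1 to reduce the number of poor matches. As a rough illustration of why this works, the sketch below uses the standard relation between a normalized (bit) score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n is the total length of the database in residues. The query and database lengths in the example are illustrative assumptions, not values reported by the NGBW.

```python
def expected_chance_hits(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least `bit_score`.

    Implements E = m * n * 2**(-S'), with m the query length, n the total
    database length in residues, and S' the bit score.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_len, db_len, expect=0.1):
    """True if a hit with this bit score would be kept at the given cutoff."""
    return expected_chance_hits(bit_score, query_len, db_len) <= expect


if __name__ == "__main__":
    QUERY_LEN = 154          # illustrative query length in residues
    DB_LEN = 200_000_000     # hypothetical database size in residues
    for bits in (20, 30, 40, 60):
        e = expected_chance_hits(bits, QUERY_LEN, DB_LEN)
        kept = passes_threshold(bits, QUERY_LEN, DB_LEN)
        print(f"bit score {bits:>2}: E = {e:.3g}, kept = {kept}")
```

Each additional bit of score halves E, so a stricter cutoff removes the weakest matches first.
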
Part 2 Protein Structure Visualization


1. The Protein Data Bank (PDB) is an international resource for protein structure models that have been determined experimentally using X-ray crystallography and/or NMR. Many people get protein structures by going to the PDB site at www.rcsb.org. You might want to take a couple of minutes to explore the site; there is usually a “molecule of the month” on the front page.


2. In the NGBW, we access PDB data directly (via web services) with our structure viewing tool called Sirius. As a result, you need not leave the NGBW to analyze a protein structure. For this exercise, we want to look at the conserved regions of a myoglobin molecule in the context of the three-dimensional structure of the molecule. To do this, go to the Structure Tools pane of the Toolkit, and click on the Sirius link. This will open the Sirius manual page. Clicking the Sirius link on the manual page will start Sirius. When the Sirius tool loads, click “File”, then “Load from PDB”. Enter the PDB ID of the molecule of interest (2MM1 for human myoglobin) into the search box at the top of the page. If you feel like playing around, go back to the PDB site and search on myoglobin. Collect some IDs of other animal myoglobins you are interested in, and use those as well. Obviously, since we are looking at invariant residues, it doesn’t really matter which one you pick, but it is convenient to use human, since the numbering of the residues will be the same as that in the Boxshade figure.


3. Once you enter the ID number, the molecule will appear with all of its non-hydrogen atoms shown. There are many options for how to display the molecule, and some of these are illustrated in the Flash file at http://www.ngbw.org/help/sirius_bsics.htm. Look for the heme group: this is a non-protein component of myoglobin that gives it the red color you know so well. It is composed of a large planar aromatic ring system with an iron atom in the center, held in place by 4 pyrrole nitrogen atoms. Under physiological conditions, the iron is Fe(II), and in this state it carries molecular oxygen reversibly. In this oxygenated state, it is bright red, but once outside the body, the iron goes to Fe(III), which turns the pigment brown.


4. Once the molecule is displayed, you can also display the protein sequence in a second window. To do this, click “Tools” in the top menu bar, then “Sequence Viewer”. This opens a second window containing the sequence. Use this sequence viewer to locate the conserved residues. To highlight a residue of interest, just click on its corresponding letter in the sequence viewer, and the entire residue will turn yellow in the structure viewer.


5. Start by paying attention to Histidine residues. For some of these, you can tell immediately why they are important, and with others it will be harder to tell. The heme for oxygen-binding proteins is nearly always bound to the protein by a single His residue. This residue is known as the “proximal” His. The oxygen molecule binds to the iron on the opposite face of the heme from the proximal ligand. Typically, but not always, a His residue is positioned near the open iron coordination site, and it interacts with the bound oxygen molecule to control its reactivity. This His is referred to as the “distal” His.


6. Play with your display of conserved residues to create a figure displaying as many conserved residues as possible. The figure might get too busy if you add them all, but put in a number to see where they lie. Try to produce a nice final image with a number of them shown; hopefully you can find the proximal and distal histidines and place these on your molecule. Manipulate the image to best show off the conserved residues. If you prefer, prepare a couple of final images, one with e.g. the histidines and a second one with other conserved residues shown.
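
Step 5 describes the proximal and distal histidines by their position relative to the heme iron. For readers who want to confirm visually chosen candidates numerically, here is a minimal sketch that measures the distance from the heme iron to the NE2 ring nitrogen of each His in PDB-format text. It assumes the file uses the usual fixed-column ATOM/HETATM layout, the residue name `HEM` for the heme, and the atom name `FE` for the iron; the 5.0 Å cutoff and the function names are illustrative choices, not part of Sirius or the NGBW.

```python
import math


def parse_atoms(pdb_text):
    """Yield one dict per ATOM/HETATM record, using fixed PDB columns."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            yield {
                "atom": line[12:16].strip(),      # atom name, columns 13-16
                "res_name": line[17:20].strip(),  # residue name, columns 18-20
                "res_seq": int(line[22:26]),      # residue number, columns 23-26
                "xyz": (
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                ),
            }


def histidines_near_iron(pdb_text, cutoff=5.0):
    """Return (residue_number, distance) for His NE2 atoms near a heme iron."""
    atoms = list(parse_atoms(pdb_text))
    irons = [a for a in atoms if a["res_name"] == "HEM" and a["atom"] == "FE"]
    hits = []
    for fe in irons:
        for a in atoms:
            if a["res_name"] == "HIS" and a["atom"] == "NE2":
                distance = math.dist(fe["xyz"], a["xyz"])
                if distance <= cutoff:
                    hits.append((a["res_seq"], round(distance, 2)))
    return sorted(hits, key=lambda hit: hit[1])  # closest first
```

Given the text of a structure file saved locally, `histidines_near_iron(open("structure.pdb").read())` would list candidate residues, closest first (the file name here is a placeholder). The histidine bonded directly to the iron should appear at the shortest distance, with any histidine on the opposite face of the heme somewhat farther away.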


Report


Please turn in a report containing the following items in an easy to read format:


1. Your sequence alignment as a TeXshade or Boxshade figure.


2. A list of the conserved residues from your alignment.


3. One or more images showing the positions of the conserved residues on the human
myoglobin molecule.


4. Are the conserved residues clustered in any specific areas of the protein?