Second bioinformatics lab: Exercise on disease


Oct 1, 2013 (4 years and 10 days ago)



(developed in part by Sarah C. R. Elgin, Washington University)

It is well known that smoking leads to an increased risk for lung cancer, but how does genetics play into the risk? The transformation of a normal cell into a cancerous cell can result from many causes. In one model, factors that lead to an increased rate of mutation in DNA increase the chances that a proto-oncogene (the normal form of a gene) will be mutated into an oncogene (a cancer-causing gene), causing a normal cell to be transformed into a cancerous cell.

In this module, you will examine one proto-oncogene, K-Ras, that has been associated with many cancers, including lung cancer. You will be examining a cDNA sequence for K-Ras that contains a mutation. You will analyze the mutant K-Ras protein using the bioinformatics tools presented in lab. You will investigate the mutation and find out what is known, if anything, about its biological impact.

See our textbook for a discussion of Ras: pages 407, 412, and 596-597 (6th ed.).

Summarizing, insulin or a growth factor binds to a receptor; two bound receptors come together (dimerization) and activate each other. Each growth factor receptor is a tyrosine kinase that puts a phosphate on (i.e., phosphorylates) tyrosines (a particular amino acid) located on the other receptor’s tail. Thus, the receptors put phosphates on each other to activate each other. Next, adaptor proteins (GRB2, which has an SH2 domain that binds to phosphorylated tyrosines, and SOS, which is a guanine nucleotide exchange factor or GEF) come in and are activated by binding to phosphorylated tyrosines located on the receptor. The adaptor proteins then activate Ras. Ras is off when GDP is bound, but on when a new GTP comes in and displaces the old GDP. Ras activates Raf, which activates MEK, which activates MAP kinase (one MAP kinase type is ERK, the kinase that we studied earlier). ERK turns on transcription factors that turn on certain genes required for cell division. Fig. 19-41 is shown (see also Fig. 14-18).

In a cancerous cell, the mutant Ras cannot be shut off and the cell will divide and divide to form a tumor.



Working with Primary Protein Structure Information

Using the tools from the last lab, we will search for your k-Ras gene in the Gene database. Gene will contain the RefSeq sequence for your protein, which you will download in FASTA format. FASTA format is defined in your Glossary. Be sure to review the FASTA definition before you begin your work. QUESTIONS ARE POSTED AT THE END. TAKE GOOD NOTES AND TRY TO ANSWER SOME OF THEM AS YOU GO ALONG (or else you will have to redo some of this work). Ask me about any answers to these questions (if you do not know the answer and have made a good attempt). Some of these questions may be on the final exam.

You will continue to learn about your protein using the SwissProt database. The SwissProt database is maintained by the Swiss Institute of Bioinformatics and contains entries for thousands of proteins. You can search for the k-Ras protein that you are studying by using the gene name given in Gene. The SwissProt entry contains some of the same information that you found in Gene, but also contains a lot of information about the protein sequence, structure, and function that is summarized in a short, easy-to-read format.


The ultimate goal for today’s lab is to create a multiple sequence alignment for your protein using ClustalW. You will use this alignment to identify the protein mutation, to observe regions of high sequence conservation, and to make an evolutionary tree for Ras. The protein mutation is important to identify because it helps explain the link between the protein structure and cancer. The regions of high sequence conservation are important because they often correspond to regions in the protein that are important to the protein’s function (e.g., active site, regulatory region).

Part 1: Obtaining the basics: Getting sequence information and viewing the SwissProt and GenBank entries for your protein

Directions: Follow this guide sheet and answer the questions at the end of this file.

Translating your patient’s cDNA

1. Below is the mutant cDNA sequence for a patient with a cancer caused by a mutant k-Ras protein. Highlight the entire sequence and then hit Control-C to copy it to the clipboard. BTW, we know that this is DNA since there are no U bases (only RNA has U; DNA uses T instead). Also, we always report the coding strand sequence (not the template strand that actually helps make the mRNA). Furthermore, this cDNA came from RT-PCR, since the sequence ends in polyA (polyA tails are found in mRNA).

GGCCGCGGCGGCGGAGGCAGCAGCGGCGGCGGCAGTGGCGGCGGCGAAGGTGGCG

GCGGCTCGGCCAGTACTCCCGGCCCCCGCCATTTCGGACTGGGAGCGAGCGCGGC

GCAGGCACTGAAGGCGGCGGCGGG
GCCAGAGGCTCAGCGGCTCCCAGGTGCGGGA

GAGAGGCCTGCTGAAAATGACTGAATATAAACTTGTGGTAGTTGGAGCTTGTGGC

GTAGGCAAGAGTGCCTTGACGATACAGCTAATTCAGAATCATTTTGTGGACGAAT

ATGATCCAACAATAGAGGATTCCTACAGGAAGCAAGTAGTAATTGATGGAGAAAC

CTGTCTCTTGGATATTCTCGACACAGCAGGTCAAGAGGAGTACAGTGCAATGAGG

GACCAGTACATGAGGACTGGGGAGGGCTTTCTTTGTGTATTTGCCATAAATAATA

CTAAATCATTTGAAGATATTCACCATTATAGAGAACAAATTAAAAGAGTTAAGGA

CTCTGAAGATGTACCTATGGTCCTAGTAGGAAATAAATGTGATTTGCCTTCTAGA



ACAGTAGACACAAAACAGGCTCAGGACTTAGCAAGAAGTTATGGAATTCCTTTTA

TTGAAACATCAGCAAAGACAAGACAGGGTGTT
GATGATGCCTTCTATACATTAGT

TCGAGAAATTCGAAAACATAAAGAAAAGATGAGCAAAGATGGTAAAAAGAAGAAA

AAGAAGTCAAAGACAAAGTGTGTAATTATGTAAATACAATTTGTACTTTTTTCTT

AAGGCATACTAGTACAAGTGGTAATTTTTGTACATTACACTAAATTATTAGCATT

TGTTTTAGCATTACCTAATTTTTTTCCTGCTCCATGCAGACTGTTAGCTTTTACC

TTAAATGC
TTATTTTAAAATGACAGTGGAAGTTTTTTTTTCCTCTAAGTGCCAGT

ATTCCCAGAGTTTTGGTTTTTGAACTAGCAATGCCTGTGAAAAAGAAACTGAATA

CCTAAGATTTCTGTCTTGGGGTTTTTGGTGCATGCAGTTGATTACTTCTTATTTT

TCTTACCAATTGTGAATGTTGGTGTGAAACAAATTAATGAAGCTTTTGAATCATC

CCTATTCTGTGTTTTATCTAGTCACATAAATGGATTAATT
ACTAATTTCAGTTGA

GACCTTCTAATTGGTTTTTACTGAAACATTGAGGGAACACAAATTTATGGGCTTC

CTGATGATGATTCTTCTAGGCATCATGTCCTATAGTTTGTCATCCCTGATGAATG

TAAAGTTACACTGTTCACAAAGGTTTTGTCTCCTTTCCACTGCTATTAGTCATGG

TCACTCTCCCCAAAATATTATATTTTTTCTATAAAAAGAAAAAAATGGAAAAAAA

TTACAAGGCAATGGAA
ACTATTATAAGGCCATTTCCTTTTCACATTAGATAAATT

ACTATAAAGACTCCTAATAGCTTTTCCTGTTAAGGCAGACCCAGTATGAAATGGG

GATTATTATAGCAACCATTTTGGGGCTATATTTACATGCTACTAAATTTTTATAA

TAATTGAAAAGATTTTAACAAGTATAAAAAATTCTCATAGGAATTAAATGTAGTC

TCCCTGTGTCAGACTGCTCTTTCATAGTATAACTTTAAATCTTTTCTT
CAACTTG

AGTCTTTGAAGATAGTTTTAATTCTGCTTGTGACATTAAAAGATTATTTGGGCCA

GTTATAGCTTATTAGGTGTTGAAGAGACCAAGGTTGCAAGGCCAGGCCCTGTGTG

AACCTTTGAGCTTTCATAGAGAGTTTCACAGCATGGACTGTGTCCCCACGGTCAT

CCAGTGTTGTCATGCATTGGTTAGTCAAAATGGGGAGGGACTAGGGCAGTTTGGA

TAGCTCAACAAGATACAATCTCAC
TCTGTGGTGGTCCTGCTGACAAATCAAGAGC

ATTGCTTTTGTTTCTTAAGAAAACAAACTCTTTTTTAAAAATTACTTTTAAATAT

TAACTCAAAAGTTGAGATTTTGGGGTGGTGGTGTGCCAAGACATTAATTTTTTTT

TTAAACAATGAAGTGAAAAAGTTTTACAATCTCTAGGTTTGGCTAGTTCTCTTAA

CACTGGTTAAATTAACATTGCATAAACACTTTTCAAGTCTGATCCATATTTAATA

ATGCTTTAAAATAAAAATAAAAACAATCCTTTTGATAAATTTAAAATGTTACTTA

TTTTAAAATAAATGAAGTGAGATGGCATGGTGAGGTGAAAGTATCACTGGACTAG

GAAGAAGGTGACTTAGGTTCTAGATAGGTGTCTTTTAGGACTCTGATTTTGAGGA

CATCACTTACTATCCATTTCTTCATGTTAAAAGAAGTCATCTCAAACTCTTAGTT

TTTTTTTTTTACAACTATGTAATTTATATTCC
ATTTACATAAGGATACACTTATT

TGTCAAGCTCAGCACAATCTGTAAATTTTTAACCTATGTTACACCATCTTCAGTG

CCAGTCTTGGGCAAAATTGTGCAAGAGGTGAAGTTTATATTTGAATATCCATTCT

CGTTTTAGGACTCTTCTTCCATATTAGTGTCATCTTGCCTCCCTACCTTCCACAT

GCCCCATGACTTGATGCAGTTTTAATACTTGTAATTCCCCTAACCATAAGATTTA

CTGCTGCT
GTGGATATCTCCATGAAGTTTTCCCACTGAGTCACATCAGAAATGCC

CTACATCTTATTTCCTCAGGGCTCAAGAGAATCTGACAGATACCATAAAGGGATT

TGACCTAATCACTAATTTTCAGGTGGTGGCTGATGCTTTGAACATCTCTTTGCTG

CCCAATCCATTAGCGACAGTAGGATTTTTCAAACCTGGTATGAATAGACAGAACC

CTATCCAGTGGAAGGAGAATTTAATAAAGATAGTGCTGAA
AGAATTCCTTAGGTA

ATCTATAACTAGGACTACTCCTGGTAACAGTAATACATTCCATTGTTTTAGTAAC



CAGAAATCTTCATGCAATGAAAAATACTTTAATTCATGAAGCTTACTTTTTTTTT

TTGGTGTCAGAGTCTCGCTCTTGTCACCCAGGCTGGAATGCAGTGGCGCCATCTC

AGCTCACTGCAACCTCCATCTCCCAGGTTCAAGCGATTCTCGTGCCTCGGCCTCC

TGAGTAGCTGGGATTA
CAGGCGTGTGCCACTACACTCAACTAATTTTTGTATTTT

TAGGAGAGACGGGGTTTCACCCTGTTGGCCAGGCTGGTCTCGAACTCCTGACCTC

AAGTGATTCACCCACCTTGGCCTCATAAACCTGTTTTGCAGAACTCATTTATTCA

GCAAATATTTATTGAGTGCCTACCAGATGCCAGTCACCGCACAAGGCACTGGGTA

TATGGTATCCCCAAACAAGAGACATAATCCCGGTCCTTAGGTAGTGCT
AGTGTGG

TCTGTAATATCTTACTAAGGCCTTTGGTATACGACCCAGAGATAACACGATGCGT

ATTTTAGTTTTGCAAAGAAGGGGTTTGGTCTCTGTGCCAGCTCTATAATTGTTTT

GCTACGATTCCACTGAAACTCTTCGATCAAGCTACTTTATGTAAATCACTTCATT

GTTTTAAAGGAATAAACTTGATTATATTGTTTTTTTATTTGGCATAACTGTGATT

CTTTTAGGACAATTACTGTACACA
TTAAGGTGTATGTCAGATATTCATATTGACC

CAAATGTGTAATATTCCAGTTTTCTCTGCATAAGTAATTAAAATATACTTAAAAA

TTAATAGTTTTATCTGGGTACAAATAAACAGGTGCCTGAACTAGTTCACAGACAA

GGAAACTTCTATGTAAAAATCACTATGATTTCTGAATTGCTATGTGAAACTACAG

ATCTTTGGAACACTGTTTAGGTAGGGTGTTAAGACTTACACAGTACCTCGTTTCT

ACACAGAGAAAGAAATGGCCATACTTCAGGAACTGCAGTGCTTATGAGGGGATAT

TTAGGCCTCTTGAATTTTTGATGTAGATGGGCATTTTTTTAAGGTAGTGGTTAAT

TACCTTTATGTGAACTTTGAATGGTTTAACAAAAGATTTGTTTTTGTAGAGATTT

TAAAGGGGGAGAATTCTAGAAATAAATGTTACCTAATTATTACAGCCTTAAAGAC

AAAAATCCTTGTTGAAGTTTTTTTAAAAAAAG
CTAAATTACATAGACTTAGGCAT

TAACATGTTTGTGGAAGAATATAGCAGACGTATATTGTATCATTTGAGTGAATGT

TCCCAAGTAGGCATTCTAGGCTCTATTTAACTGAGTCACACTGCATAGGAATTTA

GAACCTAACTTTTATAGGTTATCAAAACTGTTGTCACCATTGCACAATTTTGTCC

TAATATATACATAGAAACTTTGTGGGGCATGTTAAGTTACAGTTTGCACAAGTTC

ATCTCATT
TGTATTCCATTGATTTTTTTTTTCTTCTAAACATTTTTTCTTCAAAC

AGTATATAACTTTTTTTAGGGGATTTTTTTTTAGACAGCAAAAACTATCTGAAGA

TTTCCATTTGTCAAAAAGTAATGATTTCTTGATAATTGTGTAGTAATGTTTTTTA

GAACCCAGCAGTTACCTTAAAGCTGAATTTATATTTAGTAACTTCTGTGTTAATA

CTGGATAGCATGAATTCTGCATTGAGAAACTGAATAGCTG
TCATAAAATGAAACT

TTCTTTCTAAAGAAAGATACTCACATGAGTTCTTGAAGAATAGTCATAACTAGAT

TAAGATCTGTGTTTTAGTTTAATAGTTTGAAGTGCCTGTTTGGGATAATGATAGG

TAATTTAGATGAATTTAGGGGAAAAAAAAGTTATCTGCAGATATGTTGAGGGCCC

ATCTCTCCCCCCACACCCCCACAGAGCTAACTGGGTTACAGTGTTTTATCCGAAA

GTTTCCAATTCCACTG
TCTTGTGTTTTCATGTTGAAAATACTTTTGCATTTTTCC

TTTGAGTGCCAATTTCTTACTAGTACTATTTCTTAATGTAACATGTTTACCTGGA

ATGTATTTTAACTATTTTTGTATAGTGTAAACTGAAACATGCACATTTTGTACAT

TGTGCTTTCTTTTGTGGGACATATGCAGTGTGATCCAGTTGTTTTCCATCATTTG

GTTGCGCTGACCTAGGAATGTTGGTCATATCAAACATTAAAAATGACC
ACTCTTT

TAATTGAAATTAACTTTTAAATGTTTATAGGAGTATGTGCTGTGAAGTGATCTAA

AATTTGTAATATTTTTGTCATGAACTGTACTACTCCTAATTATTGTAATGTAATA

AAAATAGTTACAGTGACAAAAAAAAAAAAAAA
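The two observations made just before the sequence (no U bases, and a poly-A ending) can also be checked with a few lines of Python. This is only an illustrative sketch: the function names `looks_like_dna` and `poly_a_tail_length` and the short toy sequence are made up for this example and are not part of the lab.

```python
def looks_like_dna(sequence):
    """Return True if the sequence uses only A, C, G, T (no U, so not RNA)."""
    cleaned = "".join(sequence.split()).upper()
    return len(cleaned) > 0 and set(cleaned) <= set("ACGT")


def poly_a_tail_length(sequence):
    """Count the run of A bases at the 3' end of the sequence."""
    cleaned = "".join(sequence.split()).upper()
    return len(cleaned) - len(cleaned.rstrip("A"))


# Hypothetical toy sequence (not the patient's cDNA): 8 A's after a C.
toy = "GGCATGACT GTGAC" + "A" * 8
print(looks_like_dna(toy))       # True: no U present
print(poly_a_tail_length(toy))   # 8
```

Whitespace is stripped first, so a sequence pasted with line breaks can be checked as-is.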




2. Go to the Sequence Manipulation Suite (http://bioinformatics.org/sms/). We want to get the amino acid sequence from this cDNA sequence from the cancer patient. We will compare this amino acid sequence with the one from a person with the normal gene.


3. In the menu to the left, click on “Show Translation,” found under the heading “DNA figures.” Paste the above sequence into the first box, and under this box, at “Show the translation for…,” click on the drop-down box and choose “reading frame 2.” After hitting Submit, a new window pops up that contains the original nucleotide base sequence with the one-letter amino acid symbols. Note that each amino acid is listed above its three bases and that amino acid number 61 is M (stands for methionine; see the table to the right from your text). This is the actual beginning of the protein (all proteins begin with methionine). Highlight the web page with the info and paste it into a Word file (remember to take the Word file with you or send it to yourself by email when done).


4. In the menu to the left, click on “Translate,” found under the heading “DNA analysis.” Clear the search box, then paste your patient’s cDNA sequence into the search box. Choose a reading frame from the pull-down menu: use “Reading Frame 2” when translating the sequence at the Sequence Manipulation Suite. Click “Submit.”


5. You should be able to find the sequence of your protein by finding the first methionine (M), then continuing until you see the first “*”, which marks a stop codon. Copy the protein sequence in that region, starting with the first “M,” and paste it into the same Word file that you have started. You have now saved the mutant protein sequence.
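What the Translate tool does in steps 4 and 5 can be sketched in Python: read the DNA three bases at a time starting at the chosen reading frame, look each codon up in the standard genetic code, then keep the stretch from the first M up to the first stop. This is a minimal sketch, not the Sequence Manipulation Suite's own code; the names `translate` and `first_protein` and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate a coding-strand DNA sequence in reading frame 1, 2, or 3."""
    cleaned = "".join(dna.split()).upper()
    start = frame - 1
    # Unknown codons (for example, ones containing N) are shown as X.
    return "".join(
        CODON_TABLE.get(cleaned[i:i + 3], "X")
        for i in range(start, len(cleaned) - 2, 3)
    )


def first_protein(translation):
    """Return the residues from the first M up to (not including) the first *."""
    start = translation.find("M")
    if start == -1:
        return ""
    stop = translation.find("*", start)
    return translation[start:] if stop == -1 else translation[start:stop]


# Hypothetical toy sequence (not the patient's cDNA).
toy = "CATGAAATAGGC"
print(translate(toy, frame=2))                  # MK*
print(first_protein(translate(toy, frame=2)))   # MK
```

With the patient's cDNA pasted in place of `toy`, `frame=2` corresponds to the “Reading Frame 2” choice in step 4.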


6. Using Entrez Gene on the NCBI website (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene), find the entry for the protein you are studying by searching with the protein name (note the other possible ways of searching). Cut and paste in the following gene name: Homo sapiens KRas2. After the search, you will find 5 entries; pick the one that is “KRAS” and from Homo sapiens. For this KRAS entry, answer the following:

a. What is the official symbol:

b. Note some other names (aliases):

c. Located on which chromosome (remember, we have 23 pairs of homologous chromosomes, 46 total, with Mom giving you one chromosome of each homologous pair and Dad giving you the other):

d. Gene ID:




7. Click on the highlighted KRAS to open Entrez Gene for more info on the gene. Read the summary info. Can you understand this information? When it says “Alternative splicing leads to variants encoding two isoforms that differ in the C-terminal region,” this means that one pre-mRNA is made from the gene, but the processing of the pre-mRNA can differ (different sections are cut out). So two proteins (isoform a and isoform b) result from the same gene/pre-mRNA. In this case, isoforms are different proteins from the same gene. Highlight everything on this web page, hit Control-C to copy, and then paste into your Word file. When you are all done, print off this Word file and put it in your notebook.


On this page, look at the Genomic Regions: the 5’ end is the beginning and the 3’ end is the end. Note that each short vertical red line marks a section that is used for the protein (coding regions), and the blue sections are not (these are called untranslated regions, or UTRs; there is a 5’ UTR at the beginning and a 3’ UTR at the end of the sequence). Note that the first three coding regions are the same for both isoforms a and b, but the last one differs (in isoform b, the last coding region is right up against the 3’ UTR).

Next, glance through the Genomic Context (again, note it is on chromosome 12) and then the Bibliography (you will be asked to look at one paper on this list).

Go down to Pathways and look at the KEGG pathway “Insulin signaling pathway” (04910). In this path, note that insulin activates the receptor, which activates the insulin receptor substrate (IRS), which turns on GRB2, SOS, Ras, Raf, MEK, and ERK1/2 (the same path we looked at before in Xenopus oocytes). MAKE SURE AS MUCH OF THE TABLE AS POSSIBLE IS IN VIEW, then hit the “Print Screen” button, go to your Word file, and hit Control-V to paste the pathway into your Word file. Note that the insulin path is a little different from the growth factor path (see the text figure at the start of this protocol): the insulin receptor is already dimerized, and there is an insulin receptor substrate (IRS) inserted into the path before Grb2/SOS. The end results in this insulin path are

(1) ___________________________________ and (2) ___________________________.

On the KRAS page, go down to the RefSeq section. Compare variants a and b now (use the text on the web site: which exons are used by which isoform/variant?). Note which isoform or variant is rare and which is common:






Click on the mRNA sequence for “variant (b)” (not variant (a)). Use the RefSeq entries for the mRNA and protein sequences for K-Ras2 isoform b, also called “variant (b).” Go down to the protein sequence (/translation=") and save the sequence in FASTA format in your Word file. Remember, this is the un-mutated protein sequence; name it the “protein sequence for the normal protooncogene.” Use the font Courier New and get rid of all gaps. Also, go to the bottom of the web page and copy the gene: copy the sequence of bases that begins after ORIGIN and goes down to number 5281, and put it in your Word file for your lab notebook.




8. Go to the ExPASy website (http://us.expasy.org/) and search for the SwissProt entry for your protein using “kras2.” Be sure to select the human protein from the list of results. Make sure the information in the entry is the same as you saw in the Gene entry. If your protein is an enzyme, the EC number is a good way to double-check; it is _______________. You may want to record the SwissProt entry number (primary accession number P01116) in case you want to find this entry again. Note that we could get the normal gene here also (at the bottom, directly in FASTA format); you will want to save this version also.


Part 2: Protein-protein BLAST: Finding homologous (similar) proteins

9. Search for similar proteins with a BLAST search (from the NCBI home page or http://www.ncbi.nlm.nih.gov/BLAST) using the RefSeq or SwissProt protein FASTA sequence (the unmutated protein sequence). BLAST is a program that compares your input sequence to all the sequences in a database (that you choose). This program aligns the most similar segments between the two sequences (using a scoring matrix similar to BLOSUM; see entry). This scoring method gives penalties for gaps and gives the highest score for identical residues. Substitutions are scored based on how conservative the changes are (e.g., a small nonpolar amino acid replaced by a large nonpolar amino acid is a conservative change).

The output shows a list of sequences, with the highest-scoring sequence at the top. The scoring output is given as an E-value. The lower the E-value, the higher scoring the sequence is. Sequences with E-values in the range of 10^-100 to 10^-50 are very similar (or even identical). Sequences with E-values of 10^-10 and higher need to be examined by other methods to determine homology. An E-value of 10^-10 for a sequence can be interpreted as “a 1 in 10^10 chance that the sequence was pulled from the database by chance alone (has no homology to the query sequence).”
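The rule of thumb above (smaller E-value means a better hit) can be written as a small Python helper that sorts hits and labels them. This is a rough sketch only: the cutoffs are the handout's values written as 1e-50 and 1e-10, the label for the band between them is an assumption (the handout does not name it), and the hit list is hypothetical.

```python
def describe_evalue(e_value):
    """Map a BLAST E-value onto the rough bands described in the handout."""
    if e_value <= 1e-50:
        return "very similar or identical"
    if e_value < 1e-10:
        # Assumption: the handout does not name this middle band.
        return "probably related"
    return "check homology by other methods"


# Hypothetical hits (name, E-value); real values come from your BLAST results page.
hits = [("fish", 3e-62), ("frog", 1e-98), ("yeast", 2e-7)]

# Lowest E-value first, which is the order BLAST lists its results in.
ranked = sorted(hits, key=lambda hit: hit[1])
for name, e_value in ranked:
    print(name, e_value, describe_evalue(e_value))
```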


First, under Protein, select PSI-PHI BLAST. Then paste the FASTA-formatted protein sequence in the search box. Select the nr protein database. Click “BLAST” to begin. You may need to wait a few minutes before the results page opens. On the next page that appears, you will see that putative conserved domains have been found. Select “Format.” After obtaining the results, choose 5 sequences from various positions in the results (under Sequences producing significant alignments). Be sure not to choose any sequences that are human (see SOURCE), since they are the same as your search sequence.

Choose ones from “lower” animals: rat, Tetraodon nigroviridis, Xenopus, Rivulus marmoratus, Oryzias latipes (Japanese medaka), etc. The goal is to choose a variety of sequences that differ greatly in evolutionary distance from the human protein. The more varied the sequences, the more interesting the resulting phylogram will be.

Be sure the wild-type human (RefSeq) and mutant sequences differ by only one amino acid residue. If more differences are found, there may have been a mistake in the translation of the mutant sequence.

For each of the five sequences, click on the sequence name to view the GenBank entry for the sequence. Then view the sequence in FASTA format. Copy and paste all the FASTA-formatted sequences into the same Word file (get rid of any gaps or numbers). At the beginning of this file, make sure that you have your mutant protein sequence (see the very start of this exercise), also in FASTA format.


10. This Word file will be used to create the multiple sequence alignment, so the formatting is very important. Get rid of all gaps, especially those at the end of each line (go to the end of each line and hit Delete until you start deleting amino acid symbols). You should end up with a Word file that contains the 5 sequences from the BLAST search plus the un-mutated human protein sequence and your mutant sequence, for a total of 7 sequences. Each sequence should be in FASTA format and contain a title line (starting with >, then text, then a return). Shorten the text to contain JUST the species information so it will fit on one line! For example, you should erase the “gi” line and add in something simpler like “pig,” “cow,” etc. Your mutant sequence should read “>mutant”. At the end of each title, be sure to press Return to separate it from the rest of the sequence.
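The clean-up in step 10 (short title line, no gaps, no numbers) can also be done with a small Python helper instead of by hand. This is a sketch under stated assumptions: the function name `to_fasta`, the 60-letter line width, and the pasted text are all made up for the example.

```python
def to_fasta(title, raw_sequence, width=60):
    """Build one FASTA record: a '>title' line, then the bare sequence letters."""
    # Keep letters only, which drops spaces, line numbers, and line breaks.
    letters = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    lines = [letters[i:i + width] for i in range(0, len(letters), width)]
    return ">" + title + "\n" + "\n".join(lines) + "\n"


# Hypothetical text as it might look after pasting from a GenBank page.
pasted = "  1 mteyk lvvvg\n 11 ac "
print(to_fasta("mutant", pasted), end="")
```

Calling `to_fasta` once per sequence and joining the results gives a single block of text that can be pasted into the ClustalW entry box.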



Part 3: Multiple Sequence Alignment

11. Go to the ClustalW website (http://www.ebi.ac.uk/clustalw/index.html) and enter (by using “copy” and “paste”) all your FASTA-formatted sequences into the data entry box. The default parameters will work for us, except for the output order.

a. Select “input” for the Output order

b. Press “run”


12. When the results come up, click on Edit on the top bar of the Internet Explorer toolbar, and then choose Select All. Or you can highlight the central text (leaving out the heading at the top of the page). After the items are highlighted, hit Control-C to copy and then paste into your Word file. Next, let’s focus on the alignment: click on View Alignment File, copy all, and paste into your Word file. It may look broken up. Follow these steps to make it readable again.

a. Select the alignment text (highlight it with your mouse)

b. Change the font to size 10 and Courier New

c. Change the page set-up (first hit FILE) to landscape

d. Save the file to your desktop


13. To save the “cladogram tree,” make sure that the diagram is in the center of your monitor, then hit the “Print Screen/SysRq” key on your keyboard. Then go to your Word file, make sure that you are at the spot where you want the diagram, and hit Control-V or the paste icon (or Paste under Edit) to put the picture in your Word file. Repeat the process, but first click on “Show as phylogram Tree.” From the web site: “Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional to the amount of inferred evolutionary change. A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa. Tree distances can be shown, just click on the diagram to get a menu of options. The ".dnd" file is a file that describes the phylogenetic tree.”


With your phylogram, you might note that Xenopus separated into a species about 400 million years ago. Other animals: birds arose about 170 million years ago; mammals about 220 million years ago; reptiles about 320 million years ago; amphibians about 400 million years ago; and fish about 500 million years ago.


Scroll through the file and alignment and make sure none of the blocks of sequences are separated by a page break. Save and print the alignment; it will be part of your lab notebook.


OMIM search

14. Search the OMIM database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&cmd=search&term=) by typing in: “lung cancer” KRAS2. You have to put quotation marks around the two words to make sure that you link them (or you get pages of breast cancer, etc.).


15. Click on the first two items that come up on lung cancer and Ras. Read through them. If you want, when a new page comes up, use Edit, Find (on this web page) and type in “ras” to go right to a note about ras. An outline for the entry is provided to the left of the window; go to the references and read the first article listed. Go to the Allelic Variants section. Scroll until you see the entry for the Gly12Cys mutation (or G12C). Answer the appropriate questions at the end of this exercise.


Examination of K-Ras structure with Kinemage:

1. Open the program Kinemage (should be on C drive) or download it from the internet.


2. Download the file called c14Recp.kin and save it next to the Kinemage program.


3. See the figure below (small part on the right): Ras is essentially a flat plane (made up of 6 BLUE beta strands forming a pleated sheet), with GREEN alpha helices sitting on both sides and on the edge; see the small figure at the lower right of the illustration below.

Find the structures of Ras noted in the figure above with Kinemage. Open the Mage program, open the file c14Recp.kin, and then go from the first figure (that of Src) to the second (showing Ras); you will not need the third figure.




4. Ras: View1 of 4. The backbone of the amino acid chain is in white, the bound GTP analogue in pink, and the Mg++ in yellow. Mg is captured by Ras to help the protein bind to the phosphate part of GTP; thus, Mg is a cofactor (also a trace element, required in trace amounts). Rotate the Ras so that you get the same arrangement as shown in the PowerPoint slide (GTP on the lower right side, N terminus on the left side): the flat sheet of the 6 beta strands is up and down, with the GTP binding site on top (see figure above). Make sure you read the captions for each of the four views of Ras. Zoom out (top right slider bar) to find the N terminus; remember that the P loop is just down the chain from the N terminus (the beginning of the protein). In this orientation, which of the two G’s represents Glycine 12 (versus Glycine 13)? The G on the left or right?

With the GTP site on the lower right, can you count down from the N terminus?


5. View2 (remember how you go from the first view to the second?): Glycine 12 (where the most common mutation occurs) and Glycine 13 are labeled in green and are found on what is called the G1 or P loop, one of the main GTP-binding loops. These 2 glycines are located in a critical part of this loop. They are the two major sites of mutations that convert this proto-oncogene into an oncogene: when these Gly's mutate to Cys, the GTP cannot be broken down (the GTPase activity of Ras is reduced), so Ras stays in the "on" state more of the time, causing cancer.


Draw (block diagram only, not atoms and bonds) the GTP structure, then point to and name the three parts of GTP.




Can you identify the 3 parts of GTP in the kinemage image?


6. To see details of interactions at the binding site, go to View3 and turn on "interact" (click on its box). Now you can see the R group sidechains in cyan and weak H bonds in purple. Remember that you can zoom in and alter the Z slab so that you can see atoms behind the plane of view. With View3 and the interactions shown, you can also click on the atoms and find the number and name of the amino acid and which atom of the amino acid it is. The beginning of the amino acid has an amino group (for glycine 12, click on parts of the white amino acid chain and look for “n”); the tail end of the amino acid has the carboxy group, noted by “c”; and “ca” stands for the alpha carbon in the middle of the amino acid. See the figure to the right (Fig. 3-3 in the sixth ed.) showing how two amino acids are connected. In the Kinemage image, find the amino group NH at the beginning of glycine and the beginning of the second amino acid, alanine. Note that the R group comes off of the alpha (or central) carbon and that, for glycine, the R group is just a single H atom (so the alpha carbon carries two H atoms). The R group of alanine has a carbon and three hydrogen atoms (called a methyl group).


7. Glycine 13 interacts with what part of GTP? (See View3, and click on interactions to see blue lines representing weak H bonds; which of GTP’s 3 parts interacts with glycine 13?)




8. Does glycine 12 (the one that mutates most commonly) have any weak interactions (blue line) with the GTP?

Yes or No (circle one)

How do you interpret this answer?



9. Which of the three parts of GTP interacts with the cofactor (in yellow)?


10. How many weak bonds link GTP to Ras?




11. Draw and compare the R group of glycine with that of cysteine; which R group is larger?



12. What type of amino acid are they (remember, there are three types of amino acids based on the R group)? Why might changing from a G to a C cause a problem?




Questions over bioinformatics exercise on Ras and cancer


1. In the first or second reference from step 7 (Entrez Gene, Bibliography) above, go to a paper and read the abstract. Who is the first author of this article, and what is the reference for this article (journal name, volume, page numbers, year)?




2. How many patients, and what categories of patients, were involved in the study?




3. What did the researchers find out about K-Ras mutations?







4. What conclusion(s) did the researchers come to about K-Ras mutations based on their data? (Summarize and put into your own words.)



5. Using your Word file, look at the gene sequence for the normal k-Ras and for the mutant, cancer-causing k-Ras. Look for the point mutation in the genes. To do this, use your Sequence Manipulation Suite output (“Show Translation” for the cancer patient’s mutant ras: “Results for 5312 residue sequence starting "GGCCGCGGCG"”) and the equivalent for the normal gene (see hint below). Find the first M or methionine (proteins begin with this amino acid), and then count down to the 12th amino acid.

What is the three-base code in the normal gene for this 12th amino acid (specifies G): ___

What is the three-base code in the mutant gene for the 12th amino acid (specifies C): ____

Hint: see my alignment below. Note that the amino acid is listed above the first base of the triplet that specifies the amino acid; start with M (methionine) and count down 12 amino acids.

NORMAL GENE:

181 aatgactgaa tataaacttg tggtagttgg agct???ggc gtaggcaaga gtgccttgac

Mutant gene:

 61   M  T  E  Y  K  L  V  V  V  G  A  C  G  V  G  K  S  A  L  T
181  AATGACTGAATATAAACTTGTGGTAGTTGGAGCT???GGCGTAGGCAAGAGTGCCTTGAC



Note that the M is above ATG; this is where the protein-coding part of the gene actually starts. Look at your genetic code table; why does it say methionine is AUG? The genetic code table gives the codon for an amino acid from mRNA, not DNA. However, gene DNA sequences are given for the coding strand of DNA (not the template strand), because the coding strand and the mRNA are the same except that T is replaced with a U in the mRNA.

DNA:
Coding strand: ---ATG---
Template strand: ---TAC---

mRNA: ---AUG---

protein: methionine

Note how A pairs with T (or U in mRNA) and G pairs with C.
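The relationship between the three lines in this diagram can be sketched in Python. This is a small illustration only; the function names `template_strand` and `mrna_from_coding` are made up here. The template strand is returned base-for-base underneath the coding strand, as in the diagram above (so it reads 3' to 5' when the coding strand reads 5' to 3').

```python
# Base-pairing rule for DNA: A pairs with T, G pairs with C.
PAIRING = str.maketrans("ACGT", "TGCA")


def template_strand(coding):
    """Return the template strand aligned base-for-base under the coding strand."""
    return coding.upper().translate(PAIRING)


def mrna_from_coding(coding):
    """The mRNA matches the coding strand, with U in place of T."""
    return coding.upper().replace("T", "U")


print(template_strand("ATG"))    # TAC
print(mrna_from_coding("ATG"))   # AUG
```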


6. List the steps from insulin to the final cellular events; use the KEGG pathway for insulin.




7. Elk-1, c-Jun, and c-Fos are transcription factors. In general, what do they do as a result of Ras activation?



8. If Ras were mutated to be always active, what part of the pathway becomes irrelevant?



Concerning its Entrez Gene entry

9. Fill in the following information

a. Write the GeneID number here ________________.


b. What is the gene name?

c. Where in the human genome is this gene located?

d. What is the RefSeq number for the mRNA sequence for isoform b?

e. What is the RefSeq number for the protein sequence for isoform b?


Concerning the Swiss-Prot entry

10. How many splice variants are there of K-Ras2 and what are they called?



11. Describe how K-Ras2 is activated and inactivated.



12. What proteins does K-Ras2 interact with? (Hint: GDP and GTP are not proteins.)



Concerning Multiple Sequence Alignment with ClustalW:

13. What is the mutation in the mutant amino acid sequence? Write it in the format “Res123Res,” where the first Res is the three-letter code for the amino acid in the un-mutated (wild-type) protein and the second Res is the amino acid in the mutated protein. In place of “123,” put the amino acid residue number of the mutation.
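Comparing two equal-length protein sequences and writing each difference in the “Res123Res” style can be sketched in Python as follows. This is a hedged illustration: the function name `point_mutations` and the toy sequences are assumptions for the example, and the sequences are deliberately not K-Ras, so the answer to this question is still yours to find.

```python
# One-letter to three-letter amino acid codes (the 20 standard amino acids).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def point_mutations(wild_type, mutant):
    """List every substitution between two equal-length sequences as Res123Res."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be the same length to compare by position")
    return [
        f"{THREE_LETTER[wt]}{position}{THREE_LETTER[mut]}"
        for position, (wt, mut) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mut
    ]


# Hypothetical toy sequences (not K-Ras): residue 4 changes from V to D.
print(point_mutations("MKLVA", "MKLDA"))   # ['Val4Asp']
```

If the returned list has more than one entry for your real sequences, that suggests a translation or copy-paste mistake, since the wild-type and mutant proteins should differ at a single residue.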



14. Is the mutation in a region of conservation? (Look for stretches with many * or : symbols; blanks mean no conservation. What does the colon mean?)

YES NO

If the sequence stays the same throughout the evolutionary path, the sequence (or part of a gene/protein) is said to be conserved; this probably means that the sequence plays a crucial role in the active site (or in a regulatory region).
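The idea of a fully conserved column can be sketched in Python: a position gets a * only when every aligned sequence has the same residue there. This is a simplified illustration and not ClustalW's own code; it marks identical columns only and leaves every other column blank. The function name `identity_line` and the toy alignment are made up for the example.

```python
def identity_line(aligned_sequences):
    """Return a line with '*' under columns where all sequences are identical."""
    marks = []
    for column in zip(*aligned_sequences):
        # A gap ('-') in any sequence means the column is not counted as conserved.
        conserved = len(set(column)) == 1 and "-" not in column
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Hypothetical toy alignment of three equal-length sequences.
alignment = ["MTEYK", "MTDYK", "MTEYR"]
for sequence in alignment:
    print(sequence)
print(identity_line(alignment))   # stars under columns 1, 2, and 4
```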



15. Based on the alignment, what span of amino acids is LEAST conserved? Does this correlate with the region specified in the Swiss-Prot entry as “hypervariable”? These sections or sequences can vary over evolutionary time because they may not be important to the functioning of the protein (they are probably not in the active site, although the sequence could play a role in regulation).