insulin bioinformatics

richessewoozyBiotechnology

Oct 1, 2013

Bioinformatics of Diabetes and Insulin

Finding the sequence of a gene using today’s online databases.

First, go to the National Center for Biotechnology Information (NCBI) website.

http://www.ncbi.nlm.nih.gov/

Follow these steps to find the sequence of the insulin gene in NCBI.

Under Search, change the drop-down menu that reads “All Databases” to “Gene”. In the search box, enter “Insulin homo sapiens” and press GO. The search returns about 1900 entries. Which one do we choose?

(Answer questions 1-4)

The top entry will change over time. It may read IGF1 or IGF2, the official symbols for human insulin-like growth factor 1 and 2, or it may read IDDM, a gene associated with insulin-dependent diabetes. These are not insulin. IGF1, for example, is a separate gene that codes for a protein similar to insulin, and it is made in tissues other than the pancreas.

Review the results of the first two pages of your search. Did you find a gene that is labeled Insulin? Go back and refine your search: enter “INS” in the search box at the top of the page and press GO.

(Answer questions 5-6)
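If you would rather script the two searches above than type them into the web page, NCBI's E-utilities service accepts the same search terms. The sketch below only builds the request URLs and does not contact NCBI. The `esearch.fcgi` endpoint and its `db`, `term` and `retmax` parameters are taken from the E-utilities documentation; the helper name `gene_search_url` is hypothetical. The number of hits returned will change over time, just as the web results do.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint: searches an Entrez database and returns matching IDs.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL that queries the NCBI Gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)  # spaces in the term become '+'


# The broad first search and the refined second search from the steps above.
broad = gene_search_url("Insulin homo sapiens")
refined = gene_search_url("INS")
print(broad)
print(refined)

# Actually running a search needs network access, for example:
#   from urllib.request import urlopen
#   xml_text = urlopen(refined).read().decode()
```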

This search returns fewer results. Find the entry for human insulin (Homo sapiens) and click on the “INS” link. Things to take note of on this page:

- The GeneID: there is only one ID per gene in the database.

- The Lineage: the complete taxonomy of the organism the gene came from.

- The Summary: basic information on the gene’s function.

- Genomic regions, transcripts, and products: The top line shows where on the chromosome the gene is located. The bottom line shows introns in blue and exons in red. The bold red segments are the peptide A and peptide B portions of insulin; the fine red segment is removed post-translationally. Right-click on the link to the sequence viewer and open it in a new window. The top histogram shows a vertical red line, which marks the location of the INS gene. The details of the gene are in the bottom half of the screen. Close the window.

- Genomic context: shows which chromosome INS is on and where on that chromosome it is located. Right-click on the MapViewer link and open it in a new window. This is another graphical representation of where the INS gene is in the genome. Note the ideogram on the left side of the page. Again, the red line shows the location of the INS gene. Close this window.

- GeneRIFs (Gene References Into Function): a list of the articles published about the function of insulin.

- HIV-1 protein interactions: information on reported interactions between HIV-1 proteins and insulin.

- Interactions: how insulin interacts with other proteins and drugs.

- Genotypes and Phenotypes: describes the various alleles of the gene and their effects.

- Pathways: a list of the metabolic pathways that insulin is involved in.

- NCBI Reference Sequences: all the sequence information.

(Answer questions 7-10)

Find the section header “NCBI Reference Sequences” (about ¾ of the way down the web page). Under the sub-header “mRNA and Protein(s)”, find the “Consensus CDS” sequence data, which is a link that starts with “CCDS”. Select this link, “CCDS7729.1”. This page provides the DNA sequence that is a consensus of all the submitted sequence data for this gene. The sequence IDs that were used to form the consensus sequence are listed.

Next, the chromosomal locations for the sequence are listed, followed by the actual sequence data.

The page shows both the nucleotide sequence and its translation, the amino acid sequence of the protein. With your mouse, click on a nucleotide letter in the sequence, then on an amino acid in the sequence. What do you observe? Both sequences are copied below for reference.

Human INS Nucleotide sequence:

ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCC
GCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGG
GAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGT
GGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGA
AGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAA
CTAG

Human INS Amino Acid sequence:

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
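The link between the two sequences above can be checked with a short script: reading the nucleotides three at a time (one codon per amino acid) with the standard genetic code should reproduce the amino acid sequence. This is a minimal sketch; the codon table is the standard genetic code, and the helper name `translate` is our own, not an NCBI tool.

```python
# Human INS coding sequence and protein, copied from the CCDS page.
NUCLEOTIDES = (
    "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCC"
    "GCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGG"
    "GAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGT"
    "GGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGA"
    "AGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAA"
    "CTAG"
)
PROTEIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELG"
    "GGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

# Standard genetic code: the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# paired with their amino acids; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # remove spaces/line breaks left from copy-paste
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


print("nucleotides:", len(NUCLEOTIDES), "codons:", len(NUCLEOTIDES) // 3)
print("amino acids:", len(PROTEIN))
print("translation matches:", translate(NUCLEOTIDES) == PROTEIN)
```

Compare the printed counts with your answers to questions 13, 14 and 17.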

(Answer questions 11-14)

Now copy the nucleotide sequence and paste it as unformatted text into a text editor such as Notepad. There is a menu option at the top of the page called BLAST; select this. You will get a list of genomes to search in. Select the link for “list all genomic BLAST databases”. This link provides a list of all the partial genomes that have been sequenced and submitted to NCBI. Since pig insulin was used at one time to treat diabetes, let's find the sequence for pig insulin. Under vertebrates/mammals/other mammals, find “Sus scrofa (pig)”. Select the BLAST link next to the pig (the circle with a B in it). Paste the nucleotide sequence you copied earlier into the sequence entry box, and make sure the radio button above the box is selected. Under “Database”, select ‘RefSeq RNA’. For “Program”, select BLASTN, and under “Algorithm parameters” set “Expect” to ‘0.01’. Now click on “Begin Search” at the bottom of the page. You will get a page with your search information; click on “View report”. The report is displayed once it has been formatted, which may take a minute or so while the search runs. Scroll down to see your results.
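The same search can be described as a request to NCBI's BLAST URL API. The sketch below only assembles the request and does not submit it. It assumes the `CMD`, `PROGRAM`, `DATABASE`, `EXPECT`, `ENTREZ_QUERY` and `QUERY` parameter names from NCBI's BLAST URL API documentation, and it assumes that restricting the `refseq_rna` database to `Sus scrofa[Organism]` approximates the pig-specific BLAST page used above; check both against the current documentation before relying on them. The function name `blast_put_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query: str, organism: str) -> str:
    """Build (but do not send) a BLAST submission mirroring the form settings above."""
    params = {
        "CMD": "Put",                             # submit a new search
        "PROGRAM": "blastn",                      # nucleotide query vs nucleotide database
        "DATABASE": "refseq_rna",                 # assumed equivalent of 'RefSeq RNA'
        "EXPECT": 0.01,                           # the Expect threshold chosen in the form
        "ENTREZ_QUERY": f"{organism}[Organism]",  # assumption: limit hits to one organism
        "QUERY": "".join(query.split()),          # sequence with whitespace removed
    }
    return BLAST_URL + "?" + urlencode(params)


# First 24 nucleotides of the human INS sequence, to keep the example short.
url = blast_put_url("ATGGCCCTGTGGATGCGCCTCCTG", "Sus scrofa")
params = parse_qs(urlsplit(url).query)  # decode the URL again to inspect it
print(params["PROGRAM"], params["EXPECT"], params["ENTREZ_QUERY"])
```

In the documented workflow, submitting this request returns a request ID (RID) that is later used with `CMD=Get` to fetch the report, which mirrors the “View report” step on the web page.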

At the top of your search results is information on the database used for the search. This is followed by the graphic summary, a description of the matching sequences, and the alignment of the sequences.

(Answer questions 15-22)

The BLAST results give us an idea of how conserved the nucleotide sequence is between pig and human. The human DNA sequence (labeled Query) is above the pig DNA sequence (labeled Sbjct), and there is a line between the nucleotide letters where the letters match. Where there is no match there is no line, and where a letter is missing there is a dash in the sequence.
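The match line described above can be reproduced in a few lines of code, which also makes the percentage in question 22 concrete: count the positions where the two aligned letters are identical and divide by the alignment length. The two fragments below are made up for illustration and are not the real pig alignment; the function names are hypothetical. We assume BLAST's “Identities” convention, in which gap positions count toward the alignment length but never as matches.

```python
def match_line(top: str, bottom: str) -> str:
    """Build the middle line of a pairwise alignment: '|' where letters match, ' ' otherwise."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical positions divided by alignment length (gap columns included), in percent."""
    matches = match_line(top, bottom).count("|")
    return 100.0 * matches / len(top)


# Made-up aligned fragments for illustration only (NOT the real pig result).
human = "ATGGCCCTGTGGATG"
other = "ATGGCCCTG-GGACG"  # one gap ('-') and one mismatch
print(human)
print(match_line(human, other))
print(other)
print(f"{percent_identity(human, other):.1f}% identity")
```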

(Answer questions 23-24)

Use your back button to return to the CCDS page for insulin. We can look at the full records for the insulin gene through its nucleotide and protein IDs. Return to the sequence IDs at the top of the CCDS page. There are multiple sources for the CCDS information; the last two lines give the current nucleotide and protein IDs for the gene. Click on the Nucleotide report link for the NCBI source (last line) of the sequence information. Then click on the NM_000207 LOCUS report for the sequence, which has a lot of information and links on the insulin gene. Things to note:

- The PUBMED lines are articles that have been written about this gene.

- Read the summary under COMMENT about halfway down the page. Where have you seen that before?

- Under FEATURES, find CDS.

(Answer questions 25-26)
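Questions 8 and 25 ask you to subtract coordinates. Because a location written as `start..end` includes both end positions, the length is end - start + 1: a single codon at positions 1..3 has three nucleotides, not two. The sketch below handles only simple `start..end` locations (not `join(...)`, `complement(...)` or `<`/`>` partial markers); the function name and the second set of coordinates are hypothetical, so read the real ones from the FEATURES table.

```python
def location_length(location: str) -> int:
    """Number of bases in a simple 'start..end' location; both end positions count."""
    location = location.replace(",", "")  # tolerate thousands separators like 2,159,779
    start_text, end_text = location.split("..")
    start, end = int(start_text), int(end_text)
    return abs(end - start) + 1  # subtract, then add 1 for the inclusive end


print(location_length("1..3"))      # 3: a single codon
print(location_length("201..260"))  # 60: hypothetical coordinates
```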

Use your back button until you return to the CCDS page. From the NCBI source (last line) of the sequence IDs, click on the Protein report for NP_000198. This report is for the proinsulin precursor, and its amino acid sequence is at the bottom of the page. There is a lot of information here and links to more about the protein. These are the official pages for the insulin nucleotide and protein records in NCBI.

(Answer question 27)

We now have the nucleotide sequence for insulin, and the CCDS page also displays the amino acid (AA) sequence that would be made from this gene. Is the entire AA sequence the insulin protein? Does this DNA sequence contain introns that would be cut out before it is translated into insulin? In other words, which part of this sequence actually makes up the insulin protein? It’s hard to tell from the NCBI information what makes up the protein.

Copy the last 21 AA of the protein sequence; this stretch starts with the letters “GIVE”. Open a new page in your browser to the Protein Data Bank at http://www.pdb.org. Select the Advanced Search tab in the upper right, then choose the search database: in the drop-down box select “Sequence (Blast/Fasta)”, paste in the 21 AA, remove any spaces, and click on “Evaluate Query” to start the search.
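Copying a sequence from a web page often picks up spaces and line breaks. The sketch below cleans a pasted protein sequence, keeps the last 21 residues, and wraps them in FASTA format (a `>` header line followed by the sequence). Whether the PDB search box needs the header line is an assumption; pasting only the sequence line matches the steps above. The label `INS_last21` and the function name are hypothetical.

```python
def last_residues_as_fasta(sequence: str, n: int = 21, label: str = "query") -> str:
    """Remove whitespace from a pasted protein sequence and return its last n residues as FASTA."""
    clean = "".join(sequence.split()).upper()
    return f">{label}\n{clean[-n:]}"


# Preproinsulin as pasted from the CCDS page, including a stray line break and spaces.
pasted = """MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELG
GGPGAGSLQPLALEGSLQKRG IVEQCCTSIC SLYQLENYCN"""
fasta = last_residues_as_fasta(pasted, label="INS_last21")
print(fasta)
```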

(Answer question 28)

There are over 100 results, and the pictures look different. How do you know which one is correct? They all are! Each is a separate research result, and a researcher would need to sort through the results to find the structure they want to work with. To make things easy, enter the PDB ID “2omg”, a recent insulin structure in the PDB, in the search line at the top of the page and do a site search. The information on the 2omg study of insulin structure should appear. About halfway down the page is the Molecular Description of the Asymmetric Unit, which lists the Insulin A chain and the Insulin B chain. This means the structure is made of two different peptide chains, and the asymmetric unit (the repeating unit of the crystal) contains 3 copies of each. The last 21 AA we searched with are therefore the A chain of insulin. How would you find the AA sequence of the B chain? Click on the “Sequence Details” tab near the top of the page to get the sequence information for the chains, then locate the B chain AAs in the overall sequence on the CCDS page. By highlighting the first and last AA of each chain on the CCDS page, you can see where the two peptides lie in the insulin gene.
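One way to check where the chains sit in the CCDS sequence is to split the precursor at its cut sites. This sketch assumes two standard facts about human preproinsulin that are not stated on the pages above: the first 24 residues are a signal peptide, and the connecting (C) peptide is removed by cutting at the Arg-Arg pair after the B chain and the Lys-Arg pair before the A chain. The function name `split_preproinsulin` is hypothetical.

```python
PREPROINSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELG"
    "GGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)
SIGNAL_PEPTIDE_LENGTH = 24  # assumption: standard annotation for human preproinsulin


def split_preproinsulin(precursor: str) -> dict:
    """Split preproinsulin into signal peptide, B chain, connecting (C) peptide and A chain."""
    signal = precursor[:SIGNAL_PEPTIDE_LENGTH]
    proinsulin = precursor[SIGNAL_PEPTIDE_LENGTH:]
    rr = proinsulin.index("RR")   # first Arg-Arg pair: follows the B chain
    kr = proinsulin.rindex("KR")  # last Lys-Arg pair: precedes the A chain
    return {
        "signal": signal,
        "B": proinsulin[:rr],
        "C": proinsulin[rr + 2:kr],  # the dibasic pairs themselves are dropped
        "A": proinsulin[kr + 2:],
    }


for name, part in split_preproinsulin(PREPROINSULIN).items():
    print(f"{name:>6} {len(part):3d} {part}")
```

If the B chain printed here matches the one on the PDB “Sequence Details” tab, the assumed cut sites agree with the structure.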

(Answer questions 29-31)

Return to the Structure Summary tab on the PDB site. Under the picture of the protein on the right of the web page is a list of 3-dimensional viewers; click on MBT Protein Workshop. This will load the protein viewer in Java. Explore the viewer and see what the options are. The viewer is being improved all the time, and new options appear regularly.

Figure: Insulin from http://www.pdb.org, PDB ID 2omg

As you can see, the structure is made of 6 separate AA chains: 3 copies each of the 2 different chains. (A single insulin molecule is one A chain joined to one B chain; six such molecules can pack together into a hexamer.) On the right side of the viewer is a list of the chains. Click on chain A to expand its list of AA, and click on any AA to highlight it in the protein. Click on the last AA in the Chain A list. This should show you the atoms of the last AA in Chain A. You can rotate the picture by left-clicking anywhere on it and moving the mouse while holding the button down. Try to get the AA you have highlighted to the top of the screen.

Finally, we return to pig insulin. At the top of the PDB page, enter the PDB ID 1ZNI and click ‘site search’. This is the insulin that was given to diabetes patients for many years. Select the Sequence Details tab at the top of the page to see the two different chains that make up the protein.

(Answer questions 32-37)

Insulin sequence & structure questions:

1. What is the first entry of your search results?

2. Do you think the first entry is Insulin? Why or why not?

3. What species is the gene from? What species is the second entry from?

4. Review the results of the first two pages of your search. Did you find a gene that is labeled Insulin? What is the name of the gene?

5. After refining your search to INS, how many hits did you get?

6. How many of the hits are “INS”? What is the difference between the entries that are “INS” only?

7. Read the summary. What is the role of insulin?

8. How many nucleotides long is the gene? (Note: subtract the smaller number from the larger number, then add 1, since both end positions belong to the gene.)

9. What does “post-translationally” mean?

10. What chromosome contains the insulin gene? Describe where this gene is located on the chromosome.

11. What does the acronym CCDS stand for?

12. Do all human genes have a specific identification like CCDS?

13. How long is the nucleotide sequence for insulin? Why is this number different from the number you entered in question 8 above?

14. How many amino acids are in the chain?

15. What does the graphic summary tell you about the match?

16. How do we know that our results are from the pig genome?

17. Review: How many nucleotides are there in each codon? From your answers to questions 13 and 14, is there the right number of amino acids for the number of nucleotides? Why or why not?

18. What does the acronym BLAST stand for?

19. When we run BLAST what are we attempting to do?

20. How many sequences did your BLAST search return?

21. How many nucleotides were matched?

22. How close, as a percentage, is the match between these segments?

23. From what you know of protein translation, does it seem that the pig insulin protein will be close to the human insulin protein?

24. What is the report based on: DNA, RNA, mRNA, or protein? (Look at the top of the page.)

25. Under FEATURES, find CDS. What is the length of this segment? (Subtract again, then add 1.)

26. How many amino acids are there in the protein report?

27. Divide the number of bp in the nucleotide report by the number of bp in a codon. How many amino acids should there be in the protein report? Are they equal? Why or why not?

28. How many structure hits did you get from the protein search?

29. What is the AA sequence of the B chain?

30. Refer back to the amino acid sequence on the peptide page for insulin in NCBI. What happened to the AA that are between the A and B chains?

31. On the PDB site, why do you think the letters ‘C’ are highlighted in yellow in the sequences?

32. How is pig insulin different from human insulin?

33. Why did people have allergic reactions to pig insulin?

34. Are the nucleotide differences between pig and human insulin significant?

35. Has this exercise in looking at what is available online about insulin been helpful?

36. What more would you like to know about online resources for insulin?

37. What could be changed to improve this lesson?