Accessing and Analyzing Nucleic Acid

Oct 1, 2013

First Homework Assignment: Bioinformatics

Accessing and Analyzing Nucleic Acid Sequence Data from NCBI's Database

Examples: chicken ovalbumin mRNA and gene

Sandra B. Sharp

Inspired by Dr. Rick Hershberger’s The Bioactive Site

Introduction

Much of the world's data on the sequences of DNA and protein molecules are available to the global
scientific community via several databases available through the Internet. During this activity you will access the
National Center for Biotechnology Information’s (NCBI) genetic sequence database to obtain and study DNA
sequence entries relating to the chicken ovalbumin mRNA and genomic sequences. Follow the directions
very carefully. You will need access to a printer.



Learning Objective: Concept Knowledge (Review)


*You will review some basic concepts of molecular biology.

Learning Objectives: Technical Skills

*You will use browser software to view files from the Internet.

*You will use data entry fields on web page forms to input query data for online database searches.

Learning Objectives: Research Skills

*You will find, view and print copies of sequence data files relating to a specific mRNA and gene.

*You will locate key molecular features of DNA and RNA sequences, such as promoters, exons, introns,
start codons, stop codons, untranslated regions, etc.

*You will link from sequence data records to related records in genetic, bibliographic, and taxonomic
databases.

*You will determine the kinds of information relating to a sequence available within a sequence
database record.

*You will use BLAST software to search a database of sequences to find those that are similar to your
query sequence.


GUIDED ACTIVITY: STUDYING THE CHICKEN OVALBUMIN mRNA AND GENE

Follow the instructions in the guided activity below. Answer the numbered items either on these pages or on a
separate sheet of paper. There are 32 numbered short-answer questions. Use your cell biology or
biochemistry text and online resources if you need to. With patience and thought you will be able to answer the
questions. Remember to make a copy of all your work so that you have the original to turn in and the
copy to use during class. Answering on a separate sheet of paper means you have less to copy.



Database Records

Sequences of Ovalbumin mRNA

You will first find the full-length sequence of the cDNA for chicken ovalbumin, the major protein of egg white.
A cDNA sequence is identical to the mRNA sequence, except that T appears in place of U, as would be
expected in DNA. If you do not yet know what cDNA is, never fear, you’ll learn all about it during Week 2 of this
course.

To get started, log on to the National Center for Biotechnology Information’s site at
http://www.ncbi.nlm.nih.gov/. (One way to do this is to go to the Cal State LA homepage, type the address for
NCBI in the location textbox, and then hit Enter.) Take a few minutes to look at what is available from the NCBI
homepage. You may wish to come back later to explore other links. For this lesson, you will be using four of
the seven sites listed in the dark blue strip above the search text box. From the NCBI homepage, move to
All Databases by clicking on the correct link in the dark blue strip. Once you have reached the Entrez homepage,
take a moment to scan the list of available databases.

We will search for the chicken ovalbumin complete mRNA sequence in the Nucleotide database. First use the link
to the “Nucleotide” database homepage from the list to start a search for the nucleotide sequence for
chicken ovalbumin mRNA/cDNA. Type "ovalbumin" in the query box (the horizontal rectangle), and then click Go.
NCBI will return a list (or the first page of a long list) of database records containing the word or words
you entered.


1. How many records has this first search found?

Now you will narrow the search from all entries that either contain or refer to ovalbumin sequence
to only those for the mRNA/cDNA. Under the text box into which you typed the word ovalbumin, click on the
Limits link. Go to the Molecule pull down menu and click on mRNA. Click on Go. Notice that you have reduced
the number of entries to 486, that the Limits link under the text box has a check next to it, and that the
yellow Limits bar says “mRNA.” Notice that the list contains entries from organisms other than chicken
(Gallus gallus).

Click on the Preview/Index link under the query box. Note how on the resulting display you can follow the
history of your search. Let’s try further limiting, this time by organism. Scroll down and click on the pull
down menu that currently says All Fields. Click on Organism to highlight it. Type “Gallus gallus,” the
scientific name for chicken, in the query box. Click the AND button. Notice what now appears in the search
query box at the top of the page.

Click on Go. You should now have ~64 entries. Scroll down until you find the blue link for Accession Number
BM440799. The accession number is a unique identifier. No other sequence entry in the database has the same
accession number. Click on the link for BM440799. Once you have reached the page, look at the nucleotide
sequence. This is the sequence of part of the ovalbumin mRNA as it has been determined from a cDNA. The
sequence is read from 5’ to 3’ (by convention, left to right). Note how many nucleotides are present (there
are 60 in each line).
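
The sequence display described above (numbered lines of 60 nucleotides, written 5’ to 3’, with T in place of U) can also be handled with a short program. The following is a minimal Python sketch, assuming the lines have been copied from a record as plain text. The two sample lines are made-up placeholders, not the actual BM440799 sequence, and the function names are hypothetical.

```python
import re

def sequence_from_record_lines(lines):
    """Join GenBank-style sequence lines into one uppercase string.

    Each line is expected to look like '  1 acgtacgtac gtacgtacgt ...':
    a position number followed by blocks of nucleotides. Everything
    that is not a letter (numbers, spaces) is discarded.
    """
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()

def cdna_to_mrna(cdna):
    """Return the mRNA form of a cDNA sequence by writing U for every T."""
    return cdna.upper().replace("T", "U")

# Placeholder data (NOT the real BM440799 sequence): two full lines of 60.
sample_lines = [
    "        1 " + " ".join(["acgtacgtac"] * 6),
    "       61 " + " ".join(["ttgacctgaa"] * 6),
]

cdna = sequence_from_record_lines(sample_lines)
print(len(cdna))                # number of nucleotides in the two lines
print(cdna_to_mrna(cdna[:10]))  # first ten nucleotides written as RNA
```

Counting this way gives the same total as multiplying the number of full lines by 60 and adding the nucleotides in the last, partial line.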



2. This number of nucleotides cannot represent the full-length ovalbumin mRNA. Given that the
number of amino acids in ovalbumin is 386, explain why the sequence in this database entry
cannot represent the full-length mRNA.

This entry is one of many in the database that contain incomplete sequence for the cDNA. At this point,
let’s use a simple way to limit the entries to those for the complete mRNA/cDNA. Go BACK to the list,
type the word complete in front of the word ovalbumin in the text box, and click on GO.

You should now have only four entries. Click on the entry with accession NM_205152. You now have a display
presenting and giving information about the complete mRNA sequence for chicken ovalbumin. Note that the
number of base pairs (bp) is more than sufficient to code for ovalbumin protein.
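
The narrowing steps above (keyword, molecule type, organism) amount to building one combined query. As a hedged sketch, the same search could be written as a URL for NCBI's E-utilities esearch service. The field tag `[Organism]` and the filter `biomol_mrna[PROP]` are assumptions about how the menu choices map onto Entrez query syntax, and `build_nucleotide_search_url` is a hypothetical helper name. The code only builds and prints the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_nucleotide_search_url(keywords, organism=None, mrna_only=False):
    """Build (but do not send) an esearch URL for the Nucleotide database."""
    terms = [keywords]
    if organism:
        # Restrict the organism field, like choosing Organism in Preview/Index.
        terms.append(f'"{organism}"[Organism]')
    if mrna_only:
        # Assumed equivalent of choosing mRNA in the Molecule menu under Limits.
        terms.append("biomol_mrna[PROP]")
    query = " AND ".join(terms)
    return ESEARCH_URL + "?" + urlencode({"db": "nucleotide", "term": query})

url = build_nucleotide_search_url(
    "complete ovalbumin", organism="Gallus gallus", mrna_only=True
)
print(url)
```

Pasting the printed URL into a browser should return a list of record identifiers rather than the formatted page used in this activity, so the counts you see may differ from the numbers quoted here.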


3. a) How many different journal articles contributed to the compilation and annotation of this
sequence?

b) Over what span of years were they written?

4. At least how many different scientists contributed to this entry?

Click on a PubMed bibliographic (journal article) link.

5. Describe in general what you find at a PubMed site.

Click Back to the mRNA display. Click on the word Links at the top right of the page and select
TAXONOMY from the menu.

6. a) To what site at NCBI have you gone?

Click on Gallus gallus at this site.

b) Describe in general what you find at a “TAXONOMY” site.

Click back to the mRNA display. Scroll to the list of “features”. Scroll down until you reach the information in
the section headed CDS, which may appear in blue.


7. What must the term “/translation” indicate?

8. At what nucleotide number of this mRNA does translation start?

9. What do you think the abbreviation “CDS” must stand for? (Recall the function of mRNA.)

Scroll further down the page to the words “BASE COUNT”.
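
The BASE COUNT line summarizes how many times each nucleotide appears in the sequence. As a rough sketch of how such a tally could be reproduced in Python (the sequence shown is a made-up placeholder, not the ovalbumin mRNA, and `base_count` is a hypothetical helper name):

```python
from collections import Counter

def base_count(sequence):
    """Count how many times each nucleotide letter occurs in a sequence."""
    return Counter(sequence.lower())

# Placeholder sequence for illustration only (not the ovalbumin mRNA).
toy = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
counts = base_count(toy)
for base in "acgt":
    print(base, counts[base])
print("total", sum(counts.values()))
```

The four counts should always add up to the length of the sequence, which is a quick check that nothing was lost when copying.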


10. Why are there no annotations (or notes) about introns in this entry?

11. Which bases (Na-Nb) represent the 5’ untranslated region of the mRNA? (Your answer should be
two numbers separated by a hyphen; for example, 43-62.)

12. Which bases (Nx-Ny) represent the 3’ untranslated region of the mRNA?

13. What would you expect to find on the mRNA following position 1873?

PRINT OUT THIS OVALBUMIN mRNA DISPLAY, INCLUDING THE SEQUENCE, FOR FUTURE USE.



Using BLAST to Find Similar Sequences

You are now going to use NCBI’s BLAST (Basic Local Alignment Search Tool) and the chicken mRNA
sequence to find other similar sequences in the available database. You would expect one of them to be the
sequence for the gene for chicken ovalbumin. Recall that the actual gene is found in the nuclear DNA and is the
template from which mRNA is transcribed.

First, you need to put the mRNA sequence into a form that can be used by the BLAST software. Using
the pulldown menu next to the Display button on the page for chicken ovalbumin mRNA, highlight FASTA.
Note that the ovalbumin mRNA sequence is now displayed in a new format. This format is plain text, and can
be interpreted by the BLAST software. Use the mouse to highlight the sequence. Be sure to highlight only the
sequence itself, and not the information before or after, which can be misinterpreted by the software as
sequence. Once you have finished highlighting, click on Edit at the very top of the page, and then click on
Copy. You now have the sequence on the computer clipboard for future use.
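
A FASTA record is plain text: a description line beginning with ">" followed by lines of sequence. Separating the sequence from the description line, as the instructions above ask you to do by hand, can be sketched in Python as follows. The record below is an invented placeholder and `read_fasta_sequence` is a hypothetical helper name; the sketch assumes a single record.

```python
def read_fasta_sequence(fasta_text):
    """Return only the sequence from single-record FASTA text.

    The description line (starting with '>') is skipped, and line breaks
    and surrounding whitespace are removed from the sequence lines.
    """
    parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the description line
        parts.append(line)
    return "".join(parts).upper()

# Placeholder record: the header text and sequence are invented for illustration.
example = """>example_id hypothetical description line
ACGTACGTACGTACGTACGT
TTGACCTGAA
"""

sequence = read_fasta_sequence(example)
print(sequence)
print(len(sequence))
```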

Next, you need to get to the BLAST software. Scroll to the top of the FASTA mRNA page, and click on
the large NCBI icon at the top left. When you get to the NCBI homepage, link to BLAST from the dark blue
strip. At BLAST, find the link under the heading Nucleotide BLAST that says Nucleotide-nucleotide BLAST
[blastn]. Click on this link. You will find a textbox labeled Search. Into that textbox, paste the mRNA/cDNA
sequence. You are essentially going to undertake the same kind of search of the databases as you would if you
had just sequenced the mRNA for chicken ovalbumin for the first time today. The computer will use a
pattern-matching algorithm to compare your query with all the sequences in the database and give you the best
matches it finds. Click on the BLAST button. You will first get a message that your query has been placed in a
queue (waiting line). Click on the FORMAT button. After a short time, a response will appear, listing the items
in the database which show sequence similarity to your query sequence.
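
BLAST's own algorithm is more elaborate than can be shown here. As a simplified illustration of the pattern-matching idea only, the Python sketch below indexes every short "word" of a database sequence and reports where words from the query occur exactly. The word length of 4, the function names, and both sequences are hypothetical choices made for illustration; this is not the blastn implementation.

```python
from collections import defaultdict

def index_words(sequence, word_size):
    """Map each word of length word_size to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index

def find_word_hits(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    subject_index = index_words(subject, word_size)
    hits = []
    for q in range(len(query) - word_size + 1):
        for s in subject_index.get(query[q:q + word_size], []):
            hits.append((q, s))
    return hits

# Toy sequences invented for illustration.
query = "GATTACA"
subject = "CCGATTACAGG"

hits = find_word_hits(query, subject)
print(hits)
```

In this toy case every hit has the same offset between query and subject positions, which is the kind of signal a search program can extend into a longer alignment.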


14. How many matches or hits did the BLAST search find?

Scroll down to the Color Key for Alignment Scores box. Notice the 6 red lines extending all or nearly all the
way across the scale of your query, which was 1874 nucleotides long. Point the mouse over the top red line,
and read what pops up in the text box above the graphics box. Not surprisingly, the query you made perfectly
matches the database entry with accession #V00383.


15. Why is this a perfect match? (This question tests whether you’ve kept track of what is going on.)

You can move the mouse over some of the other colored lines to see what happens.

Now try something different. Scroll down the page until you find the list of “hits” which matched your query. The
first is the mRNA #V00383.



16. What is the second?

17. What is the fourth?

18. What is the fifth?

We’ll come back to this list in a bit.

Scroll further down until you see the sequence alignments. In this section, all or part of your query sequence is
lined up over each of the sequence hits in the database. Wherever there is a vertical line, the two nucleotides
found at those positions match. Scroll back up to the top of the list, and look at the fourth item. There are two
links, one on either side of the entry.


gi|212504|gb|J00895.1|CHKOVAL    Gallus gallus ovalbumin gene,...    2020

Click on the right hand link 2020.


19. Where has it taken you?
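
The vertical lines in the alignment display described earlier mark positions where the query and the database sequence carry the same nucleotide. A minimal Python sketch of how such a line of marks could be produced is given here; the aligned pair is invented for illustration, "-" stands for a gap, and `match_line` is a hypothetical helper name rather than part of BLAST.

```python
def match_line(query_aligned, subject_aligned):
    """Return a string with '|' where the two aligned characters are identical."""
    if len(query_aligned) != len(subject_aligned):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query_aligned, subject_aligned)
    )

# Toy aligned pair invented for illustration; '-' marks a gap.
query_aligned = "ACGTTGCA-TGA"
subject_aligned = "ACGATGCAATGA"

middle = match_line(query_aligned, subject_aligned)
print("Query  " + query_aligned)
print("       " + middle)
print("Sbjct  " + subject_aligned)
```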

Scroll back up and click on the left hand link. Now look where you have landed. You should be at the NCBI
annotated sequence for the chicken ovalbumin gene.


PRINT THIS RESULT OUT, INCLUDING THE ENTIRE SEQUENCE, FOR FUTURE USE AND STAY ONLINE TO COMPLETE THE
QUESTIONS BELOW.



Database Records: Sequence of the Chicken Ovalbumin Gene

Note the long nucleotide sequence at the end of this document. Of course, the sequence given is that from only
one of the two complementary antiparallel strands in the nuclear DNA of chicken cells.
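
Because the two strands are complementary and antiparallel, the strand that is not shown can be derived from the one that is. A minimal Python sketch, using an invented toy sequence and a hypothetical helper name:

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))

# Toy sequence invented for illustration.
given = "ATGGCCATT"
other = reverse_complement(given)
print("5'-" + given + "-3'")
print("5'-" + other + "-3'")
```

Applying the function twice returns the original strand, which is a convenient sanity check.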


20. What are the accession and numbers for this entry?

21. How many base pairs (bp) are in this entry?

Scroll down under “features” to the first notation for “exon”.

22. At what nucleotide number in the sequence does the first exon start?

23. What is the name given to the region 5’ (to the left) of the first exon in any gene that codes for a
protein?

Scroll down to the CDS feature.

24. Explain why the annotation says to “join”. Tell what is being joined and what is being left out.

25. Based solely on the FEATURES table and the annotation at CDS, without referring to the
nucleotide sequence below, what should be the sequence of nucleotides beginning at position
2996?

Check your answer.

26. Explain why the first exon and the CDS do not start at the same nucleotide.

27. At what nucleotide does the CDS end?

28. Without looking at the sequence, what three nucleotides should be just before position 8259?
(There are three possible correct answers; each answer is three nucleotides long.)

Check your answer.

29. At what nucleotide does the last exon end?

30. Explain why the CDS and the last exon do not end at the same nucleotide.

31. Is nucleotide 6285 present in the mature messenger RNA or only in the nuclear RNA (primary
transcript)? How do you know?
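
Location strings of the form join(a..b,c..d) in a FEATURES table list coordinate ranges within the entry's sequence, counted from 1. As a hedged sketch of how such a string could be read by a program, the Python below parses the ranges, concatenates the corresponding pieces, and tests whether a given position falls inside any range. The coordinates and sequence are invented and are not taken from J00895, the function names are hypothetical, and real location strings can contain elements (such as "<", ">" or complement(...)) that this sketch ignores.

```python
import re

def parse_location(location):
    """Parse 'join(a..b,c..d)' or 'a..b' into a list of (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def joined_sequence(sequence, segments):
    """Concatenate the listed segments (1-based, inclusive coordinates)."""
    return "".join(sequence[start - 1:end] for start, end in segments)

def in_segments(position, segments):
    """Report whether a 1-based position lies inside any listed segment."""
    return any(start <= position <= end for start, end in segments)

# Toy example: coordinates and sequence are invented for illustration.
toy_sequence = "AAATGCCGGTAAGTTTCAGGATTGATT"
toy_location = "join(3..8,20..25)"

segments = parse_location(toy_location)
print(segments)
print(joined_sequence(toy_sequence, segments))
print(in_segments(12, segments))
```

The same three helpers can be pointed at the real FEATURES table once you have worked out your own answers, as a way of checking them.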

Just for fun, go Back to the BLAST results and find the line for the ovalbumin sequence from
Meleagris gallopavo. Use what you have learned about NCBI to find the common name for this organism.

32. What is the common name for Meleagris gallopavo?


Darwin 2000 and Other Biocomputing Tools

If you enjoy using the databases, you may wish to visit the site for Darwin 2000 to become more familiar with
database manipulations. This site was developed by Dr. Rick Hershberger. The database formats have been
updated since he wrote the instructional text, so sometimes you will need to play around to find the current
equivalent to the steps given in the instructions. Have fun! (http://www.rickhershberger.com/darwin2000/).