Tucson High School
Today we will be learning about annotating a genome. This process involves
finding genes in a genome sequence and determining the function of those genes
using data currently available in public sequence databases.
We will be finding and annotating genes in our genome
phage that infects
in the oceans.
Part 1. Gene Finding
We will be looking for genes in our genome sequence using a program called
Prodigal. Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a
microbial (bacterial and archaeal) gene finding program developed at Oak Ridge
National Laboratory and the University of Tennessee.
Go to the following website:
Upload the genome sequence on your desktop using the browse button, using
the following genome sequence (genome_ass
Select “Gene Coordinates with Protein Translations” under the output
Run Prodigal by pressing “Begin prodigal analysis”.
Keep the web page open to use in the next section.
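Each protein in the Prodigal output starts with a header line such as "Prodigal Gene 1 # 1 # 561 # 1". The short Python sketch below shows one way to read that header. It assumes the numbers after the gene name are the start position, the end position, and the strand (1 for forward, -1 for reverse), which is the usual Prodigal layout. The function name parse_gene_header is our own and is not part of Prodigal.

```python
def parse_gene_header(header):
    """Split a header such as 'Prodigal Gene 1 # 1 # 561 # 1' into its fields.

    Assumed field order (not confirmed by the worksheet): name, start, end, strand.
    """
    # Drop a leading '>' if the header was copied from a FASTA file.
    fields = [part.strip() for part in header.lstrip(">").split("#")]
    name = fields[0]
    start, end, strand = int(fields[1]), int(fields[2]), int(fields[3])
    length_nt = end - start + 1  # both coordinates are counted, so add 1
    return {
        "name": name,
        "start": start,
        "end": end,
        "strand": "forward" if strand == 1 else "reverse",
        "length_nt": length_nt,
        "codons": length_nt // 3,  # three nucleotides per codon
    }


gene = parse_gene_header("Prodigal Gene 1 # 1 # 561 # 1")
print(gene)
```

For the example header, the gene spans 561 nucleotides, which is 187 codons. If the coordinates include the stop codon, the protein itself would be one amino acid shorter than the codon count.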
Part 2. Gene Annotation
Using the protein sequences from the step above, we will try to find the
function for several genes by comparing them to databases of known genes.
Open another web browser or tab and go to the following website:
Click on protein blast under the “Basic BLAST” header
Copy and paste any of the protein sequences you generated above using the
prodigal gene finder into the box that says “enter accession, gi, or FASTA
sequence”. For example:
Prodigal Gene 1 # 1 # 561 # 1
Leave all other options as default and click on “BLAST”.
Look at the top matches for your protein sequence and answer the following
questions to yourself. Does the sequence match a phage? How good are the
matches? How much of the sequence matches, and how many mismatches do you
see? Do you believe your protein is exactly the same as, similar to, or not at
all the same thing as what it hit?
Repeat steps 1
for 10 random genes and fill out the table below for the best hit,
or a very close hit to a phage.
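When judging how good a hit is, it can help to see how two common numbers are worked out. The sketch below is a simplified illustration, not BLAST's own code. It assumes percent identity is the number of identical positions divided by the alignment length, and that coverage is the fraction of the query protein included in the alignment. The sequences and the query length of 20 are made up for the example.

```python
def identity_and_coverage(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query coverage) for one alignment.

    aligned_query and aligned_subject are the two rows of an alignment,
    with '-' marking gaps. query_length is the full length of the query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")

    alignment_length = len(aligned_query)
    # Count columns where both rows carry the same amino acid.
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Count how many query residues take part in the alignment.
    query_residues = sum(1 for q in aligned_query if q != "-")

    identity = 100.0 * matches / alignment_length
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


# Made-up example: 10 aligned positions with 1 mismatch, from a 20-residue query.
identity, coverage = identity_and_coverage("MKTAYIAKQR", "MKTSYIAKQR", 20)
print(f"identity = {identity:.1f}%, coverage = {coverage:.1f}%")
```

In this made-up case the hit is very similar over the aligned region but covers only half of the query, so both numbers are needed to judge the match.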
What is coverage a measure of? What is identity a measure of?
Were you able to find a hit to most, some, or only a few of the genes you
compared against GenBank (a public sequence repository)? Why do you
think this is?
For the genes that you got the best hits to of the ten above (considering
coverage and identity), are the genes coding for something that is common in
phages or specific to our phage? Why do you think this is?
Suppose the best hit for a phage gene was to a bacterium rather than a phage.
What are some possible explanations for this?