Genome Annotation


1 October 2013



Bioinformatics Lab 2

Tucson High School


Introduction


Today we will learn how to annotate a genome. This process involves finding the genes in a genome sequence and then predicting the function of those genes based on data already available in public sequence databases such as GenBank.

We will be finding and annotating genes in the genome of P-HM1, a T4-like phage that infects Prochlorococcus in the oceans.


Directions


Part 1. Gene Finding


We will be looking for genes in our genome sequence using a program called Prodigal. Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene-finding program developed at Oak Ridge National Laboratory and the University of Tennessee.
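Prodigal itself chooses among candidate genes using dynamic programming and coding statistics that it trains on the input genome, which is far more sophisticated than anything shown here. As a rough intuition for what a gene finder looks for, the Python sketch below scans all six reading frames of a DNA sequence for open reading frames (ORFs): stretches that begin with an ATG start codon and run to the next in-frame stop codon. The function names and the 300-nucleotide default minimum length are illustrative assumptions, not Prodigal settings.

```python
# Minimal six-frame ORF scan. This is NOT Prodigal's algorithm; it only
# illustrates the idea of locating possible protein-coding regions in DNA.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 300):
    """Return sorted (start, end, strand) tuples for ATG...stop ORFs.

    Coordinates are 1-based and inclusive on the forward strand, and the
    end position includes the stop codon. min_len (in nucleotides) is a
    hypothetical cutoff chosen for illustration.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through complete codons in this reading frame.
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # exclusive end, includes the stop codon
                    if end - start >= min_len:
                        if strand == 1:
                            orfs.append((start + 1, end, 1))
                        else:
                            # Map reverse-strand indices back to forward coordinates.
                            orfs.append((n - end + 1, n - start, -1))
                    start = None
    return sorted(orfs)
```

Real gene finders also handle alternative start codons such as GTG and TTG, overlapping genes, and genes that run off the end of a sequence, none of which this sketch attempts.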


1. Go to the following website: http://compbio.ornl.gov/prodigal/server.html

2. Using the browse button, upload the genome sequence file from your desktop (genome_assembly/genome/P-HM1-genome.fa).

3. Under the output options, select “Gene Coordinates with Protein Translations”.

4. Run Prodigal by clicking “Begin prodigal analysis”.

5. Keep the web page open to use in the next section.
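If you save the protein translations from the results page to a text file, a short script can pull out each gene's coordinates. The sketch below assumes headers shaped like the example shown in Part 2 (>Prodigal Gene 1 # 1 # 561 # 1), where the fields after the gene name are start, end, and strand (1 for forward, -1 for reverse); the names PredictedGene, parse_prodigal_proteins, and length_is_consistent are hypothetical. It also checks that each protein's length matches its coordinates: a complete gene of N nucleotides, including its stop codon, encodes N/3 - 1 amino acids.

```python
from dataclasses import dataclass


@dataclass
class PredictedGene:
    name: str
    start: int    # 1-based first nucleotide
    end: int      # 1-based last nucleotide (includes the stop codon)
    strand: int   # 1 = forward, -1 = reverse
    protein: str


def parse_prodigal_proteins(text: str) -> list[PredictedGene]:
    """Parse FASTA text whose headers look like '>Name # start # end # strand'."""
    records = []  # each entry: [header, list of sequence lines]
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], []])
        elif line and records:
            records[-1][1].append(line)

    genes = []
    for header, seq_lines in records:
        # Fields are separated by '#'; any extra trailing fields are ignored.
        fields = [f.strip() for f in header.split("#")]
        name, start, end, strand = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
        # Drop a trailing '*' stop symbol in case the output includes one.
        protein = "".join(seq_lines).rstrip("*")
        genes.append(PredictedGene(name, start, end, strand, protein))
    return genes


def length_is_consistent(gene: PredictedGene) -> bool:
    """For a complete gene, protein length = nucleotides / 3 - 1 (the stop codon)."""
    nucleotides = gene.end - gene.start + 1
    return nucleotides % 3 == 0 and len(gene.protein) == nucleotides // 3 - 1


example = """>Prodigal Gene 1 # 1 # 561 # 1
MYLSLKLHFTTDTFDYFKYGNAAKASQQSFDSRRDKFFFVKLSRTFKEDELREFFVANMI
VEDKVYPATLVREGAKNYQEYLKRKQSLTYRFKEDVITLHEVSQKFDKLFIIDGMHPPLL
KAHLGGRISIETLAIFHKIFNYVENFDKIIKEEIVWRPIRNRILKYEPFIFIDKGKYKNI
IKQQYV"""

for gene in parse_prodigal_proteins(example):
    print(gene.name, gene.start, gene.end, gene.strand, len(gene.protein),
          length_is_consistent(gene))
```

For the example gene, 561 nucleotides is 187 codons, so the 186-residue protein is exactly what the coordinates predict once the stop codon is excluded.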




Part 2. Gene Annotation


Using the protein sequences from Part 1, we will try to find the function of several genes by comparing them to databases of known genes.


1. Open another web browser window or tab and go to the following website: http://blast.ncbi.nlm.nih.gov/

2. Click on “protein blast” under the “Basic BLAST” header.

3. Copy one of the protein sequences that Prodigal generated above and paste it into the box that says “enter accession, gi, or FASTA sequence”. For example:

>Prodigal Gene 1 # 1 # 561 # 1
MYLSLKLHFTTDTFDYFKYGNAAKASQQSFDSRRDKFFFVKLSRTFKEDELREFFVANMI
VEDKVYPATLVREGAKNYQEYLKRKQSLTYRFKEDVITLHEVSQKFDKLFIIDGMHPPLL
KAHLGGRISIETLAIFHKIFNYVENFDKIIKEEIVWRPIRNRILKYEPFIFIDKGKYKNI
IKQQYV

4. Leave all other options at their defaults and click “BLAST”.


5. Look at the top matches for your protein sequence and answer the following questions for yourself. Does the sequence match a phage? How good are the matches? How much of your sequence is included in the alignment (coverage), and what fraction of the aligned positions are identical rather than mismatched (identity)? Do you think your protein is the same as, similar to, or unrelated to the protein it matched?


6. Repeat steps 1-5 for 10 randomly chosen genes. For each gene, fill in a row of the table below using the best hit, or a hit to a phage if one matches nearly as well. The first row shows an example.
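To make the two quantities in step 5 concrete, the Python sketch below computes them from a single pairwise alignment written as two gapped strings ("-" marks a gap), similar to how BLAST displays an alignment. It assumes simplified definitions: query coverage is the fraction of the full query length that falls inside the alignment, and identity is the fraction of alignment columns where both residues are the same. BLAST's reported values can differ somewhat, for example when several separate local alignments cover different parts of the query. The function name alignment_stats and its arguments are hypothetical.

```python
def alignment_stats(query_aln: str, subject_aln: str, query_length: int):
    """Return (query_coverage, identity) as fractions between 0 and 1.

    query_aln, subject_aln: the aligned region only, equal length, '-' = gap.
    query_length: length of the full query protein, not just the aligned part.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")

    # Query residues that take part in the alignment (gap columns excluded).
    aligned_query_residues = sum(1 for q in query_aln if q != "-")
    coverage = aligned_query_residues / query_length

    # Columns where both sequences have the same residue; gaps never count.
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    identity = identical / len(query_aln)
    return coverage, identity


cov, ident = alignment_stats("MKV-LLA", "MRVQLLA", query_length=10)
print(f"coverage {cov:.0%}, identity {ident:.0%}")
```

Because the two numbers measure different things, they vary independently: a 200-residue query with 150 of its residues inside the alignment has 75% coverage, whatever its identity happens to be.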


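gene ID | hit description                     | coverage | identity | e-value | phage? | function?
------- | ----------------------------------- | -------- | -------- | ------- | ------ | ---------
Gene 1  | gp59 [Prochlorococcus phage P-SSM4] | 97%      | 36%      | 4e-29   | yes    | helicase
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |
        |                                     |          |          |         |        |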

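The selection rule in step 6 (take the best hit, but use a phage hit if one is nearly as good) can be made explicit in a few lines. The Python sketch below assumes each BLAST hit has been copied by hand into a small record with its description, coverage, identity, and e-value; it ranks hits by lowest e-value (most significant first) and returns the first phage hit among the top few, falling back to the overall best hit. The Hit record, the "phage" text check, and the max_rank cutoff of 5 are hypothetical choices for illustration, not part of BLAST.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    description: str
    coverage: float   # fraction of the query covered, 0-1
    identity: float   # fraction of identical aligned positions, 0-1
    evalue: float

    @property
    def is_phage(self) -> bool:
        # Crude, hypothetical check based only on the description text.
        return "phage" in self.description.lower()


def pick_table_hit(hits: List[Hit], max_rank: int = 5) -> Optional[Hit]:
    """Return the best phage hit among the top `max_rank` hits, else the best hit."""
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h.evalue)  # lowest e-value first
    for hit in ranked[:max_rank]:
        if hit.is_phage:
            return hit
    return ranked[0]


# The example row from the table, entered as a single hit.
gene1_hits = [Hit("gp59 [Prochlorococcus phage P-SSM4]", 0.97, 0.36, 4e-29)]
best = pick_table_hit(gene1_hits)
print(best.description, f"{best.coverage:.0%}", f"{best.identity:.0%}", best.evalue)
```

Ranking by e-value rather than identity alone is one reasonable choice, since the e-value already accounts for alignment length and database size; you could equally sort by coverage and identity if that matches how your class defines "best".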
Questions


1. What is coverage a measure of?




2. What is identity a measure of?





3. Were you able to find a hit for most, some, or only a few of the genes you compared against GenBank (a public sequence repository)? Why do you think this is?







4. Of the ten genes above, take the one with the best hit (based on coverage and identity). Does it code for something that is common in phages, or something specific to our phage? Why do you think this is?






5. Suppose the best hit for a phage gene was to a bacterium rather than a phage. What are some possible explanations for this?