BSc Bioinformatics


Oct 1, 2013


Spring Term 2004

Exercise: Eukaryotic Gene Prediction

This exercise should take you no more than about 1 to 1½ hours, during the practical
session on 11 February. It is not assessed in any way.

You will try out some of the public web servers available for predicting the location
and structure of eukaryotic genes, using the DNA sequence of the contig containing most
of the human PAX6 gene (the subject of your assessed exercise).

Firstly, search the EMBL database for the contig concerned, using the SRS server at the
(…). The easiest way to locate the entry is to type the accession number (Z83307) into
the “Quick Text Search” box. Display the sequence in FASTA format using the
“Display Options” at the bottom of the web page.

Now investigate some of the following web-server-based gene prediction programs,
following the instructions in each case. You may be required to select the species or
type of organism and/or the type of DNA (genomic), and you should also note whether the
sequence is expected to be in raw or FASTA format. If raw sequence (DNA characters
only) is required, simply use the FASTA sequence without the top (title) line.
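
If you prefer to prepare the raw sequence with a script rather than by hand, the
conversion described above can be sketched as follows. This is only an illustration,
assuming the file holds a single FASTA record; the function name fasta_to_raw and the
example record are hypothetical, not part of any of the servers' tools.

```python
def fasta_to_raw(fasta_text: str) -> str:
    """Return the bare DNA characters from a single-record FASTA string.

    Drops the top '>' title line (if present) and joins the remaining
    sequence lines, removing line breaks and surrounding whitespace.
    """
    lines = fasta_text.strip().splitlines()
    # Only the first line is the title; everything after it is sequence.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    return "".join(line.strip() for line in lines)


if __name__ == "__main__":
    # Hypothetical record, not the real Z83307 entry.
    example = ">hypothetical_record example title\nACGTNNACGT\nGGCCTTAA\n"
    print(fasta_to_raw(example))  # ACGTNNACGTGGCCTTAA
```

Pasting the printed string into a server's input box is then equivalent to deleting the
title line by hand.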

In each case, examine the output you get. What exactly is predicted: exons? Complete
genes? Promoters, polyA signals and other upstream/downstream elements? How many exons
are predicted in each case? How easy is the output to understand?
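
Some programs can report their predictions as GFF, a tab-separated format with nine
columns in which the third column names the feature type. Assuming you have such output
saved as text, a minimal sketch for counting the predicted features (and hence the
number of exons) might look like this. The function count_features and the example
rows are hypothetical, and feature names differ between programs, so check which label
each one uses for exons.

```python
from collections import Counter


def count_features(gff_text: str) -> Counter:
    """Count feature types (column 3) in GFF-style tab-separated text."""
    counts: Counter = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a nine-column GFF feature line
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical predictor output, built with explicit tab separators.
    rows = [
        ["Z83307", "predictorX", "promoter", "50", "90", ".", "+", ".", "gene 1"],
        ["Z83307", "predictorX", "exon", "100", "250", ".", "+", "0", "gene 1"],
        ["Z83307", "predictorX", "exon", "400", "520", ".", "+", "2", "gene 1"],
        ["Z83307", "predictorX", "exon", "700", "900", ".", "+", "1", "gene 1"],
    ]
    gff = "##gff-version 2\n" + "\n".join("\t".join(r) for r in rows)
    counts = count_features(gff)
    print(counts["exon"], counts["promoter"])  # 3 1
```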

Finally, go to the Ensembl database and find the entry for PAX6 on chromosome 11, as
you did in the first exercise. Look at Ensembl’s latest prediction for the structure of
the PAX6 gene, and compare its gene structure with that predicted by each of the
individual programs you used, remembering that the contig sequence you used does not
contain the complete gene. Assuming that Ensembl’s prediction is accurate (which will
not always be the case), which of these programs did best, and which worst? Is this what
you would expect?
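
One common way to make "best" and "worst" concrete is exon-level accuracy: sensitivity
is the fraction of reference (here, Ensembl) exons that a program predicted exactly, and
specificity, in the sense used in gene-prediction evaluation, is the fraction of the
program's predicted exons that exactly match a reference exon. The sketch below is
illustrative only. It assumes you have already converted Ensembl's exon coordinates onto
the contig (same coordinates and strand as the predictions), and the function name
exon_accuracy, the region parameter and all numbers are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# An exon as (start, end), assumed 1-based inclusive coordinates on the
# same sequence and strand for both predicted and reference sets.
Exon = Tuple[int, int]


def exon_accuracy(
    predicted: Iterable[Exon],
    reference: Iterable[Exon],
    region: Optional[Tuple[int, int]] = None,
) -> Tuple[float, float]:
    """Return exon-level (sensitivity, specificity).

    A predicted exon counts as correct only if both boundaries match a
    reference exon exactly.
    """
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(reference)
    if region is not None:
        lo, hi = region
        # Score only reference exons lying wholly within the analysed
        # sequence, since the contig does not contain the whole gene.
        ref = {(s, e) for s, e in ref if lo <= s and e <= hi}
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0  # real exons found
    specificity = correct / len(pred) if pred else 0.0  # predictions that are real
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [(100, 250), (400, 520), (700, 900)]  # hypothetical values
    predicted = [(100, 250), (400, 530), (1000, 1100)]
    sens, spec = exon_accuracy(predicted, reference, region=(1, 600))
    print(round(sens, 2), round(spec, 2))  # 0.5 0.33
```

Exact-match scoring is deliberately strict: an exon with one boundary a few bases out,
like (400, 530) above, scores as wrong, so you may also want to note near misses by eye.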


If you think you will run out of time, miss out one or two of the prediction programs
rather than skipping the comparison with Ensembl.