BSc Bioinformatics


Spring Term 2004


Exercise: Eukaryotic Gene Prediction



This exercise is expected to take you no more than about 1 to 1½ hours, during the
practical session on 11 February. It is not assessed in any way.


You will try out some of the public web servers available for predicting the location
and structure of eukaryotic genes, using the DNA sequence of the contig containing most
of the human PAX6 gene (the subject of your assessed exercise).


Firstly, search the EMBL database for the contig concerned, using the SRS server at the
EBI (http://srs.ebi.ac.uk). The easiest way to locate the entry is to type the accession
number (Z83307) in the “Quick Text Search” box. Display the sequence in FASTA
format using the “Display Options” at the bottom of the web page.


Now investigate some of the following web server based gene prediction programs,
following the instructions in each case¹. You may be required to select the species or type
of organism and/or the type of DNA (genomic), and you should also note whether the
sequence is expected to be in raw or FASTA format. If raw sequence (DNA characters
only) is required, you simply use the FASTA sequence without the top (title) line.
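If you would rather not trim the title line by hand, the conversion from FASTA to raw sequence can be sketched in a few lines of Python. This is only an illustration, assuming a single-record FASTA string; the function name `fasta_to_raw` and the example record are made up for the purpose and are not part of any server's interface.

```python
def fasta_to_raw(fasta_text: str) -> str:
    """Return only the sequence characters of a single-record FASTA string."""
    lines = fasta_text.strip().splitlines()
    # Drop the title line, which in FASTA format starts with ">".
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    # Join the remaining lines, removing any whitespace inside them.
    return "".join("".join(line.split()) for line in lines)


# Hypothetical record, used only to show the behaviour.
example = ">HYPOTHETICAL_ID example title line\nACGTACGT\nTTGACA\n"
print(fasta_to_raw(example))  # prints ACGTACGTTTGACA
```

The result can be pasted directly into a server that expects raw sequence.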


Genscan     http://genes.mit.edu/GENSCAN.html
HMMgene     http://www.cbs.dtu.dk/services/HMMgene/
GrailEXP    http://grail.lsd.ornl.gov/grailexp/
Genie       http://www.fruitfly.org/seq_tools/genie.html


In each case, examine the output you get. What exactly is predicted: exons? Complete
genes? Promoters, polyA tails and other upstream/downstream signals? How many exons
are predicted in each case? How easy is the output to understand?


Finally, go to the Ensembl database and find the entry for PAX6 on chromosome 11, as
you did in the first exercise. Look at Ensembl’s latest prediction for the structure of the
PAX6 gene, and compare its gene structure with that predicted by each of the individual
programs you used, remembering that the contig sequence you used does not contain the
complete gene. Assuming that Ensembl’s prediction is accurate (which will not always be
the case: why?), which of these programs did best, and which worst? Is this what you
would expect?
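One way to make "best" and "worst" concrete is to count how many exons each program predicted with exactly the same start and end as the reference. The sketch below is a minimal illustration, not a prescribed method: the function name and all coordinates are invented, and it assumes you have already written down each exon as a (start, end) pair in the same coordinate system (for example, positions within the contig), including only those reference exons that fall inside the contig.

```python
def exon_accuracy(predicted, reference):
    """Exon-level sensitivity and specificity using exact coordinate matches."""
    pred, ref = set(predicted), set(reference)
    correct = pred & ref  # exons whose start and end both match exactly
    # Sensitivity: fraction of reference exons that were predicted exactly.
    sensitivity = len(correct) / len(ref) if ref else 0.0
    # Specificity, in the gene-prediction sense: fraction of predicted
    # exons that are correct (this is what is elsewhere called precision).
    specificity = len(correct) / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only; these are not PAX6 exons.
reference = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 450), (620, 700)]

sn, sp = exon_accuracy(predicted, reference)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
# prints sensitivity=0.50 specificity=0.67
```

Exact matching is deliberately strict: an exon with one boundary slightly wrong, such as (620, 700) above, counts as a miss, so you may also want to note by eye which predictions overlap a real exon.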




¹ If you think you will run out of time, miss out one or two of the prediction programs rather than skipping
the comparison with Ensembl.