Instructions

Oct 2, 2013

Genomics (BIO 294) Laboratory, Week 3, Spring 2011



GOAL: To get an introduction to the Ensembl database


URL: http://uswest.ensembl.org/index.html


PREPARATION: Revisit the Entrez Gene website before the lab, and look over your documentation
of last week's exploration, to remind yourself how that database is structured and what
information you found there.


ASSIGNMENT: At the end of class, email me the documentation of your pair’s exploration of the
Ensembl database.


Ensembl is the central genomic database maintained by the European Bioinformatics Institute
(EBI) and the Wellcome Trust Sanger Institute. The two main entry points are (1) searching for
specific genes, which takes you to the Ensembl equivalents of the Entrez Gene page, and
(2) browsing genomes, which takes you to the Ensembl equivalents of the Entrez Map Viewer.


In this exploration, we will access Ensembl by searching for specific genes. Just like last
week, you will work with your lab partner to familiarize yourself with the information about
genes available in Ensembl. And just like Entrez Gene, the Ensembl pages for a gene can be
overwhelming at first, given the amount and diversity of information spread across a
hierarchically organized series of pages. While you and your partner introduce yourselves to
this database, one of you should document the steps by which you navigated through the Ensembl
database and what information was displayed on each page. After we have all explored different
genes, we will get together as a group and exchange notes on what we have found.


You may be interested in watching a ten-minute video introduction to Ensembl, which you can
find here:
http://www.ensembl.org/info/website/tutorials/index.html


FINDING YOUR GENE


The simplest way to find your gene is to select the same organism’s genome that you studied last
week, and then enter the gene’s name in the search box (for example, the name for the human
phosphofructokinase gene that is active in liver cells is PFKL). Clicking Go will take you to the
Results Summary, and then clicking on the appropriate Gene result will take you to the Result in
Detail page, where clicking on the Ensembl protein_coding Gene link will take you to the page
for that gene.
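
The same search can also be sketched outside the browser. This is not part of the lab
procedure: it is a minimal sketch that assumes Ensembl's public REST server at
rest.ensembl.org and its lookup-by-symbol endpoint, neither of which is mentioned in this
handout. The helper names build_lookup_url and fetch_gene are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Ensembl's public REST server (not named in the handout).
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(species: str, symbol: str) -> str:
    """Build the lookup URL for one gene symbol in one species (hypothetical helper)."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    # Ask the server for JSON by passing the content type as a query parameter.
    query = urllib.parse.urlencode({"content-type": "application/json"})
    return REST_SERVER + path + "?" + query


def fetch_gene(species: str, symbol: str) -> dict:
    """Download the gene record and parse it into a dict (needs network access)."""
    with urllib.request.urlopen(build_lookup_url(species, symbol)) as response:
        return json.load(response)
```

Calling fetch_gene("homo_sapiens", "PFKL") with a working network connection should return a
record for the gene used as the example above; the stable gene ID in that record can be
compared with the one shown on the Ensembl gene page you reach by clicking through.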




EXPLORING THE ENSEMBL INFORMATION ON YOUR GENE


Spend 90 minutes or more, working with your partner, to explore the Ensembl database listing for
your gene. You may find it helpful to have one partner handle the mouse and the other type up
the documentation in a Word document on an adjacent computer.


While the Ensembl database listing for a gene has some of the same information that is found in
Entrez Gene, it is organized completely differently, and has different strengths and weaknesses.


1. Note the tabs at the top of the page. When you first get to the page for your gene, there
   should be three tabs. The leftmost tab is for the human genome taken as a whole. The
   next tab is a map of part of the chromosome on which the gene is found. Depending on
   how much time we have, you may explore that later. The third tab is for the gene itself.
   Depending on what links you click on, a tab for an individual transcript will appear later.


2. Within each tab, there is a hierarchical table of contents on the left. You can navigate
   through the pages for a gene, transcript, or protein by clicking on items within this table
   of contents, or by clicking on the forward (>>) and back (<<) buttons within each page (see
   below). The pages within this table of contents differ from gene to gene, depending on
   how much information is available for that gene.


3. In comparison to Entrez Gene, the Ensembl database has (a) more pages with less
   information on each page, (b) more graphical representations, and (c) fewer links to other
   databases. The graphical nature of the pages can make it fun to navigate, but it may take
   time to make sense of all of the graphics. Clicking on many of the items within any given
   graphic will bring up an infobox with information on that item and/or links to further
   information.


4. At the top of the Gene page is a table containing lists of transcripts and proteins.
   Clicking on any of these will bring up the corresponding Transcript tab (the protein
   information is partway through the Transcript tab table of contents). But before you go
   there, explore the Gene pages.


5. If you scroll down below that table, you will find the Gene Summary. Here and
   elsewhere, you can click on the he!p button for more information. The he!p button is
   your friend. Try clicking on it whenever you find yourself perplexed by what you’re
   looking at.


6. Just as in the case of the Entrez Gene database, the Ensembl database has a bewildering
   array of information, some of which you may be able to make some sense of (don’t forget
   the he!p button), and some of which will just perplex you. Don’t give up right away when
   something is unclear, but also don’t spend forever wrestling with a single page. I may be
   able to help you if you’re interested in something but can’t make sense of it, but frankly I
   haven’t yet explored all the dark corners of the Ensembl database.

7. To the right of the he!p button is the button to navigate forward to the next page in the
   hierarchical table of contents (in this case, it is Splice variants).


8. You can customize the information on any page by clicking on the Configure this page
   button on the left, just below the table of contents, and clicking on any of the items to
   remove them from or add them to the display. In some cases (such as External Data and
   Genomic Alignments), you have to first configure the page before it will display anything
   interesting.


9. Clicking on the graphics will often bring up infoboxes with links that can take you to
   more information on that feature. Try doing this a lot.


10. Often, the easiest way to get back to where you were is to use the back arrow in your
    browser.


11. Once you’ve explored the Gene-based displays to your heart’s content, have a look at one
    or more individual transcripts. In the table at the top of each Gene page, transcripts are
    listed by length in both base pairs and amino acids (if applicable), and by biotype. There
    should be at least one protein-coding transcript for your gene. These are the most relevant
    ones.


12. If there is more than one protein-coding transcript, make a note of how they differ. Are
    there skipped or alternative exons? Alternative start or stop codons? The best place to
    get this information is on the Gene Summary or Splice Variants pages in the Gene-based
    displays.


13. You can click through to Transcript-based displays either by clicking on the Transcript
    ID for one of the transcripts in the table, or by clicking on that transcript in the graphic,
    and then clicking on the Transcript number in the infobox that appears. Once you have
    done so, the table of contents to the left of the page will show the available
    Transcript-based display pages. You can navigate through these either by using the table of
    contents or by clicking the forward and back buttons on each of the display pages.


14. You may sometimes get error messages instead of an image when you click on a
    particular page. If so, try clicking to that page a few more times. You might succeed,
    or you may keep getting the same error message. You could give up, or move on to
    other things and try coming back to that page later.


15. Eventually, clicking through the transcript pages will bring you to the Protein Summary
    page, which you could also have gotten to directly by clicking on the Protein ID for a
    protein-coding transcript instead of the Transcript ID. If your gene has a Domains &
    Features page in the Protein Information set of pages, be sure to click through to and
    explore the InterPro page for one or more of your protein’s domains. We’ll be coming
    back to InterPro and analogous protein databases later this semester, to explore them in
    more depth.
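
The transcript comparison described in steps 11 and 12 can also be sketched in code, which may
help when you write up your documentation. This is only an illustration under stated
assumptions: it supposes you have copied rows of the transcript table by hand into a list, and
the IDs and lengths in EXAMPLE_ROWS are invented placeholders, not real Ensembl data. The names
Transcript, protein_coding, and length_differences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str  # placeholder ID, not a real Ensembl identifier
    length_bp: int      # transcript length in base pairs
    length_aa: int      # protein length in amino acids; 0 if there is no protein
    biotype: str        # e.g. "protein_coding"


def protein_coding(transcripts):
    """Keep only protein-coding transcripts, longest (in bp) first."""
    coding = [t for t in transcripts if t.biotype == "protein_coding"]
    return sorted(coding, key=lambda t: t.length_bp, reverse=True)


def length_differences(transcripts):
    """Report how many bp shorter each coding transcript is than the longest one."""
    ranked = protein_coding(transcripts)
    if not ranked:
        return []
    longest = ranked[0].length_bp
    return [(t.transcript_id, longest - t.length_bp) for t in ranked]


# Invented rows standing in for a hand-copied transcript table.
EXAMPLE_ROWS = [
    Transcript("TX_A", 2700, 780, "protein_coding"),
    Transcript("TX_B", 900, 0, "retained_intron"),
    Transcript("TX_C", 2550, 730, "protein_coding"),
]
```

A length difference between two protein-coding transcripts only tells you that they differ, not
how; the Gene Summary and Splice Variants graphics remain the place to see which part of the
gene accounts for the difference.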