DAVID for GO and pathway enrichment analysis
For advanced help, please see the tutorial on the DAVID website.
Huang da W
A. Upload or paste a gene list
To start DAVID, first click on “Functional Annotation” under “Shortcut to David tools” at the
left of the home page.
This will take you directly to the “Upload” tab of the functional annotation tool.
To submit a gene list, you can either paste a list of gene identifiers into the
window or upload a file with the gene identifiers saved as a txt file.
The identifiers can be of a variety of types: official gene names; Affymetrix or
other probeset IDs; Ensembl gene names; etc.
Open the tab under “step 2: Select Identifier” to see the full list.
Note that whatever you use must be an official identifier from one of those lists; if the
names are misspelled or otherwise malformed, DAVID will not recognize them.
You can also click on the link “Upload Help” to get further instructions on composing the
list.
For this exercise, we will use a human gene list with official gene symbols as
the identifier. You can either upload the txt file by browsing or just paste the
gene names into the window.
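If your gene symbols come out of a spreadsheet or script, it can help to tidy them before uploading. The following is a minimal sketch, assuming the upload expects a plain-text file with one identifier per line; the function names, the output file name, and the example symbols are all illustrative and are not the gene list used in this exercise.

```python
def clean_gene_list(raw_symbols):
    """Strip whitespace, drop blank entries, and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for symbol in raw_symbols:
        symbol = symbol.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


def write_gene_list(symbols, path):
    """Write one identifier per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(symbols) + "\n")


if __name__ == "__main__":
    # Illustrative symbols only; replace with your own list.
    example = ["TP53", " BRCA1", "TP53", "", "EGFR "]
    cleaned = clean_gene_list(example)
    write_gene_list(cleaned, "my_gene_list.txt")  # hypothetical file name
    print(f"Wrote {len(cleaned)} unique identifiers")
```

The script does not check spelling, so misspelled names will still go unrecognized after upload.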
Now, since these are gene names, we need to select “official_gene_symbol” from
the pulldown menu under “step 2: select identifier”.
Under “Step 3”, select “Gene list”, then click “Submit List” to submit the list (Step 4).
5. A new window will open saying:
Please note that multiple species have been detected in your gene list. You may
select specific specie(s) with the List Manager on the left side of the page by
highlighting the specific specie(s) and pressing the "Select" button.
(This happens whenever you submit identifiers, like gene names, that could apply to
more than one species.) Click “OK” to proceed.
In the top window at the left side of the screen will be a list of species, each with a
number in parentheses (e.g. 81). That number is how many of your identifiers
correspond to a unique gene in that species; the species with the largest number
always appears first.
Since these are human genes, select “Homo Sapiens” and then click the “Select” button.
Note that you can select another species instead and the results might be
different. We will try that, below.
7. Next, open the “Background” tab to select the background against which
this gene set will be analyzed.
Note that if you used, e.g., Affy U133A arrays, your “background” will be different
than if you searched the whole genome (RNAseq) or used another type of array. That is
because the array only contains a subset of genes, and so you can only detect
genes that are represented on the array. This will affect your statistics, so it is
generally important to select the appropriate background.
For this dataset, select the Human Genome U133A 2 array from the Affymetrix
“3’IVT Background” list.
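To see why the background matters, here is a minimal sketch of an enrichment p-value computed against two different backgrounds. It uses a plain hypergeometric upper tail, which is an assumption made for illustration: DAVID reports a modified Fisher exact p-value (the EASE score), so these numbers will not match DAVID's output. All counts are invented.

```python
from math import comb


def hypergeom_upper_tail(pop_total, pop_hits, list_total, list_hits):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from a background of pop_total genes, pop_hits of which carry the term."""
    denom = comb(pop_total, list_total)
    max_hits = min(pop_hits, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denom


# All counts below are invented for illustration.
LIST_TOTAL = 81  # genes in the submitted list
LIST_HITS = 10   # genes in the list annotated with some term

# Hypothetical whole-genome background: 200 of 20000 genes carry the term.
p_genome = hypergeom_upper_tail(20000, 200, LIST_TOTAL, LIST_HITS)
# Hypothetical array background: 150 of 12000 genes carry the term.
p_array = hypergeom_upper_tail(12000, 150, LIST_TOTAL, LIST_HITS)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"array-only background:   p = {p_array:.3g}")
```

The gene list and the number of hits are the same in both calls; only the background changes, and the p-value changes with it.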
Note that you can also enter your own “background” e.g. in case you have a
specialized array, or a species with only some orthologs mapped to established
B. Analyze the dataset
Now that you have the gene list uploaded and the background selected, you can begin the
functional analysis. In the center of the page, you will see at the top
a series of links with a + at the left-hand side. If you click on the +, it will open that
category, show you what the default selections are, and offer you more choices.
For example, click on “Pathways”; there are several pathway databases that will be
incorporated into the functional search.
For this exercise, select “Reactome_Pathway” in addition to the defaults.
A series of buttons appears under “Combined view for Selected annotation”.
For this exercise, click the top button, “Functional annotation clustering”.
A new page will appear with the results. In this version of DAVID, the GO and
other terms are clustered together based on functional relatedness, to give you an
enrichment score for the set of functional groups, rather than for the individual terms.
This clustering algorithm is a major benefit
of using DAVID.
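DAVID's documentation describes the enrichment score of a cluster as the geometric mean, on a -log scale, of the p-values of the terms in that cluster. A minimal sketch of that arithmetic, assuming base-10 logs and using invented p-values (the function name is hypothetical):

```python
from math import log10


def cluster_enrichment_score(p_values):
    """Mean of -log10(p) over the member terms of one annotation cluster."""
    if not p_values:
        raise ValueError("a cluster needs at least one p-value")
    return sum(-log10(p) for p in p_values) / len(p_values)


# Invented p-values for the terms of one hypothetical cluster.
member_p_values = [0.01, 0.0001]
score = cluster_enrichment_score(member_p_values)
print(f"enrichment score: {score:.2f}")  # mean of 2 and 4, i.e. close to 3
```

On this scale a higher score means smaller member p-values, and a score of about 1.3 corresponds to a p-value of 0.05.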
Note that you can also obtain a table of individual GO category enrichments by
clicking on “Functional annotation chart” instead. These two formats can lead you
to different conclusions, so it can be interesting to view both.
At the top right-hand side, you will see a link “Download File”. This will allow you to
download a version of the chart with the clusters, the component categories, the
enrichment factor and p-value for each, and a list of the genes that fall into each
category. The enrichment factor score is important (higher is better), but any category
with a small number of genes is always a bit suspect.
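One simple way to act on that caution is to set aside categories with very few genes before interpreting the rest. This is a minimal sketch, assuming you have already read the downloaded results into a list of dictionaries; the key names, the threshold of 3 genes, and the example rows are all hypothetical and not taken from DAVID's file format.

```python
def split_by_gene_count(rows, min_genes=3):
    """Separate categories into (kept, suspect) by how many genes they contain."""
    kept, suspect = [], []
    for row in rows:
        if row["count"] >= min_genes:
            kept.append(row)
        else:
            suspect.append(row)
    return kept, suspect


# Invented rows; "term", "count" and "p_value" are placeholder key names.
example_rows = [
    {"term": "term_A", "count": 12, "p_value": 0.0004},
    {"term": "term_B", "count": 2, "p_value": 0.0001},
    {"term": "term_C", "count": 5, "p_value": 0.01},
]

kept, suspect = split_by_gene_count(example_rows)
print("kept:   ", [row["term"] for row in kept])
print("suspect:", [row["term"] for row in suspect])
```

Here term_B has the smallest p-value but only two genes, so it is flagged for a closer look rather than being read at face value.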
C. Analyze using data from another species
Now, go back to your “List” main page and select “Mus Musculus” instead of
human as the species. You can see that 70 of your 81 gene names are mapped to this
species.
You also need to change the background in the top “Select a Background” window.
Highlight Mus Musculus on the list, then click “Use” below that top window.
Then click “Functional Annotation Clustering” in the center of the page
(under “Combined view for Selected Annotation”).
What are the similarities and differences in the annotation?
Please note that some species are better annotated than others for certain
functions, and it is often useful to examine other species to get a more detailed view.
For example, mouse is much better annotated for developmental functions than
human. Every time you switch species, you will lose some genes, so this only works
well for large gene lists.
From these functional categories, what would be your guess as to the
function of the genes in this dataset?