Using DAVID for GO and pathway enrichment analysis


Oct 28, 2013 (3 years and 5 months ago)



For advanced help, please see the tutorial on the website
(http://david.abcc.ncifcrf.gov/home.jsp) and Huang da W, Sherman BT, Lempicki RA,
Nat Protoc. 2009;4:44-57.


A. Upload or paste a gene list


To start DAVID, first click on “Functional Annotation” under “Shortcut to DAVID Tools” at the
left of the home page. This will take you directly to the “Upload” tab of the functional
annotation page.


To submit a gene list, you can either paste the gene identifiers into the window or
upload a file of gene identifiers saved as a txt file. The identifiers can be a variety of
things:


Official gene symbols; probeset IDs from Affymetrix and other platforms; Ensembl gene
identifiers; and so on. Open the menu under “Step 2: Select Identifier” to see the full list.


* Note that whatever you use must be official identifiers from one of those lists. If the
names are misspelled or otherwise altered, DAVID will not recognize them.


You can also click on the link “Upload Help” to get further instruction on composing the
gene list.
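
Because DAVID only matches identifiers exactly, it can help to tidy a list before
pasting it. The following is a minimal sketch in Python, assuming a plain-text list with
one identifier per line. The function name clean_gene_list and the example symbols are
illustrative only. The sketch cannot correct misspelled names; it only removes stray
whitespace, blank lines, and exact duplicates.

```python
def clean_gene_list(raw_text):
    """Return unique identifiers in their original order, one per input line.

    Hypothetical helper: strips surrounding whitespace, skips blank lines,
    and drops exact duplicates. It does not check spelling or validity.
    """
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        gene = line.strip()
        if gene and gene not in seen:
            seen.add(gene)
            cleaned.append(gene)
    return cleaned


# Example input with stray spaces, a blank line, and a duplicate entry.
raw = "TP53\n  BRCA1 \n\nTP53\nEGFR\n"
print("\n".join(clean_gene_list(raw)))
```

The printed lines can then be pasted into the upload window or saved as a txt file.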


1. For this exercise, we will use a gene list from human, with official gene symbols as
the identifier (http://DAVID-GENELIST-1.html). You can either upload the txt file
by browsing or just paste the gene names into the window.


2. Now, since these are gene names, we need to select “official_gene_symbol” from
the pulldown menu under “Step 2: Select Identifier”.


3. Under “Step 3”, select “Gene list”.



4. Then, click “Submit” to submit the list (Step 4).


5. A new window will open saying:

“Please note that multiple species have been detected in your gene list. You may
select a specific specie(s) with the List Manager on the left side of the page by
highlighting the specific specie(s) and pressing the "Select" button….”


(This happens whenever you submit identifiers, like gene names, that could apply to
many species).


Click “OK” to proceed.


6. In the top window at the left side of the screen will be a list of species, each with a
number in parentheses (e.g. 81). That number signifies how many of your identifiers
correspond to a unique gene in that species; the species with the largest numbers
always appear first.


Since these are human genes,
select “Homo Sapiens” and then the button “Select
species”.

Note that you can select another species instead and the results might be
different. We will try that, below.


7. Next, open the “Background” tab to select the background against which
this gene set will be analyzed. Note that if you used, e.g., Affy U133A arrays, your
“background” will be different from the one you would use if you had searched the whole
genome (RNAseq) or used another type of array. That is because the array only contains
probes for a subset of genes, and so you can only measure genes that are represented on
the array. This will affect your statistics, so it is generally important to select the
appropriate background.


For this dataset, select the Human Genome U133A 2 array from the
“Affymetrix 3’IVT Background” list.


Note that you can also enter your own “background” e.g. in case you have a
specialized array, or a species with only some orthologs mapped to established
model organisms.
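
To see why the choice of background in step 7 changes the statistics, consider a plain
hypergeometric test. This is a simplification: DAVID reports its own modified Fisher
exact p-value (the EASE score), so the numbers here will not match DAVID's output. All
counts below are made up for illustration, and the function name hypergeom_pvalue is
hypothetical.

```python
from math import comb


def hypergeom_pvalue(list_hits, list_size, bg_hits, bg_size):
    """P(X >= list_hits) when list_size genes are drawn from a background
    of bg_size genes, bg_hits of which carry the annotation."""
    max_hits = min(list_size, bg_hits)
    total = comb(bg_size, list_size)
    tail = sum(
        comb(bg_hits, k) * comb(bg_size - bg_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 10 of 81 list genes fall in one category.
# Array background (assumed): 400 of 13,000 measurable genes carry the annotation.
p_array = hypergeom_pvalue(10, 81, 400, 13000)
# Whole-genome background (assumed): 450 of 20,000 genes carry the annotation.
p_genome = hypergeom_pvalue(10, 81, 450, 20000)

print(f"array background:  p = {p_array:.2e}")
print(f"genome background: p = {p_genome:.2e}")
```

With these made-up counts, the same 10 hits look more surprising against the larger
background, because the category is rarer there. An inappropriate background can
therefore overstate or understate enrichment.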


B. Analyze the dataset

Now that you have the gene list uploaded and the background selected, you can begin the
functional analysis. In the center of the page, you will see at the top:


1. A series of links with a + at the left-hand side. If you click on the +, it will open that
section, show you what the default selections are, and offer you more choices.
For example, click on “Pathways”; there are several pathway databases that will be
incorporated into the functional search. Select “Reactome_Pathway” in addition
to the defaults for this exercise.


2. A series of buttons under “Combined view for Selected annotation”. For this exercise,
select the top button, “Functional annotation clustering”.



A new page will appear with a hot-linked table. In this version of DAVID, the GO and
other terms are clustered together based on functional relatedness, to give you an
overall enrichment for the set of functional groups, rather than the individual terms.
This clustering algorithm is a major benefit of using DAVID.


Note that you can also obtain a table of individual GO category enrichments by
clicking on “Functional annotation chart” instead. These two formats can lead you
to different conclusions, so it can be interesting to view both.


3. At the top right-hand side, you will see a link “Download File”. This will allow you to
download a version of the chart with the clusters, the component categories, the
enrichment factor and p-value for each, and a list of the genes that fall into each
category. The enrichment factor is important (higher is better), but any category
with a small number of genes is always a bit suspect.
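
The clustering in step 2 groups terms by functional relatedness. One way to quantify how
related two terms are is the agreement between their gene memberships, for example
Cohen's kappa. DAVID's documentation describes a kappa-based similarity, but the sketch
below only illustrates the statistic itself and is not DAVID's implementation. The term
names, gene names, and the function name kappa are hypothetical.

```python
def kappa(genes_a, genes_b, all_genes):
    """Cohen's kappa for the agreement of two terms over a gene list.

    Each gene is treated as either annotated or not annotated by each term.
    """
    n = len(all_genes)
    both = sum(1 for g in all_genes if g in genes_a and g in genes_b)
    neither = sum(1 for g in all_genes if g not in genes_a and g not in genes_b)
    in_a = sum(1 for g in all_genes if g in genes_a)
    in_b = sum(1 for g in all_genes if g in genes_b)

    observed = (both + neither) / n
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)
    if expected == 1:
        return 1.0  # both terms cover every gene or no gene; agreement is trivial
    return (observed - expected) / (1 - expected)


# Hypothetical gene list of ten genes and three annotation terms.
all_genes = [f"g{i}" for i in range(1, 11)]
term_x = {"g1", "g2", "g3", "g4"}
term_y = {"g1", "g2", "g3", "g5"}  # shares three genes with term_x
term_z = {"g8", "g9"}              # shares no genes with term_x

print(f"x vs y: {kappa(term_x, term_y, all_genes):.2f}")
print(f"x vs z: {kappa(term_x, term_z, all_genes):.2f}")
```

Terms that annotate largely the same genes score close to 1 and would be candidates for
the same cluster; terms with no shared genes score at or below 0.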


C. Analyze using data from another species

1. Now, go back to your “List” main page and select “Mus Musculus” instead of
human as the species. You can see that 70 of your 81 gene names are mapped to this
species.


You also need to change the background in the top “Select a Background” list in the
Background tab. Highlight Mus Musculus in the list, then click the “Use” button
below that top window.


2. Next, select “Functional Annotation Clustering” in the center of the page
(“Combined view for Selected Annotation”).


What are the similarities and differences in the annotation?


Please note that some species may be better annotated than others for certain
functions, and it is often useful to examine other species to get a more detailed view.
For example, mouse is much better annotated for developmental functions than
human. Every time you switch species, you will lose some genes, so this only works
well for large gene lists.
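
The gene loss from switching species is easy to quantify. The sketch below uses the
counts quoted in this exercise (81 gene names submitted, 70 mapped in mouse). The human
count of 81 is an assumption based on the example number in step 6 of section A, and the
function name mapped_fraction is hypothetical.

```python
def mapped_fraction(mapped, submitted):
    """Fraction of submitted identifiers that mapped to a species."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return mapped / submitted


submitted = 81  # gene names in the exercise list
# The mouse count is quoted in the text; the human count is an assumption.
mapped_counts = {"Homo sapiens": 81, "Mus musculus": 70}

# List species from most to fewest mapped identifiers.
ranked = sorted(mapped_counts.items(), key=lambda kv: kv[1], reverse=True)
for species, count in ranked:
    lost = submitted - count
    print(f"{species}: {count}/{submitted} mapped "
          f"({mapped_fraction(count, submitted):.1%}), {lost} lost")
```

Here mouse retains about 86% of the list (70 of 81), so 11 genes drop out of the
analysis. With a short list, a loss of that size can remove most of the genes behind a
category.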


D. From these functional categories, what would be your guess as to the overall
function of the genes in this dataset?