Vol.24 no.4 2008,pages 537–544

BIOINFORMATICS

ORIGINAL PAPER

doi:10.1093/bioinformatics/btm628

Gene expression

Multiple testing on the directed acyclic graph of gene ontology

Jelle J.Goeman

1,

*

and Ulrich Mansmann

2

1

Department of Medical Statistics,Postzone S5-P,P.O.Box 9600,2300 RC Leiden,The Netherlands and

2

IBE,Medical School,University of Munich and Department of Statistics,University of Munich,Germany

Received on July 26,2007;revised on November 28,2007;accepted on December 16,2007

Advance Access publication January 18,2008

Associate Editor:David Rocke

ABSTRACT

Motivation:Current methods for multiplicity adjustment do not

make use of the graph structure of Gene Ontology (GO) when testing

for association of expression profiles of GO terms with a response

variable.

Results:We propose a multiple testing method,called the focus

level procedure,that preserves the graph structure of Gene Ontology

(GO).The procedure is constructed as a combination of a Closed

Testing procedure with Holm’s method.It requires a user to choose

a ‘focus level’ in the GO graph,which reflects the level of specificity

of terms in which the user is most interested.This choice also

determines the level in the GO graph at which the procedure has

most power.We prove that the procedure strongly controls the

family-wise error rate without any additional assumptions on the joint

distribution of the test statistics used.We also present an algorithm

to calculate multiplicity-adjusted P-values.Because the focus level

procedure preserves the structure of the GO graph,it does not

generally preserve the ordering of the raw P-values in the adjusted

P-values.

Availability:The focus level procedure has been implemented in the

globaltest and GlobalAncova packages,both of which are available

on www.bioconductor.org.

Contact:j.j.goeman@lumc.nl

Supplementary information:Supplementary data are available at

Bioinformatics online.

1 INTRODUCTION

In microarray data analysis,there has been a shift of interest

towards methods that analyze genes in the context of their

annotation.Annotation allows genes with similar function

or location to be grouped and analyzed together based on

information fromsources such as KEGG(Ogata et al.,1999) or

Gene Ontology (Ashburner et al.,2000).

In recent years several statistical methods have been

proposed for analyzing sets of genes with similar annotation.

Most of these methods start from univariate measures of

differential expression (usually P-values) obtained from single

gene analyses,using these to test for over-representation

(‘enrichment’) of differentially expressed genes in the gene sets

of interest (Khatri and Dra˘ghici,2005;Mootha et al.,2003).

Other methods start from the raw data and test for differential

gene expression profiles over the gene set in a multivariate

way without first going through single gene testing (Dinu et al.,

2007;Goeman et al.,2004;Mansmann and Meister,2005;

Tomfohr et al.,2005).

The most popular source of annotation used in this context is

the Gene Ontology (GO) database.GO is popular because it is

not only large and detailed,but also highly structured:the GO

terms are ordered in a directed acyclic graph,in which the set of

genes annotated to a certain term (node) is a subset of those

annotated to its parent nodes.Some authors (Alexa et al.,2006;

Grossmann et al.,2006) have proposed to take into account

the GO graph structure when testing for enrichment.

In this article we propose a way to exploit the GO graph

structure for multiple testing correction.The structure of

the GO graph complicates the application of multiple testing

procedures.On the one hand,overlap between sets of genes and

correlations between genes lead to highly complicated and

unknown dependency structures between test statistics.This

means that methods that make assumptions on the joint distri-

bution of the test statistics,such as Westfall and Young (1993)

and others (Dudoit et al.,2003),should be used with caution,

while methods for which the validity is assured under any

dependency structure,such as the procedures of Bonferroni or

Holm (1979),can be overly conservative.

The structure of the GO graph also presents special opportu-

nities,however,as this structure induces logical relationships

between the null hypotheses,which can be exploited (Shaffer,

1986).However,at this moment no currently available

method makes use of the GO structure for multiple testing.

Meinshausen (2007) and Yekutieli (2007) have suggested useful

methods,but both of these are for trees only and do not work

in a directed acyclic graph.

In this article we present a sequentially rejective multiple

testing method that strongly controls the family-wise error rate

(Dudoit et al.,2003) while explicitly making use of the directed

acyclic graph structure of GO.This focus level procedure

merges Holm’s (1979) procedure with the closed testing

procedure of Marcus et al.(1976).Like the two procedures

it combines,the new procedure makes no assumptions on the

joint distribution of the test statistics.

A prerequisite for the focus level method is that the tested

null hypotheses have a logical structure that mimics the

structure of the GO graph (detailed in Section 2).The primary

application we have in mind is with global testing methods such

*To whom correspondence should be addressed.

The Author 2008.Published by Oxford University Press.All rights reserved.For Permissions,please email:journals.permissions@oxfordjournals.org

537

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

by guest on October 1, 2013http://bioinformatics.oxfordjournals.org/Downloaded from

as the Global Test (Goeman et al.,2004,2005),Global Ancova

(Hummel et al.,2007;Mansmann and Meister,2005),PLAGE

(Tomfohr et al.,2005) and SAM-GS (Dinu et al.,2007).These

methods test a null hypothesis H

A

for a gene set A,of the form

H

A

:no gene in is associated with the response ð1Þ

The procedure we propose is not valid for all gene set testing

methods proposed for the GO graph.In particular it is not

valid for enrichment methods such as described by Khatri and

Dra

˘

ghici (2005) and Mootha et al.(2003),which have a more

complicated logical structure (Goeman and Bu

¨

hlmann,2007).

Because the focus level procedure follows the structure of the

GO graph,the collection of rejected null hypotheses is always

a coherent subgraph of the GO graph,growing from the

root node up to the most specific nodes that can be called

significant.This coherence greatly facilitates interpretation.

However,a consequence of the preservation of the graph

structure is that the ordering of the raw P-values is not

preserved in the multiplicity-adjusted P-values.It may even

occur that some multiplicity-adjusted P-values are smaller than

their corresponding unadjusted P-values.

The new procedure will be illustrated with two applications

using the biological process GO graph.One uses the Global

Test method on a microarray data set of 295 breast cancer

patients from the Netherlands Cancer Institute (Van de Vijver

et al.,2002).A second example,available in the Supplemental

information,uses the Global Ancova method on a data set

of 238 AML patients from the University Medical Center

Grosshadern in Munich (Schnittger et al.,2005).

The focus level procedure has been implemented in the R

packages globaltest and GlobalAncova.Both packages are

available on www.bioconductor.org.

2 TESTING IN THE GO GRAPH

GO (Ashburner et al.,2000) is a controlled vocabulary

that describes gene and gene product attributes.It can be

represented as three disjoint directed acyclic graphs describing

biological process,molecular function and cellular component.

Each has a single root and thousands of dependent nodes.

The three graphs are sometimes joined together by an overall

root node called gene ontology,but in this article we consider

GO as three separate graphs and speak of a GO graph.We do

not make a distinction between the two types of relationships

in the GO graph (‘is a’ and ‘part of’).When talking of the

GO graph we visualize the root node of the graph as the ‘top’

of the graph and the end nodes as the ‘bottom‘.

For the purposes of this article,we will look at a GO graph

as a structured collection of sets of genes.Genes are annotated

to GO terms,so that each node in the gene ontology graph

can be associated with a set of genes.The structure of the

graph dictates that each gene annotated to a certain node is

automatically also annotated to all ancestor nodes,so that,

viewed in terms of sets,every edge in the graph represents

a subset relationship:each child node is a subset of all its parent

nodes,and the root node in each GO graph includes the

superset of all nodes.Note,however,that not all subset

relationships necessarily correspond to an edge in the GO

graph,because genes may have multiple functions.

Every gene set has a corresponding null hypothesis that can

be tested.We assume that these null hypotheses are formulated

in such a way that they reflect the relationships in the GO

graph.In general,we let any subset relationship between gene

sets imply a logical relationship between the corresponding

null hypotheses.Let A,B and C be gene sets,and H

A

,H

B

and

H

C

the corresponding null hypotheses,then we assume that

1.If B4A,then truth of H

A

implies truth of H

B

2.If A¼B[C,then simultaneous truth of H

B

and H

C

implies truth of H

A

Note that the null hypothesis corresponding to the set of

all genes on the chip is false in this setup whenever any null

hypothesis is false.

The assumptions hold for the multivariate gene set testing

methods of Global Test (Goeman et al.,2004,2005),Global

Ancova (Hummel et al.,2007;Mansmann and Meister,2005),

PLAGE (Tomfohr et al.,2005) and SAM-GS (Dinu et al.,2007)

which test hypotheses of the form (1).Methods that test

enrichment of GO nodes (Khatri and Dra

˘

ghici,2005;Mootha

et al.,2003) are typically not consistent with these assumptions.

See Tian et al.(2005) and Goeman and Bu

¨

hlmann (2007) for an

overview of different null hypotheses used in gene set testing.

It should be remarked that the logical structure that holds for

the null hypotheses generally should not be expected to hold

for the raw,non-multiplicity adjusted test results.As the power

of the tests depends on the gene set,it may,for example,

easily occur in practice that H

B

is significant but H

A

is not,

while B4A.

3 BOTTOM-UP AND TOP-DOWN PROCEDURES

The focus level procedure we propose for the GO graph is

a combination of the procedure of Holm (1979) and the closed

testing procedure of Marcus et al.(1976).It is actually not

a single procedure,but a sequence of procedures that depends

on a user-chosen focus level.At the two extreme choices of the

focus level,the procedure is either bottom-up,and purely based

on Holm,or top-down,and purely based on closed testing.

We’ll discuss these two extremes first.Let be the chosen

significance level.

Abottom-up procedure based on Holmwould first look only

at the collection of all end node null hypotheses of the GO

graph.If this collection has m elements,it compares the

smallest P-value to /m.If the P-value is smaller,the corres-

ponding null hypothesis is declared significant,and the second

smallest P-value is compared to /(m1).The procedure con-

tinues until the kth smallest P-value is larger than /(mkþ1).

At this point,all mkþ1 largest P-values are declared

non-significant.At this point,significance can be propagated

to all ancestors in the GO-graph:a higher level GOnode can be

declared significant whenever it has any significant offspring.

It is easy to show that this procedure keeps strong control of

the family-wise error rate at level .By the proof of Holm’s

procedure (Holm,1979) the family-wise error rate is controlled

for the collection of end node null hypotheses.Further,by the

J.J.Goeman and U.Mansmann

538

logical structure of the null hypotheses,the upward propa-

gation of significance through the GO-graph can only lead to

a false rejection if there was already a false rejection in one of

the end nodes.Hence,it can never introduce a first false

rejection and so does not increase the probability of making

a first false rejection.For a formal proof,see Section 5.

An advantage of this bottom-up procedure is that it is quick,

as the number of tests that has to be preformed is smaller than

the number of nodes in the GO graph.A disadvantage is that

it is not very efficient.The GO graph fans out like a tree and

has a great number of end nodes,so that the multiple testing

correction can still be quite severe.The bottom-up procedure,

being focused at the leaves of the graph,may easily find

a strongly significant effect in a few genes in one of the end

nodes,but it may just as easily miss a weaker but more

consistent effect of a large number of genes in a gene set at

a higher level in the graph.

An alternative to the bottom-up procedure is a top-down

procedure based on the closed testing procedure of Marcus,

Peritz and Gabriel.To be able to use this procedure we must

enlarge the graph to be tested so that the collection of gene sets

in the nodes becomes closed to the union operation.That is,for

any gene sets A and B corresponding to nodes in the graph,

a node corresponding to A[B must also be included,and this

process must be continued recursively until all possible

(multiple) unions are included.New edges are added in

the expanded graph for every direct subset relationship.This

closing of the graph can dramatically increase the number of

hypotheses to be tested if the original number of hypotheses

was already large,as the size of the expanded graph increases

exponentially with the size of the original graph.We illustrate

the closing of the GO graph in Figure 1.

In the first step the top down closed testing procedure tests

only the top node,which is the union of all GO gene sets,

at level .If that test is not significant,the procedure stops.

Otherwise,at every next step it tests those gene sets of which all

supersets (i.e.all ancestors in the expanded graph) have been

declared significant,and when there are no such sets,the

procedure stops.At this point all gene sets which have not been

tested yet are declared non-significant.All tests in the closed

testing procedure can be performed at level ,but the

procedure still keeps strong control of the family-wise error

rate at level ,as has been very elegantly proved by Marcus

et al.(1976).

The property that all tests are done at level makes the

closed testing procedure very attractive.It is a very efficient

multiple testing procedure if the test statistics are highly

correlated,which makes it especially useful for the GO graph.

However,a closed testing procedure can easily miss a strong

but isolated effect in one of the end nodes in a data set where

very few genes show an effect:to be called significant,the effect

of such a small end node must remain significant when it is

watered down in all possible unions of the significant set with

all possible other sets.Moreover,unless the initial number of

nodes to be tested is very small,the computational burden

can be crippling due to the huge size of the expanded graph.

For example,in the breast cancer data set used in Section 7 the

biological process GOgraph with 2865 non-empty nodes would

expand to a closed graph with around 8.5 10

270

nodes.

The bottom-up and top-down multiple testing procedures

are mirror images of each other in many ways,and they have

opposite strengths.The bottom-up method is good at finding

local ‘hotspot’ effects,where only a single end node is highly

significant,even when most of the other nodes are non-

significant.Conversely,the top-down method is good at finding

weak global effects,where many genes show a small effect.

In the top-down method significance of GO terms low down

is greatly influenced by the (lack of) significance of large gene

sets near the top of the graph:a small node with a highly

significant raw P-value may fail to be significant in the presence

of a large non-significant node.Conversely,in the bottom-up

method significance of GO terms near the top of the graph is

greatly influenced by the presence of highly significant end

nodes:a large node with a significant rawP-value may fail to be

significant if none of its offspring terms individually reaches

the Bonferroni threshold.

4 THE FOCUS LEVEL METHOD

The focus level method strikes a balance between the two

extremes described above.The basic idea is very simple:instead

of starting at the top or at the bottom of the graph,we start

somewhere the middle,at a pre-chosen level in the GO graph

called the focus level.We then use the bottom-up method to

extend significance to higher levels and the top-down method

to extend it to lower ones.

In the most general form a focus level is simply a collection

of GO terms.However,for an efficient procedure it is

recommended that the focus level truly represents a level in

the graph,meaning that no term in the focus level has either

ancestors or offspring which are also in the focus level.It is also

recommended that all other GO nodes of interest are either

ancestors or offspring (or at least ancestors of offspring) of

focus level nodes,because otherwise the focus level procedure

will not reach all nodes of interest.A third consideration is

computation time,as discussed in Section 6.

The full GOgraph G can nowbe split at the focus level into m

subgraphs F

1

,...,F

m

,each with a single focus level node as

root node and consisting of all its offspring nodes.Note that

the subgraphs may partly overlap.Each of these subgraphs

will be expanded separately in the same way as in the closed

Fig.1.Illustration of the expansion (‘closing’) of a graph for use in a

Closed Testing procedure.Left:original graph.Right:expanded graph.

Multiple testing on the GO graph

539

testing procedure (Section 3).Call these expanded subgraphs

F

1

;...;

F

m

.

We also define an expanded scope graph

G.Its nodes are

given by the union of all nodes in the expanded graphs

G [

S

m

i¼1

F

i

.It gets a newset of edges which correspond to all

subset relationships:r 2

G is a parent of s 2

G if the

corresponding gene sets R of r and S of s fulfil RS.

The focus level multiple testing procedure revolves around

a collection H

G of testable hypotheses.At the start of

the procedure H is the collection of focus level hypotheses.

The focus level procedure also uses a multiple testing correction

factor h,which we call Holm’s factor.It starts as m,the number

of elements in H.

The steps of the focus level procedure are outlined in Table 1.

This procedure keeps the family-wise error rate at level ,

as we shall prove in the Section 5.The algorithm is presented

in its most simple form here.Several adjustments can be made

to increase computation speed;we consider these in Section 6.

It is easy to adapt the algorithmto let it produce multiplicity-

adjusted P-values.A multiplicity adjusted P-value (Dudoit

et al.,2003) for a hypothesis is defined as the smallest -level at

which that hypothesis will be called significant.For the focus

level procedure these can be calculated as follows.Simply start

at ¼0 and run the algorithm.Amend the algorithm so that

every time no (more) rejection can be made,it does not stop,

but increases to the next value that allows a new rejection,

and continues.All rejected hypotheses get an adjusted P-value

equal to the -level at which they were rejected.Stop the

algorithm only when reaches 1,at which point all unrejected

hypotheses get adjusted P-value 1.

The focus level procedure is a true generalization of the

top-down and bottom-up procedures described in Section 3.

If the root node is taken as the focus level,the focus level

procedure is exactly the top-down closed testing procedure

(nothing happens in steps 2 and 4).Similarly,if the collection of

all end nodes is taken as the focus level,the focus level procedure

is exactly the bottom-up procedure (nothing happens in step 3).

The focus level procedure presents a generalized procedure

whose properties are between the bottom-up and top-down

procedures.Just as the top-down procedure has good power for

detecting weak global effects and the bottom-up procedure has

good power for detecting strong local effects,the focus level

procedure has good power for detecting intermediate effects

near the focus level.In the top-down procedure the significance

of end nodes is influenced by significance of nodes near the

root of the graph,and in the bottom-up procedure the signifi-

cance of nodes near the root is influenced by significance of

end nodes.Similarly,in the focus level procedure significance of

nodes far from the focus level is influenced by the significance

of nodes at the focus level.The choice of the focus level should

therefore be considered carefully:the focus level should ideally

be chosen at the level of specificity that is of most interest to the

researcher,taking into account the restrictions of computability

(see Section 6).

The significant subgraph returned by the focus level

procedure is always a coherent graph,because step 2 of the

algorithm guarantees that there is a path from any significant

node through the significant subgraph to the root node.In this

sense the result of the procedure reflects Assumption 1 of

Section 2:that assumption on truth and falsehood of null

hypotheses also holds for rejection and non-rejection.Similarly,

reflecting Assumption 2,a significant node will often have

at least one offspring node that is significant.This does not

always happen,however.In the first place,it may be that the

significance of the higher level node is due to genes which are

annotated to the higher level GO term but not to any of its

offspring terms.Alternatively,it may be that the significant

signal of the higher level node is due to the combination of

weak signals from its offspring nodes,none of which is strong

enough to be detected by itself.In both of these cases the

interpretation of the result should be that there is a clear signal

in the general GO term,which cannot be safely attributed

to any specific offspring term.

It is interesting to note that the focus level algorithm (just as

the bottom-up procedure) can produce multiplicity-adjusted

P-values which are smaller than the corresponding raw

P-values.This can only happen to terms above the focus level

terms,and occurs in step 2,where hypotheses are rejected based

on the P-values of their offspring,without reference to the

P-value of the hypothesis itself.An example of this effect will be

shown in Section 7.If this property of the procedure is felt to

be undesirable for interpretational reasons,the algorithm can

be easily amended to only reject in step 2 if the raw P-value of

the ancestor hypotheses is smaller than .However,this may

result in a significant graph that lacks coherence.

5 PROOF OF FWER CONTROL

As the focus level procedure merges Holm’s procedure with

the closed testing procedure,the proof that the new procedure

controls the Family-wise error rate essentially merges the two

elegant proofs of Marcus et al.(1976) and Holm (1979).

For each subgraph F

i

,i ¼1,...,n,let T

i

be the collection

of nodes in F

i

that correspond to true null hypotheses.As each

node corresponds to a set of genes,we can represent T

i

as

Table 1.The focus level algorithm

Repeat

1.Test phase:Reject all hypotheses in H with raw

P-value /h.

2.Upward phase:For every new hypothesis rejected in

step 1,reject all ancestors in

G.

3.Downward phase:In each of

F

1

;...;

F

m

find

any hypotheses for which all parent hypotheses in

that subgraph have been rejected.Add these hypotheses

to the testable set H.

4.Holm’s phase:Recalculate Holm’s factor h as the

number of subgraphs f

F

1

;...;

F

m

g which have

non-empty intersection with H.

Until no new rejections occur

J.J.Goeman and U.Mansmann

540

fT

i;1

;...;T

i;n

i

g,a collection of n

i

gene sets corresponding to true

null hypotheses in F

i

.Let

T

i

¼

[

n

i

j¼1

T

i;j

:

Let k¼#{i:T

i

6

¼;},the number of subgraphs containing true

null hypotheses,and let i

1

,...,i

k

be the indices i such that

T

i

6¼;.Let H

1

,...,H

k

be the null hypotheses corresponding to

T

i

1

;...;T

i

k

.Note that by assumption 2 in Section 2 H

1

,...,H

k

are themselves true null hypotheses.

We prove that the focus level procedure keeps the family-wise

error at level by showing that

1.The first true null hypothesis that is rejected is one

of H

1

,...,H

k

.This must happen during step 1 of the

algorithm.

2.If all H

1

,...,H

k

have raw P-values greater than /k,

none of them will be rejected.

If 1 and 2 are true,and p

1

,...,p

k

are the P-values of H

1

,...,H

k

,

then

Pðno false rejectionÞ Pðminðp

1

;...;p

k

Þ4=kÞ

so that

FWER Pðat least one of p

1

;...;p

k

=kÞ

P

[

k

i¼1

fp

i

=kg

X

k

i¼1

Pðp

i

=kÞ :

which proves the family-wise error control of the method.

To prove 1,first remark that by Assumption 1 in Section 2

any offspring in

G of a true hypothesis is itself a true hypothesis.

Therefore,step 2 can never lead to a first false rejection:if the

rejected ancestor in step 2 was a true null,its offspring rejected

in step 1 was already a true null itself.

The first false rejection must therefore take place in step 1.

To show that the first false rejection in step 1 is always from

among H

1

,...,H

k

,remark that by the construction of the closed

graph

F

i

the null hypothesis H

i

is contained in the graph

F

i

,and

that by the construction of T

i

every other true null hypothesis in

that graph is offspring of H

i

in that graph.Therefore no true null

other than those in H

1

,...,H

k

can be added to Hin step 3 before

at least one H

1

,...,H

k

has been rejected.Therefore,as long

as no false rejection has been made,Hwill only contain true null

hypotheses from among H

1

,...,H

k

.

To prove 2,remark that if all p

1

,...,p

k

are larger than /k,

none of H

1

,...,H

k

will be rejected before the Holm’s factor h

drops below k.But the Holm’s factor will not drop below

k before all but k1 subtrees

F

i

have been fully rejected,which

cannot happen as long as H

1

,...,H

k

remain unrejected.

6 COMPUTATIONAL ISSUES

In a practical implementation of the focus level algorithm

in a large graph,efforts must be made to reduce the

computational burden.In this section we describe some

amendments to the algorithm that can be used to reduce this

burden,although sometimes at a small loss of power.

The greatest bottleneck of the algorithm is the potentially

enormous size of the expanded scope graph

G,which can

become especially large if some of the focus level nodes

have a large number of offspring nodes.To keep

G small,

it is,therefore,important to keep the expanded subgraphs

F

1

;...;

F

m

small.This can be done for each subgraph by

finding a small number of atom sets that allow construction of

all offspring sets of the focus level node as unions of these

atoms.Using these atoms it is possible to avoid redundant sets

when making the closed graphs as in Figure 1,while still

including all the original gene sets in the graph.For example,

in the data set of Section 7,a closed graph of all 2865 biological

process GO terms would contain 2

2865

nodes,while reducing

the 2865 sets to 900 atoms leads to a closed graph of 2

900

nodes

which still includes all 2865 original sets.

We construct atoms for each subgraph as follows.Start with

the a collection C

ð1Þ

¼ fC

ð1Þ

1

;...;C

ð1Þ

k

g of gene sets.the algorithm

is given in Table 2.Note that the atom construction is not

necessarily unique and that a more sophisticated algorithm

may be able to find a smaller closed graph.

It is important to keep the number of atoms small,because the

number of nodes in the final closed graph

F

i

is 2

#atoms

.We

recommend not choosing any focus level nodes whose subgraphs

contain more than 9–12 atoms (500–4000 subgraph nodes).

Computation speed can also be gained by avoiding the

laborious construction of the whole graph

G,most of which will

never be reached by the algorithm.We can avoid construction

of

G by creating each expanded graph

F

i

only when it is needed,

that is when the ith focus term is declared significant.Avoiding

construction of

G may entail some loss of power in step 2 of the

algorithm,because in that step only known subset relationships

from the original GO graph G and from already expanded

subgraphs can be exploited for rejection.

These measures allow a good reduction of computation time.

Note that due to the stepwise nature of the algorithm,finding

only the smallest few adjusted P-values is much quicker than

finding all adjusted P-values.In the current R implementation

for Global Test,calculating all 2865 adjusted P-values of the

Biological process GO graph for the Netherlands Cancer

Institute data (see Section 7) takes about 42min on a 3.2GHz

PC.Calculating only the 100 smallest multiplicity-adjusted

P-values takes just 5min.

7 APPLICATION:BREAST CANCER

We have applied the focus level procedure to the Netherlands

Cancer Institute Breast Cancer Data (Van deVijver et al.,2002).

This is a data set of 295 breast cancer patients followed up to

18 years.Microarray data (Rosetta technology) are available for

Table 2.Finding atoms

For i in 1,...,k

1.Choose the ith atom A

i

from among the elements of C

(i)

such that A

i

6

¼;and#A

i

is minimal.

2.Let C

ðiþ1Þ

¼ fC

ðiÞ

1

\A

i

;...;C

ðiÞ

k

\A

i

g.

3.Terminate if C

(i þ1)

contains only empty sets.

Multiple testing on the GO graph

541

these patients.We used the same quality preselection of

4919 genes as the original authors.Survival time and censoring

indicators are available for each patient.We are interested in

finding GO terms associated with survival of patients.GO

information was obtained using the AnnBuilder package (Zhang

et al.,2003).Of all GO terms,2865 from the biological process

ontology corresponded to non-empty gene sets in this data set.

The focus level for each of these three graphs was calculated

in an automated way.We first found the ‘feasible terms’ as all

GO terms whose offspring could all be written in terms of

unions of no more than 9 atom sets.Then we defined the focus

level as all feasible terms which had no parents among the

feasible terms.This gave us a focus level of maximumgenerality

given a reasonable computation time.For the biological

process graph the focus level calculated in this way consisted

of 589 terms;262 terms were above the focus level and 2002

below.For calculating raw P-values we used the global test for

survival data of Goeman et al.(2005).

Figure 2 compares the multiplicity-adjusted P-values from

the focus level method to the raw P-values.The figure clearly

shows that the focus level multiplicity adjustment does not

preserve the ordering of the raw P-values.The dominant

diagonal line around y ¼xþlog

10

(589) gives the GO terms at

or close to the focus level,while the spread of points around the

line is of GO terms far from the focus level.The figure also

shows examples of multiplicity-adjusted P-values which are

smaller than the corresponding raw P-values.

Next we compared the number of gene sets found with the

focus level method to Holm’s procedure,which also controls

the family-wise error rate without any assumptions on the joint

distribution of the test statistics.Table 3 gives the number of

significant GO terms for Holm’s (1979) procedure,and for

Holm’s procedure incorporating the logical relationships in

the graph (‘Holm plus’).The latter procedure tests all nodes as

Holm’s procedure,but rejects not only the term itself but also

all its ancestor terms.The table also gives the number of

rejected hypothesis for the focus level procedure for different

choices of the focus level.The focus levels are defined in

relation to the maximum number of atoms below each focus

level term,as explained above.Table 3 shows that the focus

level procedure rejects more gene sets than Holm’s procedure.

In this data set it seems preferable to choose a very general

focus level:the number of gene sets found when choosing a high

focus level is greater than the number found by the extended

Holm procedure,while much fewer gene sets are found when

choosing a low focus level.Table 4 gives the overlap between

the GO terms found by the focus level procedure at different

Table 3.Number of rejected biological process gene ontology terms in

the Netherlands Cancer Institute breast cancer data set for Holm’s

procedure and the focus level procedure at different choices of the focus

level

Procedure Number of

focus level

-level

0.1 0.01 0.001 1e04 1e05 1e06

Holm 2865 439 247 98 38 9 0

Holm plus 2865 475 312 161 86 44 0

Bottom up 840 423 252 135 80 25 0

Focus level 1 960 466 283 142 73 36 0

Focus level 2 918 506 305 162 80 37 0

Focus level 3 851 508 313 164 89 51 26

Focus level 4 770 513 307 166 89 51 26

Focus level 5 711 518 317 169 85 53 32

Focus level 6 668 525 320 172 90 50 26

Focus level 7 641 536 318 165 87 40 26

Focus level 8 607 535 333 191 97 49 26

Focus level 9 589 539 336 197 99 52 26

Focus level 10 572 543 334 196 98 59 25

Focus level 11 559 544 342 193 99 63 26

Focus level 12 534 545 342 191 99 68 26

Table 4.Overlap between the significant GOgraphs found by the focus

level procedure at different heights of the focus level at ¼0.01

(numbers of GO terms)

1 2 3 4 5 6 7 8 9 10 11 12

1 283

2 268 305

3 252 287 313

4 246 281 304 307

5 250 275 298 300 317

6 250 275 297 299 316 320

7 250 275 294 295 311 315 318

8 254 278 300 297 312 316 315 333

9 253 277 300 295 310 313 312 330 336

10 252 276 299 294 309 312 311 329 334 334

11 253 277 299 294 309 312 311 329 334 333 342

12 242 265 287 282 297 300 299 317 322 321 330 342

Focus level height Focus level height.

Fig.2.Base 10 logarithmof raw P-values for the biological process GO

graph versus base 10 logarithm of multiplicity-adjusted P-values from

the focus level procedure.The dotted line is the line of equality.

J.J.Goeman and U.Mansmann

542

choices of the focus level at ¼0.01.The numbers suggest good

agreement between the GO graphs found even at widely

different choices of the focus level.

Visualization of the significant GO graph obtained by the

focus level method can be done using the R package Rgraphviz

(Gentry,2005).Figure 3 shows the significant subgraph of the

biological process graph at ¼10

4

.This graph gives a succinct

picture of the processes most clearly associated with survival of

breast cancer patients.It has prominent clusters of processes

involved in cell cycle and related processes such as DNA

replication.

8 APPLICATION:AML

Asecondapplicationusing Global Ancova (Hummel et al.,2007;

Mansmann and Meister,2005) on an AML data set (Schnittger

et al.,2005) is presented in the Supplemental information.

9 DISCUSSION

Two things are important when adjusting for multiple testing

in the Gene Ontology graph.In the first place,the GO graph is

structured,and it is wasteful not to make use of that structure.

Secondly,due to the unknown and possibly highly irregular

dependency structures between test statistics,it is important not

to make any assumptions on the joint distribution of the test

statistics.In this article we have presented the focus level

method:a multiple testing method that controls the family-wise

error rate on the GO graph that makes use of the graph

structure and makes no further assumptions.Because the

procedure makes use of the logical structure of the GO graph,

the procedure has more power than traditional procedures such

as Holm’s.To our knowledge the focus level method is the first

method that allows the structure of Gene Ontology to be used

when correcting for multiple testing.

The significant gene sets that come out of the focus level

methods can be presented as a subgraph of the GO graph.

Table 5.Legend of Figure 3

1 biological process 53 DNA replication

2 response to stimulus 54 DNA recombination

3 metabolic process 55 protein modification

4 cellular process 56 dephosphorylation

5 biological regulation 57 second-messenger-mediated

6 developmental process signaling

7 response to stress 58 M phase

8 response to endogenous 59 interphase

stimulus 60 mitotic cell cycle

9 primary metabolic process 61 regulation of progression

10 macromolecule metabolic through cell cycle

process 62 microtubule-based process

11 cellular metabolic process 63 chromosome organization and

12 cell communication biogenesis (sensu Eukaryota)

13 cell division 64 chromosome localization

14 cell cycle 65 establishment of organelle

15 chromosome segregation localization

16 regulation of biological process 66 cell death

17 cell organization and 67 DNA-dependent DNA

biogenesis replication

18 localization 68 protein amino acid

19 cellular developmental process dephosphorylation

20 death 69 phosphoinositide-mediated

21 response to DNA damage signaling

stimulus 70 M phase of meiotic cell cycle

22 protein metabolic process 71 M phase of meiotic cell cycle

23 nucleobase,nucleoside,72 interphase of mitotic cell cycle

nucleotide and nucleic acid 73 regulation of progression through

metabolic process mitotic cell cycle

24 biopolymer metabolic process 74 cell cycle checkpoint

25 cellular macromolecule 75 positive regulation of

metabolic process progression through cell cycle

26 phosphorus metabolic process 76 microtubule cytoskeleton

27 signal transduction organization and biogenesis

28 cytokinesis 77 sister chromatid segregation

29 cell cycle process 78 establishment of chromosome

30 regulation of cellular process localization

31 positive regulation of biological 79 programmed cell death

process 80 regulation of a molecular function

32 organelle organization and 81 mitosis

biogenesis 82 meiosis

33 cellular localization 83 G2/M transition of mitotic cell

34 establishment of localization cycle

35 protein localization 84 positive regulation of progression

36 cell differentiation through mitotic cell cycle

37 DNA metabolic process 85 spindle checkpoint

38 cellular protein metabolic 86 spindle organization and

process biogenesis

39 biopolymer modification 87 regulation of programmed cell

40 phosphate metabolic process death

41 intracellular signaling cascade 88 apoptosis

42 meiotic cell cycle 89 regulation of catalytic activity

43 cell cycle phase 90 mitotic sister chromatid

44 regulation of cell cycle segregation

45 positive regulation of cellular 91 regulation of mitosis

process 92 chromosome condensation

46 cytoskeleton organization and 93 regulation of apoptosis

biogenesis 94 regulation of hydrolase activity

47 chromosome organization and 95 mitotic chromosome

biogenesis condensation

48 organelle localization 96 regulation of exit from mitosis

49 establishment of cellular 97 regulation of caspase activity

localization 98 positive regulation of exit from

50 protein complex localization mitosis

51 cell development 99 negative regulation of caspase

52 DNA repair activity

Fig.3.The significant subgraph of the biological process GO graph

according to the focus level procedure at ¼0.0001.See Table 5 for

a legend to this figure.The nodes colored grey are focus level nodes.

Multiple testing on the GO graph

543

This makes it easy to interpret the results of the experiment,

as the significant gene sets are always presented in context.The

focus level method tends to produce coherent graphs because

the algorithm favors finding significant gene sets close to other

significant sets.

The focus level method requires the user to specify a focus

level,which is the initial level of the graph in which the algorithm

searches for significant sets;it only delves deeper into the graph

at places where significance is found.The choice of a focus level

gives the researcher some control over the multiple testing

procedure,specifying at which level of detail in the GO graph

it should have most power.This flexibility of the procedure is an

important asset of the focus level method,as it allows for more

biological input into the statistical analysis.There are some

computational constraints that exclude some of the general

nodes as focus level nodes,but in our experience these con-

straints are flexible enough to fit the needs of most researchers.

Because the focus level method preserves the structure of the

GO graph when correcting for multiple testing,it does not

preserve the ordering of the raw P-values.Therefore,adjusted

P-values from the focus level method should never be

interpreted individually,but only in the context of the

significant GO graph,because the interpretation of an

individual adjusted P-value should depend on the location in

the graph where it occurs.The primary result of the focus level

method is a significant GO subgraph,not a list of adjusted

P-values.If the primary interest is in individual GO nodes

instead of in the significant graph as a whole,other multiple

testing correction methods may be more appropriate.

The focus level procedure controls the family-wise error rate

rather than the false discovery rate (Benjamini and Hochberg,

1995).This is because the concept of the false discovery

rate was designed for an unstructured collection of potential

discoveries,in which discoveries are exchangeable and it makes

sense to talk about a rate of false discoveries.This concept

cannot be immediately transferred to a hierarchical graph-like

structure with logical relationships.Control of the false

discovery rate in the closed graph (Figure 1),for example,

does not imply control of the false discovery rate in the original

graph.The concept of the family-wise error rate,on the other

hand,is easily applicable to graphs and much more consistent

with the idea that the graph that results from the focus level

procedure in fact constitutes a single ‘discovery’.

It is a frequently voiced general criticism of GO that it does

not represent an objective view of biology.The genes annotated

are mostly the result of disease related research and the

construction of GO is strongly biased in this direction.

However,this criticism does not reduce the need for statistical

procedures on acyclic graphs derived from ontologies,because

ontologies will remain an essential tool for organizing complex

knowledge in genomic research.The methods presented in this

article for GO can easily be applied to more reliable ontologies

that will be available in the future.

ACKNOWLEDGEMENTS

The authors thank Prof.Laura van ’t Veer (Netherlands Cancer

Institute) and Prof.Stephan Bohlander (Internal Medicine III,

University Hospital,University of Munich) for kindly

providing their data.The research of J.J.G.was supported by

the Netherlands Bioinformatics Centre,Biorange project 1.3.2.

The research of UM was supported by the German National

Genome Research Network,project 01 GR 0459.

Conflict of Interest:none declared.

REFERENCES

Alexa,A.et al.(2006) Improved scoring of functional groups from gene

expression data by decorrelating GO graph structure.Bioinformatics,22,

1600–1607.

Ashburner,M.et al.(2000) Gene Ontology:tool for the unification of biology.

Nat.Genet.,25,25–29.

Benjamini,Y.and Hochberg,Y.(1995) Controlling the false discovery rate:

a practical and powerful approach to multiple testing.J.Roy.Stat.Soc.

B Met.,57,289–300.

Dinu,I.et al.(2007) Improving gene set analysis of microarray data by SAM-GS.

BMC Bioinformatics,8,242.

Dudoit,S.et al.(2003) Multiple hypothesis testing in microarray experiments.

Stat.Sci.,18,71–103.

Gentry,J.(2005) Rgraphviz:Provides Plotting Capabilities for R Graph Objects,

R package version 1.10.0.

Goeman,J.J.and Bu

¨

hlmann,P.(2007) Analyzing gene expression data in terms

of gene sets:methodological issues.Bioinformatics,23,980–987.

Goeman,J.J.et al.(2004) A global test for groups of genes:testing association

with a clinical outcome.Bioinformatics,20,93–99.

Goeman,J.J.et al.(2005) Testing association of a pathway with survival using

gene expression data.Bioinformatics,21,1950–1957.

Grossmann,S.et al.(2006) An improved statistic for detecting over-representated

Gene Ontology annotations in gene sets.In A.Apostolico et al.(eds.) Lecture

Notes in Computer Science,Springer,New York,vol.3909,pp.85–98.

Holm,S.(1979) A simple sequentially rejective multiple test procedure.

Scandinavian J.Stat.,6,65–70.

Hummel,M.et al.(2007) GlobalANCOVA:exploration and assessment of gene

group effects.Bioinformatics,doi:10.1093/bioinformatics/btm531.

Khatri,P.and Dra

˘

ghici,S.(2005) Ontological analysis of gene expression

data:current tools,limitations,and open problems.Bioinformatics,21,

3587–3595.

Mansmann,U.and Meister,R.(2005) Testing differential gene expression in

functional groups:Goeman’s global test versus an ANCOVA approach.

Method.Inform.in Med.,44 (3),449–453.

Marcus,R.et al.(1976) On closed testing procedures with special reference to

ordered analysis of variance.Biometrika,63,655–660.

Meinshausen,N.(2007) Hierarchical testing of variable importance.Unpublished

preprint available at www.stats.ox.ac.uk/

meinshau.

Mootha,V.K.et al.(2003) PGC-1 alpha-responsive genes involved in oxidative

phosphorylation are coordinately downregulated in human diabetes.

Nat.Genet.,34,267–273.

Ogata,H.et al.(1999) KEGG:Kyoto Encyclopedia of Genes and Genomes.

Nucleic Acids Res.,27,29–34.

Schnittger,S.et al.(2005) Nucleophosmin gene mutations are predictors of

favorable prognosis in acute myelogenous leukemia with a normal karyotype.

Blood,106,3733–3739.

Shaffer,J.P.(1986) Modified sequentially rejective multiple test procedures.

J.Am.Stat.Assoc.,81,826–831.

Tian,L.et al.(2005) Discovering statistically significant pathways in expression

profiling studies.Proc.Natl Acad.of Sci.USA,102,13544–13549.

Tomfohr,J.et al.(2005) Pathway level analysis of gene expression using singular

value decomposition.BMC Bioinformatics,6,225.

Van de Vijver,M.J.et al.(2002) A gene-expression signature as a predictor of

survival in breast cancer.New Engl.J.Med.,347,1999–2009.

Westfall,P.H.and Young,S.S.(1993) Resampling-Based Multiple Testing:

Examples and Methods for p-value Adjustment.Wiley,New York.

Yekutieli,D.(2007) Unpublished preprint available at www.math.tau.ac.il/

yekutiel.

Zhang,J.H.et al.(2003) An extensible application for assembling annotation for

genomic data.Bioinformatics,19,155–156.

J.J.Goeman and U.Mansmann

544

## Comments 0

Log in to post a comment