Vol.22 no.14 2006,pages 1730–1736
doi:10.1093/bioinformatics/btl161
BIOINFORMATICS ORIGINAL PAPER
Gene expression
How accurately can we control the FDR in analyzing
microarray data?
SinHo Jung
1,
and Woncheol Jang
2
1
Department of Biostatistics and Bioinformatics,Duke University,NC 27710,USA and
2
Institute of Statistics and
Decision Sciences,Duke University,NC 27705,USA
Received on December 5,2005;revised on April 11,2006;accepted on April 23,2006
Advance Access publication April 27,2006
Associate Editor:David Rocke
ABSTRACT
Summary:We want to evaluate the performance of two FDRbased
multiple testing procedures by Benjamini and Hochberg (1995,J.R.
Stat.Soc.Ser.B,57,289–300) andStorey(2002,J.R.Stat.Soc.Ser.B,
64,479–498) in analyzing real microarray data.These procedures
commonly require independence or weak dependence of the test
statistics.However,expression levels of different genes from each
array are usually correlated due to coexpressing genes and various
sources of errors from experimentspecific and subjectspecific
conditions that are not adjusted for in data analysis.Because of high
dimensionality of microarray data,it is usually impossible to check
whether the weak dependence condition is met for a given dataset or
not.We propose to generate a large number of test statistics from a
simulation model which has asymptotically (in terms of the number of
arrays) the same correlation structure as the test statistics that will
be calculated from the given data and to investigate how accurately
the FDRbased testing procedures control the FDR on the simulated
data.Our approach is to directly check the performance of these
procedures for a given dataset,rather than to check the weak depen
dency requirement.We illustrate the proposed method with real
microarray datasets,one where the clinical endpoint is disease
group and another where it is survival.
Contact:jung0005@mc.duke.edu
1 INTRODUCTION
Microarrays are high throughput technology measuring the expres
sion levels of a large number of genes simultaneously.Discovering
the genes whose expression levels are associated with a clinical
endpoint,such as disease type or survival,involves a serious mul
tiple testing problem.Suppose that,for j ¼1,...,m,we want to test
the null hypothesis
H
j
:gene j has no association with the clinical outcome
against the alternative hypothesis
HH
j
:gene j has some association with the clinical outcome:
Without an appropriate adjustment for the multiplicity,many
discoveries will be false positives.Two multiple testing type I
error rates have been used to tackle this issue:familywise error
rate (FWER) and false discovery rate (FDR).Usually,expression
levels on different genes are correlated due to coexpressing genes
and shared noises from experimentspeciﬁc and subjectspeciﬁc
conditions that are not adjusted for in analysis.The correlation,
together with the high dimensionality,has a strong inﬂuence on
the testing results controlling these type I errors.FWERis deﬁned as
the probability of at least one false positive.So,a multiple testing
procedure controlling the FWERrequires the null distribution of the
test statistics while maintaining the correlation among test statistics.
The permutation method proposed by Westfall and Young (1993)
usually approximates the null distribution for FWERcontrol well.
Refer to Huang et al.(2005) for the cases where the permutation
method may not be appropriate.
Some investigators consider controlling the probability of one
false rejection out of thousands of genes to be too strict and advocate
an FDRbased multiple testing instead.Suppose that there are
m genes among which the null hypotheses,H
j
,are true for
m
0
genes,called nonprognostic genes,and the alternative hypo
theses,
HH
j
,are true for m
1
(¼ m m
0
) genes,called prognostic
genes.Based on the calculated pvalues,we may reject or accept
the null hypotheses.Let R denote the number of genes for which the
null hypotheses are rejected (discovery),and R
0
denote,among
these R genes,the number of genes that the null hypotheses are
true (false discovery).Then,the FDR is deﬁned as the expected
value of the proportion R
0
/R.
Benjamini and Hochberg (1995) propose a stepup procedure to
control the FDR.They prove that this procedure conservatively
controls the FDR if the test statistics for the m
0
nonprognostic
genes are independent.The conservativeness becomes more serious
as the proportion of prognostic genes m
1
/mincreases.Benjamini and
Yekutieli (2001) loosen the independence assumption among m
0
nonprognostic genes to a weak dependence assumption called
positive regression.
Pointing out the conservativeness of the Benjamini and
Hochberg’s procedure,Storey (2002) proposes a procedure that
is considered to be more accurately controlling the FDR when
m is large as in most microarray data.Under the independence
assumption on m genes,he shows that the FDR is approximated by
FDR ¼
m
0
a
R
‚
and replaces m
0
with an estimated value
^
mm
0
.He later loosens the
independence assumption for a weak dependence assumption
To whom correspondence should be addressed.
1730
The Author 2006.Published by Oxford University Press.All rights reserved.For Permissions,please email:journals.permissions@oxfordjournals.org
among whole m genes (Storey,2003) or among m
0
nonprognostic
genes (Storey and Tibshirani,2001).Jung (2005) derives a sample
size formula for the Storey’s procedure under the weak dependence
assumption among m genes.
For an accurate control of the FDR,we need the joint distribution
of (R
0
,R),which will be decided by the joint distribution of the test
statistics,under the true effect size for each gene and the depen
dency among m test statistics.The aforementioned FDRbased
procedures have been shown to be reliably controlling the FDR
through simulations based on the independence or weak dependence
assumptions.However,because of highdimensionality of micro
array data and complexity of the real correlation structure,it is
almost impossible to check whether the weak dependence condition
holds for a given dataset or not.As an approach to tackling this
difﬁculty,we propose to generate a large number of randomvectors
from the asymptotic distribution of the test statistics,to apply an
FDRbased procedure and to see how accurately the procedure
controls the FDR on the simulated data.
When a new dataset is given,we calculate the effect size
(mean and standard deviation in case of a twosample ttest) of
each gene.For a chosen m
1
value within a range of interest,we
identify the top m
1
genes in terms of effect size.The simulation is
modeled as follows.We specify the effect sizes of these m
1
genes by
the observed ones from the data.For the remaining m
0
¼ m m
1
genes,the effect sizes are set at 0.Lin (2005) proposes a simulation
algorithm to generate the null distribution of correlated test
statistics conditioning on given data.By modifying his algorithm,
we generate a large number of test statistics under the observed
effect sizes and correlation structure.For a nominal FDR level to be
chosen for analysis,we apply an FDRbased multiple testing pro
cedure to each simulation sample and count the numbers of false
and total discoveries.The true FDR is estimated by the empirical
average of their proportions through a large number of simulations.
By comparing the empirical FDR with the nominal one,we can
decide how accurate the data analysis will be.Although the simu
lated test statistics have the same covariances as those observed
from the data,the simulation scheme does not require the prelim
inary estimation of the covariance matrix.Our method is suggested
when a sufﬁcient number of arrays are available.The proposed
method is applied to the real datasets by Golub et al.(1999) and
Beer et al.(2002).
2 FDRBASED MULTIPLE TESTING
PROCEDURES
We assume that,for large number of arrays,the distribution of
the test statistics is approximated by a multivariate normal
distribution.Let p
1
,...,p
m
denote the pvalues for m genes that
are calculated by a resampling method or the theoretical
distribution of the test statistics.In this paper,we obtain them
from the standard normal distribution based on the large sample
approximation.Let p
(1)
p
(m)
denote the ordered pvalues for
m genes,and H
(j)
for j ¼ 1,...,m the corresponding null hypo
theses.Benjamini and Hochberg (1995) propose to reject H
(j)
for all
j J ¼ max{j:p
(j)
jq
/m}.They prove that this procedure
controls the FDR below q
if the m
0
nonprognostic genes are
independent.
Suppose that we reject H
j
if p
j
< a.Storey (2002),under inde
pendence or a weak dependence among m
0
null genes,claims that
R
0
¼
X
m
j¼1
I ðH
j
true‚ H
j
rejectedÞ
¼
X
m
j¼1
Pr ðH
j
trueÞ Pr ðH
j
rejected j H
j
Þ þo
p
ðmÞ‚
which equals m
0
a with the error termignored.Here,m
1
o
p
(m)!0
in probability as m!1.Hence,with R replaced by the observed
value r,we have
FDRðaÞ
a · m
0
r
if r > 0
0 if r ¼ 0
ð1Þ
Now,given a,estimation of FDR(a) requires estimation of m
0
only.
Storey (2002) proposes to estimate m
0
by
^
mm
0
ðlÞ ¼
P
m
j¼1
Iðp
j
> lÞ
1 l
for a chosen constant l away from0,such as 0.5.By combining this
estimator with (1),we obtain
d
FDRFDRðaÞ ¼
a ·
^
mm
0
r
¼
a
P
m
j¼1
Iðp
j
> lÞ
ð1 lÞr
if r > 0
0 if r ¼ 0
For an observed pvalue p
j
,Storey (2003) deﬁnes qvalue,the min
imum FDR level at which we reject H
j
,as q
j
¼ inf
a
p
j
d
FDRFDRðaÞ.
Jung (2005) shows that this formula is simpliﬁed to q
j
¼
d
FDRFDRðp
j
Þ in
a twosample testing case.This procedure is implemented by a
computer package called SAM (Signiﬁcance Analysis of Micro
arrays) (Tusher et al.,2001).
This procedure has been shown to reliably control the FDR under
independence (Storey,2002) or a weak dependence (Storey,2003),
such as block compound symmetry,by simulations.Given a real
dataset,however,it is difﬁcult to check whether the weak depend
ence requirement holds or not.In this paper,we propose a direct
approach for evaluation of the FDRbased multiple testing methods
in real data analysis.This approach uses a simulation algorithm by
Lin (2005) modiﬁed for estimation of the true FDR.
3 A SIMULATIONBASED METHOD
In this section,we propose a simulation method to generate a large
number of test statistics whose correlation structure,given the data,
is asymptotically identical to that of the test statistics to be calcu
lated fromthe given dataset.There are two versions of large sample
approximations in this paper:one with respect to large m and the
other with respect to large n.All asymptotic results here and after
are with respect to large n.
3.1 Twosample ttest case:when the clinical outcome
is dichotomous
Let x
kij
denote the expression level of gene j(¼ 1,...,m) from
subject i(¼ 1,...,n
k
) in disease group k(¼ 1,2).We assume that
{(x
ki1
,...,x
kim
),1 i n
k
} are independent and identically dis
tributed (
IID
) random vectors from a multivariate distribution with
means (m
k1
,...,m
km
),variances ðs
2
1
‚...‚s
2
m
Þ and correlation mat
rix G ¼ (r
jj
0
)
1j,j
0
m
.Let ð
xx
k1
‚...‚
xx
km
Þ and ðs
2
1
‚...‚s
2
m
Þ denote the
sample means and pooled variances,respectively,estimated from
the data.If the sample sizes are large,then the test statistics
Accuracy of FDRcontrol
1731
W
j
¼
xx
1j
xx
2j
s
j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
þn
1
2
p
are approximately normal with mean
d
j
¼
m
1j
m
2j
s
j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
þn
1
2
p
and covariance G.
We want to generate random vectors with asymptotically
the same distribution as (W
1
,...,W
m
) using a simulation
method.Let (e
ki
,1 i n
k
,k ¼ 1,2) be
IID
N(0,1) random
variables,which are independent of the dataset,and
fð
~
xx
ki1
‚...‚
~
xx
kim
Þ‚1 i n
k
‚k ¼ 1‚2g with
~x
x
kij
¼
x
x
kj
þðx
kij
x
x
kj
Þe
ki
denote a new dataset.Also,let
~
xx
kj
¼
1
n
k
X
n
k
i¼1
~
xx
kij
denote the sample means of the new dataset,and
~
WW
j
¼
~
xx
1j
~
xx
2j
s
j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
þn
1
2
p
the resulting test statistics.Then,given the dataset {(x
ki1
,...,x
kim
),
1 i n
k
,k ¼ 1,2},each
~
WW
j
is a weighted average of the inde
pendent normal random variables.It follows that,given the data,
ð
~
WW
1
‚...‚
~
WW
m
Þ is normal with means
^
dd
j
¼
xx
1j
xx
2j
s
j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
þn
1
2
p
for j ¼ 1,...,m and covariance matrix
^
GG which are consistent
estimators of d
j
for j ¼ 1,...,m and G.Hence,the conditional
joint distribution of ð
~
WW
1
‚...‚
~
WW
m
Þ given the data is asymptotically
identical to the unconditional joint distribution of (W
1
,...,W
m
).See
Appendix for a proof with general test statistics.Note that our
simulation method requires calculation of sample means and vari
ances from data,but not the correlation coefﬁcients.
The above procedure is based on an equal variancecovariance
assumption in two groups.However,some genes may possibly have
unequal variances and covariances between the two groups even
under H
j
:m
1j
¼ m
2j
.In this case,the null distribution of the ttest
statistics based on equal variancecovariance assumption may not
be approximated by the standard normal distribution.If the equal
variancecovariance assumption is questionable,we can use
the same simulation method with s
j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
þn
1
2
p
replaced by
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
s
2
1j
þn
1
2
s
2
2j
q
in the denominators of W
j
,
~
WW
j
and
^
dd
j
,where
ðs
2
kj
‚j ¼ 1‚...‚mÞ are sample variances for group k(¼ 1,2).In
this case,the effect sizes will be expressed as
d
j
¼
m
1j
m
2j
ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
n
1
1
s
2
1j
þn
1
2
s
2
2j
q
‚
where s
2
kj
is the population variance for gene j(¼1,...,m) in group
k(¼ 1,2).
3.2 Cases for cox regression and general test statistics
A large family of test statistics for H
j
:
j
¼
j0
versus H
j
:
j
6
¼
j0
can be written as (U
1
,...,U
m
) ¼ {U
1
(
10
),...,U
m
(
m0
)},where
U
j
ð
j
Þ ¼
X
n
i¼1
U
ij
ð
j
Þ
and,for large n,U
ij
¼ U
ij
(
j0
) can be expressed as a function of the
data from subject i only,so that U
1j
,...,U
nj
are independent.Let
m
ij
(
j
) ¼ E(U
ij
) and m
j
ð
j
Þ ¼
P
n
i¼1
m
ij
ð
j
Þ.If E{U
j
()} is a
smooth function and E{U
j
()} ¼ 0 has a unique solution,then
the solution
^
j
to U
j
() ¼0 is consistent to
j
.Further,by the central
limit theorem,(U
1
,...,U
m
) is approximately normal with means
m
j
(
j
) and covariances s
jj
0
that can be consistently estimated by
^m
m
j
¼ m
j
ð
^
j
Þ and
^s
s
jj
0
¼
X
n
i¼1
ðU
ij
^m
m
ij
ÞðU
ij
0
^m
m
ij
0
Þ‚
respectively,where ^m
m
ij
¼ m
ij
ð
^
j
Þ.Let s
jj
¼ s
2
j
and ^s
s
jj
¼ ^s
s
2
j
.
For
IID
N(0,1) randomvariables e
1
,...,e
n
that are independent of
the data,let
~
UU
j
¼ ^m
m
j
þ
X
n
i¼1
e
i
ðU
ij
^m
m
ij
Þ:ð2Þ
Let W
j
¼ ^s
s
1
j
U
j
and
~
WW
j
¼ ^s
s
1
j
~
UU
j
.By Appendix,the conditional
joint distribution of ð
~
WW
1
‚...‚
~
WW
m
Þ given the data is asymptotically
identical to the unconditional joint distribution of (W
1
,...,W
m
).
Note that the standardized effect sizes of test statistics
~
WW
j
are
^
dd
j
¼ ^s
s
1
j
^m
m
j
.
The twosample ttest case discussed in Section 3.1 is a speciﬁc
example of this formulation.As another example,we consider the
cases to discover the genes associated with a survival endpoint.Let,
for subject i(¼ 1,...,n),T
i
denote the time to an event,such as
tumor recurrence or death,and (z
i1
,...,z
im
) denote the gene expres
sion data frommgenes.Survival time may be censored due to loss to
followup or study completion,so that we observe X
i
¼ min(T
i
,C
i
)
together with a censoring indicator D
i
¼ I(T
i
C
i
),where C
i
is the
censoring time that is assumed to be independent of T
i
given the gene
expression data (z
i1
,...,z
im
).A given dataset will be expressed as
{(X
i
,D
i
,z
i1
,...,z
im
),1 i n}.Let Y
i
(t) ¼I(X
i
t) and N
i
(t) ¼D
i
I(X
i
t) be the atrisk and event processes for patient i,respectively,and
let YðtÞ ¼
P
n
i¼1
Y
i
ðtÞ and NðtÞ ¼
P
n
i¼1
N
i
ðtÞ.
Suppose that,for subject i,z
ij
is related to the hazard rate by
l
ij
ðtÞ ¼ l
j0
ðtÞ expð
j
z
ij
Þ‚ ð3Þ
where l
j0
(t) is the unknown baseline hazard speciﬁc to gene j (Cox,
1972).The hypotheses are expressed as H
j
:
j
¼ 0 versus
HH
j
:
j
6
¼ 0.The partial MLE,
^
j
,solves the partial score function
U
j
ðÞ ¼
P
n
i¼1
U
ij
ðÞ ¼ 0,where
U
ij
ðÞ ¼
Z
1
0
z
ij
P
n
i
0
¼1
Y
i
0
ðtÞz
i
0
j
e
z
i
0
j
P
n
i
0
¼1
Y
i
0
ðtÞe
z
i
0
j
dN
i
ðtÞ:
Let
m
ij
¼
Z
1
0
z
ij
P
n
i
0
¼1
Y
i
0
ðtÞz
i
0
j
P
n
i
0
¼1
Y
i
0
ðtÞ
Y
i
ðtÞe
j
z
ij
dL
j0
ðtÞ
S.H.Jung and W.Jang
1732
and
^m
m
ij
¼
Z
1
0
z
ij
P
n
i
0
¼1
Y
i
0
ðtÞz
i
0
j
P
n
i
0
¼1
Y
i
0
ðtÞ
Y
i
ðtÞe
^
j
z
ij
d
^
LL
j0
ðtÞ‚
where dL
j0
(t) ¼ l
j0
(t)dt and
d
^
LL
j0
ðtÞ ¼
X
n
i¼1
Y
i
ðtÞexpð
^
j
z
ij
Þ
1
dNðtÞ‚
refer to Andersen and Gill (1982).Then the partial score test statistic
(U
1
,...,U
m
) ¼ (U
1
(0),...,U
m
(0)) is approximately normal with
mean m
j
¼
P
n
i¼1
m
ij
and variancecovariances s
jj
0
that can be con
sistently estimated by ^m
m
j
¼
P
n
i¼1
^m
m
ij
and
^s
s
jj
0
¼
X
n
i¼1
ðU
ij
^m
m
ij
ÞðU
ij
0
^m
m
ij
0
Þ‚
respectively.Let ^s
s
2
j
¼ ^s
s
jj
.Note that m
j
¼ 0 under H
j
.
The regression model (3) may not hold for some genes.By Lin
and Wei (1989),whether model (3) is valid or not,the score statistic
U
j
is a meaningful measure of association between z
ij
and T
i
,and the
test statistic W
j
¼ ^s
s
1
j
U
j
is asymptotically N(0,1) under H
j
.For
robust testing against potential outliers in gene expression data,
Jung et al.(2005) propose to use the rank of z
ij
among gene j
observations,z
1j
,...,z
nj
,as the covariate of model (3),rather
than the raw expression level.
For
IID
N(0,1) randomvariables e
1
,...,e
n
which are independent
of the data,let
~
UU
ij
¼ ^m
m
ij
þe
i
ðU
ij
^m
m
ij
Þ‚
~
UU
j
¼
X
n
i¼1
~
UU
ij
¼ ^m
m
j
þ
X
n
i¼1
e
i
ðU
ij
^m
m
ij
Þ
and
~
WW
j
¼ ^s
s
1
j
~
UU
j
.Then,the conditional joint distribution of
ð
~
WW
1
‚...‚
~
WW
m
Þ given the data is asymptotically identical to the
unconditional joint distribution of (W
1
,...,W
m
).
4 NUMERICAL STUDIES
In this section,we take real microarray data and demonstrate our
simulationbased method discussed in the previous section to check
how well the FDRcontrol procedures will work under the depend
ency embedded in a given dataset.
An accurate estimation of the FDR,to be discussed below,
requires identiﬁcation of the prognostic genes and their effect
sizes in addition to the correlation structure among the genes.
While the second part in the righthand side of (2),
P
n
i¼1
e
i
ðU
ij
^m
m
ij
Þ,is to approximate the correlation among m
test statistics,the ﬁrst part,^m
m
j
,is a function of their effect sizes.
For an accurate calculation of the FDR,we need to know which
genes are prognostic.Since none of the observed ^m
m
j
would be
exactly 0,we ﬁrst have to specify m
1
and identify m
1
genes with
large effect sizes as follows.Given a dataset,we ﬁrst calculate ^m
m
j
(¼
xx
1j
xx
2j
in twosample ttest case) and s
j
.For a chosen m
1
,we
identify the genes with top m
1
effect sizes in absolute value.The
effect sizes for these m
1
genes will be set at the observed absolute
values d
j
¼ ^s
s
1
j
j ^m
m
j
j,but those for the remaining m
0
¼ m m
1
genes will be set at d
j
¼ 0.By the same arguments as in Appendix,
the change in effect sizes does not change the correlation structure
of the test statistics.In order to simplify the procedure,we take the
absolute values of the effect sizes and use onesided tests.We
observe similar results by using twosided tests and the raw effect
sizes.
Now,we generate a large number of sets (say,B ¼ 4,000) of the
test statistics ð
~
WW
1
‚...‚
~
WW
m
Þ,where
~
WW
j
¼ d
j
þ ^s
s
1
j
X
n
i¼1
e
i
ðU
ij
^m
m
ij
Þ:
Table 1.Simulation results for Golub et al.(1999)
m
1
Exact SAM BH
q
*
^
rr
1
^
rr
0
^
qq
^
rr
1
^
rr
0
^
qq
^
rr
1
^
rr
0
20 0.5
20.0 57.7 0.3804 20.0 530.1 0.3491 20.0 253.7
0.4 20.0 38.4 0.3266 20.0 302.7 0.3042 20.0 162.6
0.3 20.0 24.0 0.2695 20.0 166.8 0.2535 20.0 99.1
0.2 20.0 13.5 0.2089 20.0 83.4 0.1968 20.0 54.7
0.1 19.9 5.5 0.1336 19.9 32.8 0.1256 19.9 22.9
0.05 19.9 2.4 0.0871 19.9 14.9 0.0809 19.9 11.1
0.01 19.4 0.4 0.0345 19.7 3.5 0.0317 19.7 2.7
60
0.5 59.9 120.3 0.4005 59.8 578.1 0.3765 59.8 295.9
0.4 59.8 80.7 0.3433 59.7 339.4 0.3215 59.8 191.3
0.3 59.7 51.5 0.2812 59.6 190.8 0.2629 59.7 119.0
0.2 59.5 59.5 0.2123 59.4 98.4 0.1974 59.5 67.1
0.1 58.9 12.0 0.1317 59.0 40.2 0.1211 59.0 29.1
0.05 58.0 5.3 0.0826 58.4 19.0 0.0755 58.5 14.5
0.01 53.5 0.8 0.0299 56.4 4.6 0.0274 56.4 3.7
100
0.5 99.5 173.5 0.4113 99.2 621.0 0.3860 99.3 327.5
0.4 99.3 117.0 0.3507 99.0 371.8 0.3271 99.1 215.2
0.3 98.9 74.5 0.2845 98.6 211.6 0.2644 98.7 135.3
0.2 98.2 42.2 0.2128 97.9 111.4 0.1963 98.0 77.8
0.1 96.5 17.5 0.1294 96.5 46.6 0.1181 96.6 34.4
0.05 93.7 7.7 0.0800 94.7 22.4 0.0721 94.8 17.3
0.01 82.3 1.2 0.0276 88.8 5.4 0.0248 88.9 4.4
400
0.5 394.4 499.2 0.4428 390.7 913.5 0.3991 391.2 523.4
0.4 391.5 339.9 0.3685 387.3 575.1 0.3302 387.7 360.0
0.3 386.8 217.9 0.2922 382.3 350.7 0.2588 382.6 235.7
0.2 378.5 124.1 0.2104 374.2 195.2 0.1846 374.2 140.2
0.1 359.1 51.3 0.1211 357.7 85.4 0.1053 357.3 64.4
0.05 333.1 22.1 0.0711 337.9 41.8 0.0614 337.1 32.3
0.01 248.4 3.2 0.0222 282.3 9.9 0.0191 280.3 7.8
1000
0.5 977.1 1062.7 0.4650 961.2 1422.7 0.3812 956.3 782.2
0.4 962.7 716.4 0.3788 945.2 918.1 0.3087 938.4 553.1
0.3 940.8 458.9 0.2921 922.0 573.3 0.2366 913.2 368.2
0.2 903.9 260.1 0.2040 885.3 325.2 0.1643 874.3 220.0
0.1 825.2 105.5 0.1122 813.8 141.3 0.0898 799.6 99.7
0.05 730.0 44.6 0.0631 735.1 67.5 0.0505 718.1 48.5
0.01 481.8 6.2 0.0182 543.7 14.7 0.0147 523.5 10.8
2500
0.5 2465.2 2488.6 0.4753 2414.2 2523.4 0.3006 2301.6 1063.0
0.4 2407.6 1635.4 0.3770 2351.4 1744.6 0.2385 2211.9 752.4
0.3 2313.4 1021.7 0.2914 2251.5 1088.5 0.1785 2089.0 498.6
0.2 2157.3 565.0 0.1954 2088.7 597.8 0.1202 1907.7 290.7
0.1 1846.1 218.4 0.1011 1787.7 241.8 0.0630 1597.6 124.0
0.05 1519.8 87.8 0.0540 1490.0 106.3 0.0342 1307.9 56.4
0.01 848.9 10.8 0.0143 915.7 19.6 0.0095 783.4 11.1
Accuracy of FDRcontrol
1733
Let ð
~
ww
b1
‚...‚
~
ww
bm
Þ denote the set of m test statistics from the bth
(b ¼ 1,...,B) simulation.Also,let
r
0b
ðaÞ ¼
X
m
j¼1
Ið
~
ww
bj
> z
1a
‚d
j
¼ 0Þ‚
r
b
ðaÞ ¼
X
m
j¼1
Ið
~
ww
bj
> z
1a
Þ
denote the numbers of false rejections and total rejections,
respectively,from the bth simulation sample.Here,z
1a
is the
100(1 a)th percentile of N(0,1)distribution.For a large B,
the true FDR is approximated by
^
qqðaÞ ¼ B
1
X
B
b¼1
r
0b
ðaÞ
r
b
ðaÞ
:
If we want to control the FDR at q
level,then we have to ﬁnd the
corresponding a ¼ a
by solving
^
qqðaÞ ¼ q
*
using the bisection
method,and reject all genes with pvalues smaller than a
.We call
this procedure ‘exact method’ since it exactly controls the empirical
FDR from the simulations based on the true parameter values.We
estimate the average true rejections and false rejections by
^
rr
1
¼ B
1
X
B
b¼1
r
1b
ða
*
Þ‚
and
^
rr
0
¼ B
1
X
B
b¼1
r
0b
ða
*
Þ‚
respectively,where r
1b
¼r
b
r
0b
.The calculated
^
rr
0
and
^
rr
1
will be
compared with those by the Storey’s method,called SAM,and the
method by Benjamini and Hochberg (1995),called BH,that are
discussed in Section 2.The exact method uses a common critical
value for m pvalues like SAM,so that it can be used as a gold
standard for SAM.Note that the exact method cannot be used in a
real data analysis because we do not know which genes are pro
gnostic.We can replace z
1a
with a resamplingbased quantile.
However,we use the theoretical standard normal quantile based
on the asymptotic normality.
SAMand BHare applied to each simulated set of test statistics to
control the FDR level at q
,calculate r
0b
and r
b
,and estimate the
FDR level,at which these procedures are really controlling,as the
average of r
0b
/r
b
,
^
qq ¼ B
1
X
B
b¼1
r
0b
r
b
‚
through the B simulations.If these procedures accurately control the
FDR,then
^
qq is expected to be close to q
.
4.1 An example for twosample ttests
An example dataset is taken from the golubTrain object in
golubEsets package (version 1.0.1) in Bioconductor release 1.7.
(Gentleman et al.,2004).Golub et al.(1999) explore m ¼ 6810
genes extracted frombone marrow in 38 patients,of which n
1
¼27
with acute lymphoblastic leukemia (ALL) and n
2
¼ 11 with acute
myeloid leukemia (AML),in order to identify the genes with poten
tial clinical heterogeneity being differentially expressed in the two
subclasses of leukemia.Genes useful to distinguish ALLfromAML
may provide insight into cancer pathogenesis and patient treatment.
We conduct simulation studies for m
1
¼20,60,100,400,1000 and
2500,and q
¼0.5,0.4,0.3,0.2,0.1,0.05 and 0.01.Using l ¼0.5,
0.0 0.2 0.4 0.6 0.8 1.0
01234
(a)
R
0
R
0.0 0.2 0.4 0.6 0.8 1.0
01234
(b)
0.0 0.04 0.16 0.36 0.64 1.0
051015
(c)
0.0 0.04 0.16 0.36 0.64 1.0
051015
(d)
R
0
R
R
0
R R
0
R
Fig.1.Distribution of R/R
o
from 4000 simulations under q
¼ 0.5 and 0.01 and m
1
¼ 20.(a) q
¼ 0.5 (SAM),(b) q
¼ 0.5 (BH),(c) q
¼ 0.01 (SAM),
(d) q
¼0.01 (BH).
S.H.Jung and W.Jang
1734
we obtain
^
mm
0
¼ 4278(
^
mm
1
¼ m
^
mm
0
¼ 2532) from the original
data.
Table 1 reports the estimated FDR,
^
qq,number of true and false
rejections,
^
rr
1
and
^
rr
0
,respectively,for SAMand BH methods.Only
^
rr
1
and
^
rr
0
are reported for the exact method since it controls the FDR
accurately.BH is always more conservative than SAM,i.e.
^
qq
BH
<
^
qq
SAM
.For example,when m
1
¼ 20 and the nominal FDR
is set at q
¼ 5%level,SAMand BH control the FDR at
^
qq ¼ 8:71
and 8.09%,respectively.The ratio
^
qq
SAM
/
^
qq
BH
increases in m
1
as in
independent case (Storey,2002).With m
1
400,SAM (BH) is
conservative if q
0.3(q
0.2) and anticonservative otherwise.
SAMcontrols the FDR accurately when m
1
100 and q
0.3.So
does BH when m
1
400 and q
0.2.However,the bias of SAM
and BH in
^
qq is more serious with a smaller m
1
value.
There exists a similar trend in
^
rr
1
to that in
^
qq,i.e.with m
1
400,
both SAM and BH have smaller
^
rr
1
than the exact method for
q
0.2 and larger
^
rr
1
for q
< 0.2.SAM and BH have almost
the same
^
rr
1
for m
1
400,but,with a larger m
1
,the former tends to
have a larger
^
rr
1
.
SAM always has higher false rejections,
^
rr
0
,than the other two
methods.The discrepancy is more noticeable with smaller m
1
.BH
also has higher false rejections than the exact method when m
1
400.With m
1
1000,however,BH tends to have lower
^
rr
0
than the
exact method except for a small q
such as 1 or 5%.
Figure 1 reports the distribution of R
0
/R observed from the 4000
simulations for q
¼0.5 or 0.01 with m
1
ﬁxed at 20.When q
¼0.5,
R
0
/R is highly distributed around 0 and 1.When q
¼0.01,R
0
/R has
a large density around 0 and is widely distributed in the rest of the
range.Note that the horizontal axes of Figure 1c and d are rescaled
using a square root transformation to show the distribution of R
0
/R
around q
¼ 0.01 better.At each q
level,the distributions of R
0
/R
for SAM and BH look almost the same,so that it is difﬁcult to
observe any difference in the amount of conservativeness or anti
conservativeness between the two procedures.Furthermore,the
distributions are widely spread over the range of [0,1],so that
the ﬁgures do not clearly showthe location shift of the distributions
from the nominal q
.
4.2 An example with survival data
Beer et al.(2002) generated expression proﬁles of m ¼ 4966 genes
to discover the genes that can predict disease progression.The data
include n ¼ 86 stage I or III lung cancer patients,of whom
24 patients have disease progressions.By controlling the FWER
at 10%level,Jung et al.(2005) discovered two genes whose expres
sion levels are signiﬁcantly associated with the time to progression.
In this section,we consider the same test statistics standardized by
the standard errors as described in Section 3.2.Simulations are
conducted at similar settings to those in the previous example.
The simulation results are reported in Table 2.We observe that
both SAMand BHconservatively control the FDRin the simulation
settings.BH is slightly more conservative than SAM,but they
become similar as m
1
decreases.SAMcontrols the FDRvery accur
ately for q
0.1 which is the range of interest in usual data
analyses.SAM becomes more accurate with a larger m
1
.Using
l ¼ 0.5,we obtain
^
mm
0
¼ 4092(
^
mm
1
¼ m
^
mm
0
¼ 874) from the
original data.
In all the simulation settings,the exact method has the largest
^
rr
1
and BH has the smallest
^
rr
1
,although the difference is small.When
controlling the FDR at q
¼ 10% level,the exact method has only
about
^
rr
1
/m
1
¼ 40% of true rejections for m
1
60.The other two
methods have lower true rejection rates.
When m
1
200,the exact method has the smallest
^
rr
0
and SAM
has the largest
^
rr
0
among the three methods.The discrepancy in
^
rr
0
among the three methods becomes smaller as m
1
increases and q
decreases.
5 DISCUSSION
Benjamini and Hochberg (1995) and Benjamini and Yekutieli
(2001) prove that their approach conservatively controls the FDR
under independence and weak dependence of m
0
test statistics for
which null hypotheses are true.However,fromthe ﬁrst example of
Section 4,we found that the BHmethod can be anticonservative in
a general correlation combined with a small number of differentially
expressing genes,m
1
,and a small nominal FDR level,q
.Storey
et al.(2004) also claimthat SAMprocedure conservatively controls
the FDR under weak dependency.From the ﬁrst example,it is
Table 2.Simulation results for Beer et al.(2002)
m
1
Exact SAM BH
q
*
^
rr
1
^
rr
0
^
qq
^
rr
1
^
rr
0
^
qq
^
rr
1
^
rr
0
20 0.5
15.6 29.5 0.3390 13.9 90.7 0.3268 13.7 85.4
0.4 14.6 18.4 0.2785 12.9 53.9 0.2684 12.8 51.0
0.3 13.4 10.7 0.2170 11.8 29.7 0.2098 11.6 28.3
0.2 11.8 5.4 0.1522 10.3 13.5 0.1461 10.2 12.9
0.1 9.5 1.8 0.0793 8.2 4.0 0.0765 8.0 3.8
0.05 7.5 0.7 0.0414 6.4 1.3 0.0399 6.2 1.3
0.01 3.9 0.1 0.0094 3.3 0.1 0.0091 3.3 0.1
60
0.5 45.7 68.0 0.3787 41.5 120.2 0.3700 41.1 113.2
0.4 42.2 42.5 0.3091 38.1 73.5 0.3023 37.7 69.7
0.3 38.2 24.9 0.2378 34.0 41.4 0.2325 33.7 39.6
0.2 33.1 12.5 0.1634 29.0 19.7 0.1597 28.7 18.9
0.1 25.3 4.2 0.0857 21.8 6.2 0.0835 21.6 6.0
0.05 19.1 1.5 0.0432 16.2 2.1 0.0415 15.8 2.0
0.01 9.2 0.2 0.0096 7.5 0.2 0.0091 7.2 0.2
100
0.5 76.4 103.8 0.3979 70.5 149.4 0.3875 69.8 138.5
0.4 70.5 65.0 0.3236 64.3 92.2 0.3145 63.6 86.5
0.3 63.3 37.8 0.2475 57.2 52.8 0.2404 56.5 49.9
0.2 54.3 18.9 0.1695 48.3 25.6 0.1645 47.8 24.3
0.1 40.8 6.4 0.0882 35.6 8.3 0.0854 35.1 7.9
0.05 30.1 2.3 0.0450 25.7 2.9 0.0434 25.1 2.8
0.01 13.9 0.2 0.0094 11.1 0.3 0.0089 10.7 0.3
400
0.5 322.8 355.4 0.4496 311.4 369.1 0.4121 299.6 302.8
0.4 295.2 222.7 0.3603 281.6 231.5 0.3306 270.8 194.3
0.3 261.7 129.5 0.2712 246.7 134.9 0.2488 237.0 115.1
0.2 218.3 63.8 0.1818 203.0 67.2 0.1670 194.4 58.2
0.1 156.4 20.9 0.0920 141.3 22.3 0.0843 134.6 19.5
0.05 108.3 7.2 0.0467 94.5 7.9 0.0430 89.3 7.0
0.01 42.5 0.6 0.0093 33.1 0.8 0.0084 30.7 0.7
1000
0.5 865.8 882.2 0.4756 851.9 843.1 0.3804 771.5 526.1
0.4 785.5 541.1 0.3774 767.4 520.3 0.3031 689.9 341.9
0.3 687.5 311.5 0.2814 663.8 299.2 0.2264 592.3 203.5
0.2 560.5 151.4 0.1864 531.9 146.0 0.1502 469.6 102.1
0.1 382.5 48.1 0.0924 347.4 46.6 0.0746 301.3 33.4
0.05 251.0 15.9 0.0459 216.2 15.8 0.0373 184.7 11.6
0.01 85.9 1.3 0.0091 63.6 1.5 0.0072 53.4 1.1
Accuracy of FDRcontrol
1735
shown that SAM can be anticonservative too (with small m
1
and q
).
If an FDRbased procedure does not control the FDR accurately
in a real data analysis,there may be two potential reasons:(1) the
test statistics are heavily correlated or (2) the null distribution of
each test statistic cannot be approximated by the standard normal
distribution because the sample size is not large enough for the
normal approximation of the test statistics.In order to avoid
issue (2) and focus our discussion on (1),we generated es from
N(0,1) distribution so that the simulated test statistics have exactly
normal distributions.As mentioned in Appendix,however,if n is
large enough,es can be generated from any distribution with mean
0 and variance 1,such as Uð
ﬃﬃﬃ
3
p
‚
ﬃﬃﬃ
3
p
Þ.
From simulations not reported in this paper,we observed that
SAM closely estimates E{R
0
(a)}E{R
1
(a)} ¼ am
0
E{R
1
(a)}
rather than the FDR,E{R
0
(a)/R(a)}.
As mentioned previously,our procedure is to check the perform
ance of the existing multiple testing methods for a given dataset,so
that it is different from the simulationbased multiple testing pro
cedures used for data analysis.However,a similar simulation
method can be used to derive a testing procedure too,see Lin
(2005).Given g 2 (0,1),van der Laan et al.(2004) and
Lehmann and Romano (2005) propose a conservative procedure
to control the false discovery proportion,deﬁned as P(R
0
/R > g),
at a certain level.Our simulation method can be easily modiﬁed to
evaluate the conservativeness of their procedure for a given dataset.
We have discussed our procedure in terms of microarray data,but
it can be used for any high dimensional data involving multiple
testing using dependent test statistics,e.g.proteomic data.The
simulation programs are written in Fortran 77.Uniform random
numbers were generated using RAN2 subroutine of Press et al.
(1980).
ACKNOWLEDGEMENTS
The authors want to thank the two reviewers for their valuable
comments.
Conflict of Interest:none declared.
REFERENCES
Anderson,P.K.and Gill,R.D.(1982) Cox’s regression model for counting processes:a
large sample study.Ann.Stat.,10,1100–1120.
Beer,D.G.et al.(2002) Geneexpression proﬁles predict survival of patients with lung
adenocarcinoma.Nat.Med.,8,816–824.
Benjamini,Y.and Hochberg,Y.(1995) Controlling the false discovery rate:a practical
and powerful approach to multiple testing.J.R.Stat.Soc.Ser.B,57,289–300.
Benjamini,Y.and Yekutieli,D.(2001) The control of the false discovery rate in
multiple testing under dependency.Ann.Stat.,29,1165–1188.
Cox,D.R.(1972) Regression models and lifetables (with discussion).J.R.Stat.Soc.
Ser.B,34,187–220.
Gentleman,R.C.et al.(2004) Bioconductor:open software development for compu
tational biology and bioinformatics.Genome Biol.,5,R80.
Golub,T.R.et al.(1999) Molecular classiﬁcation of cancer:class discovery and class
prediction by gene expression monitoring.Science,286,531–537.
Huang,Y.,Xu,H.,Calian,V.and Hsu,J.C.(2005) To permute or not permute.Technical
Report 756,Department of Statistics,Ohio State University,OH.
Jung,S.H.(2005) Sample size for FDRcontrol in microarray data analysis.
Bioinformatics,21,3097–3104.
Jung,S.H.et al.(2005) Amultiple testing procedure to associate gene expression levels
with survival.Stat.Med.,21,3097–3104.
Lehmann,E.L.and Romano,J.P.(2005) Generalizations of the familywise error rate.
Ann.Stat.,33,1138–1154.
Lin,D.Y.(2005) An efﬁcient Monte Carlo approach to assessing statistical signiﬁcance
in genomic studies.Bioinformatics,21,781–787.
Lin,D.Y.and Wei,L.J.(1989) The robust inference for the Cox proportional hazards
model.J.Am.Stat.Assoc.,84,1074–1078.
Press,W.H.,Flannery,B.P.,Teukolsky,S.A.and Vetterling,W.T.(1980) Numerical
Recipes:The Art of Scientiﬁc Computing.Cambridge University Press,NY.
Storey,J.D.(2002) A direct approach to false discovery rates under dependence.
J.R.Stat.Soc.Ser.B,64,479–498.
Storey,J.D.(2003) The positive false discovery rate:a Bayesian interpretation and the
qvalue.Ann.Stat.,31,2013–2035.
Storey,J.D.et al.(2004) Strong control,conservative point estimation,and simultan
eous conservative consistency of false discovery rates:a uniﬁed approach.J.R.
Stat.Soc.Ser.B,66,187–205.
Storey,J.D.and Tibshirani,R.(2001) Estimating false discovery rates under
dependence,with applications to DNA microarrays.Technical Report 2001–28,
Department of Statistics,Stanford University,CA.
Tusher,V.et al.(2001) Signiﬁcance analysis of microarrays applied to the ionizing
radiation response.Proc.Natl Acad.Sci.USA,98,5116–5121.
van der Laan,M.J.et al.(2004) Augmentation procedures for control of the generalized
familywise error rate and tail probabilities for the proportion of false positives.
Stat.Appl.Gene.Mol.Biol.,3,15.
Westfall,P.H.and Young,S.S.(1993) Resamplingbased Multiple Testing:Examples
and Methods for Pvalue Adjustment.Wiley,NY.
APPENDIX
It sufﬁces to show that the conditional joint distribution of
ð
~
UU
1
‚...‚
~
UU
m
Þ given the data,D,is asymptotically identical to
the unconditional joint distribution of (U
1
,...,U
m
).As discussed
in Section 3.2,(U
1
,...,U
m
) is asymptotically normal with means
and covariances that can be consistently estimated by ^m
m
j
and ^s
s
jj
0
for
1 j,j
0
m,respectively.
Given the data,^m
m
ij
and U
ij
are constants,so that
~
UU
j
¼
^m
m
j
þ
P
n
i¼1
e
i
ðU
ij
^m
m
ij
Þ for j ¼ 1,...,m are weighted sums of
IID
N(0,1) random variables,e
1
,...,e
n
.Hence,ð
~
UU
1
‚...‚
~
UU
m
Þ is
normal with means
Eð
~
UU
j
j DÞ ¼ ^m
m
j
þ
X
n
i¼1
ðU
ij
^m
m
ij
ÞEðe
i
Þ ¼ ^m
m
j
and covariances
covð
~
UU
j
‚
~
UU
j
0
j DÞ ¼
X
n
i¼1
ðU
ij
^m
m
ij
ÞðU
ij
0
^m
m
ij
0
Þvarðe
i
Þ ¼ ^s
s
jj
0
which concludes the proof.
In fact,e
i
’s can be generated from any distribution with mean
0 and variance 1,such as Uð
ﬃﬃﬃ
3
p
‚
ﬃﬃﬃ
3
p
Þ.In this case,the condi
tional joint distribution of ð
~
UU
1
‚...‚
~
UU
m
Þ given D is approximated
by the same distribution by the central limit theorem.
S.H.Jung and W.Jang
1736
Enter the password to open this PDF file:
File name:

File size:

Title:

Author:

Subject:

Keywords:

Creation Date:

Modification Date:

Creator:

PDF Producer:

PDF Version:

Page Count:

Preparing document for printing…
0%
Σχόλια 0
Συνδεθείτε για να κοινοποιήσετε σχόλιο