Sample Size Re-estimation

1. Introduction

Despite best efforts, some of the crucial information used to design a confirmatory trial is not available, or is available only with a high degree of uncertainty, at the design stage. This could involve the initial estimates of within- or between-patient variation, a control group event rate for a binary outcome, the treatment effect desired to be detected, the recruiting pattern, or patients’ compliance, all of which impact the ability of the clinical trial to address its primary objective (Shih, 2001). There are many reasons why reliable information about these types of parameters is not available or is barely available. An obvious reason is that the new product has not been studied in the manner intended in the target patient population. This situation is not uncommon with HIV treatment, for example, when a new product is to be studied in a confirmatory trial as part of a drug cocktail and the cocktail contains some newly approved anti-retroviral medications. Another reason for lack of information could be that changing medical practice might significantly affect an event rate. Thus, estimated event rates based on historical data might no longer reflect rates in the current medical environment. A third reason could be a conscious decision by the sponsor to minimize the phase II program, resulting in less than ideal information at the time when the confirmatory trial is being planned.

Mehta and Patel (2005) discussed a situation where a sponsor prefers starting out with a small initial commitment of patients but is willing to factor in the possibility that the sample size might need to be increased during the course of the trial. The latter could be accomplished by taking an interim assessment of the treatment effect and re-assessing whether the initial plan of a smaller sample size remains a viable option.

There are also situations where the primary study objective is to collect enough patient exposure data to assess the safety of a pharmaceutical product. Because of patient dropout, the total exposure time or the number of subjects with the minimum exposure duration might, at the end of the trial, be less than planned if the current dropout pattern continues. As a result, there is a need to increase the sample size based on the dropout pattern observed while the trial is ongoing.

Offen et al (2006) discussed yet another situation where assumptions are made about the correlations among multiple co-primary endpoints when determining the sample size. Ideally, assumptions about the correlations should be based on data from similar trials conducted previously. However, sufficiently similar trials to determine the sample size dependably might not exist.

When there is uncertainty about the assumptions made at the design stage, it may be prudent to check the validity of those assumptions using interim data from the study. If the assumptions appear erroneous, one might be able to make mid-course adjustments to improve the chance that the trial will reach a definitive conclusion. One such mid-course adjustment is to modify the sample size. In other words, the original sample size could be revised based on estimates derived from interim data. We will focus on this aspect of mid-course adjustment in this paper.

In this paper we will focus on sample size re-estimation (SSR) for phase III and phase IV studies. The discussion is relevant to both continuous and binary endpoints even though the basis for SSR might differ for those two cases. A condensed summary of the highlights from this paper was published in an executive summary of the PhRMA Adaptive Working Group (Gallo et al, 2006). A general discussion of the methodology is given in the next section. Recommendations related to operational aspects of SSR are given in Section 3. In Section 4, we briefly discuss two examples in which sample size adjustment was proposed as part of the original designs. In Section 5, we will discuss some evolving issues related to SSR that could benefit from further research.

2. Methodology

There are many sample size modification rules, each with a companion decision rule (test). The sample size modification rule interacts with the decision rule to determine the operating characteristics of the resulting design. Only calculations specific to each particular design can ensure that the design controls the type I error rate at the desired level. As theoretical development has progressed, simulation has played an increasingly large role in verifying that the desired operating characteristics are maintained.

2.1 Fully sequential and group sequential methods

For monitoring of serious safety or efficacy events, there may be value in evaluating whether or not to continue a trial each time an event is observed. Wald (1947) devised the sequential probability ratio test (SPRT), which can be applied in such a case. Siegmund (1985) gives a more recent treatment of fully sequential methods. As an example of continuous monitoring, consider the REST trial (see Section 4, Example 2), which evaluated the safety of a vaccine for rotavirus. A previous rotavirus vaccine had been withdrawn from the market due to an increased incidence of a serious event, intussusception. Normally, intussusception occurs in about one of 500 infants, and thus even a moderate increase in incidence will require a very large study to detect. The REST trial allowed evaluation of a stopping boundary following every case of intussusception to determine if there was sufficient evidence of an increased risk to justify stopping the trial.
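To make the SPRT mechanics concrete, the sketch below shows how such a boundary check might look for a 1:1 randomized trial in which each observed case falls in the vaccine arm with a probability determined by the relative risk. This is a minimal illustration, not the actual REST monitoring rule; the function name, the alternative relative risk of 3, and the error rates are all our own illustrative choices.

```python
import math

def sprt_decision(cases_vaccine, cases_total, rr_alt=3.0,
                  alpha=0.05, beta=0.10):
    """Wald SPRT on the arm split of observed cases (1:1 randomization).

    Under H0 (relative risk 1) a case falls in the vaccine arm with
    probability p0 = 1/2; under H1 (relative risk rr_alt) with
    p1 = rr_alt / (1 + rr_alt).  The check is applied after every case.
    """
    p0, p1 = 0.5, rr_alt / (1.0 + rr_alt)
    # Log-likelihood ratio after cases_total observed cases.
    llr = (cases_vaccine * math.log(p1 / p0)
           + (cases_total - cases_vaccine) * math.log((1 - p1) / (1 - p0)))
    if llr >= math.log((1 - beta) / alpha):     # upper Wald boundary
        return "stop: evidence of increased risk"
    if llr <= math.log(beta / (1 - alpha)):     # lower Wald boundary
        return "stop: no evidence of increased risk"
    return "continue"
```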

For most trials, the continuous monitoring required by fully sequential methods is excessive. Thus, group sequential methods have been developed. Under a group sequential design, interim analyses are performed after pre-planned numbers of patients have been enrolled and followed for a specified period of time. At the time of an interim analysis, pre-specified rules are applied to decide whether to stop the trial for a definitive finding, stop the trial for futility if a definitive finding has become unlikely, or continue the trial until the next planned analysis. As a result, the method can affect the size of a trial if sufficient evidence is available at an interim analysis to come to a conclusion. An extensive review of group sequential methods is given by Jennison and Turnbull (1999).

Another method of using a group sequential design to adjust trial size is to perform interim analyses based upon the amount of statistical information that has accrued, for example, after a fixed number of endpoints has been observed in a design requiring a number of events at the final analysis that is sufficient to detect a clinically meaningful treatment effect. If the overall event incidence is low, one can enroll more patients or lengthen the follow-up period to obtain the needed number of events. On the other hand, if the low overall incidence is a result of a large treatment effect, the trial might be stopped at an interim analysis based on conclusive early results.
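For event-driven designs of this kind, the required number of events can be computed up front; the sketch below uses Schoenfeld's approximation for a survival endpoint as one concrete instance (our choice of example; the paper does not specify a particular formula).

```python
import math
from scipy.stats import norm

def events_needed(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate event count for a 1:1 survival comparison
    (Schoenfeld's formula): the design fixes the number of events,
    not the number of patients."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(4 * z ** 2 / math.log(hazard_ratio) ** 2)

# events_needed(0.75) is about 508 events, however many patients must
# be enrolled, or however long they must be followed, to observe them.
```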



2.2 Blinded sample size re-estimation

Blinded sample size re-estimation uses interim data, without unblinding treatment assignment, to provide an updated estimate of a nuisance parameter, and then updates the sample size for the trial based on that estimate. Nuisance parameters mentioned in this context are usually the variance for continuous outcomes or the underlying event rate for binary outcomes. Gould (2001) reviews methods of this type and comments that they are reasonably comparable in performance and also compare favorably with methods that utilize unblinded estimates. Kieser and Friede (2003) considered blinded sample size re-estimation using a blinded estimate of variance for a continuous endpoint. Blinded sample size re-estimation is generally well accepted by regulators (ICH E-9, 1999). Mehta and Tsiatis (2001) note that, when viewed in terms of the information on the primary endpoint, the recruitment objective remains fixed under an information-based design, and the trial could be considered “non-adaptive” from the information perspective even though the sample size might differ from what was originally targeted in the protocol.
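As a small illustration of the mechanics, the sketch below recomputes a per-group sample size from a blinded (pooled, label-free) variance estimate, in the spirit of the internal pilot designs discussed by Kieser and Friede. The function name and the plain normal-approximation formula are our own assumptions; their papers discuss refinements, such as adjusting the blinded estimate for the assumed treatment effect.

```python
import math
from statistics import variance
from scipy.stats import norm

def blinded_n_per_group(blinded_obs, delta, alpha=0.05, power=0.90):
    """Re-estimate the per-group sample size from a blinded variance.

    The variance is computed from the pooled interim data with treatment
    labels ignored, then plugged into the standard two-group formula
        n >= 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2.
    """
    s2 = variance(blinded_obs)          # blinded one-sample variance
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * s2 * z ** 2 / delta ** 2)
```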

2.3 Unblinded sample size re-estimation

We will discuss methods summarized by Posch, Bauer and Brannath (2003), who conducted an extensive review of adaptive trial design methods. Gould (2001) also reviewed blinded and unblinded sample-size re-estimation methods and compared them as noted above. While these methods generalize in useful ways to other trial adaptations, we focus primarily on SSR here. The basic adaptation strategy is that one or more interim analyses may be planned, and at the time of any interim analysis the sample size may be changed based on unblinded interim results or other factors such as external information. The usual approach is to attempt to adjust the sample size to provide the desired power under a certain assumption about the treatment effect. Each approach needs to address two crucial questions: 1) how can this be done while retaining the desired type I error rate, and 2) what treatment effect should the trial be powered for at the time of the interim analysis?

2.3.1 Type I error control

Combination tests are commonly used to control the type I error rate in adaptive designs with SSR. The first combination test discussed here combines test statistics from a pre-defined, fixed number of stages of a trial in a pre-defined manner. However, the sample size that is used to generate the test statistic for each stage is defined based on the results of the previous stages. Rules are put into place at the beginning of the trial for each stage based on the combined statistics through that stage. These include rules for stopping the trial for futility, stopping for superiority, or continuing to the next stage.

Group sequential designs are a simple case of combination tests in which the sample size does not change as a function of interim results. This is most easily seen when comparing sample means for two samples. Differences in means from each stage of the trial are combined as independent increments from the results of each stage, with weights proportional to the sample size for each stage.

Expanding on this example, some authors propose to use these same pre-defined weights for combining mean differences from different stages while allowing the sample size for each stage to vary based on results from the previous stages. By using the pre-defined weights, the null distribution is preserved for the interim and final test statistics. Since the weighting of the normal statistics will not, in general, be proportional to the sample size for each stage, the method does not use the sufficient statistics (the unweighted mean difference and estimated standard deviation from the combined stages) for testing, and is therefore less efficient (Tsiatis and Mehta, 2003). Additional discussion of efficiency can be found in Burman and Sonesson (2006) and Jennison and Turnbull (2006a).

Usually, combination tests are stated in terms of methods for combining p-values from different stages of a trial. A common method is Fisher's combination test. Another method for combining p-values was proposed by Lehmacher and Wassmer (1999), who apply the inverse standard normal distribution to each stage p-value to get a standardized normal statistic. These normal statistics are then combined through a weighted sum to obtain a combined standardized normal statistic. The combination test p-value is the p-value for this combined statistic.
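As an illustration, a hypothetical two-stage version of the inverse normal method can be written in a few lines. The weights used below (square roots of the originally planned stage sizes) are a common choice but are our own assumption here, as is the function name.

```python
import math
from scipy.stats import norm

def inverse_normal_p(p_values, weights):
    """Inverse normal combination of stage-wise one-sided p-values
    (Lehmacher and Wassmer, 1999)."""
    z_scores = [norm.ppf(1 - p) for p in p_values]
    z_comb = (sum(w * z for w, z in zip(weights, z_scores))
              / math.sqrt(sum(w * w for w in weights)))
    return norm.sf(z_comb)   # p-value of the combined statistic

# Two stages planned with 100 and 200 patients: the weights are fixed at
# the design stage and are kept even if the realized second-stage sample
# size is later changed, which is what preserves the null distribution.
print(inverse_normal_p([0.08, 0.03], [math.sqrt(100), math.sqrt(200)]))
```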

Although we have discussed combination tests with normal random variables and a known variance, the results generalize to other cases by using asymptotics. While combination tests may combine test statistics using methods other than the above, the comment on lack of efficiency applies to the general case.

An alternative method of designing trials with combination tests is to pre-specify the sample size, the futility rule, and the superiority rule only for the first interim analysis, and then to determine recursively at the time of each interim analysis what the sample size, futility rule and superiority rule are for the next stage (Brannath, Posch and Bauer, 2002). This is equivalent to assigning a weight to the first two stages of the trial at the beginning and, at the time of each interim analysis, dividing the weight for the subsequent stage into two parts for the following two stages, until the point where it is decided to assign all of the remaining weight to a stage and stop the trial at the subsequent analysis.



2.3.2 Sample size adaptation methods

Consider a difference of weighted normal means with a pre-planned sequence of analyses. At a given stage, we make several assumptions:

1. We assume some fixed parameter value for the underlying mean difference.

2. We assume a known common variance for observations within each group.

3. We assume the cutoffs for decision making at future analyses for the trial.

4. We assume sample sizes for the remaining stages in the trial.

5. There is a desired power for the remainder of the trial conditional on the current results.

Given assumptions 1-4 and a test statistic for data through the current stage, one can compute the conditional power for a positive finding at each of the future planned analyses. By further assuming that the proportion of future sample size is fixed for each interim, we can set the overall conditional power by varying the final sample size.
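The following sketch shows one way these quantities fit together for a two-stage design with a conventional unweighted final test and no further interim looks; the function names and the grid search are our own illustrative assumptions, not a published algorithm.

```python
import math
from scipy.stats import norm

def conditional_power(z1, n1, n2, delta, sigma, z_final):
    """Conditional power for a two-stage design with an unweighted
    (sufficient-statistic) final test and no intermediate looks.

    z1      : observed z-statistic on the first n1 patients per arm
    n2      : planned second-stage patients per arm
    delta   : mean difference assumed for the projection
    sigma   : known common standard deviation
    z_final : critical value applied to the cumulative z at the end
    """
    info1 = n1 / (2 * sigma ** 2)      # stage-1 Fisher information
    info2 = n2 / (2 * sigma ** 2)      # stage-2 Fisher information
    s1 = z1 * math.sqrt(info1)         # observed score statistic
    s_needed = z_final * math.sqrt(info1 + info2)
    # Score increment ~ N(delta * info2, info2) under the assumed delta.
    return norm.sf((s_needed - s1 - delta * info2) / math.sqrt(info2))

def n2_for_target(z1, n1, delta, sigma, z_final, target=0.90, n2_max=2000):
    """Smallest second-stage size per arm reaching the target conditional
    power, capped at n2_max (a plain grid search for clarity)."""
    for n2 in range(1, n2_max + 1):
        if conditional_power(z1, n1, n2, delta, sigma, z_final) >= target:
            return n2
    return n2_max
```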
Different choices have been considered for the underlying mean difference to use in computing the conditional power and determining the future sample size. Proschan and Hunsberger (1995) and Cui, Hung and Wang (1999) suggested using the observed estimate of the parameter at the time of the interim analysis. This has been criticized as inefficient by, for example, Jennison and Turnbull (2003). Liu and Chi (2001) and Posch, Bauer and Brannath (2003) suggest a fixed and predetermined value that does not depend on observed results. The latter authors demonstrated that for a two-stage design this approach (along with a fixed maximum sample size) could improve the expected sample size over a group sequential procedure. Bauer and Konig (2006) investigated SSR based on the conditional power approach, and demonstrated that mid-trial sample size recalculation based on an interim estimate might lead to an overly large price paid in average sample size relative to the gain in overall power. As a result, they conclude that using the estimated effect size for sample size reassessment is not a recommendable option; using the original effect size from the planning phase would usually be more useful.

2.4 Optimized designs

Jennison and Turnbull (2006a, 2006b) consider expected sample size and power for group sequential and adaptive designs as a function of the underlying treatment difference. In the first article they suggested that using an interim observed treatment effect to adjust sample size and attain a desired conditional power could be very inefficient when compared to a group sequential design. In the second they compared pre-planned group sequential versus pre-planned adaptive trials using the expected sample size to achieve a given unconditional power for a fixed treatment difference of interest.

The work of Jennison and Turnbull, as well as research by Lokhnygina (2004) and Banerjee and Tsiatis (2006), has shown that it is possible to develop adaptive designs which are optimal in terms of minimizing a mixture of expected sample sizes over a specified range of treatment effect sizes. Jennison and Turnbull (2006a, 2006b) also found that optimal adaptive designs had at most a minimal improvement in expected sample size compared to optimal group sequential designs. Thus, both flexibility and efficiency should be considered when contemplating a design with SSR.



3. Recommendations

SSR techniques offer potential for improving program efficiency by allowing mid-course sample size adjustment when assumptions made at the planning stage may have been unreliable. During protocol planning, it should be considered on a routine basis whether these techniques should be used in the trial. Potential advantages of SSR techniques do need to be balanced against possible procedural or logistic concerns, such as those which might affect trial integrity (e.g., do data need to be unblinded for the re-estimation and, if so, who will be involved in review and decision making? Might observers seeing how the sample size is changed be able to infer information about treatment effects which could have some potential to compromise trial integrity?). If an SSR approach is utilized, it should be implemented in a manner that, to the extent feasible, minimizes such concerns. Particular decisions regarding how to consider and implement SSR techniques will of course depend on the details of particular situations; nevertheless, some general recommendations and desirable characteristics are described below.

3.1 Planning

The need for sample size re-estimation should be anticipated as much as possible during trial planning. This applies not only to assumptions used in the sample size calculation which might turn out to have been initially erroneous, but also to the possibility that changes relevant to sample size might occur in the external environment while the trial proceeds. For example, during a long-term trial, advances in background therapy might lead to a lower event rate than had previously been observed. Allowing for potential SSR should by no means be viewed as a substitute for good up-front decision making; rather, it should supplement proper initial planning by being realistic about possible limitations of assumptions or anticipating background environmental changes.

The plan for re-evaluating sample size should be described in the trial protocol. This will help ensure that the plan has received enough initial consideration so that it is sound, and will enhance the credibility of the trial with regard to any actions that may be implemented. This applies to issues such as timing, methodology, and processes of decision making and implementation. In particular, there should be consideration of what restrictions should be placed on patient numbers or trial duration, and these should be specified in the protocol (for example, maximum and/or minimum values, perhaps to ensure that the trial addresses other objectives, such as collecting sufficient safety data).

3.2 Adjustments based upon nuisance parameters

Mis-specification of nuisance parameters (e.g., the variance for a normally distributed outcome or the underlying response rate for binary outcomes) can have a large impact on power and sample size. No matter what source of information is the basis for pre-trial assumptions, it is only the data from the trial itself that will verify or refute these assumptions. As this can often be addressed in a non-controversial manner, it is recommended that re-estimation on this basis be considered routinely.

It is usually preferable for SSR based on updated information about nuisance parameters to be performed on a blinded basis. There is certainly potential for this to be done either blinded or unblinded, and it might seem that there would be advantages to doing this unblinded, since the accuracy of parameter estimates produced from blinded data is confounded with assumptions about the magnitude of treatment effects. Nevertheless, in most situations the preference will be for blinded SSR, particularly for the variance of a continuous variable. Considerations affecting the decision on blinded vs. unblinded SSR include the following:



- Procedures using blinded data generally have good operating characteristics, whether these utilize “raw” estimates from the pooled data or adjust the raw estimates based on some assumption about the magnitude of the treatment effect (Kieser and Friede, 2000).

- Using blinded data tends to minimize tendencies toward significance level inflation (though this issue is mostly of concern in small studies).

- Unblinded estimation requires some party to have access to unblinded results, which can be controversial and have implications for the integrity of the trial. Even if an independent Data Monitoring Committee (DMC) has been constituted for other purposes, SSR may not be an activity which it considers within its purview, and the safety monitoring of such a board might be performed most objectively when it does not also address issues such as sample size.

- Even if an independent DMC would make this determination based upon unblinded data, the trial integrity issue is not fully settled, since individuals with access to aggregate data who know the nuisance parameter value determined by the DMC (or who can infer it from the sample size change) may be able to determine the estimated treatment effect.



In cases where there is a large amount of uncertainty about both the treatment effect and a nuisance parameter, there may be more motivation to consider unblinded SSR and to try to overcome any associated procedural issues. This issue is likely to arise more frequently for underlying event rates than for variances. If at an interim point the pooled event rate were much lower than expected, this might indicate either that the underlying population rate is unexpectedly low or that the treatment effect is larger than expected. If there is much uncertainty concerning the treatment effect, one could consider using an SSR procedure, such as that described by Gould (1995), in combination with a group sequential design that allows for early stopping for efficacy. The latter allows stopping the trial early in case the low event incidence is due to an exceptionally large treatment effect.

3.3 Adjustments based upon assumptions regarding treatment effects

Sample size changes may be considered appropriate if new external information changes the relevant treatment effect used in sample size calculations. For example, the introduction of an alternate treatment for the same indication may alter the perception of what constitutes a clinically or commercially relevant effect size; or perhaps updated knowledge of the safety profile of a treatment might affect the perception of the clinically relevant effect on the basis of risk-benefit considerations. This type of approach can be sensible, and generally should not introduce much difficulty statistically or procedurally.

Section 2.3.2 cited references describing methods for sample size changes which use interim data and condition on a fixed pre-determined value for the treatment effect, utilizing appropriate adaptive methodology to maintain the type I error. While sound statistically, caution should be taken in the implementation of such methods because:

- This necessarily requires an unblinded review of trial data, and the process must be properly managed (see Section 3.4.3).

- Observers who are aware of the methodology used and the resulting sample size modification might be able to infer information about the treatment effect estimate, raising some potential concerns about trial integrity.

Section 2.3.2 also described SSR methods which condition on an interim treatment effect estimate as if it were the actual parameter value of interest. These methods can be more problematic, for reasons including:

- They can be highly inefficient: the sample size calculation is particularly sensitive to the effect size used (the required sample size generally varies inversely with the square of the effect size), but interim treatment effect estimates can be highly variable and potentially too unreliable to be used directly for re-estimation purposes; this is related to the previously cited arguments put forth by Jennison and Turnbull (2003) and by Tsiatis and Mehta (2003).

- Conceptually, the treatment effect size used in sample size calculation is not simply an expectation of the magnitude of effect. While the assumed effect size should be feasible based upon what is known about a treatment, it is a value that also should reflect clinical perspectives and perhaps marketing realities. Simply allowing a (variable) point estimate to determine a value for this type of calculation may not adequately address the practical needs of the situation.



As mentioned earlier, group sequential methods can address sample size determination, as the actual total sample size will be variable because the trial may terminate at interim analyses. While such methods require review of unblinded data by a DMC, and continuation decisions can convey some limited information to observers about treatment effects, the procedural conventions and statistical methods governing these are well established, so that such methods can often be implemented with less difficulty. Approaches such as those using spending function methodology allow flexibility in implementing designs with desired sample size distributions under particular assumptions.

These considerations lead to the following recommendation: before implementing methods which modify sample size based upon interim treatment effect estimates, it should be strongly considered whether the sample size determination objectives can be better achieved, statistically and procedurally, using either 1) an appropriate group sequential scheme or 2) an adaptive scheme that does not utilize the interim observed effect for re-estimating the sample size.

3.4 Logistics

3.4.1 Frequency

We recommend that SSR be applied as infrequently within a trial as is felt to meet its objectives; in many trials, a single re-estimation should suffice. It is not a goal of SSR to treat the sample size as a “moving target”; if the initial information used to design a trial is inadequate, then generally it is advisable to wait until there is information sufficient to adjust the sample size well, and then do so. For example, if we are re-sizing based upon updated information about a variance, then it should be reasonably straightforward to quantify how much data will be required to determine the variance sufficiently precisely so that modification of the sample size can be addressed.

Exceptions may arise in situations where there is a good deal of uncertainty associated with a parameter. In such cases, there may be advantages for trial planning purposes to updating the sample size earlier in the trial, knowing that this may not be the final number and that it will be re-considered later on. Also, for some adaptive group sequential procedures, potential actions at each look may include stopping for success or futility, or updating the timing or sample size of the next look and/or of the trial. For such procedures, a possible update of the maximum sample size (i.e., if a stopping criterion is not met at some point) could be obtained at each interim analysis.

3.4.2 Timing

Timing for the implementation of SSR procedures will depend on specific details of the trial, such as the relative values of the enrollment rate and follow-up time, and the amount of uncertainty in the initial assumptions. If a minimum sample size has been pre-specified, then it will often be sensible to make this determination shortly before reaching that minimum enrollment. This type of approach would allow collection of the maximum amount of information that could be used for the re-estimation without the potential disruption to the trial that could be caused by stopping, then re-starting, enrollment.

For SSR using unblinded data, timing decisions may also take into account whether there are unblinded looks at the data for other purposes during the study (e.g., safety monitoring by a DMC). In such cases it will often be advisable to time the re-estimation so that it coincides with one of these interim analyses.



3.4.3 Analysis and decision process

SSR based upon a blinded review of data usually does not engender much concern about the data analysis, review, and decision processes, and can generally be performed by trial personnel. As alluded to previously, if the re-estimation is to be based upon unblinded data, then it may present some challenges to implement the processes of analysis and decision making in a manner that minimizes risks to the integrity of the trial. The following recommendations should be followed, and are of particular importance in trials with registration potential or which may be supportive in a regulatory submission:

- Review of unblinded data for SSR purposes should be performed by individuals who are not involved in trial activities, and unblinded results should otherwise remain confidential.

- If a sponsor perspective is considered necessary, then access to unblinded interim results should be limited to a minimum number of sponsor representatives (as above, not otherwise involved in the trial), and should only involve results relevant to the particular decision.

- Where possible, steps should be taken to limit the information about interim results which could be inferred by observers. For example, full statistical details of the re-estimation plan might be withheld from the trial protocol and documented elsewhere; or the re-estimation might be jointly based upon more than one consideration (e.g., both a variance and a treatment effect estimate) to mask the contribution of any particular piece of information.

- As an alternative, it should be strongly considered whether an appropriate group sequential scheme as a method of sample size determination could better meet the study objectives, as operational concerns are fewer and statistical behavior is often superior (Tsiatis and Mehta, 2003).

4. Case Studies

In this section, we will look at two examples where the opportunity to resize the study was discussed at the design stage.

Example 1

Two doses of a new drug will be compared to a placebo in a trial using a parallel group design. The underlying disease is a seasonal one, and the primary endpoint is percent change from baseline for a continuous measurement. Based on data obtained from a study performed in a different setting, the standard deviation of the primary endpoint is estimated to be around 20%.

Using the above initial estimate for variability, it is determined that 90 patients per group (270 total) are needed to have 90% power to detect a 10% difference between a dose group and placebo. Since recruitment is anticipated to be difficult and the true variability is uncertain, it is decided up-front to consider revising the sample size based upon a blinded interim estimate of variability. Because the underlying disease is seasonal, there will be a long pause in enrollment between seasons, by which time data from about 100 patients will have become available; these data will be used for the re-estimation.



If the variability based on the pooled interim data is between 15% and 25%, there would be no change to the sample size or analysis strategy. If this variability is below 15%, then the final sample size would be reduced, using the original sample size formula and replacing 20% with the new variability estimate. On the other hand, if the variability exceeds 25%, the sample size will not be increased, but the main comparison would be modified to combine the low and high dose treatment groups and compare this pooled group with placebo.
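A hypothetical implementation of this decision rule could look as follows. The function name is ours, and the standard normal-approximation formula stands in for whatever exact calculation (e.g., with t-distribution adjustments) produced the planned 90 per group.

```python
import math
from scipy.stats import norm

def example1_update(sd_interim, delta=10.0, n_planned=90):
    """Apply the Example 1 rule to a blinded interim SD estimate (in %).

    Returns (per-group sample size, analysis strategy)."""
    z = norm.ppf(0.975) + norm.ppf(0.90)   # two-sided 5% level, 90% power
    if sd_interim < 15.0:
        # Reduce n via the original formula with the new SD estimate.
        n_new = math.ceil(2 * (z * sd_interim / delta) ** 2)
        return min(n_new, n_planned), "two dose groups vs placebo"
    if sd_interim > 25.0:
        # Do not increase n; pool the dose groups for the main comparison.
        return n_planned, "pooled dose groups vs placebo"
    return n_planned, "two dose groups vs placebo"
```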

Example 2

The trial is to study the risk for intussusception (IT) of oral RotaTeq™ in infants who are 6 to 12 weeks old at the time of study enrollment. RotaTeq™ is to be given in 3 oral doses every 4 to 10 weeks.

The hypothesis is that RotaTeq™ will not increase the risk of IT relative to placebo within 42 days after any dose. To satisfy this hypothesis, the study is designed to meet two criteria. First, during the study, there are pre-defined unsafe boundaries for the vaccine/placebo case ratio, both for 1-42 days following any dose and for 1-7 days following any dose. This criterion is checked by a Data and Safety Monitoring Board (DSMB) on a regular basis. If the unsafe boundaries are crossed at any evaluation time, the trial will be stopped. Second, at the end of the study, the upper bound of the 95% confidence interval estimate of the relative risk of IT for RotaTeq™ relative to placebo must be ≤ 10. In addition to the DSMB, there is a Safety Endpoint Adjudication Committee to evaluate each potential IT case.

The initial sample size is set at 60,000, allocated equally between the vaccine and the placebo group. After 60,000 infants are enrolled and treated, the criteria will be evaluated. If it is determined that the results are inconclusive, then 10,000 more infants will be enrolled and treated. The goal of the design is to provide a high probability that a safe vaccine would meet the study criteria and that the study would be stopped early if the vaccine has an increased risk for IT. The statistical operating characteristics of the trial were evaluated using Monte Carlo simulations.
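The sketch below indicates how such simulations might be organized; every numeric value and the simplified stopping rule are our own illustrative assumptions and do not reflect the actual REST boundaries or its full two-part design.

```python
import random

def prob_stop_for_harm(n_infants=60_000, background_rate=1/500,
                       rel_risk=1.0, ratio_boundary=4.0,
                       min_cases=5, n_sims=10_000, seed=1):
    """Toy Monte Carlo for a REST-like sequential safety rule.

    Cases arrive one at a time; each falls in the vaccine arm with
    probability RR/(1+RR) under 1:1 randomization.  The trial 'stops
    for harm' if, once at least min_cases cases have accrued, vaccine
    cases exceed ratio_boundary times placebo cases.
    """
    random.seed(seed)
    p_vax = rel_risk / (1.0 + rel_risk)
    expected_cases = int(n_infants * background_rate * (1 + rel_risk) / 2)
    stops = 0
    for _ in range(n_sims):
        vax = plc = 0
        for _ in range(expected_cases):
            if random.random() < p_vax:
                vax += 1
            else:
                plc += 1
            if vax + plc >= min_cases and vax > ratio_boundary * max(plc, 1):
                stops += 1
                break
    return stops / n_sims

# Compare prob_stop_for_harm(rel_risk=1.0) (the false-alarm rate) with
# prob_stop_for_harm(rel_risk=10.0) (power to detect a tenfold risk).
```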

5. Evolving Issues

As we have described, there are various motivations for re-estimating the sample size during a study, a variety of methods that can be considered, and issues of different types that must be weighed and balanced against one another in making decisions to implement a particular procedure. The decision rule used to adapt the sample size should come with a clear understanding of the operating characteristics of the rule, as well as methods to provide point and interval estimates for the parameters of interest in the study.

Another area of ongoing research involves the estimation of treatment effects and the calculation of p-values following SSR based on observed treatment effects. Estimation can be a challenge if the rule for SSR is not clearly specified in advance. How to address issues related to possible bias in the estimates and p-values requires further investigation.

In theory, SSR could lead to an increase or a decrease in sample size, though in practice researchers often focus only on possible increases. One reason is that trials often have other objectives in addition to their primary objective of efficacy determination; a notable one involves safety considerations. A confirmatory trial may need to produce a certain amount of exposure data, so there may be no incentive to reduce the sample size even if SSR suggests that a smaller sample size would suffice for efficacy evaluation. However, if an adequate amount of safety data exists from other trials, it might be worthwhile to consider the possibility of reducing the sample size of the confirmatory trial. One advantage of this approach is the ability to study the safety of the drug in a more heterogeneous patient population.

In this paper, we did not discuss SSR in the context of Bayesian designs. The paradigm underlying Bayesian designs is different (i.e., to continue accumulating and synthesizing information, so the learning process may not stop at the end of a trial). Also, Bayesian trials historically have only infrequently been used in confirmatory settings, where the need to ensure definitive answers has traditionally been addressed through the frequentist approach. Nevertheless, it will be interesting to see if there is progress in the development of SSR procedures in Bayesian designs.


References


Banerjee,
A., and Tsiatis, A.A. (2006). Adaptive two
-
stage designs in phase II clinical
trials.
Statistics in Medicine

(in print).


Bauer, P., Konig, F. (2006). The reassessment of trial perspectives from interim data


a
critical view.
Statistics in Medicine

25:23
-
36.


Brannath, W., Posch, M., Bauer, P. (2002). Recursive combination tests.

Journal of the
American Statistical Association
97:236
-
244.


Burman, C
-
F., and Sonesson, C. (2006). Are flexible designs sound?
Biometrics

(in
print).


Cui, L., Hung, H.M.J., Wan
g, S.J. (1999).
Modification of sample size in group
sequential clinical trials.
Biometrics

55:853
-
857.


Gallo, P., Chuang
-
Stein, C., Dragalin, V., Gaydos, B., Krams, M., Pinheiro, J. (2006).
Adaptive designs in clinical drug development


An executive sum
mary of the PhRMA
working group.
Journal of Biopharmaceutical Statistics

(in print).


Gould, A.L. (1995). Planning and revising the sample size for a trial.
Statistics in
Medicine

14:1039
-
1051.


Gould, A.L. (2001). Sample size re
-
estimation: recent develo
pments and practical
considerations.
Statistics in Medicine

20:2625
-
2643.


ICH E
-
9 Expert Working Group. (1999). Statistical principles for clinical trials (ICH
Harmonized Tripartite Guideline E
-
9).
Statistics in Medicine

18:1905
-
1942.


Jennison, C., Turnb
ull, B. (1999).
Group Sequential Methods with Applications to
Clinical Trials
. Chapman and Hall/CRC, Boca Raton.


Jennison, C., Turnbull, B. (2003).
Mid
-
course sample size modification in clinical trials
based on the observed treatment effect.
Statistics
in Medicine

22:971
-
993.


Jennison, C., and Turnbull, B.W. (2006a). Adaptive and non
-
adaptive group sequential
tests.
Biometrika

93 (in press)


Jennison, C., and Turnbull, B.W. (2006b). Efficient group sequential designs when there
are several effect sizes
under consideration.
Statistics in Medicine

25 (in press).


Kieser, M., Friede, T. (2000). Re
-
calculating the sample size in internal pilot study
designs with control of the type I error rate.
Statistics in Medicine

19:901
-
911.



23

Kieser, M., Friede, T. (200
3). Simple procedures for blinded sample size adjustment that
do not affect the type I error rate.
Statistics in Medicine

22:3571
-
3581.


Lehmacher, W., Wassmer, G. (1999). Adaptive sample size calculations in group
sequential trials.
Biometrics

55:1286
-
12
90.


Liu, Q., Chi, G.Y. (2001). On sample size and inference for two
-
stage adaptive designs.
Biometrics

57
:172
-
177.


Lokhnygina, Y. (2004).
Topics in design and analysis of clinical trials
. NC State
Department of Statistics Ph.D. dissertation (with Anastas
ios Tsiatis as the advisor).


Mehta, C., and Tsiatis, A.A. (2001). Flexible sample size considerations using
information
-
based interim monitor.
Drug Information Journal

35(4):1095
-
1112.


Mehta, C.R., Patel, N.R. (2005) Adaptive, group sequential and decis
ion theoretic
approaches to sample size determination. Submitted for publication.


Offen, W., Chuang
-
Stein,

C
., Dmitrienko, A., Littman, G., Maca, J., Meyerson, L.,
Muirhead, R., Stryszak, P., Boddy, A., Chen, K., Copley
-
Merriman, K., Dere, W.,
Givens, S.
, Hall, D., Henry, D., Jackson, J.D., Krishen, A., Liu, T., Ryder, S., Sankoh,
A.J., Wang, J., Yeh, C.H. (2006). Multiple co
-
primary endpoints: Medical and statistical
solutions.
Drug Information Journal

(to appear).


Posch, M., Bauer, P., Brannath, W. (
2003).
Issues in designing flexible trials.
Statistics in
Medicine

22:953
-
969.


Proschan, M.A., Hunsberger, S.A. (1995). Designed extension of studies based on
conditional power.
Biometrics

51
:1315
-
24.


Shih, W.J. (2001). Sample size re
-
estimation


journe
y for a decade.
Statistics in
Medicine

20:515
-
518.


Siegmund, D. (1985).
Sequential Analysis: Tests and Confidence Intervals
. Springer.


Tsiatis, A.A., Mehta, C. (2003). On the inefficiency of the adaptive design for monitoring
clinical trials.
Biometrika

20:367
-
378.


US Food and Drug Administration (2005).
Guidance for Clinical Trial Sponsors on the
Establishment and Operation of Clinical Trial Data Monitoring Committees

(Draft).
Rockville MD: FDA. (Available at
http://www.fda.gov/cber/gdlns/clintrialdmc.htm
)


Wald, A. (1947). Sequential Analysis. Dover Publications, New York.